Comparative Effectiveness of Levothyroxine, Desiccated Thyroid Extract, and Levothyroxine+Liothyronine in Hypothyroidism

Abstract
Introduction: Studies comparing levothyroxine (LT4) therapy with LT4 + liothyronine (LT3) or desiccated thyroid extract (DTE) did not detect consistent superiority of any one treatment. Here, we investigated these therapies, focusing on the whole group of LT4-treated hypothyroid patients, while also exploring the most symptomatic patients.
Methodology: Prospective, randomized, double-blind, crossover study of 75 hypothyroid patients randomly allocated to 1 of 3 treatment arms, LT4, LT4 + LT3, and DTE, for 22 weeks. The primary outcomes were posttreatment scores on the 36-point thyroid symptom questionnaire (TSQ-36), 12-point quality of life general health questionnaire (GHQ-12), the Wechsler memory scale-version IV (WMS-IV), and the Beck Depression Inventory (BDI). Secondary endpoints included treatment preference, biochemical and metabolic parameters, etiology of hypothyroidism, and Thr92Ala-DIO2 gene polymorphism. Analyses were performed with a linear mixed model using subject as a random factor and group as a fixed effect.
Results: Serum TSH remained within reference range across all treatment arms. There were no differences for primary and secondary outcomes, except for a minor increase in heart rate caused by DTE. Treatment preference was not different and there were no interferences of the etiology of hypothyroidism or Thr92Ala-DIO2 gene polymorphism in the outcomes. Subgroup analyses of the one-third most symptomatic patients on LT4 revealed strong preference for treatment containing T3, which improved performance on TSQ-36, GHQ-12, BDI, and visual memory index (WMS-IV component).
Conclusions: As a group, outcomes were similar among hypothyroid patients taking DTE vs LT4 + LT3 vs LT4. However, those patients that were most symptomatic on LT4 preferred and responded positively to therapy with LT4 + LT3 or DTE.

Hypothyroidism, or underactive thyroid function, is a common condition characterized by cognitive and metabolic impairments (1). It is estimated that 3.7% of the general US population is affected by hypothyroidism based on the National Health and Nutrition Examination Survey (1999-2002) (2).
Various forms of thyroid extracts were originally developed in the 1880s, with desiccated thyroid extract (DTE) becoming the dominant form and commercially available in the early 1900s. During the 1970s, there was a switch to therapy with levothyroxine (LT4), which to this day is the standard of care, considered safe and effective (3). The switch occurred rapidly, but some anecdotal evidence emerged that a few patients did not feel well on LT4 and asked to return to DTE (4). The existence of residual symptoms in LT4-treated patients was first documented through a patient survey (5) and later through questionnaires that evaluated thyroid-specific symptoms and quality of life (QoL) (6), as well as sophisticated cognitive tests (7). Several clinical trials addressed the question of whether residual symptoms could be resolved through the use of combination therapy with LT4 and liothyronine (LT3), but evidence of consistent superiority of combination therapy was not obtained (8,9). A case could be made that, in some instances, patients prefer the combination approach (10,11). Indeed, in a recent randomized, double-blind, crossover study (12), we confirmed that QoL of hypothyroid patients was similarly improved with LT4 or DTE, but the latter was associated with modest weight loss (~4 pounds); nearly 50% of the study patients preferred treatment with DTE over LT4.
Most professional medical guidelines (3) recommend LT4 monotherapy "as the preparation of choice" for thyroid hormone replacement, and against the "routine use" of combination treatment with LT4 and LT3 or DTE because of uncertainty about long-term efficacy and potential safety concerns. A recent consensus statement by the American, European, and British Thyroid Associations reasoned that previous clinical trials did not focus sufficiently on patients who were symptomatic while on therapy with LT4 (9).
The statement recommends that studies should be appropriately powered to study clinical outcomes, the potential interference of gene polymorphisms affecting thyroid hormone signaling, and include patients dissatisfied with their current therapy. Patient preference should be considered as a secondary outcome.
The current investigation involves the prospective crossover study of 75 hypothyroid patients while on 3 different forms of thyroid hormone replacement therapy. In addition to studying outcomes for each treatment arm, we also performed a subanalysis for those patients who were most symptomatic while on the LT4 treatment arm.

Study patients
The trial was widely advertised in various primary care clinics of our hospital. Eligible patients were beneficiaries of the military health care system of either sex, between the ages of 18 and 65 years, who had been diagnosed with primary hypothyroidism and were on a stable dose of LT4 for at least 6 months. Other criteria included: (1) patient weight of 50 to 100 kg and (2) prestudy LT4 dose of 1.2 to 2.2 mcg/kg/d or a daily LT4 dose of 75 to 250 mcg, or the equivalent dose in terms of combination therapy or DTE (Table 1).
Patients were excluded for the following reasons: pregnancy, plan for pregnancy in the next 12 months, cardiac disease (particularly coronary artery disease), chronic obstructive lung disease, malabsorption disorder, gastrointestinal surgeries, significant renal or liver dysfunction, seizure disorders, thyroid and nonthyroid active cancers, uncontrolled psychosis, and certain medications, including psychotropic medications, corticosteroids, amiodarone, chemotherapy for cancer, iron supplements, sucralfate, proton pump inhibitors, and cholestyramine. Pregnancy was excluded by urine or serum human chorionic gonadotropin test and by taking patient history of missing a menstrual period.

Study design
The study was a prospective, randomized, double-blind, crossover study, registered with ClinicalTrials.gov (NCT02317926). After informed consent was obtained, patients were carefully counseled not to make any changes in diet, exercise regimen, or medications, including lipid-lowering drugs.
All study participants, physician investigators, those administering the neurocognitive tests, and those analyzing test results were blinded throughout the study.
Baseline assessments were obtained before randomization, while patients were taking either LT4, LT4 + LT3, or DTE (Table 1). Subsequently, patients were randomized to 1 of 3 groups and started on the study medications (Table 2). All medications were given once daily, in the morning on an empty stomach with water only; other medications, if any, were given at least 1 hour later. Adherence was evaluated through interviews, by counting the number of remaining capsules, and by monitoring refills, as well as by evaluating the thyroid function tests.

Each study medication was administered during a study period of 22 weeks, as follows: after the first 6 weeks on the medication, TSH levels were checked, and the dose adjusted to maintain the TSH level between 0.27 and 4.20 mIU/L. The adjustment was performed by a physician who had no other contact with the study patients. Adjustments were made by increasing or decreasing the dosage of the study medication as per Table 2. If the TSH was out of range, the next higher or lower dosage was used. Similarly, the physician doing the evaluations did not know the dose of the study medications the patients were taking. With the serum TSH level within the desired range, patients continued on the study medication for an additional 16 weeks, completing the study period. Subsequently, patients were crossed over to the next randomized study medication and followed for another 22-week study period. Predefined equivalence ratios among LT4, LT4 + LT3, and DTE were used (Table 2). The same steps were followed after crossover to the third study medication. The doses of each study medication at the beginning and end of each study period were: LT4, 115 ± 25 vs 115 ± 25 mcg/d; DTE, 77 ± 16 vs 77 ± 17 mg/d; LT4 + LT3, 84 ± 20/8.8 ± 1.7 mcg/d vs 84 ± 20/8.8 ± 1.8 mcg/d (Table 3).
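The titration rule described above can be sketched as follows. This is a minimal illustration only: the dose ladder and the function name are hypothetical (the actual dose steps are those of Table 2, which is not reproduced here), and the sketch assumes the usual direction of adjustment, in which a TSH above range prompts the next higher dose and a TSH below range the next lower dose.

```python
# Hypothetical LT4 dose ladder in mcg/d; the study's actual steps are in Table 2.
DOSE_LADDER = [75, 88, 100, 112, 125, 137, 150]

# Target TSH range stated in the text.
TSH_LOW, TSH_HIGH = 0.27, 4.20


def adjust_dose(current_dose, tsh, ladder=DOSE_LADDER):
    """Return the dose to use after the 6-week TSH check.

    TSH above range (under-replacement) -> next higher dose.
    TSH below range (over-replacement)  -> next lower dose.
    TSH in range                        -> dose unchanged.
    Doses at either end of the ladder are clamped.
    """
    i = ladder.index(current_dose)
    if tsh > TSH_HIGH:
        i = min(i + 1, len(ladder) - 1)
    elif tsh < TSH_LOW:
        i = max(i - 1, 0)
    return ladder[i]
```

For example, under these assumptions `adjust_dose(100, 5.1)` moves one step up the ladder, while `adjust_dose(100, 2.0)` leaves the dose unchanged.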

Assessments
All assessments were performed at baseline and at the end of each study period.

Clinical assessment
Body weight, resting heart rate, and blood pressure were measured between 7:30 and 9:00 am, while patients were fasting. A complete physical examination and a baseline electrocardiogram were also performed.

Clinical biochemistry
Blood samples were collected in the morning between 7 and 10 am, predose and in the fasting state. Serum TSH, free T4, total T3, and SHBG were measured by clinical diagnostic kits (ECLIA, Cobas 8000, Roche Diagnostics, Indianapolis, IN). Serum lipid panel was measured by homogeneous enzymatic colorimetric methods (Cobas 8000, Roche Diagnostics). Serum total T4 and T3 resin uptake were measured by enzyme immunoassay (AU 5400, Beckman Coulter, TX). Serum free T4 by direct dialysis and reverse T3 were measured by radioimmunoassay (Cal-Biotech, CA, and Radim, Italy, respectively). DNA was extracted from blood samples using the manufacturer's protocol (DNeasy kit, Qiagen). Genotyping was performed in duplicate, following the allelic discrimination protocol of the real-time PCR machine (Applied Biosciences), using TaqMan reagents and the rs225014 SNP primer (13,14).

Questionnaires and cognitive testing
Patients underwent memory testing using the Wechsler memory scale-version IV (WMS-IV) (15,16), completed the Beck Depression Inventory (BDI) (Pearson, PsychCorp, San Antonio, TX) (17,18), and were asked to respond to a QoL general health questionnaire-12 (GHQ-12) (19) and a thyroid symptom questionnaire (TSQ) (17,18). Because there are no well-validated and specific instruments to quantify the severity of hypothyroidism, we designed our own TSQ (12,17), a health-related QoL questionnaire that was modeled after the hypothyroid-specific questionnaires developed by Jaeschke et al (20) and Cooper et al (21). This questionnaire consisted of 12 questions, presented in the same format as the GHQ-12, that asked patients how they felt over the past 22 weeks. The WMS-IV included 5 subdomains: auditory memory index, visual memory index (VMI), visual working memory index, immediate memory index, and delayed memory index (15,16). The BDI is a self-rating scale of 21 items, in which scores of 10 or less indicate normal mood variation and scores of 11 or more reflect increasing levels of depression; clinically important depression is associated with scores of 20 or more (17,18). At the end of the study, each patient was asked which treatment (the first, the second, or the third) he or she preferred.
[Table footnote: The contents of each one of the 9 capsules are shown; capsules were prepared by the research pharmacy. Abbreviations: DTE, desiccated thyroid extract; LT3, liothyronine; LT4, levothyroxine.]
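The BDI cutoffs described above can be written as a small lookup. This is an illustrative sketch, not part of the study's analysis; the function name is hypothetical, and the 0 to 63 bound assumes the standard BDI scoring of 0 to 3 points for each of the 21 items.

```python
def bdi_category(score):
    """Map a BDI total score to the interpretation bands given in the text."""
    # Assumption: 21 items scored 0-3 each, so totals range from 0 to 63.
    if not 0 <= score <= 63:
        raise ValueError("BDI total score must be between 0 and 63")
    if score <= 10:
        return "normal mood variation"
    if score < 20:
        return "increasing levels of depression"
    return "clinically important depression"
```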

Adverse events
At all visits, patients were specifically asked about side effects, including palpitation, tremor, sweating, and other clinically relevant hypo/hyperthyroid symptoms.

Statistical analysis
On the basis of a previous study by Clyde et al (17), using the TSQ index as the outcome measure, the means of the 2 groups (LT4 and combination LT4 + T3) were 58 and 50 with the respective standard deviations of 23 and 12.
Sample size for this crossover study was based on a paired t test with a 5%, 2-sided significance level and assumed a standard deviation of 23 and a within-subjects correlation of 0.5. A sample size of 67 is required for 80% power to detect a difference of 8 points on the TSQ. Accounting for a dropout rate of up to 25%, the necessary sample size is estimated to be 90 enrollees.
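The arithmetic behind these numbers can be reproduced with a short calculation. The sketch below is one plausible reconstruction, not the authors' code: it uses the normal approximation for a paired t test and assumes Guenther's small-sample correction (adding half the squared critical value); the function and variable names are illustrative. With a within-subjects correlation of 0.5, the standard deviation of the paired differences equals the assumed standard deviation of 23.

```python
from math import ceil, sqrt
from statistics import NormalDist


def paired_t_sample_size(delta, sd, rho, alpha=0.05, power=0.80):
    """Approximate number of subjects for a 2-sided paired t test.

    Normal approximation plus Guenther's correction (an assumption here;
    the paper does not state which method or software was used).
    """
    sd_diff = sd * sqrt(2 * (1 - rho))           # SD of within-subject differences
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_normal = ((z_alpha + z_beta) * sd_diff / delta) ** 2
    return ceil(n_normal + z_alpha ** 2 / 2)     # small-sample correction


n = paired_t_sample_size(delta=8, sd=23, rho=0.5)
enrollees = ceil(n / (1 - 0.25))                 # inflate for 25% dropout
print(n, enrollees)                              # 67 90
```

Under these assumptions the calculation returns 67 evaluable subjects and 90 enrollees, matching the figures stated above.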

Primary outcome
Differences between treatments were evaluated using mixed effects models. The primary outcome model included a fixed effect for treatment and a random effect of subject. Models were run with and without the inclusion of baseline scores to isolate between treatment differences. In follow-up analysis of the primary outcome, scores for the LT4 monotherapy condition were also treated as baseline in an adjusted model to isolate effects in the other 2 experimental conditions. The difference between treatments was tested using linear contrasts of the marginal means with Tukey adjustments for between treatment comparisons.

Secondary outcomes
Prespecified secondary comparisons included the mixed effects comparison of other collected dependent measures.
No multiplicity correction was applied to the large number of these secondary outcome measures. Additionally, models to evaluate patient preference for each treatment were prespecified. To evaluate these preferences, we used a Plackett-Luce model (22). We also examined the distribution of genotype and clinical indications.
The various dependent measures (including the primary outcome) were tested to identify possible interactions with either patient preference or genotype/clinical characteristics.
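For readers unfamiliar with the Plackett-Luce model used for the preference rankings, the sketch below shows how it assigns a probability to a complete ranking given a positive "worth" parameter for each treatment. It is illustrative only: the worth values are made up, the function name is hypothetical, and ties (which occurred in this study) are not handled. Fitting the worths to data, as done in the study's R analysis, is not shown.

```python
def plackett_luce_prob(ranking, worth):
    """Probability of a full ranking (best first) under the Plackett-Luce model.

    At each stage the next-ranked item is chosen from those still unranked
    with probability proportional to its worth.
    """
    p = 1.0
    for k, item in enumerate(ranking):
        remaining = sum(worth[j] for j in ranking[k:])
        p *= worth[item] / remaining
    return p


# Hypothetical worths: equal values mean no treatment is preferred.
equal = {"LT4": 1.0, "LT4+LT3": 1.0, "DTE": 1.0}
p_equal = plackett_luce_prob(["DTE", "LT4+LT3", "LT4"], equal)

# Hypothetical worths favoring LT4 + LT3.
skewed = {"LT4": 1.0, "LT4+LT3": 2.0, "DTE": 1.0}
p_skewed = plackett_luce_prob(["LT4+LT3", "LT4", "DTE"], skewed)
```

With equal worths every one of the 6 possible orderings of 3 treatments has probability 1/6, which corresponds to the null hypothesis of no treatment preference.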

Subanalysis
Following the logic that various subsets of patients might react differently following treatment, we created terciles of various measures (eg, TSQ, GHQ, BDI) in the LT4 alone condition and compared patients across those terciles for changes under either DTE or combination LT4/LT3. To evaluate significant differences within those groups, we used the Kruskal-Wallis test and Dunn test for pairwise comparisons, with a Holm adjustment for multiplicity within each measure. One-sample exact Wilcoxon tests were used to evaluate whether the change in measures for individual terciles was different from 0. No omnibus multiplicity correction was applied to these subanalyses, run on a large number of secondary outcome measures.
Categorical comparisons, such as the relationship between gene patterning and diagnosis, were evaluated with χ² tests or the Fisher exact test, as appropriate. All statistical analyses were completed using R (R Core Team, Vienna, Austria). Alpha was set at 0.05 for all analyses.
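Two steps of the subanalysis, the tercile split and the Holm adjustment, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the study's R code: terciles are formed by rank with ties broken by input order (the paper does not describe tie handling), and the function names are hypothetical. The Kruskal-Wallis, Dunn, and Wilcoxon tests themselves are not implemented here.

```python
def assign_terciles(scores):
    """Label each score 'L', 'M', or 'H' by rank (lowest third = 'L')."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = "LMH"[rank * 3 // n]
    return labels


def holm_adjust(pvalues):
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest P is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted
```

As a usage example, `holm_adjust([0.01, 0.04, 0.03])` multiplies the smallest P value by 3 and the next by 2, then carries the larger adjusted value forward so the third comparison is not reported as more significant than the second.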

Primary Outcome Measures
No differences were observed in the TSQ-36 and GHQ-12 questionnaires, or in the BDI and the 5 subdomains of WMS-IV test assessments (Table 4).

Secondary Analyses
Serum TSH remained within the normal reference range and varied minimally among the 3 treatment arms, although levels were slightly higher in DTE-treated patients (Table 4). In contrast, serum T3 and T4 levels were substantially affected by both treatment arms that contained T3: fasting total T3 was 30% to 50% higher and serum T4 levels approximately 30% lower when patients were on therapy with DTE or LT4 + LT3 (Table 4). There was a 38% drop in the total T4/T3 ratio after the patients crossed over to LT4 + LT3, and a 56% drop after they crossed over to DTE. The changes in T4/T3 ratio reflect the changes in serum T4 and T3 levels, and the ratio remained within the normal reference range (22.7-150). These values, however, are not fully representative of the 24-hour period because of the short half-life of T3.
Serum reverse T3, T3 resin, and free T4 were also affected, as expected (Table 4). Body weight, serum lipid levels (ie, total cholesterol, low-density lipoprotein and high-density lipoprotein cholesterol), as well as SHBG and leptin serum levels were not different during the different treatment arms (Table 4). Although blood pressure was not different among the three treatment arms, heart rate was minimally increased while patients were on the DTE treatment arm (Table 5).
Treatment preference
Most patients indicated a treatment preference at the end of the trial (Table 6), but 4 patients had no preference; 2 patients ranked LT4 as the best, and ranked both DTE and LT4/LT3 combination in second place; and 1 patient ranked both DTE and LT4/LT3 as best and LT4 as second. In addition, 1 patient ranked LT4 and DTE as best and ranked LT4/LT3 in second place. There were no significant differences among the 3 treatment groups.
Etiology of hypothyroidism and Thr92Ala-DIO2 polymorphism
No differences in outcomes were identified when patients with autoimmune hypothyroidism vs nonautoimmune disease were compared (Table 7) or when the Thr92Ala-DIO2 polymorphism was considered (Table 8). Of note, the number of subjects with homozygous DIO2 polymorphism was likely too small for meaningful evaluation.

Subanalyses
There is now a consensus that a fraction of the LT4-treated hypothyroid patients with normal serum TSH levels remains symptomatic (6,7,23,24). To focus our analyses on symptomatic patients, we first ranked the 75 patients, while they were on LT4, according to their scores on the cognitive tests and questionnaires and stratified them into 3 terciles: low (L), medium (M), and high (H). For TSQ-36, GHQ-12, and BDI, the H group contained the most symptomatic patients, whereas for the WMS-IV (VMI subdomain), the L group contained the patients with the most substantial cognitive impairment. An analysis of the patients' characteristics in the L, M, or H groups did not reveal differences.
Notably, treatment preference shifted away from LT4 in the M and H terciles (Table 8).
Next, we asked how patients in each tercile (L, M, or H) performed after they were switched to LT4 + LT3 or DTE, using the strict criterion of P < 0.001 to reject the null hypothesis. For the TSQ-36, patients in the H tercile (worst performers on LT4) had statistically different changes than participants in the other L or M terciles, showing substantial improvement after they were switched to LT4 + LT3 (P = 0.0005, Fig. 2) or DTE (P = 0.0008, Fig. 3).
A similar profile was obtained for BDI (Figs. 2 and 3). The worst performers (tercile H) were those that improved the most from the switch to LT4 + LT3 (Fig. 2) or to DTE (Fig. 3). The pattern was also similar for the GHQ-12, with patients on the L tercile (best performers) being slightly worse after switching to LT4 + LT3, whereas patients on the H tercile (worst performers) did substantially better (Fig. 2).
For the GHQ-12, the switch to DTE had a numerically positive, but not statistically significant, impact on scores in the M and H terciles, improving their relative performance in both cases (Fig. 3). The findings that TSQ-36, GHQ-12, and BDI exhibited similar outcomes were unsurprising given that these tests use similar parameters to assess well-being and QoL, and their scores exhibited a high degree of correlation (Fig. 4).
For the WMS-IV subdomains, only those patients in the L tercile of the VMI (worst performers) improved by switching to LT4 + LT3 (Fig. 2) or DTE (Fig. 3). There was good correlation among the scores in the WMS-IV subdomains (Fig. 4), but fewer substantial changes were seen with auditory memory index and delayed memory index after patients switched to DTE (Fig. 5). No statistically significant differences in tercile outcomes were observed for visual working memory index and immediate memory index. Likewise, none of the biochemical and metabolic parameters assessed responded favorably to LT4 + LT3 or DTE, even when the L, M, and H terciles were considered.

Adverse drug reactions
No adverse effects were reported with any of the treatments. All patients tolerated the treatments equally well. None was withdrawn from the study because of side effects.

Discussion
To our knowledge, this is the first randomized, double-blind, crossover study that compares LT4 vs DTE vs LT4/LT3 therapy in hypothyroid patients. The results confirmed previous trials in which no major differences were observed between monotherapy and combination therapy, including our own previous trial comparing LT4 and DTE (12). Treatment preference was not different and there were no interferences of the etiology of hypothyroidism or Thr92Ala-DIO2 gene polymorphism in the outcomes. However, a subanalysis revealed that the one-third of LT4-treated hypothyroid patients who were most symptomatic improved significantly after switching to therapy containing T3, either LT4 + LT3 or DTE. They also preferred therapy with LT4 + LT3 or DTE. These individuals were identified through poor scores on mood (BDI) and cognitive (VMI) assessments as well as through 2 QoL questionnaires, TSQ-36 and GHQ-12, while they were on LT4. Notably, the etiology of the hypothyroidism and the presence of the Thr92Ala-DIO2 polymorphism did not affect these outcomes. These findings are remarkable, as they confirm anecdotal reports that only a fraction of the LT4-treated hypothyroid patients respond positively to combination therapy containing T3. Previous studies that compared monotherapy with LT4 and combination therapy containing T3 had mixed results (8,9). This suggests that "responsiveness to therapy containing T3" depends on multiple factors, including genetic background, presence of comorbidities/autoimmune disorders, as well as other yet unidentified biological/environmental factors (25). In general, trials of combination therapy have not been designed to specifically recruit dissatisfied patients on LT4 monotherapy. In fact, some trials have excluded patients with mental illness, affective disorders, or untreated depression (11,26,27). Other trials prescreened and stratified patients for fatigue (28) or depressive symptoms (29).
Although we did not specifically recruit symptomatic patients on LT4, we did use the GHQ-12 and TSQ-36 questionnaires, together with assessment of cognitive dysfunction by WMS-IV evaluation, to quantify dissatisfaction and residual symptoms while on LT4 monotherapy. The patients identified through these methods consistently preferred and performed better while on therapy containing T3, either LT4 + LT3 or DTE.
The rationale behind thyroid hormone replacement therapy is to administer thyroid hormone in a way that restores thyroid hormone signaling in all tissues. Serum TSH has traditionally been used to adjust the dose of levothyroxine that presumably achieves this goal. However, normalization of serum TSH is achieved through a slightly elevated serum T4 and reduced serum T3 levels (25). TSH was higher in the DTE arm but still within the normal range. It was difficult to make DTE and LT4 doses bioequivalent. It is not known whether these small changes in serum T3 and T4 modify thyroid hormone signaling. The observation in the present investigation that replacement therapy containing T3 or DTE (which also contains T3) slightly elevates serum T3 and reduces serum T4 while mitigating the residual symptoms of hypothyroidism suggests a causal relationship. Nonetheless, our findings are clear that no correlation between serum T3 and outcomes could be established in the present investigation, not even in the subanalyses. This apparent conundrum suggests that intracellular T3 levels, which are affected by both serum T3 and locally generated T3, might be playing a role (30). It is well known that thyroid hormone plays a role in the development of an adult brain, as documented by the modifications in mood, behavior, and cognition observed during the transition from hypo- to euthyroidism and to hyperthyroidism (31,32). In fact, the human temporal pole responds promptly to minimal changes in thyroid hormone signaling (33). Unique about the brain is that thyroid hormone signaling exhibits additional layers of complexity; thus, there are multiple mechanisms that can fail and compromise thyroid hormone signaling. Although plasma T3 can be taken up by the brain, most T3 molecules in the brain are produced locally in the glial cells through deiodination of T4 via DIO2. T3 exits the glial cells and subsequently functions in a paracrine fashion to activate neuronal gene expression (34). That some patients on LT4 monotherapy remain symptomatic and prefer therapy containing T3 suggests that therapy with LT4 might not restore cerebral thyroid hormone in all patients.
Hypothetically, this could be explained by defects in thyroid hormone transporters, the DIO2 pathway, or by the relatively low serum T3 levels seen in LT4-treated patients (25), but none of these factors segregated in the most symptomatic LT4-treated patients. The possibility that the relatively higher plasma T4 levels seen in LT4-treated patients might play a role remains to be investigated.
Thus, the simplest way to interpret the present results is to consider that thyroid hormone signaling in the brain was not fully restored in some LT4-treated patients as a result of one or more mechanisms yet to be identified, and that the elevation in serum T3 resulting from treatment with LT4 + LT3 or DTE mitigated this problem. Importantly, despite the increase in serum T3 levels seen with LT4 + LT3 or DTE, there were no associated cardiovascular adverse reactions or changes in blood pressure; heart rate was only minimally accelerated by therapy with DTE. This is in agreement with findings of other studies in which patients treated with liothyronine or DTE were analyzed (3).
In the subanalyses, we did not use baseline values, but rather values on LT4. So, data obtained at the end of DTE or LT4 + LT3 study periods were compared to data obtained at the end of the LT4 study period. While "regression to the mean" may emerge as a possible explanation, we do not believe this is happening because it is unlikely that most cognitive tests would follow the same tendency randomly; in addition, these very same patients were the ones that preferred DTE or combination therapy.
This study was powered to detect a within-subjects difference in TSQ scores between treatments for hypothyroidism, and therefore recruited and studied a group of 75 hypothyroid patients. Thus, a significant limitation of the present investigation was that the most exciting findings were obtained through a subanalysis that was not included in the primary outcomes of the study, and that used a smaller subset of patients to conduct a between-subjects analysis. Again, it is conceivable that regression to the mean explains some of the observed effects; patients that tended to do poorly on the LT4 treatment showed larger improvements when switched to other treatments. However, several observations may ameliorate that possibility. First, the parameters we identified as being responsive to therapy containing T3 (ie, QoL questionnaires and cognitive tests) exhibited intrinsic consistency, and thus are unlikely to reflect random findings. Second, patients who did poorly on LT4 showed, generally, much less preference for LT4, indicative of at least some participant knowledge of the efficacy of the treatment, which may indicate that these groupings are more than simply a statistical artifact. Third, patients who did poorly on LT4 did not necessarily show consistent improvement when switched to either of the other 2 treatments, suggesting that mean reversion does not completely account for the finding. Future studies designed to primarily study symptomatic patients on LT4 should expand and clarify these findings, with a design specifically powered to address the between-subject difference that arises from the idiosyncratic success of patient responsiveness to different therapies.
In conclusion, some patients on LT4 therapy remain symptomatic despite normal serum TSH levels. In the present investigation, the number of these patients was not sufficiently large to affect the outcomes of the whole group. However, in the subset analysis, these patients were identified as having the worst performance on QoL questionnaires and cognitive tests. These patients were also the ones that preferred and benefitted from switching to therapy containing T3, either DTE or LT4 + LT3 combination, suggesting that thyroid hormone signaling in a minority of the patients on LT4 remains subnormal and can be improved (perhaps restored) with therapy containing T3.