TSH and FT4 Reference Interval Recommendations and Prevalence of Gestational Thyroid Dysfunction: Quantification of Current Diagnostic Approaches

Context: Guidelines recommend use of population- and trimester-specific thyroid-stimulating hormone (TSH) and free thyroxine (FT4) reference intervals (RIs) in pregnancy. Since these are often unavailable, clinicians frequently rely on alternative diagnostic strategies. We sought to quantify the diagnostic consequences of current recommendations.
Methods: We included cohorts participating in the Consortium on Thyroid and Pregnancy. Different approaches were used to define RIs: a TSH fixed upper limit of 4.0 mU/L (fixed limit approach), a fixed subtraction of 0.5 mU/L from the upper limit for TSH (subtraction approach), and the use of nonpregnancy RIs. Outcome measures were the sensitivity and false discovery rate (FDR) for identifying women for whom levothyroxine treatment was indicated and those for whom treatment would be considered according to international guidelines.
Results: The study population comprised 52 496 participants from 18 cohorts. Compared with the use of trimester-specific RIs, alternative approaches had a low sensitivity (0.63-0.82) and high FDR (0.11-0.35) to detect women with a treatment indication or consideration. Sensitivity and FDR to detect a treatment indication in the first trimester were similar between the fixed limit, subtraction, and nonpregnancy approach (sensitivity and FDR of 0.77 and 0.11 vs 0.74 and 0.16 vs 0.60 and 0.11). The diagnostic performance to detect overt hypothyroidism, isolated hypothyroxinemia, and (sub)clinical hyperthyroidism mainly varied between FT4 RI approaches, while the diagnostic performance to detect subclinical hypothyroidism varied between the applied TSH RI approaches.
Conclusion: Alternative approaches to define RIs for TSH and FT4 in pregnancy result in considerable overdiagnosis and underdiagnosis compared with population- and trimester-specific RIs. Additional strategies need to be explored to optimize identification of thyroid dysfunction during pregnancy.

Optimal maternal thyroid hormone availability is important for facilitating the physiological gestational increase of metabolism as well as the growth and (neuro)development of the fetus. Thyroid function test abnormalities, such as (sub)clinical hypothyroidism, isolated hypothyroxinemia, and (sub)clinical hyperthyroidism, have been associated with adverse pregnancy outcomes including gestational diabetes, preterm birth, small for gestational age at birth, and suboptimal neurodevelopment of the offspring (1-6). Thyroid-stimulating hormone (TSH) and free thyroxine (FT4) concentrations change considerably during the course of pregnancy. This is primarily driven by agonistic action of human chorionic gonadotropin on the TSH receptor, changes in thyroid binding proteins, placental type 3 deiodinase expression, and the placental transfer of T4 (7-9). Therefore, reference intervals for nonpregnant individuals are not considered to adequately identify euthyroidism during pregnancy, complicating the diagnosis of thyroid disorders.
Current international guidelines primarily advocate for the establishment of laboratory- and trimester-specific reference intervals for TSH and FT4 (10-12). Despite this primary recommendation being in place for over a decade, there is a lack of systematic data evaluating the diagnostic implications of employing pregnancy-specific reference intervals. Furthermore, practical constraints often preclude the calculation of locally derived reference intervals, necessitating reliance on universal fixed upper limits for TSH and the adoption of nonpregnancy reference intervals for FT4. Several studies have highlighted the pitfalls of employing universal fixed cut-offs, as they tend to lead to misdiagnoses when applied to diverse local populations (13-15), most likely because TSH and FT4 measurements differ with methodology (assay, preanalytical handling (16)) as well as with patient characteristics (body mass index, ethnicity, gestational age (8, 17-19)). However, these investigations were either single-center studies or reliant on aggregated data, limiting their generalizability and applicability for incorporation into guidelines (20). As such, current recommendations of international guidelines on the definition of thyroid dysfunction during pregnancy are largely based on single-center studies and their subsequent extrapolation of physiology (7-13, 21, 22). In order to improve future recommendations and diagnostic policies, robust assessment of the ramifications of current diagnostic approaches is critical, particularly in cases that warrant clinical intervention (eg, clear indication or consideration for medication-based treatment).
In this individual participant data meta-analysis, we aimed to quantify the performance of commonly used alternative diagnostic approaches to laboratory- and trimester-specific reference intervals. These alternatives include (1) use of a fixed upper limit for TSH, (2) employing a modified upper limit of TSH by subtracting from the nonpregnant upper limit of TSH, and (3) utilizing unadjusted nonpregnancy reference intervals for TSH and FT4 as a historical benchmark. We focused on discerning the impact of these alternatives on clinically consequential decisions such as indications or considerations for treatment as per prevailing international guidelines.

Study Eligibility and Selection
Studies eligible for inclusion were those participating in the Consortium on Thyroid and Pregnancy (https://www.consortiumthyroidpregnancy.org), an international research collaboration dedicated to investigating gestational thyroid (dys)function and its determinants, physiology, and clinical risk profiles. Cohorts included in the consortium are identified through an ongoing systematic review described previously (1). The criteria for inclusion in the current study were prospective population-based cohort studies without selection criteria related to health status and with data on TSH, FT4, and thyroid peroxidase antibody (TPOAb) concentrations during the first and second trimesters of pregnancy. We excluded participants with pre-existing thyroid disease, those using thyroid (interfering) medication, and those with multiple gestation. Cohorts were excluded if fewer than 120 participants were available after exclusions for reference interval calculations. The study adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines for Individual Patient Data. The preregistered study protocol (CRD42021270078) and an outline of protocol deviations can be found elsewhere (additional material (23)). Study quality and risk of bias were assessed using the Newcastle-Ottawa scale (additional material (23)).

Defining Reference Intervals, Treatment Indications, and Treatment Considerations
Reference intervals for TSH and FT4 and (the prevalence of) thyroid function test abnormalities (overt and subclinical hypothyroidism, overt and subclinical hypothyroidism with TPOAb positivity, isolated hypothyroxinemia, overt and subclinical hyperthyroidism) were defined uniformly in a cohort-specific manner. Reference intervals were calculated per trimester, defined as <13 weeks, 13 to 27 weeks, and >27 weeks of gestation. For each cohort, trimester-specific TSH and FT4 reference intervals were calculated using the 2.5th to 97.5th percentiles in TPOAb-negative women. TPOAb positivity was defined according to cut-offs provided by the manufacturer. For cohorts with repeated measurements, we used the first available sample for each trimester. Nonpregnancy reference intervals were either published or communicated by the principal investigator of the included cohorts and were assay specific. Information on assays and iodine status per cohort (measured or presumed on the basis of local or international reports) can be found elsewhere (additional materials (23)).
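The percentile-based definition above can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses were run in R); the record keys (`weeks`, `tsh`, `ft4`, `tpoab_positive`) are hypothetical, the handling of non-integer gestational ages around 27 weeks is an assumption, and the interpolation rule used for the percentiles is an assumption, since the text does not specify one.

```python
from statistics import quantiles


def trimester(gestational_weeks):
    """Map gestational age in weeks to a trimester (<13, 13 to 27, >27 weeks)."""
    if gestational_weeks < 13:
        return 1
    if gestational_weeks <= 27:
        return 2
    return 3


def reference_interval(values):
    """Return the (2.5th, 97.5th) percentiles of a list of measurements."""
    # n=40 yields 39 cut points spaced 2.5% apart: the first is the 2.5th
    # percentile and the last is the 97.5th. The "inclusive" interpolation
    # method is an assumption, not something stated in the paper.
    cuts = quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def trimester_specific_intervals(records):
    """Compute TSH and FT4 reference intervals per trimester for one cohort.

    `records` is an iterable of dicts with the hypothetical keys
    'weeks', 'tsh', 'ft4', and 'tpoab_positive'.
    """
    by_trimester = {}
    for r in records:
        if r["tpoab_positive"]:
            continue  # the reference population is TPOAb-negative women only
        by_trimester.setdefault(trimester(r["weeks"]), []).append(r)
    return {
        t: {
            "tsh": reference_interval([r["tsh"] for r in rows]),
            "ft4": reference_interval([r["ft4"] for r in rows]),
        }
        for t, rows in by_trimester.items()
    }
```

In practice each cohort would be processed separately, and a cohort would only contribute when enough participants remain (the study required at least 120 after exclusions).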
Thyroid function test abnormalities and prevalences were subsequently defined according to 4 different diagnostic approaches (7) (of which a visual description can be found elsewhere; Figure 1 (23)). These approaches used (1) calculated trimester-specific reference intervals (trimester-specific approach), (2) nonpregnancy reference intervals with a 4.0 mU/L fixed upper limit for TSH (fixed limit approach), (3) nonpregnancy reference intervals with a 0.5 mU/L subtraction from the upper limit of TSH (subtraction approach), and (4) unadjusted nonpregnancy reference intervals as a historical benchmark (nonpregnancy approach). Since international guidelines only recommend fixed TSH cut-offs but no fixed FT4 cut-offs, we additionally quantified the role of gestational age-specific FT4 reference intervals by comparing calculated reference intervals as follows: using (5) trimester-specific reference limits for TSH and nonpregnancy reference limits for FT4, and (6) nonpregnancy reference limits for TSH and trimester-specific reference limits for FT4. Treatment indications were defined according to the 2017 American Thyroid Association guidelines: overt hypothyroidism, or subclinical hypothyroidism with either a TSH > 10 mU/L or concomitant TPOAb positivity. A treatment consideration was defined as a TSH between 2.5 mU/L and the upper reference limit with concomitant TPOAb positivity, or subclinical hypothyroidism without TPOAb positivity. Treatment of hyperthyroidism was outside the scope of this study, since gestational hyperthyroidism is often considered physiological and we do not have data available to differentiate between gestational transient thyrotoxicosis and Graves hyperthyroidism (10).
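The alternative approaches and the treatment categories described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. The `Limits` class and all function names are hypothetical. The biochemical definitions (for example, overt hypothyroidism as TSH above and FT4 below the reference interval) are the conventional ones and are assumed here because the text does not spell them out. The TPOAb-positive consideration rule is applied on TSH alone, which is also an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Limits:
    """Lower and upper reference limits for TSH (mU/L) and FT4 (assay units)."""
    tsh_low: float
    tsh_high: float
    ft4_low: float
    ft4_high: float


def approach_limits(nonpregnancy, approach):
    """Derive the limits of an alternative approach from nonpregnancy limits."""
    if approach == "fixed_limit":    # fixed TSH upper limit of 4.0 mU/L
        return replace(nonpregnancy, tsh_high=4.0)
    if approach == "subtraction":    # nonpregnancy TSH upper limit minus 0.5 mU/L
        return replace(nonpregnancy, tsh_high=nonpregnancy.tsh_high - 0.5)
    if approach == "nonpregnancy":   # unadjusted nonpregnancy limits
        return nonpregnancy
    raise ValueError(f"unknown approach: {approach}")


def diagnosis(tsh, ft4, lim):
    """Biochemical category under a given set of limits (assumed definitions)."""
    tsh_high, tsh_low = tsh > lim.tsh_high, tsh < lim.tsh_low
    ft4_high, ft4_low = ft4 > lim.ft4_high, ft4 < lim.ft4_low
    ft4_normal = not (ft4_high or ft4_low)
    tsh_normal = not (tsh_high or tsh_low)
    if tsh_high and ft4_low:
        return "overt hypothyroidism"
    if tsh_high and ft4_normal:
        return "subclinical hypothyroidism"
    if tsh_low and ft4_high:
        return "overt hyperthyroidism"
    if tsh_low and ft4_normal:
        return "subclinical hyperthyroidism"
    if tsh_normal and ft4_low:
        return "isolated hypothyroxinemia"
    if tsh_normal and ft4_normal:
        return "euthyroid"
    return "other"  # combinations not named in the text


def treatment(tsh, ft4, tpoab_positive, lim):
    """Treatment category following the definitions given in the text."""
    dx = diagnosis(tsh, ft4, lim)
    if dx == "overt hypothyroidism":
        return "indication"
    if dx == "subclinical hypothyroidism":
        # TSH > 10 mU/L or TPOAb positivity upgrades to an indication
        return "indication" if (tsh > 10 or tpoab_positive) else "consideration"
    if tpoab_positive and 2.5 < tsh <= lim.tsh_high:
        return "consideration"
    return "none"
```

For example, with hypothetical nonpregnancy limits `Limits(0.4, 5.0, 10.0, 20.0)`, the subtraction approach gives a TSH upper limit of 4.5 mU/L and the fixed limit approach gives 4.0 mU/L, so a TPOAb-negative woman with a TSH of 4.2 mU/L and normal FT4 receives a treatment consideration under the fixed limit approach but not under the subtraction approach.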
The result of each approach was compared to the trimester-specific approach, currently considered the gold standard. Percent stacked bar plots and Sankey diagrams were used to visualize the diagnostic shift, including those between thyroid function test abnormalities, of participants when comparing approaches. A shift in diagnosis was highlighted in the Sankey diagrams (orange flows) when the treatment indication or consideration changed (eg, participants diagnosed with overt hypothyroidism with the reference approach but diagnosed with isolated hypothyroxinemia with the approach investigated).
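The diagnostic shift that such diagrams display amounts to a row-normalized cross-tabulation of labels under the reference approach against labels under an alternative approach. A minimal Python sketch, with a hypothetical function name and made-up example labels, could look like this:

```python
from collections import Counter, defaultdict


def diagnostic_shift(reference_labels, alternative_labels):
    """For each reference label, return the proportion of participants
    receiving each label under the alternative approach."""
    if len(reference_labels) != len(alternative_labels):
        raise ValueError("both label lists must describe the same participants")
    flows = defaultdict(Counter)
    for ref, alt in zip(reference_labels, alternative_labels):
        flows[ref][alt] += 1  # one participant moving from ref to alt
    return {
        ref: {alt: n / sum(counts.values()) for alt, n in counts.items()}
        for ref, counts in flows.items()
    }


# Made-up example: 4 women with a treatment indication under the reference
# approach, of whom 2 keep it, 1 shifts to consideration, and 1 to none.
reference = ["indication"] * 4 + ["none"] * 2
alternative = ["indication", "indication", "consideration", "none", "none", "none"]
shift = diagnostic_shift(reference, alternative)
```

Each inner dictionary sums to 1, so its entries correspond to the widths of the flows leaving one reference category.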

Statistical Analyses
Prevalence estimates were aggregated using random intercept logistic regression models, utilizing maximum likelihood to model between-study heterogeneity. This approach was chosen over conventional 2-step inverse-variance approaches because it is preferred for datasets with sparse events (24, 25). Prediction intervals, which indicate between-study heterogeneity, are presented elsewhere (additional materials (23, 26)). For each alternative approach, the sensitivity (probability of a positive test result, conditioned on the individual truly being positive) and false discovery rate (proportion of false positives among positive findings; ie, FDR = FP/(FP + TP)) were calculated compared with the trimester-specific approach. The FDR was chosen over specificity, as it is more sensitive to false positives when outcomes are sparse. Outliers were only removed if values were deemed to result from measurement error (outside detectable range; n = 21). All analyses were conducted using R 4.2.2 for Windows (27), employing the meta (28), ggplot2 (29), and ggalluvial (30) packages.
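As a worked illustration of the two outcome measures, the following Python sketch computes sensitivity and FDR from paired classifications, treating the trimester-specific approach as the reference. It covers only the per-dataset counts; the pooling across cohorts with random intercept models is not reproduced here, and the function name and example numbers are hypothetical.

```python
def sensitivity_and_fdr(reference_positive, alternative_positive):
    """Sensitivity and false discovery rate of an alternative approach.

    Both arguments are equal-length sequences of booleans, one entry per
    participant; `reference_positive` comes from the reference approach.
    """
    tp = fp = fn = 0
    for ref, alt in zip(reference_positive, alternative_positive):
        if alt and ref:
            tp += 1   # flagged by both approaches
        elif alt and not ref:
            fp += 1   # flagged only by the alternative approach
        elif ref and not alt:
            fn += 1   # missed by the alternative approach
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (fp + tp) if (fp + tp) else float("nan")  # FDR = FP/(FP + TP)
    return sensitivity, fdr


# Made-up example: 10 of 100 participants are positive under the reference;
# the alternative approach finds 7 of them and adds 3 false positives.
reference = [True] * 10 + [False] * 90
alternative = [True] * 7 + [False] * 3 + [True] * 3 + [False] * 87
sens, fdr = sensitivity_and_fdr(reference, alternative)
```

In this made-up example the sensitivity is 7/10 and the FDR is 3/10, whereas specificity would be 87/90 (about 0.97). This shows how specificity can look reassuring for a sparse outcome even when nearly a third of positive findings are false.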

Results
Out of the 25 cohorts with first and/or second trimester data participating in the Consortium on Thyroid and Pregnancy, 18 fulfilled the eligibility criteria (Fig. 1). After exclusions, the final study population comprised 52 496 participants (Fig. 1), of whom 8.6% were TPOAb positive (range in cohorts 5.7-17.1%). Detailed maternal demographics, cohort-specific prevalences, and an overview of cohort-specific reference limits can be found elsewhere (Tables 1, 2-5, and 6, respectively (23)).

Prevalences
Pooled prevalences are presented in Table 1 and elsewhere (Table 7 (23)). In the first trimester, the trimester-specific approach was associated with a higher pooled prevalence of total thyroid function test abnormalities than all other approaches (Table 1; Tables 7-8 (23)). The only exception was that the trimester-specific approach was associated with a lower prevalence of subclinical hyperthyroidism (prevalence 1.15%, prediction interval 0.54-2.40) than the alternative methods (prevalence 8.30%, prediction interval 3.60-18.01; Table 8 (23)). In the second trimester, a similar trend was observed, with higher pooled prevalences for all thyroid function test abnormalities except for subclinical hyperthyroidism (Table 1; Table 8 (23)). In general, heterogeneity was higher for the alternative approaches than for the trimester-specific approach, reflected by the relatively wide prediction intervals for the alternative approaches (Table 8 (23)).

Diagnostic Performance of Alternative Approaches: Treatment Indication, or Consideration
For identifying women with a treatment indication in the first trimester, the fixed limit approach was associated with a better sensitivity and FDR (0.77 and 0.11) than the subtraction approach (sensitivity 0.74, FDR 0.16) and the nonpregnancy approach (sensitivity 0.60, FDR 0.11; Table 2), but CIs overlapped greatly. Similarly, for identifying women with a treatment consideration in the first trimester, the fixed limit approach (sensitivity 0.70, FDR 0.27) was associated with better pooled estimates than the subtraction approach (sensitivity 0.63, FDR 0.35) and the nonpregnancy approach (sensitivity 0.64, FDR 0.33; Table 2), while CIs were similar. For the second trimester, a similar trend was observed, with largely overlapping CIs around the diagnostic performance estimates (Table 2).

Shift in Biochemical Diagnosis Between Methods
The shifts in treatment recommendation and thyroid function test abnormalities when employing different approaches are visualized in Figs. 2-4 and elsewhere (Tables 11-30 (23), provided as a benchmark). In the first trimester and compared with the trimester-specific approach, using either the fixed limit approach, the subtraction approach, or the nonpregnancy approach would reclassify 34.9%, 34.8%, and 44.5% of women with a treatment indication to a category without a treatment indication, respectively (30.6%, 30.6%, and 39.2% to a category with a treatment consideration, and 4.2%, 4.3%, and 5.3% to a category without a treatment recommendation; Fig. 2; Tables 11, 13, and 15 (23)).
As an example, using the fixed limit approach in the first trimester, out of all women with overt hypothyroidism, 11.9% were reclassified as euthyroid, 36.8% as subclinical hypothyroid, and 5.2% as isolated hypothyroxinemia (Fig. 3; Table 23 (23)). In comparison, with the use of the subtraction approach, out of all women with overt hypothyroidism, 13.5% would be reclassified as euthyroid, 35.2% as subclinical hypothyroid, and 5.2% as isolated hypothyroxinemia (Fig. 4; Table 25 (23)). Out of all women with subclinical hypothyroidism in the first trimester, with the use of the fixed limit approach, 43.6% were reclassified as euthyroid, 2.1% as overt hypothyroidism, and 0.2% as isolated hypothyroxinemia (Fig. 3; Table 23 (23)). In comparison, with the use of the subtraction approach, 42.5% were reclassified as euthyroid, 2.1% as overt hypothyroidism, and 0.2% as isolated hypothyroxinemia (Fig. 4; Table 25 (23)). Results for the second trimester for overt hypothyroidism were similar, with the exception that the fixed limit approach resulted in lower rates of reclassification of overt hypothyroidism to euthyroid (7.3% vs 9.1%) and to isolated hypothyroxinemia (3.6% vs 10.9%) than the subtraction approach (Tables 24, 26 (23)).

The Role of Pregnancy and Trimester-Specific FT4 Reference Intervals
Alternative approaches specify an upper limit cut-off for TSH but no limits for FT4, yet diagnoses in clinical practice need to be made using the FT4 concentration as well. Therefore, nonpregnancy FT4 reference intervals are typically used in clinical practice. In the first and second trimester, the combination of nonpregnancy FT4 reference intervals with trimester-specific reference intervals for TSH, compared with all trimester-specific reference intervals, was associated with sensitivities ranging from 0.97 to 1.00 to detect a treatment indication or consideration, and FDRs ranging from 0.03 to 0.14 (Table 3). In contrast, the use of nonpregnancy reference intervals for TSH resulted in a lower sensitivity (0.65-0.72) to detect both a treatment indication and consideration, and was associated with a higher FDR for a treatment consideration (0.08-0.32; Table 3). For thyroid function test abnormalities in the first trimester, the combination of nonpregnancy FT4 reference intervals with trimester-specific reference intervals for TSH was associated with a sensitivity of 0.62 to detect overt hypothyroidism, 0.59 for isolated hypothyroxinemia, and 0.73 for overt hyperthyroidism, while sensitivity for subclinical hypothyroidism was 0.99 (Table 3; Table 10 (23)). In comparison, when using a trimester-specific FT4 reference interval with a nonpregnancy TSH reference interval, the sensitivity for diagnosing subclinical hypothyroidism was 0.58 and the FDR was 0.07 (Table 3), while the sensitivity was 0.83 for overt hypothyroidism, 0.95 for isolated hypothyroxinemia, and 1.00 for both overt and subclinical hyperthyroidism.

Discussion
Accurately diagnosing thyroid dysfunction in pregnancy remains challenging. While calculation of population- and pregnancy-specific TSH and FT4 reference intervals is considered the optimal approach, this is often not feasible. Our study highlights the suboptimal sensitivity and FDR of common alternative approaches, such as using a fixed TSH upper limit of 4.0 mU/L or subtracting 0.5 mU/L from the TSH upper limit, for detecting specific thyroid function test abnormalities. Moreover, it is clear from these data that maximizing sensitivity often comes at the cost of a higher FDR, which constitutes a difficult tradeoff. We also identify that the use of nonpregnancy FT4 reference intervals was a primary contributor to diagnostic inaccuracy, especially in the detection of overt hypothyroidism, a condition where prompt management is warranted to mitigate adverse maternal and fetal outcomes (31).
These data provide insights into the extent to which the diagnostic accuracy of gestational thyroid function test abnormalities can be influenced by different strategies for defining TSH and/or FT4 cut-offs. This information can be used to weigh the pros and cons of future policy recommendations.
An important result from this study is the poor diagnostic accuracy and high FDR with the use of the alternative approaches to identify thyroid function test abnormalities with a treatment indication in the first and second trimester.
Two main concepts about the use of alternative approaches arise from these data. (1) The large percentage of overdiagnosis (FDR) in general. While the harms related to unnecessary medicalization and overtreatment are generally difficult to study, they are inevitably present (32). This is particularly relevant for relatively prevalent thyroid function test abnormalities with a high FDR and for which treatment is either indicated or should be considered, such as subclinical hypothyroidism, making this group especially prone to harm due to suboptimal diagnosis. (2) Clinical studies that assess the risk of adverse outcomes typically use laboratory- and trimester-specific TSH and FT4 reference intervals. Therefore, the large diagnostic gap with alternative approaches used in clinical practice makes the generalizability of the results from studies on clinical outcomes likely poor. To verify these 2 concepts, future studies should assess the risk of adverse pregnancy outcomes according to different diagnostic strategies.
Table 3 footnotes: a, Using trimester-specific reference intervals for TSH and nonpregnancy reference intervals for FT4 as a means to quantify sensitivity and FDR due to variation between the trimester-specific and nonpregnancy reference intervals of FT4. b, Similar methodology but vice versa to quantify sensitivity and FDR due to variation between the trimester-specific and nonpregnancy reference intervals of TSH. A treatment indication was defined as either overt hypothyroidism, or subclinical hypothyroidism with TSH > 10 mU/L or with concomitant TPOAb positivity; a treatment consideration was defined as a TSH > 2.5 mU/L with concomitant TPOAb positivity or subclinical hypothyroidism without TPOAb positivity. Abbreviations: FT4, free thyroxine; TPOAb, thyroid peroxidase antibody; TSH, thyroid-stimulating hormone.
Another notable observation was that the diagnostic performance of nonpregnancy TSH and FT4 reference intervals was on average only slightly inferior to that of recommended alternative strategies (eg, TSH upper limit of 4.0 mU/L or 0.5 mU/L subtraction from the nonpregnancy limit), with greatly overlapping confidence intervals. The general trend for the first trimester was that nonpregnancy reference intervals were associated with slightly lower sensitivity and slightly higher FDRs for thyroid function test abnormalities with a treatment indication/consideration compared with alternative approaches. While the alternative diagnostic recommendations assessed in our study perform suboptimally compared with the reference standard of trimester-specific reference intervals, the concept of implementing modified nonpregnancy reference intervals has some clear advantages. It would be easier to implement worldwide, since nonpregnancy reference intervals are universally available and are laboratory specific, and it could also provide a reference interval for FT4. Furthermore, use of an adaptable rule based on nonpregnancy reference intervals would leave beneficial effects of international laboratory-specific standardization and harmonization efforts intact (33, 34).
Too little attention has been given to the issue that alternative strategies do not include a recommended FT4 reference interval. Interestingly, we identified that the use of a nonpregnancy reference limit for FT4 mainly reduced the accuracy for the diagnosis of overt hypothyroidism, isolated hypothyroxinemia, and (sub)clinical hyperthyroidism, while the use of a TSH nonpregnancy reference interval reduced accuracy for the diagnosis of subclinical hypothyroidism. While fixed FT4 reference limits cannot be universally recommended due to large interassay differences in absolute FT4 values, our data indicate that a considerable part of the missing diagnostic accuracy could be accounted for by optimizing gestational FT4 reference intervals.
In this study, there were wide prediction intervals for diagnostic accuracy of the alternative approaches. This reflects the large between-study variability of prevalences and diagnostic performance of immunoassays. One reason is the varying sensitivity of various FT4 assays to increased concentrations of thyroxine binding globulin during pregnancy (35, 36). Another probable reason for interstudy and intrastudy variability is the varying difference between nonpregnancy reference limits, which are often supplied by the manufacturer and not necessarily reflective of the local population, and the locally derived pregnancy reference limits, which are inherently population specific. Thyroid function test-influencing factors such as iodine status or smoking status presumably differ between populations, leading to differences in laboratory results. The large between-study variability highlights the challenge for future guidelines to make a "one size fits all" recommendation. Instead, future recommendations could focus on improving local diagnostic assessment rather than defining universally applicable reference limits.

Strengths and Limitations
To the best of our knowledge, this is the first individual participant data meta-analysis studying the prevalence of thyroid dysfunction in pregnancy according to various commonly used diagnostic approaches. We were able to systematically quantify the consequences of different recommendations related to TSH and FT4 reference intervals as well as diagnosis and prevalence of thyroid dysfunction in pregnancy using a unique individual participant dataset of worldwide prospective cohort studies. Our results are in line with a recent aggregate data meta-analysis which identified the prevalence of thyroid dysfunction in the first trimester (20). We restricted our study to the first and second trimester, since we had only limited data available in the third trimester. Since the majority of clinically meaningful decision making takes place in the first or second trimester, we feel this affected the relevance of the current manuscript only minimally. Furthermore, the results of this study cannot be generalized to populations with iodine deficiency or excess, since we only included studies with (presumed) adequate or mild to moderately deficient iodine status. It could be debated that an effect of mild to moderate iodine deficiency on thyroid function test distributions could be present, for instance in the case of local fluctuations in iodine status. However, when meta-analyzing small proportions such as prevalences of thyroid dysfunction, larger numbers of studies per iodine status are required for reasonable power and reliable effect estimates to detect differences between methods. For this reason, stratification by iodine status was not feasible in the current study.

Conclusion
In conclusion, the current alternative approaches for defining thyroid function reference intervals during pregnancy are markedly inferior to trimester-specific reference intervals. The application of nonpregnancy reference intervals and other alternative approaches yields similar diagnostic inaccuracies. The use of alternative diagnostic recommendations on the methodology to define the upper limit of TSH primarily affected the diagnostic accuracy of thyroid function test abnormalities with a treatment indication/consideration, except for the diagnostic accuracy for overt hypothyroidism, which is primarily impacted by recommendations on the methodology to define FT4 reference limits. These results can be used to optimize clinical decision strategies, including recommendations made in the setting of clinical guidelines, and for the design of future trials to avoid misinterpretation of relevant thyroid function test abnormalities. The optimal method for simulating trimester-specific reference intervals, however, may very well differ from the current advice. While individual centers should optimally strive for establishing trimester-specific reference intervals, future efforts should focus on identifying alternative strategies that can identify women with an abnormal thyroid function based on pregnancy-specific reference intervals if these are unavailable.

Figure 2. Participants with a treatment recommendation according to the reference standard (top row, based on trimester-specific reference intervals using 2.5th and 97.5th percentile in TPOAb-negative women). Moving down the figure shows the proportion of the same group of participants which has a changed treatment recommendation with alternative diagnostic approaches. A treatment indication is defined as overt hypothyroidism, or subclinical hypothyroidism with either TSH >10 mU/L or concomitant thyroid peroxidase antibody (TPOAb) positivity. A treatment consideration is defined as TSH between 2.5 mU/L and the upper reference limit with positive TPOAb, or TSH between the RI upper limit and 10 mU/L with negative TPOAb. Fixed limit approach: nonpregnancy reference intervals with a 4.0 mU/L fixed upper limit for TSH. Subtraction approach: nonpregnancy reference intervals with a 0.5 mU/L subtraction from the upper limit of TSH. Nonpregnancy approach: unadjusted nonpregnancy reference intervals as a historical benchmark. All definitions are based on the 2017 American Thyroid Association guidelines.

Figure 3. Change in diagnosis comparing the trimester-specific reference intervals (left; using 2.5th and 97.5th percentile in TPOAb-negative women) and the fixed limit approach (right; nonpregnancy reference intervals with a 4.0 mU/L fixed upper limit for TSH). Labels indicate the proportion of women with that specific thyroid function test abnormality who change to a certain other label. Orange labels and flows indicate a change in treatment recommendation, white labels indicate a change in biochemical diagnosis but with the same treatment recommendation, and blue labels indicate the proportion with the same biochemical diagnosis between methods.

Figure 4. Change in diagnosis comparing the trimester-specific reference intervals (left; using 2.5th and 97.5th percentile in TPOAb-negative women) and the subtraction approach (right; nonpregnancy reference intervals subtracting 0.5 mU/L from the upper limit for TSH). Labels indicate the proportion of women with that specific thyroid function test abnormality who change to a certain other label. Orange labels and flows indicate a change in treatment recommendation, white labels indicate a change in biochemical diagnosis but with the same treatment recommendation, and blue labels indicate the proportion with the same biochemical diagnosis between methods.

Table 1. Pooled prevalence of gestational thyroid function test abnormalities according to different reference interval methods
A treatment indication was defined as either overt hypothyroidism, or subclinical hypothyroidism with TSH > 10 mU/L or with concomitant TPOAb positivity; a treatment consideration was defined as a TSH > 2.5 mU/L with concomitant TPOAb positivity or subclinical hypothyroidism without TPOAb positivity. Abbreviations: TSH, thyroid-stimulating hormone; TPOAb, thyroid peroxidase antibody.

Table 3. Diagnostic performance of FT4 and TSH nonpregnancy reference intervals (table body not recoverable; column headings: treatment indication, treatment consideration, overt hypothyroidism, overt hypothyroidism and TPOAb+, subclinical hypothyroidism)