Maternal Diet and the Serum Metabolome in Pregnancy: Robust Dietary Biomarkers Generalizable to a Multiethnic Birth Cohort

ABSTRACT
Background: Advances in metabolomics are anticipated to decipher associations between dietary exposures and health. Replication biomarker studies in different populations are critical to demonstrate generalizability.
Objectives: To identify and validate robust serum metabolites associated with diet quality and specific foods in a multiethnic cohort of pregnant women.
Design: In this cross-sectional analysis of 3 multiethnic Canadian birth cohorts, we collected semiquantitative FFQ and serum data from 900 women at the second trimester of pregnancy. We calculated a diet quality score (DQS), defined as daily servings of “healthy” minus “unhealthy” foods. Serum metabolomics was performed by multisegment injection-capillary electrophoresis-mass spectrometry, and specific serum metabolites associated with maternal DQSs were identified. We combined the results across all 3 cohorts using meta-analysis to classify robust dietary biomarkers (|r| > 0.1; P < 0.05).
Results: Diet quality was higher in the South Asian birth cohort (mean DQS = 7.1) than the 2 white Caucasian birth cohorts (mean DQS <3.2). Sixty-six metabolites were detected with high frequency (>75%) and adequate precision (CV <30%), and 47 were common to all cohorts. Hippuric acid was positively associated with healthy diet score in all cohorts, and with the overall DQS only in the primarily white Caucasian cohorts. We observed robust correlations between: 1) proline betaine and citrus foods; 2) 3-methylhistidine and red meat, chicken, and eggs; 3) hippuric acid and fruits and vegetables; 4) trimethylamine N-oxide (TMAO) and seafood, meat, and eggs; and 5) tryptophan betaine and nuts/legumes.
Conclusions: Specific serum metabolites reflect intake of citrus fruit/juice, vegetables, animal foods, and nuts/legumes in pregnant women independent of ethnicity, fasting status, and delays to storage across multiple collection centers. Robust biomarkers of overall diet quality varied by cohort. Proline betaine, 3-methylhistidine, hippuric acid, TMAO, and tryptophan betaine were robust dietary biomarkers for investigations of maternal nutrition in diverse populations.


Introduction
High-throughput metabolomic profiling technology has rapidly advanced clinical medicine in recent years (1). Its application in large-scale epidemiological studies also offers novel insights regarding how dietary exposures influence chronic disease risk (2-4). This approach could shift studies of diet and health away from a reliance on FFQs as the dietary assessment tool of choice for large-scale population-based studies (3,5). Although FFQs can broadly stratify people as either high or low consumers of certain foods and nutrients (6), they fare less well at estimating exact intakes of many nutrients (7), and can produce biased estimates of true intake because participants rely on memory rather than recording information in real time (8), and their responses are subject to social desirability bias (9). Furthermore, most FFQs lack detailed information on food preparation methods and do not reflect variable rates of digestion and absorption of nutrients via the gastrointestinal tract, or biotransformation by the liver and gut microbiota (5). Food metabolites that are not subject to large interindividual differences in metabolism have great potential to reflect true food consumption more accurately, avoiding the limitations of the self-reported FFQ (5).
Biomarkers of food intake can be sensitive and specific to changes in dietary patterns in free-living populations (10,11). Previous large studies of food-metabolite associations have been conducted predominantly in white nonpregnant populations, either including only men, or postmenopausal women. It is well described that pregnancy consists of a series of small, continuous physiological changes that affect the metabolism of all nutrients. For example, adjustments in the metabolism of nitrogenous compounds are in place by the second quarter of pregnancy, and these serve to promote positive nitrogen balance during the final quarter of pregnancy when fetal demands are greatest (12). Changes in maternal dietary patterns during gestation can augment the physiological adaptations. However, the substantial variability in food intakes makes it difficult to assess using conventional assessment tools. Though some studies have described metabolic phenotype changes across a healthy pregnancy (13)(14)(15) or adverse pregnancy conditions (16,17), few have reported associations in dietary intake in pregnancy based on circulating metabolites that are generalizable in a multiethnic population.
A recent publication that summarized the results of an NIH-organized workshop on "Omics Approaches to Nutritional Biomarkers" (18) highlights that future work to "test […] a variety of foods and dietary patterns across diverse populations to identify universal candidate biomarkers" is necessary. Thus, replication studies involving candidate dietary biomarkers are key to translational epidemiology that can impact population health (19). Here we report the association between self-reported dietary intake using a semiquantitative FFQ and 47 serum metabolites consistently measured with high frequency in a multiethnic population of 900 pregnant women from 3 independent birth cohorts. Specifically, in pregnant women in their second trimester our objectives were to: 1) identify serum metabolites associated with maternal adherence to a high- or low-quality diet, and 2) determine correlations between selected food groups with putative circulating metabolites associated with diet quality across 3 birth cohorts from Canada, appropriately considering variations in ethnicity, regional location, and fasting status.

Participants
This investigation was conducted on data and serum samples obtained from 3 prospective birth cohorts of pregnant women conducted in different geographical regions of Canada, and enrolling women of diverse ethnicities that comprised the NutriGen Birth Cohort Alliance (20). This cohort consortium includes mother-infant pairs from the SouTh Asian biRth cohorT (START) study (21) recruited from the Peel Region (Ontario); the Family Atherosclerosis Monitoring In earLY life (FAMILY) study (22) recruited from the city of Hamilton (Ontario); and the Canadian Healthy Infant Longitudinal Development (CHILD) cohort study (23) recruited from 4 cities/regions across the country (Winnipeg-Morden, Manitoba; Vancouver, British Columbia; Hamilton, Ontario; and Toronto, Ontario). Participants were recruited between 2004 and 2012 and follow-up is ongoing. For more details on these cohorts, please refer to the Supplemental Methods.
Detailed dietary information was available from a semiquantitative FFQ from each cohort. The START and FAMILY birth cohorts used FFQs specifically designed and validated for use in Canadian South Asians and white Caucasians, respectively (24). The CHILD study used a "Canadianized" multiethnic FFQ derived from a validated instrument created by the Fred Hutchinson Cancer Research Center (25). We excluded women who did not complete an FFQ sufficiently (>5% of questions left blank), or who reported implausible energy intake (<500 or >6500 kcal/d). This left 5001 eligible for serum metabolomics analyses, of which 900 were selected for the present analyses based on contrasting diet quality, as described below.

Diet quality assessment
The FFQs were harmonized to create 36 common food groups as described previously (26). Though others have used the Healthy Eating Index (27,28) or Alternative Healthy Eating Index (AHEI) (29) to characterize dietary patterns, our FFQs did not capture all of the AHEI components with sufficient precision for direct use. We therefore developed a diet quality score (DQS), calculated as the sum of daily servings of "healthy" foods (fermented dairy, fish and seafood, leafy green vegetables, cruciferous vegetables, legumes, fruits, nuts, and whole grains) less the sum of daily servings of "unhealthy" foods (processed meats, refined grains, French fries, snacks, sweets, and sweet drinks), described in Supplemental Table 1. These foods were chosen because they have been widely used to characterize healthy dietary patterns (i.e., prudent diet) that reduce chronic disease risk (30-37). Our DQS correlates well with a modified version of the AHEI previously derived in these cohorts (26,38).
Within each cohort, those with a DQS >90th percentile of the cohort were considered "high" diet quality; those with a DQS <10th percentile of the cohort were considered "low" diet quality, and the remaining participants were considered "intermediate" diet quality. We then selected 100 participants from each of these 3 groups at random from each cohort (300 per cohort), to create a cohort of 900 pregnant women for serum metabolomics analyses (Supplemental Figure 1).
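The scoring and selection steps described above can be sketched in a few lines. This is an illustrative sketch only, written in standard-library Python rather than the R used for the study analyses; the food-group keys, the linear-interpolation percentile, and the function names are assumptions made for the example and are not taken from the study code.

```python
import random

# Hypothetical keys for the harmonized food groups named in the text.
HEALTHY = ["fermented_dairy", "fish_seafood", "leafy_greens", "cruciferous",
           "legumes", "fruits", "nuts", "whole_grains"]
UNHEALTHY = ["processed_meats", "refined_grains", "french_fries", "snacks",
             "sweets", "sweet_drinks"]


def diet_quality_score(servings):
    """DQS = daily servings of healthy foods minus daily servings of unhealthy foods."""
    healthy = sum(servings.get(k, 0.0) for k in HEALTHY)
    unhealthy = sum(servings.get(k, 0.0) for k in UNHEALTHY)
    return healthy - unhealthy


def percentile(values, p):
    """Percentile (p in 0..100) by linear interpolation; the method is an assumption."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def assign_groups(scores):
    """Label participants of ONE cohort as low (<10th), high (>90th), or intermediate."""
    p10 = percentile(list(scores.values()), 10)
    p90 = percentile(list(scores.values()), 90)
    groups = {"low": [], "intermediate": [], "high": []}
    for pid, score in scores.items():
        if score > p90:
            groups["high"].append(pid)
        elif score < p10:
            groups["low"].append(pid)
        else:
            groups["intermediate"].append(pid)
    return groups


def sample_groups(groups, n_per_group=100, seed=0):
    """Draw up to n_per_group participants at random from each diet-quality group."""
    rng = random.Random(seed)
    return {g: rng.sample(ids, min(n_per_group, len(ids)))
            for g, ids in groups.items()}
```

Applying `assign_groups` and `sample_groups` separately to each cohort would give the 3 × 100 design per cohort described in the text, provided each percentile group contains at least 100 women.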
The DQS is a single aggregate metric that considers both healthy foods and unhealthy foods, and therefore it could lack specificity for serum metabolites. To address this issue, we separated the diet score into a healthy diet subscore and an unhealthy diet subscore, and reassessed the associations of each individually, and mutually adjusted, with serum metabolites. The subscore is the number of servings for each of the "healthy" and "unhealthy" items in the score. The healthy score is the sum of servings of fermented dairy, fish and seafood, leafy green vegetables, cruciferous vegetables, legumes, fruits, nuts, and whole grains. The unhealthy score is the sum of servings of processed meats, refined grains, French fries, snacks, sweets, and sweet drinks. The DQS ranges from −41.1 to 66.6, after constraining influential leverage points in the components of the DQS (unhealthy and healthy diet scores >3 × IQR were winsorized at the fifth and 95th percentiles). The distribution of these scores is presented in Supplemental  Figures 2-5.
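The winsorization step can be illustrated as follows. The text does not state exactly how the 3 × IQR rule was applied, so this sketch assumes that a value counts as an influential point when it lies more than 3 × IQR beyond the first or third quartile, and that such values are replaced by the 5th or 95th percentile of the same variable. The percentile method and the function names are likewise assumptions.

```python
def percentile(values, p):
    """Percentile (p in 0..100) by linear interpolation; the method is an assumption."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def winsorize_extremes(values, k=3.0, lower_p=5, upper_p=95):
    """Replace values beyond k * IQR from the quartiles with the 5th/95th percentile.

    Assumed reading of the rule in the text; other values are left unchanged.
    """
    q1 = percentile(values, 25)
    q3 = percentile(values, 75)
    iqr = q3 - q1
    low_cap = percentile(values, lower_p)
    high_cap = percentile(values, upper_p)
    out = []
    for v in values:
        if v > q3 + k * iqr:
            out.append(high_cap)   # extreme high value pulled down to the 95th percentile
        elif v < q1 - k * iqr:
            out.append(low_cap)    # extreme low value pulled up to the 5th percentile
        else:
            out.append(v)
    return out
```

Under this reading, the healthy and unhealthy subscores would each be winsorized first, and the DQS recomputed as their difference.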

Covariates
In addition to ethnicity and whether the sample was collected in the fasting or nonfasting state, and region in Canada (CHILD), we also used maternal age and gestational age at time of recruitment, sociodemographic information, prepregnancy BMI, parity, multivitamin use, smoking history, height, weight, and medical history (including gestational diabetes and hypertension during the current pregnancy) from existing data files.

Biospecimen collection and metabolomic analysis
Serum samples were collected from all pregnant women and stored in liquid nitrogen at the Hamilton Clinical Research Laboratory. In 2 of the cohorts, the sample was collected after an overnight fast (START, FAMILY) and in 1 a random nonfasting sample was collected (CHILD).

Maternal serum metabolome analyses
A validated, high-throughput platform based on multisegment injection-capillary electrophoresis-mass spectrometry (MSI-CE-MS) was used for the identification and quantification of polar/ionic metabolites measured consistently in serum filtrate samples with stringent quality control (QC) (10,39-41). This multiplexed separation platform is described in more detail in the Supplemental Methods, including a standardized method protocol for characterization of the maternal serum metabolome. Briefly, the number of serum metabolites that satisfied selection criteria for analysis in START, FAMILY, and CHILD were 67, 66, and 47, respectively; of these, 47 serum metabolites were measured consistently across all 3 cohorts when using MSI-CE-MS under 2 configurations with positive- and negative-ion mode detection. An iterative data workflow was used to reject spurious signals, redundant peaks, and background ions when performing targeted and nontargeted metabolite profiling based on analysis of a pooled serum sample that also served as QC for assessing technical precision (40). Furthermore, serum metabolites were analyzed only if they satisfied 2 additional criteria: 1) the metabolite was detectable in ≥75% of individual samples in a cohort (i.e., frequency filter), and 2) the technical precision for metabolites measured in repeat QC samples (i.e., reproducibility filter) had a CV <30% (or 40% for low-abundance metabolites with signal-to-noise <10). Nondetectable values were replaced with a missing value input corresponding to half of the minimum response measured for a serum metabolite in each cohort. Also, a robust QC-based batch correction algorithm was used to correct for long-term signal drift when using MSI-CE-MS, as described elsewhere (40).
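The two metabolite filters and the half-minimum replacement rule can be expressed compactly. The sketch below is illustrative and is not the study's processing workflow: the function names are hypothetical, nondetects are represented as `None`, and the CV is computed with the sample standard deviation, which is an assumption.

```python
import statistics


def passes_frequency_filter(values, min_fraction=0.75):
    """True if the metabolite is detected in at least 75% of samples in a cohort.

    `values` holds one response per sample, with None marking a nondetect.
    """
    detected = sum(v is not None for v in values)
    return detected / len(values) >= min_fraction


def cv_percent(qc_values):
    """Coefficient of variation (%) across repeat QC injections (sample SD assumed)."""
    return 100.0 * statistics.stdev(qc_values) / statistics.mean(qc_values)


def passes_precision_filter(qc_values, signal_to_noise,
                            cv_limit=30.0, low_abundance_limit=40.0):
    """CV < 30%, relaxed to CV < 40% for low-abundance metabolites (S/N < 10)."""
    limit = low_abundance_limit if signal_to_noise < 10 else cv_limit
    return cv_percent(qc_values) < limit


def impute_half_minimum(values):
    """Replace nondetects with half of the minimum detected response in the cohort."""
    detected = [v for v in values if v is not None]
    fill = min(detected) / 2.0
    return [fill if v is None else v for v in values]
```

A metabolite would be retained for a cohort only when both filters pass, after which `impute_half_minimum` fills the remaining nondetects.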
In this work, most serum metabolites were unambiguously identified (level 1) by their comigration and accurate mass (<5 ppm) after spiking with an authentic standard in a pooled QC sample, and subsequently quantified (micromolar) using a calibration curve, where ion responses were normalized to a single internal standard (i.e., relative peak area, RPA). Reference concentrations for serum metabolites for second-trimester pregnant women from different birth cohorts are reported elsewhere (). Otherwise, all serum metabolites were annotated based on their characteristic accurate mass and relative migration time (RMT) under positive (p) or negative (n) ion mode (m/z:RMT:mode). Also, unknown serum metabolites were further annotated based on their most likely molecular formula (level 4), with most compounds putatively identified (level 2 or 3) following acquisition of high-resolution tandem MS spectra at different collision energies (42). This stringent process ensured that only fully authenticated serum metabolites reliably measured in most serum samples were correlated to habitual dietary patterns to reduce false discoveries and data overfitting.

Statistics
For objective 1 (identification of dietary biomarker candidates), we performed 2-tailed t tests to compare mean natural logarithm-transformed metabolite concentrations between pregnant women with low (n = 100) compared with high (n = 100) diet quality within each cohort. We considered batch-corrected serum metabolite response (RPA) differences that were nominally significant at P < 0.10 (without correction for multiple testing) to be candidates for multivariable analyses (43). In multivariable linear regression models, natural logarithm serum metabolite RPAs were regressed on the continuous diet score within each cohort (n = 300), adjusted for prepregnancy BMI, gestational age, total energy (kcal), maternal age, maternal ethnicity (in CHILD only, because it was the only cohort with multiple ethnicities), and center (in CHILD only, because it was a multicenter study). DQS-serum metabolite associations were considered significant at P < 0.05, tested independently with no correction for multiple testing. To understand whether it was the presence or absence of "healthy" or "unhealthy" foods driving associations, we fit similar multivariable linear regression models for the healthy and unhealthy subscores separately, each additionally adjusted for the opposing diet subscore.
For objective 2 (linking serum metabolites to self-reported habitual intake of specific foods), we selected food groups for which there is moderate to strong evidence of a metabolite biomarker in the published literature, as summarized by Exposome Explorer (http://exposome-explorer.iarc.fr; see Supplemental Methods). Food group variables represented as servings per day were natural log transformed to correct for skewness prior to analysis. Analysis included: 1) reporting unadjusted pairwise Pearson correlation coefficients between the serum metabolite and specified food group; 2) assessing the association between serum metabolite concentration and DQS and foods using multivariable linear regression (with log-transformed metabolites), adjusted for prepregnancy BMI, gestational age, total energy intake, maternal age, ethnicity (in CHILD only, because it was the only cohort with multiple ethnic groups), and region in Canada (CHILD only: Toronto, Edmonton, Winnipeg, and Vancouver), prior to meta-analysis; 3) combining the results of the 3 cohorts using inverse-variance random-effects meta-analyses; and 4) for significant diet-metabolite pairs, conducting random-effects metaregressions to explore the moderating role of sample fasting status.
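To make the pooling step concrete, the sketch below shows one common way to combine cohort-level correlations with an inverse-variance random-effects model. The text does not specify the estimator or effect-size scale, so the Fisher z transformation, the DerSimonian-Laird between-cohort variance, and the normal-approximation P value are all assumptions here, as are the function names. The `is_robust` helper applies the robustness thresholds stated in the paper (|r| > 0.1 and P < 0.05).

```python
import math


def random_effects_pooled_r(rs, ns):
    """Pool per-cohort Pearson correlations (rs) with sample sizes (ns).

    Assumed approach: Fisher z transform, inverse-variance weights, and a
    DerSimonian-Laird estimate of the between-cohort variance (tau^2).
    Returns (pooled r, two-sided P value, tau^2).
    """
    zs = [math.atanh(r) for r in rs]          # Fisher z per cohort
    vs = [1.0 / (n - 3) for n in ns]          # sampling variance of each z
    w = [1.0 / v for v in vs]                 # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]     # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    p_value = math.erfc(abs(z_re / se) / math.sqrt(2))  # two-sided, normal approx.
    return math.tanh(z_re), p_value, tau2


def is_robust(pooled_r, p_value, r_threshold=0.1, alpha=0.05):
    """Robustness criterion used in the paper: |r| > 0.1 and P < 0.05."""
    return abs(pooled_r) > r_threshold and p_value < alpha
```

With only 3 cohorts the between-cohort variance is estimated imprecisely, which is one reason a random-effects model gives wider intervals than a fixed-effect model when cohort estimates disagree.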
To assess the robustness of our regression models, we conducted k-fold cross-validation by dividing the dataset into 10 equal-size subsets (i.e., k = 10) (44). For each iteration, we combined k − 1 subsets to serve as the training set, and the 1 remaining subset served as the test set. Every sample served as a test data point once and only once, with higher Pearson correlation coefficient values (r) considered evidence of a robust association. All analyses were completed in R (v3.6.3; R Foundation). Model assumptions and missing data handling are described in the Supplemental Methods.
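The cross-validation procedure can be sketched for the simplest case of one predictor and one outcome. This is an illustration in standard-library Python, not the R code used in the study; the single-predictor least-squares model, the shuffling seed, and the function names are assumptions, and the covariate adjustment described above is omitted for brevity.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def kfold_cv_r(x, y, k=10, seed=0):
    """k-fold cross-validation: each sample is predicted exactly once by a model
    trained on the other k - 1 folds; returns r between predictions and y."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]     # k near-equal-size subsets
    preds = [None] * len(x)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = intercept + slope * x[i]
    return pearson_r(preds, y)
```

Here `x` would be the log-transformed food group intake and `y` the log-transformed metabolite response; a low out-of-sample correlation indicates that the in-sample association does not translate into useful prediction.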

Participant characteristics
Nine hundred participants were included in the discovery analysis (300 women from each of the 3 cohorts) (Table 1). Participants in the metabolomics study were generally representative of pregnant women in each of the cohorts with a mean (± SD) age of 31.2 ± 4.7 y, and a prepregnancy BMI of 25.1 ± 5.3 kg/m². All women in START were South Asian; and >97% were white Caucasian in FAMILY and CHILD. White Caucasian women from CHILD were overrepresented in our sample compared with the full CHILD cohort (97.7% compared with 72.9%). Overall, 44% of women were primiparous, >89% used a prenatal vitamin, 79% had never smoked, 18% were former smokers, and 3% were current smokers. All START mothers were lifelong never smokers, and the smoking profiles were similar in CHILD and FAMILY. The cohorts had different chronological and gestational ages at recruitment (with START mothers being youngest, and FAMILY being oldest), and different prepregnancy BMI (with START mothers being lowest, and FAMILY being highest). START mothers were most likely to have gestational diabetes (26.2% of mothers), and least likely to be employed at the time of the survey (54.2%).

The Spearman rank coefficient (ρ) was 0.76 (P < 0.0001) between the DQS and the modified Alternative Healthy Eating Index (mAHEI) for the entire data set (n = 900), which was consistent across each cohort (r = 0.66 in START, 0.76 in FAMILY, and 0.68 in CHILD; n = 300 each) (Table 2). The mean maternal DQS in pregnancy differed significantly between the cohorts and was highest in START (7.1 ± 8.1), lowest in FAMILY (1.6 ± 6.5) and intermediate in CHILD (3.2 ± 8.6) (Table 1, Supplemental Figures 2-5). Across the cohorts, a higher DQS was consistently associated with higher mAHEI, higher total fiber, protein, vitamin A, vitamin C, folate, calcium, and potassium intakes, a higher polyunsaturated:saturated fat ratio, as well as lower saturated and trans-fat and cholesterol intakes (Table 2), reflecting a nutrient-rich and health-promoting maternal dietary pattern.

Associations between diet quality index and serum metabolites
Candidate serum metabolites passing the initial P < 0.10 threshold using the extreme-ends approach (Supplemental Tables 2 and 3) included 14 from START, 14 from FAMILY, and 9 from CHILD. Collectively, these 29 metabolites were then entered into multivariable linear regression models to assess the associations with the DQS, and the healthy and unhealthy indices separately. The initial screen identified significant and high magnitude of differences in serum metabolic responses (RPA) between high and low DQS for methylhistidine, choline, arginine, and tryptophan betaine in START; hippuric acid, hypoxanthine, methylhistidine, and gluconic acid in FAMILY; and hippuric acid, proline betaine, and monomethylarginine in CHILD, as shown in the volcano plots (Supplemental Figure 6a-c).
In adjusted multivariable linear models, 8 serum metabolites in START, 10 in FAMILY, and 3 in CHILD were significantly associated with DQS, after adjusting for prepregnancy BMI, maternal age, total energy (kcal), and gestational age (Supplemental Table 4). In START, higher DQS was associated with higher circulating concentrations of arginine, choline, serine, tryptophan betaine, 2-hydroxybutyric acid, and an unknown singly charged cation annotated by its m/z:RMT:mode and most likely molecular formula [334.688:0.805:p; C20H47N18O6S], whereas serum 3-methylhistidine and uric acid were inversely correlated to the DQS. In FAMILY, a higher DQS was associated with higher circulating concentrations of aminoadipic acid, dimethylglycine, gluconic acid, hippuric acid, monomethylarginine, trimethylamine-N-oxide (TMAO), and 2-hydroxybutyric acid. In contrast, DQSs were inversely correlated to serum hypoxanthine, pyruvic acid, and a singly charged cation annotated by its m/z:RMT:mode and most likely molecular formula [129.066:0.739:p; C5H8N2O2]. In CHILD, a higher DQS was associated with higher circulating concentrations of hippuric acid, proline betaine, and an unknown singly charged anion [145.0142:0.866:n; C5H10N2O3]. However, there was little overlap between the cohorts for these serum metabolites as shown in the Venn diagram, suggesting that though there are some common metabolites, the associations did differ by cohort-level factors (Figure 1).
The healthy diet score and unhealthy diet score component analysis within each cohort is shown in Supplemental Table 4. Of the 14 serum metabolites identified in START, 7 (3 positively, 4 negatively) were associated with the healthy diet score and 7 (0 positively, 7 negatively) with the unhealthy diet score. Of the 14 serum metabolites identified in FAMILY, 4 (2 positively, 2 negatively) were associated with a healthy diet score and 6 (2 positively, 4 negatively) with an unhealthy diet score. Of the 9 serum metabolites identified in CHILD, 2 (2 positively, 0 negatively) were associated with healthy diet score and 1 (1 negatively) with an unhealthy diet score. Serum hippuric acid was associated with a healthy diet score in the 2 largely white Caucasian cohorts, FAMILY and CHILD (Supplemental Table 4).

Associations between specific food groups and serum metabolites
Robust correlations (i.e., those with a meta-analysis pooled association |r| > 0.1 and P < 0.05 across all 3 cohorts) were observed between the self-reported intake of several food groups and circulating metabolite concentrations measured in pregnant women (Figure 2). Citrus fruits and citrus juice were robustly correlated with serum proline betaine, when grouped together (random effects meta-analysis pooled Pearson r = 0.29; P < 0.0001; Figures 2 and 3) and separately for citrus fruits (r = 0.42; P < 0.0001) and citrus juice (r = 0.36; P < 0.0001). Red meat (r = 0.21; P < 0.0001), chicken (r = 0.26; P < 0.0001), and eggs (r = 0.18; P < 0.0001) were each positively correlated with serum 3-methylhistidine. Vegetable (r = 0.16; P < 0.001) and fruit (r = 0.18; P < 0.0001) intake were each positively correlated with serum hippuric acid. Seafood (r = 0.12; P < 0.0001), meat (r = 0.10; P = 0.003), red meat (r = 0.09; P = 0.009), and eggs (r = 0.11; P = 0.001) were each directly associated with serum TMAO concentrations (Supplemental Table 5). Additionally, total intake of nuts, seeds, peanuts, and legumes was modestly correlated with tryptophan betaine (r = 0.15; P = 0.03). The correlation of the combined intake of nuts, seeds, and peanuts (peanuts were not assessed separately from other nuts and seeds in our cohorts) with tryptophan betaine was nonsignificant (r = 0.13; P = 0.30), but this metabolite only passed QC for detection in START and FAMILY. Scatterplots of food groups against selected metabolites, and boxplots of the correlation between selected metabolites and food groups by diet quality tertile (i.e., low/medium/high) are presented in Supplemental Figures 7-11.
No significant correlations were observed for serum carnitine and total protein or chicken, but a weak association was found with eggs (r = 0.06; P = 0.06) and red meat (r = 0.09; P = 0.007). Interestingly, serum uric acid was not associated with consumption of total meat, red meat, chicken, eggs, or total protein. No significant associations were observed for lactate, pyruvate, or 2-hydroxybutyrate and total carbohydrate intake or sugar-sweetened beverages, or for glycine and total protein intake (Supplemental Table 5). Our 10-fold cross-validation of significantly correlated food item-metabolite pairs yielded poor results (R² ranging from 0 to 0.16, Supplemental Table 6). Model predictions were better in FAMILY than in START and CHILD in most cases, and models pooling our fasting studies (START + FAMILY) performed better than models merging data for fasting and nonfasting studies in 7/8 cases. More complex models combining multiple food items should be tested to improve the prediction with serum metabolite concentrations.
Fasting status influenced the food group-metabolite correlations of uric acid with fruit and vegetables, and of lactic acid with carbohydrates. For those serum metabolites that were significantly correlated with specific foods/food groups, we investigated the influence of fasting status, which also coincides with longer delays in blood processing across multiple centers in CHILD, by adjusting for this variation in a metaregression (Supplemental Table 7).

Discussion
In this study, we examined self-reported dietary and quantitative metabolomics data from 3 unique cohorts of women in their second trimester of pregnancy. We demonstrate that the serum metabolomic phenotypes can reflect complex dietary patterns when foods are classified as predominantly healthy or unhealthy, and that several serum metabolites are also associated with the average intake of specific food items as applied to a multiethnic Canadian population. The correlations between dietary scores and circulating metabolites are generally modest (r ∼0.2-0.4), though robust correlations exist between certain foods and certain serum metabolites that are consistent across cohorts irrespective of fasting status, age, prepregnancy BMI, ethnicity, and/or region.
Maternal metabolism changes substantially during pregnancy. In this diverse cohort of pregnant women from across Canada, we replicated previously described food-metabolite associations, notably citrus-containing fruits and juices with circulating concentrations of proline betaine (Figure 3) because it is an exogenous compound prevalent in citrus juices (45), as well as red meat, chicken, and eggs with methylhistidine because both 1- and 3-methylhistidine positional isomers are present in muscle and other dietary sources of histidine, such as eggs (46). Also, the average intake of vegetables and/or fruit was associated with serum hippuric acid (a major metabolite of flavonoids prevalent in fruits and vegetables) (47), whereas self-reported consumption of seafood, meat, and eggs was also correlated with serum TMAO (present in free form in fish and animal flesh, and also generated from the actions of host and gut microflora cometabolism of carnitine from intake of meat or eggs) (48,49). Additionally, intake of nuts/legumes was associated with circulating concentrations of tryptophan betaine (which accumulates in the seeds of most Erythrina species) (50), and was previously reported to be associated with peanut intake (51). In future investigations of maternal diet and child health outcomes these serum metabolites can be combined to constitute a "metabolic signature" reflecting a healthy diet, and generalizable to a multiethnic population. Specific food group-metabolite associations were robust across cohorts despite differences in DQS distributions in maternal populations sampled from multiple regions across Canada (Figure 2).
Prior investigations of the relation between dietary intake in pregnancy and the serum metabolome are limited. Cross-sectional studies have explored urinary untargeted metabolomics in healthy pregnant women (15), and 2 studies investigated targeted metabolomics longitudinally in healthy pregnancies (14,52), but neither investigated the relation between a standardized dietary score and blood metabolite profile. In these studies, compared with nonpregnant women, all lipoprotein subclasses and lipids are increased in pregnant women, notably the intermediate-density, low-density, and high-density lipoprotein triglyceride concentrations. Large differences are also seen for many fatty acids and amino acids. Pregnant women also have higher concentrations of low-grade inflammatory marker glycoprotein acetyls and IL-18 and lower concentrations of IL-12p70 (15). The plasma concentrations of several essential and nonessential amino acids, long-chain PUFAs, carnitines, phosphatidylcholines, and sphingomyelins have been reported  to change as a function of gestational period (14). Though previous studies have shown that characterization of the human metabolome can reflect differences in contrasting dietary patterns (10,53), there have been few studies analyzing the maternal serum metabolome during pregnancy (13,15), and these are limited due to their small size and lack of generalizability to diverse populations without dietary associations to semiquantitative FFQs.
Though there were several consistent food-metabolite associations demonstrated across cohorts, we also highlight some differences. These differences suggest that dietary biomarkers discovered in largely white Caucasian populations of men and postmenopausal women might not transfer to other ethnicities with distinctive dietary patterns, and possibly life-stages, such as major physiological adaptations occurring during pregnancy. These differences can arise for several reasons: 1) differences in the number of foods within categories on the FFQ (e.g., 20 vegetables on the South Asian FFQ; 18 vegetables on the white Caucasian FFQ); 2) composition of the foods that make up food groups that contribute to the dietary scores across cohorts (e.g., roti, paratha, and chapatti as sources of whole grains in South Asians compared with whole wheat bread and rolls in white Caucasians); 3) differences in cooking methods (e.g., potatoes are usually curried with spices or stir-fried among South Asians whereas potatoes are mainly boiled, mashed, or baked among white Caucasians); and/or 4) between-subject differences in absorption and metabolism of foods and nutrients. Despite efforts to create a standardized DQS that would be associated with presumably similar serum metabolites across cohorts, differences, presumably due to the unique foods eaten by each cohort, led to lack of consistency in the association between the DQS and specific serum metabolites. Another study limitation was that CHILD was the most heterogeneous birth cohort in terms of serum sampling procedures that were performed under nonfasting conditions for pregnant women recruited from multiple centers in Canada. 
This resulted in fewer and more variable serum metabolites measured within CHILD compared with the 2 other fasting birth cohorts (START, FAMILY), while also introducing confounding from recent dietary intake rather than habitual (i.e., long-term) dietary consumption patterns that better match self-reported FFQs (54).
Few studies have examined metabolomic markers associated with habitual dietary patterns. In a secondary analysis of a controlled feeding study, a classifier using metabolites that differed between diets was able to correctly differentiate between a low-fat (20%), very-low-carbohydrate (10%), and low-glycemic-index diet (glycemic index = 32.9) in 60 of 63 cases (>95% accuracy) (53). A recent analysis of the Dietary Approaches to Stop Hypertension (DASH) trial, comparing the intervention with the control group, showed that levels of proline betaine and tryptophan betaine were significantly higher in the healthy diet group (55). In a randomized controlled trial we conducted (10,56), fasting plasma and single-spot urinary proline betaine and 3-methylhistidine trajectories differentiated a "Western" from a "Prudent" dietary pattern in nonpregnant, free-living adults following 2 wk of food provision.
Our work provides evidence that metabolomics can be used to assess habitual intake of specific foods applicable to diverse populations of pregnant women with highly variable and complex dietary patterns. By examining associations of food groups with metabolites established in prior studies, we confirm similar associations exist in a multiethnic cohort of pregnant women in their second trimester for citrus fruits and juices, legumes (including peanuts), meat protein, fruit and vegetables, and seafood, eggs, and meat. Proline betaine, tryptophan betaine, TMAO, 3-methylhistidine, and hippuric acid have been identified in previous studies in nonpregnant adult and adolescent populations. These serum metabolites could be used in future studies as robust dietary biomarkers reflective of healthy and unhealthy diets that complement FFQ assessments. However, dietary biomarkers are not immune to misclassification errors. For example, because our DQS increases (becomes "healthier") with increased fruits and vegetables, those who eat a lot of fruits and vegetables, but exclude citrus fruits (i.e., enriched with proline betaine), might still be misclassified by biomarker pattern alone. This fact emphasizes that few dietary biomarkers are entirely specific to certain foods and are more often associated with habitual eating patterns of distinctive collections of food categories. Dose-response associations in nutrition are often difficult to demonstrate (57). Here, we show increases in circulating metabolites proportionate to intake of the studied foods within each cohort, and when all 3 cohorts are pooled, as shown in Figure 2 and Supplemental  Inferring direct associations between food intake and biomarkers in observational studies is difficult. Indeed, larger studies of metabolomic markers as independent predictors of cardiovascular disease have shown little correlation to the putative (self-reported) food sources of that compound (58). 
This is likely because the FFQ and the metabolite reflect different time windows of exposure: the FFQ typically covers the previous year, whereas the metabolite typically reflects days, if not hours, depending on fasting status. Detection of compounds that are largely not produced endogenously, such as proline betaine, indicates high consumption of that food (e.g., citrus) with high certainty. Detection of compounds produced from both body protein catabolism and food sources, such as 3-methylhistidine, might not reflect consumption of that food (e.g., meat) with the same certainty. Well-controlled feeding studies are needed to validate the dose-response of putative dietary biomarkers from observational studies (59) in conjunction with characterization of their abundances in various foods to establish their specificity (60). Also, combinations of dietary biomarkers, rather than single compounds, can improve robustness and plausibility (61).

Strengths and limitations
Our work addresses an important limitation in previous metabolomic studies in nutrition by evaluating an understudied population. Maruvada et al. (18) and the NIH group, in addition to noting the importance of testing food intake biomarkers across diverse populations to identify universal candidate biomarkers, also emphasize the importance of "replication of initial biomarker studies in different populations," which is "often necessary to generalize the results, to accommodate population heterogeneity, and to properly account for food choice diversity and dietary patterns." This sentiment is echoed by leaders in this field (62), who note "the large inter-individual variation in response to foods mak[es] it difficult to identify biomarkers that respond reproducibly across populations." Our work involving a multiethnic cohort of pregnant women addresses this major knowledge gap and contributes data on the interpopulation differences in food-metabolite associations.
Although a single independent external study was not selected for replication analysis, the simultaneous assessment of 3 independent birth cohorts from the same country provided independent populations in which to assess consistency, using a validated analytical platform for metabolomic analyses with stringent QC and batch correction adjustment. Limitations include the measurement of diet using a self-reported FFQ; however, this is widely used in epidemiological research studies. Additionally, there was a difference in sample collection timing (i.e., fasting compared with nonfasting) across studies, including delays to processing blood samples across multiple centers. To counter these factors, we used only validated FFQs, and fasting status was adjusted for in the meta-regression analysis. Also, nontargeted metabolite profiling by MSI-CE-MS used in this study was limited to the analysis of polar/ionic metabolites in serum and not lipid classes, which require use of nonaqueous buffer conditions (11). In addition, we found that not all metabolites were consistently detectable across cohorts. Finally, we acknowledge that associations in an observational study such as ours are always subject to the possibility of confounding, which is a serious threat to causal inference. We attempted to reduce the influence of known confounders through our multivariable adjustment approach, and our findings are further supported by independent metabolite-dietary associations in nonpregnant populations. However, we cannot exclude the possibility of residual confounding of these associations owing to unmeasured confounders.
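To illustrate how consistency across independent cohorts can be summarized, the following is a minimal sketch of pooling cohort-specific food-metabolite correlations. It assumes a fixed-effect combination of Pearson correlations via Fisher's z-transform, which is one common approach and not necessarily the exact meta-analytic model used in this study. The function names and the example (r, n) pairs are hypothetical; only the thresholds (|r| > 0.1, P < 0.05) are taken from the classification criteria stated in the abstract.

```python
import math


def pooled_correlation(cohorts):
    """Fixed-effect pooling of correlations (assumed method, for illustration).

    cohorts: list of (r, n) pairs, one per cohort, where r is the
    cohort-specific correlation and n is the cohort sample size.
    Returns (pooled_r, two_sided_p).
    """
    # Fisher z-transform: each z has approximate variance 1 / (n - 3),
    # so the inverse-variance weight of a cohort is n - 3.
    weights = [n - 3 for _, n in cohorts]
    zs = [math.atanh(r) for r, _ in cohorts]
    z_pooled = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    se = 1.0 / math.sqrt(sum(weights))
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z_pooled / se) / math.sqrt(2))
    # Back-transform the pooled z to the correlation scale.
    return math.tanh(z_pooled), p


def is_robust(cohorts, r_min=0.1, alpha=0.05):
    """Apply the |r| > 0.1 and P < 0.05 criteria to the pooled estimate."""
    r, p = pooled_correlation(cohorts)
    return abs(r) > r_min and p < alpha


# Hypothetical cohort-level results (not data from this study).
example = [(0.25, 300), (0.18, 300), (0.12, 300)]
pooled_r, pooled_p = pooled_correlation(example)
print(f"pooled r = {pooled_r:.3f}, P = {pooled_p:.2e}, robust = {is_robust(example)}")
```

A fixed-effect model is used here only for brevity; when between-cohort heterogeneity is expected, as with fasting and nonfasting collections, a random-effects model or a meta-regression with cohort-level covariates would be the more natural choice.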

Conclusions
In a multiethnic cohort of pregnant women, DQSs are associated with concentrations of specific circulating serum metabolites that reflect higher intakes of both healthy and unhealthy foods. Proline betaine, 3-methylhistidine, hippuric acid, TMAO, and tryptophan betaine were robust dietary biomarkers associated with habitual intake of specific foods. They can be used for investigations of maternal nutrition in multiethnic populations and can also tolerate preanalytical variations in blood collection in diverse settings (e.g., fasting status, delays to processing).