Ville-Petteri Mäkinen, Mari Karsikas, Johannes Kettunen, Terho Lehtimäki, Mika Kähönen, Jorma Viikari, Markus Perola, Veikko Salomaa, Marjo-Riitta Järvelin, Olli T Raitakari, Mika Ala-Korpela, Longitudinal profiling of metabolic ageing trends in two population cohorts of young adults, International Journal of Epidemiology, Volume 51, Issue 6, December 2022, Pages 1970–1983, https://doi.org/10.1093/ije/dyac062
Abstract
Quantification of metabolic changes over the human life course is essential to understanding ageing processes. Yet longitudinal metabolomics data are rare and long gaps between visits can introduce biases that mask true trends. We introduce new ways to process quantitative time-series population data and elucidate metabolic ageing trends in two large cohorts.
Eligible participants included 1672 individuals from the Cardiovascular Risk in Young Finns Study and 3117 from the Northern Finland Birth Cohort 1966. Up to three time points (ages 24–49 years) were analysed by nuclear magnetic resonance metabolomics and clinical biochemistry (236 measures). Temporal trends were quantified as median change per decade. Sample quality was verified by consistency of shared biomarkers between metabolomics and clinical assays. Batch effects between visits were mitigated by a new algorithm introduced in this report. The results below satisfy the multiple testing threshold of P < 0.0006.
Women gained more weight than men (+6.5% vs +5.0%) but showed milder metabolic changes overall. Temporal sex differences were observed for C-reactive protein (women +5.1%, men +21.1%), glycine (women +5.2%, men +1.9%) and phenylalanine (women +0.6%, men +3.5%). In 566 individuals with ≥+3% weight gain vs 561 with weight change ≤−3%, divergent patterns were observed for insulin (+24% vs −10%), very-low-density-lipoprotein triglycerides (+32% vs −6%), high-density-lipoprotein2 cholesterol (−6.5% vs +4.7%), isoleucine (+5.7% vs −6.0%) and C-reactive protein (+25% vs −22%).
We report absolute and proportional trends for 236 metabolic measures as new reference material for overall age-associated and specific weight-driven changes in real-world populations.
Key Messages
We developed new statistical techniques that allowed us to investigate temporal trends in circulating biomarkers in their physical measurement units on a population scale.
We report previously unavailable quantitative trends for >200 circulating metabolic measures, including lipoprotein subclass lipids, branched-chain amino acids and cardiometabolic biomarkers.
We isolated associations between metabolic changes and weight gain in a carefully constructed design in which weight loss and weight gain subgroups were matched at baseline.
Our analyses revealed that weight change explains much of the metabolic variation (e.g. insulin, amino acids, lipoprotein subclass lipids), although indicators such as waist–hip ratio, low-density-lipoprotein cholesterol and creatinine may reflect an overall ageing trend that affects all individuals.
Introduction
Increased longevity and falling birth rates across the world are creating an unprecedented situation in human history in which the old will outnumber the young.1 As age is the primary driver of chronic disease, understanding the nature of ageing within human populations is a prerequisite for mitigating the health consequences from shifting demographics. Age-driven change in metabolism is particularly topical due to the high burden of cardiometabolic diseases, the responsiveness of metabolism to personal and societal interventions, and recent advances in metabolomics profiling of morbidity and mortality.2–6 There is, however, a knowledge gap: large-scale metabolomics studies of ageing in human populations have remained out of reach due to the rarity of long-term longitudinal data and technical challenges from long time gaps between sample collections.7
A cross-sectional survey of young and old people is the easiest to organize but it yields the weakest epidemiological evidence;8 only longitudinal designs can provide well-grounded information on life-course trajectories.9–15 So far, most metabolomics ageing studies have been cross-sectional.16–20 Given the caveats of cross-sectional modelling, the value of longitudinal data is well recognized.7,14,15,21–23 Metabolic profiling across multiple time points produces more robust evidence and longitudinal designs allow better detection and control of confounders.24 On the other hand, new biases may arise when samples are collected from surveys separated by decades, including updates to collection protocols, storage effects and changes in biochemical assay methodology.9,10,24,25 Hence new statistical methods that can remove the biases are essential for longitudinal designs.
In this study, we developed new methods to manage the biases and analysed two longitudinal cohorts that included altogether 4789 participants with at least two time points that were a decade or more apart. To get an accurate picture of metabolic health and how it changed during follow-up, we investigated >200 variables that included anthropometric indicators, clinical biomarkers and nuclear magnetic resonance (NMR) metabolomics. We also used advanced sampling techniques to isolate metabolic associations with weight change and to characterize sex differences in temporal trends. Combined, our extensive results comprise a useful resource to understand how the metabolome changes due to ageing in human populations.
Methods
Cardiovascular Risk in Young Finns Study (YFS)
The YFS is a population-based prospective cohort study.26 It was conducted at five medical schools in Finland (Turku, Helsinki, Kuopio, Tampere and Oulu) with the aim of studying the levels of cardiovascular risk factors in children and adolescents in different parts of the country. The baseline study in 1980 included 3596 children and adolescents aged between 3 and 18 years. Results from clinical examination and fasting samples were used in the present study. Metabolomics data were available from three visits in 2001 (1239 women and 1007 men), 2007 (1186 women and 974 men) and 2011 (1112 women and 927 men).
Northern Finland Birth Cohort 1966 (NFBC1966)
The NFBC1966 is a longitudinal birth cohort established to study factors affecting pre-term birth and consequent morbidity in the two northernmost provinces of Finland, Oulu and Lapland.27 The NFBC1966 includes 12 231 births (12 058 alive) covering 96% of all eligible births in this region during January–December 1966. Data collections in 1997 (at age of 31 years) and 2012 (age 46 years) including clinical examination and fasting serum sampling were used in the present study. Metabolomics data were available from the 31-year (2962 women and 2749 men) and 46-year visits (3237 women and 2549 men).
Metabolomics and clinical biomarkers
A high-throughput NMR spectroscopy metabolomics platform was used to quantify >200 lipid and metabolite measures from serum samples collected during the visits.28 The platform applies a single experimental set-up, which allows simultaneous quantification of standard clinical lipids, 14 lipoprotein subclasses and individual lipids (triglycerides, phospholipids, free and esterified cholesterol) transported by these particles, multiple fatty acids, glucose and various glycolysis precursors, ketone bodies and amino acids in absolute concentration units. In addition, glucose, insulin, triglycerides, low-density-lipoprotein (LDL) cholesterol, high-density-lipoprotein (HDL) cholesterol, C-reactive protein and creatinine were assessed using standard clinical assays.
Sample quality control
To ensure the best possible estimates for age-associated changes in absolute concentrations, we employed a two-stage pre-processing protocol. First, we constructed multi-variate regression models of biomarkers that were available from both metabolomics and clinical assays (glucose, triglycerides, total cholesterol, HDL cholesterol and serum creatinine). Each of the five NMR measures was predicted from the combination of the five clinical biomarkers; our rationale was that the biological correlations between the biomarkers29,30 provide an additional source of consistency that would be broken by poor quality. The residuals from these models were then collected into a matrix of five columns (one column per model). The final quality score was defined as the first principal component of the residual matrix (Supplementary Figure S1, available as Supplementary data at IJE online). Hence samples with unusual and correlated residuals in multiple biomarkers were likely to get an extreme quality score. The cut-offs for acceptable deviation were set at the points where there was a >5% chance that the observed quality score was from the expected normal distribution. Consequently, 1765 (9.8%) of the 17 942 samples available in the YFS and NFBC1966 were excluded, leaving 16 177 that passed quality control.
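The last step of the quality score can be sketched in code. This is a minimal Python illustration and not the authors' R implementation: it assumes the five-column residual matrix from the regression models is already available, extracts the first principal component by power iteration, rescales the scores to unit variance and flags samples beyond a two-sided 5% normal cut-off. The function name first_pc_scores, the simulated residuals and the exact cut-off rule are assumptions made for illustration only.

```python
import math
import random
import statistics

def first_pc_scores(residuals, iterations=200):
    """Project rows of an n x k residual matrix onto the first principal component."""
    n, k = len(residuals), len(residuals[0])
    means = [statistics.fmean(row[j] for row in residuals) for j in range(k)]
    centred = [[row[j] - means[j] for j in range(k)] for row in residuals]
    # Sample covariance matrix of the residual columns.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Power iteration converges to the leading eigenvector of the covariance.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(k)) for r in centred]
    sd = statistics.stdev(scores)
    return [s / sd for s in scores]  # unit variance, comparable to N(0, 1)

# Hypothetical residuals: 200 consistent samples with independent noise,
# followed by 3 samples that are off in all five biomarkers at once.
rng = random.Random(7)
residuals = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
residuals += [[5.0] * 5 for _ in range(3)]

scores = first_pc_scores(residuals)
cutoff = statistics.NormalDist().inv_cdf(0.975)  # two-sided 5% (assumed rule)
flagged = [i for i, s in enumerate(scores) if abs(s) > cutoff]
```

In this toy data the three systematically shifted samples dominate the first component and land far outside the cut-off, which mirrors the idea that correlated residuals across several biomarkers indicate a compromised sample rather than one noisy assay.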
Calibration between visits
The second pre-processing step was aimed at bias between visits. We exploited longitudinal biomarker data in YFS to calibrate consecutive visits. We assumed a priori that subpopulations of the same sex, average age and body mass have identical average metabolic profiles. This means that if subsets of individuals from two consecutive visits have identical average features, we expect the subset averages of the biochemical data to be identical as well. Similarly, we expect that controlling for body mass and waist circumference simultaneously controls metabolically relevant lifestyle exposures at the population level. We chose body mass for matching due to its strong associations with metabolic measures and ageing12,31 and easy and standardized method of measurement.
Given consecutive visits A and B in the YFS cohort, a subset of participants was selected from visit A and another mutually exclusive subset from visit B. The selection was optimized so that the subsets had matching age, sex (225 men and 225 women) and body mass (Supplementary Table S1, available as Supplementary data at IJE online). We then defined a scaling factor C = exp{mean[log(subset of B) − log(subset of A)]} for each metabolic variable that was not a ratio. It is plausible that batch effects manifest in the form of constant multipliers of concentrations since all the samples within a batch would have been collected, handled, stored and measured in the same way. Lastly, the scale factor was applied to all data from visit B to equalize the measurement scale (Supplementary Figure S2, available as Supplementary data at IJE online). The procedure was applied first to the 2001 and 2007 visits, and then to the 2007 and 2011 visits. The same procedure was not possible within the NFBC1966 due to the birth cohort design (i.e. no age spread). Instead, we calibrated the NFBC1966 data according to the YFS 2001 (NFBC1966 31-year visit) and YFS 2011 (NFBC1966 46-year visit).
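The scaling step can be illustrated with a short Python sketch (the authors worked in R, so this is an illustration under assumptions rather than their code). Because the two subsets contain different people, the mean of the log differences is computed here as the difference between the subset means of the log concentrations, and "applying" the factor is interpreted as dividing visit B values by C. The function names and the example concentrations are hypothetical.

```python
import math

def scaling_factor(subset_a, subset_b):
    """C = exp(mean(log B) - mean(log A)) for two matched, non-overlapping subsets."""
    mean_log_a = sum(math.log(x) for x in subset_a) / len(subset_a)
    mean_log_b = sum(math.log(x) for x in subset_b) / len(subset_b)
    return math.exp(mean_log_b - mean_log_a)

def calibrate_visit_b(values_b, factor):
    """Divide visit-B concentrations by C (assumed interpretation of 'applied')."""
    return [x / factor for x in values_b]

# Hypothetical concentrations (mmol/L) in matched subsets; visit B reads 10% high.
matched_a = [4.8, 5.0, 5.2, 5.4]
matched_b = [x * 1.10 for x in matched_a]

C = scaling_factor(matched_a, matched_b)
calibrated_b = calibrate_visit_b(matched_b, C)
```

Working on the log scale makes C a geometric-mean ratio, which fits the stated assumption that batch effects act as constant multipliers of concentrations.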
Statistical analyses
Longitudinal associations were quantified based on pair-wise differences that were calculated separately for every individual between consecutive visits. Statistical significance was estimated using 10 000 permutation cycles and 95% confidence intervals (CI95) by 10 000 bootstrapping samples. The permutation analysis was summarized by the Z-score that indicates the distance between the observed difference and the simulated random differences in standard deviations of the simulated null distribution. Two-tailed P-values were calculated from the Z-scores using the standard normal distribution. These techniques were used throughout the study unless otherwise indicated.
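A minimal Python sketch of these statistics is given below. The text does not specify the permutation scheme, so random sign flipping of each individual's difference (equivalent to swapping the visit labels within a person) is an assumption, as are the function names and the simulated data. The example uses 2000 cycles to keep the run short, whereas the study used 10 000.

```python
import random
import statistics

def rate_per_decade(first, second, years):
    """Pair-wise change between two visits, scaled to a ten-year interval."""
    return (second - first) / years * 10.0

def permutation_z(diffs, cycles=10_000, seed=1):
    """Z-score of the observed median against a sign-flip null (assumed scheme)."""
    rng = random.Random(seed)
    observed = statistics.median(diffs)
    null = [statistics.median([d if rng.random() < 0.5 else -d for d in diffs])
            for _ in range(cycles)]
    return (observed - statistics.fmean(null)) / statistics.stdev(null)

def two_tailed_p(z):
    """Two-tailed P-value from a Z-score via the standard normal CDF."""
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))

def bootstrap_ci95(diffs, samples=10_000, seed=1):
    """Percentile bootstrap 95% confidence interval for the median."""
    rng = random.Random(seed)
    medians = sorted(statistics.median(rng.choices(diffs, k=len(diffs)))
                     for _ in range(samples))
    return medians[int(0.025 * samples)], medians[int(0.975 * samples) - 1]

# Simulated changes for 400 people measured 6 years apart (hypothetical values).
rng = random.Random(42)
diffs = [rate_per_decade(0.0, rng.gauss(0.18, 0.3), 6.0) for _ in range(400)]

z = permutation_z(diffs, cycles=2000)
p = two_tailed_p(z)
ci = bootstrap_ci95(diffs, samples=2000)
```

Using the median of individual differences keeps the estimate in the original measurement units while limiting the influence of a few extreme changes.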
To elucidate associations with bodyweight changes, we stratified men and non-pregnant women, respectively, into those who maintained their weight change within ±3% per decade and waist–hip ratio (WHR) change between −0.05 and +0.015 (Stable), those who lost weight beyond the stable definition (Loss) and those who gained weight (Gain). To mitigate confounding from the baseline status, we matched the weight gain and loss subsets according to baseline body mass index (BMI), WHR and clinical assays for insulin, glucose, C-reactive protein, triglycerides, and LDL and HDL cholesterol (Supplementary Table S2, available as Supplementary data at IJE online).
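The stratification rule can be written as a small Python function. This is a sketch under assumptions: the ±3% weight limit and the WHR bounds are taken from the text as stated, the boundaries follow the ≥+3% and ≤−3% wording of the abstract, and participants with stable weight but a WHR change outside the stated range are left unclassified because the text does not say how they were handled. The function name and return labels are hypothetical.

```python
def weight_stratum(weight_change_pct, whr_change,
                   weight_limit=3.0, whr_low=-0.05, whr_high=0.015):
    """Classify one participant by change per decade: 'Gain', 'Loss', 'Stable' or None."""
    if weight_change_pct >= weight_limit:
        return "Gain"
    if weight_change_pct <= -weight_limit:
        return "Loss"
    if whr_low <= whr_change <= whr_high:
        return "Stable"
    return None  # stable weight but WHR outside the range (handling assumed)

# Hypothetical participants: (weight change % per decade, WHR change per decade).
examples = [(8.2, 0.058), (-3.5, 0.031), (1.0, 0.010), (1.0, 0.050)]
labels = [weight_stratum(w, r) for w, r in examples]
```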
Multiple testing
Principal component analysis of the biochemical data revealed that the first 48 principal components explained 99% of the total variance when all data were pooled. These results were compatible with earlier work.32 For consistency, we set the multiple testing threshold at the more conservative P < 0.0006 to match the previous paper (equivalent to Bonferroni adjustment for 83 independent tests at a 5% type 1 error rate). All statistical analyses were conducted in the R environment (https://www.R-project.org).
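The arithmetic behind the threshold can be checked in a few lines of Python. The helper name bonferroni_threshold is hypothetical; the numbers come from the paragraph above.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a given family-wise type 1 error rate."""
    return alpha / n_tests

threshold_used = bonferroni_threshold(0.05, 83)      # about 0.0006, the adopted threshold
threshold_from_pcs = bonferroni_threshold(0.05, 48)  # about 0.0010, from 48 components
```

Dividing by 83 rather than 48 gives the smaller, stricter threshold, which is why the adopted value is described as the more conservative choice.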
Results
The age structure of the participants is illustrated in Figure 1A and the basic characteristics are listed in Supplementary Table S3 (available as Supplementary data at IJE online). The numbers of individuals included in different study settings after excluding participants with missing or low-quality samples are listed in Figure 1B. Women who were pregnant at the time of sample collection were excluded (N = 202 in NFBC1966, N = 107 in YFS). Mean BMI varied between 24.2 and 25.9 kg/m² across the cohorts and visits. Smoking was more prevalent in the first visit (21.2% in NFBC1966 and 24.7% in YFS) compared with the last visit (17.9% in NFBC1966 and 14.8% in YFS, P = 5.8 × 10⁻¹⁴ for the combined difference between visits over both cohorts). The prevalence of diabetes was ≤3.0% across the visits.

Figure 1. Overview of data resources and quality control. (A) Longitudinal data sets included the Northern Finland Birth Cohort 1966 (NFBC1966) and the Cardiovascular Risk in Young Finns Study (YFS). (B) A two-stage protocol was applied to ensure the high quality of nuclear magnetic resonance (NMR) metabolomics data. First, metabolic measures that were available from both clinical assays and NMR quantification were compared to identify samples with discordant values (see also Figure 2). Second, a new calibration method was applied to reduce systematic confounding between consecutive visits in the YFS and NFBC1966 cohorts (see also Figure 3).
Sample quality
Biochemical sample quality was verified by comparing the clinical and NMR assays as described in the ‘Methods’ section. Briefly, we used measures that were available from both clinical and NMR assays to determine whether the aggregate discrepancy across multiple biomarker concentrations from the two sources was greater than could be expected by random measurement noise (Figure 2A and B). Importantly, the residuals from multiple biomarkers were analysed together to emphasize sample quality over isolated measurement errors in a single biomarker: if there is a consistent difference between NMR and clinical assays over multiple biomarkers, then it is likely that the biological material of the sample had been altered in some way rather than a technical error in a specific NMR quantity or biochemical assay.

Figure 2. Assessing the technical quality of serum samples based on comparisons between clinical and nuclear magnetic resonance (NMR) metabolomics measurements in the Northern Finland Birth Cohort 1966 (NFBC1966) and the Cardiovascular Risk in Young Finns Study (YFS). (A) A schematic illustration of the comparison between two measurements of the same metabolite. For high-quality samples, the two measurements are expected to agree (data points near the diagonal line). By measuring the residual distance to this line, it is possible to detect outlying samples. (B) To increase the robustness of the estimation, we extended the univariate comparison to multiple biomarkers simultaneously (details in the ‘Methods’ section). The multi-variate comparison produces a quality score that follows the standard normal distribution for samples that are consistent within the inherent accuracy of the assay methods. (C)–(G) Distributions of quality scores for each cohort visit (coloured area) and the corresponding standard normal model (solid line). The percentages indicate the proportion of samples that passed the quality control at a 5% error rate.
To quantify the aggregate discrepancy, we developed a quality score that follows the standard normal noise distribution when all samples are of high quality (indicated as solid curves in Figure 2C–G). Samples that were deemed too far from the normal model were excluded. Of note, the 2012 collection of NFBC1966 was designed specifically to accommodate NMR metabolomics and the observed quality score distribution was almost identical to the predicted curve (Figure 2G).
Calibration between visits
Preliminary comparisons between the two follow-up periods within YFS revealed systematic bias between time points that was extreme (>3 median absolute deviations) and biologically implausible (Figure 3C). For this reason, we applied a calibration procedure according to non-biochemical characteristics (age, sex, BMI, weight, height and WHR) to remove bias between visits from the biochemical data (Figure 3A, B, technical details in the ‘Methods’ section). We assumed that two subsets of people picked from the population who are identical in these characteristics (Supplementary Table S1, available as Supplementary data at IJE online) would also have the same average metabolic profiles. This makes it possible to calculate a scaling factor between visits and use it to remove batch effects (Figure 3B).

Figure 3. Calibration of metabolic measures between visits in the Northern Finland Birth Cohort 1966 (NFBC1966) and the Cardiovascular Risk in Young Finns Study (YFS). (A) A schematic illustration of age histograms between two visits for the same individuals. Factors that influence the metabolic data collected from the two visits include biological ageing, environmental exposures and technical biases related to sample storage, handling and evolving laboratory protocols. To exclude non-biological factors, we first identify subgroups from these visits that are as similar as possible regarding biology (i.e. same age and clinical characteristics). (B) The subgroups are then compared with respect to the metabolic variable of interest, depicted here as distributions of values. Due to the matching in Plot A, we can expect that any differences in these distributions are caused by non-biological factors. Consequently, by adjusting the scale of the data from Visit 2, we mitigate the technical biases and obtain more reliable statistics of the biological ageing trends. (C) Without calibration, the two follow-up periods of the YFS produce highly divergent longitudinal changes. (D) After calibration, the two periods were concordant and the clinical and metabolomics measurements were in agreement. Median absolute deviations (MAD) were calculated from the pool of all YFS and NFBC1966 data.
The metabolic changes in calibrated concentrations were coherent and biologically meaningful, including consistent increases in glucose, triglycerides, cholesterol and glycoprotein acetyls, and consistent adjustments between NMR and clinical assays (Figure 3D). Please note that the results in the next section represent the combined trends over both YFS periods. The impact of calibration on temporal trends calculated this way is included in Supplementary Figure S3 (available as Supplementary data at IJE online).
Age-associated changes
Temporal changes in selected metabolic measures are presented in Figure 4 and full statistics and confidence intervals for all measures are available in Supplementary Table S4 (available as Supplementary data at IJE online). To strengthen conclusions about longitudinal changes, we used robust pair-wise median statistics (details in the ‘Methods’ section) and developed further evaluation criteria to summarize the evidence obtained from four types of analyses. First, we required that age associations within the calibrated YFS data satisfied P < 0.0006. Second, we checked the concordance between calibrated and non-calibrated associations within the YFS. Third, we checked the concordance between calibrated and non-calibrated associations within the NFBC1966. Lastly, we checked the concordance between longitudinal change and cross-sectional age correlation within the YFS. These four criteria were incorporated visually into Figure 4A–M. All findings mentioned in the main text represent calibrated median rates of change per decade in the combined YFS and NFBC1966 data set and satisfy P < 0.0006 unless otherwise indicated.

Figure 4. Longitudinal population-wide changes in metabolic measures in the Northern Finland Birth Cohort 1966 (NFBC1966) and the Cardiovascular Risk in Young Finns Study (YFS). (A)–(M) Medians and 95% confidence intervals for the rate of change per decade in the YFS and NFBC1966 cohorts. Median absolute deviations (MAD) were calculated from the pool of all YFS and NFBC1966 data. Lipoprotein subclasses were characterized by size-specific (denoted by XS, S, M, L, XL and XXL) measures for total lipid content (L), cholesterol (C), triglycerides (TG) and phospholipids (PL) for very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL) and high-density lipoproteins (HDL). Fatty acid (FA) measures included polyunsaturated (PUFA), monounsaturated (MUFA) and docosahexaenoic acid (DHA).
The majority of the YFS participants (71%) and NFBC1966 participants (83%) gained weight during the follow-up period and the combined median rate was +4.0 kg (+5.9%) per decade. Accordingly, both BMI (+1.35 kg/m²) and WHR (+0.048) increased. Simultaneously, clinical biomarkers deteriorated with increases in blood pressure (systolic +5 mmHg and diastolic +5 mmHg), glucose (+0.23 mmol/L, +4.6%), triglycerides (+0.16 mmol/L, +17%) and LDL cholesterol (+0.35 mmol/L, +12%).
The general pattern of increasing circulating lipids was reflected in NMR-based measures. These included lipoprotein measures related to apolipoprotein B such as medium LDL lipids (+0.086 mmol/L, +11%, Figure 4B and D), lipoprotein triglycerides such as very-low-density-lipoprotein triglycerides (+0.12 mmol/L, +23%, Figure 4F) and fatty acids such as omega-3 (+0.059 mmol/L, +11%, Figure 4H). The lipids in the large HDL subclass decreased (−0.055 mmol/L, −7.9%, Figure 4B), although the association was not consistent across the evaluation criteria. We observed increases in most of the amino acids (Figure 4J) including alanine (+0.0254 mmol/L, +6.3%), glutamine (+0.0074 mmol/L, +1.3%), glycine (+0.0103 mmol/L, +3.5%) and tyrosine (+0.0041 mmol/L, +7.8%).
Metabolic changes stratified by weight change
To isolate associations with weight change, we stratified the study participants into those who gained or lost weight, or who were stable (definitions in the ‘Methods’ section). The weight loss and gain subgroups were matched at baseline (Supplementary Table S2, available as Supplementary data at IJE online). Body weight decreased by 2.5 kg (−3.5%) in the former whereas it increased by 6.0 kg (+8.2%) per decade in the latter. Both subgroups showed an increase in the waist–hip ratio, but weight gain was associated with a faster increase (+0.031 vs +0.058 per decade, Figure 5A). Insulin decreased with weight loss but increased with weight gain (−10% vs +24%, Figure 5K) and the pattern was similar for C-reactive protein (−22% vs +25%, Figure 5L). Triglycerides, cholesterol, other lipids and most amino acids showed similar separations in the rate of change, although small increases rather than decreases were observed in the weight loss subgroup. The opposite pattern was observed for large HDL lipids (Figure 5B) and HDL cholesterol (Figure 5E).

Figure 5. Longitudinal weight-associated changes in metabolic measures in the Northern Finland Birth Cohort 1966 (NFBC1966) and the Cardiovascular Risk in Young Finns Study (YFS). The combined YFS and NFBC1966 data were divided into subgroups that gained or lost weight. Results for the subgroup with stable weight are available in Supplementary Figure S4 (available as Supplementary data at IJE online). Median rates of change and 95% confidence intervals are illustrated. Median absolute deviations (MAD) were calculated from the pool of all YFS and NFBC1966 data. Lipoprotein subclasses were characterized by size-specific (denoted by XS, S, M, L, XL and XXL) measures for total lipid content (L), cholesterol (C), triglycerides (TG) and phospholipids (PL) for very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL) and high-density lipoproteins (HDL). Fatty acid (FA) measures included polyunsaturated (PUFA), monounsaturated (MUFA) and docosahexaenoic acid (DHA).
In the stable subgroup (Supplementary Figure S4, available as Supplementary data at IJE online), clinical measures for total cholesterol (+0.35 mmol/L, +7.2% per decade), HDL cholesterol (+0.052 mmol/L, +4.5%), glucose (+0.20 mmol/L, +4.0%) and blood pressure increased (systolic +4.0 mmHg and diastolic +3.5 mmHg). The NMR data revealed increases in the two smallest very-low-density-lipoprotein (VLDL), intermediate-density-lipoprotein (IDL) and all LDL subclass lipids and most fatty acids. Alanine increased (+3.1%), whereas histidine (−3.3%) and creatinine (−2.9 µmol/L, −4.1%) decreased.
Differences between sexes
To compare men and women, we visualized proportional changes with respect to the baseline (Figure 6A–L). Women gained more weight (+6.5% per decade vs +5.0% in men), but experienced milder changes in multiple clinical biomarkers including insulin (+10.1% vs +17.7%, Figure 6J), triglycerides (+14.9% vs +20.1%, Figure 6F) and C-reactive protein (+5.1% vs +21.1%, Figure 6K). Substantial divergence was observed in the absolute levels of circulating lipoprotein lipids (Supplementary Figure S5, available as Supplementary data at IJE online), although most of these differences disappeared when normalized by baseline concentrations (Figure 6B and E–H). Diverging trends were observed in multiple amino acids (Figure 6I): glutamine (+1.9% vs +0.8%) and glycine (+5.2% vs +1.9%) increased more in women, whereas leucine (+0.6% vs +3.4%) and phenylalanine (+0.6% vs +3.5%) increased more in men.

Figure 6. Associations between sex and rate of metabolite change in the Northern Finland Birth Cohort 1966 (NFBC1966) and the Cardiovascular Risk in Young Finns Study (YFS). (A)–(L) Medians and 95% confidence intervals for fold changes per decade with respect to baseline values for men and women, calculated from the combined YFS and NFBC1966 data. (M)–(W) To investigate sex–weight interactions, linear regression models were fitted for each metabolic measure. Median absolute deviations (MAD) were calculated from the pool of all YFS and NFBC1966 data. Lipoprotein subclasses were characterized by size-specific (denoted by XS, S, M, L, XL and XXL) measures for total lipid content (L), cholesterol (C), triglycerides (TG) and phospholipids (PL) for very-low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL), low-density lipoproteins (LDL) and high-density lipoproteins (HDL). Fatty acid (FA) measures included polyunsaturated (PUFA), monounsaturated (MUFA) and docosahexaenoic acid (DHA).
Interactions between sex and weight change
We used multi-variate linear regression to investigate interactions between sex and weight change (Supplementary Figure S6, available as Supplementary data at IJE online). For most metabolic measures, the baseline value (due to regression towards the mean) was the strongest regressor over baseline body mass, change in weight or sex (Supplementary Table S5, available as Supplementary data at IJE online). Interaction effects were observed for 35 measures and the regression coefficients revealed distinct patterns of associations. Changes in insulin were explained by weight change rather than sex (Figure 6V). Changes in VLDL and HDL lipids were mainly explained by weight change but with substantial sex effects (Figure 6M). Changes in isoleucine and leucine were associated with both weight change and sex (Figure 6U). Finally, changes in creatinine were mainly associated with sex (Figure 6W).
Discussion
In this study, we investigated ageing trends of metabolic measures in two large longitudinal cohorts of young and middle-aged adults. We developed new statistical techniques that allowed us to investigate how circulating biomarkers changed over time and we report previously unavailable quantitative trends for >200 measures, including lipoprotein subclass lipids, branched-chain amino acids and cardiometabolic biomarkers. Furthermore, we isolated associations between metabolic changes and body mass in a carefully constructed design in which weight loss and weight gain subgroups were matched at baseline. Our analyses revealed that weight change explains much of the metabolic variation especially regarding insulin, VLDL and HDL particles, amino acids and C-reactive protein. On the other hand, other indicators such as waist–hip ratio, blood pressure, LDL cholesterol and creatinine may represent an underlying stable trend that affects all individuals.
Confounding in longitudinal population data
Longitudinal designs are superior to cross-sectional ageing studies7,9–13,24 but long gaps in sample collection (i.e. batch effects) can lead to severe confounding as we observed in the YFS. Batch correction is typically focused on non-biological variation that results from instrument drift and other technical variation within workflows, and sample stability that depends on the collection, handling, storage and human operators.25,33 These effects can be minimized by rigorous workflows (e.g. the automated NMR pipeline has excellent reproducibility28), by including standardized control aliquots at regular intervals within a measurement series, by adding known concentrations of stable reference molecules or by applying statistical adjustment techniques.33
Batch effects from study visits separated by long time gaps cannot be corrected by any of the previously mentioned statistical techniques if there are only two or three time points available. Furthermore, if the trait of interest (change in metabolite concentration between two measurements) and the source of the batch effect (differences between the two measurements due to non-biological factors) are perfectly correlated—which they are in this study—removing batch effects in the usual way will also remove all biological trends from the concentration values.
New methodology for longitudinal studies
To address the challenge of batch effects that are correlated with the trait of interest, we developed an approach that first removes as much of the biological difference between the visits as possible (matched subsets in Figure 3A). This step allows conventional batch correction to be applied as if the traits and batches were uncorrelated (Figure 3B). The fundamental idea is widely applicable to studies that suffer from batch effects that are correlated with clinical outcomes.
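The matching idea can be illustrated with a simple Python sketch. The source states only that the subset selection was optimized, so the greedy nearest-neighbour pairing below is a swapped-in stand-in and not the authors' algorithm. The record fields, the distance scales and the example participants are all hypothetical.

```python
def greedy_match(visit_a, visit_b, age_scale=5.0, mass_scale=10.0):
    """Pair visit-A records with the closest visit-B record of the same sex.

    Each person may appear in only one subset, so the two matched subsets
    are mutually exclusive. Greedy pairing is an illustrative simplification.
    """
    used_people = set()
    pairs = []
    for a in visit_a:
        if a["id"] in used_people:
            continue
        best, best_dist = None, float("inf")
        for b in visit_b:
            if b["id"] in used_people or b["id"] == a["id"] or b["sex"] != a["sex"]:
                continue
            dist = (abs(a["age"] - b["age"]) / age_scale
                    + abs(a["mass"] - b["mass"]) / mass_scale)
            if dist < best_dist:
                best, best_dist = b, dist
        if best is not None:
            pairs.append((a, best))
            used_people.update({a["id"], best["id"]})
    return pairs

# Hypothetical records: the older participants at the first visit are matched
# with different, younger participants who reach the same age six years later.
visit_a = [{"id": 2, "sex": "F", "age": 36, "mass": 65.0},
           {"id": 4, "sex": "M", "age": 39, "mass": 85.0}]
visit_b = [{"id": 1, "sex": "F", "age": 36, "mass": 63.0},
           {"id": 3, "sex": "M", "age": 39, "mass": 84.0},
           {"id": 2, "sex": "F", "age": 42, "mass": 68.0},
           {"id": 4, "sex": "M", "age": 45, "mass": 88.0}]

pairs = greedy_match(visit_a, visit_b)
matched_ids = [(a["id"], b["id"]) for a, b in pairs]
```

Because the matched subsets share sex, age and body mass, any remaining difference in their average concentrations can be treated as a batch effect, which is what allows the scaling factor to be estimated without removing the ageing trend itself.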
Ageing trajectories and weight change
We estimated temporal rates for >200 metabolic measures in large longitudinal population-based cohorts, which provides new understanding on the direction and magnitude of age-associated metabolic changes and how some of these changes may be driven by weight change. Reliable literature on longitudinal metabolic trajectories is scarce and direct comparisons are difficult, but the observed changes in insulin, glucose, various lipid measures and most of the amino acids are compatible with the previously reported patterns in young and middle-aged individuals.9,10,12,34 We also observed consistent patterns between NMR-based and clinical assays. The combination of sample quality control, calibration and robust pair-wise median-based statistics that we developed proved to be a reliable framework to determine temporal trends in physical measurement units in omics studies that involve long time gaps.
When we controlled for weight change, waist–hip ratio and creatinine emerged as covariates of ageing (possibly pointing to loss of muscle relative to adipose tissue35). We also observed consistent increases in cholesterol and blood pressure, two classical cardiovascular risk factors. These shifts can be explained by modifiable life circumstances such as reduced physical activity36,37 but they also fit with the genetic programme of ageing and the molecular clock concept as observed in laboratory settings.38
Strengths and weaknesses
Two independent cohorts of matching ethnicity, socio-economics and time period provide us with robust data and high statistical power but these strengths also mean that the results may not generalize outside the Northern European context. Furthermore, the results apply to adults under the age of 50 years and further studies are needed to establish explicit links to late-life phenomena or how diet, exercise and genetics may influence the ageing trajectories of metabolite concentrations.13,36,39
The combination of clinical and NMR assays gives us confidence that the data are coherent and of high technical quality across decades. We are also confident that the quality-control method that we introduced can detect non-informative samples accurately, as was demonstrated by the notable difference between the 2012 visit of NFBC1966 (short storage period, sample collection was designed to support metabolomics) and the other collections that were not specifically optimized for metabolomics studies (Figure 2). We used a two-layer approach to assess sample quality (Supplementary Figure S1, available as Supplementary data at IJE online) since it gave us additional information on whether an outlier was due to a single extreme measurement in a single biomarker, or whether all the biomarkers were systematically off by a small amount. This qualitative information may not always be necessary, and direct calculation of multi-variate metrics such as the Mahalanobis distance may be a more practical choice.40
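A direct multi-variate check of the kind mentioned above could look like the following sketch. It is not the two-layer method used in this study. It assumes that, for each sample, the discrepancy between the NMR and clinical assay (for example, a log-ratio) is available for three shared biomarkers, and it uses the plain sample mean and covariance; robust estimators may be preferable when many poor samples are expected. The data are simulated and the function name is hypothetical.

```python
import numpy as np

def mahalanobis_sq(discrepancy):
    """Squared Mahalanobis distance of each row from the sample mean."""
    z = np.asarray(discrepancy, dtype=float)
    centred = z - z.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(z, rowvar=False))
    # Row-wise quadratic form: centred_i' * inv_cov * centred_i
    return np.einsum("ij,jk,ik->i", centred, inv_cov, centred)

# Simulated log-ratio discrepancies for 500 samples and 3 shared biomarkers.
rng = np.random.default_rng(1)
z = rng.normal(0.0, 0.05, size=(500, 3))
z[0] = 0.5                       # one sample where all biomarkers disagree

d2 = mahalanobis_sq(z)
CUTOFF = 16.27                   # approx. 99.9th percentile of chi-square, 3 d.f.
flagged = d2 > CUTOFF
print("flagged samples:", np.flatnonzero(flagged))
```

A single distance per sample is simple to threshold, but it does not reveal whether a sample was flagged because of one extreme biomarker or a small systematic shift in all of them, which is the qualitative information that a layered approach retains.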
There is less clear evidence on how well the calibration technique works, although the consistency between statistical tests in Figure 4 is promising. Our matching procedure is likely to be highly effective against technical cohort effects. On the other hand, matching by body mass may miss some period effects such as diet trends and changing environmental exposures, although studies that have these data available can use them for matching, which makes the calibration concept broadly applicable for epidemiological research. We have to wait for further studies with more time points to fully assess the accuracy of the reported ageing trends. Until such data become available, we urge caution when interpreting population-wide temporal shifts in absolute measurement units because of the inherent biases that come with samples separated by decades.
Conclusions
We solved technical and analytical challenges in longitudinal omics studies, and the new techniques enabled new ways to study temporal trajectories in large-scale human populations. We found that weight gain drives most of the changes in systemic metabolism and the good news is that these changes are likely to be reversible by lifestyle adjustments, as opposite trends were observed in the weight-loss subgroup. This study also provides new reference information on how absolute concentrations of these and other indicators change over time in humans and on what magnitudes of change are typical and achievable in real-world populations.
Ethics approval
The NFBC1966 was approved by the Northern Ostrobothnia Hospital District, Finland. The YFS was approved by the five universities with medical schools in Finland that were involved in the study (Turku, Helsinki, Tampere, Kuopio and Oulu). All participants gave written informed consent.
Data availability
The data sets used in the current study are available from the cohorts through an application process for researchers who meet the criteria for access to confidential data (https://www.oulu.fi/nfbc/ and http://youngfinnsstudy.utu.fi). Regarding the YFS data, the ethics committee has concluded that under applicable law, the data from this study cannot be stored in public repositories or otherwise made publicly available. The data controller may permit access on a case-by-case basis for scientific research, but only to aggregated statistical data that cannot be traced back to individual participants, not to individual participant-level data.
Supplementary data
Supplementary data are available at IJE online.
Author contributions
V.P.M. and M.A.K. conceived of and designed the study. V.P.M. conducted the statistical analyses and wrote the first draft. J.K., T.L., M.K., J.V., M.P., V.S., M.R.J., O.T.R. and M.A.K. collected samples and produced the data. All authors reviewed the text and participated in the editing process.
Funding
This work was supported by the Academy of Finland, Novo Nordisk Foundation, Oulu Health and Wellfare Center, Social Insurance Institution of Finland; Competitive State Research Financing of the Expert Responsibility area of Kuopio, ERDF European Regional Development Fund, EU Horizon 2020, EU Research Council and the following foundations: Sigrid Juselius, Finnish Cardiovascular Research, Juho Vainio, Paavo Nurmi, Finnish Cultural, Tampere Tuberculosis, Emil Aaltonen, Yrjö Jahnsson, Signe and Ane Gyllenberg, and Finnish Diabetes Research. The Young Finns Study has been financially supported by the Academy of Finland grants 322098, 286284, 134309 (Eye), 126925, 121584, 124282, 129378 (Salve), 117787 (Gendi) and 41071 (Skidi); the Social Insurance Institution of Finland; Competitive State Research Financing of the Expert Responsibility area of Kuopio, Tampere and Turku University Hospitals (grant X51001); Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation for Cardiovascular Research; Finnish Cultural Foundation; The Sigrid Juselius Foundation; Tampere Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements 848146 (To Aition) and 755320 (TAXINOMISIS); European Research Council (grant 742927 for the MULTIEPIGEN project); Tampere University Hospital Supporting Foundation and Finnish Society of Clinical Chemistry.
Acknowledgements
M.A.K. thanks Deborah Lawlor (University of Bristol, UK) for initial critical discussions on the caveats of cross-sectional epidemiological data. All authors acknowledge the continued contribution from the cohort participants and healthcare professionals involved in the resource collection.
Conflict of interest
V.S. has consulted for Novo Nordisk and Sanofi, and received modest honoraria from these companies. He also has ongoing research collaboration with Bayer Ltd (all unrelated to the present study). All other authors declare no competing interests.