Methods for Estimating Personal Disease Risk and Phylogenetic Diversity of Hematopoietic Stem Cells

Abstract An individual's chronological age does not always correspond to the health of different tissues in their body, especially in cases of disease. Therefore, estimating and contrasting the physiological age of tissues with an individual's chronological age may be a useful tool to diagnose disease and its progression. In this study, we present novel metrics to quantify the loss of phylogenetic diversity in hematopoietic stem cells (HSCs), which are precursors to most blood cell types and are associated with many blood-related diseases. These metrics showed an excellent correspondence with an age-related increase in blood cancer incidence, enabling a model to estimate the phylogeny-derived age (phyloAge) of HSCs present in an individual. The HSC phyloAge was generally older than the chronological age of patients suffering from myeloproliferative neoplasms (MPNs). We present a model that relates excess HSC aging with increased MPN risk. It predicted an over 200 times greater risk based on the HSC phylogenies of the youngest MPN patients analyzed. Our new metrics are designed to be robust to sampling biases and do not rely on prior knowledge of driver mutations or physiological assessments. Consequently, they complement conventional biomarker-based methods to estimate physiological age and disease risk.


Introduction
Somatic aging is characterized by an unrelenting accumulation of genetic variants that become a mutational burden with potentially significant health consequences, particularly in tissues with high cellular turnover (Fancello et al. 2019; Sha et al. 2020). Hematopoietic stem cells (HSCs) bearing newly evolved variants can increase in numbers due to proliferative and/or survival advantage, leading to clonal expansions. Such clonally expanded HSCs can result in clonal hematopoiesis of indeterminate potential (CHIP) in circulating blood cells. CHIP increases with age, paralleling the increased risk of hematological neoplasms and cardiovascular disease (Bick et al. 2020; Nachun et al. 2021; Younes et al. 2023). This pattern is often associated with driver mutations in some genetic loci, such as DNMT3A and TET2 (Buscarlet et al. 2017; Bailey et al. 2018).
Many studies identify CHIPs by the presence of such driver mutations, with their reported prevalence rates ranging from ∼1% in young people (<40 yr old) to >15% in those over 65 (Groarke and Young 2019; Bick et al. 2020; Fabre et al. 2022). These drivers emerge from a background of somatic mosaicism that develops continuously over time (Groarke and Young 2019; Bick et al. 2020; Fabre et al. 2022). However, few studies have focused on using putative passenger (nondriver) variation to assess disease risk. Advancements in single-cell sequencing of HSC genomes now provide base-resolution profiles of all genetic changes in somatic cells (Lee-Six and Kent 2020; Van Egeren et al. 2021; Fabre et al. 2022; Mitchell et al. 2022), offering new avenues for developing quantitative models to complement the analysis of driver mutations.
In this study, we measured the accumulation of single nucleotide alterations (SNAs) in somatic genomes of HSCs and assessed temporal changes in HSC phylogenetic diversity with age. We present novel approaches to measure the decay of phylogenetic diversity of HSC genomes generated from single-cell sequencing data. We explored how the number of SNAs and the decay of phylogenetic diversity relate to the age-related incidence of cancer in populations (Fig. 1). Based on the observed patterns, we developed a model to estimate HSC phylogeny-derived age (phyloAge) to predict the increased risk of disease. We applied these measures to HSC phylogenies of individuals suffering from myeloproliferative neoplasms (MPNs) to estimate increased cancer risk independent of commonly used panels of driver mutations or other biomarkers.
Mol. Biol. Evol. 41(1):msad279 https://doi.org/10.1093/molbev/msad279 Advance Access publication December 20, 2023

Accumulation of Genetic Variation in HSCs in the Healthy
We utilized the most comprehensive single-cell sequencing data publicly available for HSCs (Mitchell et al. 2022). This consists of whole-genome sequences of DNA obtained from 3,579 colonies of cells derived from single immunophenotypic HSCs (Lin−CD34+CD38−CD45RA−) that were sorted using flow cytometry and then cultured to produce single-cell-derived hematopoietic colonies from 10 healthy individuals spanning the human lifespan.
We calculated the number of genetic differences (GDs) between HSC sequences for each individual. We divided these by 2 to generate per-lineage estimates for each pair of HSCs. The distribution of these GDs was unimodal in infants, consistent with the rapid initial HSC expansion during embryogenesis, which results in limited HSC lineage divergence and thus similar GDs (Fig. 2a). With increasing age, the GDs increase due to the steady accumulation of new variants, resulting in longer GDs in 38-yr-olds and 63-yr-olds (Fig. 2b and c). These adult distributions are characterized by long tails of shorter GDs, representing the gradual accumulation of more recently diverged subclonal HSCs that are more closely related to one another than to any of the founder HSCs. In an 81-yr-old individual, these developed into a secondary peak reflecting the increasing prevalence of subclonal HSCs (Fig. 2d), consistent with CHIPs that arise after the establishment of the initial population of HSC clones during embryogenesis and become more prominent with age (Groarke and Young 2019; Ayachi et al. 2020; Fabre et al. 2022).

The Tempo of Sequence Variation Accumulation in HSCs
Figure 3a shows the relationship between the central values of primary peaks in GD distributions and the ages of 7 healthy adults. The relationship is nearly linear, but the linear fit was rejected in favor of a second-order polynomial fit (P < 0.01). The curvilinear relationship was also found for SNAs, which were detected directly by comparing HSC genomes with the germline genome (P < 0.01; Fig. 3b). These findings differ from previous reports of a constant molecular clock for HSC evolution (Osorio et al. 2018; Brown et al. 2019; Dietlein et al. 2020; Mitchell et al. 2022; Williams et al. 2022). Importantly, the curvilinear relationship of SNA counts and GDs with age does not explain the exponential increase in the age-related incidence of blood cancers (Fig. 3c).
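The comparison between a linear and a second-order polynomial fit can be sketched as a likelihood ratio test under Gaussian errors. This is only an illustration: the function names are hypothetical, the chi-square approximation with 1 degree of freedom is rough for a sample of 7 individuals, and the exact procedure used in the study may differ.

```python
import math

def polyfit_rss(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    m = degree + 1
    # Augmented normal equations (X^T X | X^T y) for a Vandermonde design.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    coef = [a[i][m] / a[i][i] for i in range(m)]
    return sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
               for xi, yi in zip(x, y))

def lrt_linear_vs_quadratic(x, y):
    """Return (statistic, p-value) for H0: linear versus H1: quadratic."""
    n = len(x)
    rss_linear = polyfit_rss(x, y, 1)
    rss_quadratic = polyfit_rss(x, y, 2)
    # Gaussian log-likelihood ratio: 2 * (lnL1 - lnL0) = n * ln(RSS0 / RSS1).
    stat = max(n * math.log(rss_linear / rss_quadratic), 0.0)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Made-up values for illustration only (not data from the study).
ages = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
counts = [0.1, 0.9, 4.2, 8.8, 16.1, 25.0, 35.9]
stat, p_value = lrt_linear_vs_quadratic(ages, counts)
```

A small p-value here indicates that adding the quadratic term improves the fit more than expected by chance, which is the sense in which the linear model is rejected.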

Decay of Phylogenetic Diversity with Age
To seek an explanation (or at least a better correlation) for the exponential age-related increase in cancer risk, we explored phylogenies reconstructed using SNAs in HSC genomes as an alternative method for characterizing patterns of age-related change in HSCs. An individual's total number of HSC lineages has been shown to be established during embryogenesis and to remain relatively stable throughout life (Lee-Six et al. 2018; Jaiswal and Ebert 2019; Ayachi et al. 2020; Mitchell et al. 2022). As expected, the HSC phylogeny of a 38-yr-old individual exhibits extensive polyclonality, visible as many early-branching lineages, with only rare instances of later evolutionary bifurcations (Fig. 4a). In contrast, the HSC phylogeny of an elderly individual (81 yr old) has many expanding lineages (CHIPs) with many descendant subclones (Fig. 4b). In this case, subclonal HSCs comprise about two-thirds of all HSCs, which means that subclonal HSCs that originated after birth appear to replace ancestral HSC lineages.
Since the ancestral HSCs diversified early and evolved independently of others, they represent a greater phylogenetic diversity than the more recently diverged subclonal lineages that share a genetic history with closely related subclonal HSCs. Therefore, the displacement of ancestral HSC lineages by subclonal HSCs represents a decay in phylogenetic diversity with age (Mitchell et al. 2022).
The lineages-through-time (LTT) plots reveal these phenomenological trends in biodiversity loss based on the temporal distribution of branch points in the phylogenetic trees. In the LTT plot of the 38-yr-old (Fig. 4c), the total number of HSC lineages is established early and remains similar over time. In contrast, the LTT plot of the elderly individual shows an additional phase of HSC diversification and the loss of phylogenetic diversity (Fig. 4d).
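An LTT curve of this kind can be sketched from the branch-point times of a tree. The minimal example below assumes a strictly bifurcating, ultrametric tree whose branch points are given as times since the root; the function name and the example times are hypothetical and are not taken from the study.

```python
from bisect import bisect_right

def lineages_through_time(branch_times, query_times):
    """Number of lineages alive at each query time.

    Assumes one lineage before the root split and that every branch
    point is a bifurcation, so each one adds exactly one lineage.
    """
    ordered = sorted(branch_times)
    return [1 + bisect_right(ordered, t) for t in query_times]

# Illustrative branch-point times: an early burst, then one late split.
example_branch_times = [0.0, 0.1, 0.1, 0.2, 5.0]
ltt = lineages_through_time(example_branch_times, [0.05, 0.3, 6.0])
```

In such a curve, a plateau after the early burst corresponds to the stable adult phase, and a late rise corresponds to subclonal diversification.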

New Phylogenetic Measures of Biodiversity Decay
We explored the use of 2 biodiversity metrics to quantify the decline in phylogenetic biodiversity with age in single-cell HSC phylogenies: HSC phylogeny imbalance (π) and the number of HSC lineages (n). Phylogeny imbalance (Colless 1982) is the sum of absolute differences in the sizes of the descendant clades for every internal node in the tree. Computationally, the absolute size difference between the 2 descendant clades of a node is calculated for every node, and these values are summed for all the internal nodes in the phylogeny to obtain the imbalance metric. Due to the CHIP events, the imbalance grows in the HSC phylogeny with age. However, π and n depend on the number of HSCs sampled, as shown in Fig. 5a and c, respectively. In these analyses, we randomly removed subsets of HSC lineages from the HSC phylogeny of a 38-yr-old healthy individual (KX002) at 10% intervals, resulting in a set of 10 phylogenies ranging from 100% to 10% of the total lineages sampled. The probability of a given lineage appearing in a phylogeny is called the "sampling fraction," which may be treated as a measure of data richness in the HSC phylogeny. We then calculated phylogenetic imbalance and lineage counts for each. The imbalance of HSC phylogenies (π) scaled linearly with data richness (R² = 0.98; Fig. 5a) as did the number of tips (n) present in the HSC phylogeny (R² = 1; Fig. 5c).
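The imbalance calculation described above can be sketched with a short recursive function. This is a minimal illustration that assumes a strictly bifurcating tree written as nested 2-tuples (any non-tuple value is a tip); this representation and the function name are assumptions made for the example, and polytomies would need to be resolved first.

```python
def colless(tree):
    """Return (number of tips, Colless imbalance) for a bifurcating tree.

    An internal node is a 2-tuple (left, right); anything else is a tip.
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, imbalance_left = colless(left)
    n_right, imbalance_right = colless(right)
    # Add this node's absolute clade-size difference to its subtrees' sums.
    total = imbalance_left + imbalance_right + abs(n_left - n_right)
    return n_left + n_right, total

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
```

For the same number of tips, the fully balanced tree has an imbalance of 0, whereas the ladder-like tree has the largest possible value, which is the direction in which clonal expansions push an HSC phylogeny.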
Therefore, we developed 2 novel metrics (α and β) to quantify the decline in phylogenetic biodiversity with age in single-cell HSC phylogenies. These metrics are designed to be robust to the number of HSCs sampled. The first metric (α) measures the standardized change in π over an individual's lifetime:
\alpha = \log_2\left(\frac{\pi}{\pi_a}\right) \quad (1)
The contemporary value of π is estimated using the whole HSC phylogeny. The ancestral imbalance (π_a) is obtained by cropping the phylogeny at a time point equivalent to the inflection point in its LTT plot, where the initial HSC lineage diversification ends at or near birth (see Materials and Methods). The α metric has a minimum value of 0 at birth because π and π_a will be equal. A base-2 logarithm is used because phylogenetic branching is typically a doubling process. The α estimates of HSC phylogenies were not significantly correlated with the number of HSCs sampled (R² = 0.16; P > 0.05), unlike the estimates of π (compare Fig. 5a and b).
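As a minimal sketch, α can be computed from the two imbalance values, assuming it is the base-2 logarithm of the ratio of the contemporary imbalance to the ancestral imbalance (which is 0 when the two are equal, as described above). The function name and the input values are hypothetical.

```python
import math

def alpha_metric(pi_contemporary, pi_ancestral):
    """Assumed form: alpha = log2(pi / pi_a)."""
    if pi_contemporary <= 0 or pi_ancestral <= 0:
        raise ValueError("imbalance values must be positive")
    return math.log2(pi_contemporary / pi_ancestral)

# Illustrative values only: no change, and a fourfold increase in imbalance.
alpha_at_birth = alpha_metric(8, 8)
alpha_later = alpha_metric(32, 8)
```

Because the metric depends only on a ratio, a change in sampling depth that scales both imbalance values similarly has little effect on α.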
The second metric (β), based on the number of HSC clonal lineages (n), is also intrinsically normalized. It is the logarithm of the ratio of the observed lineages (n) to the inferred ancestral lineages that are retained (n_a):
\beta = \log\left(\frac{n}{n_a}\right) \quad (2)
The β metric is 0 for young, healthy individuals who have not lost any ancestral HSC lineages. However, it increases over time as expanding subclones displace the ancestral HSCs, reducing overall diversity. We find no significant relationship between β and data richness (R² = 0.16; P > 0.05), unlike n (compare Fig. 5c and d).
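A corresponding sketch for β follows. The text specifies a logarithm of the ratio n/n_a but not its base, so the base-2 logarithm used here (chosen to parallel α) is an assumption, as are the function name and the input values.

```python
import math

def beta_metric(n_observed, n_ancestral_retained):
    """Assumed form: beta = log2(n / n_a); the base is an assumption."""
    if n_observed <= 0 or n_ancestral_retained <= 0:
        raise ValueError("lineage counts must be positive")
    return math.log2(n_observed / n_ancestral_retained)

# Illustrative values only.
beta_no_loss = beta_metric(100, 100)   # every sampled lineage is ancestral
beta_with_loss = beta_metric(8, 2)     # only 2 of 8 lineages are ancestral
```

The value grows as fewer of the sampled lineages trace back directly to the ancestral population, regardless of how many lineages were sampled in total.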

Relationship of α and β with Age-Related Increase in Blood Cancer
We estimated α and β metrics using HSC phylogenies of 8 healthy adults. Both exhibit an exponential relationship with age (Fig. 6). The diversity decay increases exponentially, remaining small until about 65 yr and increasing rapidly thereafter. These patterns were highly concordant with that observed for the cancer risk, with coefficients of determination between the MPN risk and the biodiversity metrics of R² = 0.89 (P < 0.01) for α and R² = 0.87 (P < 0.01) for β.

Estimating PhyloAge of HSC Phylogenies
The relationships of α and β estimates with the age of healthy adults and their cancer risk prompted us to develop a model to estimate what we call HSC phyloAge. In this case, both α and β were used as phylogenetic markers analogous to epigenetic markers of age (Hannum et al. 2013; Horvath 2013; Simpson and Chandra 2021). We optimized the values of parameters using a numerical Gauss-Newton algorithm, aiming to minimize the sum of squared differences between the observed and the predicted age (see Materials and Methods).
In leave-one-out (LOO) analyses, the phyloAge model predictions achieved a correlation coefficient of 0.94 with the chronological age. The absolute difference between phyloAge and chronological age, referred to as the residualAge, was 6.6 yr on average (Fig. 7). The predictions for younger individuals, where α and β do not change much over time, had higher residualAge (3.8 to 15.0 yr). The residualAge was <3.7 yr for individuals 65 yr and older. We also explored the use of Shannon's biodiversity index (Whittaker 1972; Mitchell et al. 2022) but found that its inclusion in the phyloAge model did not improve the accuracy of predictions.
The performance of the phyloAge model was comparable with that reported for epigenetic clocks in some recent studies. For example, GrimAge2 was trained using thousands of methylation profiles to predict chronological age and showed a correlation of 0.78 to 0.95 between true and predicted ages (Lu et al. 2019, 2022), whereas the correlation was 0.94 for phyloAges. DeepMAge, a deep learning method trained on 4,930 methylation profiles (Galkin et al. 2021), was reported to predict age with a median error of 2.8 yr, which is better than phyloAges for young people but similar for older people. These results suggest that phyloAge could yield results comparable with existing methods, with the possibility of improvement when single-cell HSC-sequencing data become available for additional healthy adults.

Estimates of PhyloAges for the HSC Phylogenies of MPN Patients
Next, we applied our metrics and methods to the HSC phylogenies of 12 MPN patients aged 23 to 83 yr (Williams et al. 2022). These data were acquired similarly to those of healthy individuals (Mitchell et al. 2022). The MPN phyloAge estimates were consistently older than the chronological ages (Fig. 8a). For example, a 49-yr-old MPN patient received a phyloAge of 94.4 yr, indicating that their HSC biodiversity had decayed by an additional 45.4 yr beyond their chronological age. This is evident upon comparing their HSC phylogeny (Fig. 8b) with that of the oldest healthy person (81 yr old; Fig. 4b). In fact, the MPN patient has lost all but 3 ancestral HSC lineages, which represents a greater loss in genetic diversity and a greater change in phylogeny shape than that of a healthy person almost twice their age (Fig. 4b). In general, young MPN patients' LTTs, and even cellular phylogenies, resemble those of much older healthy individuals. That is, MPN is associated with HSC diversity loss earlier in life due to the dominance of subclonal HSC lineages. Overall, residualAges discriminated between the healthy adults and MPN patients until the age of 80, with residualAges > 33 yr for MPN patients <80 yr old (Fig. 8a).
We sought to compare the performance of residualAge based on the phyloAge model for MPN patients with those reported for biological ages derived from methylation biomarkers, which have been widely used since their introduction (Horvath 2013). One study that used DNA methylation markers to estimate physiological ages in MPN patients specifically found an average residualAge of just 0.5 yr. However, the residualAge estimates based on DNA methylation vary extensively among cancer types. Weidner et al. (2014) reported small differences for aplastic anemia (average 11.7 yr) and dyskeratosis congenita (average 16.5 yr), whereas Zhu et al. (2019) reported much larger differences for many cancers (up to 50 yr). DeepMAge predicted an average residualAge of 1.7 yr for ovarian cancer. Therefore, residualAge differences can be large for some types of cancers, including MPN examined in this study.
Unlike residualAges of MPN phylogenies, the counts of cancer-associated driver variants did not differ significantly between the healthy individuals and those with MPN: the average number of drivers was similar in healthy individuals and MPN patients (4.5 and 3.9 per individual, respectively; P > 0.75). Therefore, while drivers are frequently implicated in causing CHIPs (Brown et al. 2019; Dietlein et al. 2020), their counts do not discriminate between healthy and diseased individuals. However, we found SBS9 mutational signatures in CHIP lineages of elderly MPN patients only (ages > 80 yr), the age group for which phyloAges showed limited discrimination. SBS9 is a key cancer-associated mutational signature characterized by T > G mutations and is induced by somatic hypermutation, which is often reported in lymphoid samples and myeloid cancer cells (Alexandrov et al. 2020; Degasperi et al. 2022). SBS9 was not present in any of the healthy individuals. Therefore, SBS9 may be useful as an additional biomarker for detecting the emergence of MPN in conjunction with phyloAge when the residualAge is small. However, more data are needed to test these suggestions.

Assessing Increased Cancer Risk Using PhyloAge
We found the residualAge, the difference between the HSC phyloAge and chronological age, to be naturally related to the fold increase in cancer risk due to excess aging, because the incidence of MPN increases exponentially with age (Fig. 1):
\frac{\mathrm{Risk}(\mathrm{phyloAge})}{\mathrm{Risk}(\mathrm{Age})} = e^{b \times (\mathrm{phyloAge} - \mathrm{Age})} \quad (3)
where b is the exponential rate of the age-incidence function (Fig. 1). This equation shows that the increased risk is a function of the difference between phyloAge and chronological age. It translates premature HSC aging into an estimate of increased cancer risk. Using this framework, it should be possible to develop similar equations for other types of cancers and diseases that are influenced by changes in the HSCs.
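The relationship between residualAge and fold increase in risk can be sketched from the exponential incidence function reported in the Fig. 1 caption. The coefficients 0.04 and 0.08 below are the rounded values from that caption, so this sketch should not be expected to reproduce the reported fold increases exactly; the function names are hypothetical.

```python
import math

BASELINE = 0.04  # incidence scale per 100K (rounded value from the Fig. 1 caption)
RATE = 0.08      # exponential rate per year (rounded value from the Fig. 1 caption)

def incidence_per_100k(age):
    """Exponential age-incidence function for MPN."""
    return BASELINE * math.exp(RATE * age)

def fold_increase_in_risk(phylo_age, chronological_age):
    """Ratio of incidence at phyloAge to incidence at chronological age.

    The baseline cancels, leaving exp(RATE * residualAge).
    """
    return math.exp(RATE * (phylo_age - chronological_age))

no_excess_aging = fold_increase_in_risk(50.0, 50.0)
ten_years_excess = fold_increase_in_risk(60.0, 50.0)
```

Because the baseline cancels in the ratio, only the difference between the two ages matters, which is why the same residualAge implies the same fold increase at any chronological age.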
Applying the above equation to the HSC phylogeny in MPN patients predicts that a 23-yr-old person, whose phyloAge is 64.76 yr, will have a 34-fold increase in their probability of developing MPN. Similarly, individuals aged 40 to 65 exhibited a 27-fold increase in risk, while those aged 65 to 81 showed a more than 2-fold increase in risk. Based on these results, phyloAge shows promise as a tool for forecasting MPN risk based solely on the phenomenological metrics of phylogenetic diversity decay.

Conclusion
HSCs give rise to myeloid cells that differentiate into red blood cells, platelets, neutrophils, basophils, monocytes, eosinophils, and lymphoid progenitors that include T and B lymphocytes and natural killer cells (Ogawa 1993; Mikkola and Orkin 2006). CHIP appears to arise from the acquisition of somatic mutations in HSC genomes that permit and/or drive clonal expansion over time to produce an age-dependent increase in blood and immune cell mosaicism (Nachun et al. 2021; Ahmad et al. 2023; Goldman et al. 2023; Singh and Singh 2023). Consequently, CHIP is associated with an increased risk of hematological malignancies and cardiovascular and pulmonary diseases (Jaiswal et al. 2017; Wong et al. 2023).
We have presented 2 novel phylogeny shape metrics (α and β), based on traditional ecological biodiversity measures, to capture the decay of diversity of HSC phylogenies reconstructed using SNAs. These measures increase exponentially with age and are concordant with the age-related increases in cancer incidence, fulfilling the need for quantitative phenomenological descriptions of the time-dependent structure of HSC phylogenies. We have also presented a method to estimate HSC phyloAge using α and β. In MPN patients, the estimated phyloAge significantly exceeds the chronological age (by 59% on average and by up to 182% in extreme cases).
This age difference was found to distinguish between healthy and diseased individuals more effectively than other quantitative descriptions utilizing DNA methylation (Hannum et al. 2013; Horvath 2013; Weidner et al. 2014; Bell et al. 2019; Lu et al. 2019, 2022; Galkin et al. 2021; Seale et al. 2022; Dabrowski et al. 2023). HSC phyloAge was more effective at predicting physiological ages and discerning between healthy people and those with blood cancer than the GD or total number of SNAs accumulated. Further, while few of the somatic SNAs an individual acquires in their lifetime are expected to be cancer-associated driver mutations (Bailey et al. 2018; Brown et al. 2019; Nussinov et al. 2019), even restricting our analysis to these drivers was not sufficient to discern between healthy people and those with cancer, as the counts of drivers in HSC genomes of individuals suffering from MPN were not sufficiently different from those in healthy individuals (Mitchell et al. 2022; Williams et al. 2022). Interestingly, the difference between phyloAge and chronological age forecasts a fold increase in cancer risk. This could find application in clinical and research settings to track the temporal rate of
As the field of biological aging research continues to evolve, we anticipate the development of new and more useful approaches, such as chronic sterile inflammation (Franceschi et al. 2018), glycomics (Borelli 2014; Krištić et al. 2014), and lipidomics (Beyene et al. 2020; Slade et al. 2021). These tools are already showing a promising ability to identify many serious diseases (Horvath and Ritz 2015; Ding and Rexrode 2020; Miyoshi et al. 2020; Wang et al. 2020; Gaunitz et al. 2021). There are even many new tools being developed to give consumers access to their blood health data (Csordas et al. 2022), though these have not yet been able to bridge the gap between physiological age and disease risk.
Regarding phyloAge, we anticipate that advancements in single-cell sequencing technology and the availability of richer cancer incidence data with age will enable more accurate HSC phylogeny-based evolutionary modeling for age and risk assessment. With an increase in single-cell sequencing data from younger healthy individuals, where changes in genomic diversity are more subtle, we expect to understand the natural trajectory of HSC evolution better. These analyses and predictive models will also help characterize aging and blood cancers.

Data Acquisition
HSC sequences and phylogenies for 10 healthy people (age range infant to 81) were retrieved from Mitchell et al. (2022) and for 12 individuals with MPN (age range 20 to 83) from Williams et al. (2022). An alignment was not available for one healthy individual (KX007), but the phylogeny was available and was used. HSC sequences were composed of genomic positions with SNAs. For MPN patients, the first sampling event for each individual was considered in order to eliminate bias stemming from cancer treatments. Two healthy infants were excluded from rate estimation and model fitting because they were still experiencing rapid HSC diversification among founder lineages and were thus unrepresentative of the stable adult phase of HSC growth that we aimed to model. The incidence of MPN by age was obtained from Hultcrantz et al. (2020).

Genetic Distance Estimation
We produced counts of sequence differences using MEGA to calculate pairwise GDs between HSC sequences (Tamura et al. 2021). In all these comparisons, positions with missing data were ignored (pairwise deletion option). This number was divided by 2 to generate per-sequence GD estimates. To determine the peak of the GD distribution, we generated a histogram with 300 GD bins and then identified the bin containing the highest point in the resulting GD distribution. We also counted the number of SNAs in an HSC sequence by comparing it with the respective germline genomes.
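The steps above (pairwise differences with pairwise deletion, halving, and locating the modal histogram bin) can be sketched in plain Python. This is not the MEGA implementation; the set of missing-data symbols, the function names, and the choice to report the bin center are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

MISSING = {"N", "?", "-"}  # assumed symbols for missing data

def per_lineage_gd(seq_a, seq_b):
    """Count differing sites, skipping sites missing in either sequence,
    and divide by 2 to express the distance per lineage."""
    differences = sum(
        1 for a, b in zip(seq_a, seq_b)
        if a not in MISSING and b not in MISSING and a != b
    )
    return differences / 2

def gd_peak(sequences, n_bins=300):
    """Center of the most populated histogram bin of all pairwise GDs."""
    gds = [per_lineage_gd(a, b) for a, b in combinations(sequences, 2)]
    low, high = min(gds), max(gds)
    if high == low:
        return low
    width = (high - low) / n_bins
    # The largest value is placed in the last bin rather than a new one.
    counts = Counter(min(int((g - low) / width), n_bins - 1) for g in gds)
    top_bin = max(counts, key=counts.get)
    return low + (top_bin + 0.5) * width

# Toy sequences: one founder and four single-site variants.
toy = ["AAAA", "CAAA", "ACAA", "AACA", "AAAC"]
peak = gd_peak(toy)
```

In the toy input, most pairs differ at two sites, so the modal bin sits near a per-lineage GD of 1.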

Constant Versus Relaxed HSC Molecular Clocks
We used a likelihood ratio test (Wilks 1938; Glover and Dixon 2004) to compare fits between linear and polynomial regression models for GDs and SNAs accumulated in the HSCs of 7 adults. The polynomial models fit the data significantly better (P < 0.01) for both GDs and SNAs.
To calculate phylogeny imbalance (Colless 1982), we used apTreeshape (Bortolussi et al. 2006) in R (R Core Development Team 2020). We plotted an LTT plot for each phylogeny using the R package ape (Paradis and Schliep 2019) and identified the point of inflection where lineage diversification ceased, corresponding to the end of the embryonic phase of HSC diversification (Lee-Six et al. 2018; Mitchell et al. 2022). Because our metrics assess the number and branching patterns of phylogenetic tips, an ultrametric HSC phylogeny was only required to make the initial crop of the LTT. No estimates of an absolute time scale were necessary, which avoids biases associated with molecular dating approaches that may not work well when the evolutionary rates converge throughout the tree.
We generated the ancestral (embryonic) tree by cropping the HSC phylogeny at this inflection point and obtained the ancestral estimates π_a and n_a. For some HSC phylogenies, apTreeshape failed to provide any estimates because the embryonic phylogeny contained fewer than 4 ancestral HSC lineages. In this case, we assumed log(π_a) = 0 such that imbalance was the smallest. This approach will work even when the HSC tree contains only the subclonal lineages because they will be distinguishable due to a long stem branch connecting them to the germline reference.
For individuals who were sampled multiple times in the progression of their cancer, we take only the first sampling event as their adult phylogeny condition. This avoids the confounding effects of the different treatments these individuals underwent to eliminate cancerous HSCs in their blood, necessarily impacting their HSC phylogenies.

Modeling HSC Phylogeny Age
Both α and β increase exponentially with age (Fig. 6). Therefore, we fitted nonlinear models for estimating age using α and β separately. The model parameters were optimized using a numerical Gauss-Newton algorithm, aiming to minimize the sum of squared differences between the observed and predicted age of healthy individuals. Next, we combined these 2 models using a meta-regression framework (Viechtbauer 2010) with a maximum likelihood approach (Hardy and Thompson 1996). This process predicted the age and standard error for each healthy individual by utilizing random-effects meta-analysis in the metafor package in R (Viechtbauer 2010). We validated the combined model, predicted the age, and estimated the prediction interval for each healthy individual using the LOO approach, in which individual ages were estimated using a model that excluded the individual of interest.
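The LOO validation can be sketched for a single metric as follows. The functional form of the study's model is not given in this text, so the sketch assumes that a metric grows exponentially with age, which makes age linear in the logarithm of the metric; under that assumption, closed-form least squares is used here in place of Gauss-Newton, and the meta-regression step that combines α and β is not reproduced. All names and values are hypothetical.

```python
import math

def fit_log_linear(metric, age):
    """Least-squares fit of the assumed model age = c + d * ln(metric)."""
    z = [math.log(m) for m in metric]
    n = len(z)
    z_mean = sum(z) / n
    age_mean = sum(age) / n
    d = (sum((zi - z_mean) * (ai - age_mean) for zi, ai in zip(z, age))
         / sum((zi - z_mean) ** 2 for zi in z))
    c = age_mean - d * z_mean
    return c, d

def leave_one_out(metric, age):
    """Predict each individual's age from a model fitted without them."""
    predictions = []
    for i in range(len(age)):
        c, d = fit_log_linear(metric[:i] + metric[i + 1:],
                              age[:i] + age[i + 1:])
        predictions.append(c + d * math.log(metric[i]))
    return predictions

# Synthetic values for illustration only (not data from the study).
ages = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
metric = [math.exp(0.05 * a) for a in ages]
predicted = leave_one_out(metric, ages)
mean_residual = sum(abs(p - a) for p, a in zip(predicted, ages)) / len(ages)
```

The mean absolute difference between the predicted and true ages plays the role of the average residualAge for the held-out individuals.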

Mutational Signature Analysis
We annotated a clade of many cells separated from the other cells with a long branch in a phylogeny. We identified singletons in cell genomes from the selected clade and inferred mutational signatures using Signal (Degasperi et al. 2020).

Fig. 1. The incidence of blood cancer by age per 100K individuals. The incidence of MPN increases exponentially with age according to the following function: incidence (per 100K) = 0.04 × e^(0.08 × Age). Rates of incidence were obtained from Hultcrantz et al. (2020). This pattern mirrors the known incidence rates of other blood cancers such as leukemia and myeloma (Cancer Research UK 2016 to 2018, International Classification of Diseases [ICD] codes ICD-10 C91 to C95 and 2016 to 2018, ICD-10 C90, respectively).

Fig. 2. Distributions of GDs between HSCs. Distributions are shown for a) an infant, b) a 38-yr-old, c) a 63-yr-old, and d) an 81-yr-old. Subclones arising later in life are marked by an arrow; they form a tail or a secondary distribution to the left of the primary peak, which corresponds to the HSCs that arose during embryogenesis.

Fig. 3. Accumulation of sequence variation over time in HSCs. Relationship of a) GDs and b) SNA counts with age for 7 healthy adult individuals. These relationships are curvilinear, as a second-degree polynomial fits the data better than a linear regression in both cases (P < 0.01). c) Tempos of SNA and GD increases do not explain the much more rapid increase in the age-related incidence of MPN.

Fig. 4. The HSC phylogenies of 2 healthy individuals and changes in their phylogenetic biodiversity. a) The HSC phylogeny of a 38-yr-old healthy individual, which exhibits <10 HSC lineage divergences after the embryonic phase. b) The HSC phylogeny of an 81-yr-old healthy individual, which has many large clades containing subclonal HSCs. Only one-third of the tips are direct descendants of the embryonic HSCs. c) The LTT plot from the 38-yr-old exhibits an initial rapid diversification in HSCs, followed by a period of growth, with minimal increase in total lineages thereafter. d) The LTT plot from an 81-yr-old healthy individual exhibits an initial rapid diversification in HSCs, a period of growth, and then a second period of increase later in life.

Fig. 5. Normalized metrics account for lineage sampling. a) The imbalance of adult HSC phylogenies (π) increases linearly with data richness (R² = 0.98). b) The standardized phylogeny imbalance (α) shows a low correlation with data richness (R² = 0.16; P > 0.05). c) The number of tips present in the adult HSC phylogeny (n) scales directly with the sampling fraction (R² = 1). d) The standardized tip count (β) is also not highly correlated with data richness (R² = 0.16; P > 0.05).

Fig. 7. The relationship of the estimated phyloAge with the chronological age of healthy adults. We estimated phyloAges using the composite model incorporating α and β, with 95% confidence intervals derived from the LOO analysis.

Fig. 8. Application of the phyloAge model to the phylogenies of MPN patients. a) PhyloAges of individuals with MPN are substantially greater than their true chronological ages. A 1:1 relationship (dashed line) is shown for comparison. b) The HSC phylogeny of a 49-yr-old individual with MPN. Only 3 of 92 HSCs trace their direct origin back to the root, and one subclonal lineage gives rise to a vast majority of HSCs present at age 49, which displaced almost all the primary HSC lineages. c) The logged LTT plot from a 49-yr-old individual with MPN exhibits an initial rapid diversification in HSCs, a period of growth, and then a second period of increase later in life, much like the older healthy person, but much sooner.