A Polygenic Risk Score to Predict Future Adult Short Stature Among Children

Abstract
Context: Adult height is highly heritable, yet no genetic predictor has demonstrated clinical utility compared to mid-parental height.
Objective: To develop a polygenic risk score for adult height and evaluate its clinical utility.
Design: A polygenic risk score was constructed based on meta-analysis of genomewide association studies and evaluated on the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort.
Subjects: Participants included 442 599 genotyped White British individuals in the UK Biobank and 941 genotyped child-parent trios of European ancestry in the ALSPAC cohort.
Interventions: None.
Main Outcome Measures: Standing height was measured using a stadiometer; standing height 2 SDs below the sex-specific population average was considered short stature.
Results: Combined with sex, a polygenic risk score captured 71.1% of the total variance in adult height in the UK Biobank. In the ALSPAC cohort, the polygenic risk score was able to identify children who developed short stature in adulthood with an area under the receiver operating characteristic curve (AUROC) of 0.84, which is close to that of mid-parental height. Combining this polygenic risk score with mid-parental height, or with the height of only one of the child's parents, could improve the AUROC to at most 0.90. The polygenic risk score could also substitute for mid-parental height in age-specific Khamis-Roche height predictors and achieve equally strong discriminative power in identifying children with short stature in adulthood.
Conclusions: A polygenic risk score could be considered as an alternative or adjunct to mid-parental height to improve screening for children at risk of developing short stature in adulthood in European ancestry populations.

Directly assessing the presence of alleles that influence height has become practical and relatively low cost through the use of genomewide genotyping. With large genotyped cohorts, such as the UK Biobank (14), polygenic risk scores for complex traits of high heritability and polygenicity have been developed and have achieved high predictive and discriminative power (15)(16)(17)(18). Although it has been shown that at least 3290 near-independent genomewide significant single nucleotide polymorphisms (SNPs) are associated with adult height (19), a recently constructed polygenic risk score has been able to explain about 40% of the total variance in sex-adjusted height z-scores (20).
In this study, we sought to develop and optimize a polygenic risk score for adult height using state-of-the-art methods based on 607 346 individuals of European ancestry from the UK Biobank and the Genetic Investigation of Anthropometric Traits (GIANT) study (21). We then evaluated how well such a polygenic risk score performed compared to mid-parental height in its ability to predict adult height of 941 children from the Avon Longitudinal Study of Parents and Children (ALSPAC) (22,23). Last, we tested the ability of the polygenic risk score to identify children who would later have short stature in adulthood.

Study Cohorts
The UK Biobank study enrolled more than 500 000 participants, aged between 40 and 69 years, between 2006 and 2010 at multiple recruitment centers in the United Kingdom (14). Demographic and anthropometric measurements were collected upon recruitment (Table 1). Shoeless standing height was measured using a Seca 202 mechanical telescopic height measuring rod. Participants of the UK Biobank were genotyped using the Applied Biosystems™ UK BiLEVE Axiom™ Array or UK Biobank Axiom™ Array (14). The genotypes were imputed to the Haplotype Reference Consortium reference panel (24). A total of 25 321 604 SNPs with a minor allele frequency >0.01% and an imputation information score >0.8 were retained. Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382). The UK Biobank ethics statement is available at https://www.ukbiobank.ac.uk/the-ethics-and-governance-council/. All UK Biobank participants provided informed consent at recruitment.
The ALSPAC cohort initially recruited 14 541 pregnant women who had an expected delivery date between April 1991 and December 1992 in the Bristol and Avon areas in the United Kingdom. Details of this prospective cohort have been described previously (22,23). Children in the ALSPAC cohort were genotyped using the Illumina HumanHap550 quad genotyping platforms, and the genotypes were imputed to the 1000 Genomes Phase 3 reference panel (25). A total of 8932 samples were available for children of European ancestry. Shoeless adult standing height (at age 24 (26)) of individuals who were originally the children in the cohort, as well as shoeless standing height of both biological parents (available for 1532 children), was measured during clinical visits using a Harpenden stadiometer (Holtain Ltd). Among 3078 genotyped children who had measured adult height, 941 had height measured for both biological parents (Table 1), a further 1305 only had measured maternal height, and 151 only had measured paternal height. Shoeless height and weight measures of children during puberty (between ages 8 and 17) were collected from a "Growing and Changing" questionnaire, which was distributed to participants recurrently between September 1999 and February 2010 (27). These questionnaires were completed by either a parent or a child. The children were asked to stand barefoot and straight against a wall, to mark the highest point of the head on the wall, and to measure the distance from the mark to the floor. No specific instruction was provided for measuring weight. The study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/out-data/).
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Consent for biological samples has been collected in accordance with the Human Tissue Act (2004). Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time.

Development of a Polygenic Risk Score
Details of polygenic risk score construction and initial evaluation are provided in supplementary notes in (28) and summarized in Figure 1. Briefly, we first performed a genomewide association study for height in the UK Biobank, based on 354 058 individuals of White British ancestry. Next, we meta-analyzed the UK Biobank-based genomewide association study summary statistics with those provided by the GIANT study (21), which included 253 288 individuals of European ancestry. We then leveraged a set of state-of-the-art approaches, including linkage disequilibrium-pruning and P-value thresholding (29), LDpred (16), and least absolute shrinkage and selection operator (30) regression, to develop a series of new candidate polygenic risk scores based on a random training-model selection-test split in the UK Biobank (Table 1; supplementary notes in (28)). We found that a polygenic risk score generated by least absolute shrinkage and selection operator regression that included 33 938 SNPs demonstrated the best predictive power in the UK Biobank model selection set; thus, we adopted this optimized polygenic risk score for all downstream assessments (supplementary notes in (28)).
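The model selection step described above can be illustrated with a minimal sketch. It assumes that each candidate score has already been computed for every individual in the model selection set, and it uses the squared Pearson correlation as a stand-in for the proportion of variance explained. The function names, candidate labels, and toy values are hypothetical and are not taken from the study.

```python
from statistics import mean


def r_squared(predicted, observed):
    """Squared Pearson correlation between a score and observed height."""
    mp, mo = mean(predicted), mean(observed)
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov * cov / (var_p * var_o)


def select_best_score(candidates, observed_height):
    """Return (name, R^2) of the candidate with the highest R^2."""
    scored = {name: r_squared(values, observed_height)
              for name, values in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical toy data: heights (cm) and three candidate scores
# for five individuals in a model selection set.
observed = [160.0, 165.0, 170.0, 175.0, 180.0]
candidates = {
    "pruning_thresholding": [0.1, -0.2, 0.3, 0.2, 0.4],
    "ldpred": [-0.5, -0.1, 0.0, 0.4, 0.3],
    "lasso": [-1.0, -0.4, 0.1, 0.4, 1.1],
}
best_name, best_r2 = select_best_score(candidates, observed)
print(best_name, round(best_r2, 3))
```

Keeping a separate test set, as in the three-way split described above, means the performance of the selected score can be reported on individuals that played no part in choosing it.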

Comparison of Polygenic Risk Score and Mid-Parental Height
We compared the predictive performance of the polygenic risk score and of mid-parental height in predicting children's adult height, leveraging the ALSPAC cohort. Mid-parental height was calculated as described in (31). An adapted mid-parental height predictor was also considered; it was derived from a Swedish population-based study including 2402 children by fitting sex-specific linear regression models (32). Polygenic risk scores were calculated for each of the 3078 genotyped children with measured adult height. SNPs that were included in the UK Biobank-based score but were not present in the ALSPAC data were discarded.
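As a hedged illustration of the scoring step, a polygenic risk score is commonly computed as a weighted sum of effect-allele dosages, restricted to the SNPs available in the target data. The sketch below assumes this additive form; the SNP identifiers, weights, and dosages are hypothetical, and the study's actual pipeline may differ.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages over shared SNPs.

    weights: dict mapping SNP id -> per-allele weight from the score
    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2)
             for one individual in the target cohort
    Returns (score, number of SNPs used). SNPs in the score that are
    absent from the target data are discarded.
    """
    shared = weights.keys() & dosages.keys()
    score = sum(weights[snp] * dosages[snp] for snp in shared)
    return score, len(shared)


# Hypothetical weights and one hypothetical child's dosages.
weights = {"rs0001": 0.02, "rs0002": -0.01, "rs0003": 0.05}
child = {"rs0001": 2.0, "rs0002": 1.0, "rs0004": 0.0}

score, n_used = polygenic_score(weights, child)
print(score, n_used)
```

Returning the number of SNPs actually used makes it easy to check how much of the score was lost to missing variants in the target cohort.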
We evaluated the predictive performance of these 2 height predictors by proportion of variance explained and the root mean square error (RMSE). We defined those with relative short stature as being among the 2.3% shortest (corresponding to 2 SDs below the mean in the ALSPAC cohort) in females and males, separately. We performed logistic regression and compared predicted to measured height again using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). We also tested whether linearly combining the polygenic risk score with mid-parental height or with maternal or paternal height only could significantly improve prediction accuracy by likelihood ratio test of linear regression models and by comparing binary prediction metrics. All assessments included sex-specific analyses.
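The evaluation metrics above can be sketched as follows. This is a minimal illustration rather than the study's code: the AUROC is computed through its rank interpretation (the probability that a randomly chosen case is ranked above a randomly chosen non-case), which for a single predictor gives the same value as ranking by fitted logistic regression probabilities. All function names and toy values are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def short_stature_cutoff(heights, n_sd=2.0):
    """Height cutoff n_sd SDs below the mean of one sex-specific group."""
    return mean(heights) - n_sd * stdev(heights)


def rmse(predicted, observed):
    """Root mean square error between predicted and measured height."""
    return math.sqrt(mean([(p - o) ** 2 for p, o in zip(predicted, observed)]))


def auroc(risk, is_case):
    """Probability that a random case outranks a random non-case.

    Ties contribute 0.5. Higher risk values should indicate cases.
    """
    cases = [r for r, c in zip(risk, is_case) if c]
    controls = [r for r, c in zip(risk, is_case) if not c]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Under normality, 2 SDs below the mean corresponds to about 2.3%.
tail = normal_cdf(-2.0)

# Hypothetical toy data: predicted adult heights (cm) and case status.
predicted = [150.0, 155.0, 160.0, 165.0, 170.0, 158.0]
short = [True, False, True, False, False, False]
# Lower predicted height means higher risk, so negate the prediction.
auc = auroc([-p for p in predicted], short)
print(round(tail, 4), auc)
```

Because short stature is rare by definition, the AUPRC mentioned above is a useful complement to the AUROC; it is omitted here to keep the sketch short.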

Use of Polygenic Risk Score in Khamis-Roche Predictor
In addition, we calculated Khamis-Roche adult height predictors (13), which incorporate a child's age, current height, current weight, and mid-parental height, for children with self-reported height and weight during puberty. We investigated whether the mid-parental height used in this predictor could be replaced by the genetically predicted height by the polygenic risk score and whether this substitution affected predictive performance.
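The substitution can be sketched as below, assuming a Khamis-Roche-style predictor that is linear in current height, current weight, and a parental term, with age- and sex-specific coefficients. The coefficient values and units here are placeholders chosen for illustration; they are not the published Khamis-Roche coefficients, and whether the coefficients were refitted after substitution is not specified in this section.

```python
def khamis_roche_style_prediction(coef, height_cm, weight_kg, parental_term_cm):
    """Linear adult height prediction with a swappable parental term.

    coef: dict with keys "intercept", "height", "weight", "parental",
          specific to one age and sex (hypothetical values below).
    parental_term_cm: either mid-parental height or the adult height
          genetically predicted from the polygenic risk score.
    """
    return (coef["intercept"]
            + coef["height"] * height_cm
            + coef["weight"] * weight_kg
            + coef["parental"] * parental_term_cm)


# Placeholder coefficients for one hypothetical age and sex group.
coef = {"intercept": 20.0, "height": 0.7, "weight": -0.1, "parental": 0.35}

# One hypothetical child: 140 cm, 35 kg.
with_mid_parental = khamis_roche_style_prediction(coef, 140.0, 35.0, 170.0)
with_polygenic = khamis_roche_style_prediction(coef, 140.0, 35.0, 172.0)
print(with_mid_parental, with_polygenic)
```

Passing the parental term as a plain argument keeps the rest of the predictor unchanged, which mirrors the idea of replacing only the mid-parental height input.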
Possibly due to population specificity, compared to the ordinary mid-parental height, the adapted mid-parental height predictor (see previous discussion on methods) derived from a Swedish population did not demonstrate superior predictive power [adjusted R² = 69.3% (95% CI: 66.1%-72.6%)] (Supplementary Figure 3 in (28)). We therefore did not include this adapted mid-parental height predictor in further analyses.

Polygenic Risk Score Could Substitute Mid-Parental Height in Khamis-Roche Approach for Improved Prediction
Khamis-Roche adult height predictors were derivable for varying sample sizes at 13 time points (Fig. 3). At ages 8, 8.5, 9.5, and 10.5, we tested the performance of substituting mid-parental height with the polygenic risk score in the Khamis-Roche approach. We found that this led to a marginally higher proportion of variance explained in adult height as well as a slightly lower prediction RMSE (Fig. 3A and 3B). At ages ≥11 years, the 2 Khamis-Roche predictors displayed no distinguishable difference. These findings were consistent in both females and males (Supplementary Figure 6 in (28)). Although the analysis was underpowered because of the limited number of short stature cases, the polygenic risk score-based Khamis-Roche predictor and the mid-parental height-based Khamis-Roche predictor had highly consistent discriminative power in identifying children who would have short stature in adulthood (Supplementary Figure 7 in (28)).

Discussion
Accurate prediction of adult height is important for the management of short stature in childhood. As well as selecting which children may require growth hormone therapy for idiopathic short stature (1)(2)(3), it may also help to inform which children should be further investigated for chronic health conditions affecting growth, such as growth hormone deficiency or genetic syndromes (35).
Given the exceptionally high heritability of height, there have been long-standing interest in and efforts toward genetically predicting adult height. Yet, the most commonly used genetic predictor has been the mid-parental height, which simply averages the height of a child's biological parents and adjusts for the sex of the child. In this study, we constructed efficient polygenic risk scores for adult height, leveraging >600 000 individuals of European ancestry. Using an external cohort of 941 samples, we demonstrated that, for the first time, a polygenic risk score was able to perform as well as the mid-parental height in predicting children's adult height and in identifying children who would have adult short stature. Further, combining mid-parental height and the polygenic risk score provided better prediction accuracy than either metric alone.
The polygenic risk score might provide clinical utility. For example, parental height may not be available for various reasons (eg, many children do not know both parents' height), and in many more cases it is inaccurately measured or reported, and thus a potentially erroneous estimate might be supplied. Also, large discrepancies in parental heights render mid-parental height a less valid adult height predictor, particularly when some parents do not achieve their genetically determined height because of disease that is not inherited by the child. Under these circumstances, a polygenic risk score may be utilized instead, not only as a single predictor but also in other accurate multivariate predictive models, such as the Khamis-Roche method (13). Even though the self-reported height and weight measures during puberty in the ALSPAC cohort may contain measurement errors, the superior performance of Khamis-Roche predictors over the mid-parental height or the polygenic risk score alone should encourage further investigations into incorporating an accurate genetic predictor into established scoring schemes. We further demonstrated that even if only 1 parent could provide measured height, a combined predictor with the polygenic risk score could have substantially improved predictive and discriminative power over using the single parental height. We posit that these advantages could be considered when undertaking genetic prediction of adult height in both clinical and research settings. Since genomewide genotyping has become more affordable (currently priced at approximately US$40 in a research context), is undertaken once in a lifetime, and could be used to predict several health outcomes, genetic prediction of adult height in children and adolescents could be widely and cost-effectively applied in research and clinical practice.
Combining the polygenic risk score and the mid-parental height could enhance predictive performance. Intuitively, the mid-parental height, while being a crude estimate of the genetic contribution, may additionally capture variance due to unmeasured environmental exposures shared by the parents and the children, particularly when families in a target population live in the same geographical area. This may partially explain the decrease of accuracy in mid-parental height prediction when a large discrepancy exists in the parental height of a child. In contrast, our polygenic risk score more accurately quantifies the genetic contribution toward height without measuring environmental exposures. Hence, the 2 predictors provide non-overlapping information. However, how proper calibration should be carried out and whether a joint predictor is generalizable will require extensive exploration in future studies.
Our study has important limitations. First, our findings should be considered population-specific. Polygenic risk scores are usually not directly transferable to a population of a different ethnic background (36)(37)(38)(39). Population-specific polygenic risk scores should be developed to further explore their utility for height prediction, especially among non-European populations, once again underlining the importance of larger genetic studies of non-European ancestry, which are currently scant. It should also be noted that the adult height of children in ALSPAC was, on average, 3 to 4 cm taller than that in the UK Biobank, because the former represents a much younger population. We therefore adopted the definition of short stature as 2 SDs below the population mean, instead of referring to a general population-based growth curve. Moreover, despite a high heritability, adult height is also influenced by a wide range of genetic and nongenetic factors not included in the polygenic risk score, including but not limited to chronic conditions retarding growth (eg, genetic syndromes, such as Turner (40)(41)(42) and Prader-Willi syndromes (43)(44)(45), treatment for childhood cancer (46,47), acquired growth hormone deficiency secondary to head trauma (35), etc.), rare pathogenic variants of large effects in pivotal genes (48)(49)(50), and long-term medication (eg, use of inhaled corticosteroids (51)(52)(53)). Whether any interplay exists between the polygenic risk score and these risk factors may be tested with larger sample sizes and by testing the performance of this polygenic risk score in pediatric populations at risk for compromised growth.
Lastly, although not measured in our study cohorts, predictions based on current height and bone age remain powerful indicators of final adult height. In fact, in a few cases where bone age-based predictors were evaluated, the proportion of variance explained by a combination of current height, chronological age, and bone age exceeded 60% at age 3 in both boys and girls (54) and continued increasing toward >99% by age 17, while the mid-parental height or the polygenic risk score alone could only account for approximately 40% of the total variance in sex-specific analyses. Nevertheless, it has been suggested that bone age-based predictors (eg, the Tanner-Whitehouse predictors) might also benefit from adding a genetic predictor, particularly for children with an extremely short or tall stature (54). Notably, a genetically predicted height is not sensitive to pubertal patterns (early or late puberty), which typically affect bone maturation, and this predictor could be computed as early as at birth, whereas adult height predictions based on bone age are not precise before the age of 3 years. For all the previously discussed reasons and given the decreasing costs and increased accessibility of genomewide genotyping, we anticipate that a genetically predicted height may play a larger role in informing clinical decisions in the future.

Conclusions
In summary, we generated a polygenic risk score and demonstrated that it could successfully predict adult height in a population of healthy children. Our findings support the use of this genetic predictor in certain settings as an alternative to or in combination with traditional adult height predictors, such as mid-parental height, and possibly also with bone age-based methods to enhance screening for children at risk for short stature in adulthood.

Acknowledgments
This research has been conducted using the UK Biobank resource under Application Number 27449 and the ALSPAC resource under Application Number B3359. We thank all the families who took part in the ALSPAC study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses. This study was enabled in part by support provided by Calcul Québec and Compute Canada. and the University of Bristol provide core support for ALSPAC. A comprehensive list of grants funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). The funding organizations had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Author Contributions. TL conceptualized and designed the study, led acquisition of data, led analysis and interpretation of data, and drafted the manuscript. VF, JRBP, KKO, and NJT acquired data, interpreted results, and revised the manuscript critically for important intellectual content. HW performed analysis and interpreted results and revised the manuscript critically for important intellectual content. CMTG conceptualized and designed the study, interpreted results, and revised the manuscript critically for important intellectual content. DM and JBR conceptualized and designed the study, acquired data, interpreted results, and revised the manuscript critically for important intellectual content. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.