Genome-wide association study of corneal biomechanical properties identifies over 200 loci providing insight into the genetic etiology of ocular diseases

Abstract Corneal hysteresis and corneal resistance factor are parameters that reflect the dynamic biomechanical properties of the cornea and have been shown to be biomarkers of corneal disease. In this genome-wide association study of over 100 000 participants, we identified over 200 genetic loci, all but eight novel, significantly associated with either one or both of these traits. In addition to providing key insights into the genetic architecture underlying normal corneal function, these results identify many candidate loci in the study of corneal diseases that lead to severe visual impairment. Additionally, using Mendelian randomization, we were able to identify causal relationships between corneal biomechanics and intraocular pressure measurements, which help elucidate the relationship between corneal properties and glaucoma.


Introduction
Corneal hysteresis (CH) and corneal resistance factor (CRF) are quantitative measures of corneal biomechanics. Both phenotypes are measured using the ocular response analyzer (ORA), a non-contact tonometer (1) that differs from conventional noncontact tonometry as it records both the inward and outward applanation pressures, from which CH and CRF measures are CH and CRF have been implicated in several conditions of the eye. Previous work (4)(5)(6) has suggested that CH and CRF may be predictors of glaucoma, independent of elevated IOP. CH and CRF are also correlated, both phenotypically and genotypically with CCT and keratoconus (7). CH and CRF are also lower in patients with Fuchs endothelial corneal dystrophy (FECD) (8).
Genetic effects play a large role in CH and CRF variation. CH is highly heritable, with a reported narrow sense heritability of h 2 = 0.77 (9). Additionally, several loci associated with both CH and CRF have been identified in a previous study (7).
This study aimed to explore the genetic architecture of CH and CRF and clarify their genetic correlation with eye and systemic disease. Participants from the UK Biobank were used as a discovery sample in this study. The UK Biobank is a populationbased cohort with 503 325 participants between 40 and 69 years of age (10).

Results
There were 106 041 and 106 030 participants with a 97.7% sample overlap (103 591 participants) included for CH and CRF analyses, respectively, following quality control (see Materials and Methods). The mean age of participants was 57.3 (SD = 7.87), and 53.2% of participants were female. A summary of the participants measured phenotypes, stratified by age group, is provided in Table 1 and the phenotypic distributions stratified by sex are displayed in Supplementary Material, Figure S1a and b. CH distributions and phenotypic associations within the UK Biobank have also been previously described elsewhere (11).
Over 12 million SNPs (see Materials and Methods) were tested for association separately for CH and CRF. The genomic inflation factors (13) were λ CH = 1.31 and λ CRF = 1.43, and the respective LD score regression intercepts (and ratios) were 1.06 (0.10) and 1.08 (0.11), consistent with expectations of polygenicity and large sample sizes (14) indicating thet results are unlikely to be inflated due to population structure. Genome-wide significant association (P < 5 × 10 −08 ) with CH and CRF was observed for 13 945 and 20 020 variants, respectively (Miami plot of results in Fig. 2), clustered within 157 and 181 distinct genomic regions, respectively (Supplementary Material, Tables S1 and S2), including eight regions previously reported for CH and CRF (7); three of these loci (ZNF469, TCF4 and COL6A1) were previously associated only with CRF but were associated with both phenotypes in our analyses. Overall, in these analyses, 111 regions were associated at genome-wide significance with both CH and CRF. Figure 3 plots the effect size of the lead SNP for CH and CRF from each genomic region, and whether they are significantly associated with either one or both CH and CRF.
The stroma constitutes 90% of the corneal thickness and is primarily a complex of type I and type V collagen (17). Significant association was identified near one of type I collagen's two subunits (COL1A1) for CRF (rs180976308, P = 2.0 × 10 −11 ), though for CH, this marker only had suggestive association (P = 2.5 × 10 −6 ). Meanwhile, SNPs adjacent to the gene for one of type V collagen's subunits were associated with both CH and CRF (rs3132302, p CH = 9.0 × 10 −17 , p CRF = 9.4 × 10 −28 ).
Interestingly, a number of loci were specifically associated with CH and not CRF. These specific loci included several loci previously reported to be associated with IOP (18) such as: TMCO1 (rs7518099, p CH = 3.2 × 10 −28 , p CRF = 0.36), CAV2 (rs10233003, p CH = 5.7 × 10 −11 , p CRF = 0.001), and DGKG (rs9853115, p CH = 1.1 × 10 −28 , p CRF = 0.01). These phenotype-specific loci likely result from differences in phenotypic correlation between IOP with CH and CRF. Meanwhile, the CRF-specific loci are primarily collagenrelated genes (as previously discussed), likely as a result of variation in the cornea's physical structure having a greater influence over CRF than CH.
A key feature of Wagner syndrome is an abnormal structure in the vitreous gel, reducing its viscosity. Though CH is indicative of corneal viscosity, it is likely that the viscosity in both the anterior and posterior chambers has some shared genetic pathways.
Several other genes harboring associated SNPs are connected to rare genetic disorders, particularly corneal, ocular, and syndromic disorders of connective tissue (e.g. brittle cornea syndrome and Shprintzen-Goldberg syndrome) based on data available from OMIM (20) (summarized in Supplementary Material, Tables S3 and S4).
Conditional analysis refined these results, identifying 203 and 258 informative SNPs associated with CH and CRF, respectively. The Ensembl Variant Effect Predictor (23) identified that out of the 203 conditional SNPs for CH, 12 are missense variants and 39 are regulatory region variants (Supplementary Material, Table S5). Of the 258 conditional SNPs for CRF, 15 are missense and 40 are in regulatory regions (Supplementary Material, Table S6). Seven missense variants: rs72755233 (ADAMTS17), rs12448432 (HAGHL), rs77542162 (ABCA6), rs77583146 (WNT10A), rs121908120 (WNT10A), rs1042917 (COL6A2) and rs139498917 (MAMDC2) were independently associated with both phenotypes; their missense coding effects make them strong candidates for causal loci. The minor allele for one of these SNPs in particular, rs139498917_A (MAMDC2), is associated with a very large, and clinically relevant increase in both CH and CRF (β CH = 1.04, SE = 0.07; β CRF = 1.14, SE = 0.08), equal to a 0.62 standard deviation increase per allele for both.
Replication of the autosomal conditional SNPs (200 SNPs for CH and 253 for CRF) was conducted with a meta-analysis of two independent cohorts: TwinsUK and EPIC-Norfolk, as previously described (7). Though the replication dataset was less than 10% of the discovery cohort sample size, 17 SNPs from 17 separate regions replicated at Bonferroni-adjusted significance (P < 2.5 × 10 −4 ) (Supplementary Material, Table S7) for CH, while  an additional 94 SNPs were associated with nominal significance (P < 0.05) with the same direction of effect. Two SNPs (rs534975221 and rs544072714) were not present in the replication dataset for CRF. From the remaining 253 markers that were tested, 26 SNPs from 20 distinct regions replicated at Bonferroniadjusted significance (P < 1.98 × 10 −4 ) (Supplementary Material, Table S8), while an additional 106 SNPs had an equal direction of effect with at least nominally significant association. There was very strong correlation of effect sizes between the discovery and replication datasets: r CH = 0.90, P = 3.6 × 10 −73 ; r CRF = 0.89, P = 2.5 × 10 −86 . Figure 4a and b show plots of the effect sizes between datasets for CH and CRF, respectively.
In line with our observations of association for polymorphisms within or near several collagen-coding genes, gene-set enrichment analyses (after adjustment for 0.05 false discovery rate (FDR)) identified enrichment for collagen pathways including the 'Collagens' (24) gene set (FDR adjusted q CH = 9.0 × 10 −4 , q CRF = 9.1 × 10 −3 ). There was noticeable enrichment for gene sets of systemic pathways such as the Gene Ontology (25) 'Skeletal System Development' (FDR-adjusted q CH = 6.0 × 10 −4 , q CRF = 0) (all results are summarized in Supplementary Material, Tables S9 and S10).
Currently, eQTL data for corneal tissue are not publicly available; however, within the cornea, there is a high density of keratocytes (26), a sub-type of fibroblast. Therefore, eQTL data for fibroblasts from the GTEx database (27) were used as a surrogate reference for eQTL effects within the cornea. To check the validity of eQTL-derived results, RNA-seq data (28) were used to check if the implicated mRNAs are expressed in corneal tissue. Significant eQTL effects in fibroblasts were present for 70 CH conditional SNPs and 82 CRF SNPs; 58/70 CH eQTL mRNAs and 69/82 for CRF are known to be expressed within corneal tissue (Supplementary Material, Tables S11 and S12), the mRNAs for the other eQTLs were not included in Carnes et al.'s (28) RNA-seq analysis, so it is unknown whether or not they are expressed in corneal tissue.
In light of the genetic correlations between IOPcc and both CH and CRF, despite IOPcc and CRF having minimal phenotypic correlation, we applied Mendelian randomization (30) to infer any direct causal relationships. A combination of Mendelian randomization approaches indicated that the 'penalized' regression models were most appropriate for these data (see Materials and Methods for details). Results infer that IOPcc has a direct causal effect on both corneal biomechanical phenotypes (bestfitting models summarized in Table 2). However, when testing for causality in the opposite direction (CH and CRF affecting IOPcc), the MR-Egger intercept was significantly different from 0. These results indicate that IOPcc has a unidirectional causal effect over CH and CRF and that IOP influences how the cornea responds to biomechanical change. Full results from all models are provided in Supplementary Material, Tables S17-S20.

Discussion
This GWAS identified a total of 217 distinct genomic regions significantly associated with either one or both corneal biomechanical properties, CH and CRF. These include loci implicated in the corneal diseases of keratoconus (21) and FECD (22), alongside many novel associations, and replicated all previously reported associations for these phenotypes (7).
One of these novel associations was on chromosome X, which is often excluded from GWAS analyses (31). The strongest association on this chromosome was within the biglycan gene (BGN).
The biglycan protein forms a complex acting as a link between type II and VI collagen, with biglycan-deficient animal models having disarrayed collagen fibrils (32). Association at this locus is another component in collagen-related pathways, which were highly enriched in our results.
Previous studies have shown that CH and CRF are significantly different in patients with corneal dystrophies such as keratoconus (33,34) and FECD (8) compared to healthy controls, and in the case of keratoconus, they have a strong predictive value for disease status (34). However, due to these conditions having a reasonably low prevalence, it is difficult to acquire large numbers of cases to provide statistical power in GWAS analyses. Therefore, CH and CRF serve as useful endophenotypes, and the numerous significantly associated loci in our analyses provide suitable candidates for further study of the genetic etiology of corneal diseases.
There is an enrichment of associated variants within or adjacent to genes for multiple collagen subunits for both CH and CRF. As different collagen types are expressed throughout different layers of the cornea, these results indicate that variations in collagens throughout the corneal layers influence corneal biomechanical properties. Interestingly, despite analyzing the same sample for both phenotypes, there is both a greater enrichment of collagen loci and a greater number of significantly associated loci for CRF than for CH. This is possibly a consequence of IOP effects on the corneal dampening capabilities; thus, CH measurements contain an IOP component, and IOP is known to be a relatively 'noisy' phenotype (35). Meanwhile, CRF was designed to have a minimal IOP component (3), which in turn may reduce noise in the data.
Mendelian randomization indicates that IOPcc exerts causal effects over CH and CRF. The directions of effect show that an increase in IOPcc will reduce CH but simultaneously raise CRF. Although counterintuitive at first sight given the strong phenotypic correlation between CH and CRF (r = 0.82, P < 1.0 × 10 −100 ), these results are consistent with the core properties of these parameters. CH is representative of the cornea's dampening capabilities which will be reduced by increased pressure, whereas CRF represents the resistance within the cornea which will increase when subjected to raised pressure. Interestingly, these results infer that CRF is not independent of IOP, despite the formula to derive it being designed to minimize IOP correlation and effects (3); genetic correlation between CRF and POAG (r g = 0.30, P = 1.4 × 10 −10 ) is likely driven by the effect IOP has over CRF. In light of these results, further study is required to determine the utility and independence of CH as a risk factor for POAG progression (4).
This study is the largest to date, and uniform measurement methods were used in a largely homogenous UK population. Our findings provide new insight into the genetic architecture of dynamic corneal properties, which may be useful in aiding our understanding of both normal corneal development and disease pathways involved in a number of corneal conditions.

Materials and Methods
We performed a GWAS using information from the 117 649 participants that took part in the eye and vision component in the UK Biobank (10).

Subjects
Subjects included for this analysis provided full informed consent in accordance with ethical approval granted and overseen by the UK Biobank Ethics and Governance Council. All subjects included were confirmed to be of European ancestry through principal component analysis as described in a previous study (18), and all first-degree relatives, as determined by identity by descent calculations, were excluded.

Phenotyping
CRF and CH were both measured using the ORA device, during an eye examination performed by a trained technician. The ORA uses a 20 ms air pulse to flatten the cornea, while using an electro-optical system to measure the inward (designated P1) and outward (designated P2) applanation pressures (1). CH is defined as the absolute difference between P1 and P2 (1), while CRF is a linear function of P1 and P2 that has been adjusted to minimize correlation with corneal-compensated intraocular pressure (IOPcc) while maximizing correlation with CCT (3). Measurements were excluded from analysis if the participants had a history of glaucoma or corneal surgery, refractive surgery, eye injuries, and those currently using glaucoma medications. In addition, the top and bottom 0.5 percentiles were excluded to remove any outliers that are likely artefact. The mean of measures from the left and right eye was used as the outcome variable. In total, 106 041 and 106 030 participants remained for CH and CRF analysis, respectively, with a 97.7% sample overlap (103 591).
POAG cases and controls were ascertained using ICD10 codes and self-reported questionnaire data. Use of IOP-lowering medication was self-reported.

Genotyping
UK Biobank participants were genotyped on one of two arrays: the Affymetrix UK Biobank Axiom array, and the UK BiLEVE array. Imputation of additional variants was performed using a combined Haplotype Reference Consortium (HRC) (36) and UK10K (37) reference panel; full details for genotyping and imputation are described elsewhere (38). Only variants that were either directly genotyped or imputed using the HRC reference panel were included in our analysis. Variants were then filtered with minor allele frequency >0.1% and an imputation quality score >0.4 cut-off.

Association tests
The CH and CRF measurements were used as the outcome variables in separate linear mixed model regressions, under the assumption of an additive model for allelic effects. Adjustments were made for age, sex and the first five principal components. These analyses were performed in BOLT-LMM (39) using a linear mixed model, that provides additional corrections for population structure and cryptic relatedness. For chromosome X, genotypes were coded as diploid (males coded as 0/2 and females as 0/1/2) to account for dosage compensation.

Phenotypic association tests
Phenotypic associations with age were tested in R (40) using a linear model, adjusted for age and sex. Correlation between CH and CRF was tested with Pearson's correlation coefficient.

Genomic region and significance specification
In this paper, an 'associated region' was defined as a region with genome-wide significantly associated markers, separated from other associated regions by a minimum of 1 million basepairs that are not significantly associated. Genome-wide significance for our study was set at P < 5 × 10 −8 , customary for these analyses.

LD score regression and genetic correlation
An LD score regression intercept was calculated to identify any possible inflation (14) using the LD Hub (41). This approach is more informative for this study than the Devlin genomic inflation lambda (13), on account of the large cohort size and high trait polygenicity. LD Hub was then used to test for genetic correlation between other heritable traits, and CH and CRF, respectively (42).

Conditional analysis
Following the same procedure outlined by Yang et al. (43), conditional analysis was conducted in GCTA to identify variants that are independently associated with CH or CRF.

Gene pathway enrichment analysis
Summary statistics from each phenotype were analyzed by MAGENTA (44) to identify any enriched association in gene pathways for canonical gene sets and Gene Ontology gene sets (25). The original databases used were acquired from the Molecular Signatures Database (version MSigDB v6.1, 45) and were then modified for compatibility. An enrichment cut-off for the 75th percentile was applied as recommended for highly polygenic traits (44). A 5% FDR was applied to results to correct for multiple testing.

Gene-based association
Summary statistics from our GWAS were used as input for analysis by S-PrediXcan (29) to test for association between phenotype and whole gene expression. As with the eQTL analysis, results were limited to fibroblast tissue, as it is a functionally relevant model due to the high density of keratocytes, a sub-type of fibroblast, within the cornea (26). A Bonferroni corrected significance threshold of P < 2.5 × 10 −7 is recommended when testing all gene-tissue pairs. However, as we only included fibroblast tissue in this analysis, this value was adjusted to correct for all tested gene-tissue pairs and set at P < 5.5 × 10 −6 .

Replication of associated markers
Autosomal SNPs with independent association in this study, as identified by conditional analysis, were subject to replication testing with Bonferroni correction for multiple testing. The replication was conducted with a meta-analysis in two independent cohorts, EPIC-Norfolk (46,47) and TwinsUK (48). In brief, GWAS and subsequent meta-analysis was conducted for both CH and CRF separately in these cohorts. In total 9029 participants of European ancestry were included in these analyses, with no known sample overlap with the UK Biobank. Full details of this analysis are described elsewhere (7).

Mendelian randomization
Mendelian randomization was used to test for causal inference between IOPcc with CH and CRF. Summary SNP data for IOPcc were used from Khawaja et al.'s (18) recent meta-analysis (which included the UK Biobank). The lead SNPs from each separate genomic region were selected as genetic instruments for the exposure variable in order to prevent confounding errors resulting from correlated SNPs. Any SNPs significantly associated with the outcome variable (P < 5 × 10 −8 ) were excluded as they may have a common cause with the outcome (the MR independence assumption). Standard regression, penalized regression, robust regression, and penalized robust regression models were applied for these analyses with results indicating that the penalized models were the most suitable. The penalized models downweigh the contribution of variants with heterogenous causal estimates to the analysis; due to the inclusion of a relatively large number of instrument variables in these analyses, a penalized regression is more stable than a standard linear regression (49). For the inverse-variance models, the psi variable was set as the observational correlation between the exposure and outcome phenotypes to account for sample overlap. All analysis was performed in the MendelianRandomization (50) package in R (40).

Funding
The TwinsUK study was funded by the Wellcome Trust

Supplementary Material
Supplementary Material is available at HMG online.