Rare mutations associating with serum creatinine and chronic kidney disease

Gardar Sveinbjornsson1, Evgenia Mikaelsdottir1, Runolfur Palsson2,5, Olafur S. Indridason5, Hilma Holm7, Aslaug Jonasdottir1, Agnar Helgason1,3, Snaevar Sigurdsson1, Adalbjorg Jonasdottir1, Asgeir Sigurdsson1, Gudmundur Ingi Eyjolfsson8, Olof Sigurdardottir9, Olafur Th. Magnusson1, Augustine Kong1,4, Gisli Masson1, Patrick Sulem1, Isleifur Olafsson6, Unnur Thorsteinsdottir1,2, Daniel F. Gudbjartsson1,4,∗ and Kari Stefansson1,2,∗


INTRODUCTION
Chronic kidney disease (CKD) is a common disorder that can progress to kidney failure and is associated with an increased risk of cardiovascular morbidity and death (1). Studies indicate a dramatic increase in the prevalence of CKD with advancing age (2,3). With increased lifespan, the burden of CKD has grown steadily in the Western world (4) and is recognized as a public health problem substantially impacting the health service (5). The cause of CKD is not always known and frequently appears multifactorial, with hypertension and diabetes mellitus being the most important contributing factors (6)(7)(8)(9). Genetic factors may be a direct cause of CKD, but they may also confer vulnerability to other risk factors such as diabetes, hypertension and old age. The glomerular filtration rate (GFR) is considered the best measure of kidney function. Direct measurements of GFR using exogenous or endogenous filtration markers are too cumbersome and expensive for use in routine clinical practice.
In recent years, estimates of GFR (eGFR) obtained using equations based on serum creatinine (SCr) have become the most commonly used indicator of kidney function. However, the relationship between the SCr level and GFR varies between people as creatinine is mostly generated in striated muscle and is therefore affected by muscle mass. Furthermore, creatinine is partly excreted by tubular secretion in the kidney, in addition to glomerular filtration. Current SCr-based equations to calculate eGFR account for age, sex and ethnicity as surrogates for muscle mass (10). CKD is generally defined as a GFR <60 ml/min/1.73 m² and/or other signs of kidney damage, lasting for 3 months or more (11).

RESULTS
To search for sequence variants associating with SCr, we performed a GWAS of 24 million single-nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs; Supplementary Material, Table S1) on a sample of 194 286 Icelanders with SCr measurements. The participants had a median of 4 (range 1-645) SCr measurements and an average (geometric) of 4.9. The SCr measurements were obtained from the three largest medical laboratories in Iceland (Supplementary Material, Table S2). The mean (arithmetic) of the quantile-quantile standardized SCr values, adjusted for sex and age at the time of measurement for each individual, was used in the analysis. The SNPs and INDELs were found by whole-genome sequencing of 2230 Icelanders. The variants were imputed into 81 656 chip-genotyped individuals using long-range phasing-based imputation (19). In addition, the genotypes of 112 630 relatives of chip-typed individuals with SCr measurements were inferred and used in the analysis.
Sequence variants predicted to cause loss of function (LoF) of a gene, such as stop-gain mutations and frameshift INDELs, have the largest impact of any known class of sequence variants. This class is followed by the class of predicted missense variants. We weighted sequence variants according to their prior probability of affecting gene function by applying thresholds for genome-wide significance that depend on the variant class. We allocated the type I error rate of 0.05 equally between three classes of variants. We tested 6476 LoF variants, 100 502 missense variants and 23 854 999 other variants (Supplementary Material, Table S1), yielding class-specific Bonferroni genome-wide significance thresholds of 2.6 × 10⁻⁶, 1.7 × 10⁻⁷ and 7.0 × 10⁻¹⁰, respectively. Sequence variants at 19 loci associate with SCr (Supplementary Material, Figs S1-S3 and Table S3). SNPs at 15 of the 19 loci have already been reported to associate with kidney function (12)(13)(14)(15)(16)(17), but 4 are novel. One locus has a genome-wide significant signal that is independent of previously reported variants at the locus (Table 1). All five of these newly discovered variants are coding variants and four of them are rare (MAF <2%). These variants have substantially larger effects than the previously reported common variants (Fig. 1A), and the effects are inversely proportional to MAF (Fig. 1B). The genes affected by the novel variants encode either solute carriers (SLC25A45, SLC47A1 and SLC6A19) or E3 ubiquitin ligases (RNF128 and RNF186).
We tested the variants for association with CKD in a sample of 15 594 cases and 291 428 controls from Iceland (Table 1). GFR was estimated from the SCr measurements using the MDRD Study equation, and individuals who persistently had eGFRcrea <60 ml/min/1.73 m² for more than 3 months were classified as having CKD (11). Three of the five novel variants associating with SCr also associate with CKD risk (P < 0.05/5 = 0.01). We additionally defined a severe (stage IV and V) CKD phenotype (11) as eGFRcrea <30 ml/min/1.73 m² and tested the five variants for association in a sample of 1716 cases (Supplementary Material, Table S8). While no significant association was detected, three variants showed a directionally consistent effect.
SLC47A1 harbors a missense mutation, p.A465V (rs111653425, MAF = 1.8%), that associates with SCr (effect = 0.105SD, P = 9.5 × 10⁻¹⁴) and with risk of CKD (OR = 1.24, P = 4.1 × 10⁻⁴). SLC47A1 encodes an organic cation/H+ electroneutral exchanger involved in the final steps of organic cation extrusion in the kidney and liver (20,21), with creatinine being one of its substrates (22). Knockout of Slc47a1 in mice results in significantly increased creatinine and blood urea nitrogen (BUN) levels (23). p.A465V affects a conserved amino acid located within the long cytoplasmic loop between the 12th and 13th transmembrane domains (24), a known location for mutations that alter SLC47A1 transporting properties (25). We found six other
Table 1. Sequence variants showing a genome-wide significant association with SCr that are independent of reported markers. SCr measurements were available for 81 656 chip-typed individuals and 112 630 relatives of chip-typed individuals. Their association with CKD in a set of 15 594 cases and 291 420 controls is also shown. For each sequence variant, the reference SNP ID number (rs#), chromosome (Chr), hg18 position, its effect on a gene (coding effect), minor allele frequency (MAF) and imputation information (Info) are provided in addition to the effect on SCr in SDs, the odds ratio (OR) for CKD and the corresponding P-values.
(14). This variant is independent of our discovery SNP and p.Pro489SerfsX10 (r² = 0.020 and 0.0015, respectively).
A splice variant, c.1173+2T>G (rs142979576, MAF = 0.86%), in SLC6A19 associates with lower SCr (effect = -0.115SD, P = 1.3 × 10⁻⁸). Mutations in SLC6A19, including c.1173+2T>G, have been linked to Hartnup disease (26,27), an autosomal recessive disease characterized by impaired reabsorption of neutral amino acids (28). We detected seven additional mutations in SLC6A19, three of which associate with lower SCr (Supplementary Material, Table S4 and Table 2). The missense mutations p.D173N (MAF = 0.32%) and p.P579L (MAF = 0.36%) have also been linked to Hartnup disease (27,29), whereas p.F360S has not been observed outside Iceland and is not known to associate with the disease. Among the 4298 individuals with available data in the Exome Sequencing Project (ESP) (30), the splice variant, c.1173+2T>G, is only seen in two individuals (ESP MAF = 0.023%) and the missense mutation p.P579L is not observed at all. The missense mutation p.D173N (ESP MAF = 0.34%) is by far the most common disease-causing mutation in Europeans (31) and has a similar frequency in Iceland, although there the splice variant, c.1173+2T>G, is most common. These four mutations are pairwise independent (r² < 0.00010; no sequenced individual carries more than one mutation on the same chromosome), and all associate with lower SCr and have similar effect sizes (Table 2). The splice variant associates with a decreased risk of CKD (OR = 0.75, P = 0.0019) and the three other mutations show trends in the same direction (Table 2). Collectively, the three additional mutations have a frequency of 1.0% and associate with a decreased risk of CKD.
A missense mutation that affects isoforms a and b of SLC25A45 (rs34400381, MAF = 5.5%), named p.R285C and p.R243C, respectively, associates with lower SCr levels (effect = -0.048SD, P = 3.3 × 10⁻⁹). SLC25A45 belongs to a family of mitochondrial transporters that translocate several solutes across the mitochondrial membrane and participate in essential metabolic pathways (32). Analysis of SLC25A45 expression in human tissues revealed high expression in small intestine (Supplementary Material, Fig. S4A), consistent with results of expression studies in the rat (33). Another SNP, rs4014195, located near RNASEH2C and 200 kb upstream of rs34400381 (r² = 0.015), has been reported to associate with kidney function (12) (reported P = 1 × 10⁻⁷), but did not replicate in their follow-up (reported P = 0.14). In our samples, rs4014195 associates with SCr and CKD (P = 4.1 × 10⁻⁵ and 6.7 × 10⁻¹⁰, respectively; Supplementary Material, Table S6).

REPLICATION OF REPORTED VARIANTS
We evaluated 45 SNPs that have been associated with SCr and/or CKD (Supplementary Material, Table S6). The association of 41 SNPs was confirmed (P < 0.05). The association of the variant rs16864170 at SOX11 with CKD did not replicate (P = 0.55); this association also failed to replicate in a previous attempt (12). The association of rs3828890, near MHC, with SCr reported in an East Asian population (13) did not replicate in our data (P = 0.49). Finally, rs6431731, near the DDX1 gene (14), and rs6465825, near TMEM60 (12), did not associate with SCr (P = 0.41 and 0.072, respectively).

CONDITIONAL ANALYSIS
We performed conditional analysis to search for secondary signals at the 19 SCr loci with a genome-wide significant association with SCr in our data. Conditioning on the most significant sequence variant at each locus, we searched for signals within a 200 kb window. On average, 0.9 LoF variants, 11.3 missense variants and 1682.5 other variants were tested at each locus, giving class-specific Bonferroni significance thresholds of 1.9 × 10⁻², 1.5 × 10⁻³ and 9.9 × 10⁻⁶, respectively. In addition to the mutations in SLC6A19 and SLC47A1 described above, we found a second significant sequence variant in three regions (in or near SLC34A1, GATM and UMOD/PDILT) within the 200 kb window around the most significant SNP (Table 3).
In SLC34A1, a rare (MAF = 0.46%) missense mutation, p.Y489C, within 20 kb of the SNP showing the most significant association in the region (rs35716097), associates with SCr when conditioning on rs35716097. We replicated the known association of rs35716097 in SLC34A1 with kidney stone disease (12) and demonstrated the association of the p.Y489C mutation with the disorder (Supplementary Material, Table S7).
For the known loci where our most significantly associated SNP differed from the reported SNP, we performed a conditional analysis to determine whether our most significant SNP represents a refinement of the published signal. Several correlated sequence variants were more significantly associated with SCr than the previously reported rs10794720 in WDR37 on chromosome 10p15 (reported P = 1 × 10⁻⁸, MAF = 5.99%, our P = 1.9 × 10⁻⁶, effect = 0.037SD) (12). The most significant association is with rs35772020, located in LARP4B (MAF = 6.7%, effect = 0.047SD, P = 4.1 × 10⁻¹⁰), which is correlated with rs10794720 (r² = 0.40). A correlated (r² = 0.78) stop-gain mutation, rs1044261, in the IDI2 gene also showed a highly significant association with SCr (MAF = 5.97%, effect = 0.045SD, P = 7.1 × 10⁻⁹). Conditional on the effect of the reported SNP, rs10794720 in WDR37, the association of these SNPs with SCr remained significant (P = 9.2 × 10⁻⁷ and 4.3 × 10⁻⁸, respectively).
The most significantly associated sequence variant on chromosome 16, rs77924615, was in PDILT, located adjacent to the UMOD gene on chromosome 16p12, which has been associated with CKD (15,16). After conditioning on the previously reported rs4293393, rs77924615 (r² = 0.38) remained significant (Padj = 2.4 × 10⁻¹⁴). When conditioning on rs77924615, the most significant association within the region was rs60136849, a close correlate of rs4293393 (r² = 0.98). Hence, not all of the association with SCr at the UMOD-PDILT locus is captured by the strongest variant in this region.

DISCUSSION
Through whole-genome sequencing and imputation into a large number of individuals with several SCr measurements each, we discovered novel sequence variants associating with SCr and CKD. Because of the large sample of sequenced individuals used to inform imputation of sequence variants, we were able to discover rare variants with relatively large effect on SCr. Taken together, these novel variants add 0.5% to the explained variability of SCr in the Icelandic population.
While the observed association with SCr is convincing, we cannot distinguish between effects of the variants on kidney function and non-GFR determinants of SCr. Accordingly, the implications for CKD are uncertain due to the limitations associated with using SCr and creatinine-based GFR estimates for defining the disorder. Hence, the associations with CKD might not reflect true alterations in GFR. Our study is therefore limited by the lack of other filtration markers such as cystatin C, which would enable us to differentiate between effects on creatinine metabolism and kidney function. Since eGFR data using another filtration marker were not available, a severe (stage IV and V) CKD phenotype was defined in order to eliminate the effects of sequence variants causing altered creatinine metabolism. Limiting the phenotype reduces power to detect significant associations, but a directionally consistent trend would be suggestive of an effect on CKD. Variants in RNF186, SLC6A19 and SLC25A45 showed a directionally consistent effect. Sequence variants that only affect creatinine metabolism, but do not predispose to CKD, may improve the accuracy of equations used for estimating GFR. All the variants alter a protein-coding sequence and affect the function of either solute carriers or E3 ubiquitin ligases. Four of the variants we describe are rare, with SCr effects between 0.085 and 0.129SD, greater than those of the previously reported common variants, all of which have effects below 0.069SD.
Table 2. Coding variants in SLC6A19 and SLC47A1 associating with SCr and CKD. SCr measurements were available for 81 656 chip-typed individuals and 112 630 relatives of chip-typed individuals. Their association with CKD in a set of 15 594 cases and 291 420 controls is also given. For each sequence variant, the reference SNP ID number (rs#), its coding effect on a gene, minor allele frequency (MAF) and imputation information (info) are given in addition to the effect on SCr in SDs, the odds ratio (OR) for CKD and the corresponding P-values.
The SLC superfamily encodes membrane-bound transporters that translocate diverse substances across the cell membrane (37). Variants in or near SLC genes have been associated with creatinine production and/or secretion (12) as well as with renal function and CKD (12,17). We found several pairwise independent coding variants associating with decreased SCr in SLC6A19. SLC6A19 is responsible for reabsorption of neutral amino acids in the kidneys and small intestine (28). Homozygotes and compound heterozygotes for mutations in SLC6A19 have Hartnup disease (26,27), an autosomal recessive condition characterized by pellagra-like light-sensitive rash, cerebellar ataxia, emotional instability and aminoaciduria (28). Although SCr is affected primarily by the GFR, there are additional factors that can affect SCr levels without a change in GFR, such as altered creatinine generation or renal tubular secretion. While these non-GFR determinants of SCr may account for the association of SLC6A19 and SCr/CKD, this finding still raises the intriguing possibility that the SLC6A19 mutations confer a heterozygote advantage (38). Thus, it would be important to use other filtration markers or a direct measurement of GFR to establish that the variants do indeed confer protection against CKD.
A common variant in an intron of SLC47A1 has been associated with kidney function (14). We found a rare missense mutation that associates with both increased SCr and an increased risk of CKD. As SLC47A1 encodes a multidrug and toxin extruder, one can speculate that the inability to excrete toxic organic cations may eventually lead to kidney damage and CKD. This notion is supported by a knockout mouse model, in which a concomitant rise in BUN suggests a reduction in GFR (23). However, creatinine may also be one of the many substrates of this transporter and, therefore, a missense mutation might cause impairment of renal tubular secretion, resulting in higher SCr.
Finally, we found a missense mutation in SLC25A45, which encodes an orphan mitochondrial transporter; the mutation associates significantly with renal quantitative traits but not with CKD. Based on our data and previous reports, we postulate that SLC25A45, which is related to the ornithine/citrulline transporters SLC25A2 and SLC25A15 (32), may play a role in biosynthesis of arginine, one of the three amino acids involved in the synthesis of creatine, the precursor of creatinine.
E3 ubiquitin ligases play an important role in regulation of diverse cellular processes (39)(40)(41). RNF186, which encodes an E3 ubiquitin ligase that localizes to the ER and promotes Lys29- and Lys63-linked polyubiquitination (34), harbors a rare stop-gain mutation associating with both elevated SCr and an increased risk of CKD. RNF128 is an E3 ubiquitin ligase in the endocytic pathway that reportedly plays a role in glucose and lipid metabolism in the mouse and in T-cell anergy (35,42,43). A missense mutation in RNF128 associates with elevated SCr but not with CKD. The expression profile of RNF128 indicates that it has a role in the small intestine, liver, kidney and skeletal muscle, which are sites relevant for creatinine production and/or excretion. Additional studies are required to search for the substrates of RNF186 and RNF128 and to determine whether these ligases are involved in molecular pathways relevant to kidney function and pathogenesis of kidney disease.
In conclusion, we have discovered several rare protein-coding variants that associate with SCr and CKD. These variants add 0.5% to the explained variability of SCr in the Icelandic population. Future studies will require more accurate characterization of CKD, using additional filtration markers for estimation of GFR and alternative markers of kidney disease, particularly proteinuria.

Study population
Landspitali - The National University Hospital of Iceland (LUH) is a regional hospital for the greater Reykjavik area and a tertiary referral center for the entire Icelandic nation. The nation's only nephrology clinic is located at LUH, and all laboratory tests for the primary care clinics of the greater Reykjavik area are performed in the hospital's laboratories. We obtained results of all serum creatinine, uric acid, potassium, calcium and phosphate measurements performed at LUH during the period 1993-2012 (Supplementary Material, Table S7). We also obtained all measurements of these traits from The Icelandic Medical Center (Laeknasetrid) Laboratory in Mjodd (RAM), a private outpatient clinic, during the period from 2004 to 2012 and from the Department of Clinical Biochemistry, Akureyri Hospital, the regional hospital of the north of Iceland, during the period from 2004 to 2010 (Supplementary Material, Table S2). We did not observe a difference between laboratories in the distribution of age or sex, but SCr measurements from LUH, the only tertiary care hospital among the three, were on average higher than those from the other two institutions. All quantitative measurements were quantile-quantile standardized to a standard normal distribution and then adjusted for sex and the age at measurement. For individuals with multiple measurements, the mean adjusted measurement was used. Only SCr measurements of individuals who had reached the age of 18 years were used in the analysis.
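The standardization step described above can be illustrated with a minimal Python sketch. It assumes a rank-based inverse normal transform with a 0.5 rank offset and average ranks for ties; neither detail is stated in the text, and the function names are hypothetical. The adjustment for sex and age at measurement, which in the study precedes the averaging, is omitted here for brevity.

```python
from collections import defaultdict
from statistics import NormalDist, mean


def quantile_standardize(values):
    """Map values to standard-normal quantiles by rank (assumed 0.5 offset)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a group of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]


def per_individual_mean(ids, z_values):
    """Average the standardized measurements of each individual."""
    grouped = defaultdict(list)
    for person, z in zip(ids, z_values):
        grouped[person].append(z)
    return {person: mean(zs) for person, zs in grouped.items()}


# Hypothetical SCr values for two individuals, "A" and "B".
z = quantile_standardize([80.0, 95.0, 110.0])
trait = per_individual_mean(["A", "A", "B"], z)
```

The transform depends only on ranks, so it is insensitive to the unit in which SCr is reported.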
We estimated the GFR using the MDRD Study equation: $\mathrm{eGFR}\ (\mathrm{ml/min/1.73\ m^2}) = 175 \times (\mathrm{SCr})^{-1.154} \times (\mathrm{Age})^{-0.203} \times (0.742\ \text{if female}) \times (1.212\ \text{if black})$. CKD was defined as eGFR <60 ml/min/1.73 m² for at least 3 months. When defining CKD, all SCr measurements obtained during an episode of acute kidney injury were excluded and individuals who had eGFR <60 ml/min/1.73 m² for <3 months duration were excluded from the CKD sample set. A total of 15 594 CKD cases were identified using this method. Other chip-typed individuals who had not been diagnosed with CKD, and first- and second-degree relatives of chip-typed individuals were used as controls.
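As an illustration of the eGFR calculation and the persistence criterion, the following Python sketch may help. It is not the study's implementation: the function names are hypothetical, SCr is assumed to be in mg/dl (the unit used by the MDRD Study equation), "at least 3 months" is approximated as 90 days, and persistence is interpreted as an uninterrupted run of measurements below the threshold. Exclusion of measurements taken during acute kidney injury is not modelled.

```python
from datetime import date


def mdrd_egfr(scr_mg_dl, age_years, female, black=False):
    """eGFR in ml/min/1.73 m^2 from the MDRD Study equation (175 coefficient)."""
    egfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr


def has_ckd(measurements, threshold=60.0, min_days=90):
    """Return True if consecutive sub-threshold eGFR values span >= min_days.

    measurements: iterable of (date, eGFR) pairs in any order.
    The 90-day window and the 'consecutive run' rule are assumptions.
    """
    run_start = None
    for day, egfr in sorted(measurements):
        if egfr < threshold:
            if run_start is None:
                run_start = day
            if (day - run_start).days >= min_days:
                return True
        else:
            run_start = None  # a normal value interrupts the run
    return False


example = mdrd_egfr(1.0, 50, female=False)
persistent = has_ckd([(date(2010, 1, 1), 55.0), (date(2010, 5, 1), 52.0)])
```

Laboratories that report SCr in µmol/l would need to divide by 88.4 before calling `mdrd_egfr`.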
The 6736 Icelandic kidney stone disease cases consisted of patients with confirmed radiopaque kidney stones from the Icelandic Kidney Stone Disease Registry at LUH.
The study was approved by the Icelandic Data Protection Authority and the National Bioethics Committee. All participating subjects who donated blood signed informed consent. Personal identities of the patients and biological samples were encrypted by a third-party system provided by the Icelandic Data Protection Authority.
The full GWAS results provide specific information on the genetic characteristics of the whole Icelandic nation, precluding full disclosure of the data. Association results for all variants with P-value <10⁻⁴, MAF >0.01 and imputation information score higher than 0.95 are shown in Supplementary Material, Table S9.

Illumina chip genotyping
Icelandic chip-typed samples were assayed using the Illumina HumanHap300, HumanCNV370, HumanHap610, HumanHap1M, HumanHap660, Omni-1, Omni 2.5 or Omni Express bead chips at deCODE genetics. SNPs were excluded if they (i) had a yield <95%, (ii) had MAF <1% in the population, (iii) deviated significantly from Hardy-Weinberg equilibrium in the controls (P < 0.001), (iv) produced an excessive inheritance error rate (over 0.001) or (v) showed a substantial difference in allele frequency between chip types (from just a single chip if that resolved all differences, but from all chips otherwise). All samples with a call rate below 97% were excluded from the analysis. For the HumanHap series of chips, 304 937 SNPs were used for long-range phasing, whereas for the Omni series of chips 564 196 SNPs were included. The final set of SNPs used for long-range phasing was composed of 707 525 SNPs.
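The exclusion thresholds above can be expressed as a simple filter. The Python sketch below is illustrative only: the `SnpQc` record and function names are hypothetical, the per-SNP statistics are assumed to have been computed elsewhere, and criterion (v), the allele-frequency comparison between chip types, is not modelled.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    yield_rate: float         # fraction of samples with a genotype call
    maf: float                # minor allele frequency in the population
    hwe_p: float              # Hardy-Weinberg P-value in controls
    inheritance_error: float  # inheritance error rate


def snp_passes_qc(snp: SnpQc) -> bool:
    """Apply criteria (i)-(iv); a SNP failing any one of them is excluded."""
    return (
        snp.yield_rate >= 0.95
        and snp.maf >= 0.01
        and snp.hwe_p >= 0.001
        and snp.inheritance_error <= 0.001
    )


def sample_passes_qc(call_rate: float) -> bool:
    """Samples with a call rate below 97% are excluded."""
    return call_rate >= 0.97


kept = snp_passes_qc(SnpQc(yield_rate=0.99, maf=0.05, hwe_p=0.5, inheritance_error=0.0))
```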

Whole-genome sequencing
Whole-genome sequencing was performed for 2230 Icelanders, selected for various conditions. All the individuals were sequenced at a depth of at least 10× (average sequencing depth = 22×).
Template DNA fragments were hybridized to the surface of flow cells [GA PE cluster kit (v2) or HiSeq PE cluster kits (v2.5 or v3)] and amplified to form clusters using the Illumina cBot. In brief, DNA (2.5-12 pM) was denatured, followed by hybridization to grafted adaptors on the flow cell. Isothermal bridge amplification using Phusion polymerase was then followed by linearization of the bridged DNA, denaturation, blocking of 3′ ends, and hybridization of the sequencing primer. Sequencing-by-synthesis (SBS) was performed on Illumina GAIIx and/or HiSeq 2000 instruments. Paired-end libraries were sequenced at 2 × 101 (HiSeq) or 2 × 120 (GAIIx) cycles of incorporation and imaging using the appropriate TruSeq™ SBS kits. Each library or sample was initially run on a single GAIIx lane for QC validation followed by further sequencing on either GAIIx (≥4 lanes) or HiSeq (≥1 lane) with targeted raw cluster densities of 500-800K/mm², depending on the version of the data imaging and analysis packages (SCS2.6-2.9/RTA1.6-1.9, HCS1.3.8-1.4.8/RTA1.10.36-1.12.4.2). Real-time analysis involved the conversion of image data to base-calling in real time.

Generation of whole-genome genotype data
SNP and INDEL calling from the whole-genome sequence data of the 2230 Icelanders and generation of imputed genotypes have been described previously (44). Briefly, the genotypes identified were imputed into 81 656 chip-genotyped and long-range-phased Icelanders with SCr measurements. Probabilities of genotypes were furthermore predicted for 112 630 relatives of chip-typed individuals with SCr measurements.

Association testing
A generalized form of linear regression that accounts for the relatedness between individuals was used to test for the association of quantitative traits, such as SCr and serum uric acid, with sequence variants (44) (Supplementary Material, Methods). Conditional analysis was performed by including the sequence variant being conditioned on as a covariate in the model under the null and the alternative in the generalized linear regression.
Logistic regression was used to test for association between sequence variants and disease (CKD and kidney stone disease), treating disease status as the response and genotype counts as covariates. Other available individual characteristics that correlate with disease status were also included in the model as nuisance variables. These characteristics were: sex, county of birth, current age or age at death (first- and second-order terms included), blood sample availability for the individual and an indicator function for the overlap of the lifetime of the individual with the timespan of phenotype collection (Supplementary Material, Methods).
The genomic inflation factor for SCr was λGC = 1.91. This large inflation factor is certainly, in part, due to the large sample size and the polygenic nature of the genetic architecture that affects SCr (45). The correction may therefore be overly conservative.

Genotype imputation information
The informativeness of genotype imputation was estimated by the ratio of the variance of imputed expected allele counts and the variance of the actual allele counts:
$$\frac{\mathrm{Var}[E(\theta \mid \text{chip data})]}{\mathrm{Var}(\theta)},$$
where $\theta \in \{0, 1\}$ is the allele count. $\mathrm{Var}[E(\theta \mid \text{chip data})]$ was estimated by the observed variance of the imputed expected counts and $\mathrm{Var}(\theta)$ was estimated by $p(1 - p)$, where $p$ is the allele frequency. Sequence variants with imputation information below 0.8 were excluded from the analysis.
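The information ratio can be sketched in a few lines of Python. This is a simplified illustration rather than the study's code: the function name is hypothetical, the allele frequency is assumed to be estimated as the mean of the imputed expected counts, and the population variance is used as the "observed variance".

```python
from statistics import mean, pvariance


def imputation_info(expected_counts):
    """Ratio of the variance of imputed expected allele counts to p(1 - p).

    expected_counts: one imputed expected allele count per haplotype,
    each a value between 0 and 1.
    """
    p = mean(expected_counts)  # assumed allele frequency estimate
    if p <= 0.0 or p >= 1.0:
        return 0.0  # monomorphic in this sample: nothing to measure
    return pvariance(expected_counts) / (p * (1.0 - p))


certain = imputation_info([0, 1, 0, 1])            # every allele imputed with certainty
uninformative = imputation_info([0.5, 0.5, 0.5, 0.5])  # imputation no better than the frequency
```

Under these assumptions, a variant would be retained in the analysis when the returned value is at least 0.8.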

Gene and variant annotation
For the annotation of the exome data, coordinates of variants were converted between hg18 and hg19 using the liftOver tool from UCSC (46). Variants in hg19 coordinates were annotated with information from Ensembl release 70 using the Variant Effect Predictor (VEP) version 2.8 (47). Only protein-coding transcripts from RefSeq Release 56 (48) were considered.

Correction for relatedness of the Icelandic subjects and genomic control
Individuals in both the Icelandic case and control groups are related, causing the χ² test statistic to have a mean >1 and a median >0.675². We estimated the inflation factor λg based on a subset of about 300 000 common variants and adjusted the P-values by dividing the corresponding χ² values by this factor to adjust for both relatedness and potential population stratification (49).
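The adjustment can be sketched as follows in Python. The text does not say how the inflation factor was computed from the common variants, so the median-based estimator used here is an assumption, and all function names are hypothetical. The test statistics are taken to be χ² distributed with one degree of freedom.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom (0.6745 squared).
CHI2_1DF_MEDIAN = 0.4549364


def chi2_1df_pvalue(x):
    """Upper-tail P-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def inflation_factor(chi2_values):
    """Median-based inflation estimate (an assumed choice of estimator)."""
    return median(chi2_values) / CHI2_1DF_MEDIAN


def adjusted_pvalues(chi2_values, lam):
    """Divide each statistic by the inflation factor, then recompute P-values."""
    return [chi2_1df_pvalue(x / lam) for x in chi2_values]


# With the inflation factor reported for SCr, a raw statistic of 40 shrinks to about 20.9.
raw_p = chi2_1df_pvalue(40.0)
adj_p = adjusted_pvalues([40.0], lam=1.91)[0]
```

Because the statistic is divided by a factor greater than 1, adjusted P-values are always larger than the unadjusted ones.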

Thresholds for genome-wide significance
We weighted sequence variants according to their prior probability of affecting gene function by applying thresholds for genome-wide significance that depend on the variant class. The Bonferroni correction for multiple testing can be adjusted to account for prior importance of sequence variants. We performed a weighted Holm-Bonferroni correction based on giving equal weight to the classes of LoF, missense and other variants (50) (Supplementary Material, Table S1). For example, sequence variants in the LoF class get weight $\frac{1}{3 \times 6476}$. The sum of these weights over all the variants in the genome is 1, and the Bonferroni threshold for significance within a class containing $m$ sequence variants will be $\frac{0.05}{3m}$.
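The class-specific thresholds follow directly from the variant counts reported in Results. A minimal Python sketch of the arithmetic is given below; the function name `class_thresholds` is illustrative and not part of the original analysis.

```python
def class_thresholds(counts, alpha=0.05):
    """Split alpha equally between variant classes, then divide by class size."""
    n_classes = len(counts)
    return {name: alpha / (n_classes * m) for name, m in counts.items()}


# Numbers of variants tested genome-wide, as reported in Results.
genome_wide = class_thresholds(
    {"LoF": 6476, "missense": 100502, "other": 23854999}
)
for name, threshold in genome_wide.items():
    print(f"{name}: {threshold:.1e}")
# LoF: 2.6e-06
# missense: 1.7e-07
# other: 7.0e-10
```

The same function applied to the average per-locus counts used in the conditional analysis (0.9, 11.3 and 1682.5) gives thresholds of about 1.9e-02, 1.5e-03 and 9.9e-06.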

SUPPLEMENTARY MATERIAL
Supplementary Material is available at HMG online.