De novo mutations implicate novel genes with burden of rare variants in Systemic Lupus Erythematosus

The omnigenic model of complex diseases stipulates that the majority of the heritability will be explained by the effects of common variation on genes in the periphery of core disease pathways. Rare variant associations, expected to explain far less of the heritability, may be enriched in core disease genes and thus will be instrumental in the understanding of complex disease pathogenesis and their potential therapeutic targets. Here, using complementary whole-exome sequencing (WES), high-density imputation, and in vitro cellular assays, we identify three candidate core genes in the pathogenesis of Systemic Lupus Erythematosus (SLE). Using extreme-phenotype sampling, we sequenced the exomes of 30 SLE parent-affected-offspring trios and identified 14 genes with missense de novo mutations (DNM), none of which are within the >80 SLE susceptibility loci implicated through genome-wide association studies (GWAS). In a follow-up cohort of 10,995 individuals of matched European ancestry, we imputed genotype data to the density of the combined UK10K-1000 genomes Phase III reference panel across the 14 candidate genes. We identify a burden of rare variants across PRKCD associated with SLE risk (P=0.0028), and across DNMT3A associated with two severe disease prognosis sub-phenotypes (P=0.0005 and P=0.0033). Both genes are functional candidates and significantly constrained against missense mutations in gene-level analyses, along with C1QTNF4. We further characterise the TNF-dependent functions of candidate gene C1QTNF4 on NF-κB activation and apoptosis, which are inhibited by the p.His198Gln DNM. Our results support extreme-phenotype sampling and DNM gene discovery to aid the search for core disease genes implicated through rare variation. 
Significance Statement
Rare variants, present in <1% of the population, are expected to explain little of the heritability of complex diseases, such as Systemic Lupus Erythematosus (SLE), yet are likely to identify core genes crucial to disease mechanisms. Their rarity, however, limits the power to show their statistical association with disease. Through sequencing the exomes of SLE patients and their parents, we identified non-inherited de novo mutations in 14 genes and hypothesised that these are prime candidates for harbouring additional disease-associated rare variants. We demonstrate that two of these genes also carry a significant excess of rare variants in an independent, large cohort of SLE patients. Our findings will influence future study designs in the search for the ‘missing heritability’ of complex diseases.

Introduction
Considerable progress has been made in elucidating the genetic basis of complex diseases. The vast majority of identified disease-associated genetic polymorphisms are common in the population and the risk alleles impart a modest individual increment to the likelihood of developing disease. Although large-scale genome-wide association studies (GWAS) have so far explained less of the heritability than originally predicted (1), much of the 'missing heritability' is expected to be accounted for by common variants with effect sizes below the genome-wide significance threshold (2). However, under the newly proposed omnigenic model of complex traits, the majority of associated common variants, both identified and unidentified, will primarily be found in periphery genes expressed in relevant cell types but not necessarily biologically relevant to disease (3).
In contrast, the role of rare variants in complex disease is largely unknown and often dismissed. A recent study, however, with an extremely large sample size, identified rare and low frequency variants contributing to the genetic variance of adult human height (4), a polygenic trait with a genetic architecture similar to that of complex diseases (5), suggesting that previous complex disease studies with seemingly large sample sizes were perhaps still insufficiently powered to detect rare variant associations (6). Furthermore, studies of rare variants typically find gene sets enriched in biologically relevant functions/pathways (3, 7, 8). Therefore, although estimated to explain less of the heritable disease risk at a population level, rare variants are of importance to understanding disease pathogenesis as they are likely to implicate biologically relevant core genes (3). Supporting the theory that common and rare variant associations will be found in discrete gene sets is the lack of additional rare variant associations in GWAS genes (9).
Exome-wide searches, which provide a highly enriched source of potential disease-causing mutations (10), have revealed only a limited number of rare variants associated with complex diseases. Even though greater statistical power is achieved by gene-level analyses, whereby aggregated variants are tested for an allelic burden of collective rare variation, widely used gene-based association tests have been shown to lack power at the exome-wide level (11). Coupled with the insufficient sample sizes currently available in the study of most complex diseases, hypothesis-free searches for core genes with rare variant associations are unlikely to be fruitful.
Our strategy to address this problem in the autoimmune disease Systemic Lupus Erythematosus (SLE) is outlined here and summarised in Fig. 1. Using a discovery cohort of 30 unrelated SLE cases with severe disease (young age of onset and clinical features associated with poorer outcome), we hypothesized that these individuals would exhibit unique mutation events in their protein-coding DNA that may predispose to disease risk. We undertook whole exome sequencing (WES) in 30 family trios (both parents and affected offspring) and scrutinized the data for non-inherited de novo mutations (DNM) in the individual with SLE to identify a group of candidate genes for an independent follow-up rare variant analysis. This method allowed the identification of novel loci harbouring disease risk through collective rare variation, and emphasises the value of phenotypic extremes in the search for core genes in multifactorial disorders (12).

[...] of 30 family trios with an affected offspring with more severe SLE (Fig. S1). A total of 584,798 variants (>20X), including single nucleotide variants and indels, were identified in the 30 affected probands. Using three bioinformatic tools and employing conservative parameters, 17 putative missense DNM were identified across 17 genes (Table S1; Fig. S2). We also analysed the SLE proband WES data alone, without the unaffected parents. This revealed 1,194 non-silent, heterozygous, rare variants in 1,067 genes distributed across the genome, which would make prioritisation for downstream analysis a difficult task, highlighting the benefit of parent-offspring trio sequencing (Fig. S3). Sanger sequencing confirmed 14 true positive non-silent DNM (Table 1; Table S2), present in the SLE proband but absent in both parents and any unaffected siblings, in 11 of the 30 probands (36.7%) for further analysis. No DNM was found in any of the >80 known SLE-associated genes. Of the three false positive DNM (11.7%; Table S1) [...] (Table 1). We further explored the function, expression (BioGPS), existing autoimmunity associations (ImmunoBase), and gene-level constraint against missense mutations (ExAC) of the DNM genes to build a profile of a priori evidence of a role in SLE pathogenesis. None of the candidate genes have been previously associated with SLE through GWAS in any population (17). We also identify candidate genes through known/predicted function and expression profiles (C1QTNF4, SRRM2, HMSD), and four genes (PRKCD, DNMT3A, C1QTNF4 and LRP1) with a significant (Z>3.09) constraint against missense variants (Table 2). However, across the entire gene set, there was no difference in the median Z-score [...] (MAF<1%) exonic variants in SLE cases compared with healthy controls. We identify an association of PRKCD rare variants with SLE (Table S3; P=0.0028; ncases=4,036).
In sub-phenotype analyses, we identify collective rare exonic variants in DNMT3A associated with both anti-dsDNA (Table S3; P=0.0005; ncases=1,261) and renal involvement with hypocomplementemia (Table S3; P=0.0033; ncases=186), both of which are markers of more severe disease. We also collapsed all exons from the 14 genes together to test for an overall burden of rare variants across these loci. These analyses revealed no excess of rare exonic variants across the grouped genes, reflecting the hypothesis that some/most genes will not be relevant to disease status because the observed DNM are random background variation [...] Table 2). Although gene coding length does not correlate with missense constraint scores (15), the small (<1Kb) coding sequence of this candidate gene may have contributed to insufficient power to detect a rare variant association in the burden testing. At the variant level, the DNM in C1QTNF4 generates a p.His198Gln sequence change with a modest CADD score of 12.3 (Table 1). Although useful in the absence of suitable functional assays, the sensitivity of bioinformatic prediction tools is known to be suboptimal. Where functional assays are available, previous studies have also demonstrated functional effects of variants predicted to be tolerated/benign (21). We therefore pursued a functional analysis of the p.His198Gln DNM detected in the C1QTNF4 gene as an alternative method to add support for its potential role in disease. Although its function is rather poorly understood, the protein product, C1QTNF4 (CTRP4), is secreted and may act as a cytokine, as it has homology with TNF and the complement component C1q (Fig. 2). C1QTNF4 has been shown to influence [...] looked for an effect of the p.His198Gln mutation on NF-κB production.
Using a HEK293-NF-κB reporter cell line, we showed that C1QTNF4 p.His198Gln mutant protein was expressed and that it inhibited the NF-κB activation generated by exposure to TNF (Fig. 2). Furthermore, we showed that the fibroblast L929 cell line, which is sensitive to TNF-induced cell death, was rescued by exposure to C1QTNF4 p.His198Gln, but not by wild type C1QTNF4. Thus, the mutant form of C1QTNF4 appears to inhibit some of the actions of TNF (23-25).
DNM genes do not harbour common variant associations. We next tested for additional common variant associations at these 14 loci using the high-density UK10K-1000GP3 imputed data. No significant association at any locus was observed with overall risk in a case-control comparison, nor with the anti-dsDNA (ncases=1,261) or renal involvement with hypocomplementemia (ncases=186) sub-phenotypes (Table S4). The lack of an associated common variant within PRKCD and DNMT3A supports the hypothesis that discrete gene sets will be identified through rare and common variant associations, with the former expected to be enriched for core disease genes (3).
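The gene-level burden comparisons reported in these Results rest on collapsing rare (MAF<1%) exonic variants within a gene and asking whether cases carry more of them than controls. The following is a minimal sketch of that idea only. It assumes a simple carrier-collapsing scheme and substitutes a one-tailed Fisher's exact test for the Plink/SEQ burden test actually used in the study; all function names and the toy genotypes are hypothetical and are not data from the cohort.

```python
from math import comb


def is_rare_carrier(genotypes, mafs, maf_threshold=0.01):
    """True if an individual carries at least one alternate allele at any
    variant whose minor allele frequency is below the threshold.
    Genotypes are alternate-allele counts (0, 1, 2) or None if missing."""
    return any(
        g is not None and g > 0 and maf < maf_threshold
        for g, maf in zip(genotypes, mafs)
    )


def count_carriers(individuals, mafs, maf_threshold=0.01):
    """Collapse a gene's variants into a per-group count of rare-variant carriers."""
    return sum(is_rare_carrier(g, mafs, maf_threshold) for g in individuals)


def one_tailed_fisher(case_carriers, n_cases, control_carriers, n_controls):
    """One-tailed Fisher's exact P-value for an excess of carriers in cases:
    P(X >= case_carriers) under the hypergeometric null of no association."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, carriers)


# Toy example (hypothetical): three variants in one gene, the second is common.
mafs = [0.004, 0.2, 0.008]
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 2, None]]
controls = [[0, 1, 0], [0, 0, 0], [0, 2, 0], [None, 0, 0]]

p_value = one_tailed_fisher(
    count_carriers(cases, mafs), len(cases),
    count_carriers(controls, mafs), len(controls),
)
print(p_value)
```

The one-tailed form mirrors the directional question asked here (an excess of rare variants in cases). A permutation-based or covariate-adjusted burden test would be the more usual choice for a real cohort.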

Discussion
To fully understand the pathogenesis of complex diseases we must analyse the full frequency spectrum of genetic variants (4). The study of rare variants associated with disease is of paramount importance to the discovery of core genes that have the potential to be therapeutic targets (12). Our data support the omnigenic hypothesis that rare genetic risk may be found in a discrete set of non-canonical susceptibility genes, as we report an association of collective rare variation across PRKCD and DNMT3A, and found no evidence of an association with common variants across these loci. This, to the best of our knowledge, is the first WES study in polygenic cases of autoimmune disease to use DNM discovery to identify candidate genes for rare variant analyses. Furthermore, our study supports the [...]
[...] (Table S1). [...] 14 DNM genes plus a 2Mb flanking region. To increase the accuracy of imputed genotype [...] genotypes were filtered for confidence using an info score (IMPUTE2) threshold of 0.3 (Fig. S6 and S7). The most likely genotype from IMPUTE2 was taken if its probability was >0.5. If the probability fell below this threshold, the genotype was set as missing. Variants with >10% missing genotype calls were removed from further analysis. All individuals had <8% missing genotype data.
Rare variant burden tests. Imputed data were filtered, using Plink v1.9, to include only variants mapping to coding exons of hg19 RefSeq transcripts. Plink/SEQ v1.0 (20) was used to run gene-wise one-tailed burden testing with a MAF<1% threshold. A 5% false discovery rate was used for multiple testing correction for 14 genes.
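The genotype filtering and multiple-testing steps described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in the study. It assumes that variants with an info score at or above 0.3 are retained, that each genotype is given as three posterior probabilities (for 0, 1 and 2 copies of the alternate allele), and that the 5% false discovery rate is controlled with the Benjamini-Hochberg procedure, which the text does not specify. All function names and example values are hypothetical.

```python
def hard_call(probs, min_prob=0.5):
    """Return the most likely genotype (0, 1 or 2) if its posterior
    probability exceeds min_prob; otherwise None (set as missing)."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] > min_prob else None


def filter_variants(variants, min_info=0.3, max_missing=0.10):
    """Keep variants passing the info-score threshold (assumed >= min_info)
    whose hard-called genotypes have at most max_missing missing calls.
    `variants` maps a variant name to (info_score, list of probability triples)."""
    kept = {}
    for name, (info, prob_rows) in variants.items():
        if info < min_info:
            continue
        calls = [hard_call(p) for p in prob_rows]
        if calls.count(None) / len(calls) > max_missing:
            continue
        kept[name] = calls
    return kept


def benjamini_hochberg(pvalues, fdr=0.05):
    """Benjamini-Hochberg step-up procedure (an assumed choice of FDR method).
    Returns a list of booleans, True where the test is declared significant."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank  # largest rank meeting the BH criterion
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[i] = True
    return significant


# Toy example (hypothetical values, two individuals per variant).
toy = {
    "var_a": (0.9, [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]),  # kept
    "var_b": (0.2, [[0.9, 0.1, 0.0], [0.9, 0.1, 0.0]]),  # low info score
    "var_c": (0.8, [[0.4, 0.4, 0.2], [0.9, 0.1, 0.0]]),  # 50% missing calls
}
print(filter_variants(toy))
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

In the study, the correction was applied across the 14 gene-level burden tests, so `pvalues` would hold one P-value per gene.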