Abstract

Amyotrophic lateral sclerosis (ALS) is an unrelenting neurodegenerative condition characterized by adult-onset loss of motor neurons. Genetic risk factors have been implicated in ALS susceptibility. Copy number variants (CNVs) account for more inter-individual genetic variation than SNPs and have the capacity to alter gene dose and phenotype. We sought to identify the contribution of both commonly polymorphic CNVs and rare ALS-specific CNVs to sporadic ALS (SALS). Using high-density genome-wide data from 408 Irish individuals and 868 Dutch individuals and the QuantiSNP CNV-detection algorithm, we showed that no common CNV locus is significantly associated with ALS risk. However, we identified 39 recurrent CNV loci and 16 replicated ALS-specific gene dose alterations that occur exclusively in patients with ALS and are absent from the more than 11 000 previously identified CNVs in the Database of Genomic Variants. Ataxin genes and the hereditary haemochromatosis locus were implicated, along with ENSG00000176605, an uncharacterized gene on chromosome 14. Our data support the hypothesis that multiple rare CNVs may contribute risk for SALS. Future work should seek to profile the contribution of CNVs located in regions not covered on the present SNP platforms.

INTRODUCTION

Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disorder characterized by progressive weakness of the limbs and bulbar muscles. The peak age of presentation is in the fifth to seventh decades, and average survival from symptom onset is 3–5 years. At present, the only available therapy is riluzole, which increases survival by ∼3–6 months ( 1 ).

Between 2 and 10% of ALS cases are associated with familial, usually autosomal dominant, inheritance. For ∼20% of familial ALS, mutations have been identified in genes including Cu/Zn superoxide dismutase, dynactin 1, alsin, vesicle-associated membrane protein-associated protein B, senataxin, angiogenin and TAR DNA-binding protein (TDP-43) (2–9). The cause of sporadic ALS (SALS) is less well understood, although it is widely believed that genetic risk factors are involved (10). Indeed, over the past year genome-wide association studies for SALS have reported initial SNP associations with FLJ10986, ITPR2 and DPP6 (11–14), and it is likely that a clearer picture of the contribution of SNPs to ALS will emerge as sample numbers worldwide increase.

In addition to the contribution of SNPs to inter-individual variation across the genome, it is now recognized that insertions or deletions of DNA sequence, typically ranging from 1 kb to several megabases in size, also contribute to genetic variation and disease susceptibility (15). These are termed copy number variants (CNVs) and comprise between 4 and 25 Mb of an individual’s genome (16, 17). Thus CNVs account for more nucleotide variation between two individuals than SNPs, which account for only 2.5 Mb of the genome (17). As with SNPs, individual CNV loci may be commonly polymorphic within the population or may occur as inherited or de novo rare mutations (15–17). Loss (deletion) or gain (duplication) of genomic sequence can influence gene expression levels (18) or lead to truncated proteins with altered functionality, providing mechanisms by which CNVs may influence disease susceptibility. Many CNV phenotypes are already known: some exhibit Mendelian inheritance (for example, duplication of peripheral myelin protein 22 in autosomal dominant Charcot–Marie–Tooth neuropathy type 1A) (19), whereas others are sporadic and result from de novo CNV mutation (for example, in Williams syndrome) (20). Our previous observation that CNV of the survival of motor neuron (SMN) genes influences susceptibility to and severity of ALS is a precedent for a role of CNVs in ALS pathogenesis (21).

The recent development of robust, high-throughput genotyping arrays provides the opportunity to rapidly screen up to one million SNPs across the human genome. Genotyping on these arrays has been used in sporadic ALS (SALS) to examine allelic association at SNPs ( 11–14 ). However, in addition to genotyping calls, the arrays provide a binding intensity measure at each SNP ( 22 ). Low intensity of several contiguous SNPs suggests the presence of a deletion in that region, whereas high intensity at several contiguous SNPs suggests duplication. Together with genotype, these data can be used as an indirect measure to detect the positions of CNVs along each chromosome. This method will identify not only common CNV polymorphisms, but also rare and potentially de novo CNV mutations even when present in a single study participant ( 22 ).
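The idea of reading copy number from runs of contiguous SNP intensities can be illustrated with a minimal sketch. It is not the detection method used in this study (QuantiSNP is model-based and also uses genotype information); the function name `find_intensity_runs`, the thresholds of -0.3 and 0.3 and the minimum run length of three SNPs are hypothetical values chosen only for illustration.

```python
from typing import List, Optional, Tuple


def find_intensity_runs(
    lrr: List[float],
    low: float = -0.3,   # hypothetical threshold for "low intensity"
    high: float = 0.3,   # hypothetical threshold for "high intensity"
    min_snps: int = 3,   # hypothetical minimum number of contiguous SNPs
) -> List[Tuple[int, int, str]]:
    """Return (first_index, last_index, kind) for each run of at least
    `min_snps` contiguous SNPs with intensity below `low` ("loss") or
    above `high` ("gain")."""

    def label(value: float) -> Optional[str]:
        if value < low:
            return "loss"
        if value > high:
            return "gain"
        return None

    # A trailing None closes any run that reaches the end of the chromosome.
    labels = [label(v) for v in lrr] + [None]
    runs: List[Tuple[int, int, str]] = []
    start, current = 0, None
    for i, lab in enumerate(labels):
        if lab != current:
            if current is not None and i - start >= min_snps:
                runs.append((start, i - 1, current))
            start, current = i, lab
    return runs


if __name__ == "__main__":
    intensities = [0.0, 0.1, -0.6, -0.5, -0.7, 0.0, 0.5, 0.4,
                   0.05, 0.6, 0.5, 0.45, 0.0]
    print(find_intensity_runs(intensities))
```

In this toy input the two-SNP elevation at positions 6 and 7 is ignored because it is shorter than the minimum run length, which mirrors the requirement for several contiguous SNPs before a CNV is suspected.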

We have recently reported a genome-wide screen of CNVs in a Dutch SALS data set, in which we found an excess of hemizygous gene deletions specific to ALS ( 23 ). In the present work, we adopted a collaborative approach to further characterize the contribution of CNVs to ALS by using genome-wide SNP intensity data from the Irish and Dutch ALS populations and the well-validated CNV detection algorithm QuantiSNP, which can robustly detect even small CNVs ( 24 ). We examined the association of common CNVs with ALS susceptibility and made an effort to identify rare ALS-specific genomic variants.

RESULTS

We genotyped 432 unique Irish individuals, using the Illumina HumanHap 550 Beadchip (Illumina Inc., San Diego, CA, USA), and 911 unique Dutch individuals, using the Illumina HumanHap 300 Beadchip. Genotyping methods and quality control (QC) criteria have been described previously ( 12 , 14 ). B allele frequency (BAF) and log r ratio (LRR) for each SNP were recorded using BeadStudio version 3.0 (Illumina Inc.), and the QuantiSNP algorithm ( 24 ) was utilized to make CNV calls. After applying the QC criteria described in Materials and Methods, genome-wide CNV data of 408 Irish individuals (comprising 206 patients with SALS and 202 control subjects) and 868 individuals from The Netherlands (comprising 445 patients with SALS and 423 control subjects) were included in the final analysis. Table  1 shows the characteristics of the study participants.

Table 1.

Characteristics of the Irish and The Netherlands study populations

| Population | Group | Total | Male, n (%) | Female, n (%) | Age at onset, median (range) | Spinal onset, n (%) | Bulbar onset, n (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ireland | Patients with SALS | 206 | 112 (54) | 94 (46) | 62 (23–90) | 147 (71) | 59 (29) |
| Ireland | Controls | 202 | 108 (53) | 94 (47) | 59 (25–84) | - | - |
| The Netherlands | Patients with SALS | 445 | 262 (59) | 183 (41) | 62 (23–87) | 306 (69) | 139 (31) |
| The Netherlands | Controls | 423 | 249 (59) | 174 (41) | 61 (23–87) | - | - |

In total, 5017 CNVs were called using 550 K array data in the 408 Irish participants (on average 12.3 CNVs per genome) and 4103 CNVs were called using 300 K array data in the 868 Dutch participants (on average 4.7 CNVs per genome). Table  2 shows, for each CNV type, the mean number of calls per individual and the median CNV size. As was expected, the algorithm detected more CNVs per individual, using the 550 K data compared with the 300 K data, due to the higher marker density on the 550 K platform. However, the number of CNV calls per individual did not differ between patients and controls within the Irish or Dutch study group. Interestingly, the median length of hemizygous deletions was greater in patients with ALS compared with controls in both populations. In contrast, the length of duplications and homozygous deletions did not differ between groups.

Table 2.

Characteristics of detected CNVs

| CNV type | Measure | Ireland (550 K array data): ALS | Ireland: Control | Ireland: P-value a | The Netherlands (300 K array data): ALS | The Netherlands: Control | The Netherlands: P-value a |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Duplications | Total | 689 | 726 | | 891 | 883 | |
| Duplications | Mean per individual (range) | 3.3 (0–9) | 3.6 (0–9) | 0.171 | 2.0 (0–6) | 2.2 (0–6) | 0.428 |
| Duplications | Median size, kb (range) | 76.4 (0.1–3261.7) | 71.8 (0.1–1578.9) | 0.602 | 93.4 (0.2–2534.6) | 98.7 (0–2914.7) | 0.527 |
| Heterozygous deletions | Total | 1549 | 1699 | | 1177 | 1079 | |
| Heterozygous deletions | Mean per individual (range) | 7.6 (0–16) | 8.4 (1–18) | 0.006 | 2.71 (0–8) | 2.63 (0–8) | 0.467 |
| Heterozygous deletions | Median size, kb (range) | 21.9 (0.4–214.8) | 17.4 (0.2–214.8) | <0.0001 | 44.1 (0.6–2240.6) | 37.6 (0.6–1448.1) | 0.011 |
| Homozygous deletions | Total | 188 | 166 | | 44 | 29 | |
| Homozygous deletions | Mean per individual (range) | 0.91 (0–7) | 0.82 (0–5) | 0.59 | 0.10 (0–2) | 0.07 (0–1) | 0.242 |
| Homozygous deletions | Median size, kb (range) | 8.8 (0.1–220.1) | 5.99 (0.1–606.1) | 0.609 | 12.5 (0.2–1626.9) | 24.1 (0–190.3) | 0.632 |

a Two-tailed Mann–Whitney U test.
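For readers unfamiliar with the test named in the footnote, the following is a minimal pure-Python sketch of a two-tailed Mann–Whitney U test using the normal approximation with a tie correction and no continuity correction. It is offered only as an illustration: the analyses reported here were run in SPSS, JMP and R, and the function name `mann_whitney_u` and the toy inputs are assumptions. The normal approximation is rough for very small samples.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for x, two-tailed P-value from the normal
    approximation with tie correction, no continuity correction)."""
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2

    # Assign average ranks to tied values and accumulate the tie term.
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u_x, 1.0  # all observations identical: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(variance)
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_tailed


if __name__ == "__main__":
    # Hypothetical deletion sizes (kb) in two groups.
    print(mann_whitney_u([21.9, 30.1, 44.0, 18.2], [17.4, 12.0, 16.5, 20.3]))
```

A rank-based test of this kind is a natural choice here because CNV sizes and per-individual counts are strongly skewed, as the ranges in Table 2 indicate.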

To further assess the role of CNVs in ALS, we pooled CNV data from the two study populations and examined both the association of common CNVs with ALS and the occurrence of novel CNVs. Because breakpoints of CNVs at the same locus may vary between individuals, we examined gain and loss of copy number at each involved SNP and tested each SNP for association with ALS. We detected 26 SNP copy number gains, corresponding to five chromosomal regions (Supplementary Material, Table S1), and 58 SNP copy number losses, corresponding to 16 chromosomal regions (Supplementary Material, Table S2), which showed nominal association with ALS at P-values <0.05 in the pooled data set. However, after Bonferroni correction, none of these remained significantly associated.
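The effect of the Bonferroni correction on nominally associated SNPs can be sketched as follows. The number of tests and the P-values in the example are hypothetical; the actual number of SNPs tested is not restated here, and `bonferroni_significant` is an illustrative name.

```python
from typing import List, Sequence


def bonferroni_significant(
    p_values: Sequence[float], n_tests: int, alpha: float = 0.05
) -> List[bool]:
    """Flag P-values that remain significant when the family-wise error
    rate `alpha` is divided by the total number of tests performed."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    threshold = alpha / n_tests
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical example: three nominal hits among 1000 tested SNPs.
    nominal_hits = [0.04, 0.001, 0.00001]
    print(bonferroni_significant(nominal_hits, n_tests=1000))
```

Note that `n_tests` should be the number of all SNPs tested, not only those with nominal P-values <0.05, which is why a nominally associated SNP can readily fail the corrected threshold.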

To identify novel CNVs in our combined data set, we compared our data with the Database of Genomic Variants (DGV), available at http://projects.tcag.ca/variation , which contains more than 11 000 CNVs detected in more than 2500 individuals ( 15 ). To avoid inclusion of false positives, we searched for recurrent novel CNVs, i.e. overlapping CNVs that were present in at least two individuals. We detected 89 recurrent CNVs mapping to 39 novel genomic loci that were exclusive to ALS patients (i.e. at CNV loci that had not been previously reported in the DGV, nor in the controls in our study) ( Supplementary Material, Table S3 ). In 14 of these 39 recurrent CNV loci (36%), a CNV was detected in both Irish and Dutch ALS patients.
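The search for recurrent, novel, ALS-specific loci can be sketched as an interval-overlap problem. The sketch below assumes that previously reported CNVs (for example from the DGV) and CNVs seen in controls are supplied together as one reference list, and that overlapping patient CNVs are merged by simple chaining; the `CNV` record, the function names and the toy coordinates are all hypothetical and do not reproduce the actual pipeline.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class CNV(NamedTuple):
    sample: str
    chrom: str
    start: int
    end: int


def overlaps(a: CNV, b: CNV) -> bool:
    """True when two CNVs on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def recurrent_novel_loci(
    case_cnvs: List[CNV], reference_cnvs: List[CNV], min_individuals: int = 2
) -> List[Tuple[str, int, int, List[str]]]:
    """Return (chrom, start, end, samples) for loci where overlapping CNVs
    occur in at least `min_individuals` patients and overlap no reference CNV."""
    novel = [c for c in case_cnvs
             if not any(overlaps(c, r) for r in reference_cnvs)]

    by_chrom: Dict[str, List[CNV]] = defaultdict(list)
    for cnv in novel:
        by_chrom[cnv.chrom].append(cnv)

    loci = []
    for chrom, cnvs in by_chrom.items():
        cnvs.sort(key=lambda c: c.start)
        cluster, cluster_end = [cnvs[0]], cnvs[0].end
        for cnv in cnvs[1:]:
            if cnv.start <= cluster_end:       # overlaps the current cluster
                cluster.append(cnv)
                cluster_end = max(cluster_end, cnv.end)
            else:                              # gap: close the cluster
                loci.append(cluster)
                cluster, cluster_end = [cnv], cnv.end
        loci.append(cluster)

    result = []
    for cluster in loci:
        samples = sorted({c.sample for c in cluster})
        if len(samples) >= min_individuals:
            result.append((cluster[0].chrom, cluster[0].start,
                           max(c.end for c in cluster), samples))
    return result


if __name__ == "__main__":
    patients = [CNV("p1", "6", 100, 200), CNV("p2", "6", 150, 250),
                CNV("p3", "6", 1000, 1100), CNV("p4", "14", 500, 600)]
    known = [CNV("reference", "14", 580, 700)]
    print(recurrent_novel_loci(patients, known))
```

Requiring at least two distinct individuals per locus reflects the rationale given above: a CNV seen only once is more likely to be a false-positive call.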

Finally, we investigated genes that were deleted or duplicated exclusively in ALS patients (i.e. not altered in any controls in our populations and not reported in the DGV). We corroborated our previously reported finding of an excess of ALS-specific deleted genes in the Dutch population compared with controls: 23% of deleted genes in ALS patients were exclusive to ALS patients, whereas only 14% of deleted genes observed in controls were exclusive to controls (P = 5.4 × 10⁻⁵, χ² test). We did not observe this in the Irish population, where the proportion of deleted genes exclusive to one group was 22% in both ALS patients and controls. We identified 16 genes bearing loss or gain of copy number exclusively among ALS patients, replicated in both the Irish and Dutch analyses (Table 3). Among these, there was one uncharacterized gene on chromosome 14 (ENSG00000176605), where a hemizygous deletion was observed in six individual patients. Two neighbouring genes on chromosome 8 showed copy loss in five ALS patients and copy gain in one patient. The remaining ALS-specific genes showed deletions or duplications in two or three patients.

Table 3.

Genes found to have ALS-specific alteration of copy number, replicated in both the Irish and Dutch populations

| Ensembl gene ID a | Gene | Putative function | Chr | Ireland: number of ALS patients | Ireland: copy number phenotype | The Netherlands: number of ALS patients | The Netherlands: copy number phenotype | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ENSG00000176605 | C14orf177 | Uncharacterized | 14 | | Loss | | Loss | |
| ENSG00000104518 | GSDMDC1 | Tyrosine phosphorylation | | | Gain and loss | | Loss | |
| ENSG00000173201 | | Uncharacterized | | | Gain and loss | | Loss | |
| ENSG00000101846 | STS | Arylsulphatase | | 1 (female) | Gain | 2 (males) | Gain and loss | |
| ENSG00000130021 | HDHD1A | Uncharacterized | | 1 (female) | Gain | 2 (males) | Gain and loss | |
| ENSG00000134538 | SLCO1B1 | Xenobiotic and drug detoxification solute carrier | 12 | | Loss | | Gain | |
| ENSG00000077713 | SLC25A43 | Mitochondrial solute carrier widely expressed in CNS | | 1 (male) | Loss | 1 (male) | Loss | |
| ENSG00000123454 | DBH | Converts dopamine to noradrenalin in nerve cells | | | Loss | | Loss | |
| ENSG00000151963 | | Uncharacterized | 12 | | Gain | | Gain | |
| ENSG00000156110 | ADK | Catalyses phosphate transfer to adenosine | 10 | | Loss | | Loss | |
| ENSG00000174373 | GARNL1 | Variants previously implicated in microcephaly and Fahr disease | 14 | | Loss | | Loss | |
| ENSG00000186440 | OR6P1 | Olfactory receptor gene family member | | | Gain | | Gain | |
| ENSG00000201674 | | Uncharacterized | | 1 (female) | Loss | 1 (male) | Gain | |
| ENSG00000207220 | | Uncharacterized | | 1 (female) | Loss | 1 (male) | Gain | |
| ENSG00000208843 | | Uncharacterized | | 1 (female) | Loss | 1 (male) | Gain | |
| ENSG00000212611 | | Uncharacterized | | 1 (female) | Loss | 1 (male) | Gain | |

a These 16 genes showed copy number variation only in ALS patients and were observed in both the Irish and Dutch ALS populations. Copy number variation involving these genes was not observed in the 625 Irish and Dutch controls, nor among >2500 controls reported in the DGV.

DISCUSSION

We conducted a genome-wide search for ALS-associated and ALS-specific CNVs in 651 patients with SALS and 625 control subjects drawn from the Irish and Dutch populations. We identified CNV loci exhibiting recurrent CNV uniquely in patients with ALS and identified genes with replicated ALS-specific dose alterations between our populations.

The recent advent of genome-wide approaches, facilitating characterization of the genome at much higher resolution than hitherto, has revealed the presence of a rich mosaic of submicroscopic chromosomal alterations, including segments that are deleted, duplicated, inverted in orientation, inserted and translocated ( 15–18 , 25 ). The influence of these genetic variants on human disease remains understudied, and it has been suggested that many smaller variations remain to be discovered ( 26–28 ). To follow our initial characterization of CNVs in ALS ( 23 ), here we utilized QuantiSNP, a validated and sensitive algorithm for CNV detection using SNP array data, and describe the potential contribution of CNVs to ALS pathogenesis at higher resolution than hitherto and in two independent populations.

We took two approaches to uncover rare CNV mutations of potential relevance to ALS. The first, based on overlapping CNVs, implicated 39 genomic loci as novel, recurrent and ALS-specific. Several of the genes affected by these variations are biologically plausible candidates in ALS susceptibility, including ataxin 1 (ATXN1), ataxin 3-like protein (ATXN3L) and the hereditary haemochromatosis gene (HFE). For example, variations in HFE have been reported to significantly increase the risk of SALS in a number of different populations (29). We particularly highlight the deletion of ATXN1 in two Irish SALS patients and a duplication of the related gene ATXN3L in a Dutch SALS patient. An expanded polyglutamine repeat in ATXN1 is the cause of autosomal dominant spinocerebellar ataxia type 1 (SCA1) (30), a neurological disease with phenotypic similarities to ALS. Over-expression of polyglutamine-binding protein 1 (PQBP-1), which interacts with ATXN1, induces a progressive motor neuron disease phenotype in mice (31). Furthermore, we have recently noted strong association of a variant in an ataxin-related gene in our SALS populations (12, 14). rs1551960, an intronic marker in ataxin-2-binding protein 1 (A2BP1), showed an allelic association in both the Irish genome-wide SNP study (uncorrected P-value 4.7 × 10⁻⁵) and the Dutch genome-wide SNP study (uncorrected P-value 0.007). The combined P-value for this SNP in the two populations is 8.4 × 10⁻⁶. A2BP1 is a trans-Golgi network protein which binds the C-terminus of ataxin 2 and modifies the pathology of spinocerebellar ataxia type 2 (32, 33). Taken together, our findings raise the intriguing possibility that the ataxins and functionally related genes may influence susceptibility for ALS.

Our second approach to identify rare variants focused on gene dose alterations that were exclusively observed in ALS patients. We previously reported an excess of ALS-specific deleted genes in the Dutch population. This finding was corroborated when re-analysing our data with QuantiSNP, making a spurious finding due to CNV detection methodology unlikely. However, we did not observe a similar finding in the Irish population. This could be explained by sampling issues due to the relatively low number of patients, or by genetic heterogeneity between the two populations. Our previously reported finding of extended linkage disequilibrium and increased genetic homogeneity in the Irish population could support the latter (14). Because the finding of a rare variant in a disease population can be explained by chance, we identified genes that showed a gene dose alteration exclusively in ALS patients and that replicated in both the Irish and Dutch populations. Moreover, deletion CNVs were larger in ALS patients than in controls in both populations, which may suggest that deletions in ALS patients are more detrimental. Of the 16 replicated gene alterations, seven have an identical effect on gene dose in each population. Among these, the most compelling result, being present in six ALS patients, was hemizygous loss of ENSG00000176605, a gene of unknown expression and function.

Common CNV polymorphisms seem not to alter risk for developing ALS. However, as the present generation of genotyping arrays does not include regions which are technically difficult to genotype or prone to genomic rearrangements, there is a bias in coverage away from areas with the highest CNV polymorphism rate (22). For example, the previously reported SMN genes are not covered by the array. For this reason, we have focused on the discovery of rare and potentially pathogenic CNVs. Although it is possible that these ALS-specific CNVs may ultimately also be found in controls, we reported putative mutations only where an observation was made in two or more individuals with ALS, and we further censored our data for CNVs reported in the DGV. As ALS is a late-onset disease, we cannot be certain whether the use of the DGV in this way might have excluded ALS-related CNVs which were ascertained in younger individuals prior to disease onset. Similarly, the late onset of ALS precludes genotyping of the parents of affected individuals to differentiate inherited from de novo CNVs, as has been accomplished, for example, in CNV studies on autism (34).

Genomic structural variants of all sizes and types may act as substrates for disease risk ( 26 ). Reliance on commercial SNP arrays for the detection of these variations thus has limitations which must be considered when interpreting our findings. First, it is recognized that the lack of ‘breakpoint resolution’ on these arrays means that detected CNVs may include more genomic sequence than between the first and final involved SNP ( 26 ). This prevents delineation of the true extent of any given CNV and hence valid allele frequency counts. In addition, the array data cannot detect the potential influence of balanced rearrangements, such as translocations and segment inversions. Finally, owing to the low density of Y chromosome markers, no CNV calls are made for the Y chromosome.

In conclusion, we further characterized the contribution of genomic CNVs in SALS in the Irish and Dutch populations. Using the presently available sample size and genotyping platforms, no common CNVs appear associated with SALS risk. Our data support our previous finding that rare CNVs may have an important role in ALS pathogenesis. Future studies will be necessary to confirm these observations in additional populations and to explore those genomic structural variants, which by their location or nature have not yet been addressed in ALS research.

MATERIALS AND METHODS

Participants

The final Irish study population comprised 206 ALS patients and 202 controls; the final Dutch population comprised 445 ALS patients and 423 controls (Table 1). The Irish DNA samples were collected at the Beaumont Hospital in Dublin, Ireland, and the Dutch DNA samples were collected at the ALS centre in the University Hospital in Utrecht, The Netherlands. In both Ireland and The Netherlands, all patients fulfilled the 1994 El Escorial criteria (35) for probable or definite SALS and were phenotyped by physicians with expertise in ALS. Patients with a family history of ALS were excluded from the study. For QC, participants with El Escorial ‘possible ALS’ and those with atypical clinical features were not included. At both centres, patients are followed prospectively and any subjects whose diagnosis is revised to an alternative diagnosis are removed. However, no patients genotyped in the present experiments were removed for these reasons. Control DNA samples were collected from healthy, unrelated, neurologically normal individuals, either spouses of ALS patients or those accompanying non-ALS patients. Controls were frequency-matched for age, gender and ethnicity. In the Irish population, all participants were of self-declared Irish-Caucasian ethnicity for at least three generations. The Dutch participants were of self-declared Dutch descent, with all four grandparents originating from The Netherlands. All participants gave written informed consent, and local ethics review boards approved all procedures.

Procedures

Genomic DNA was isolated from fresh venous blood using standard procedures and used in this form for both the Irish and Dutch experiments. No samples had been immortalized in cell lines prior to assay. The Irish samples were genotyped using Illumina HumanHap 550 Beadchips, which assay 561 466 SNPs. The Dutch samples were genotyped using Illumina HumanHap 300 Beadchips, which assay 317 503 SNPs. All experiments were performed according to the manufacturer’s protocol. Stringent QC was applied to the samples in our study as described previously (12, 14): in particular, samples with call rates <97.5% (Irish population) or <95% (Dutch population) and samples that showed cryptic relatedness were excluded.

CNV detection

Putative CNVs were identified using the QuantiSNP algorithm (24). In brief, QuantiSNP is a hidden Markov model-based algorithm for high-resolution CNV detection that uses log r ratio (LRR) values (a normalized measure of the total signal intensity for each SNP) and BAF values (a normalized measure of the proportion of the signal at each SNP that comes from the B allele) to generate CNV calls. As a measure of confidence, a Bayes factor is computed for each CNV. A correction for local differences in GC content is implemented in the algorithm to correct for irregularities in signal intensity.
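To illustrate the general idea of hidden Markov model-based segmentation, the following toy Viterbi decoder assigns each SNP to a loss, normal or gain state from LRR values alone. It is a deliberately simplified sketch and not the QuantiSNP model: it ignores BAF, GC correction and Bayes factors, and the state means, standard deviation and transition probability are hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence

STATES = ("loss", "normal", "gain")
MEANS = {"loss": -0.6, "normal": 0.0, "gain": 0.4}  # hypothetical LRR means
SD = 0.2          # hypothetical emission standard deviation
P_STAY = 0.99     # hypothetical probability of remaining in the same state


def log_emission(x: float, state: str) -> float:
    """Log density of a Gaussian emission for one LRR value."""
    z = (x - MEANS[state]) / SD
    return -0.5 * z * z - math.log(SD * math.sqrt(2.0 * math.pi))


def log_transition(previous: str, current: str) -> float:
    if previous == current:
        return math.log(P_STAY)
    return math.log((1.0 - P_STAY) / (len(STATES) - 1))


def viterbi(lrr: Sequence[float]) -> List[str]:
    """Return the most probable state sequence for a series of LRR values."""
    if not lrr:
        return []
    score = {s: math.log(1.0 / len(STATES)) + log_emission(lrr[0], s)
             for s in STATES}
    backpointers = []
    for x in lrr[1:]:
        new_score, pointers = {}, {}
        for current in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + log_transition(p, current))
            new_score[current] = (score[best_prev]
                                  + log_transition(best_prev, current)
                                  + log_emission(x, current))
            pointers[current] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    values = [0.0] * 5 + [-0.6] * 5 + [0.0] * 5
    print(viterbi(values))
```

The high self-transition probability is what makes the model favour contiguous runs of SNPs over isolated deviations, which is the property that motivates a hidden Markov model for CNV calling.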

We computed LRR and BAF values with the BeadStudio v3.0 genotyping module (Illumina Inc.) and analysed these with QuantiSNP using default settings. We discarded all CNV calls with a Bayes factor <10 (24). To minimize the number of false-positive calls, we also excluded samples that fulfilled any one of the following criteria: a standard deviation of LRR > 0.3 after GC correction; a standard deviation of BAF > 0.15 after GC correction; or an outlying number of CNV calls [defined as exceeding the upper quartile + 1.5 × (interquartile range)]. In total, 26 samples from the Irish study and 43 samples from the Dutch study failed QC criteria. The proportion of excluded samples did not differ significantly between cases and controls in either population (P = 0.25 in the Irish population, P = 0.07 in the Dutch population, χ² test). Finally, we excluded CNV calls that spanned the centromere, because the low density of probes in these regions leads to inaccurately called CNVs.
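The call-level and sample-level QC criteria described above can be expressed compactly. The sketch below uses the thresholds stated in the text (Bayes factor of 10, standard deviation of LRR of 0.3, standard deviation of BAF of 0.15, and upper quartile + 1.5 × interquartile range for the number of calls); the dictionary field names and function names are hypothetical, and the quartile definition used by Python's `statistics.quantiles` may differ slightly from that of the software originally used.

```python
import statistics
from typing import Dict, List, Sequence


def call_count_cutoff(calls_per_sample: Sequence[float]) -> float:
    """Upper outlier bound: upper quartile + 1.5 x interquartile range."""
    q1, _median, q3 = statistics.quantiles(calls_per_sample, n=4)
    return q3 + 1.5 * (q3 - q1)


def passing_samples(
    samples: List[Dict],
    max_sd_lrr: float = 0.3,
    max_sd_baf: float = 0.15,
) -> List[str]:
    """Return the ids of samples that meet all three sample-level criteria.
    Each sample is a dict with hypothetical keys: id, sd_lrr, sd_baf, n_calls."""
    cutoff = call_count_cutoff([s["n_calls"] for s in samples])
    return [
        s["id"] for s in samples
        if s["sd_lrr"] <= max_sd_lrr
        and s["sd_baf"] <= max_sd_baf
        and s["n_calls"] <= cutoff
    ]


def confident_calls(calls: List[Dict], min_bayes_factor: float = 10.0) -> List[Dict]:
    """Keep CNV calls whose Bayes factor is at least the cut-off."""
    return [c for c in calls if c["bayes_factor"] >= min_bayes_factor]


if __name__ == "__main__":
    counts = [4, 5, 5, 6, 6, 7, 7, 8, 30]
    demo = [{"id": f"s{i}", "sd_lrr": 0.2, "sd_baf": 0.05, "n_calls": n}
            for i, n in enumerate(counts)]
    print(passing_samples(demo))  # the sample with 30 calls is dropped
```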

We determined false-positive and false-negative rates using two methods. First, we performed replicate experiments on five samples that were genotyped twice and analysed the data using QuantiSNP. In three samples genotyped on the 300 K platform, 18 CNVs were detected with a Bayes factor >10. In the replicates, 16 of these 18 CNVs (89%) were called with a Bayes factor >10, one CNV (6%) was called with a Bayes factor <10 and one CNV (6%) was not detected. In two samples genotyped on the 550 K platform, 15 CNVs were called with a Bayes factor >10. In the two replicates, 13 of these (87%) were called with a Bayes factor >10, one (7%) was called with a Bayes factor <10 and one (7%) was not detected.

As a second method to determine the performance of the algorithm, we applied the algorithm on data sets of simulated CNVs. For this purpose, we constructed artificial CNVs from LRR and BAF values from three large, qPCR-validated CNVs: a one-copy duplication, a one-copy (hemizygous) deletion and a two-copy (homozygous) deletion. We constructed chromosomes of randomly shuffled diploid SNPs and inserted series varying from 2 to 10 SNPs from the biologically confirmed CNVs. This way we simulated 500 CNVs per CNV length for each CNV type. We applied QuantiSNP on the simulated data and determined the false-positive and false-negative counts. At the Bayes factor cut-off value of 10, QuantiSNP detected zero false-positive CNVs per 860 085 SNPs (45 simulated chromosomes of 19 113 SNPs). The false-negative readings for each CNV length are depicted in Figure  1 .

Figure 1.

Performance of the CNV detection algorithm QuantiSNP. CNVs of different lengths were simulated by inserting LRR and BAF values from validated CNVs into artificial chromosomes of randomly shuffled diploid SNPs. Homozygous deletions (zero copy), hemizygous deletions (one copy) and hemizygous duplications (three copies) were modelled. For each CNV length, five chromosomes containing 100 CNVs were constructed. Data points are the mean percentage of detected CNVs ± standard deviation of five chromosomes.
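The construction of the simulated chromosomes can be sketched as follows, under stated assumptions: each SNP is represented as an (LRR, BAF) pair, the diploid background is shuffled with a fixed seed, and CNV segments of a chosen length are copied from a validated CNV and inserted at regular spacing. The function names, the spacing parameter and the toy values are hypothetical; the original simulation code is not described at this level of detail.

```python
import random
from typing import List, Sequence, Tuple

Snp = Tuple[float, float]          # (LRR, BAF) for one SNP
Interval = Tuple[int, int]         # inclusive first and last SNP index


def simulate_chromosome(
    diploid_snps: Sequence[Snp],
    cnv_snps: Sequence[Snp],
    cnv_length: int,
    n_cnvs: int,
    spacing: int,
    seed: int = 0,
) -> Tuple[List[Snp], List[Interval]]:
    """Shuffle the diploid background and insert `n_cnvs` segments of
    `cnv_length` SNPs taken from a validated CNV, each preceded by
    `spacing` background SNPs. Returns the chromosome and true intervals."""
    if cnv_length > len(cnv_snps):
        raise ValueError("validated CNV is shorter than requested length")
    if n_cnvs * spacing > len(diploid_snps):
        raise ValueError("not enough background SNPs for this layout")

    rng = random.Random(seed)
    background = list(diploid_snps)
    rng.shuffle(background)

    chromosome: List[Snp] = []
    truth: List[Interval] = []
    used = 0
    for _ in range(n_cnvs):
        chromosome.extend(background[used:used + spacing])
        used += spacing
        start = len(chromosome)
        offset = rng.randrange(len(cnv_snps) - cnv_length + 1)
        chromosome.extend(cnv_snps[offset:offset + cnv_length])
        truth.append((start, start + cnv_length - 1))
    chromosome.extend(background[used:])
    return chromosome, truth


def false_negative_rate(truth: Sequence[Interval], calls: Sequence[Interval]) -> float:
    """Fraction of simulated CNVs not overlapped by any call."""
    if not truth:
        return 0.0
    missed = sum(
        1 for t_start, t_end in truth
        if not any(c_start <= t_end and t_start <= c_end for c_start, c_end in calls)
    )
    return missed / len(truth)


if __name__ == "__main__":
    background = [(0.0, 0.5)] * 100
    deletion = [(-0.6, 0.0)] * 10
    chrom, truth = simulate_chromosome(background, deletion, 4, 3, 20)
    print(len(chrom), truth)
```

Recording the true intervals at construction time is what allows detected calls to be scored against the simulation for each CNV length, as plotted in Figure 1.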

Statistical analysis

All statistical analyses were computed using SPSS 16.0, JMP v6.0 (SAS Inc.) and R. For comparisons of CNV lengths and number of CNVs per individual, we used the Mann–Whitney U test because of the non-normality of the data. We ran the program STRUCTURE version 2.0 (36) using 4118 pseudo-randomly selected unlinked markers and identified structure differences between our populations. We did not detect population stratification between the pooled case and pooled control groups, which could otherwise lead to spurious associations (Supplementary Material, Figure). To identify associated CNV regions, we performed the exact Cochran–Mantel–Haenszel χ² test per SNP using pooled Irish and Dutch data and Fisher’s exact test for separate populations. To correct for multiple testing, we used the Bonferroni correction. For the rare variation analysis, CNVs were defined as recurrent when they overlapped in at least two individuals.
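The stratified test used for the pooled analysis can be illustrated with a minimal sketch of the Cochran–Mantel–Haenszel statistic for a set of 2 × 2 tables, one per population. Note the assumptions: this is the asymptotic (chi-square, 1 degree of freedom) form without continuity correction, not the exact version used in the analysis, and the function name and the counts in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One stratum: ((cases with CNV, cases without), (controls with, controls without))
Table = Tuple[Tuple[int, int], Tuple[int, int]]


def cmh_test(strata: Sequence[Table]) -> Tuple[float, float]:
    """Return (CMH chi-square statistic, asymptotic P-value with 1 d.f.)."""
    observed = expected = variance = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    statistic = (observed - expected) ** 2 / variance
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical copy-number-loss counts at one SNP in two populations.
    irish = ((20, 10), (10, 20))
    dutch = ((20, 10), (10, 20))
    print(cmh_test([irish, dutch]))
```

Stratifying by population in this way lets the Irish and Dutch data be pooled without a difference in CNV frequency between the two populations being mistaken for an association with ALS.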

SUPPLEMENTARY MATERIAL

Supplementary Material is available at HMG Online .

FUNDING

This research was funded by the Muscular Dystrophy Association (USA) (S.C., D.G.B., O.H.), by the Health Research Board of Ireland (O.H.), by the Irish Institute of Clinical Neuroscience Travel Award (S.C.), by the Irish Motor Neurone Disease Research Foundation, by the Prinses Beatrix Fonds, the Kersten Foundation, The Netherlands Organisation for Scientific Research, the Adessium Foundation and VSB Fonds (L.H.v.d.B.) and by The Brain Foundation of The Netherlands (J.H.V.).

ACKNOWLEDGEMENTS

We sincerely thank the patients, their relatives and referring neurologists for their participation. The original genotyping experiments for the Irish data were conducted at the NIH-funded laboratory of Drs Singleton, Traynor and Hardy (Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA).

Conflict of Interest statement . None declared.

REFERENCES

1. Lacomblez L., Bensimon G., Leigh P.N., Guillet P., Meininger V. Dose-ranging study of riluzole in amyotrophic lateral sclerosis. Amyotrophic Lateral Sclerosis/Riluzole Study Group II. Lancet, 1996, vol. 347 (pg. 1425-1431)
2. Pasinelli P., Brown R.H. Molecular biology of amyotrophic lateral sclerosis: insights from genetics. Nat. Rev. Neurosci., 2006, vol. 7 (pg. 710-723)
3. Rosen D.R., Siddique T., Patterson D., Figlewicz D.A., Sapp P., Hentati A., Donaldson D., Goto J., O’Regan J.P., Deng H.X., et al. Mutations in Cu/Zn superoxide dismutase gene are associated with familial amyotrophic lateral sclerosis. Nature, 1993, vol. 362 (pg. 59-62)
4. Puls I., Jonnakuty C., LaMonte B.H., Holzbaur E.L., Tokito M., Mann E., Floeter M.K., Bidus K., Drayna D., Oh S.J., et al. Mutant dynactin in motorneuron disease. Nat. Genet., 2003, vol. 33 (pg. 455-456)
5. Yang Y., Hentati A., Deng H.X., Dabbagh O., Sasaki T., Hirano M., Hung W.Y., Ouahchi K., Yan J., Azim A.C., et al. The gene encoding alsin, a protein with three guanine-nucleotide exchange factor domains, is mutated in a form of recessive amyotrophic lateral sclerosis. Nat. Genet., 2001, vol. 29 (pg. 160-165)
6. Nishimura A.L., Mitne-Neto M., Silva H.C., Richieri-Costa A., Middleton S., Cascio D., Kok F., Oliveira J.R., Gillingwater T., Webb J., et al. A mutation in the vesicle-trafficking protein VAPB causes late-onset spinal muscular atrophy and amyotrophic lateral sclerosis. Am. J. Hum. Genet., 2004, vol. 75 (pg. 822-831)
7. Chen Y.Z., Bennett C.L., Huynh H.M., Blair I.P., Puls I., Irobi J., Dierick I., Abel A., Kennerson M.L., Rabin B.A., et al. DNA/RNA helicase gene mutations in a form of juvenile amyotrophic lateral sclerosis (ALS4). Am. J. Hum. Genet., 2004, vol. 74 (pg. 1128-1135)
8. Greenway M.J., Andersen P.M., Russ C., Ennis S., Cashman S., Donaghy C., Patterson V., Swingler R., Kieran D., Prehn J., et al. ANG mutations segregate with familial and ‘sporadic’ amyotrophic lateral sclerosis. Nat. Genet., 2006, vol. 38 (pg. 411-413)
9. Sreedharan J., Blair I.P., Tripathi V.B., Hu X., Vance C., Rogelj B., Ackerley S., Durnall J.C., Williams K.L., Buratti E., et al. TDP-43 mutations in familial and sporadic amyotrophic lateral sclerosis. Science, 2008, vol. 319 (pg. 1668-1672)
10. Schymick J.C., Talbot K., Traynor B.J. Genetics of sporadic amyotrophic lateral sclerosis. Hum. Mol. Genet., 2007, vol. 16 (pg. R233-R242)
11. Dunckley T., Huentelman M.J., Craig D.W., Pearson J.V., Szelinger S., Joshipura K., Halperin R.F., Stamper C., Jensen K.R., Letizia D., et al. Whole-genome analysis of sporadic amyotrophic lateral sclerosis. N. Engl. J. Med., 2007, vol. 357 (pg. 775-788)
12
Van Es
M.A.
Van Vught
P.W.
Blauw
H.M.
Franke
L.
Saris
C.G.
Andersen
P.M.
Van Den Bosch
L.
de Jong
S.W.
van’t Slot
R.
, et al.  . 
ITPR2 as a susceptibility gene in sporadic amyotrophic lateral sclerosis: a genome-wide association study
Lancet Neurol.
 , 
2007
, vol. 
6
 (pg. 
869
-
877
)
13
Van Es
M.A.
van Vught
P.W.
Blauw
H.
Franke
L.
Saris
C.G.
Van Den Bosch
L.
de Jong
S.W.
de Jong
V.
Baas
F.
van’t Slot
R.
, et al.  . 
Genetic variation in DPP6 is associated with susceptibility to amyotrophic lateral sclerosis
Nat. Genet.
 , 
2008
, vol. 
40
 (pg. 
29
-
31
)
14
Cronin
S.
Berger
S.
Ding
J.
Schymick
J.C.
Washecka
N.
Hernandez
D.G.
Greenway
M.J.
Bradley
D.G.
Traynor
B.J.
Hardiman
O.
A genome-wide association study of sporadic ALS in a homogenous Irish population
Hum. Mol. Genet.
 , 
2008
, vol. 
17
 (pg. 
768
-
774
)
15
Iafrate
A.J.
Feuk
L.
Rivera
M.N.
Listewnik
M.L.
Donahoe
P.K.
Qi
Y.
Scherer
S.W.
Lee
C.
Detection of large-scale variation in the human genome
Nat. Genet.
 , 
2004
, vol. 
36
 (pg. 
949
-
951
)
16
Redon
R.
Ishikawa
S.
Fitch
K.R.
Feuk
L.
Perry
G.H.
Andrews
T.D.
Fiegler
H.
Shapero
M.H.
Carson
A.R.
Chen
W.
, et al.  . 
Global variation in copy number in the human genome
Nature
 , 
2006
, vol. 
444
 (pg. 
444
-
454
)
17
Sebat
J.
Major changes in our DNA lead to major changes in our thinking
Nat. Genet.
 , 
2007
, vol. 
39
 (pg. 
S3
-
S5
)
18
Stranger
B.E.
Forrest
M.S.
Dunning
M.
Ingle
C.E.
Beazley
C.
Thorne
N.
Redon
R.
Bird
C.P.
de Grassi
A.
Lee
C.
, et al.  . 
Relative impact of nucleotide and copy number variation on gene expression phenotypes
Science
 , 
2007
, vol. 
315
 (pg. 
848
-
853
)
19
Lupski
J.R.
de Oca-Luna
R.M.
Slaugenhaupt
S.
Pentao
L.
Guzzetta
V.
Trask
B.J.
Saucedo-Cardenas
O.
Barker
D.F.
Killian
J.M.
Garcia
C.A.
, et al.  . 
DNA duplication associated with Charcot–Marie–Tooth disease type 1A
Cell
 , 
1991
, vol. 
66
 (pg. 
219
-
232
)
20
Ewart
A.K.
Morris
C.A.
Atkinson
D.
Jin
W.
Sternes
K.
Spallone
P.
Stock
A.D.
Leppert
M.
Keating
M.T.
Hemizygosity at the elastin locus in a developmental disorder, Williams syndrome
Nat. Genet.
 , 
1993
, vol. 
5
 (pg. 
11
-
16
)
21
Veldink
J.H.
van den Berg
L.H.
Cobben
J.M.
Stulp
R.P.
De Jong
J.M.
Vogels
O.J.
Baas
F.
Wokke
J.H.
Scheffer
H.
Homozygous deletion of the survival motor neuron 2 gene is a prognostic factor in sporadic ALS
Neurology
 , 
2001
, vol. 
56
 (pg. 
749
-
752
)
22
Carter
N.P.
Methods and strategies for analyzing copy number variation using DNA microarrays
Nat. Genet.
 , 
2007
, vol. 
39
 (pg. 
S16
-
S21
)
23
Blauw
H.M.
Veldink
J.H.
van Es
M.A.
van Vught
P.W.
Saris
C.G.
van der Zwaag
B.
Franke
L.
Burbach
J.P.
Wokke
J.H.
Ophoff
R.A.
, et al.  . 
Copy-number variation in sporadic amyotrophic lateral sclerosis: a genome-wide screen
Lancet Neurol.
 , 
2008
, vol. 
7
 (pg. 
319
-
326
)
24
Colella
S.
Yau
C.
Taylor
J.M.
Mirza
G.
Butler
H.
Clouston
P.
Bassett
A.S.
Seller
A.
Holmes
C.C.
Ragoussis
J.
QuantiSNP: an objective Bayes hidden-Markov model to detect and accurately map copy number variation using SNP genotyping data
Nucleic Acids Res.
 , 
2007
, vol. 
35
 (pg. 
2013
-
2025
)
25
Scherer
S.W.
Lee
C.
Birney
E.
Altshuler
D.M.
Eichler
E.E.
Carter
N.P.
Hurles
M.E.
Feuk
L.
Challenges and standards in integrating surveys of structural variation
Nat. Genet.
 , 
2007
, vol. 
39
 (pg. 
S7
-
S15
)
26
Conrad
D.F.
Hurles
M.E.
The population genetics of structural variation
Nat. Genet.
 , 
2007
, vol. 
39
 (pg. 
S30
-
S36
)
27
de Smith
A.J.
Tsalenko
A.
Sampas
N.
Scheffer
A.
Yamada
N.A.
Tsang
P.
Ben-Dor
A.
Yakhini
Z.
Ellis
R.J.
Bruhn
L.
, et al.  . 
Array CGH analysis of copy number variation identifies 1284 new genes variant in healthy white males: implications for association studies of complex diseases
Hum. Mol. Genet.
 , 
2007
, vol. 
16
 (pg. 
2783
-
2794
)
28
Kidd
J.M.
Cooper
G.M.
Donahue
W.F.
Hayden
H.S.
Sampas
N.
Graves
T.
Hansen
N.
Teague
B.
Alkan
C.
Antonacci
F.
, et al.  . 
Mapping and sequencing of structural variation from eight human genomes
Nature
 , 
2008
, vol. 
453
 (pg. 
56
-
64
)
29
Goodall
E.F.
Greenway
M.J.
van Marion
I.
Carroll
C.B.
Hardiman
O.
Morrison
K.E.
Association of the H63D polymorphism in the hemochromatosis gene with sporadic ALS
Neurology
 , 
2005
, vol. 
65
 (pg. 
934
-
937
)
30
Banfi
S.
Servadio
A.
Chung
M.Y.
Kwiatkowski
T.J.
Jr
McCall
A.E.
Duvick
L.A.
Shen
Y.
Roth
E.J.
Orr
H.T.
Zoghbi
H.Y.
Identification and characterization of the gene causing type 1 spinocerebellar ataxia
Nat. Genet.
 , 
1994
, vol. 
7
 (pg. 
513
-
520
)
31
Okuda
T.
Hattori
H.
Takeuchi
S.
Shimizu
J.
Ueda
H.
Palvimo
J.J.
Kanazawa
I.
Kawano
H.
Nakagawa
M.
Okazawa
H.
PQBP-1 transgenic mice show a late-onset motor neuron disease-like phenotype
Hum. Mol. Genet.
 , 
2003
, vol. 
12
 (pg. 
711
-
725
)
32
Shibata
H.
Huynh
D.P.
Pulst
S.M.
A novel protein with RNA-binding motifs interacts with ataxin-2
Hum. Mol. Genet.
 , 
2000
, vol. 
9
 (pg. 
1303
-
1313
)
33
Lim
J.
Hao
T.
Shaw
C.
Patel
A.J.
Szabó
G.
Rual
J.F.
Fisk
C.J.
Li
N.
Smolyar
A.
Hill
D.E.
, et al.  . 
A protein–protein interaction network for human inherited ataxias and disorders of Purkinje cell degeneration
Cell
 , 
2006
, vol. 
125
 (pg. 
801
-
814
)
34
Sebat
J.
Lakshmi
B.
Malhotra
D.
Troge
J.
Lese-Martin
C.
Walsh
T.
Yamrom
B.
Yoon
S.
Krasnitz
A.
Kendall
J.
, et al.  . 
Strong association of de novo copy number mutations with autism
Science
 , 
2007
, vol. 
316
 (pg. 
445
-
449
)
35
Brooks
B.R.
Miller
R.G.
Swash
M.
Munsat
T.L.
World Federation of Neurology Research Group on Motor Neuron Diseases
El Escorial revisited: revised criteria for the diagnosis of amyotrophic lateral sclerosis
Amyotroph. Lateral Scler. Other Motor Neuron Disord.
 , 
2000
, vol. 
1
 (pg. 
293
-
299
)
36
Pritchard
J.K.
Stephens
M.
Donnelly
P.
Inference of population structure using multilocus genotype data
Genetics
 , 
2000
, vol. 
155
 (pg. 
945
-
959
)

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors and the last two authors should be regarded as joint Last Authors.