-
PDF
- Split View
-
Views
-
Cite
Cite
Yoji Ogura, Ikuyo Kou, Yohei Takahashi, Kazuki Takeda, Shohei Minami, Noriaki Kawakami, Koki Uno, Manabu Ito, Ikuho Yonezawa, Takashi Kaito, Haruhisa Yanagida, Kei Watanabe, Hiroshi Taneichi, Katsumi Harimaya, Yuki Taniguchi, Toshiaki Kotani, Taichi Tsuji, Teppei Suzuki, Hideki Sudo, Nobuyuki Fujita, Mitsuru Yagi, Kazuhiro Chiba, Michiaki Kubo, Yoichiro Kamatani, Masaya Nakamura, Morio Matsumoto, Japan Scoliosis Clinical Research Group, Kota Watanabe, Shiro Ikegawa, Member of the Japan Scoliosis Clinical Research Group , A functional variant in MIR4300HG, the host gene of microRNA MIR4300 is associated with progression of adolescent idiopathic scoliosis, Human Molecular Genetics, Volume 26, Issue 20, 15 October 2017, Pages 4086–4092, https://doi.org/10.1093/hmg/ddx291
- Share Icon Share
Abstract
Adolescent idiopathic scoliosis (AIS) is a common spinal deformity affecting millions of children. Since treatment and prognosis of AIS depend on curve progression, identifying factors related to AIS curve progression is important in its management. Although several genetic loci for AIS occurrence are reported, no locus for curve progression has been identified. To identify genes associated with AIS progression, we conducted a genome-wide association study followed by a replication study using a total of 2,543 AIS subjects who were evaluated for the curve progression. We identified a significantly associated locus on chromosome 11q14.1 (P = 1.98 × 10−9, odds ratio = 1.56). In silico and in vitro analyses identified a functional variant, rs35333564 in MIR4300HG, the host gene of a microRNA, MIR4300. The genomic region containing rs35333564 had enhancer activity, which was decreased in its risk allele. Our data suggest that decrease of MIR4300 is related to AIS progression.
Introduction
Adolescent idiopathic scoliosis (AIS) is defined as a complex lateral, rotated spinal curvature with a Cobb angle ≥ 10–15°. AIS affects otherwise healthy children from 10 years of age to the end of growth, having a prevalence of 2–4% (1). In some patients, the curve progresses to severe scoliosis with Cobb angle > 40° (2). Progressive curve requires bracing and/or spinal fusion surgery, both of which impose heavy physical and psychological burdens on the patients. Delayed intervention in AIS leads to poor prognosis (3). Accurate prediction of curve progression can reduce negative long-term effects in AIS management, such as psychological trauma, unnecessary bracing, serial exposures to radiation, and cost of care (4). However, conventional prediction methods using clinical parameters such as the Cobb angle at initial visit, age, sex, status of skeletal maturity, and menarche status, are not so reliable (5).
AIS curve progression has been shown to have genetic components (3). A twin study found a significant correlation with curve severity in monozygous twins (r = 0.7407, P < 0.0002), but not in dizygous twins (6). Another twin study also showed that there is a genetic factor in AIS curve progression (7). Genetic information associated with AIS progression have the potential to improve the prediction method (8) as well as in providing insight into its pathogenesis. Several loci have been identified by genome-wide association studies (GWASs) for the AIS susceptibility (occurrence) (9–12); however, no GWAS has been conducted for AIS curve progression. SNPs in ESR1, ESR2, MATN1, and IGF1 genes were reported to be associated with AIS severity using the candidate gene approach (13–19); however, replication studies showed no association for any of the genes (11,20–22).
To identify loci related to AIS curve progression, we conducted a GWAS by comparing AIS patients with and without progression.
Results
GWAS
For a GWAS, we recruited 2,142 patients with AIS as previously described (10). We divided them into the progression and non-progression groups: 1,105 patients were assigned to the progression group and 832 to the non-progression group. The remaining 205 patients with Cobb angle <40 degree and the Risser grade <4 were excluded from both groups. We used their genotyping data determined in the previous GWASs for AIS susceptibility genes using two Illumina Genotyping BeadChips (HumanQuad 610k and HumanOmniExpress) (9,10). In summary, 662 patients in the progression group and 422 patients in the non-progression group were genotyped using HumanQuad 610 K, while 443 patients in the progression group and 410 patients in the non-progression group were genotyped using HumanOmniExpress. After quality control filtering, 502,857 SNPs in HumanQuad 610k and 592,392 SNPs in HumanOmniExpress were examined for association. We checked for possible population stratification effects using a quantile-quantile (Q-Q) plot and obtained a genomic inflation factor of 1.03 and 1.05 in each platform, implying that the possibility of false positive associations due to population structure or cryptic relatedness was low (Supplementary Material, Fig. S1). Since the tag SNPs involved in the two genotyping chips were different, we performed a whole genome imputation and a subsequent meta-analysis as previously described (10). In total, we analyzed the association of 7,521,072 SNPs. We identified 10 candidate loci with evidence of suggestive association (P < 1 × 10−5; Fig. 1A, Supplementary Material, Table S1).

(A) Manhattan plot showing the P value from the GWAS. The P values were plotted against their positions on the chromosome. The blue line represents the threshold (P = 1.0 × 10−5) for the validation study. (B) Association plot for the 11q locus. SNPs are plotted by chromosomal position (x-axis) and the GWAS association (–log10P value; y-axis). Their linkage disequilibrium (r2) with the SNP with the strongest association signal (rs1828853: purple circle) are indicated by color. (C) Regional association plot around rs1828853. The significantly associated SNPs are in intron 1 of MIR4300HG. SNPs with black circle overlap with the ChromHMM epigenomics-annotated enhancer elements. (D) Topologically associated domain (TAD) around the landmark SNP (rs1828853: asterisk) on chromosome 11q14.1 and position of MIR4300. Normalized Hi-C interaction frequencies were generated using the Interactive Hi-C Data Browser and displayed as a two-dimensional heat map, in which red represents a high interaction frequency. We obtained the Hi-C data from 28 types of cells from the database. There was no significant difference among them. The data of H1-mesenchymal stem cell is presented as a representative. The TAD involving the AIS progression locus includes only a single gene, MIR4300HG. MIR4300 is produced from intron 6 of MIR4300HG. (E) Structure of MIR4300HG and location of MIR4300. MIR4300HG consists of eight exons. MIR4300 resides in intron 6 of MIR4300HG. MIR4300 is transcribed to pri-microRNA 4300 (blue bar; 96 nucleotides), which is finally processed to mature microRNA 4300 (red bar; 18 nucleotides).
Replication study
We examined the replication of the association of the 10 candidate loci using an independent Japanese AIS cohort (323 progressed and 283 non-progressed patients). We found evidence for association with one locus represented by the SNP rs1828853 on chromosome 11q14.1 with P = 3.73 × 10−5 (Bonferroni-corrected P < 5.00 × 10−3) (Supplementary Material, Table S2). When the GWAS and replication study results were combined, rs1828853 reached a genome-wide significance threshold of P < 5 × 10−8 (combined P = 1.98 × 10−9; odds ratio = 1.56; 95% confidence interval (CI) = 1.35–1.80; Table 1).
Association of rs1828853 with progression of adolescent idiopathic scoliosis
Study . | Number of samples . | RAF . | P valuea . | Odds ratio . | ||
---|---|---|---|---|---|---|
Progression . | Non-progression . | Progression . | Non-progression . | (95% CI) . | ||
GWAS | 1,105 | 832 | 0.823 | 0.768 | 4.69 × 10−6 | 1.48 (1.25–1.76) |
Replication | 323 | 283 | 0.823 | 0.723 | 3.73 × 10−5 | 1.78 (1.36–2.35) |
Combinedb | 1,428 | 1,115 | 1.98 × 10−9 | 1.56 (1.35–1.80) |
Study . | Number of samples . | RAF . | P valuea . | Odds ratio . | ||
---|---|---|---|---|---|---|
Progression . | Non-progression . | Progression . | Non-progression . | (95% CI) . | ||
GWAS | 1,105 | 832 | 0.823 | 0.768 | 4.69 × 10−6 | 1.48 (1.25–1.76) |
Replication | 323 | 283 | 0.823 | 0.723 | 3.73 × 10−5 | 1.78 (1.36–2.35) |
Combinedb | 1,428 | 1,115 | 1.98 × 10−9 | 1.56 (1.35–1.80) |
RAF: risk allele (A) frequency, CI: confidence interval.
The Cockran-Armitage trend test.
Combined by the inverse-variance method.
Association of rs1828853 with progression of adolescent idiopathic scoliosis
Study . | Number of samples . | RAF . | P valuea . | Odds ratio . | ||
---|---|---|---|---|---|---|
Progression . | Non-progression . | Progression . | Non-progression . | (95% CI) . | ||
GWAS | 1,105 | 832 | 0.823 | 0.768 | 4.69 × 10−6 | 1.48 (1.25–1.76) |
Replication | 323 | 283 | 0.823 | 0.723 | 3.73 × 10−5 | 1.78 (1.36–2.35) |
Combinedb | 1,428 | 1,115 | 1.98 × 10−9 | 1.56 (1.35–1.80) |
Study . | Number of samples . | RAF . | P valuea . | Odds ratio . | ||
---|---|---|---|---|---|---|
Progression . | Non-progression . | Progression . | Non-progression . | (95% CI) . | ||
GWAS | 1,105 | 832 | 0.823 | 0.768 | 4.69 × 10−6 | 1.48 (1.25–1.76) |
Replication | 323 | 283 | 0.823 | 0.723 | 3.73 × 10−5 | 1.78 (1.36–2.35) |
Combinedb | 1,428 | 1,115 | 1.98 × 10−9 | 1.56 (1.35–1.80) |
RAF: risk allele (A) frequency, CI: confidence interval.
The Cockran-Armitage trend test.
Combined by the inverse-variance method.
Fine mapping
To characterize the locus on chromosome 11q14.1, we plotted genotyped and imputed SNPs around rs1828853 (Fig. 1B). Regional association plots were generated using Locus Zoom. We extracted and plotted SNPs containing strong LD with the landmark SNP rs1828853 (Fig. 1C). These SNPs were in the same LD block that included only a single gene, MIR4300HG (Supplementary Material, Fig. S2). The most significantly associated SNPs were in intron 1 of MIR4300HG (Fig. 1B and C).
Next, we evaluated a topologically associated domain (TAD) around these SNPs using Interactive Hi-C Data Browser. It has been shown that the genome is partitioned into megabase-scale TADs, within which promoters and enhancers can interact (23). We could obtain the Hi-C data from 28 types of cells from the database. There was no significant difference in the patterns of the TAD around the genomic region containing the associated SNPs among the 28 cells. The TAD containing the associated SNPs included only a single gene, MIR4300HG (Fig. 1D).
Identification of the functional SNP in the locus
To search for the functional SNP, we extracted SNPs that were strongly correlated (r2 ≥ 0.8) with rs1828853 using HaploReg v4.1. (Supplementary Material, Table S3). rs2037504 had promoter marks in muscle tissues and was predicted to be within the DNase I hypersensitive site in skin tissues. Three SNPs, rs971548, rs35333564 and rs7949305, were in enhancer histone marks. To predict protein binding to the sequences around these SNPs, we used RegulomeDB. The result showed that rs971548 is in CTCF binding site. These were good candidates for the functional SNPs.
To evaluate whether the genomic region containing the SNPs could function as cis-element of some enhancers, we performed an electrophoretic mobility shift assay (EMSA) using nuclear extracts from A172 glioblastoma cells. The sequences containing these SNPs demonstrated specific bindings to the nuclear proteins; however, only rs35333564 showed the allelic difference in the binding (Fig. 2A). Similar results were obtained using HeLa cells (data not shown). We then examined the transcriptional activities of the SNPs by luciferase assays using HeLa cells. After 48-h incubation, we compared the luciferase activities between the risk and non-risk alleles of each SNP. Only rs35333564 had significant allelic difference in the activities; the construct containing the risk allele showed significantly lower luciferase activity than that containing the non-risk allele (Fig. 2B), suggesting that the risk allele of rs35333564 is implicated in AIS curve progression by decreasing the enhancer activity. Other three SNP had no significant allelic difference in their luciferase activities. Experiments using other cell lines (A172) yielded similar results (data not shown).

(A) Electrophoretic mobility shift assay using A172 cells. Unlabelled probes (a 200-fold excess) were used as competitors. Both oligo sequences containing risk allele (R) and non-risk allele (N) showed bands which diminished with competitors (arrowhead). (B) Transcriptional activities of rs35333564 constructs and pGL3 promoter vector (–) using HeLa cells. The construct containing risk (susceptibility) allele (R) showed significantly lower luciferase activity compared to that containing non-risk allele (N). Asterisk: P value < 0.01 (t test). Error bar: standard error. The experiment was repeated three times.
MIR4300HG expression in human tissues
BioGPS database showed that MIR4300HG was highly expressed in spinal cord, brain, skeletal muscle, salivary gland, epithelial cells in various tissues and sperm. We also examined the expression of MIR4300HG mRNA using quantitative RT-PCR in a variety of human tissues. We found that MIR4300 was most highly expressed in testis, followed by spinal cord, bone, vertebral disc and cartilage (Supplementary Material, Fig. S3). Almost no expression was observed in other tissues.
Functional annotation of MIR4300
We examined genes interacting with MIR4300 using TargetScanHuman7.0 and miRWalk2.0. We identified 757 and 758 genes in them, respectively, 55 of which were shared (Supplementary Material, Table S4). Next, to examine common functional characteristics among the 55 genes, we performed gene ontology and pathway analysis for the 55 genes using DAVID and Gene Ontology Consortium websites. Gene ontology analysis revealed that these 55 genes had enrichment in cysteine-type endopeptidase inhibitor activity. Cysteine endopeptidases are responsible for apoptosis, MHC class II immune responses, prohormone processing and extracellular matrix remodeling that is important to bone development (24). Decreased calpain-3, one of cysteine endopeptidases results in scoliosis, suggesting cysteine-type endopeptidase inhibitor activity may be related to scoliosis progression.
Discussion
We have performed a GWAS and whole genome imputation followed by a replication study and discovered the first genome-wide significant locus for AIS curve progression on chromosome 11q14.1. The most significantly associated SNPs in the GWAS were clustered in intron 1 of MIR4300HG. Both LD block and TAD containing the associated SNPs included only a single gene, MIR4300HG. A functional SNP significantly associated with AIS curve progression was found in a putative enhancer region in intron 1 of MIR4300HG and its risk allele showed significantly lower enhancer activities than the non-risk allele. These findings indicated that MIR4300HG is the gene for AIS curve progression in the locus and its decreased expression would lead to the curve progression.
MIR4300HG is the host gene of microRNA (MIR4300). Most microRNAs derive from their respective microRNA host genes. MIR4300 is encoded within intron 6 of MIR4300HG (Fig. 1E) and is transcribed together with MIR4300HG. MIR4300 consists of 96 nucleotides (pri-microRNA), which is eventually processed into 18 nucleotides (mature microRNA) (Fig. 1E).
The function of MIR4300 is unknown. MicroRNAs regulate the expression of as much as 30% of all mammalian protein-encoding genes as down-regulators (25). We searched for genes interacting with MIR4300 by in silico analyses and found 55 genes (Supplementary Material, Table S4). Among them, three genes, ESR1 (estrogen receptor 1), EBP emopamil binding protein and KIF6 (kinesin family member 6) caught our interest. ESR1 is reported to be associated with AIS (17). We searched ESR binding motifs using JASPAR and found a lot of ESR binding motif in MIR4300HG. EBP is related to chondrodysplasia punctata type 2, a skeletal dysplasia whose phenotype includes severe scoliosis (OMIM 302960). In zebrafish, a loss of function mutation in kif6 has been reported to cause scoliosis (26). The list of the target genes does not include previously identified AIS susceptibility genes (9–12,27) such as LBX1, GPR126, SOX9, BNC2, and PAX, supporting our hypothesis that genetic factors differ between susceptibility (occurrence) and progression of AIS. In fact, previously identified susceptibility loci had no association with curve progression in the present study (Supplementary Material, Table S5).
MIR4300HG was expressed in spinal cord, bone, cartilage, and intervertebral disc. These tissues have been implicated in the etiology of AIS (9–12,27). Interestingly, MIR4300HG was most highly expressed in testis, which may be related to the sex difference of AIS in its severity and progression (28,29). A possibility is that MIR4300HG in the testis has some systemic effect that influences spinal growth in adolescent males. In the initial stage, AIS equally affects boys and girls, but progression to a severe curve occurs much more frequently in girls: the prevalence of AIS is equal for curves of 10° but increases by 5.4 times in girls for curves of 21° or more (30). The sex difference in the curve progression may be related to the stronger protection by MIR4300 in AIS boys during adolescence.
Materials and Methods
Patients
We recruited 2,142 patients with AIS as previously described (10). All patients underwent clinical examinations and radiologic evaluations by expert scoliosis surgeons. We obtained informed consent from all patients or their parents. The ethics committees of RIKEN and participating institutions approved this study. We defined the progression of the curve as that with a Cobb angle ≥ 40° regardless of skeletal maturity, and non-progression as that with a Cobb angle ≤ 30° in skeletally mature patients. This criterion is based on AIS’s natural history, wherein a curve with Cobb angle higher than 40° is progressive even after skeletal maturity, while a curve with Cobb angle lower than 30° is non-progressive (31). We put 10° of gap between the two groups to achieve a clear-cut grouping because Cobb method has 4–10° of intra- or inter observer error (32). Skeletal maturity was defined by the Risser grade ≥ 4 (33). Cobb angles were evaluated using the patient’s most recent radiograph for the non-operated cases, and information from the patient’s last examination before surgery for the operated cases.
GWAS
For the GWAS, we genotyped 1,289 patients with the Illumina HumanQuad 610 K BeadChip and 853 patients with the Illumina Human OmniExpress BeadChip (Illumina, San Diego, CA). Quality control measures were previously described (10). For the sample quality control, we checked gender information using SNPs on the X chromosome, and evaluated cryptic relatedness for each sample with the identity-by-state method, and removed samples that showed second-degree relatedness or closer. We performed a PCA with four reference populations from the HapMap data as reference to check for population stratification in this study as previously described (10). Scatter plots were generated using the top two associated principal components (eigenvectors) to identify outliers not belonging to the JPT (Japanese in Tokyo, Japan) cluster. We then performed PCA analysis using only the genotype information of subjects with and without progression to further evaluate the population substructure. We constructed Q–Q plots using observed P values against expected P values, and an inflation factor value (λ-value) was calculated to assess potential population stratification of the study patients.
Whole genome imputation
We conducted a whole genome imputation using a dataset of 1000 genomes from East Asian populations [JPT, CHB (Han Chinese in Beijing) and CHS (Southern Han Chinese)] with datasets from the Phase I Integrated Release version 2 as a reference panel to infer missing genotypes. We prepared the input files after quality control, which excluded SNPs with genotyping rate of <99%, those that deviated from Hardy–Weinberg equilibrium (P ≤ 1.0 × 10−6), and those with a minor allele frequency < 0.01. We excluded SNPs whose allele frequencies were comparable between the GWAS dataset and the reference panel with differences of > 0.16. We used SNPs with imputation quality score Rsq ≥ 0.9 for the association study.
Replication study
We selected 10 SNPs with P < 1.0 × 10−5 from the GWAS (Table 1). The SNPs with the best P value were selected if multiple SNPs in the same linkage disequilibrium (LD) block reached the threshold. We genotyped 323 progressed and 283 non-progressed patients using a multiplex PCR-based Invader assay for the validation study as previously described (34). We recruited the subjects with the same criteria as the GWAS.
Statistical analyses for association
The associations of the imputed SNPs were generated with mach2dat software, which utilized the output results from Minimac (dosage of the imputed SNP). Regional association plots were generated using Locus Zoom. Meta-analysis for the combined analysis of the GWAS and validation study was performed using the inverse-variance method and heterogeneity between the two phases was evaluated using the Breslow-Day test.
EMSA
We chose A172 cells for EMSA because MIR4300HG was expressed in this cell line. We prepared nuclear extract from A172 cells. EMSA was performed using the DIG Gel Shift kit, 2nd Generation (Roche, Basel, Switzerland) as previously described (35). In brief, we labeled double-stranded oligonucleotides with digoxigenin-11-ddUTP using the DIG Gel Shift kit, 2nd Generation (Roche). We incubated the labeled probes with the nuclear extracts of A172 in a binding buffer at room temperature for 20 min. For competition studies, we pre-incubated a 200-fold excess of unlabeled oligonucleotides with the nuclear extract before adding the DIG-labeled probes. After incubation, we separated the protein DNA complexes by 6% non-denaturing acrylamide gel electrophoresis with 0.5× TBE buffer. We transferred the protein DNA complexes to a nylon membrane by electro-blotting, and were detected using the anti-DIG antibody (Roche).
Luciferase assay
We chose HeLa cells for luciferase assay because MIR4300HG was expressed in this cell line and its transfection efficiency is high. We constructed luciferase reporter plasmids of the SNPs by cloning oligonucleotides into the pGL3-promoter vector (Promega, Madison, WI) at an NheI site upstream of the SV40 promoter. Oligonucleotides (25 bp) were designed to contain the SNPs that corresponded to genomic sequences. HeLa cells (5 × 104) were transfected with 500 ng of the reporter constructs and 1 ng of the pRL-TK control vector using FuGene6 transfection reagent (Promega) according to the manufacturer’s protocol. We grew HeLa cells in DMEM with 10% fetal bovine serum and antibiotics. After 48-h incubation, we measured the luciferase activities using the Dual-Luciferase Reporter Assay system (Promega). Each experiment was independently repeated three times.
Functional annotation
We used HaploReg v4.1. and RegulomeDB to search for functional SNPs in the locus identified by GWAS. We searched MIR4300HG expression using BioGPS database. Prediction of MIR4300 target genes was searched using TargetScan Human7.0 and miRWalk2.0 database. We conducted gene ontology analyses using DAVID and Gene Ontology Consortium websites.
HaploReg, http://archive.broadinstitute.org/mammals/haploreg/haploreg.php
Regulome, http://www.regulomedb.org/
BioGPS, http://biogps.org/
TargetScan Human7.0, http://www.targetscan.org/vert_70/
miRWalk2.0, http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/index.html
DAVID, https://david.ncifcrf.gov/
Gene Ontology Consortium, http://geneontology.org/
Quantitative RT-PCR
We synthesized cDNA from the total RNAs of bone, spinal cord, articular cartilage, and intervertebral disc samples using Multiscribe reverse transcriptase and a random hexamer primer (Applied Biosystems, Foster city, CA). Each tissue included equal amounts of cDNAs from three individuals. We used these cDNA and multiple-tissue cDNA panels (Clontech, Mountain view, CA) for RT-PCR experiments to examine the tissue-specific expression of MIR4300HG. We designed MIR4300HG-specific primers to span the junction of exon 4 and 5. We performed quantitative RT-PCR using ABI StepOne Real-Time PCR system with the QuantiTect SYBR Green PCR kit (Qiagen, Hilden, Germany) according to the manufacturer’s protocol.
Supplementary Material
Supplementary Material is available at HMG online.
Acknowledgements
We are grateful to the patients, their families, and the referring physicians for their participation in this study. We thank N. Suzuki, M. Saito, and M. Kamata for their help in collecting the samples. We also thank T. Isono, Y. Takanashi, and T. Kusadokoro for technical assistance, and N. Atsumi for checking English. This work was supported by Grant of Japan Orthopaedics and Traumatology Research Foundation, Inc. No. 344 (to Y.O.).
Conflict of Interest statement. All authors declare no competing financial interests.
References
Author notes
†Member of the Japan Scoliosis Clinical Research Group are listed in the Acknowledgements section.