Genome-Wide Association Analysis Identified ANXA1 Associated with Shoulder Impingement Syndrome in UK Biobank Samples

Shoulder impingement syndrome (SIS) is a common shoulder disorder with unclear genetic mechanism. In this study, Genome-wide Association Study (GWAS) was conducted to identify the candidate loci associated with SIS by using the UK Biobank samples (including 3,626 SIS patients and 3,626 control subjects). Based on the GWAS results, gene set enrichment analysis was further performed to detect the candidate gene ontology and pathways associated with SIS. We identified multiple risk loci associated with SIS, such as rs750968 (P = 4.82 × 10−8), rs754832 (P = 4.83 × 10−8) and rs1873119 (P = 6.39 × 10−8) of ANXA1 gene. Some candidate pathways have been identified related to SIS, including those linked to infection response and hypoxia, “ZHOU_INFLAMMATORY_RESPONSE_FIMA_DN” (P = 0.012) and “MANALO_HYPOXIA_UP” (P = 5.00 × 10−5). Our results provide novel clues for understanding the genetic mechanism of SIS.

millions of single nucleotide polymorphisms (SNPs) from the genome to perform case-control association analysis as markers and screens the SNPs in the whole genome to determine the susceptible sites and regions associated with diseases (Nicolae et al. 2010). Traditional GWAS focuses on single SNPs/genes that have a strong statistical association with phenotypic traits. Considering the potential biological interactions of the tested genes, a number of tools for downstream gene set enrichment analysis of GWAS results are widely used, such as GSEA (Wang et al. 2007) and FUMA (Watanabe and Taskesen 2017).
Gene set enrichment analysis (GSEA), also known as functional enrichment analysis, has been developed to identify functionally related genes that are significantly enriched in concentrations of genes that are signaly associated with complex diseases or traits (Du et al. 2018). Since GSEA integrates the joint genetic effects of multiple genes and prior information about biological processes (such as pathways), it shows a powerful function of biological interactions for pathogenetic research, especially for complex diseases or traits determined by a set of genes with moderate or weak genetic effects (Wang et al. 2007;Zhang et al. 2014).
To the best of our knowledge, no effort has paid to explore the genetic mechanism of SIS. Therefore a GWAS of SIS in UK Biobank was performed in this study. The GWAS results were further performed with GSEA to detect the candidate gene ontology and pathways associated with SIS. Our results may provide a new clue for understanding the molecular genetic mechanisms in the development of SIS.

UK BioBank samples
For the GWAS, we used the UK Biobank resource, a prospective cohort study including approximately 500,000 candidates from the UK, aged between 40 and 69 years, who have had whole-genome genotyping undertaken and have allowed the linkage of these data with their patient records (Bycroft et al. 2018). Briefly, participants provided selfreported information and collected blood samples for biochemical tests, genotyping, and physical measurements, as described previously (Bycroft et al. 2018). Individuals were linked retrospectively and prospectively to the National Health Service's Hospital Episode Statistics database. The 9 th and 10 th revisions of the International Classification of Diseases (ICD) were used to define cases based on inpatient Hospital Episode Statistics data. Specific for this study, 3,626 SIS cases were identified according to the ICD10 diagnoses (Data Field: 41202) from the UK Biobank. Among the remaining samples, 3,626 samples were randomly selected as normal controls after excluding osteoarticular disorders according to the ICD 10 th . Finally, a total of 7,252 participants from the UK Biobank cohort were included in this study (Table 1).

Genotyping
Genotyping details for UK Biobank participants have been reported previously (Bycroft et al. 2018). Briefly, 49,950 candidates were genotyped using the UK BiLEVE Axiom Array (807,411 markers) and 438,427 candidates were genotyped using UK Biobank Axiom Array (825,927 markers); the two arrays share about 95% of their marker content. Genotypes were called from the array intensity data and 106 batches of approximately 4700 samples were using a custom genotype-calling pipeline.
Genome-wide association study of shoulder impingement syndrome The GWAS of SIS were performed using 7,254 participants of European ancestry in the UK Biobank. All the subjects are unrelated self-reported White. The unrelated subjects were generated with KING software (http://people.virginia.edu/wc9c/KING/) . The kinship coefficients estimated by KING, and the option "-min" to apply the same cut-off for 3 rd degree (2 · 1/2 (9/2) = 0.088). The subjects who had a self-reported sex inconsistent with the genetic gender, who were genotyped but not imputed or who withdrew their consents were removed. A logistic regression model (implemented in PLINK2) assuming additive genetic effects was used for association analysis, adjusting for age, sex, PC1, PC2 and PC3 as covariates (Purcell et al. 2007). For quality control, we removed the SNPs with a call rate , 90%, SNP-level missingness (the number of individuals lacking information on a specific SNP is missing in the sample) . 0.1, individual-level missingness (the number of SNPs that is missing for a specific individual) . 0.1, minor allele frequency (MAF) , 0.01 and Hardy-Weinberg equilibrium (HWE) P value , 1.0 · 10 23 . The Manhattan plot of GWAS results was drawn by R (https://www.r-project.org/). The quantile-quantile (QQ) plot was made by FUMA (Watanabe and Taskesen 2017).
Linking GWAS findings to eQTL loci GTEx eQTL dataset were used to identify genes with eQTL association for further investigation in the effect of gene variation on gene expression (GTEx Consortium et al. 2017). The identified GWAS SNPs were linked to eQTL loci from GTEx databases and then further evaluated for SIS related gene expression in our GWAS findings.

Gene set enrichment analysis
The obtained GWAS summary data of SIS SNP were further subjected to gene set enrichment analysis (Wang et al. 2007;Subramanian et al. 2005). The detailed analytical procedures can be found in a published study (Liu et al. 2010). Briefly, four public pathway databases were utilized to generate a collection of gene ontology terms and pathways for testing in this study, which include BioCarta pathway database (http://www.biocarta.com/genes), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (http://www.genome.ad.jp/kegg/pathway.html), Ambion GeneAssist Pathway Atlas (http://www.ambion.com/tools/pathway), and Gene Ontology (GO) database (http://www.geneontology.org). A total of 2,850 gene ontology terms and pathways were analyzed in this study. 5,000 permutations were conducted to obtain the empirical distribution for the statistical tests. The enrichment scores and the empirical distribution were normalized to account for the different sizes of gene sets for each analyzed pathway databases. For each pathway gene sets, the empirical P value was calculated based on the observed normalized enrichment scores and the empirical distribution (Wang et al. 2007). Significant pathways were identified at empirical P values , 0.05.

Data availability
The GWAS data (Category 100314) used in this study are available in http://biobank.ctsu.ox.ac.uk/crystal/label.cgi?id=100314. The UK Biobank resource can be found in http://www.ukbiobank.ac.uk/ scientists-3/genetic-data/. Identification of SIS cases were followed to the ICD10 diagnoses (Data Field: 41202) from the UK Biobank in http://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=41202. Genome-wide n■ significant loci associated with SIS are summarized in Table S1. Significant pathways associated with TWAS of SIS are summarized in Table  S2. Quantile-quantile (QQ) plots of observed against expected P values for SIS is shown in Figure S1. All the supplementary materials including the GWAS summary statistics of SIS are available at figshare: https:// doi.org/10.25387/g3.12477842.

DISCUSSION
In this study, we conducted a GWAS of SIS using a large data set from UK biobank samples. GWAS results were further performed with GSEA to detect candidate gene ontology and SIS related pathways. We detected several candidate genes and gene sets for SIS, such as ANXA1 and PLGRKT.
One vital finding of this study is the two significant risk SNPs rs754832 and rs750968, both of them were near to the ANXA1 (Annexin A1) gene on chromosome nine. This gene encodes a membrane-localized protein that inhibits phospholipase A2 and has anti-inflammatory activity (Cardin et al. 2017). A research identified the regulation of the inflammation by regulating the Figure 1 Manhattan plot for shoulder impingement syndrome in UK. Biobank Genomic coordinates are displayed along the X-axis, with the negative logarithm of the association P value for each SNP displayed on the Y-axis, meaning that each dot on the Manhattan plot signifies a SNP. The red line indicates the P-value threshold for genome-wide significance (P , 5 · 10 28 ) while the blue line indicates P-value threshold for suggestive significance (P , 5 · 10 25 ). clearance of neutrophils by macrophages in mouse bone marrow as a critical regulator role for the anti-inflammatory molecule ANXA1 (Dali et al. 2012). Moreover, ANXA1 is essential to regulate bone marrow mesenchymal stem cells (BMSCs) proliferation and osteogenic differentiation (Pan et al. 2015). Likewise, Liang et al. investigated the effects of silencing ANXA1 gene on proliferation and osteogenic differentiation of rabbit BMSCs, and found that silencing ANXA1 gene could reduce those capacities of BMSCs, which may be one of mechanisms underlying osteoarticular diseases (Liang et al. 2012). Our results suggest that MIR6130 is another gene near the top SNP rs750968. MIR6130 (microRNA 6130), alias as hsamir-6130, is one of the microRNAs that coordinately regulated in rheumatic heart disease patients with mitral valve stenosis (Lu et al. 2018). However, the relationship between MIR6130 and SIS remains unclear.
In this study, another identified candidate genes for SIS was PLGRKT (Plasminogen Receptor with A C-Terminal Lysine) which involved in regulation of inflammatory response; regulated monocyte chemotactic migration and matrix metalloproteinase activation, such as of MMP2 and MMP9 (Lighvani et al. 2011). It has been demonstrated that Plasminogen (PLG) and the plasminogen receptor, PLGRKT, play an important role in macrophage recruitment in the inflammatory response, and regulated inflammation via the interplay between PLG and its receptors (Miles et al. 2014). Miles et al. evaluated the effect of PLGRKT deletion on inflammation, and demonstrated that PLGRKT was required for plasminogen binding and macrophage migration in vivo (Miles et al. 2017). Likewise, PLGRKT involved in regulation of inflammatory response; regulates matrix metallproteinase activation, which may damage cells by digesting the ECM (Erlykina et al. 2015).
GSEA detected several candidate biological gene sets for SIS, such as "ZHOU_INFLAMMATORY_RESPONSE_FIMA_DN" (genes down-regulated in macrophages by P.gingivalis FimA pathogen), which was related to inflammatory response. This gene set up-regulated in macrophages by lipopolysaccharide (LPS) and can lead to an inflammatory condition (Zhou and Amar 2007). Several investigators have suggested that fimbriae of P.gingivalis can trigger the production of proinflammatory mediators in human macrophages (Nakagawa et al. 2002). Besides, LPS in P.gingivalis could promote the antibacterial responses of macrophages, and simultaneously cause detrimental lesion in infected tissues due to enhanced local inflammatory reactions (Mancuso et al. 2007). Therefore, this gene set may affect the development of SIS by regulating inflammatory responses, but needs further verification.
"MANALO_HYPOXIA_UP" (genes up-regulated in response to both hypoxia and overexpression of an active form of HIF1A), another candidate biological gene sets associated with SIS, played a vital role in the pathogenesis of osteoarthritis (OA) and rheumatoid arthritis (RA) (Johnson et al. 2000;Brouwer et al. 2009;Pfander et al. 2006). However, the molecular mechanism of hypoxic-induced articular lesions in osteoarthrosis remains controversial. A variety of hypotheses on the role of hypoxia in the pathogenesis of osteoarthrosis, such as abnormal energy metabolism and immune inflammation have been proposed in previous studies (Pfander and Gelse 2007). For example, the molecular mechanism of hypoxiainduced joint injury may be partly due to the increase in production of ROS under hypoxia, which can accelerate protein oxidation, and induce mitochondrial injury and apoptosis (Zhang et al. 2011). Based on these evidences above, we suggested that hypoxia might play an important role in the pathogenesis of SIS.
The gene set enrichment researches are receiving increasing attention in genetic study of human complex diseases. Although this is the first molecular mechanism study in SIS combining GWAS and gene set enrichment analysis, there are still several limitations need to be addressed. First, the sample size of SIS GWAS is not large enough. Additionally, our analysis results need to be verified in experimentation via collecting sufficient SIS samples. Consequently, our study results should be interpreted with caution and further studies are needed to confirm the findings and reveal the potential roles of identified locus and pathways in the development of SIS.
n■ In conclusion, we conducted GWAS for SIS, and then performed a gene set enrichment analysis based on GWAS results. Finally, a group of significant difference genes and gene sets within SIS were identified, which may provide novel clues for revealing the complex genetic mechanism of SIS.

ACKNOWLEDGMENTS
This work was supported by the National Natural Scientific Foundation of China [81472925,81673112]; the Key projects of international cooperation among governments in scientific and technological innovation [2016YFE0119100]; and the Fundamental Research Funds for the Central Universities [xzy022019006].