Abstract

While many disease-associated single nucleotide polymorphisms (SNPs) are expression quantitative trait loci (eQTLs), a large proportion of genome-wide association study (GWAS) variants are of unknown function. Alternative polyadenylation (APA) plays an important role in posttranscriptional regulation by allowing genes to shorten or extend 3′ untranslated regions (UTRs). We hypothesized that genetic variants that affect APA in lung tissue may lend insight into the function of respiratory associated GWAS loci. We generated alternative polyadenylation (apa) QTLs using RNA sequencing and whole genome sequencing on 1241 subjects from the Lung Tissue Research Consortium (LTRC) as part of the NHLBI TOPMed project. We identified 56 179 APA sites corresponding to 13 582 unique genes after filtering out APA sites with low usage. We found that a total of 8831 APA sites were associated with at least one SNP with q-value < 0.05. The genomic distribution of lead APA SNPs indicated that the majority are intronic variants (33%), followed by downstream gene variants (26%), 3′ UTR variants (17%), and upstream gene variants (within 1 kb region upstream of transcriptional start site, 10%). APA sites in 193 genes colocalized with GWAS data for at least one phenotype. Genes containing the top APA sites associated with GWAS variants include membrane associated ring-CH-type finger 2 (MARCHF2), nectin cell adhesion molecule 2 (NECTIN2), and butyrophilin subfamily 3 member A2 (BTN3A2). Overall, these findings suggest that APA may be an important mechanism for genetic variants in lung function and chronic obstructive pulmonary disease (COPD).

Introduction

Chronic obstructive pulmonary disease (COPD) is a complex disease characterized by irreversible airflow obstruction on lung function testing. The leading environmental risk factor for COPD is cigarette smoking, however genetic factors have also been shown to play a role in disease susceptibility [1–4]. Through the use of genome-wide genotyping data, the COPDGene Study estimated the heritability of COPD as 37.7% in non-Hispanic white subjects and 37.9% in African Americans, while the heritability of lung function (forced expiratory volume in 1 s, FEV1) is 38.4% in white subjects and 50.9% in African Americans [5]. Genome-wide association studies (GWAS) have been used to identify genetic variants associated with COPD and lung function [6–9]. A 2019 GWAS including 257 811 subjects from studies including the International COPD Genetics Consortium and UK Biobank identified 82 loci with genome wide significance [7], while a study of 400 102 individuals from UK Biobank and the SpiroMeta consortium identified 279 loci associated with lung function [8]. However, like most GWAS, the majority of identified variants lie in non-coding regions. To identify mechanisms connecting GWAS non-coding variants and human disease, the effect of genetic variants on gene expression (expression quantitative trait loci, or eQTL) have been successful in identifying target genes and functional mechanisms underlying disease associations. However, a large proportion of GWAS risk loci underlying complex traits such as COPD, remain unexplained [10].

Alternative polyadenylation (APA), which occurs in > 70% of human genes, plays an important role in posttranscriptional regulation by allowing genes to shorten or extend 3′ untranslated regions (UTRs) which may contain cis-regulatory elements including microRNA or RNA-binding protein binding sites. Thus APA can alter mRNA stability and translational efficiency, as well as protein localization [11]. The primary regulatory elements involved in APA are the polyadenylation signal (PAS), as well as other-cis elements located in the 3′ UTR, however other regulatory mechanisms underlying APA have been described [12].

There is growing evidence that genetic variants that alter 3′ UTR length, known as 3′ UTR APA quantitative trait loci (apaQTLs), can contribute to human disease, and several studies have demonstrated an association between genetic variants that alter 3′ UTR length and disease states [13–19]. However, to date, there has not been a comprehensive analysis of apaQTLs in lung tissue, or the role of apaQTLs in COPD and lung function, with prior analyses of apaQTLs including only 515 lung tissue samples from GTEx [20]. Here, we characterize apaQTLs in lung tissue samples from 1241 subjects and investigate their overlap with GWAS results for COPD case-control status and lung function.

Results

Discovery of alternative polyadenylation sites in lung tissue

Using RNA sequencing (RNAseq) data from 1241 lung tissue samples from subjects in the Lung Tissue Research Consortium (LTRC) (Table 1), we identified APA sites using DaPars2, which identifies and quantifies the usage of one proximal APA site in relation to the distal APA site for each annotated transcript meeting expression thresholds. We identified a total of 178 012 APA sites, and after filtering out transcripts with low coverage of the UTR (< 30 reads per sample), we were left with 56 179 transcripts corresponding to 13 583 genes. On average, there were 3.9 transcripts containing proximal APA sites per gene. The mean percentage distal usage index (PDUI) per sample was 0.57, and we found a roughly equal distribution of the distal and proximal APA sites, with 10 715 transcripts primarily utilizing the distal site (mean PDUI>0.9) and 9971 transcripts primarily utilizing the proximal site (mean PDUI<0.1). The mean distance between the proximal and distal APA site was 561 bases.

Table 1

Clinical characteristics of LTRC study individuals included in the analysis.

Overall (n = 1241)
Gender, male, n (%)52.4
Age, mean (SD)63.3 (10.6)
Race, n (%)
 White
 Asian
 Black
 Hispanic
 Other

1118 (90.1)
4 (0.3)
81 (6.5)
25 (2.0)
13 (1.0)
Current Smokers, n (% smokers)73 (5.9)
Pack-Years Smoked, mean (SD)31.8 (33.2)
Overall (n = 1241)
Gender, male, n (%)52.4
Age, mean (SD)63.3 (10.6)
Race, n (%)
 White
 Asian
 Black
 Hispanic
 Other

1118 (90.1)
4 (0.3)
81 (6.5)
25 (2.0)
13 (1.0)
Current Smokers, n (% smokers)73 (5.9)
Pack-Years Smoked, mean (SD)31.8 (33.2)
Table 1

Clinical characteristics of LTRC study individuals included in the analysis.

Overall (n = 1241)
Gender, male, n (%)52.4
Age, mean (SD)63.3 (10.6)
Race, n (%)
 White
 Asian
 Black
 Hispanic
 Other

1118 (90.1)
4 (0.3)
81 (6.5)
25 (2.0)
13 (1.0)
Current Smokers, n (% smokers)73 (5.9)
Pack-Years Smoked, mean (SD)31.8 (33.2)
Overall (n = 1241)
Gender, male, n (%)52.4
Age, mean (SD)63.3 (10.6)
Race, n (%)
 White
 Asian
 Black
 Hispanic
 Other

1118 (90.1)
4 (0.3)
81 (6.5)
25 (2.0)
13 (1.0)
Current Smokers, n (% smokers)73 (5.9)
Pack-Years Smoked, mean (SD)31.8 (33.2)

Generation of APA-QTLs

For the purposes of this analysis, we define an apaTranscript as a transcript in which we were able to identify a proximal APA site whose usage in relation to the distal APA site was significantly associated with at least one SNP (i.e. had an apaQTL) (see Supplementary Fig. 1). We identified a total of 8831 apaTranscripts, corresponding to 4216 unique genes, with APA sites that were associated with at least one SNP with q-value < 0.05. We found that the majority of lead SNPs associated with apaTranscripts (apaVariants) were located in introns (33.0%), followed by downstream gene variants (25.8%), 3′ UTR variants (16.9%), and upstream gene variants (10.0%) (Table 2). The majority of apaTranscripts were protein coding (62.6%), followed by retained introns transcripts (lacking open reading frames) (18%) and non-coding processed transcripts (7.7%) (Table 3). Out of the 5896 unique apaVariants, 5083 (86%) were also associated with gene expression with q value < 0.05. We next investigated what proportion of the apaQTLs found in our data were previously identified in GTEx lung (n = 515). Data was available for 6545 significant apaTranscripts in GTEx, and we detected 4085 of these transcripts in our dataset. Out of these, we found that 1219 were affected by significant apaQTLs. Furthermore, 209 apaQTLs shared the same SNP-transcript pair. We found that 65 apaVariants disrupted known PAS motifs (Supplementary Table 1), however only 36 of these fully abrogated the PAS motif, while the other 29 variants created a new PAS motif. Pathway analysis of genes containing apaQTLs revealed overrepresentation of pathways including “Neutrophil degranulation”, “Major pathway of rRNA processing in the cytosol”, and “mRNA splicing-Major Pathway”(Supplementary Table 2).

Table 2

apaQTL variant annotation.

NumberPercentage
intronvariant131033.0
downstreamgene_variant102325.8
3primeUTRvariant66916.9
upstreamgenevariant39610.0
noncodingtranscriptexonvariant3488.8
missensevariant721.8
synonymousvariant621.6
5 prime UTR variant300.8
Sequence feature160.4
Splice region variant & intron variant140.4
5 prime UTR premature start codon gain variant50.1
Splice region variant & non coding transcript exon variant50.1
Splice acceptor variant & intron variant40.1
Splice region variant30.08
Splice donor variant & intron variant20.05
Stop gained20.05
Stop lost20.05
Missense variant & splice region variant10.03
NumberPercentage
intronvariant131033.0
downstreamgene_variant102325.8
3primeUTRvariant66916.9
upstreamgenevariant39610.0
noncodingtranscriptexonvariant3488.8
missensevariant721.8
synonymousvariant621.6
5 prime UTR variant300.8
Sequence feature160.4
Splice region variant & intron variant140.4
5 prime UTR premature start codon gain variant50.1
Splice region variant & non coding transcript exon variant50.1
Splice acceptor variant & intron variant40.1
Splice region variant30.08
Splice donor variant & intron variant20.05
Stop gained20.05
Stop lost20.05
Missense variant & splice region variant10.03
Table 2

apaQTL variant annotation.

NumberPercentage
intronvariant131033.0
downstreamgene_variant102325.8
3primeUTRvariant66916.9
upstreamgenevariant39610.0
noncodingtranscriptexonvariant3488.8
missensevariant721.8
synonymousvariant621.6
5 prime UTR variant300.8
Sequence feature160.4
Splice region variant & intron variant140.4
5 prime UTR premature start codon gain variant50.1
Splice region variant & non coding transcript exon variant50.1
Splice acceptor variant & intron variant40.1
Splice region variant30.08
Splice donor variant & intron variant20.05
Stop gained20.05
Stop lost20.05
Missense variant & splice region variant10.03
NumberPercentage
intronvariant131033.0
downstreamgene_variant102325.8
3primeUTRvariant66916.9
upstreamgenevariant39610.0
noncodingtranscriptexonvariant3488.8
missensevariant721.8
synonymousvariant621.6
5 prime UTR variant300.8
Sequence feature160.4
Splice region variant & intron variant140.4
5 prime UTR premature start codon gain variant50.1
Splice region variant & non coding transcript exon variant50.1
Splice acceptor variant & intron variant40.1
Splice region variant30.08
Splice donor variant & intron variant20.05
Stop gained20.05
Stop lost20.05
Missense variant & splice region variant10.03
Table 3

apaQTL transcript function.

NumberPercentage
Protein coding248362.6
Retained intron71518.0
Processed transcript3077.7
Nonsense mediated decay2025.1
antisense1203.0
lincRNA972.4
IG V genea130.3
Sense overlapping50.1
Transcribed processed pseudogene50.1
Processed pseudogene40.1
Transcribed unprocessed pseudogene40.1
TECb30.08
IG C genea20.05
IG V pseudogenec10.03
Non stop decay10.03
prime3 overlapping ncrna10.03
Unprocessed pseudogene10.03
NumberPercentage
Protein coding248362.6
Retained intron71518.0
Processed transcript3077.7
Nonsense mediated decay2025.1
antisense1203.0
lincRNA972.4
IG V genea130.3
Sense overlapping50.1
Transcribed processed pseudogene50.1
Processed pseudogene40.1
Transcribed unprocessed pseudogene40.1
TECb30.08
IG C genea20.05
IG V pseudogenec10.03
Non stop decay10.03
prime3 overlapping ncrna10.03
Unprocessed pseudogene10.03

aImmunoglobulin (Ig) variable chain and T-cell receptor (TcR) genes annotated according to the international ImMunoGeneTics information system.

bTEC (To be Experimentally Confirmed): Regions with EST clusters that have polyA features that could indicate the presence of protein coding genes. These require experimental validation, either by 5′ RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively-encoded peptide with specific antibodies.

cInactivated immunoglobulin gene.

Table 3

apaQTL transcript function.

NumberPercentage
Protein coding248362.6
Retained intron71518.0
Processed transcript3077.7
Nonsense mediated decay2025.1
antisense1203.0
lincRNA972.4
IG V genea130.3
Sense overlapping50.1
Transcribed processed pseudogene50.1
Processed pseudogene40.1
Transcribed unprocessed pseudogene40.1
TECb30.08
IG C genea20.05
IG V pseudogenec10.03
Non stop decay10.03
prime3 overlapping ncrna10.03
Unprocessed pseudogene10.03
NumberPercentage
Protein coding248362.6
Retained intron71518.0
Processed transcript3077.7
Nonsense mediated decay2025.1
antisense1203.0
lincRNA972.4
IG V genea130.3
Sense overlapping50.1
Transcribed processed pseudogene50.1
Processed pseudogene40.1
Transcribed unprocessed pseudogene40.1
TECb30.08
IG C genea20.05
IG V pseudogenec10.03
Non stop decay10.03
prime3 overlapping ncrna10.03
Unprocessed pseudogene10.03

aImmunoglobulin (Ig) variable chain and T-cell receptor (TcR) genes annotated according to the international ImMunoGeneTics information system.

bTEC (To be Experimentally Confirmed): Regions with EST clusters that have polyA features that could indicate the presence of protein coding genes. These require experimental validation, either by 5′ RACE or RT-PCR to extend the transcripts, or by confirming expression of the putatively-encoded peptide with specific antibodies.

cInactivated immunoglobulin gene.

Colocalization analysis

To determine whether GWAS associations with COPD and lung function may be attributed to alternative polyadenylation, colocalization analysis was performed between apaQTL data and published GWAS data from COPD case control status [7] and lung function measures FEV1, forced vital capacity (FVC), FEV1/FVC ratio, and peak expiratory flow (PEF) [9]. We tested a total of 2104 unique genomic windows, including 569 windows for FEV1/FVC, 656 windows for FEV1, 638 windows for FVC, 467 windows for PEF and 21 windows for COPD case–control status containing both significant GWAS and apaQTLs. We identified APA sites in 193 genes that colocalized with at least one phenotype; specifically, APA sites in 63 genes colocalized with PEF, sites in 62 genes colocalized with FVC, sites in 27 genes colocalized with FEV1, sites in 33 genes colocalized with FEV1/FVC and sites in 8 genes colocalized with COPD (Supplementary Table 3). To select for APA sites with the strongest effect, we focused on apaQTLs with a beta score (slope from eQTL linear regression) of greater than 0.1, resulting in 27 loci for follow up (Table 4). Of these, four APA sites were located in the vicinity of the APA variant (within 60 bp), or within the 5′ UTR, and these were selected for further analysis.

Table 4

Top loci colocalized between GWAS and apaQTL data.

LocusGWAS PhenotypesLead apaQTL SNPapaQTL slopeaapaQTL  
p-valuea
Gene NamebapaQTL transcript(s)Predicted Proximal APA Site & MotifcGWAS implicated gened
6:32645386:C:A
6:33228793:G:A
FEV1, PEF6:32646995:G:A
6:32661737:T:A
0.41< 1×10−317HLA-DQA1ENST00000343139.11
ENST00000443574.1
32 643 217
32 659 946
AGER
19:44892009:G:A
19:44922203:A:G
FEV, FVC19:44878660:C:T0.32< 1×10−317NECTIN2ENST00000252485.8
ENST00000585601.1
44 878 606
44 878 613
No known motif
9:133274084:C:TPEF9:133273813:C:T−0.321.21E-78ABOENST00000611156.4
ENST00000651471.1
ENST00000453660.4
ENST00000647353.1
133 256 205
133 256 205
133 254 367
133 250 897
ABO
6:27607489:T:AFVC6:26378060:T:A
6:26376800:G:A
6:25384133:C:T
−0.311.78E-49BTN3A2ENST00000527422.5
ENST00000396948.5
ENST00000528222.1
ENST00000377708.7
ENST00000508906.6
26 375 666
26 375 666
26 368 267
26 375 948
26 375 948
19:3385231:G:AFEV119:2936537:G:A0.282.59E-10ZNF77ENST00000314531.52 934 664
13:78267917:G:AFVC13:79306066:C:A−0.241.95E-42RBM26-AS1ENST00000456602.5
ENST00000656910.1
ENST00000606584.6
ENST00000654024.1
79 422 214
79 422 214
79 422 213
79 422 401
11:2001961:G:AFEV1/FVC11:1219991:G:T0.251.45E-65MUC5BENST00000529681.51 261 708
10:79946568:A:GCOPD10:79747240:A:T
10:79753551:T:C
10:79758006:G:A
0.101.77E-103NUTM2B-AS1ENST00000665716.1
ENST00000610681.1
ENST00000483080.6
ENST00000661316.1
ENST00000658528.1
79 766 173
79 766 173
79 766 173
79 766 173
79 723 162
DYDC2  
FAM213A  
TMEM254
2:229904475:A:CPEF2:229948378:C:G−0.200.000734FBXO36ENST00000283946.8230 011 082
19:45790878:G:ACOPD19:45721396:C:T−0.167.36E-82DMPKENST00000683086.145 770 270DMPK  
SNRPD2
6:33745319:C:TFEV16:32658788:C:T−0.153.49E-259HLA-DQB1ENST00000460185.132 660 068
19:3471605:A:G
19:4977049:T:G
PEF19:4437453:A:G
19:4449290:T:C
19:4385662:A:G
−0.148.07E-09CHAF1AENST00000585854.1
ENST00000587368.1
ENST00000301280.10
4 409 054
4 433 221
4 443 163
1:161545536:C:TFEV1/FVC1:161523491:A:G
1:161593265:G:A
0.042.76E-10FCGR2CENST00000543859.5
ENST00000496692.6
161 599 806
161 589 809
6:28333322:A:GPEF6:29947986:C:T−0.131.20E-17AL645929.1ENST00000420084.129 888 117
14:68978668:G:T
14:70045724:C:T
FVC14:69487244:G:A−0.132.06E-46GALNT16ENST00000448469.869 352 182
1:24555524:T:GPEF1:24555051:C:G0.133.67E-10NCMAPENST00000374392.324 606 527
15:41548040:A:GPEF, FEV1/FVC15:41554978:T:G−0.136.46E-17LTKENST00000453182.2
ENST00000563518.5
41 504 034
41 503 904
9:133270015:A:C
9:133271182:T:C
FVC, FEV19:133271182:T:C
9:133249012:G:A
9:133279786:G:A
−0.121.28E-57ABOENST00000453660.4
ENST00000680600.1
ENST00000647353.1
133 254 367
133 250 524
133 250 897
19:8595399:C:TFEV1/FVC19:8438737:T:G0.121.98E-112MARCHF2ENST00000381035.8
ENST00000215555.7
8 438 712
8 438 712
TTTTTT
11:68849554:A:G
11:68853972:C:T
FVC, FEV111:68857818:T:G0.124.22E-09IGHMBP2ENST00000675684.168 929 301IGHMBP2
16:19715459:A:GPEF16:19708624:C:A0.117.37E-63KNOP1ENST00000565844.119 710 435
1:36322287:G:AFEV1/FVC1:36323945:A:G
1:36233260:C:T
0.101.39E-10SH3D21ENST00000672313.1
ENST00000480549.6
ENST00000373137.2
ENST00000672976.1
36 327 871
36 324 091
36 327 871
36 327 872
AATAAA
1:39594353:G:ACOPD1:39757456:T:C0.101.10E-11PPIELENST00000649325.139 522 354PABPC4  
OXCT2  
MYCL
5:140937818:A:TFVC5:140735758:C:T
5:140650658:A:T
−0.102.66E-14WDR55ENST00000504897.2
ENST00000358337.10
140 672 529
140 669 690
16:30108980:C:TPEF16:30312489:C:T
16:30327682:T:C
−0.103.77E-46NPIPB11ENST00000524087.5
ENST00000614927.1
29 383 688
29 385 534
18:47098808:C:GFVC18:47108385:A:G0.101.46E-54AC012254.2ENST00000602459.647 108 439
6:29176755:C:T
6:31598391:C:G
FVC
FEV1
6:30726039:A:G0.021.52E-25HCG18ENST00000681435.1
ENST00000680530.1
ENST00000681421.1
ENST00000656629.1
ENST00000657595.1
ENST00000670071.1
ENST00000659563.1
ENST00000665953.1
ENST00000656629.1
30 724 468
30 724 468
30 724 468
30 289 146
30 288 780
30 288 780
30 288 780
30 288 780
30 289 146
AGER
LocusGWAS PhenotypesLead apaQTL SNPapaQTL slopeaapaQTL  
p-valuea
Gene NamebapaQTL transcript(s)Predicted Proximal APA Site & MotifcGWAS implicated gened
6:32645386:C:A
6:33228793:G:A
FEV1, PEF6:32646995:G:A
6:32661737:T:A
0.41< 1×10−317HLA-DQA1ENST00000343139.11
ENST00000443574.1
32 643 217
32 659 946
AGER
19:44892009:G:A
19:44922203:A:G
FEV, FVC19:44878660:C:T0.32< 1×10−317NECTIN2ENST00000252485.8
ENST00000585601.1
44 878 606
44 878 613
No known motif
9:133274084:C:TPEF9:133273813:C:T−0.321.21E-78ABOENST00000611156.4
ENST00000651471.1
ENST00000453660.4
ENST00000647353.1
133 256 205
133 256 205
133 254 367
133 250 897
ABO
6:27607489:T:AFVC6:26378060:T:A
6:26376800:G:A
6:25384133:C:T
−0.311.78E-49BTN3A2ENST00000527422.5
ENST00000396948.5
ENST00000528222.1
ENST00000377708.7
ENST00000508906.6
26 375 666
26 375 666
26 368 267
26 375 948
26 375 948
19:3385231:G:AFEV119:2936537:G:A0.282.59E-10ZNF77ENST00000314531.52 934 664
13:78267917:G:AFVC13:79306066:C:A−0.241.95E-42RBM26-AS1ENST00000456602.5
ENST00000656910.1
ENST00000606584.6
ENST00000654024.1
79 422 214
79 422 214
79 422 213
79 422 401
11:2001961:G:AFEV1/FVC11:1219991:G:T0.251.45E-65MUC5BENST00000529681.51 261 708
10:79946568:A:GCOPD10:79747240:A:T
10:79753551:T:C
10:79758006:G:A
0.101.77E-103NUTM2B-AS1ENST00000665716.1
ENST00000610681.1
ENST00000483080.6
ENST00000661316.1
ENST00000658528.1
79 766 173
79 766 173
79 766 173
79 766 173
79 723 162
DYDC2  
FAM213A  
TMEM254
2:229904475:A:CPEF2:229948378:C:G−0.200.000734FBXO36ENST00000283946.8230 011 082
19:45790878:G:ACOPD19:45721396:C:T−0.167.36E-82DMPKENST00000683086.145 770 270DMPK  
SNRPD2
6:33745319:C:TFEV16:32658788:C:T−0.153.49E-259HLA-DQB1ENST00000460185.132 660 068
19:3471605:A:G
19:4977049:T:G
PEF19:4437453:A:G
19:4449290:T:C
19:4385662:A:G
−0.148.07E-09CHAF1AENST00000585854.1
ENST00000587368.1
ENST00000301280.10
4 409 054
4 433 221
4 443 163
1:161545536:C:TFEV1/FVC1:161523491:A:G
1:161593265:G:A
0.042.76E-10FCGR2CENST00000543859.5
ENST00000496692.6
161 599 806
161 589 809
6:28333322:A:GPEF6:29947986:C:T−0.131.20E-17AL645929.1ENST00000420084.129 888 117
14:68978668:G:T
14:70045724:C:T
FVC14:69487244:G:A−0.132.06E-46GALNT16ENST00000448469.869 352 182
1:24555524:T:GPEF1:24555051:C:G0.133.67E-10NCMAPENST00000374392.324 606 527
15:41548040:A:GPEF, FEV1/FVC15:41554978:T:G−0.136.46E-17LTKENST00000453182.2
ENST00000563518.5
41 504 034
41 503 904
9:133270015:A:C
9:133271182:T:C
FVC, FEV19:133271182:T:C
9:133249012:G:A
9:133279786:G:A
−0.121.28E-57ABOENST00000453660.4
ENST00000680600.1
ENST00000647353.1
133 254 367
133 250 524
133 250 897
19:8595399:C:TFEV1/FVC19:8438737:T:G0.121.98E-112MARCHF2ENST00000381035.8
ENST00000215555.7
8 438 712
8 438 712
TTTTTT
11:68849554:A:G
11:68853972:C:T
FVC, FEV111:68857818:T:G0.124.22E-09IGHMBP2ENST00000675684.168 929 301IGHMBP2
16:19715459:A:GPEF16:19708624:C:A0.117.37E-63KNOP1ENST00000565844.119 710 435
1:36322287:G:AFEV1/FVC1:36323945:A:G
1:36233260:C:T
0.101.39E-10SH3D21ENST00000672313.1
ENST00000480549.6
ENST00000373137.2
ENST00000672976.1
36 327 871
36 324 091
36 327 871
36 327 872
AATAAA
1:39594353:G:ACOPD1:39757456:T:C0.101.10E-11PPIELENST00000649325.139 522 354PABPC4  
OXCT2  
MYCL
5:140937818:A:TFVC5:140735758:C:T
5:140650658:A:T
−0.102.66E-14WDR55ENST00000504897.2
ENST00000358337.10
140 672 529
140 669 690
16:30108980:C:TPEF16:30312489:C:T
16:30327682:T:C
−0.103.77E-46NPIPB11ENST00000524087.5
ENST00000614927.1
29 383 688
29 385 534
18:47098808:C:GFVC18:47108385:A:G0.101.46E-54AC012254.2ENST00000602459.647 108 439
6:29176755:C:T
6:31598391:C:G
FVC
FEV1
6:30726039:A:G0.021.52E-25HCG18ENST00000681435.1
ENST00000680530.1
ENST00000681421.1
ENST00000656629.1
ENST00000657595.1
ENST00000670071.1
ENST00000659563.1
ENST00000665953.1
ENST00000656629.1
30 724 468
30 724 468
30 724 468
30 289 146
30 288 780
30 288 780
30 288 780
30 288 780
30 289 146
AGER

aFor some loci, the lead apaQTL-variant was associated with APA sites in multiple transcripts, the p-value and slope for the association with the lowest P-value is reported here.

bIf the lead apaQTL variant was associated with APA sites in multiple genes, only the gene containing associations with the lowest P-value are reported.

cMotif information is only provided for loci where the variant is within the 5′ UTR.

dGWAS implicated gene was ascertained from published GWAS findings for COPD case-control status [7] and lung function [9].

Table 4

Top loci colocalized between GWAS and apaQTL data.

LocusGWAS PhenotypesLead apaQTL SNPapaQTL slopeaapaQTL  
p-valuea
Gene NamebapaQTL transcript(s)Predicted Proximal APA Site & MotifcGWAS implicated gened
6:32645386:C:A
6:33228793:G:A
FEV1, PEF6:32646995:G:A
6:32661737:T:A
0.41< 1×10−317HLA-DQA1ENST00000343139.11
ENST00000443574.1
32 643 217
32 659 946
AGER
19:44892009:G:A
19:44922203:A:G
FEV, FVC19:44878660:C:T0.32< 1×10−317NECTIN2ENST00000252485.8
ENST00000585601.1
44 878 606
44 878 613
No known motif
9:133274084:C:TPEF9:133273813:C:T−0.321.21E-78ABOENST00000611156.4
ENST00000651471.1
ENST00000453660.4
ENST00000647353.1
133 256 205
133 256 205
133 254 367
133 250 897
ABO
6:27607489:T:AFVC6:26378060:T:A
6:26376800:G:A
6:25384133:C:T
−0.311.78E-49BTN3A2ENST00000527422.5
ENST00000396948.5
ENST00000528222.1
ENST00000377708.7
ENST00000508906.6
26 375 666
26 375 666
26 368 267
26 375 948
26 375 948
19:3385231:G:AFEV119:2936537:G:A0.282.59E-10ZNF77ENST00000314531.52 934 664
13:78267917:G:AFVC13:79306066:C:A−0.241.95E-42RBM26-AS1ENST00000456602.5
ENST00000656910.1
ENST00000606584.6
ENST00000654024.1
79 422 214
79 422 214
79 422 213
79 422 401
11:2001961:G:AFEV1/FVC11:1219991:G:T0.251.45E-65MUC5BENST00000529681.51 261 708
10:79946568:A:GCOPD10:79747240:A:T
10:79753551:T:C
10:79758006:G:A
0.101.77E-103NUTM2B-AS1ENST00000665716.1
ENST00000610681.1
ENST00000483080.6
ENST00000661316.1
ENST00000658528.1
79 766 173
79 766 173
79 766 173
79 766 173
79 723 162
DYDC2  
FAM213A  
TMEM254
2:229904475:A:CPEF2:229948378:C:G−0.200.000734FBXO36ENST00000283946.8230 011 082
19:45790878:G:ACOPD19:45721396:C:T−0.167.36E-82DMPKENST00000683086.145 770 270DMPK  
SNRPD2
6:33745319:C:TFEV16:32658788:C:T−0.153.49E-259HLA-DQB1ENST00000460185.132 660 068
19:3471605:A:G
19:4977049:T:G
PEF19:4437453:A:G
19:4449290:T:C
19:4385662:A:G
−0.148.07E-09CHAF1AENST00000585854.1
ENST00000587368.1
ENST00000301280.10
4 409 054
4 433 221
4 443 163
1:161545536:C:TFEV1/FVC1:161523491:A:G
1:161593265:G:A
0.042.76E-10FCGR2CENST00000543859.5
ENST00000496692.6
161 599 806
161 589 809
6:28333322:A:GPEF6:29947986:C:T−0.131.20E-17AL645929.1ENST00000420084.129 888 117
14:68978668:G:T
14:70045724:C:T
FVC14:69487244:G:A−0.132.06E-46GALNT16ENST00000448469.869 352 182
1:24555524:T:GPEF1:24555051:C:G0.133.67E-10NCMAPENST00000374392.324 606 527
15:41548040:A:GPEF, FEV1/FVC15:41554978:T:G−0.136.46E-17LTKENST00000453182.2
ENST00000563518.5
41 504 034
41 503 904
9:133270015:A:C
9:133271182:T:C
FVC, FEV19:133271182:T:C
9:133249012:G:A
9:133279786:G:A
−0.121.28E-57ABOENST00000453660.4
ENST00000680600.1
ENST00000647353.1
133 254 367
133 250 524
133 250 897
19:8595399:C:TFEV1/FVC19:8438737:T:G0.121.98E-112MARCHF2ENST00000381035.8
ENST00000215555.7
8 438 712
8 438 712
TTTTTT
11:68849554:A:G
11:68853972:C:T
FVC, FEV111:68857818:T:G0.124.22E-09IGHMBP2ENST00000675684.168 929 301IGHMBP2
16:19715459:A:GPEF16:19708624:C:A0.117.37E-63KNOP1ENST00000565844.119 710 435
1:36322287:G:AFEV1/FVC1:36323945:A:G
1:36233260:C:T
0.101.39E-10SH3D21ENST00000672313.1
ENST00000480549.6
ENST00000373137.2
ENST00000672976.1
36 327 871
36 324 091
36 327 871
36 327 872
AATAAA
1:39594353:G:ACOPD1:39757456:T:C0.101.10E-11PPIELENST00000649325.139 522 354PABPC4  
OXCT2  
MYCL
5:140937818:A:TFVC5:140735758:C:T
5:140650658:A:T
−0.102.66E-14WDR55ENST00000504897.2
ENST00000358337.10
140 672 529
140 669 690
16:30108980:C:TPEF16:30312489:C:T
16:30327682:T:C
−0.103.77E-46NPIPB11ENST00000524087.5
ENST00000614927.1
29 383 688
29 385 534
18:47098808:C:GFVC18:47108385:A:G0.101.46E-54AC012254.2ENST00000602459.647 108 439
6:29176755:C:T
6:31598391:C:G
FVC
FEV1
6:30726039:A:G0.021.52E-25HCG18ENST00000681435.1
ENST00000680530.1
ENST00000681421.1
ENST00000656629.1
ENST00000657595.1
ENST00000670071.1
ENST00000659563.1
ENST00000665953.1
ENST00000656629.1
30 724 468
30 724 468
30 724 468
30 289 146
30 288 780
30 288 780
30 288 780
30 288 780
30 289 146
AGER
LocusGWAS PhenotypesLead apaQTL SNPapaQTL slopeaapaQTL  
p-valuea
Gene NamebapaQTL transcript(s)Predicted Proximal APA Site & MotifcGWAS implicated gened
6:32645386:C:A
6:33228793:G:A
FEV1, PEF6:32646995:G:A
6:32661737:T:A
0.41< 1×10−317HLA-DQA1ENST00000343139.11
ENST00000443574.1
32 643 217
32 659 946
AGER
19:44892009:G:A
19:44922203:A:G
FEV, FVC19:44878660:C:T0.32< 1×10−317NECTIN2ENST00000252485.8
ENST00000585601.1
44 878 606
44 878 613
No known motif
9:133274084:C:TPEF9:133273813:C:T−0.321.21E-78ABOENST00000611156.4
ENST00000651471.1
ENST00000453660.4
ENST00000647353.1
133 256 205
133 256 205
133 254 367
133 250 897
ABO
6:27607489:T:AFVC6:26378060:T:A
6:26376800:G:A
6:25384133:C:T
−0.311.78E-49BTN3A2ENST00000527422.5
ENST00000396948.5
ENST00000528222.1
ENST00000377708.7
ENST00000508906.6
26 375 666
26 375 666
26 368 267
26 375 948
26 375 948
19:3385231:G:AFEV119:2936537:G:A0.282.59E-10ZNF77ENST00000314531.52 934 664
13:78267917:G:AFVC13:79306066:C:A−0.241.95E-42RBM26-AS1ENST00000456602.5
ENST00000656910.1
ENST00000606584.6
ENST00000654024.1
79 422 214
79 422 214
79 422 213
79 422 401
11:2001961:G:AFEV1/FVC11:1219991:G:T0.251.45E-65MUC5BENST00000529681.51 261 708
10:79946568:A:GCOPD10:79747240:A:T
10:79753551:T:C
10:79758006:G:A
0.101.77E-103NUTM2B-AS1ENST00000665716.1
ENST00000610681.1
ENST00000483080.6
ENST00000661316.1
ENST00000658528.1
79 766 173
79 766 173
79 766 173
79 766 173
79 723 162
DYDC2  
FAM213A  
TMEM254
2:229904475:A:CPEF2:229948378:C:G−0.200.000734FBXO36ENST00000283946.8230 011 082
19:45790878:G:ACOPD19:45721396:C:T−0.167.36E-82DMPKENST00000683086.145 770 270DMPK  
SNRPD2
6:33745319:C:TFEV16:32658788:C:T−0.153.49E-259HLA-DQB1ENST00000460185.132 660 068
19:3471605:A:G
19:4977049:T:G
PEF19:4437453:A:G
19:4449290:T:C
19:4385662:A:G
−0.148.07E-09CHAF1AENST00000585854.1
ENST00000587368.1
ENST00000301280.10
4 409 054
4 433 221
4 443 163
1:161545536:C:TFEV1/FVC1:161523491:A:G
1:161593265:G:A
0.042.76E-10FCGR2CENST00000543859.5
ENST00000496692.6
161 599 806
161 589 809
6:28333322:A:GPEF6:29947986:C:T−0.131.20E-17AL645929.1ENST00000420084.129 888 117
14:68978668:G:T
14:70045724:C:T
FVC14:69487244:G:A−0.132.06E-46GALNT16ENST00000448469.869 352 182
1:24555524:T:GPEF1:24555051:C:G0.133.67E-10NCMAPENST00000374392.324 606 527
15:41548040:A:GPEF, FEV1/FVC15:41554978:T:G−0.136.46E-17LTKENST00000453182.2
ENST00000563518.5
41 504 034
41 503 904
9:133270015:A:C
9:133271182:T:C
FVC, FEV19:133271182:T:C
9:133249012:G:A
9:133279786:G:A
−0.121.28E-57ABOENST00000453660.4
ENST00000680600.1
ENST00000647353.1
133 254 367
133 250 524
133 250 897
19:8595399:C:TFEV1/FVC19:8438737:T:G0.121.98E-112MARCHF2ENST00000381035.8
ENST00000215555.7
8 438 712
8 438 712
TTTTTT
11:68849554:A:G
11:68853972:C:T
FVC, FEV111:68857818:T:G0.124.22E-09IGHMBP2ENST00000675684.168 929 301IGHMBP2
16:19715459:A:GPEF16:19708624:C:A0.117.37E-63KNOP1ENST00000565844.119 710 435
1:36322287:G:AFEV1/FVC1:36323945:A:G
1:36233260:C:T
0.101.39E-10SH3D21ENST00000672313.1
ENST00000480549.6
ENST00000373137.2
ENST00000672976.1
36 327 871
36 324 091
36 327 871
36 327 872
AATAAA
1:39594353:G:ACOPD1:39757456:T:C0.101.10E-11PPIELENST00000649325.139 522 354PABPC4  
OXCT2  
MYCL
5:140937818:A:TFVC5:140735758:C:T
5:140650658:A:T
−0.102.66E-14WDR55ENST00000504897.2
ENST00000358337.10
140 672 529
140 669 690
16:30108980:C:TPEF16:30312489:C:T
16:30327682:T:C
−0.103.77E-46NPIPB11ENST00000524087.5
ENST00000614927.1
29 383 688
29 385 534
18:47098808:C:GFVC18:47108385:A:G0.101.46E-54AC012254.2ENST00000602459.647 108 439
6:29176755:C:T
6:31598391:C:G
FVC
FEV1
6:30726039:A:G0.021.52E-25HCG18ENST00000681435.1
ENST00000680530.1
ENST00000681421.1
ENST00000656629.1
ENST00000657595.1
ENST00000670071.1
ENST00000659563.1
ENST00000665953.1
ENST00000656629.1
30 724 468
30 724 468
30 724 468
30 289 146
30 288 780
30 288 780
30 288 780
30 288 780
30 289 146
AGER

aFor some loci, the lead apaQTL-variant was associated with APA sites in multiple transcripts, the p-value and slope for the association with the lowest P-value is reported here.

bIf the lead apaQTL variant was associated with APA sites in multiple genes, only the gene containing associations with the lowest P-value are reported.

cMotif information is only provided for loci where the variant is within the 5′ UTR.

dGWAS implicated gene was ascertained from published GWAS findings for COPD case-control status [7] and lung function [9].

Colocalization of membrane associated ring-CH-type finger 2 (MARCHF2)

Variants in the 19p13.2 region have been associated with lung function in a GWAS [8]. Colocalization of GWAS results and apaQTL from a region flanking 2 MB of the lead variant revealed a shared signal between GWAS associations with FEV1/FVC and apaQTLs, with a colocalization posterior probability (CPP) of 0.7. The top colocalized variant was rs1055554 (grch38, chr19:8438737_T/G) (Fig. 1a), and SuSie finemapping identified rs1055554 as one of five likely causal variants in the 95% credible set. The T allele of rs1055554 is associated with increased FEV1/FVC (P = 5.8 × 10−6; ß = 0.012) as well as preferred usage of the proximal APA site (mean PDUI of TT subjects = 0.36). rs1055554 is located in the 3′ UTR of MARCHF2 and has a minor allele (G) frequency of 0.26 in LTRC. The G allele of rs1055554 disrupts the proximal PAS motif, changing the sequence of “TTTTTT” to ‘TTGTTT’, and resulting in increased usage of the distal APA site (mean PDUI of GG =0.58) (Fig. 1b and c). The T allele of rs1055554 is additionally associated with decreased gene expression of MARCHF2 (P = 4.1 × 10−143; β = 4.0) (Fig. 1d).

Image demonstrating the similarity in genetic association results between GWAS and apaQTL findings in addition to plots demonstrating the difference in RNASeq coverage over the 3′ UTR according to genotype.
Figure 1

apaQTLs for MARCHF2 colocalize with GWAS data for FEV1/FVC. (a) Locus association plot for FEV1/FVC GWAS and MARCHF2 apaQTL corresponding to ENST00000381035. The lead colocalized SNP, rs1055554, is highlighted with a diamond shape and used as the LD reference. (b) IGV coverage plot showing read pileup for the region spanning chr19:8,438,030-8,439,359 for 81 subjects from each genotype of rs1055554. (c) Boxplot of PDUI values for the ENST00000381035 transcript of MARCHF2. (d) Boxplot of RSEM gene expression values for MARCHF2.

Colocalization of NECTIN cell adhesion molecule 2 (NECTIN2)

Variants located in the 19q13.32 region have previously been associated with lung function [8] (rs3729640: FEV1  P = 5.9 × 10−6, ß =0.013; FVC P = 7.7 × 10−6, ß = 0.013). Colocalization analysis of a 2 MB region flanking rs3729640 (grch38, chr19:44878660_C/T) revealed evidence of a shared signal between GWAS data for FEV1 and FVC with apaQTL data with a CLPP of 0.8 and 0.6, respectively. The lead colocalized variant for both FEV1 and FVC was rs3729640 (Fig. 2a); furthermore SuSiE finemapping revealed that the most likely causal variant for this apaQTL association is also rs3729640 (PIP = 1.00). The T allele of rs3729640 is associated with increased FEV1 and FVC as well as decreased usage of the distal APA site. The mean PDUI of CC subjects is 0.43 compared to 0.97 in TT subjects (Fig. 2b and c). The variant did not result in a change in any known PAS motif, therefore the mechanism behind this association is unknown. We additionally found that the T allele is associated with a significant increase in NECTIN2 gene expression (P = 3.7 × 10−42; β = 5.0) (Fig. 2d).

Image demonstrating the similarity in genetic association results between GWAS and apaQTL findings in addition to plots demonstrating the difference in RNASeq coverage over the 3′ UTR according to genotype.
Figure 2

apaQTLs for NECTIN2 colocalize with GWAS data for FEV1 and FVC. (a) Locus association plots for FVC and FEV1 GWAS and NECTIN2 apaQTL corresponding to ENST00000252485. (b) IGV coverage plot showing read pileup for the region spanning chr19:44,878,000-44,879,200 for 51 subjects from each genotype of rs3729640. (c) Boxplot of PDUI values for the ENST00000252485 transcript of NECTIN2. (d) Boxplot of RSEM gene expression values for NECTIN2.

Colocalization of butyrophilin subfamily 3 member A2 (BTN3A2)

A genome-wide significant association of variants in the 6p22.2 region with lung function has been identified in two large meta-analyses of lung function GWAS data [8, 9]. The minor allele (A) of the rs72841536 variant (grch38, chr6: 26378060:_T/A), is associated with decreased FVC (P = 2.98 × 10−20, β = −0.032), however the definitive causal variant and mechanism has not been identified. We performed colocalization analysis of a 2 MB region in 6p22.2 with apaQTL data and identified a strong colocalization (PPA = 0.91) of signals from the two datasets (Fig. 3). The best colocalized SNP is rs72841536, which was also identified by SuSie as one of 55 variants in the 95% credible set. The A allele of rs72841536 was associated with decreased usage of the proximal site and therefore a shortened UTR. The mean PDUI for the TT subjects is 0.84 while the PDUI for AA is 0.79. The distance between the predicted proximal and distal APA sites is 222 bp. We further found that rs72841536 is an eQTL, with the A allele associated with decreased expression of BTN3A2 (P = 1.3 × 10−278, β = −57.7).

Image demonstrating the similarity in genetic association results between GWAS and apaQTL findings in addition to plots demonstrating the difference in RNASeq coverage over the 3′ UTR according to genotype.
Figure 3

apaQTLs for BTN3A2 colocalize with GWAS data for FVC. (a) Locus association plot for FVC GWAS and BTN3A2 apaQTL corresponding to ENST00000527422 (BTN3A2-209). (b) IGV coverage plot showing read pileup for the region spanning chr6:26,375,333-26,376,136 (top) and chr6:26,374,161-26,378,479 for 17 subjects from each genotype of rs12116935. (c) Boxplot of PDUI values for the transcript ENST00000527422 (BTN3A2-204). (d) Boxplot of RSEM gene expression values for BTN3A2.

See supplementary text and Supplementary Fig. 3 for colocalization of SH3 domain containing 21(SH3D21).

Discussion

Although APA has been identified as an important aspect of gene regulation, affecting most human protein coding genes and contributing to many complex diseases and traits, the role of APA in COPD and lung function has not yet been characterized. In this analysis, which is the largest study of apaQTLs in human lung tissue, we identified a total of 8831 apaTranscripts from 4216 genes. Pathway analysis of genes containing significant apaQTLs identified pathways related to the inflammatory response such as “Neutrophil degranulation”, and “Interferon alpha/beta signaling”, as well as pathways related to gene expression and regulation such as “Major pathway of rRNA processing in the nucleolus and cytosol” and “mRNA splicing-Major Pathway”. Both inflammation and dysregulated expression as a result of cigarette smoke are known to play a role in the development and progression of lung disease [21, 22]. We identified substantial genetic colocalization between apaQTLs and genetic associations with COPD and lung function, with evidence of colocalization of APA sites in 193 genes with at least one GWAS phenotype. Out of the 27 top loci identified in our analysis, our study identified the same gene implicated by the GWAS for three loci, while it identified a different gene for four loci. In addition, we identified a target gene for 20 loci which previously had no known target. For three of these genes, MARCHF2, NECTIN2, and SH3D21, the apaVariant was located within 60 bp of the proximal APA site or within the 3′ UTR, and a variant in MARCHF2 was identified to modify known PAS motifs.

MARCHF2 is a member of the MARCH family of membrane bound E3 ubiquitin ligases, which function in the ubiquitination of target lysine protein to signal vesicular transport between membrane compartments [23, 24]. MARCHF2 has previously been shown to regulate the intracellular trafficking and secretion of α1-antitrypsin, a serine protease inhibitor in which mutations are associated with COPD, through the ubiquitination and degradation of the cargo receptor ERGIC3 [25]. In addition, MARCHF2 is involved in the ubiquitination and degradation of CFTR, the causal gene for cystic fibrosis, by associating with adaptor proteins CAL and STX6 [26], and can further play a role in CFTR-mediated autophagy [27]. Furthermore, MARCHF2 has been shown to negatively regulate the innate immune response by repressing NF-ΚB and type 1 IFN signaling pathways through the polyubiquitination and proteosomal degradation of IKBKG/NEMO [28]. We found that the lead colocalized variant for MARCHF2, rs1055554, results in the modification of a “TTTTTT” PAS signal to a “TTGTTT” sequence and decreases the usage of the proximal APA site resulting in a longer 3′ UTR. Previous studies have identified rs1055554 as an apaVariant for the same APA site in eleven cancer types [29]. Furthermore, a variant in LD with rs1055554 (rs59372292) has previously been associated with coronary artery disease [30], with the allele linked to a longer 5′ UTR (and decreased lung function) also associated with increased disease risk. The elongated 3′ UTR of MARCHF2 contains well conserved binding sites for several miRNAs including miR-383-5p.2, miR142-5p, miR-130-3 and miR-148-3p, which are not present in the shorter UTR sequence, suggesting that 3′ UTR length of MARCHF2 could trigger downstream effects, miRNA mediated transcript degradation. Furthermore we found that rs1055554 is located in a region containing H3K27Ac marks and overlapped by transcription factor binding sites (Supplementary Fig. 2a), further reinforcing the likely regulatory function of this variant.

NECTIN2 encodes a membrane glycoprotein which is one of the plasma membrane components of adherens junctions. It is mainly localized at the adherent junction in epithelial cells, and is ubiquitously expressed in a variety of cells including epithelial cells, endothelial cells, neurons and fibroblasts [31]. NECTIN2 can play a role in the initiation of intracellular signal transduction, tissue organization and cell polarity [32]. Genetic variants in NECTIN2 have been associated with many phenotypes including Alzheimer’s disease [33–35], coronary heart disease [36], multiple sclerosis [37], and cleft lip/palate [38]. The apaVariant, rs3729640, does not modify any known PAS motif near the predicted proximal APA site. This is consistent with our overall findings; we found that only a small number of apaVariants disrupted known PAS motifs. This is supported by two previous studies of apaQTLs [20, 39] which have shown that disruption of canonical signal motifs is not a major mechanism by which genetic variants influence APA. It has been proposed that other co-transcriptional mechanisms may play a role, such as competition between the spliceosome and polyadenylation factors mediated, for example, by spliceosomal RNA U1 [40] and RNAPII pausing [41]. Furthermore, RNA secondary structure may play a role, in addition to more complex mechanisms such as the recently described connection between DNA methylation, gene looping and APA regulation [42]. Investigation of regulatory and functional data (Supplementary Fig. 2b) revealed that rs3729640 is in a region containing H2K27Ac marks, contains transcription factor binding sites, and has a distal enhancer-like signature according to the ENCODE Registry of candidate cis-Regulatory Elements.

BTN3A2 encodes a transmembrane protein which is a member of the immunoglobulin superfamily [43]. The gene is located within the major histocompatibility class 1 locus, surrounded by the other family members on chromosome 6. Members of the BTN3A family are expressed in most types of immune cells including T cells, B cells, monocytes, dendritic cells and natural killer cells [44]. BTN3 molecules have been shown to inversely correlate with lymphocyte activity and are thus thought to contribute to regulation of the cellular immune response [45]. Mendelian randomization analysis has demonstrated a causal role between BTN3A2 variants and risk of COPD [46] as well as both childhood and adult onset asthma [47]. Therefore, BTN3A2 may contribute to lung function by altering the severity of the immune response. While rs72841536 has been found to be a significant eQTL in several tissues [48], it is located within the 3′ UTR of BTN3A2, and therefore the mechanism of action of this variant on gene expression was previously unknown. Here, we find that rs72841536 is associated with 3′ UTR length of BTN3A2, with the risk allele for decreased lung function (A), resulting in both a shortened UTR and significantly decreased gene expression. While we were not able to identify any specific regulatory elements within the lengthened UTR with may contribute to increased gene expression, the association between rs72841536 and BTN3A2 has previously been shown in five cancer cell lines [29], suggesting that this is a true and ubiquitous apaQTL. We also found that rs72841536 is located in a predicted transcription factor binding site for ARID5A, and is in a region containing K3K27Ac marks (Supplementary Fig. 2c).

The major strength of this study is the large sample size which allowed us to comprehensively characterize APA in lung tissue. While apaQTLs have previously been generated in several tissue types from GTEx, only 515 lung samples were available in that analysis [20], while our study contains over double that number (n = 1241). DaPars2 may have inflated Type 1 error in some scenarios, when compared to 3′ RNAseq [49], which may explain why only 1187 of the 6395 significant apaTranscripts identified in GTEx were also apaQTLs in our dataset. However, the large sample size of our analysis allowed us to be more stringent in selecting candidates for follow up, and we selected only apaQTLs with large effect sizes and low p-values. An additional explanation for the modest overlap between the apaQTLs identified in GTEx compared to our study may be that LTRC largely consists of current and former smokers, most with end stage lung disease. This may result in altered gene expression profiles compared to that in GTEx.

One potential weakness of this study is that we used standard RNAseq to estimate APA usage, which may have reduced accuracy compared to directly quantifying usage through 3′ RNAseq. In addition, using standard RNAseq, it is challenging to separate the contributions of co-transcriptional mechanisms resulting in APA site usage from post-transcriptional mechanisms such as isoform-specific decay [39]. Furthermore, our RNAseq was performed using whole lung tissue samples, which contain a variety of cell types. Therefore, it is impossible to disentangle which changes are occurring as a result of changes in transcriptional regulation from differences in inter-individual cell proportions. Additionally, it is important to note that when performing our colocalization analyses we included all GWAS variants with P < 1 × 10−5 instead of using the genome wide significance threshold of 1 × 10−8. The Bonferroni correction is known to be overly stringent in GWAS analysis due to the collinearity of variants from linkage disequilibrium. Therefore, a p-value of 1 × 10−5 is often used as a threshold for associations that are “suggestive” of significance, to identify subthreshold associations that can be validated using additional datasets. Finally, while we have identified changes in the 3′ UTR that are associated with genotype, using short read RNAseq alone, it is possible that some of the 3′ UTR changes observed are linked to mechanisms upstream of the 3′ UTR such as alternative promoter usage. Further work using long read RNAseq and protein characterization will provide more definitive evidence regarding the contribution of APA of identified genes to lung function.

In conclusion, there are many mechanisms by which genetic variation can contribute to gene regulation and subsequently modulate human disease. While many studies have demonstrated the role of gene expression and splicing in disease, few have characterized the role of 3′ UTR length on phenotype. Our study shows that apaQTLs contribute significantly to lung function, and identifies MARCHF2, NECTIN2 and BTN3A2 as novel candidate genes for lung function.

Materials and methods

Study population

All samples collected as part of the Lung Tissue Research Consortium were collected using a standardized protocol that has been described in the original study design [50]. A total of 1241 subjects had both RNAseq data and whole genome sequencing data (TOPMed Freeze 8) that passed QC and were thus included in the current analysis (Table 1). All samples were sequenced at the University of Washington Northwest Genomics Center (RNASeq) and Broad Genomics (whole genome sequencing), during phase 4 of the TOPMed program.

RNA extraction, quality control, sequencing and alignment

Details have been previously published [50]. Briefly, mRNA sequencing (RNAseq) was performed through the NHLBI TOPMed program at the University of Washington. Poly-A selection and cDNA synthesis was performed using the TruSeq Stranded mRNA kit (Illumina), and sequencing was performed on the NovaSeq6000 instrument. Sequences were aligned to GRCh38 using STAR (v2.6.1d) with the GENCODE release 29 reference. Gene-level expression quantification was performed using RSEM (v1.3.1).

Identification and quantification of alternative polyadenylation (APA)

DaPars2 [20] was used to calculate the percentage of distal poly(A) site usage index (PDUI), which is a measure of the expression of isoforms using the distal poly-A site relative to the total expression of isoforms using either the distal or proximal site. A PDUI closer to 1 indicates that there is an increased abundance of transcripts using the distal poly(A) site, resulting in a longer 3′ UTR. APA sites with less than 30 reads covering the UTR region in each subject were excluded, and transcripts with missing values in greater than 50% of subjects were removed. RSEM gene-level (v1.3.1) expression counts were generated using the Gencode GTF (v29). Genes with low expression were filtered out by keeping only genes with at least 1 count per million in at least 20% of subjects. Pathway analysis of genes containing APA sites was performed using Sigora [51].

3′ aQTL and eQTL analysis

PDUI ratios calculated using DaPars2 were quantile normalized, and RSEM gene levels counts were TMM normalized [52] prior to QTL testing. TensorQTL [53] was used to test for association between genotype of all SNPs within 1000 kb of a gene (cis-) and quantifications of PDUI or gene expression using linear models, adjusting for sex, batch, principal components of APA data and principal components of genetic ancestry. Calculation of principal components of genetic ancestry has been previously described [54]. A total of 8 792 206 variants (biallelic SNPs with MAF > 0.01) were tested for association with APA sites in 56 179 transcripts (corresponding too 13 582 unique genes) and expression of 16 266 genes. Results were annotated using ANNOVAR [55] with annotations derived from dbSNP build 150. Motif analysis has been previously described [56].

Colocalization analysis

Published GWAS data for COPD case-control status and lung function outcomes (FEV1, FVC, FEV1/FVC, PEF) [8, 9] were used for this analysis. Testing windows were generated by identifying all GWAS variants with P < 1 × 10−5 and calculating non-overlapping windows of 1 MB on either side of each SNP. Only windows containing apaVariants with FDR < 0.05 were tested. For each window, Bayesian colocalization tests were performed using the Moloc R package [57] to quantify the probability that the GWAS and apaQTL associations were due to a shared causal variant. Windows with a colocalization posterior probability (CPP) of greater than 0.6 were reported. Finemapping was performed using SuSie [58].

Acknowledgements

Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). Whole Genome Sequencing and RNASeq for “NHLBI TOPMed: The Lung Tissue Research Consortium (phs001662)” was performed at Northwest Genome Center (NWGC, HHSN268201600032I, RNASeq) and Broad Genomics (HHSN268201600034I, WGS). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed.

Author contributions

Conceptualization: A.S., P.J.C., C.P.H.; Data Curation: M.H.C., P.J.C., C.P.H.; Formal Analysis: A.S.; W.K., Z.X., R.P.C.; Writing—Original Draft Preparation: A.S., C.P.H.; Writing—Review & Editing: A.S., W.K., Z.X., R.P.C., M.H.C., A.L., P.J.C., C.P.H.

Conflict of interest statement: PJC received grant support from Bayer and Sanofi and consulting fees from Verona Pharmaceuticals. CPH has received research grants from Bayer, Boehringer-Ingelheim, and Vertex, and consulting fees from Chiesi, Sanofi, and Takeda. MHC has received grant funding from Bayer.

Funding

This work was funded by K01HL157613, R01HL157879, P01HL114501, X01HL139404, R01HL124233 and R01HL126596. MHC was supported by R01HL153248, R01HL149861, R01 HL111527HL135142, and NIGMS R35 GM140844. AL was supported by R01HL147148.

Data availability

LTRC genotyping data are available on dbGaP with accession number phs001662.v1.p1.

LTRC RNA-seq data are available through TOPMed, https://topmed.nhlbi.nih.gov

Results of apaQTL analysis are available on dbGap (phs001662.v2.p1).

References

1.

Cohen
BH
,
Ball
WC
 Jr
,
Brashears
S
. et al.  
Risk factors in chronic obstructive pulmonary disease (COPD)
.
Am J Epidemiol
 
1977
;
105
:
223
32
.

2.

Kueppers
F
,
Miller
RD
,
Gordon
H
. et al.  
Familial prevalence of chronic obstructive pulmonary disease in a matched pair study
.
Am J Med
 
1977
;
63
:
336
42
.

3.

McCloskey
SC
,
Patel
BD
,
Hinchliffe
SJ
. et al.  
Siblings of patients with severe chronic obstructive pulmonary disease have a significant risk of airflow obstruction
.
Am J Respir Crit Care Med
 
2001
;
164
:
1419
24
.

4.

Silverman
EK
,
Chapman
HA
,
Drazen
JM
. et al.  
Genetic epidemiology of severe, early-onset chronic obstructive pulmonary disease. Risk to relatives for airflow obstruction and chronic bronchitis
.
Am J Respir Crit Care Med
 
1998
;
157
:
1770
8
.

5.

Zhou
JJ
,
Cho
MH
,
Castaldi
PJ
. et al.  
Heritability of chronic obstructive pulmonary disease and related phenotypes in smokers
.
Am J Respir Crit Care Med
 
2013
;
188
:
941
7
.

6.

Hobbs
BD
,
de
 
Jong
K
,
Lamontagne
M
. et al.  
Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis
.
Nat Genet
 
2017
;
49
:
426
32
.

7.

Sakornsakolpat
P
,
Prokopenko
D
,
Lamontagne
M
. et al.  
Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations
.
Nat Genet
 
2019
;
51
:
494
505
.

8.

Shrine
N
,
Guyatt
AL
,
Erzurumluoglu
AM
. et al.  
New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries
.
Nat Genet
 
2019
;
51
:
481
93
.

9.

Shrine
N
,
Izquierdo
AG
,
Chen
J
. et al.  
Multi-ancestry genome-wide association analyses improve resolution of genes and pathways influencing lung function and chronic obstructive pulmonary disease risk
.
Nat Genet
 
2023
;
55
:
410
22
.

10.

Gamazon
ER
,
Segre
AV
,
van de
 
Bunt
M
. et al.  
Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation
.
Nat Genet
 
2018
;
50
:
956
67
.

11.

Tian
B
,
Manley
JL
.
Alternative polyadenylation of mRNA precursors
.
Nat Rev Mol Cell Biol
 
2017
;
18
:
18
30
.

12.

Vosa
U
,
Esko
T
,
Kasela
S
. et al.  
Altered gene expression associated with microRNA binding site polymorphisms
.
PLoS One
 
2015
;
10
:
e0141351
.

13.

Mariella
E
,
Marotta
F
,
Grassi
E
. et al.  
The length of the expressed 3′ UTR is an intermediate molecular phenotype linking genetic variants to complex diseases
.
Front Genet
 
2019
;
10
:
714
.

14.

Stacey
SN
,
Sulem
P
,
Jonasdottir
A
. et al.  
A germline variant in the TP53 polyadenylation signal confers cancer susceptibility
.
Nat Genet
 
2011
;
43
:
1098
103
.

15.

Higgs
DR
,
Goodbourn
SE
,
Lamb
J
. et al.  
Alpha-thalassaemia caused by a polyadenylation signal mutation
.
Nature
 
1983
;
306
:
398
400
.

16.

van der
 
Maarel
SM
,
Tawil
R
,
Tapscott
SJ
.
Facioscapulohumeral muscular dystrophy and DUX4: breaking the silence
.
Trends Mol Med
 
2011
;
17
:
252
8
.

17.

Fahiminiya
S
,
Al-Jallad
H
,
Majewski
J
. et al.  
A polyadenylation site variant causes transcript-specific BMP1 deficiency and frequent fractures in children
.
Hum Mol Genet
 
2015
;
24
:
516
24
.

18.

Hellquist
A
,
Zucchelli
M
,
Kivinen
K
. et al.  
The human GIMAP5 gene has a common polyadenylation polymorphism increasing risk to systemic lupus erythematosus
.
J Med Genet
 
2007
;
44
:
314
21
.

19.

Graham
RR
,
Kyogoku
C
,
Sigurdsson
S
. et al.  
Three functional variants of IFN regulatory factor 5 (IRF5) define risk and protective haplotypes for human lupus
.
Proc Natl Acad Sci USA
 
2007
;
104
:
6758
63
.

20.

Li
L
,
Huang
KL
,
Gao
Y
. et al.  
An atlas of alternative polyadenylation quantitative trait loci contributing to complex trait and disease heritability
.
Nat Genet
 
2021
;
53
:
994
1005
.

21.

Oudijk
EJ
,
Lammers
JW
,
Koenderman
L
.
Systemic inflammation in chronic obstructive pulmonary disease
.
Eur Respir J Suppl
 
2003
;
46
:
5s
13
.

22.

Steiling
K
,
Lenburg
ME
,
Spira
A
.
Airway gene expression in chronic obstructive pulmonary disease
.
Proc Am Thorac Soc
 
2009
;
6
:
697
700
.

23.

Bartee
E
,
Mansouri
M
,
Hovey Nerenberg
BT
. et al.  
Downregulation of major histocompatibility complex class I by human ubiquitin ligases related to viral immune evasion proteins
.
J Virol
 
2004
;
78
:
1109
20
.

24.

Nakamura
N
,
Fukuda
H
,
Kato
A
. et al.  
MARCH-II is a syntaxin-6-binding protein involved in endosomal trafficking
.
Mol Biol Cell
 
2005
;
16
:
1696
710
.

25.

Yoo
W
,
Cho
EB
,
Kim
S
. et al.  
The E3 ubiquitin ligase MARCH2 regulates ERGIC3-dependent trafficking of secretory proteins
.
J Biol Chem
 
2019
;
294
:
10900
12
.

26.

Cheng
J
,
Guggino
W
.
Ubiquitination and degradation of CFTR by the E3 ubiquitin ligase MARCH2 through its association with adaptor proteins CAL and STX6
.
PLoS One
 
2013
;
8
:
e68001
.

27.

Xia
D
,
Qu
L
,
Li
G
. et al.  
MARCH2 regulates autophagy by promoting CFTR ubiquitination and degradation and PIK3CA-AKT-MTOR signaling
.
Autophagy
 
2016
;
12
:
1614
30
.

28.

Chathuranga
K
,
Kim
TH
,
Lee
H
. et al.  
Negative regulation of NEMO signaling by the ubiquitin E3 ligase MARCH2
.
EMBO J
 
2020
;
39
:
e105139
.

29.

Yang
Y
,
Zhang
Q
,
Miao
YR
. et al.  
SNP2APA: a database for evaluating effects of genetic variants on alternative polyadenylation in human cancers
.
Nucleic Acids Res
 
2020
;
48
:
D226
32
.

30.

van der
 
Harst
P
,
Verweij
N
.
Identification of 64 novel genetic loci provides an expanded view on the genetic architecture of coronary artery disease
.
Circ Res
 
2018
;
122
:
433
43
.

31.

Zeng
T
,
Cao
Y
,
Jin
T
. et al.  
The CD112R/CD112 axis: a breakthrough in cancer immunotherapy
.
J Exp Clin Cancer Res
 
2021
;
40
:
285
.

32.

Stamm
H
,
Wellbrock
J
,
Fiedler
W
.
Interaction of PVR/PVRL2 with TIGIT/DNAM-1 as a novel immune checkpoint axis and therapeutic target in cancer
.
Mamm Genome
 
2018
;
29
:
694
702
.

33.

Harold
D
,
Abraham
R
,
Hollingworth
P
. et al.  
Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease
.
Nat Genet
 
2009
;
41
:
1088
93
.

34.

Takei
N
,
Miyashita
A
,
Tsukie
T
. et al.  
Genetic association study on in and around the APOE in late-onset Alzheimer disease in Japanese
.
Genomics
 
2009
;
93
:
441
8
.

35.

Logue
MW
,
Schu
M
,
Vardarajan
BN
. et al.  
A comprehensive genetic association study of Alzheimer disease in African Americans
.
Arch Neurol
 
2011
;
68
:
1569
79
.

36.

Freitas
EM
,
Phan
TC
,
Herbison
CE
. et al.  
The poliovirus receptor related 2 (PRR2) and apolipoprotein E genes and coronary heart disease
.
J Cardiovasc Risk
 
2002
;
9
:
59
65
.

37.

Schmidt
S
,
Pericak-Vance
MA
,
Sawcer
S
. et al.  
Allelic association of sequence variants in the herpes virus entry mediator-B gene (PVRL2) with the severity of multiple sclerosis
.
Genes Immun
 
2006
;
7
:
384
92
.

38.

Jagomagi
T
,
Nikopensius
T
,
Krjutskov
K
. et al.  
MTHFR and MSX1 contribute to the risk of nonsyndromic cleft lip/palate
.
Eur J Oral Sci
 
2010
;
118
:
213
20
.

39.

Mittleman
BE
,
Pott
S
,
Warland
S
. et al.  
Alternative polyadenylation mediates genetic regulation of gene expression
.
elife
 
2020
;
9
:
e57492
.

40.

Oh
JM
,
Di
C
,
Venters
CC
. et al.  
U1 snRNP telescripting regulates a size-function-stratified human genome
.
Nat Struct Mol Biol
 
2017
;
24
:
993
9
.

41.

Fusby
B
,
Kim
S
,
Erickson
B
. et al.  
Coordination of RNA polymerase II pausing and 3′ end processing factor recruitment with alternative polyadenylation
.
Mol Cell Biol
 
2016
;
36
:
295
303
.

42.

Nanavaty
V
,
Abrash
EW
,
Hong
C
. et al.  
DNA methylation regulates alternative polyadenylation via CTCF and the cohesin complex
.
Mol Cell
 
2020
;
78
:
752
764.e6
.

43.

Chen
S
,
Li
Z
,
Huang
W
. et al.  
Prognostic and therapeutic significance of BTN3A proteins in Tumors
.
J Cancer
 
2021
;
12
:
4505
12
.

44.

Arnett
HA
,
Viney
JL
.
Immune modulation by butyrophilins
.
Nat Rev Immunol
 
2014
;
14
:
559
69
.

45.

Yamashiro
H
,
Yoshizaki
S
,
Tadaki
T
. et al.  
Stimulation of human butyrophilin 3 molecules results in negative regulation of cellular immunity
.
J Leukoc Biol
 
2010
;
88
:
757
67
.

46.

Lamontagne
M
,
Berube
JC
,
Obeidat
M
. et al.  
Leveraging lung tissue transcriptome to uncover candidate causal genes in COPD genetic associations
.
Hum Mol Genet
 
2018
;
27
:
1819
29
.

47.

Huang
QQ
,
Tang
HHF
,
Teo
SM
. et al.  
Neonatal genetics of gene expression reveal potential origins of autoimmune and allergic disease risk
.
Nat Commun
 
2020
;
11
:
3761
.

48.

Bhalala
OG
,
Nath
AP
.,
Consortium, U.K.B.E
 et al.  
Identification of expression quantitative trait loci associated with schizophrenia and affective disorders in normal brain tissue
.
PLoS Genet
 
2018
;
14
:
e1007607
.

49.

Shah
A
,
Mittleman
BE
,
Gilad
Y
. et al.  
Benchmarking sequencing methods and tools that facilitate the study of alternative polyadenylation
.
Genome Biol
 
2021
;
22
:
291
.

50.

Ghosh
AJ
,
Hobbs
BD
,
Yun
JH
. et al.  
Lung tissue shows divergent gene expression between chronic obstructive pulmonary disease and idiopathic pulmonary fibrosis
.
Respir Res
 
2022
;
23
:
97
.

51.

Foroushani
AB
,
Brinkman
FS
,
Lynn
DJ
.
Pathway-GPS and SIGORA: identifying relevant pathways based on the over-representation of their gene-pair signatures
.
PeerJ
 
2013
;
1
:
e229
.

52.

Risso
D
,
Ngai
J
,
Speed
TP
. et al.  
Normalization of RNA-seq data using factor analysis of control genes or samples
.
Nat Biotechnol
 
2014
;
32
:
896
902
.

53.

Taylor-Weiner
A
,
Aguet
F
,
Haradhvala
NJ
. et al.  
Scaling computational genomics to millions of individuals with GPUs
.
Genome Biol
 
2019
;
20
:
228
.

54.

Cho
MH
,
Castaldi
PJ
,
Wan
ES
. et al.  
A genome-wide association study of COPD identifies a susceptibility locus on chromosome 19q13
.
Hum Mol Genet
 
2012
;
21
:
947
57
.

55.

Wang
K
,
Li
M
,
Hakonarson
H
.
ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
.
Nucleic Acids Res
 
2010
;
38
:
e164
.

56.

Xu
Z
,
Platig
J
,
Lee
S
. et al.  
Cigarette smoking-associated isoform switching and 3' UTR lengthening via alternative polyadenylation
.
Genomics
 
2021
;
113
:
4184
95
.

57.

Giambartolomei
C
,
Zhenli Liu
J
,
Zhang
W
. et al.  
A Bayesian framework for multiple trait colocalization from summary association statistics
.
Bioinformatics
 
2018
;
34
:
2538
45
.

58.

Zou
Y
,
Carbonetto
P
,
Xie
D
. et al.  
Fast and flexible joint fine-mapping of multiple traits via the sum of single effects model
.
bioRxiv
 
2023
;
preprint
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)