Abstract

Few genome-wide association studies (GWAS) analyzing genetic regulation of morphological traits of white blood cells have been reported. We carried out a GWAS of 12 morphological traits in 869 individuals from the general population of Sardinia, Italy. These traits, included measures of cell volume, conductivity and light scatter in four white-cell populations (eosinophils, lymphocytes, monocytes, neutrophils). This analysis yielded seven statistically significant signals, four of which were novel (four novel, PRG2, P2RX3, two of CDK6). Five signals were replicated in the independent INTERVAL cohort of 11 822 individuals. The most interesting signal with large effect size on eosinophil scatter (P-value = 8.33 x 10−32, beta = −1.651, se = 0.1351) falls within the innate immunity cluster on chromosome 11, and is located in the PRG2 gene. Computational analyses revealed that a rare, Sardinian-specific PRG2:p.Ser148Pro mutation modifies PRG2 amino acid contacts and protein dynamics in a manner that could possibly explain the changes observed in eosinophil morphology. Our discoveries shed light on genetics of morphological traits. For the first time, we describe such large effect size on eosinophils morphology that is relatively frequent in Sardinian population.

Introduction

Large-scale genome-wide association studies (GWAS) have revealed thousands of loci affecting the levels of white blood cell subpopulations (1,2). However, previous GWAS scans have tended to focus on common genetic variants, while rare coding variants were often not directly assessed. The use of dense coverage genetic maps thus accesses more of genetic landscape and can also increase the likelihood to identify causal variants or proxies for the genotyped single-nucleotide polymorphisms (SNPs) (3).

The range of studies can also be extended from standard diagnostically used parameters to relatively non-classical parameters, enabling the exploration of the mechanism of regulation of white blood cell levels and their effects on related diseases. Previous GWAS studies using data on white blood cells derived from hematology analyzers have focused mainly on cell counts. Novel technologies of hematology analyzers provide additional detailed cell morphology analysis. WBC differential technology biophysically characterizes blood cells and detects abnormal cells through flow cytometric analysis with multiple light-scatter angles and digital conductivity. Cells are unaltered by staining. For example, DxH Coulter Cellular Analysis VCS technology measures cell volume (V, estimate cell size), conductivity (C, that is used to collect information about the permeability of cells, internal structure of the cell, including chemical composition and nuclear volume) and light scatter (S, that gives information about internal complexity like cellular granularity, nuclear lobularity and cell surface structure) in neutrophils, lymphocytes, monocytes and eosinophils (4).

Standard blood cell diagnostic evaluations include levels of different cell populations. For red blood cells, many morphological features are also analyzed, and platelet volume has also been usually reported. By contrast, morphologic features of white blood cells are rarely included in previous GWAS blood cell evaluations (2). However, previous GWAS on polychromatic flow cytometry–fluorescence-activated cell sorting (FACS) analyses evaluated some of these parameters (5). To date, only one study comprehensively analyzed such parameters assessed by a hematology analyzer (6). However, they are relevant in diagnosis and research and can be determined by appropriate laser settings and light refraction. Such non-classical cell parameters, are reported based on histogram data. They may be related to clinical conditions, assessed by statistical co-localization of genetic association data with other types of data. The possible function of the associated variants can also be further analyzed to look for protein folding effects of nonsynonymous variants providing insights into the function of rare coding variants.

Here, we report a GWAS of morphologic non classical white blood cell traits such as cell volume, conductivity and light scatter (7). We replicated five signals in the Interval cohort. Of two rare Sardinian specific variants in PRG2 and HBB, the second was already described in our FACS previous study (5). Using molecular dynamics (MD) analysis we infer that the previously undetected mutation in PRG2 could interfere with the mature PRG2 non-toxic granules formation, possibly influencing the crystallization/aggregation processes that mediate PRG2 toxicity.

Results

GWAS analysis

Interrogating 18 425 755 variants, either directly genotyped or successfully imputed, we conducted a GWAS on 12 white-blood cell morphological traits including measures of cell volume (V), conductivity (C) and scatter (S) in four white cell subpopulations (eosinophils, lymphocytes, monocytes, neutrophils), in 869 volunteers who are part of the SardiNIA general population cohort (Supplementary Material, Table S1, Figs S1S3) (8). This analysis yielded seven association signals with the assessed traits (Table 1) (P < 4.17 x 10−9), three known (LYZ, FAM117B, HBB) and four novel (PRG2, P2RX3, two associations with CDK6).

Table 1

Associated loci with morphological features of white blood cells

TraitChr.Position hg37Effect allele/other alleleMAF SardiniaMAF EUROPEPVALUEBETASEGeneSnp142ConsequenceNovel/knownR2h2
EOS1157 156 106G/A0.0316.2E-58.33E-32-1.6510.1351PRG2rs769591668p.Ser148Pro NP_002719.3novel0.14700.1638
EOS*1157 110 622A/G0.1140.1421.37E-14-0.59280.0757P2RX3rs80116434Intronicnovel0.06610.0710
EOV792 408 370T/C0.1830.0969.68E-310.70480.0588CDK6rs445Intronicnovel0.14220.1485
NEV792 408 370T/C0.1830.0963.94E-150.49020.0613CDK6rs445Intronicnovel0.06880.0710
MOS1269 744 014A/C0.0820.0401.54E-12-0.62640.0873LYZrs1800973p.Thr88Asn NP_000230.1known0.05610.0591
MOV2203 591 785CT/C0.5050.2912.76E-090.29690.0494FAM117Bn.a.Intronicknown0.03990.0441
LYV115 248 004A/G0.0586.2E-42.13E-090.65150.1077HBBrs11549407p.Gln40Ter NP_000509.1known0.04050.0464
TraitChr.Position hg37Effect allele/other alleleMAF SardiniaMAF EUROPEPVALUEBETASEGeneSnp142ConsequenceNovel/knownR2h2
EOS1157 156 106G/A0.0316.2E-58.33E-32-1.6510.1351PRG2rs769591668p.Ser148Pro NP_002719.3novel0.14700.1638
EOS*1157 110 622A/G0.1140.1421.37E-14-0.59280.0757P2RX3rs80116434Intronicnovel0.06610.0710
EOV792 408 370T/C0.1830.0969.68E-310.70480.0588CDK6rs445Intronicnovel0.14220.1485
NEV792 408 370T/C0.1830.0963.94E-150.49020.0613CDK6rs445Intronicnovel0.06880.0710
MOS1269 744 014A/C0.0820.0401.54E-12-0.62640.0873LYZrs1800973p.Thr88Asn NP_000230.1known0.05610.0591
MOV2203 591 785CT/C0.5050.2912.76E-090.29690.0494FAM117Bn.a.Intronicknown0.03990.0441
LYV115 248 004A/G0.0586.2E-42.13E-090.65150.1077HBBrs11549407p.Gln40Ter NP_000509.1known0.04050.0464

EOS: eosinophil scatter; EOV: eosinophil volume; NEV: neutrophil volume; MOS: monocyte scatter; LYV: lymphocyte volume.

MAF Europe – source gnomAD browser (gnomad.broadinstitute.org), positions according to the build GRCh37.

In cursive known signals.

R  2 represents the percentage of trait variance explained by the associated variant; h2 represents the narrow heritability explained by the associated variants.

aAfter conditional analysis.

Table 1

Associated loci with morphological features of white blood cells

TraitChr.Position hg37Effect allele/other alleleMAF SardiniaMAF EUROPEPVALUEBETASEGeneSnp142ConsequenceNovel/knownR2h2
EOS1157 156 106G/A0.0316.2E-58.33E-32-1.6510.1351PRG2rs769591668p.Ser148Pro NP_002719.3novel0.14700.1638
EOS*1157 110 622A/G0.1140.1421.37E-14-0.59280.0757P2RX3rs80116434Intronicnovel0.06610.0710
EOV792 408 370T/C0.1830.0969.68E-310.70480.0588CDK6rs445Intronicnovel0.14220.1485
NEV792 408 370T/C0.1830.0963.94E-150.49020.0613CDK6rs445Intronicnovel0.06880.0710
MOS1269 744 014A/C0.0820.0401.54E-12-0.62640.0873LYZrs1800973p.Thr88Asn NP_000230.1known0.05610.0591
MOV2203 591 785CT/C0.5050.2912.76E-090.29690.0494FAM117Bn.a.Intronicknown0.03990.0441
LYV115 248 004A/G0.0586.2E-42.13E-090.65150.1077HBBrs11549407p.Gln40Ter NP_000509.1known0.04050.0464
TraitChr.Position hg37Effect allele/other alleleMAF SardiniaMAF EUROPEPVALUEBETASEGeneSnp142ConsequenceNovel/knownR2h2
EOS1157 156 106G/A0.0316.2E-58.33E-32-1.6510.1351PRG2rs769591668p.Ser148Pro NP_002719.3novel0.14700.1638
EOS*1157 110 622A/G0.1140.1421.37E-14-0.59280.0757P2RX3rs80116434Intronicnovel0.06610.0710
EOV792 408 370T/C0.1830.0969.68E-310.70480.0588CDK6rs445Intronicnovel0.14220.1485
NEV792 408 370T/C0.1830.0963.94E-150.49020.0613CDK6rs445Intronicnovel0.06880.0710
MOS1269 744 014A/C0.0820.0401.54E-12-0.62640.0873LYZrs1800973p.Thr88Asn NP_000230.1known0.05610.0591
MOV2203 591 785CT/C0.5050.2912.76E-090.29690.0494FAM117Bn.a.Intronicknown0.03990.0441
LYV115 248 004A/G0.0586.2E-42.13E-090.65150.1077HBBrs11549407p.Gln40Ter NP_000509.1known0.04050.0464

EOS: eosinophil scatter; EOV: eosinophil volume; NEV: neutrophil volume; MOS: monocyte scatter; LYV: lymphocyte volume.

MAF Europe – source gnomAD browser (gnomad.broadinstitute.org), positions according to the build GRCh37.

In cursive known signals.

R  2 represents the percentage of trait variance explained by the associated variant; h2 represents the narrow heritability explained by the associated variants.

aAfter conditional analysis.

The fraction of trait variance explained by each associated locus ranges from 0.0399 to 0.147 while the heritability from 0.0441 to 0.1638 (Table 1).

Novel loci

In particular, PRG2:p.Ser148Pro, rs769591668 mutation, in exon 4 of the proteoglycan 2 gene (PRG2) gene was associated with reduced eosinophil scatter (EOS, cell granularity), P-value = 8.33 x 10−32, effect = −1.651 (Table 1, Fig. 1, Supplementary Material, Fig. S2A). PRG2 encodes a cytotoxic polypeptide contained in the eosinophil granules (9). PRG2 148Pro variant is rare or absent outside but relatively common in Sardinia (Minor Allele Frequency [MAF] Europe = 6.2 x 10−5, MAF Sardinia = 0.031, respectively in gnomAD browser) (Table 1). PRG2 represents the predominant constituent of the crystalline core of the eosinophil granule protein, and enhances natural killer cell activity, providing microbicidal activity against fungi and gram-negative and gram-positive bacteria (10). It may also be involved in antiparasitic defense mechanisms as cytotoxin and helminthoxin and in immune hypersensitivity reactions as well as in non-cytolytic histamine release from human basophils. After conditional analysis to identify independent association signals in the same locus, we discovered another association in the same region. Allele A of rs80116434 was associated with reduction of EOS. This variant is located within the P2RX3 gene that encodes for the purinergic receptor P2X P-value = 1.37 x 10−14, effect = −0.5928 (Table 1, Fig. 1, Supplementary Material, Fig. S2B). This variant is described to reduce eosinophil count (1). Interestingly, rs80116434 is located in the eosinophil regulation innate immunity genes cluster (P2RX3, PRG3, PRG2) on chromosome 11.

Step-wise conditional analyses of the region of PRG2 on chromosome 11 associated with EOS. (A) EOS association with the top variant 11:57156106; (B-C) first and second conditional analysis association. After the first conditional analysis the association of the second variant region 11:57110622 with EOS has been showed. In each plot, the y-axis shows the association strength (−log10(P-value)) versus genomic position (hg19/GRCh37) of the most significant SNP on the x-axis. Other SNPs are colored to reflect their LD with the top SNP. Genes and positions of exons are shown below.
Figure 1

Step-wise conditional analyses of the region of PRG2 on chromosome 11 associated with EOS. (A) EOS association with the top variant 11:57156106; (B-C) first and second conditional analysis association. After the first conditional analysis the association of the second variant region 11:57110622 with EOS has been showed. In each plot, the y-axis shows the association strength (−log10(P-value)) versus genomic position (hg19/GRCh37) of the most significant SNP on the x-axis. Other SNPs are colored to reflect their LD with the top SNP. Genes and positions of exons are shown below.

The associated variant in CDK6 included rs445, with allele T increasing the volume of myeloid lineage eosinophils and neutrophils: (EOV) P-value = 9.68 x 10−31, effect = 0.7048; (NEV), P-value = 3.94 x 10−15, effect = 0.4902 (Table 1, Supplementary Material, Figs S2C, D and S3A and B). Notably, rs445 had already been associated with increased volumes of granulocytes and monocytes in a previous study that used flow cytometry FACS data (5). In addition, it has been previously associated with reduced counts of myeloid cell line subpopulations (1,2). Here, we discovered that it is associated with eosinophil and neutrophil volume, pointing to a role for this variant in the myeloid cell lineage.

Known loci

As we previously reported in flow cytometry FACS data, the nonsynonymous LYZ: p.Thr88Asn mutation, rs1800973 located in the lysogene (LYZ) gene (allele 88Asn) is associated with reduced monocyte scatter (MOS) P-value = 1.54 x 10−12, effect = −0.6264 (Table 1, Supplementary Material, Figs S2E and S3C). This variant was classified as benign by ClinVar. Moreover, this variant increases monocyte counts, reduces granulocyte and lymphocyte count and side scatter in monocytes in our recent study (5). It creates an eQTL and pQTL in blood, augmenting both transcript and protein levels, as reported in Open Targets data (11). Thus, it has been associated with increased LYZ protein in blood plasma and augmented gene expression (12).

In addition, we detected an intronic variant rs149760691 within the FAM117B gene. Allele CT at this locus augmented monocyte volume (MOV) P-value = 2.76 x 10−9, effect = 0.2969 (Table 1, Supplementary Material, Supplementary Figs S2F and S3D). This variant has been previously associated with side scatter in flow cytometry data (5).

We also replicated another of our recent discovery in flow cytometry FACS data of a founder Sardinian mutation p.Gln40Ter (rs11549407) within HBB gene. Allele 40Ter was associated with reduced lymphocyte volume (LYV; Table 1, Supplementary Material, Figs S2G and S3E) (5).

Replicated loci in INTERVAL study

We replicated five signals in the INTERVAL cohort (Table 2) (6,13,14). For that study, the cohort of 11 822 whole-genome sequenced samples was used. In the INTERVAL study, 63 non classical phenotypes were measured, of which 11 summarize measurements of intracellular complexity/granularity (SSC), 16 summarize measurements of cell nucleic acid content/membrane lipid content (SFL) and 15 summarize measurements of cell morphology/volume (FSC). About 20 of the traits were platelets phenotypes, 10 were red cell phenotypes and 33 were white cell phenotypes (6). We considered a replication threshold of P-value = 0.05 adjusted for multiple variant testing using the Bonferroni method (0.05/7 variants to be replicated, thus P-value = 0.007). At this threshold, we replicated five signals, of which three novel. The remaining two signals are rare, Sardinian specific variants in PRG2 and HBB (MAF = 6.2E-5 and 6.2E-4 in Europeans in GnomAD browser, gnomad.broadinstitute.org, respectively), and were not replicated in the INTERVAL cohort study (Tables 1 and 2).

Table 2

The replication of associations in interval study

TraitChr.Position
Hg37
Effect allele/other alleleMAF SardiniaMAF IntervalP-valueBETASE
EOSa1157 156 106G/A0.031NANANANA
EOS1157 110 622A/G0.1140.1359.29E-72−0.37040.0205
EOV792 408 370T/C0.1830.0991.61E-180.20630.0235
NEV792 408 370T/C0.1830.0991.03E-270.25110.0230
MOS1269 744 014A/C0.0820.0602.78E-20−0.26260.0284
MOV2203 591 785CT/C0.5050.2990.00380.04300.0149
LYVa115 248 004A/G0.058NANANANA
TraitChr.Position
Hg37
Effect allele/other alleleMAF SardiniaMAF IntervalP-valueBETASE
EOSa1157 156 106G/A0.031NANANANA
EOS1157 110 622A/G0.1140.1359.29E-72−0.37040.0205
EOV792 408 370T/C0.1830.0991.61E-180.20630.0235
NEV792 408 370T/C0.1830.0991.03E-270.25110.0230
MOS1269 744 014A/C0.0820.0602.78E-20−0.26260.0284
MOV2203 591 785CT/C0.5050.2990.00380.04300.0149
LYVa115 248 004A/G0.058NANANANA

aFounder Sardinian mutations.

Table 2

The replication of associations in interval study

TraitChr.Position
Hg37
Effect allele/other alleleMAF SardiniaMAF IntervalP-valueBETASE
EOSa1157 156 106G/A0.031NANANANA
EOS1157 110 622A/G0.1140.1359.29E-72−0.37040.0205
EOV792 408 370T/C0.1830.0991.61E-180.20630.0235
NEV792 408 370T/C0.1830.0991.03E-270.25110.0230
MOS1269 744 014A/C0.0820.0602.78E-20−0.26260.0284
MOV2203 591 785CT/C0.5050.2990.00380.04300.0149
LYVa115 248 004A/G0.058NANANANA
TraitChr.Position
Hg37
Effect allele/other alleleMAF SardiniaMAF IntervalP-valueBETASE
EOSa1157 156 106G/A0.031NANANANA
EOS1157 110 622A/G0.1140.1359.29E-72−0.37040.0205
EOV792 408 370T/C0.1830.0991.61E-180.20630.0235
NEV792 408 370T/C0.1830.0991.03E-270.25110.0230
MOS1269 744 014A/C0.0820.0602.78E-20−0.26260.0284
MOV2203 591 785CT/C0.5050.2990.00380.04300.0149
LYVa115 248 004A/G0.058NANANANA

aFounder Sardinian mutations.

Molecular dynamic analysis of PRG2 mutation

To examine possible consequences of the PRG2:p.Ser148Pro mutation on molecular structure and conformational changes of PRG2 (also known as eosinophil granule major basic protein or MBP-1) we performed molecular modeling analysis based on the X-ray crystal structure (PDB ID 1H8U) (15). Analysis of the trajectories revealed that the mutation of Ser148-to-Pro148 mainly affects the atomic fluctuations (RMSF) of the loop 164–171 and the surroundings of position 148 (Fig. 2A-C). The presence of a Pro residue at position 148 restricts the atomic fluctuations of the neighboring residues. The stiffness induced by proline is transmitted through the contiguous beta-strand to the residues located on loop 164–171, whose positions and fluctuations are modified (Fig. 2D-E).

Consequences of the PRG2:p.Ser148Pro mutation on molecular structure and conformational changes of PRG2. (A) Root mean square deviation (RMSD), (B) radius of gyration (RG) and (C) root mean square fluctuations (RMSF) of the simulated systems with WT (black) and S148P (red) PRG2. Region corresponding to C-end of helix 138–152 and loop 164–171 is shadowed in gray, and the serine residue that is mutated is labeled by dashed line. (D) Average ribbon representation of the above systems after MD simulation were calculated from the trajectories of the last 50 ns of simulation. The mutated position (S148P) is colored in blue. (E) Detailed view of the vibrational transmission from extended, rigid C-end Pro148-containing α helix (red) to loop 164–171 (purple) through contiguous β strands (blue) of S148P PRG2.
Figure 2

Consequences of the PRG2:p.Ser148Pro mutation on molecular structure and conformational changes of PRG2. (A) Root mean square deviation (RMSD), (B) radius of gyration (RG) and (C) root mean square fluctuations (RMSF) of the simulated systems with WT (black) and S148P (red) PRG2. Region corresponding to C-end of helix 138–152 and loop 164–171 is shadowed in gray, and the serine residue that is mutated is labeled by dashed line. (D) Average ribbon representation of the above systems after MD simulation were calculated from the trajectories of the last 50 ns of simulation. The mutated position (S148P) is colored in blue. (E) Detailed view of the vibrational transmission from extended, rigid C-end Pro148-containing α helix (red) to loop 164–171 (purple) through contiguous β strands (blue) of S148P PRG2.

Moreover, an in-depth analysis of the trajectories and contacts between residues showed that substitution of Ser148 by Pro148 also affects the contact network of Tyr109 (Supplementary Material, Fig. S4). Tyr109 establishes close contact with Ser148 along the trajectory of WT PRG2. Pro148 impairs this contact and induces the rotation of Tyr109 sidechain that forms a different contact network compared to the WT PRG2. This change in the sidechain configuration could contribute to the transmission of the perturbation induced by proline to the loop 164–171. In addition, Ser148-by-Pro148 mutation also affects the electrostatic surface potential of residue located nearby loop 164–171, as Asp176 is more exposed to solvent (Supplementary Material, Fig. S5).

Functional annotation of the novel signals

We performed comprehensive queries of the Open Target PHEWAS catalog for novel variants (genetics.opentargets.org). Multiple associations with white blood subpopulations were obtained for rs445 including reduced counts of neutrophils and eosinophils, and for rs80116434 with reduced eosinophil counts, corroborating their involvement in these cells function (Supplementary Material, Table S2).

According to the Human Protein Atlas (proteinatlas.org) RNA single cell analysis confirms that PRG2 is expressed in granulocytes and in flow sorting data that include RNA-seq data set that is expressed in eosinophils. In the same analysis, CDK6 is ubiquitously expressed gene in different tissues that includes granulocytes in RNA single cell data set and eosinophil and neutrophil white blood cells in flow cytometry RNA data set.

Moreover, according to the UCSC genome browser (genome.ucsc.edu) rs445 that relay in the second intron of CDK6 gene is located in the regulatory region marked by the H3K27Ac site in GM12878 cell line, in the DNase I hypersensitivity cluster and in the enriched of transcription factor binding sites region (Supplementary Material, Fig. S6). Also, GM12878 cells chromatin state segmentation by HMM from ENCODE/Broad in UCSC genome browser shows strong enhancer feature in the rs445 region (Supplementary Material, Fig. S6). We used assay for transposase-accessible chromatin using sequencing (ATAC-seq) data, which interrogate chromatin accessibility that depicted enhancer presence in the rs445 region (Supplementary Material, Fig. S7). We showed that rs445 shows significant chromatin accessibility in all hematopoietic linages. This suggests that rs445 is located within active enhancer in all hematopoietic lineages. Moreover, chromatin regulation 3D datasets show a contact between the region that includes rs445 and the promoters of genes CDK6 and AC002454.1 (Supplementary Material, Fig. S8A). The gene expression data in different blood cells showed that CDK6 is significantly co-expressed with AC002454.1 (antisense located in the 5’UTR of CDK6) in hematopoietic precursors that include myeloid cell line precursors, showing potential regulatory character of rs445 in those cells (Supplementary Material, Fig. S8B).

Discussion

A population sequencing GWAS for non-classical morphologic parameters of white blood cells enabled the detection of a coding variant in the PRG2 gene affecting eosinophil scatter (EOS)-reduced cell granularity. After conditional analysis a second variant in the P2RX3 gene that reduced cell granularity in eosinophils (EOS) was discovered. Furthermore, a variant in the CDK6 gene, previously associated with myeloid cell counts and intracellular complexity, was found here associated with neutrophil and eosinophil volume (EOV, NEV) (Supplementary Material, Fig. S3).

PRG2 novel locus

The most interesting result is the association of EOS with the Sardinian-specific nonsynonymous p.Ser148Pro variant within proteoglycan 2 (PRG2), which is relatively frequent in Sardinia. p.Ser148Pro was also associated with a null/very low phenotype of eosinophil counts (almost complete absence or low of eosinophils in four waves data; unpublished own data). Thus, individuals homozygous for rs769591668-C were characterized by absence of eosinophils or, in some instances, very reduced eosinophil levels. Furthermore, eosinophils, when detected, were also characterized by reduced granularity. Importantly, Prg2tm1Nal/Prg2tm1Nal mice have abnormal inflammatory responses along with respiratory system inflammation and abnormal eosinophil morphology (MGI; http://www.informatics.jax.org), in agreement with our observations. Moreover, in another study, apparently healthy MBP1−/−/EPX−/− double knockout mice displayed significantly fewer circulating peripheral blood eosinophils along with disruption of eosinophilopoiesis (16). However, the presence of reduced scatter in our data on this cell population suggests that there are present eosinophils that are missed in the classical measurement due to the absence of granules, similar to the observations in the mouse knockout model.

The relatively high frequency of PRG2 148Pro variant in Sardinia was unexpected, in light of recent findings that innate immunity genes globally evolved to remove loss-of-function variants through purifying selection, which could explain some auto-inflammatory and hypersensitivity reactions now observed in humans (17,18). This relatively high frequency of morphological changes in eosinophils further agrees with the lower levels of eosinophils reported recently in children exposed to dust in traditional Amish farming (19).

Replication in independent cohort

We replicated five of the seven associated signals in the independent INTERVAL cohort. Notably, the two not replicated loci PRG2 and HBB are Sardinian-specific, founder mutations that are almost absent outside of Sardinia. Sardinians are one of the oldest European population where the distinct evolutionary history and genetic structure, genetic drift and natural selection caused the higher allele frequencies that are rare in the world. However, the second signal in HBB locus was previously described in our FACS data. With these results we demonstrated that our data are robust and warranted followed-up with functional analyses.

Molecular dynamic analysis of PRG2 mutation

PRG2 toxicity is believed to function by disrupting membranes of parasites and bacteria. PRG2 (MBP-1) is packed in the specific granules of human eosinophils as a distinctive non crystalline structure, permitting the storage of otherwise toxic proteins. PRG2 toxicity is restrained via crystallization in eosinophil secretory granules with tinctorial properties typical of amyloid-like structures (20). Soragni and coworkers have shown that residues 114–119, 131–143, 146–159 and 194–202 of PRG2 have the highest aggregation propensity. Among them, the stretch between residues 131 and 159 form a nearly continuous solvent-exposed aggregation-prone area. Our computational analysis showed that substitution of Ser148 by Pro148 not only alters the dynamic properties of the loop between residues 164–171 but also affects the contact network of Tyr 109. The changes observed in the eosinophil morphology could be explained based on these findings. Furthermore, because Ser148 is located in the middle of the solvent-exposed aggregation-prone area (residues 131–159), we propose that mutation by Pro can interfere with the formation of PRG2 non-toxic granules, tuning the crystallization/aggregation processes that mediate PRG2 toxicity.

Functional annotation of the novel signals

We observed that all three novel variants regulate eosinophil morphology and in parallel two of them (rs445, rs80116434) were associated with reduced eosinophil counts in public data pointing on the genetic regulation of those traits. Additionally, consistently with CDK6 function, the rs445 regulated neutrophil volume and was associated with reduced counts of different myeloid cell subpopulations. This is in agreement with its ubiquitous expression in different tissues that included neutrophils and eosinophils. Using public data we showed the regulatory character of rs445 that is located on active enhancer in all hematopoietic lineages. In particular, rs445 region is included in the contact of promoters of genes CDK6 and nearby located in 5’UTR antisense AC002454.1 . In line with rs445 function in myeloid cell line and underlying its regulatory character, those two genes were co-expressed in myeloid cell line precursors.

Materials and Methods

Cohorts and sample description

The SardiNIA project is a longitudinal cohort study of about 8000 general population volunteers (57% females), ranging from 18 to 102 years recruited in the Lanusei Valley, Sardinia. All volunteers are deeply genetically characterized and phenotyped for thousands of quantitative trait. In this study, we used data from 869 individuals. All participants gave informed consent to the study protocols, which were approved by the Sardinian local research committee Ethical Committee of ASSLL of Sassari (2171/CE).

The INTERVAL cohort includes blood donors aged 18 years or older from donor centers of NHS Blood and Transplant across England, UK (13,14). For this study, 11 822 whole-genome sequenced samples were used.

Phenotype measurement

We studied 12 non-diagnostic blood cell phenotypes: cell volume (V), conductivity (C) and light scatter (S) in four white blood cell subpopulations (eosinophils, lymphocytes, monocytes, neutrophils). These traits were measured with the UniCel DxH Coulter Cellular Analysis system of Beckman Coulter according to the manufacturer’s instructions. Thus, we studied associations in eosinophils (volume EOV, conductivity EOC, scatter EOS), lymphocytes (volume LYV conductivity LYC, scatter LYS), monocytes (volume MOV, conductivity MOC, scatter MOS) and neutrophils (volume NEV, conductivity NEC, scatter NES) (Supplementary Material, Table S1).

In the INTERVAL study, the phenotypes were acquired with the Sysmex XN-1000 instrument. The fluorescence intensity of light emitted by the dye measures the nucleic acid content and membrane permeability of the cell, while the intensities of diffracted light scattered sideways (SSC) and forward (FSC) by the cell measure intracellular organelle complexity and cell size, respectively (6).

Genotyping and imputation

Genetic analyses were performed on 869 samples genotyped with OmniExpress Illumina array as previously described for the entire SardiNIA cohort (21). Imputation was performed on a genome-wide scale using the Sardinian sequence-based reference panel of 3514 individuals and the Minimac software on pre-phased genotypes. After imputation, only markers with imputation quality (RSQR) > 0.3 for MAF ≥ 1% or RSQR > 0.6 for MAF < 1% were retained for association analyses (21), yielding over 18 425 755 variants (17 209 121 SNPs and 1 216 634 indels) useful for analyses.

Association analysis

Before performing GWAS, the 12 immune phenotypes were both inverse-normalized and adjusted by sex, age, age2 as covariate. Additive effects were searched using EPACTS version 3.2.6 (22). The software implements a linear mixed model adjusted with a genomic-based kinship matrix calculated using all quality-checked genotyped SNPs with MAF > 1%. The advantage of this model is that the kinship matrix encodes a wide range of sample structures, including both cryptic relatedness and population stratification. As a proof of appropriate adjustment of all confounders, the genomic inflation factor (lambda GC) for 12 GWAS was on average 1.005 (range 0.990–1.016). To identify independent signals, a conditional GWAS analysis for each trait was performed by adding the leading SNPs found in the primary GWAS as covariates to the basic model. A SNP reaching the standard genome-wide significance threshold (P-value < 5 x 10−8) adjusted using Bonferroni multiple testing method (P-value < 4.17 x 10−9) was considered significant.

Genotyping and quality controls for the INTERVAL study was described in Astle, Elding, Jiang et al. (2).

The replication P-value threshold was calculated by adjusting the nominal P-value (0.05) for multiple variant testing using the Bonferroni method (0.05/7 variants to be replicated), obtaining a P-value = 0.007.

The proportion of trait variance explained by each variant was calculated as the R2 of the single variant association model in EPACS. The heritability explained by each variant was obtained using the following formula:

|${\textrm{Effect}}^2\times 2\times \textrm{MAF}\times \Big(1\hbox{--} \textrm{MAF}\Big)$|

Molecular modeling analysis

Initial structure for Homo sapiens PRG2 was based on the crystallographic structure of PRG2 (PDB ID 1H8U) (15). Sulfate and glycerol molecules present in the crystallographic structures were removed prior to the simulations. PRG2S148P variant was built using WT protein as a template with UCFS Chimera (23). MD simulations of WT and S148P proteins were performed with OpenMM software (24) in a CUDA platform with AMBER suite ff19SB force field (25), particle mesh Ewald for long-range electrostatic interactions with and Ewald summation cutoff of 10A, and constraint of the length of all hydrogen bonds. Each system was neutralized with chlorine and sodium counter-ions according to the total charge of the protein and solvated with optimal-3-charge, 4-point rigid water model (OPC) water molecules (26). Firstly, each system was subjected to 2500 steps of energy minimization at 298 K. Langevin thermostat was used to control the temperature with a friction coefficient of 1 ps−1 and a step size of 0.002 ps. Each system was subjected to 250 ns MD simulation and snapshots of the trajectories were saved every 1 ps to evaluate the motions of residues and atoms. Finally, the trajectory analysis was performed with the CPPTRAJ module of AMBER suite (27). The coulombic electrostatic potentials of both proteins at physiological ionic strength were calculated with UCFS Chimera program from the average structures after the MD simulation. Graphic representations and further analysis were performed with OriginPro 2019b and molecular graphics with UCFS Chimera.

Acknowledgements

We thank all the volunteers who generously participated in this study.

Conflict of Interest statement. C.S. and P.A. are currently the employee of Regeneron Pharmaceuticals and the beneficiary of stock options and grants in Regeneron. The other authors declare no competing interests.

Funding

Intramural Research Program of the National Institute on Aging (N01-AG-1-2109 and HHSN271201100005C); National Institutes of Health (NIH); by research grants from the Ministry of Science and Innovation (PGC2018-096049-B-I00); European Regional Development Fund (FEDER); Andalusian Government (BIO-198, US-1254317, US-1257019, P18-FR-3487 and P18HO-4091, US/JUNTA/FEDER, UE), University of Seville (VI PPIT) and the Ramón Areces Foundation. G.P.-M. was awarded a PhD fellowship from the Spanish Ministry of Education, Culture and Sport (FPU17/04604).

References

1.

Vuckovic
,
D.
,
Bao
,
E.L.
,
Akbari
,
P.
,
Lareau
,
C.A.
,
Mousas
,
A.
,
Jiang
,
T.
,
Chen
,
M.-H.
,
Raffield
,
L.M.
,
Tardaguila
,
M.
,
Huffman
,
J.E.
 et al. (
2020
)
The polygenic and monogenic basis of blood traits and diseases
.
Cell
,
182
,
1214
1231.e11
.

2.

Astle
,
W.J.
,
Elding
,
H.
,
Jiang
,
T.
,
Allen
,
D.
,
Ruklisa
,
D.
,
Mann
,
A.L.
,
Mead
,
D.
,
Bouman
,
H.
,
Riveros-Mckay
,
F.
,
Kostadima
,
M.A.
 et al. (
2016
)
The allelic landscape of human blood cell trait variation and links to common complex disease
.
Cell
,
167
,
1415
1429.e19
.

3.

Onengut-Gumuscu
,
S.
,
Chen
,
W.-M.
,
Burren
,
O.
,
Cooper
,
N.J.
,
Quinlan
,
A.R.
,
Mychaleckyj
,
J.C.
,
Farber
,
E.
,
Bonnie
,
J.K.
,
Szpak
,
M.
,
Schofield
,
E.
 et al. (
2015
)
Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers
.
Nat. Genet.
,
47
,
381
386
.

4.

Richardson-Jones
,
A.
(
1990
)
An automated hematology instrument for comprehensive WBC, RBC, and platelet analysis
.
Am. Clin. Lab.
,
9
,
18
22
.

5.

Orrù
,
V.
,
Steri
,
M.
,
Sidore
,
C.
,
Marongiu
,
M.
,
Serra
,
V.
,
Olla
,
S.
,
Sole
,
G.
,
Lai
,
S.
,
Dei
,
M.
,
Mulas
,
A.
 et al. (
2020
)
Complex genetic signatures in immune cells underlie autoimmunity and inform therapy
.
Nat. Genet.
,
52
,
1036
1045
.

6.

Akbari
,
P.
,
Vuckovic
,
D.
,
Jiang
,
T.
,
Kundu
,
K.
,
Kreuzhuber
,
R.
,
Bao
,
E.L.
,
Mayer
,
L.
,
Collins
,
J.H.
,
Downes
,
K.
,
Georges
,
M.
 et al. (
2020
)
Genetic analyses of blood cell structure for biological and pharmacological inference
.
bioRxiv
. https://doi.org/10.1101/2020.01.30.927483.

7.

Shapiro
,
H.M.
(
1995
) Practical Flow Cytometry.
Practical Flow Cytometry
, 3rd edn.
Wiley-Liss
,
New York
.

8.

Pilia
,
G.
,
Chen
,
W.-M.
,
Scuteri
,
A.
,
Orrú
,
M.
,
Albai
,
G.
,
Dei
,
M.
,
Lai
,
S.
,
Usala
,
G.
,
Lai
,
M.
,
Loi
,
P.
 et al. (
2006
)
Heritability of cardiovascular and personality traits in 6,148 Sardinians
.
PLoS Genet.
,
2
, e132.

9.

McGrogan
,
M.
,
Simonsen
,
C.
,
Scott
,
R.
,
Griffith
,
J.
,
Ellis
,
N.
,
Kennedy
,
J.
,
Campanelli
,
D.
,
Nathan
,
C.
and
Gabay
,
J.
(
1988
)
Isolation of a complementary DNA clone encoding a precursor to human eosinophil major basic protein
.
J. Exp. Med.
,
168
,
2295
2308
.

10.

Yoshimatsu
,
K.
,
Ohya
,
Y.
,
Shikata
,
Y.
,
Seto
,
T.
,
Hasegawa
,
Y.
,
Tanaka
,
I.
,
Kawamura
,
T.
,
Kitoh
,
K.
,
Toyoshima
,
S.
and
Osawa
,
T.
(
1992
)
Purification and cDNA cloning of a novel factor produced by a human T-cell hybridoma: sequence homology with animal lectins
.
Mol. Immunol.
,
29
,
537
546
.

11.

Ghoussaini
,
M.
,
Mountjoy
,
E.
,
Carmona
,
M.
,
Peat
,
G.
,
Schmidt
,
E.M.
,
Hercules
,
A.
,
Fumis
,
L.
,
Miranda
,
A.
,
Carvalho-Silva
,
D.
,
Buniello
,
A.
 et al. (
2021
)
Open targets genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics
.
Nucleic Acids Res.
,
49
,
D1311
D1320
.

12.

Sun
,
T.
,
Holowka
,
T.
,
Song
,
Y.
,
Zierow
,
S.
,
Leng
,
L.
,
Chen
,
Y.
,
Xiong
,
H.
,
Griffith
,
J.
,
Nouraie
,
M.
,
Thuma
,
P.E.
 et al. (
2012
)
A Plasmodium-encoded cytokine suppresses T-cell immunity during malaria
.
Proc. Natl. Acad. Sci. U. S. A.
,
109
,
E2117
E2126
.

13.

Di Angelantonio
,
E.
,
Thompson
,
S.G.
,
Kaptoge
,
S.
,
Moore
,
C.
,
Walker
,
M.
,
Armitage
,
J.
,
Ouwehand
,
W.H.
,
Roberts
,
D.J.
,
Danesh
,
J.
and
INTERVAL Trial Group
(
2017
)
Efficiency and safety of varying the frequency of whole blood donation (INTERVAL): a randomised trial of 45 000 donors
.
Lancet
,
390
,
2360
2371
.

14.

Kaptoge
,
S.
,
Di Angelantonio
,
E.
,
Moore
,
C.
,
Walker
,
M.
,
Armitage
,
J.
,
Ouwehand
,
W.H.
,
Roberts
,
D.J.
,
Danesh
,
J.
,
Thompson
,
S.G.
and
INTERVAL Trial Group
(
2019
)
Longer-term efficiency and safety of increasing the frequency of whole blood donation (INTERVAL): extension study of a randomised trial of 20 757 blood donors
.
Lancet Haematol.
,
6
,
e510
e520
.

15.

Swaminathan
,
G.J.
,
Weaver
,
A.J.
,
Loegering
,
D.A.
,
Checkel
,
J.L.
,
Leonidas
,
D.D.
,
Gleich
,
G.J.
and
Acharya
,
K.R.
(
2001
)
Crystal structure of the eosinophil major basic protein at 1.8 A. An atypical lectin with a paradigm shift in specificity
.
J. Biol. Chem.
,
276
,
26197
26203
.

16.

Doyle
,
A.D.
,
Jacobsen
,
E.A.
,
Ochkur
,
S.I.
,
McGarry
,
M.P.
,
Shim
,
K.G.
,
Nguyen
,
D.T.C.
,
Protheroe
,
C.
,
Colbert
,
D.
,
Kloeber
,
J.
,
Neely
,
J.
 et al. (
2013
)
Expression of the secondary granule proteins major basic protein 1 (MBP-1) and eosinophil peroxidase (EPX) is required for eosinophilopoiesis in mice
.
Blood
,
122
,
781
790
.

17.

Dannemann
,
M.
,
Andrés
,
A.M.
and
Kelso
,
J.
(
2016
)
Introgression of neandertal- and denisovan-like haplotypes contributes to adaptive variation in human toll-like receptors
.
Am. J. Hum. Genet.
,
98
,
22
33
.

18.

Deschamps
,
M.
and
Quintana-Murci
,
L.
(
2016
)
Innate immunity and human diseases: from archaic introgression to natural selection
.
Med. Sci. (Paris)
,
32
,
1079
1086
.

19.

Stein
,
M.M.
,
Hrusch
,
C.L.
,
Gozdz
,
J.
,
Igartua
,
C.
,
Pivniouk
,
V.
,
Murray
,
S.E.
,
Ledford
,
J.G.
,
Marques Dos Santos
,
M.
,
Anderson
,
R.L.
,
Metwali
,
N.
 et al. (
2016
)
Innate immunity and asthma risk in Amish and Hutterite farm children
.
N. Engl. J. Med.
,
375
,
411
421
.

20.

Soragni
,
A.
,
Yousefi
,
S.
,
Stoeckle
,
C.
,
Soriaga
,
A.B.
,
Sawaya
,
M.R.
,
Kozlowski
,
E.
,
Schmid
,
I.
,
Radonjic-Hoesli
,
S.
,
Boutet
,
S.
,
Williams
,
G.J.
 et al. (
2015
)
Toxicity of eosinophil MBP is repressed by intracellular crystallization and promoted by extracellular aggregation
.
Mol. Cell
,
57
,
1011
1021
.

21.

Sidore
,
C.
,
Busonero
,
F.
,
Maschio
,
A.
,
Porcu
,
E.
,
Naitza
,
S.
,
Zoledziewska
,
M.
,
Mulas
,
A.
,
Pistis
,
G.
,
Steri
,
M.
,
Danjou
,
F.
 et al. (
2015
)
Genome sequencing elucidates Sardinian genetic architecture and augments association analyses for lipid and blood inflammatory markers
.
Nat. Genet.
,
47
,
1272
1281
.

22.

Kang
,
H.M.
,
Sul
,
J.H.
,
Service S.K
,
Zaitlen
,
N.A.
,
Kong
,
S.-Y.
,
Freimer
,
N.B.
,
Sabatti
,
C.
and
Eskin
,
E.
(
2010
)
Variance component model to account for sample structure in genome-wide association studies
.
Nat. Genet.
,
42
,
348
354
.

23.

Pettersen
,
E.F.
,
Goddard
,
T.D.
,
Huang
,
C.C.
,
Couch
,
G.S.
,
Greenblatt
,
D.M.
,
Meng
,
E.C.
and
Ferrin
,
T.E.
(
2004
)
UCSF chimera--a visualization system for exploratory research and analysis
.
J. Comput. Chem.
,
25
,
1605
1612
.

24.

Eastman
,
P.
,
Swails
,
J.
,
Chodera
,
J.D.
,
McGibbon
,
R.T.
,
Zhao
,
Y.
,
Beauchamp
,
K.A.
,
Wang
,
L.-P.
,
Simmonett
,
A.C.
,
Harrigan
,
M.P.
,
Stern
,
C.D.
 et al. (
2017
)
OpenMM 7: rapid development of high performance algorithms for molecular dynamics
.
PLoS Comput. Biol.
,
13
, e1005659.

25.

Tian
,
C.
,
Kasavajhala
,
K.
,
Belfon
,
K.A.A.
,
Raguette
,
L.
,
Huang
,
H.
,
Migues
,
A.N.
,
Bickel
,
J.
,
Wang
,
Y.
,
Pincay
,
J.
,
Wu
,
Q.
and
Simmerling
,
C.
(
2020
)
ff19SB: amino-acid-specific protein backbone parameters trained against quantum mechanics energy surfaces in solution
.
J. Chem. Theory Comput.
,
16
,
528
552
.

26.

Izadi
,
S.
,
Anandakrishnan
,
R.
and
Onufriev
,
A.V.
(
2014
)
Building water models: a different approach
.
J. Phys. Chem. Lett.
,
5
,
3863
3871
.

27.

Roe
,
D.R.
and
Cheatham
,
T.E.
(
2013
)
PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data
.
J. Chem. Theory Comput.
,
9
,
3084
3095
.

Author notes

Michele Marongiu and Gonzalo Pérez-Mejías authors contributed equally to this work.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)