-
PDF
- Split View
-
Views
-
Cite
Cite
Manuela Sironi, Franca Rosa Guerini, Cristina Agliardi, Mara Biasin, Rachele Cagliani, Matteo Fumagalli, Domenico Caputo, Andrea Cassinotti, Sandro Ardizzone, Milena Zanzottera, Elisabetta Bolognesi, Stefania Riva, Yasuyoshi Kanari, Masaaki Miyazawa, Mario Clerici, An Evolutionary Analysis of RAC2 Identifies Haplotypes Associated with Human Autoimmune Diseases, Molecular Biology and Evolution, Volume 28, Issue 12, December 2011, Pages 3319–3329, https://doi.org/10.1093/molbev/msr164
- Share Icon Share
Abstract
The human RAC2 gene encodes a small GTP-binding protein with a pivotal role in immune activation and in the induction of peripheral immune tolerance through restimulation-induced cell death (RICD). Different human pathogens target the protein product of RAC2, suggesting that the gene may be subject to natural selection, and that variants in RAC2 may affect immunological phenotypes in humans. We scanned the genomic region encompassing the entire transcription unit for the presence of putative noncoding regulatory elements conserved across mammals. This information was used to select two RAC2 gene regions and analyze their intraspecific genetic diversity. Results suggest that a region covering the 3′ untranslated region has been a target of multiallelic balancing selection (or diversifying selection), and three major RAC2 haplogroups occur in human populations. Haplotypes belonging to one of these clades are associated with increased susceptibility to multiple sclerosis (P = 0.022) and earlier onset of disease symptoms (P = 0.025). This same haplogroup is significantly more common in patients with Crohn's disease compared with healthy controls (P = 0.048). These data reinforce recent evidences that susceptibility alleles/haplotypes are shared among multiple autoimmune disorders and support a causal “role for RAC2” variants in the pathogenesis of autoimmune diseases. Other genes with a role in RICD have previously been associated with autoimmunity in humans, suggesting that this pathway and RAC2 may represent novel therapeutic targets in autoimmune disorders.
Introduction
The human RAC2 (ras-related C3 botulinum toxin substrate 2) gene encodes a member of the Ras superfamily of GTP-binding proteins primarily expressed in cells of hematopoietic origin (reviewed in Heasman and Ridley 2008). The protein product of RAC2 regulates several processes central to immune and inflammatory responses including proliferation of primary T cells and differentiation towards the Th1 subtype, dendritic cell migration, neutrophil nicotinamide adenine dinucleotide phosphatase oxidase activity, and B cell maturation (Knaus et al. 1991; Li et al. 2000, 2002; Kim and Dinauer 2001; Yu et al. 2001; Croker et al. 2002; Benvenuti et al. 2004; Decoursey and Ligeti 2005). Therefore, RAC2 can be regarded as a pivotal gene in the elicitation of immune responses, but it also plays a role in the induction of peripheral immune tolerance. Indeed, RAC2 represents an essential component of restimulation-induced cell death (RICD) via the Fas/FasL pathway (Ramaswamy et al. 2007).
Genes involved in immune response are believed to be frequent targets of natural selection (reviewed in Hurst 2009; Barreiro and Quintana-Murci 2010); in particular, balancing selection, which is thought to be a relatively rare phenomenon in humans (Asthana et al. 2005; Charlesworth 2006), has targeted a few genes involved in both innate and adaptive immunity (Ferrer-Admetlla et al. 2008; Barreiro and Quintana-Murci 2010; Sironi and Clerici 2010). This observation has been explained by the need to finely tune effective responses to pathogens on one side and development of pathogenic autoimmune/inflammatory reactions on the other (Ferrer-Admetlla et al. 2008). Yet, the role of autoimmunity and chronic inflammatory diseases as selective pressure during the course of human evolutionary history remains to be evaluated. Therefore, the maintenance of susceptibility alleles for autoimmune diseases may also be regarded as a possible by-product of adaptation to pathogen exposure (reviewed in Sironi and Clerici 2010). In line with this view, susceptibility alleles for several human diseases have been shown to have increased in frequency in human populations as a result of recent positive selection (Barreiro and Quintana-Murci 2010), and several alleles that predispose to inflammatory bowel disease (IBD) have been a target of pathogen-driven selection (Fumagalli, Pozzali, et al. 2009). These observations suggest that past selective pressures exerted by infectious agents might have contributed to shaping the frequency spectrum of susceptibility alleles for autoimmune diseases.
Irrespective of the underlying mechanisms responsible for the maintenance of genetic diversity in humans, the identification of genes/gene regions subjected to natural selection can provide relevant information for the identification of functional variants that may influence complex phenotypic traits.
Here, we integrated the analysis of inter- and intraspecies genetic diversity to analyze the evolutionary pattern of RAC2 and demonstrate its role as a susceptibility gene for multiple sclerosis (MS) and Crohn's disease (CD).
Materials and Methods
Analysis of Functional Elements and Sequence Conservation
Data concerning DNaseI hypersensitive sites in CD4+ T cells derive from a previous work (Boyle et al. 2008) and were retrieved from the University of California–San Cruz (UCSC) annotation tables (http://genome.ucsc.edu, Duke DNaseI HS track). PhastCons elements were derived from the UCSC website (tracks: phastConsElements44wayPlacMamm and phastConsElements44wayPrimates). Only conserved elements longer than 15 bp were considered. Similarly, information on histone marks was obtained from UCSC annotation tracks (ENCODE Integrated Regulation Tracks).
HapMap Samples and Sequencing
Human genomic DNA for the three human HapMap populations (Europeans, CEU; Yoruba, YRI; and Asian, AS) was obtained from the Coriell Institute for Medical Research. The two regions in RAC2 were polymerase chain reaction (PCR) amplified and directly sequenced; primer sequences are available upon request. PCR products were treated with ExoSAP-IT (USB Corporation, Cleveland, OH), directly sequenced on both strands with a Big Dye Terminator sequencing Kit (v3.1 Applied Biosystems, Foster City, CA) and run on an Applied Biosystems ABI 3130 XL Genetic Analyzer (Applied Biosystems). Sequences were assembled using AutoAssembler version 1.4.0 (Applied Biosystems) and inspected manually by two distinct operators.
Patients, Controls, and Genotyping
For the MS case/control association study, a total of 722 individuals were enrolled: 387 relapsing-remitting multiple sclerosis (RRMS) patients (266 females and 121 males) and 335 age- and sex-matched healthy individuals (206 females and 129 males) of the same geographic origin (collection site). All patients and controls were Italians of Caucasian ethnicity. The RRMS patients were followed by MS Center of Don Gnocchi Foundation in Milan, Italy. Median age was 41.9 ± 10.9 and 42.32 ± 22.05 years for RRMS and controls, respectively. All patients underwent a standard battery of examinations, including medical history, physical and neurological examination, screening laboratory test, and brain magnetic resonance imaging. All patients with MS fulfilled the McDonald’s criteria (McDonald et al. 2001). As for the CD case/control study, we recruited 313 individuals: 149 suffering from CD (96 males and 53 females) and 164 age- and sex-matched healthy individuals (104 males and 60 females). All subjects were Italians of Caucasian ethnicity. Patients referred since diagnosis to the IBD Unit of the Luigi Sacco Hospital in Milano, a third level center for the management of IBD patients, were enrolled. The diagnosis of CD was based on international published criteria, according to clinical, endoscopic, histological, and/or radiological data (Lennard-Jones 1989). A detailed clinical history, as well as laboratory and instrumental diagnostic data, were collected.
This study was designed and carried out in accordance with the Helsinki Declaration. Only patients signing an informed consent form were enrolled in the study, which was approved by the institutional review board.
All RRMS subjects were genotyped for the HLA-DRB1 locus by sequence-specific primed PCR using Histo Type DNA well plates (BAG, Formedic diagnostici, Milan, Italy).
Genomic DNA was used as template for PCR amplification using TaqMan probes specifically designed to perform a single nucleotide polymorphism (SNP) genotyping assay for rs2899284 (A/G) and rs739041 (A/G) (TaqMan SNP genotyping assay; Applied Biosystems) and using the allelic discrimination real-time PCR method. The PCR consisted of a hot start at 95 °C for 10 min followed by 40 cycles of 94 °C for 15 s and 60 °C for 1 min. Fluorescence detection took place at a temperature of 60 °C. All assays were performed in 10 μl reactions, using TaqMan Genotyping Master Mix on 96-well plates using an ABI 7000 instrument (Applied Biosystems). After Bonferroni correction, the two SNPs complied to Hardy–Weinberg equilibrium (HWE) in all cohorts.
As for the 15 null SNPs used to test for population structure, TaqMan SNP genotyping assays were also used with the same conditions detailed above. The dbSNP IDs and reported minor allele frequencies in CEU were as follows: rs10113320 (0.319), rs10488619 (0.42), rs10785952 (0.363), rs12313915 (0.332), rs12618959 (0.283), rs2220858 (0.362), rs2625956 (0.164), rs2842063 (0.159), rs310644 (0.111), rs6687440 (0.347), rs289816 (0.111), rs4783432 (0.161), rs6001728 (0.221), rs16877243 (0.192), rs1885167 (0.204). All SNPs were in HWE in both RRMS and control subjects.
Population Genetic Analysis
Tajima’s D (Tajima 1989) and Fu and Li’s D* and F* (Fu and Li 1993) statistics, as well as diversity parameters θw (Watterson 1975) and π (Nei and Li 1979), were calculated using “libsequence” (Thornton 2003), a C++ class library providing an object-oriented framework for the analysis of molecular population genetic data. Calibrated coalescent simulations were performed using the “cosi” package (Schaffner et al. 2005) and its best-fit parameters for YRI, CEU, and AS populations with 10,000 iterations. Additional coalescent simulations were computed with the ms software (Hudson 2002) specifying the number of chromosomes; the mutation parameter estimated from the data and the recombination rate with 10,000 iterations for each demographic model. The other parameters for each model were set as previously proposed (Marth et al. 2004; Voight et al. 2005). The maximum likelihood ratio Hudson–Kreitman–Aguadé (MLHKA) test was performed using the MLHKA software (Wright and Charlesworth 2004), as previously proposed (Fumagalli, Cagliani, et al. 2009). Briefly, 16 reference loci were randomly selected among National Institute of Environmental Health Sciences (NIEHS) loci shorter than 20 kb that have been resequenced in the three populations; the only criterion was that Tajima’s D statistic did not suggest the action of natural selection (i.e., Tajima’s D statistic is higher than the 5th and lower than the 95th percentiles in the distribution of NIEHS genes). The reference set was accounted for by the following genes: VNN3 (MIM 606592), PLA2G2D (MIM 605630), MB (MIM 160000), MAD2L2 (MIM 604094), HRAS (MIM 190020), CYP17A1 (MIM 609300), ATOX1 (MIM 602270), BNIP3 (MIM 603293), CDC20 (MIM 603618), NGB (MIM 605304), TUBA1 (MIM 191110), MT3 (MIM 139255), NUDT1 (MIM 600312), PRDX5 (MIM 606583), RETN (MIM 605565), and JUND (MIM 165162). In all analyses, the chimpanzee was used as the outgroup.
Data Retrieval, Haplotype Construction, and TMRCA Calculation
Genotype data for 238 resequenced human genes were derived from the NIEHS SNPs Program Web site. In particular, we selected genes that had been resequenced in populations of defined ethnicity including CEU, YUI, and AS (NIEHS panel 2).
For HapMap samples, haplotypes were inferred using PHASE version 2.1 (Stephens et al. 2001; Stephens and Scheet 2005), a program for reconstructing haplotypes from unrelated genotype data through a Bayesian statistical method. Haplotypes for individuals resequenced in this study are available as supplementary table S1, Supplementary Material online.
The median-joining network was constructed using NETWORK 4.5 (Bandelt et al. 1999). Estimate of the time to the most recent common ancestor (TMRCA) was obtained using a maximum likelihood coalescent method implemented in GENETREE (Griffiths and Tavare 1994, 1995). The mutation rate μ was obtained on the basis of the divergence between human and chimpanzee, and under the assumption both that the species separation occurred 6 Ma (Glazko and Nei 2003) and of a generation time of 25 years. Using this μ, a maximum likelihood estimate of θ (θML) of 8 was obtained, resulting in an estimated effective population size (Ne) of 16,260, a value comparable to most figures reported in the literature (Tishkoff and Verrelli 2003). With these assumptions, the coalescence time, scaled in 2Ne units, was converted into years. For the coalescence process, 106 simulations were performed.
Estimates of allelic richness and private allelic richness for the three observed haplogroups were derived via Monte Carlo simulations by applying a rarefaction procedure to account for differences in sample size among clades (Kalinowski 2004). Specifically, we performed 1,000 simulations by randomly drawing g chromosomes (with g = 20) without replacement within each haplogroup and counting the number of polymorphic variants and private alleles within each clade. The average number of polymorphisms and private variants in the simulated samples represents our estimate of allelic richness and private allelic richness.
Association Analysis and Population Structure
Haplotype association analysis was performed using PLINK (Purcell et al. 2007) by logistic regression. Haplotype probabilities of individual subjects were incorporated as covariates in the regression models, which estimate the odds ratios associated with having a specific haplotype under an additive model. Sex was also used as a covariate. Wald tests were used to calculate P values.
In order to test for the presence of population stratification, 15 “null” SNPs (IDs are reported above) were selected. Specifically, 15 SNPs from a panel of ancestry informative markers (Enoch et al. 2006) were selected to be located on different chromosomes and for having a minor allele frequency in HapMap CEU > 0.10. Using these data, the inflation factor λ (Devlin et al. 2001) was calculated through the use of the Genomic Control (GC) function in the SNPassoc R package (Gonzalez et al. 2007). For estimation of the ancestry proportion for each RRMS and control subject, we used STRUCTURE 2.1 (Pritchard et al. 2000; Falush et al. 2003) based on the genotype information of the 15 null SNPs. Simulation parameters were set to 100,000 burn-ins followed by 100,000 runs. The most common criterion to detect the true number of populations (K) is based on computing an estimate of the posterior probability of the data for a given K, L(K). However, choosing K which maximizes the value of L(K) tends to overestimate the true number of populations, especially when a correction for the correlated population allele frequencies is adopted (Evanno et al. 2005). To overcome this issue, we computed a statistics based on the second rate of change of L(K) with respect to the value of K, as previously proposed (Evanno et al. 2005). The peak of this new function detects the most likely value of K, which, in our case, was estimated to be equal to 2. Individual admixture estimates obtained from STRUCTURE 2.1 were added as continuous covariates (with haplotypes and sex) in a second logistic regression analysis to correct for possible cryptic substructure. Only 1 of the 2 (K = 2) admixture proportions was included because the two proportions sum to 1, making them collinear.
For analysis of association between specific RAC2 haplotypes and age at onset, linear regression analyses were performed using PLINK (Purcell et al. 2007). For RRMS subjects, presence/absence of the HLA-DRB*15 allele was used a covariate in the regression.
Results
Analysis of Interspecies Sequence Conservation
Given the central role of RAC2 in immune responses, it is conceivable that variants located within or in proximity of the gene affect the susceptibility to infectious or autoimmune diseases in humans. If this were the case, such variants may also represent natural selection targets.
No nonsynonymous SNP in the gene has been described in dbSNP (Build 131, http://www.ncbi.nlm.nih.gov/projects/SNP); similarly, inspection of the 1000 Genomes Project data (http://browser.1000genomes.org) identified no amino acid replacement variant, suggesting that protein sequence variation in RAC2 does not contribute significantly to intraspecific and phenotypic diversity in human populations. Therefore, we aimed at identifying RAC2 genic regions that may be more likely to harbor noncoding functional elements and, possibly, regulatory variants.
To this purpose, we relied on the concept whereby noncoding elements that are conserved across multiple species are more likely to be functional (Sironi et al. 2005). We scanned a genomic region covering the whole RAC2 transcription unit and flanking regions for the presence of DNaseI hypersensitive sites in CD4+ T cells, noncoding regions conserved in placental mammals, and noncoding sequences conserved across primates (see Materials and Methods for details). As reported in figure 1, two distinct regions displayed more than one conserved element and DNaseI hypersensitive sites: One is located in intron 5 and the other covers the 3′ untranslated region and a genomic portion downstream the transcription end site. We therefore selected these two regions, hereafter referred to as RAC2in5 (4,202 bp) and RAC23′ (5,560 bp), for further analysis.

Schematic diagram of the exon–intron structure of RAC2. Along the gene, gray boxes represent exons (coding regions and UTRs are denoted by having different height). The location of DNaseI hypersensitive sites in CD4+ T cells is shown below the gene diagram (green boxes). Noncoding conserved elements in primate (red) and placental mammals (blue) are also reported. The two regions selected for resequencing are delimited by the hatched lines.
Nucleotide Diversity, Neutrality Tests, and Time Estimate
In order to get a complete view of the genetic diversity in RAC2in5 and RAC23′, and analyze their evolutionary pattern in humans, we resequenced the two regions in three HapMap populations, namely YRI, CEU, and AS.
Overall, only one coding SNP was identified (Lys84Lys). For both regions, we calculated nucleotide diversity by means of two indexes: θw (Watterson 1975), an estimate of the expected per site heterozygosity, and π (Nei and Li 1979), the average number of pairwise sequence nucleotide differences. In order to compare the values we obtained for the two RAC2 regions, we calculated θw and π for 5 kb regions deriving from 238 genes resequenced by the NIEHS program in the same population samples; the percentile rank corresponding to RAC2in5 and RAC23’ in the distribution of values for NIEHS genes is reported in table 1 and indicates that RAC23′ displays extremely high nucleotide diversity in all populations; conversely, no exceptional values are observed for RAC2in5. These data suggest that genetic diversity in the RAC23’ region might be maintained in human populations by a selective process (e.g., balancing selection).
Region | Pop.a | Sb | θw (×10−4) | π (×10−4) | Tajima’s D Statistic | Fu and Li’s D* | Fu and Li’s F* | ||||||||
Rank | Rank | Rank | P | Rank | P | Rank | P | ||||||||
RAC2i5 (4,202 bp) | YRI | 21 | 11.75 | 0.78 | 13.89 | 0.93 | 0.61 | 0.90 | 0.085 | 0.021 | 0.66 | 0.38 | 0.26 | 0.75 | 0.24 |
CEU | 14 | 7.83 | 0.78 | 11.00 | 0.87 | 1.28 | 0.87 | 0.079 | −0.28 | 0.47 | 0.41 | 0.27 | 0.62 | 0.37 | |
AS | 14 | 7.83 | 0.82 | 10.01 | 0.86 | 0.88 | 0.77 | 0.22 | 0.17 | 0.65 | 0.43 | 0.47 | 0.70 | 0.34 | |
RAC23′ (5,560 bp) | YRI | 53 | 22.38 | 0.99 | 27.41 | 0.99 | 0.80 | 0.95 | 0.032 | 0.72 | 0.90 | 0.052 | 0.89 | 0.94 | 0.028 |
CEU | 42 | 17.74 | 0.98 | 23.51 | 0.99 | 1.15 | 0.83 | 0.11 | 0.95 | 0.85 | 0.10 | 1.21 | 0.90 | 0.073 | |
AS | 39 | 16.47 | 0.99 | 15.84 | 0.95 | −0.14 | 0.47 | 0.39 | 1.66 | >0.99 | 0.0038 | 1.23 | 0.91 | 0.086 |
Region | Pop.a | Sb | θw (×10−4) | π (×10−4) | Tajima’s D Statistic | Fu and Li’s D* | Fu and Li’s F* | ||||||||
Rank | Rank | Rank | P | Rank | P | Rank | P | ||||||||
RAC2i5 (4,202 bp) | YRI | 21 | 11.75 | 0.78 | 13.89 | 0.93 | 0.61 | 0.90 | 0.085 | 0.021 | 0.66 | 0.38 | 0.26 | 0.75 | 0.24 |
CEU | 14 | 7.83 | 0.78 | 11.00 | 0.87 | 1.28 | 0.87 | 0.079 | −0.28 | 0.47 | 0.41 | 0.27 | 0.62 | 0.37 | |
AS | 14 | 7.83 | 0.82 | 10.01 | 0.86 | 0.88 | 0.77 | 0.22 | 0.17 | 0.65 | 0.43 | 0.47 | 0.70 | 0.34 | |
RAC23′ (5,560 bp) | YRI | 53 | 22.38 | 0.99 | 27.41 | 0.99 | 0.80 | 0.95 | 0.032 | 0.72 | 0.90 | 0.052 | 0.89 | 0.94 | 0.028 |
CEU | 42 | 17.74 | 0.98 | 23.51 | 0.99 | 1.15 | 0.83 | 0.11 | 0.95 | 0.85 | 0.10 | 1.21 | 0.90 | 0.073 | |
AS | 39 | 16.47 | 0.99 | 15.84 | 0.95 | −0.14 | 0.47 | 0.39 | 1.66 | >0.99 | 0.0038 | 1.23 | 0.91 | 0.086 |
Population.
Number of segregating sites.
Region | Pop.a | Sb | θw (×10−4) | π (×10−4) | Tajima’s D Statistic | Fu and Li’s D* | Fu and Li’s F* | ||||||||
Rank | Rank | Rank | P | Rank | P | Rank | P | ||||||||
RAC2i5 (4,202 bp) | YRI | 21 | 11.75 | 0.78 | 13.89 | 0.93 | 0.61 | 0.90 | 0.085 | 0.021 | 0.66 | 0.38 | 0.26 | 0.75 | 0.24 |
CEU | 14 | 7.83 | 0.78 | 11.00 | 0.87 | 1.28 | 0.87 | 0.079 | −0.28 | 0.47 | 0.41 | 0.27 | 0.62 | 0.37 | |
AS | 14 | 7.83 | 0.82 | 10.01 | 0.86 | 0.88 | 0.77 | 0.22 | 0.17 | 0.65 | 0.43 | 0.47 | 0.70 | 0.34 | |
RAC23′ (5,560 bp) | YRI | 53 | 22.38 | 0.99 | 27.41 | 0.99 | 0.80 | 0.95 | 0.032 | 0.72 | 0.90 | 0.052 | 0.89 | 0.94 | 0.028 |
CEU | 42 | 17.74 | 0.98 | 23.51 | 0.99 | 1.15 | 0.83 | 0.11 | 0.95 | 0.85 | 0.10 | 1.21 | 0.90 | 0.073 | |
AS | 39 | 16.47 | 0.99 | 15.84 | 0.95 | −0.14 | 0.47 | 0.39 | 1.66 | >0.99 | 0.0038 | 1.23 | 0.91 | 0.086 |
Region | Pop.a | Sb | θw (×10−4) | π (×10−4) | Tajima’s D Statistic | Fu and Li’s D* | Fu and Li’s F* | ||||||||
Rank | Rank | Rank | P | Rank | P | Rank | P | ||||||||
RAC2i5 (4,202 bp) | YRI | 21 | 11.75 | 0.78 | 13.89 | 0.93 | 0.61 | 0.90 | 0.085 | 0.021 | 0.66 | 0.38 | 0.26 | 0.75 | 0.24 |
CEU | 14 | 7.83 | 0.78 | 11.00 | 0.87 | 1.28 | 0.87 | 0.079 | −0.28 | 0.47 | 0.41 | 0.27 | 0.62 | 0.37 | |
AS | 14 | 7.83 | 0.82 | 10.01 | 0.86 | 0.88 | 0.77 | 0.22 | 0.17 | 0.65 | 0.43 | 0.47 | 0.70 | 0.34 | |
RAC23′ (5,560 bp) | YRI | 53 | 22.38 | 0.99 | 27.41 | 0.99 | 0.80 | 0.95 | 0.032 | 0.72 | 0.90 | 0.052 | 0.89 | 0.94 | 0.028 |
CEU | 42 | 17.74 | 0.98 | 23.51 | 0.99 | 1.15 | 0.83 | 0.11 | 0.95 | 0.85 | 0.10 | 1.21 | 0.90 | 0.073 | |
AS | 39 | 16.47 | 0.99 | 15.84 | 0.95 | −0.14 | 0.47 | 0.39 | 1.66 | >0.99 | 0.0038 | 1.23 | 0.91 | 0.086 |
Population.
Number of segregating sites.
Under neutral evolution, values of θw and π are expected to be roughly equal; for RAC23′, this is not verified, π being higher than θw in YRI and CEU (table 1). Tajima’s D statistic (DT) (Tajima 1989) was specifically developed to evaluate departure from neutrality by comparing θw and π; positive values of DT indicate an excess of intermediate frequency variants and are a hallmark of balancing selection. Similarly to DT, Fu and Li’s F* and D* are based on SNP frequency spectra, but they also take into account whether mutations occur in external or internal branches of a genealogy (Fu and Li 1993). As population history, in addition to selective processes, is known to affect the site frequency spectrum (SFS) and all related statistics, we performed coalescent simulations using a population genetics model that incorporates demographic scenarios (Schaffner et al. 2005). As an empirical comparison, we also exploited the availability of 5 kb windows to obtain a reference distribution of SFS-based statistics in the three populations. Neutrality tests for RAC23′ suggested departure from neutrality in all populations with significantly positive values for at least one statistic in YRI and AS (table 1). In line with these results, the value of DT in YRI and Fu and Li’s D* in AS were higher than the 95th percentile calculated over the distribution of NIEHS 5 kb windows. Conversely, no departure from neutrality was observed for RAC2in5 (table 1). Coalescent simulations using different demographic models (Marth et al. 2004; Voight et al. 2005) yielded very similar results and are available as supplementary table S2, Supplementary Material online.
In order to further explore the possibility that nucleotide diversity at the RAC23′ region has been maintained by selection, we applied an MLHKA by comparing polymorphisms and divergence levels at RAC2in5 and RAC23′ with 16 NIEHS genes resequenced in the three populations we analyzed (see Materials and Methods). Under neutral evolution, the amount of within-species diversity is predicted to correlate with levels of between-species divergence since both depend on the neutral mutation rate (Kimura 1983). The MLHKA test (Wright and Charlesworth 2004) is commonly used to verify this expectation. Results are shown in table 2 and indicate that for RAC23′, a significant excess of polymorphisms compared with divergence is observed in all populations. Conversely, no deviations from neutral expectations was observed for the RAC2in5 region (with only borderline significance in YRI).
Region | Fixed Suba | MLHKA | ||||||
YRI | CEU | AS | ||||||
kb | P | kb | P | kb | P | |||
RAC2in5 | 30 | 2.35 | 0.048 | 2.24 | 0.13 | 2.25 | 0.89 | |
RAC23′ | 59 | 3.24 | 2.6 × 10−3 | 3.72 | 4.0 × 10−4 | 3.50 | 8.6 × 10−4 |
Region | Fixed Suba | MLHKA | ||||||
YRI | CEU | AS | ||||||
kb | P | kb | P | kb | P | |||
RAC2in5 | 30 | 2.35 | 0.048 | 2.24 | 0.13 | 2.25 | 0.89 | |
RAC23′ | 59 | 3.24 | 2.6 × 10−3 | 3.72 | 4.0 × 10−4 | 3.50 | 8.6 × 10−4 |
Number of fixed substitutions (human/chimpanzee).
Selection parameter (k > 1 indicates an excess of polymorphism relative to divergence).
Region | Fixed Suba | MLHKA | ||||||
YRI | CEU | AS | ||||||
kb | P | kb | P | kb | P | |||
RAC2in5 | 30 | 2.35 | 0.048 | 2.24 | 0.13 | 2.25 | 0.89 | |
RAC23′ | 59 | 3.24 | 2.6 × 10−3 | 3.72 | 4.0 × 10−4 | 3.50 | 8.6 × 10−4 |
Region | Fixed Suba | MLHKA | ||||||
YRI | CEU | AS | ||||||
kb | P | kb | P | kb | P | |||
RAC2in5 | 30 | 2.35 | 0.048 | 2.24 | 0.13 | 2.25 | 0.89 | |
RAC23′ | 59 | 3.24 | 2.6 × 10−3 | 3.72 | 4.0 × 10−4 | 3.50 | 8.6 × 10−4 |
Number of fixed substitutions (human/chimpanzee).
Selection parameter (k > 1 indicates an excess of polymorphism relative to divergence).
Under a balancing selection regime, polymorphisms may be maintained in populations for a time, which is longer than expected under neutrality. We used GENETREE to estimate the TMRCA of the RAC23′ haplotype genealogy. The method is based on a maximum likelihood coalescent analysis (Griffiths and Tavare 1994, 1995) and assumes an infinite-site model without recombination. Therefore, haplotypes and sites that violate these assumptions need to be removed. Given the relatively low recombination rate in the region, only nine single segregating sites had to be removed. The resulting gene tree, rooted using the chimpanzee sequence, is partitioned into two deep clades, with clade A further divided into two minor branches (fig. 2) (clades A1 and A2). Using this method, the TMRCA of the whole genealogy amounted to 2.54 My (standard deviation: 0.374 My), whereas the two subclades (A1 and A2) have a shallower TMRCA of around 1.1 My.

Estimated haplotype tree for the RAC23′ gene region. Mutations are represented as black dots and named for their physical position along the regions. The absolute frequency of each haplotype is reported.
Haplotype Analysis and Association with Autoimmune Diseases
In order to gain insight into the distribution of RAC2 haplotypes in human populations, we constructed a median-joining network (Bandelt et al. 1999) using all variants identified in RAC2in5 and RAC23′ with the exclusion of singletons (fig. 3). The topology largely recapitulates the one obtained using GENETREE with two major clades (A and B), and haplotypes in clade A subdivided into two further haplogroups (referred to as A1 and A2) (fig. 3). There are clear differences in the distribution of RAC2 haplotypes among the three populations we analyzed, although Fst values (Wright 1950) were not exceptional compared with those calculated for 5 kb reference windows (supplementary table S3, Supplementary Material online). Estimates of allelic richness (and private allelic richness) within haplogroups were calculated using a rarefaction procedure (Kalinowski 2004) to account for different haplogroup frequency; results indicated that haplogroup A2 had the lowest genetic diversity among the three haplotype clades, whereas B haplotype tended to have high allelic richness (supplementary table S4, Supplementary Material online).

Median-joining network of RAC2 haplotypes. Each node represents a different haplotype, with the size of the circle proportional to frequency. Branch lengths are proportional to the number of nucleotide differences. Circles are color coded according to population (green: YRI, blue: CEU, and red: AS). The most recent common ancestor (MRCA) also shown (black circle). The three major haplogroups are evidenced, as well as the position of the two SNPs genotypes in patients and controls.
The pivotal role of RAC2 in immune response led us to verify whether the three major haplogroups were differentially represented when healthy controls were compared with subjects suffering from autoimmune diseases such as MS and CD, two diseases with a strong genetic basis (Weng et al. 2007; Nielsen et al. 2008; Langer-Gould et al. 2010) that suggests the presence of shared genetic determinants.
We selected two variants: rs2899284, located along the major branch separating clades A and B, and rs739041 that identifies haplogroup A1 (fig. 3). The typing of these two variants allows unequivocal haplogroup inference in European populations, as they are nonrecurrent in the phylogeny. The allele frequency of rs2899284 (T) and rs739041 (C) in the three populations we analyzed were as follows: CEU, 0.22 and 0.45; YRI, 0.27 and 0.67; and AS, 0.10 and 0.30.
The two SNPs were genotyped in 387 patients with RRMS and in 149 subjects suffering from CD; two independent cohorts of sex- and aged-matched controls were also analyzed. Using logistic regression analysis, haplotypes belonging to haplogroup B resulted to be significantly associated to disease status in both the RRMS and CD case/control studies (table 3).
Haplogroup | Frequency | OR (95% CI) | P | OR (95% CI)a | Pa | |
RRMS (n = 387) | Controls (n = 335) | |||||
A1 | 0.270 | 0.289 | 0.91 (0.71–1.13) | 0.426 | 0.91 (0.71–1.14) | 0.416 |
A2 | 0.390 | 0.424 | 0.85 (0.69–1.07) | 0.166 | 0.85 (0.68–1.07) | 0.161 |
B | 0.340 | 287 | 1.30 (1.04–1.64) | 0.024 | 1.31 (1.05–1.66) | 0.022 |
CD (n = 149) | Controls (n = 164) | |||||
A1 | 0.267 | 0.287 | 0.90 (0.64–1.30) | 0.556 | NP | NP |
A2 | 384 | 0.439 | 0.78 (0.58–1.09) | 0.136 | NP | NP |
B | 0.349 | 0.274 | 1.41 (1.01–1.99) | 0.048 | NP | NP |
Haplogroup | Frequency | OR (95% CI) | P | OR (95% CI)a | Pa | |
RRMS (n = 387) | Controls (n = 335) | |||||
A1 | 0.270 | 0.289 | 0.91 (0.71–1.13) | 0.426 | 0.91 (0.71–1.14) | 0.416 |
A2 | 0.390 | 0.424 | 0.85 (0.69–1.07) | 0.166 | 0.85 (0.68–1.07) | 0.161 |
B | 0.340 | 287 | 1.30 (1.04–1.64) | 0.024 | 1.31 (1.05–1.66) | 0.022 |
CD (n = 149) | Controls (n = 164) | |||||
A1 | 0.267 | 0.287 | 0.90 (0.64–1.30) | 0.556 | NP | NP |
A2 | 384 | 0.439 | 0.78 (0.58–1.09) | 0.136 | NP | NP |
B | 0.349 | 0.274 | 1.41 (1.01–1.99) | 0.048 | NP | NP |
Note.—CI, confidence interval; NP, not performed; OR, odds ratio.
Adjusted for admixture.
Haplogroup | Frequency | OR (95% CI) | P | OR (95% CI)a | Pa | |
RRMS (n = 387) | Controls (n = 335) | |||||
A1 | 0.270 | 0.289 | 0.91 (0.71–1.13) | 0.426 | 0.91 (0.71–1.14) | 0.416 |
A2 | 0.390 | 0.424 | 0.85 (0.69–1.07) | 0.166 | 0.85 (0.68–1.07) | 0.161 |
B | 0.340 | 287 | 1.30 (1.04–1.64) | 0.024 | 1.31 (1.05–1.66) | 0.022 |
CD (n = 149) | Controls (n = 164) | |||||
A1 | 0.267 | 0.287 | 0.90 (0.64–1.30) | 0.556 | NP | NP |
A2 | 384 | 0.439 | 0.78 (0.58–1.09) | 0.136 | NP | NP |
B | 0.349 | 0.274 | 1.41 (1.01–1.99) | 0.048 | NP | NP |
Haplogroup | Frequency | OR (95% CI) | P | OR (95% CI)a | Pa | |
RRMS (n = 387) | Controls (n = 335) | |||||
A1 | 0.270 | 0.289 | 0.91 (0.71–1.13) | 0.426 | 0.91 (0.71–1.14) | 0.416 |
A2 | 0.390 | 0.424 | 0.85 (0.69–1.07) | 0.166 | 0.85 (0.68–1.07) | 0.161 |
B | 0.340 | 287 | 1.30 (1.04–1.64) | 0.024 | 1.31 (1.05–1.66) | 0.022 |
CD (n = 149) | Controls (n = 164) | |||||
A1 | 0.267 | 0.287 | 0.90 (0.64–1.30) | 0.556 | NP | NP |
A2 | 384 | 0.439 | 0.78 (0.58–1.09) | 0.136 | NP | NP |
B | 0.349 | 0.274 | 1.41 (1.01–1.99) | 0.048 | NP | NP |
Note.—CI, confidence interval; NP, not performed; OR, odds ratio.
Adjusted for admixture.
Although we did not expect substantial levels of population structure in our cohorts as all subjects were Italians with Caucasian ancestry, we verified that the results we obtained are not secondary to cryptic substructure. To this aim, we genotyped RRMS subjects and controls for 15 null SNPs (see Materials and Methods). This procedure was not performed for CD patients as additional genetic material was unavailable. We used null SNPs to apply the GC method (Devlin et al. 2001): Calculation of the inflation parameter λ resulted in a value of 0.92, suggesting that there is no substantial stratification in our samples. Yet, the GC method may not be conservative when few markers are used (Balding 2006). Thus, we used the program STRUCTURE 2.1 (Pritchard et al. 2000; Falush et al. 2003) and included the inferred ancestry proportions of individual cases and controls as continuous covariates in the logistic regression analysis. A significant result was again obtained for haplotypes belonging to haplogroup B.
We next verified whether RAC2 haplotypes also affected disease expression in RRMS and CD by testing for correlations between haplotypes and age at disease onset. This latter information was available for 325 and 147 RRMS and CD patients, respectively. As shown in table 4, no association was observed between age at onset in CD and RAC2 haplotypes. Conversely, in the case of MS, haplotypes in group B were associated with a significantly earlier presentation of symptoms, whereas the opposite was observed for haplotypes in cluster A2 (table 4). The same results were obtained after accounting for the presence of the HLA-DRB*15 allele, a known risk factor for RRMS in Caucasians (supplementary table S5, Supplementary Material online).
Disease | Haplogroup | BETA | P |
RRMS (n = 325) | A1 | 0.382 | 0.602 |
A2 | 1.551 | 0.029 | |
B | −1.519 | 0.025 | |
CD (n = 147) | A1 | −0.137 | 0.940 |
A2 | −1.558 | 0.349 | |
B | 1.578 | 0.326 |
Disease | Haplogroup | BETA | P |
RRMS (n = 325) | A1 | 0.382 | 0.602 |
A2 | 1.551 | 0.029 | |
B | −1.519 | 0.025 | |
CD (n = 147) | A1 | −0.137 | 0.940 |
A2 | −1.558 | 0.349 | |
B | 1.578 | 0.326 |
Note.—BETA is the regression coefficient.
Disease | Haplogroup | BETA | P |
RRMS (n = 325) | A1 | 0.382 | 0.602 |
A2 | 1.551 | 0.029 | |
B | −1.519 | 0.025 | |
CD (n = 147) | A1 | −0.137 | 0.940 |
A2 | −1.558 | 0.349 | |
B | 1.578 | 0.326 |
Disease | Haplogroup | BETA | P |
RRMS (n = 325) | A1 | 0.382 | 0.602 |
A2 | 1.551 | 0.029 | |
B | −1.519 | 0.025 | |
CD (n = 147) | A1 | −0.137 | 0.940 |
A2 | −1.558 | 0.349 | |
B | 1.578 | 0.326 |
Note.—BETA is the regression coefficient.
Discussion
RAC2 is a protein expressed in the hematopoietic cell lineage and is involved in signal transduction in various cellular processes, such as chemotaxis, cytoskeletal rearrangement, cellular differentiation, and proliferation (Heasman and Ridley 2008). Production of proinflammatory cytokines and cysteine cysteine (CC) chemokines, secretion of reactive oxygen species by neutrophils, and cyclooxygenase-2 (COX-2) production by macrophages are also modulated by RAC2 (Heasman and Ridley 2008). Thus, RAC2 is a key player in the inflammation process, acting at different levels, and being involved in multiple mechanisms of immunity.
The state of RAC2 expression influences the ability of this gene to trigger inflammatory responses. Some pathogens, in fact, downregulate RAC2 transcription to escape cellular innate immune response (i.e., respiratory burst activation), facilitating their intracellular survival (Carlyon et al. 2002). Similarly, both bacterial and viral pathogens were shown to have the ability to target the protein product of RAC2 and alter its activity (Janardhan et al. 2004; Chung et al. 2010; Groves et al. 2010). Among these pathogens, the SIV- and HIV-encoded Nef proteins interact with RAC2 to alter T-cell function (Janardhan et al. 2004). Notably, recent data indicate a significant association between chromosome microsatellite markers in 22q12-13 (where RAC2 is located) and resistance to HIV infection in HIV exposed but uninfected individuals (Kanari et al. 2005). An accurate analysis of this region allowed the identification of RAC2 gene SNPs that were significantly associated with the resistance phenotype displayed by exposed uninfected individuals and with the development of particularly potent HIV-specific cell-mediated immunity (Kanary Y, Hakata Y, et al. 2012).
In general, these observations indicate that RAC2 may be involved in host–pathogen genetic conflicts; consequently, the gene may represent a target of natural selection and harbor variants that affect the susceptibility to infectious diseases or other immunological phenotypes in humans. Nonetheless, inspection of available databases revealed no polymorphic amino acid variation segregating at detectable frequency in human populations, suggesting that functional polymorphisms in RAC2, if present, may occur within regulatory noncoding regions. We therefore exploited interspecific genetic diversity data to identify gene regions that may harbor functional noncoding sequences and analyzed their evolutionary pattern in three human populations. Our data indicate that the genomic portion covering the 3′ gene region displays extremely high nucleotide diversity, an excess of intermediate frequency alleles, and a higher level of within-species diversity compared with interspecific divergence, as assessed by the MLHKA test.
All these features strongly suggest the action of balancing selection. In particular, analysis of haplotype genealogy indicated the presence of three clades, which is consistent with a model of multiallelic balancing selection. This observation is in line with values of SFS-based statistics (Tajima’s D and Fu and Li’s F* and D* statistics), which are not strikingly positive, as the skew towards intermediate frequency variants tends to be less marked in a multiallelic selection model than in the case of biallelic selection. Calculation of the TMRCA for the RAC23′ haplotype genealogy yielded and estimate of 2.54 My, a coalescence time deeper than those estimated for most neutrally evolving autosomal human loci, which range from 0.8 to 1.5 My (Garrigan and Hammer 2006). Overall, these data strongly suggest that long-term balancing selection has maintained distinct functional alleles in RAC2. Inspection of polymorphisms located along the major branches of the genealogy indicated that several variants in DNaseI hypersensitive sites divide clade A and B haplotypes. In particular, variants rs933222, rs933221, rs4821610, and rs4821609 occur within a 600 bp portion located downstream the transcription end site; this region displays histone marks associated with enhancers (H3K4Me1 and H3K27Ac) in lymphoblastoid cell lines, suggesting that one or a combination of these SNPs alter RAC2 transcriptional activity and represent functional variants. Experimental analyses will be required to evaluate this possibility. Similarly, rs9798725 (immediately downstream the transcription end site) occurs within a sequence conserved in mammals and separates the two major clades of the phylogeny.
It is worth mentioning that evolutionary scenarios different from balancing selection might account for the results we obtained. One alternative possibility is that the high nucleotide diversity we observed at the RAC23′ gene region is the result of a relaxation of selective constraint (which allows accumulation of new mutations). Yet, the region was selected because of the presence of DNaseI hypersensitive sites and sequence elements conserved across mammals and primates. Thus, relaxation of functional constraints would have occurred recently, with the only involvement of human populations. Moreover, nucleotide diversity in the region is extremely high, ranking above the 98th percentile in the distribution of reference windows. These latter are randomly drawn from resequenced genes and most of them cover intronic regions, where functional constraints are expected to be relatively relaxed; thus, we believe that the high diversity at RAC23′ is more likely to result from a selective process. Indeed, an alternative explanation to balancing selection is diversifying selection, which refers to a situation whereby genotypes are favored merely because they are different and therefore results in the maintenance of multiple alleles. This is an interesting possibility that might fit the evolutionary history of a gene involved in immune response that, as mentioned above, is targeted by different pathogen species. Yet, the molecular signatures of balancing and diversifying selection are difficult to disentangle, and population genetics analyses only provide a snapshot of a dynamic evolutionary process, making it difficult to precisely determine the underlying selective regime. For example, although FST was not exceptionally high, clear differences can be appreciated in the distribution of RAC2 haplotypes across ethnic groups, suggesting that locally exerted selective pressures might have been acting during a more recent time frame.
From a biological perspective, one possibility is that the three major RAC2 haplotypes are differentially active in distinct cell types or modulate gene expression in response to diverse stimuli. A similar hypothesis has been proposed to explain long-term balancing selection in the promoter region of major histocompatibility complex class II genes (Cowell et al. 1998; Beaty et al. 1999; Loisel et al. 2006). This view is consistent with the recent demonstration (Dimas et al. 2009) that the majority of variants affecting gene regulation in the human genome are cell-type specific, and that cell-specific regulatory elements tend to localize relatively distant from the transcription start site. An alternative possibility is that distinct alleles may have been selected to prevent the down-modulation or misregulation of RAC2 transcription exerted by pathogens (Carlyon et al. 2002). Under these scenarios, the selective pressure acting on RAC2 is expected to be pathogen driven. Yet, other authors (Ferrer-Admetlla et al. 2008) have recently suggested that the maintenance of genetic diversity at immune response loci may result from the need to balance protection against invading pathogens with maintenance of self-tolerance. Indeed, our data indicate that haplotypes in RAC2 associate with predisposition to MS and CD. Specifically, haplotypes belonging to clade B were significantly more common in patients compared with controls, and in RRMS patients, the presence of B haplotypes correlated with an earlier presentation of disease symptoms. Growing evidences in recent years have indicated that a portion of susceptibility alleles is shared among two or more autoimmune conditions (reviewed in Zenewicz et al. 2010). This observation has been interpreted in terms of common disease pathways being involved in the pathogenesis of autoimmune disorders and partially explains the cooccurrence of distinct autoimmune conditions in patients and families. Indeed, several studies have indicated that some degree of comorbidity is observed between MS and CD in affected individuals and their family members (Weng et al. 2007; Nielsen et al. 2008; Langer-Gould et al. 2010). Therefore, our association of the same haplotype with susceptibility to both MS and CD can be regarded as a strong confirmation of the causal role of RAC2 variants in the pathogenesis of autoimmune diseases.
As mentioned above, RAC2 is thought to have a central role in the regulation of RICD, a mechanism that contributes to the maintenance of CD4+ T-cell tolerance. Physiologically, RICD occurs more often with self rather that foreign antigens and is therefore considered as a “propriocidal” form of cell death that limits T-cell expansion in the presence of persistent antigen (reviewed in Snow et al. 2010). In this light, gene products involved in RICD can be regarded as excellent candidates to carry variants associated with autoimmune diseases. Indeed, polymorphisms in Fas and FasL have previously been associated with susceptibility to MS (Zayas et al. 2001; van Veen et al. 2002; Kantarci et al. 2004; Lucas et al. 2004), and a recent genome-wide association study for primary sclerosing cholangitis (Melum et al. 2011), an autoimmune conditions that often occurs in IBD patients, identified an SNP close to the BCL2L11 gene (also known as BIM), another key player in the elicitation of RICD (Snow et al. 2010). Therefore, our data are in agreement with the conundrum whereby distinct susceptibility variants for a given trait may occur in genes that participate in a shared molecular pathway.
Despite the fact that our data indicate RAC2 as a susceptibility gene for autoimmune diseases, it remains to be evaluated whether the predisposition to autoimmunity can be regarded as a selective pressure during the evolutionary history of humans. In fact, Plenge (2010) reported that the prevalence of several autoimmune disorders, including MS, is increasing in human communities (at least in developed countries) but their contribution to population fitness throughout human history, although unknown, can hardly be regarded as comparable to that of infectious diseases. Therefore, we tend to favor a model whereby balancing selection in the RAC2 regulatory 3′ gene region is pathogen driven, and the resulting maintenance of susceptibility alleles for CD and MS can be regarded as a by-product of long-term selection (reviewed in Sironi and Clerici 2010). Indeed, pathogen-driven balancing selection has previously been shown to maintain a subset of susceptibility alleles for IBD in human populations (Fumagalli, Pozzali, et al. 2009).
In summary, data herein show that integration of different approaches can provide valuable insight into the location of putative functional variants and on haplotype structure, which in turn can be applied to association studies. We describe for the first time an association between specific haplotypes of the RAC2 gene and autoimmunity. Therapeutic approaches aimed at inhibiting or, at least, decreasing RAC2 expression and/or activity could be beneficial in these patients. Statins were shown to interfere with RAC2 activity, limiting both T-cell activation and COX-2 macrophages production (Brinkkoetter et al. 2006; Habib et al. 2007). Notably, recent results indicated that atorvastatin, a statin, has beneficial antiinflammatory effects in CD patients (Grip et al. 2008).
Therefore, these results, although needing validation in bigger cohorts, suggest that RAC2 plays a potentially important role in the immunopathogenesis of autoimmune disease, possibly as a consequence of the ability of this gene to regulate T lymphocytes activity. Modulation of RAC2 activity might be a novel potential therapeutic target in patients with autoimmune diseases.
M.C. is supported by grants from The Eli and Edythe Broad Foundation (IBD-0294), Istituto Superiore di Sanita' “Programma Nazionale di Ricerca sull’ AIDS”, the EMPRO and AVIP EC WP6 Projects, the nGIN EC WP7 Project, the Japan Health Science Foundation, 2008 Ricerca Finalizzata (Italian Ministry of Health), 2008 Ricerca Corrente (Italian Ministry of Health), Progetto FIRB RETI: Rete Italiana Chimica Farmaceutica CHEM-PROFARMA-NET (RBPR05NWWC), and Fondazione CARIPLO.
References
Author notes
These authors contributed equally to this work.
Associate editor: Willie Swanson