Tail Wags the Dog? Functional Gene Classes Driving Genome-Wide GC Content in Plasmodium spp.

Abstract Plasmodium parasites are valuable models to understand how nucleotide composition affects mutation, diversification, and adaptation. No other observed eukaryotes have undergone such large changes in genomic Guanine–Cytosine (GC) content as seen in the genus Plasmodium (∼30% within 35–40 Myr). Although mutational biases are known to influence GC content in the human-infective Plasmodium vivax and Plasmodium falciparum; no study has addressed how different gene functional classes contribute to genus-wide compositional changes, or if Plasmodium GC content variation is driven by natural selection. Here, we tested the hypothesis that certain gene processes and functions drive variation in global GC content between Plasmodium species. We performed a large-scale comparative genomic analysis using the genomes and predicted genes of 17 Plasmodium species encompassing a wide genomic GC content range. Genic GC content was sorted and divided into ten equally sized quantiles that were then assessed for functional enrichment classes. In agreement that selection on gene classes may drive genomic GC content, trans-membrane proteins were enriched within extreme GC content quantiles (Q1 and Q10). Specifically, variant surface antigens, which primarily interact with vertebrate immune systems, showed skewed GC content distributions compared with other trans-membrane proteins. Although a definitive causation linking GC content, expression, and positive selection within variant surface antigens from Plasmodium vivax, Plasmodium berghei, and Plasmodium falciparum could not be established, we found that regardless of genomic nucleotide composition, genic GC content and expression were positively correlated during trophozoite stages. Overall, these data suggest that, alongside mutational biases, functional protein classes drive Plasmodium GC content change.


Introduction
The genus Plasmodium is formed of vector-borne obligate parasites capable of infecting a wide range of vertebrates. Notably, five species-Plasmodium vivax, Plasmodium knowlesi, Plasmodium malariae, Plasmodium ovale, and Plasmodium falciparum-cause disease in humans (malaria). Despite control and eradication efforts, malaria continues to have an enormous impact on global health. In 2015 alone, the World Health Organization reported an estimated 1 million new malaria cases and over 400,000 malaria-related deaths (WHO 2016). Furthermore, Plasmodium parasites negatively affect endemic biodiversity. Specifically, the introduction of Plasmodium relictum to the Hawaiian islands has caused significant declines in native bird population size (Atkinson et al. 2010;Samuel et al. 2015Samuel et al. , 2017. In conjunction with their significance as human and animal pathogens, Plasmodium parasites are also valuable models for studying fundamental aspects of genome sequence evolution. Perhaps one of the most noteworthy features of Plasmodium genomes is their rapid change in genomic Guanine-Cytosine (GC) content. Within the span of $35-40 Myr (Pacheco et al. 2011), GC content has increased by roughly 30% across the genus. For example, ancestral species such as Plasmodium gaboni have a genomic GC content of 17.85%, whereas more recently divergent species such as P. vivax have a genomic GC content of 44.87% (Carlton et al. 2008;Sundararaman et al. 2016). Interestingly, coding sequence has a higher GC content than noncoding sequence in all Plasmodium genomes thus far sequenced (Castillo et al. 2018).
It has been suggested that mutational biases are driving this genome-wide nucleotide composition change. In a study using mutation accumulation laboratory lines, researchers proposed that an overrepresentation of spontaneous A/T substitutions has pushed the P. falciparum genome sequence toward higher AT content (Hamilton et al. 2016). Species closely related to P. falciparum (Laverania subgenus) are equally AT rich, implying that the substitution bias predates the divergence of the Laverania group (Hamilton et al. 2016). It has been hypothesized that a later shift in this mutational bias occurred sometime during Plasmodium evolution. Explicitly, P. vivax is thought to be subject to increased GCbiased heteroduplex repair and decreased AT-biased mutation (Nikbakht et al. 2015); however, the exact evolutionary mechanisms driving this shift are unknown. Although mutational biases have a clear role in shifting Plasmodium GC content, several unanswered questions remain. The evolutionary origin, causes, and mechanisms behind these mutational biases remain poorly understood. Furthermore, we lack insight in how the extreme changes in nucleotide composition influence transcription, translation, and protein structure across orthologs, and how different gene functions might favor or be resilient to nucleotide changes.
In other biological systems, natural selection is thought to have partly driven changes in GC content within certain protein families. For instance, in bacterial genomes, the core genome (genes present in all evaluated strains) is significantly more GC rich than the accessory genome (genes absent in certain strains), largely due to purifying selection constraining GC content on core genes (Bohlin et al. 2017). In humans, functional classes such as "Transcription," "Signaling and transduction," "Carbohydrate transport and metabolism," "Amino acid transport and metabolism," and "Inorganic ion transport and metabolism" have higher average GC content at the third nucleotide position (GC 3 ) than the rest of the genome (D'Onofrio et al. 2007). Other studies have also suggested that lineage-specific differences in genomic GC content might be adaptive to environmental changes (Luo et al. 2015). For example, in plant monocots, it is believed that GC content may have shifted in response to desiccation stress in cold and dry climates ( Smarda et al. 2014). These studies clearly show that in certain lineages and gene functional classes, changes in GC content can be adaptive.
Compared with other eukaryotes, the shift in nucleotide composition seen in the genus Plasmodium is unparalleled. Previous evidence shows that Plasmodium parasites have developed adaptive strategies to successfully infect both members of their complex life cycle: the Anopheles vector and the vertebrate hosts (Molina-Cruz and Barillas-Mury 2014; Scully et al. 2017). Moreover, frequent vertebrate host-switches during Plasmodium evolution have also created an opportunity for adaptive change to occur (Stoltzfus and McCandlish 2017). Whether such adaptive change could influence the genome-wide compositional change seen in the genus Plasmodium remains to be fully explored.
Within the Plasmodium genus, certain protein families are strongly affected by natural selection. For example, Plasmodium antigen evolution is thought to occur under intense selective pressure imposed by host immunity (Loy et al. 2017). Among the proteins interacting with the vertebrate immune systems, variant surface antigens (VSAs) have a critical role in parasite virulence and immune evasion. VSAs are gene families characterized by multiple exons, accelerated sequence evolution, rapid gene turnover, and expression encompassing the trophozoite through schizont stages of the Plasmodium intraerythrocytic developmental cycle (Chen et al. 1998;Bozdech et al. 2008;Frech and Chen 2013;Yam et al. 2016;Martins et al. 2017). Even though they might share a common evolutionary origin (Janssen et al. 2002(Janssen et al. , 2004, VSA family members are largely species specific and share little sequence similarity between species. VSAs include species-specific families such as the Plasmodium Interspersed Repeats (pir) family: P. vivax (vir) and Plasmodium berghei (bir); the P. falciparum-specific Repetitive Interspersed Family (rif) coding for the rifin protein; the Subtelomeric Variant Open Reading frame (stevor) family; and var genes encoding the Erythrocyte Membrane Protein 1 (PfEMP1) family. Thus, the VSA proteins may be ideal candidates for driving genome sequence evolution in the genus Plasmodium.
Here, we evaluate the hypothesis that the rapid shift in genomic GC content seen in Plasmodium parasites is driven, in part, by adaptive changes in certain functional classes of genes. We characterized the contribution of genes belonging to different functional groups to the overall variation of coding GC content seen across Plasmodium lineages. Among all assessed groups, we found that trans-membrane proteins, particularly VSAs, had extreme GC content values compared with other functional classes within a species. Trans-membrane proteins represent $20-30% of protein coding genes in most Plasmodium species (e.g., P.
Plasmodium species were organized ( fig. 1a) according to their phylogenetic relations (Pacheco et al. 2011). CoGe's GenomeList tool was used to calculate the average genomic, coding, and GC 3 content across Plasmodium species ( fig. 1b). The CodeOn tool was applied to each genome sequence to characterize the distribution of coding GC content per amino acid. This information was used to estimate the speciesspecific contribution of individual coding sequences (CDS) to variations in GC content. The number of CDS in 5% GC content increments was calculated for each Plasmodium species and normalized by the species with the highest number of reported CDS: Plasmodium fragile ( fig. 1c). The kurtosis, skewedness, deviation from normality (Shapiro-Wilk's normality test), mean, and standard deviation were estimated for each species using RStudio v3.3 (supplementary table S2, Supplementary Material online). Additionally, the genic GC content for the entire gene repertoire was programmatically accessed from CoGe's database through CoGe's REST API (version 1.0) and using custom Python and shell scripts (https://tinyurl.com/y996ygrf; last accessed February 2, 2019).

GC Content Distribution for Different Functional Classes
Genes were sorted by their GC content and divided into ten equally sized quantiles ($500-600 genes) within each Plasmodium species ( fig. 2). The first quantile (Q1) was composed of genes with lowest GC contents and the last quantile (Q10) was composed of genes with highest GC contents. Intermediate quantiles (Q2-Q9) were composed of genes with incrementally higher GC content values. We identified functional annotation clusters that were overrepresented (enriched) within a given GC quantile using the Functional Classification Tool included in the Database for Annotation, Visualization, and Integrated Discovery (DAVID v6.8). DAVID (Huang et al. 2009) was used to identify and group genes with similar annotated functionality. Functional enrichment analyses were performed on Plasmodium genomes with annotations and ENTREZ IDs available on NCBI (supplementary tables S3-S5, Supplementary Material online). The ENTREZ IDs were found for all genes within each quantile (Q1-Q10) and used as an input for DAVID's functional clustering. A variable number of annotation clusters were generated based on the grouped functional categories identified on each quantile. Clusters were organized from those most overrepresented or with higher enrichment scores (ESs) (Annotation Cluster 1) to those least overrepresented or with lowest ESs.
Clusters defined by the "Trans-membrane/Integral component of membrane" terms tended to have higher ESs in quantiles Q1 and Q10 and thus were analyzed in further detail ( fig. 3). The identifying functions of the genes belonging to these annotation clusters were obtained from PlasmoDB using their ENTREZ IDs as queries. Genes belonging to the same protein families or performing similar functions were grouped in sets (supplementary table S6, Supplementary Material online), and their GC content distribution was tabulated. Within the genes identified inside the "Trans-membrane/Integral component of membrane" cluster, members of the VSA multigene families showed the most extreme GC content values. For each species, the differences in GC content distribution between VSA families and an equally sized randomized sample taken from the rest of the coding genome were evaluated with a Wilcoxon rank sum test in RStudio v3.

GC Content Distribution and Natural Selection on VSA Families
We assessed if there were signs of positive selective pressure that could hypothetically influence GC content within these families. Species-specific multiple sequence alignments were performed on the following VSA families: vir (P. vivax Salvador-1 strain [Carlton et al. 2008]); bir (P. berghei ANKA strain [Khasawneh and Kornreich 2005]); and stevor, rifin, and var (P. falciparum 3D7 strain [Gardner et al. 2002]).
Codon alignments for the complete CDS of VSA sequences were performed using the MUSCLE algorithm (Edgar 2004) found on MEGA v7 (Kumar et al. 2016). Pseudogenes and incomplete CDS (e.g., sequences missing an exon) were excluded. Maximum likelihood (ML) trees were generated for each VSA family alignment using PhyML (Guindon et al. 2010). Each ML tree used a general time reversible model and 500 bootstrap replicates to evaluate node confidence

GC Content Distribution and Gene Expression during the Trophozoite Life Stage
The relationship between gene expression and GC content was evaluated in the three Plasmodium species with publicly available RNA-Seq data: P. vivax strain Salvador-1 (Zhu et al. 2016), P. berghei strain ANKA (Otto et al. 2014), and P. falciparum strain 3D7 (Otto et al. 2010). A significant portion of the Plasmodium life cycle is spent consecutively infecting and replicating within vertebrate's erythrocytes (intraerythrocytic developmental cycle). During this phase, Plasmodium parasites are metabolically dependent on the infected erythrocyte and are exposed to host immunity. VSA expression peaks in the Plasmodium intraerythrocytic developmental cycle during the trophozoite stage (Chen et al. 1998;Bozdech et al. 2008;Martins et al. 2017). Thus, the Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values for the trophozoite stage were downloaded from PlasmoDB and log 10 transformed. A scatterplot of the genic GC content and log 10 FPKM values was created in RStudio v3.3, and the relation between these two variables was evaluated with a Pearson correlation test. This analysis was conducted both for VSA family members (
Given their contribution to extreme GC content values in many Plasmodium species, genes with an associated "Transmembrane/Integral component of membrane" term were further clustered based on described gene function ( fig. 3). For all evaluated Plasmodium species, the largest clusters were composed of trans-membrane proteins with unknown or  fig. 3a-c). Plasmodium vivax's VSA family members (vir) had lower GC content values compared with other "Trans-membrane/Integral component of membrane" associated genes (t ¼ À29.83, P value ¼ 2.2 e-16, fig. 3a). In contrast, P. falciparum VSA families (var, stevor, and rifin) had higher GC content values in comparison to the rest of "Trans-membrane/Integral component of membrane" associated genes (t ¼ 40.31, P value ¼ 2.2 e-16, fig. 3c). A similar trend was observed for VSA family members in taxa related to P. vivax and P. falciparum, respectively. Specifically, VSA family members found in P. cynomolgi (sister taxa to P. vivax) also had lower coding GC values compared with the rest of the purifying selection (Cornejo et al. 2014). Moreover, purifying selection is thought to be intensified by the complex Plasmodium parasitic life cycle (Chang et al. 2013). Contrary to the rest of the genome, positive selection has been detected in genes related to the development of antimalarial drug resistance, parasite development in mosquitoes, and variant-expressed multigene families (Park et al. 2012;Cornejo et al. 2014;Duffy et al. 2015). Among these, VSA families are thought to evolve under strong host imposed immune-response selective pressures (Loy et al. 2017); thus, putative signatures of positive selection within VSA family trees were evaluated. In addition, because variations in GC content are known to affect gene expression (Newman et al. 2016), the expression for these gene families was also examined using publicly available RNA-Seq data for the trophozoite stage of P. vivax, P. berghei, and P. falciparum (Otto et al. 2010(Otto et al. , 2014Zhu et al. 2016).
The correlation between VSA genes under positive selection and their expression was assessed (figs. 4 and 5). Expressed genes and phylogenetic branches showing significant signs of positive selection were interspersed across both phylogeny and GC content values ( fig. 4a-c). Significant signs of positive selection were found in the var family tree ( fig. 4c) in accordance with previous reports (Plotkin et al. 2004). However, this family also presents the largest number of identified recombination breakpoints compared with other VSAs. It has been suggested that both forces act simultaneously in var genes, with mitotic recombination facilitating gene diversity, and positive selection favoring gene variants better equipped to evade host's immune responses (Bopp et al. 2013;Claessens et al. 2014). Genic GC content was significantly higher for VSA P. falciparum genes showing signs of positive selection compared with nonselected genes (t ¼ 9.893, P value ¼ 6.76 e-16). It was not possible to compare this trend in P. berghei or P. vivax due to the small number of sequences with signatures of positive selection detected. VSA GC content and expression during the trophozoite stage was positively correlated in both P. falciparum (r ¼ 0.282, P value ¼ 1.06 e-5) and P. vivax (r ¼ 0.364, P value ¼ 3.36 e-8), but not in P. berghei (fig. 5). Similarly, a significant and positive correlation between GC content and expression during the trophozoite life cycle stage was observed when examining all genes in all three species (supplementary fig. S9a-c, Supplementary Material online).

Discussion
Here, a large-scale comparative genomic approach was used to examine factors influencing observed shifts in genomic GC content within the Plasmodium genus ( fig. 1 and supplementary table S7, Supplementary Material online). In agreement with previous reports (Castillo et al. 2018), coding GC content in all Plasmodium spp. was found to be elevated in comparison to genomic GC content. An interesting pattern where GC 3 content was higher than genomic and coding GC content in the simian clade, but lower in the rodent clade, Laverania subgenus, and the ancestral lineage, P. relictum, was also observed. In eukaryotes, protein expression (Yadav and Swati 2012), function (Louie et al. 2003), and structure (Basile et al. 2017) have been known to limit coding GC content, partly due to selective constraints. These selective constraints may result from mutational biases, such as the elevated rate of G/C to A/T transitions compared with the rate of A/T transitions to G/C observed in P. falciparum mutation accumulation lines (Hamilton et al. 2016) or the elevated GC 3 content in P. vivax (Nikbakht et al. 2015). The shifts in GC 3 content observed here suggest that putative differences in mutational bias are not limited to either P. vivax or P. falciparum, but instead show a strong phylogenetic pattern and likely originate from ancestral events during the evolution of the genus. Furthermore, the bimodal GC 3 distributions observed in some Plasmodium species (supplementary fig. S10, Supplementary Material online) hint to groups of genes with distinct mutational biases or selective pressures within a single species.
Genes with associated GO-terms, "Trans-membrane/ Integral component of membrane" was the dominant GOterm recovered for most GC content quantiles ( fig. 2). Moreover, this GO-term, which contains genes essential for parasite survival (Sullivan and Krishna 2005;Curtidor et al. 2008;Kirk and Lehane 2014), was highly enriched in boundary GC content quantiles (Q1 and Q10), especially in species with extreme nucleotide compositions. During erythrocyte invasion and intraerythrocytic development (Curtidor et al. 2008), some Plasmodium integral membrane proteins are exposed to the immune system, thus providing an opportunity for strong positive selection. In addition, proteins with trans-membrane motifs evolve significantly faster than other Plasmodium genes (Carlton et al. 2008). A further examination of this GO-term class revealed that, with some exceptions, GC content was particularly skewed to the extremes in the VSA family of trans-membrane proteins, though a significant number of trans-membrane proteins are annotated as "unknown/hypothetical proteins" (fig. 3). Similar to our observation in species of the subgenus Laverania, in Haemoproteus tartakovskyi (a bird-infective obligate parasite closely related to the genus Plasmodium), VSA family members are enriched for higher GC content relative to other coding genes (Bensch et al. 2016). In contrast, VSA family members of the simian clade harbored a lower GC content relative to the rest of the genome sequence ( fig. 3). Thus, regardless of the species, it is clear that VSA genes display GC contents that seem to be at odds with the rest of the genome sequence. This suggests that host-related selective pressures drive rapid sequence evolution, which may drive genomic GC content change in those genes. Nonetheless, it remains unclear if the selection acting on VSA is driving species-level shifts in genomic GC content, or if GC content on these genes is simply lagging behind the rest of the genome sequence.
Attempts to address this question were confounded by the lack of a clear relationship between VSA GC content and signatures of selection at the genic level ( fig. 4a-c). Significant signs of positive selection were interspersed in both internal and terminal branches of each phylogeny, regardless of genic GC content at the tips. This suggests that the enrichment of VSA genes in extreme GC quantiles is not a result of selection along specific phylogenetic branches of gene paralogs. The evidence of selection revealed is likely linked to specific gene functions instead and may hint at rapid diversifying selection operating on many genes simultaneously. An alternative explanation for why VSA GC content appears to be evolving in the opposite direction of other genes may be found in nonadaptive forces such as recombination, GC-biased gene conversion, and local nucleotide biases. Both mitotic recombination (Bopp et al. 2013;Claessens et al. 2014) and variation in local nucleotide composition (Carlton et al. 2008;Tachibana et al. 2012) have been considered as potential factors mediating VSA evolution and might result in the trends observed here.
Interestingly, with the exception of P. knowlesi (Pain et al. 2008), P. coatneyi (Kissinger et al. 2016), and species from the subgenus Laverania, VSA families are located in subtelomeric chromosome regions (Gardner et al. 2002;Carlton et al. 2008;Tachibana et al. 2012;Rutledge et al. 2017). Nucleotide composition in these regions is strongly AT biased in P. vivax and P. cynomolgi (Carlton et al. 2008;Tachibana et al. 2012), but not so in other Plasmodium species (supplementary fig. S11, Supplementary Material online). Significant changes in VSA nucleotide composition compared with the rest of the coding genome were observed in Plasmodium species with VSAs clustered in the subtelomeric chromosome regions as well as in species with VSAs distributed across the chromosome length (e.g., P. falciparum and P. knowlesi). In species with significantly GC-poor subtelomeres (P. vivax and P. cynomolgi), VSAs were the most numerous protein group within 30 kb of the chromosome ends. In addition, the GC content distribution within this region emulated that previously observed for the Trans-membrane/Integral component of membrane groups, specifically VSAs (figs. 2a and 3a). On the other hand, in GC-rich Plasmodium species with clustered subtelomeric VSAs (e.g., P. malariae) and species with scattered VSAs (e.g., P. knowlesi and P. coatneyi), the GC content distribution of CDS within the same 30 kb region emulated that observed on the entire coding genome (supplementary fig. S12, Supplementary Material online). Nonetheless VSAs in these species had significantly smaller GC content differences compared with the rest of the genome (supplementary fig.  S7, Supplementary Material online). Overall, though there are some species-specific variations, the GC content of VSA family members remains at odds with the GC content of the rest of the genome sequence, irrespective of their genomic location.
Whether the lower GC content observed in P. vivax and P. cynomolgi VSA families originates from unique AT-biased mutation patterns within these regions, or if VSA-specific processes drive subtelomeric GC content to lower values is something that remain inconclusive. Regions of lower amino acid diversity compared with other coding regions have previously been described in P. falciparum (Zilversmit et al. 2010). Moreover, they have been commonly described in intragenic regions of VSA families across Plasmodium species (Janssen et al. 2004;Joannin et al. 2008). Recent studies have found that regions of low complexity tend to remain unchanged over evolutionary time and that composition preferences in the region are unrelated to compositional biases observed in the rest of the genome (Chaudhry et al. 2018). Furthermore, the authors suggest that natural selection might play a key role in determining the amino acid composition in regions of low complexity (Chaudhry et al. 2018). Therefore, it is possible that the nucleotide composition differences in VSAs compared with the rest of the coding genome might be the product of selective forces acting on low complexity region across Plasmodium species. Previous work showed that highly expressed genes favor usage of AT-ending codons in both P. falciparum and P. vivax (Yadav and Swati 2012). Thus, the positive correlation between VSA expression and their GC content observed in P. vivax and P. falciparum during the trophozoite stage of the Plasmodium life cycle is noteworthy ( fig. 5) because it contradicts the general trend observed across all genes. In P. vivax and Plasmodium chabaudi (Khasawneh and Kornreich 2005), some VSAs mediate cytoadherence (Bernabeu et al. 2012), rosette formation, and cell invasion (Yam et al. 2016) and are trafficked to different cellular compartments. A preference for higher GC content in certain VSA family members may thus reflect differential subcellular erythrocyte localization and their unique functions (Bernabeu et al. 2012). In addition, intrafamily differences in GC content might be connected to a proposed strategy whereby P. falciparum alters expression patterns of VSA genes to avoid immunological detection and promote survival (Abdi et al. 2016).
Previous studies have led to contradictory conclusions regarding the genome-wide relation between GC content and gene expression. Although some studies show a negative correlation between coding GC content and gene expression in eukaryotes (Rao et al. 2013), others have described a positive correlation between both factors (Kudla et al. 2006). Despite the drastic differences in GC content observed within the Plasmodium genus, all tested species displayed a preference for high GC content during the trophozoite stage, which aligns with the latter model (supplementary fig. S9, Supplementary Material online). Previous studies have proposed that Plasmodium transcription is highly optimized and that most genes function at multiple points in the life cycle (Bushell et al. 2017). Therefore, it is possible that Plasmodium genic GC content changes could be related to mRNA transcription and processing efficiency. Alternatively, although other studies suggest that translational selection has a limited role in shaping global codon and amino acid usage bias within the genus Plasmodium; the correlation between amino acid frequency of highly expressed genes and their respective tRNA is slightly higher in some species (Yadav and Swati, 2012). Therefore, it would be of interest to further evaluate this pattern specifically in the VSA gene family. Given the importance of efficiently expressing VSA genes, their translational efficiency may be driving genome-wide changes in GC content. Indeed, the evidence of biased GC content in the subtelomeric genomic regions harboring VSA genes (e.g., P. vivax) versus the rest of the genome may be a footprint of this effect. Although the presented data do not refute that hypothesis, additional populationlevel genomic and functional surveys as well as long-term evolution studies are needed in Plasmodium. Ideally, an artificial selection scheme similar to the long-term adaptive evolution studies in Escherichia coli (Blount et al. 2012) would be performed to definitively determine if the tail wags the dog for genomic GC composition evolution in Plasmodium species.
In sum, large-scale comparative genomic analyses are powerful means to understand how genome sequences evolve, particularly regarding genome-wide GC content shifts. Nucleotide compositional changes are crucial in genome evolution, phylogenetic reconstruction, and in evaluating expression and translation dynamics. Plasmodium genomes are incredibly valuable models to explore the causes and effects of such nucleotide composition shifts. Moreover, the availability of many sequenced genomes within the genus provides a unique resource to better understand the forces driving rapid shifts in genomic GC content. When attempting to understand what is driving differences in genomic GC content within the Plasmodium genus, these data suggest that, alongside mutational biases, functional protein classes drive GC content changes across the genome.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.