Pervasive Selection against MicroRNA Target Sites in Human Populations

Abstract MicroRNA target sites are often conserved during evolution and purifying selection to maintain such sites is expected. On the other hand, comparative analyses identified a paucity of microRNA target sites in coexpressed transcripts, and novel target sites can potentially be deleterious. We proposed that selection against novel target sites pervasive. The analysis of derived allele frequencies revealed that, when the derived allele is a target site, the proportion of nontarget sites is higher than expected, particularly for highly expressed microRNAs. Thus, new alleles generating novel microRNA target sites can be deleterious and selected against. When we analyzed ancestral target sites, the derived (nontarget) allele frequency does not show statistical support for microRNA target allele conservation. We investigated the joint effects of microRNA conservation and expression and found that selection against microRNA target sites depends mostly on the expression level of the microRNA. We identified microRNA target sites with relatively high levels of population differentiation. However, when we analyze separately target sites in which the target allele is ancestral to the population, the proportion of single-nucleotide polymorphisms with high Fst significantly increases. These findings support that population differentiation is more likely in target sites that are lost than in the gain of new target sites. Our results indicate that selection against novel microRNA target sites is prevalent and, although individual sites may have a weak selective pressure, the overall effect across untranslated regions is not negligible and should be accounted when studying the evolution of genomic sequences.


Introduction
MicroRNAs are small endogenous RNAs that can regulate virtually any type of biological process. Following their discovery in humans this century (Pasquinelli et al. 2000), there are now over 2,500 human microRNA precursors annotated in miRBase (Kozomara and Griffiths-Jones 2014), although <900 are classified with high confidence. Soon after microRNAs were found in multiple animal species (Lagos-Quintana et al. 2001;Lau et al. 2001;Lee and Ambros 2001), the first target prediction tools became available (Enright et al. 2003;Lewis et al. 2003;Stark et al. 2003). Only in the last few years have these developments permitted the evolutionary analysis of target sites (Farh et al. 2005;Grün et al. 2005;Lewis et al. 2005;Stark et al. 2005;Sood et al. 2006) revealing that many microRNA target sites are highly conserved among species. In contrast, although some microRNA families have been conserved for millions of years, their targets appear to differ between species (Hui et al. 2013). Indeed, evidence from vertebrates suggests that gains and losses of target sites may be more important than changes in the microRNAs themselves during the evolution of microRNA-based gene regulation, as microRNA genes are usually highly conserved (see Discussion in Marco 2018a).
Several studies have found that gene transcripts are depleted of target sites for coexpressed microRNAs (Farh et al. 2005;Stark et al. 2005;Sood et al. 2006). In particular long 3 0 -UTRs might accumulate microRNA target sites by random mutation, yet they actually have a lower frequency than expected by chance, suggesting that there has been selection against these sequences (Farh et al. 2005). These missing sites have been called "anti-targets" (Bartel and Chen 2004;Farh et al. 2005). Interestingly, target sites for the same microRNAs tend to be conserved in transcripts expressed in neighboring tissues (Stark et al. 2005). These studies have shown that selection against microRNA target sites can be inferred from comparisons among distantly related species. However, the relative impact of selection against microRNA target sites in human populations is not known.
Analysis of human populations has suggested purifying selection was particularly strong at microRNA target sites, even in nonconserved sites (Chen and Rajewsky 2006;Saunders et al. 2007). Negative selection against gaining microRNA target sites has also been described in Yoruban populations, but the pattern was not detected in other populations (Chen and Rajewsky 2006). Correspondingly, we have previously found evidence of selection against microRNA target sites in a study in Drosophila populations (Marco 2015). Specifically, we found selection against target sites of maternal microRNAs in maternally deposited transcripts within the egg and early ambryo. More recently it has been shown that this effect is particularly strong for the mir-309 cluster, whose microRNAs are abundant in the egg and almost absent in the zygote (Zhou et al. 2018). Characterizing this type of selection in humans would reveal to which extent it shapes our genomes. However, the strength and prevalence of selection against target sites is human populations is still unknown.
Here we investigate single-nucleotide polymorphisms (SNPs) at human microRNA target sites and evaluated the impact of selection against target sites. To do so, we consider pairs of segregating alleles in which one of the allele as a target site and the other was not. Then we compare the allele frequency distributions with that of estimated background distributions to quantify the strength of selection for or against microRNA target sites.

Bias toward MicroRNA Nontarget Alleles in Human Populations
In order to investigate the selective pressures on microRNA target sites in human populations, we first mapped human SNPs to putative canonical microRNA target sites such that one allele is a target site and the alternative allele is not a target site (see Materials and Methods for details). The nontarget allele in this pair is called a "near-target" (Marco 2018b). We first compared the derived allele frequency (DAF) distribution of target sites for highly expressed microRNAs with a background (expected) distribution obtained by conducting the same analysis on the reverse complement sequences of 3 0 -UTRs (see Materials and Methods). We considered those SNPs for which the derived allele is the target allele. If the overall selective pressure is to fix new target sites (positive selection favoring new interactions), DAF values should be the higher than expected. On the other hand, if there is selection against microRNA target sites, the DAF distribution should be skewed toward smaller values ( fig. 1). We observe an overall excess of low frequency derived alleles ( fig. 2A; It is expected that a selective pressure on coexpressed microRNA/mRNA pairs will be tissue-specific. Therefore, we compared the DAF distributions as above for ten specific tissues, considering only interactions in which the microRNA and the potential target are coexpressed. For instance, for lung coexpressed microRNA/transcripts we found a DAF distribution significantly skewed toward the nontarget allele, indicating selective pressures against the target allele (one-tailed Kolmogorov-Smirnov test, P ¼ 0.00012; fig. 2B). For all tissues analyzed we observe a shift in the DAFs toward the nontarget allele (supplementary fig. 1 Genes can have alternative polyadenylation in different tissues, affecting microRNA target sites. However, in the context of microRNAs, this has been explored mostly in cell lines (Nam et al. 2014). We found alternative polyadenilation information for two of the tissues here investigated: blood and kidney (Müller et al. 2014). When we consider target sites whose 3 0 -UTR has been experimentally detected in those tissues, the shift to the nontarget allele is significant Alternatively, we can compare the allele frequency distributions between target sites for highly expressed microRNAs versus target sites for nonexpressed (nondetected) microRNAs in a given tissue. To do so we identified microRNAs with no read counts detected across multiple expression experiments (see Materials and Methods) and used their target sites as our background (expected) frequencies. Consistently with the results above, we found that for most analyzed tissues the DAF distribution when the ancestral allele is a nontarget is skewed to the nontarget allele (supplementary fig. 2 and supplementary table 2, Supplementary Material online). In summary, the distribution of allele frequencies shows evidence of selection against microRNA target sites in human populations. Again, the results are consistent when taking into account alternative polyadenylation in blood and kidney (supplementary table 2  Wobble pairing has been also observed in microRNA/transcript interactions. Under our hypothesis, selection against target sites will be weaker at sites in which the nontarget allele is a potential target with a wobble pairing (GU). Hence, we partitioned the data set of targets in those whose near-target allele is a wobble paired position and those which are not. We observed that for near-target-wobble the shift to the nontarget allele is not significant (0.682), whereas for the nonwobble there was a significant shift (0.

The Effect of MicroRNA Expression Levels and Evolutionary Conservation
We next considered the potential impact of microRNA conservation. On the one hand, evolutionarily conserved microRNAs may have a weaker effect on selection against microRNA targets, as partly deleterious target alleles may have been cleared from the population. On the other hand, evolutionarily conserved microRNAs tend to be highly expressed, and therefore it is expected that the selective pressure to avoid targets for such microRNAs should be stronger. In other words, expression and conservation are not independent to each other. We therefore included both factors in the analysis: the level of expression of the microRNA and a measure of the phylogenetic conservation of the microRNA sequence.
We analyzed cases where the derived allele is a target site (as in the previous section); testing whether frequency spectra were different across different levels of microRNA conservation (human-primate specific, conserved in mammals, and conserved in animals) and different levels of expression (low, mid, and high as described in Materials and Methods). To evaluate the joint effect of conservation and expression, we build a linear model of ranked independent variables with interactions (this is equivalent to the Scheirer-Ray-Hare test [Sokal and Rohlf 1995, pp. 445; see Materials and Methods]). The interaction term was not statistically relevant in any of the models (supplementary table 3, Supplementary Material online). In general, from the fitted model it is evident that expression has a significant impact in the microRNA selective avoidance, whereas conservation has no detectable impact, once the expression level has been taken into account ( More specifically, in figure 3 the contribution of expression and conservation to the ranks in the linear model are plotted for all ten tissues. The smaller the rank, the greater the shift to the ancestral nontarget allele in a comparison of DAFs. Overall, we observe that as the expression level increases the rank decreases, whereas the rank is similar for all three conservation levels ( fig. 3). The effect is particularly clear in lung, liver and kidney. For brain and breast, we did not find such an association ( fig. 3   Selection against MicroRNA Target Sites . doi:10.1093/molbev/msaa155 MBE concluded that expression level, rather than microRNA evolutionary conservation, determines the selective pressure against microRNA target sites.

Population Differentiation at Target Sites
If a microRNA target site is under selective constraints, we should expect differentiation among populations at these sites to be relatively low. To investigate this prediction, we grouped SNPs at target sites for highly expressed microRNAs according to their F st and compared the relative frequency of these SNPs compared with the background (see Materials and Methods). However, in figure 4A we observed an enrichment in high F st values. To further explore the relative contribution of microRNA target gains and losses during population differentiation we split the data set in two groups, depending on whether the target allele was ancestral or derived. Strikingly, polymorphic microRNA target sites where the ancestral allele is the target show high levels of population differentiation ( fig. 4A, red line). In contrast, for novel microRNA target sites there is a deficit of high F st SNPs compared with the background expectations ( fig. 4A, blue line). When we repeated the analysis for moderately expressed microRNA we found a similar pattern ( fig. 4B). This may reflect that positive selection driving the generation of novel microRNA target sites is negligible. Also, evolution by microRNA target site loss seems important in human populations. In table 1, we show SNPs in microRNA target sites with a F st >0.6. As suggested in figure 4A and B, in a majority of target sites with a high degree of population differentiation the target allele was ancestral (21 out of 35), many being probably target losses in out-of-Africa populations (12 out of 21). Predicted target sites for microRNAs with no detectable expression level did not show any significant level of population differentiation ( fig. 4C).
One of the SNPs in table 1 has been recently associated with skin color variation in Indian populations: rs2470102 (Sarkar and Nandineni 2018). Interestingly, this is in a predicted a target site for miR-1180-3p in MYEF2. This gene has been associated with skin color as well, although the functional relationship is not clear (Mishra et al. 2017). We investigated a potential relationship between absolute latitude (distance to the Ecuador) and the frequency of the target allele and we found a strong association ( fig. 4D; P ¼ 0.0015, R-squared adjusted ¼ 0.4223). Given that the microRNA itself originated in primates (Arcila et al. 2014), and that the target allele is ancestral and conserved, the emergence of a new microRNA may have imposed a selective pressure to loss the target site in populations with less exposure to UV light. However interesting, these associations remain speculative at the moment, and demographic differences between populations may also have an impact in this observed association.
These results further suggest that loss rather than conservation of targets may be more frequent in population dynamics. Thus, we revisited the analysis of DAFs in figure 2 and considered this time target sites where the ancestral allele was a target site. If there is a detectable selective pressure to maintain target sites we will expect the DAF to be biased toward the target (ancestral) allele. In figure 5, we plot the bias (as the Hatlen and Marco . doi:10.1093/molbev/msaa155 MBE D-statistic in a Kolmogorov-Smirnov test) and the significance (as the log 10 of the P-value) and we observe that in this set the bias was not significant (blue dots). On the contrary, if we compare these results with those obtained for nontargets as ancestral sequences we observe a clear and significant biased toward the nontarget allele ( fig. 5).
In one case we found a microRNA with an SNP within its seed sequence (the region that determines the targeting property of the microRNA), which shows some evidence of population differentiation (rs7210937; F st ¼ 0.3314). In this case, the F st value between African and European populations is remarkably high (F st ¼ 0.6129; Nei's estimate [Nei 1986;Bhatia et al. 2013]). In European populations, 92.5% of sequenced individuals present the ancestral form of miR-1269b, whereas in African population, the derived version is more frequent (59.8%). As a shift in the seed sequence may have an impact on the evolution of 3 0 -UTRs, we further studied target sites whose ancestral form is a target for the derived miR-1269 microRNA (344 in total). Then we compared the frequency of the target site allele between European and African populations. We found that in African populations the frequency of target alleles is lower than in European populations at these sites (Wilcoxon nonparametric test for paired samples; P ¼ 0.021), whereas for the ancestral miR-1269b we did not find any significant difference. These results suggest that a shift in the allele frequencies affecting the seed sequence of a microRNA can have an effect on the allele frequencies at the novel target sites, specifically toward the nontarget allele.

Discussion
The study of allele frequencies has been extensively used to detect selective pressures in human populations (Chen and Rajewsky 2006;Barreiro et al. 2008;Enard et al. 2014). Here, we show that the patterns of allele frequencies at 3 0 -UTRs show evidence of selection against most microRNA target sites. First, the allele frequencies at target sites are biased toward the nontarget allele when the derived allele is a target sequence. These effects are strongest in the cases where the corresponding microRNAs are highly expressed, suggesting that interaction between the microRNA and the target is a key is the source of selection against target sequences. The microRNAs that have been conserved over longer periods of vertebrate evolution did not impose detectably greater selection against their target sequences, once the effect of expression levels had been taken into account. On the other hand we failed to detect any noticeable effect when the target sites . For each range of F st values, the proportion of sites in that range was calculated for the microRNA target sites (P t ) and for the background sites (P b ). The value plotted is the log(P t /P b ). (A) All targets sites for highly expressed microRNAs (black, dashed line); target sites where the target allele is ancestral (red) and targets sites where the target site is not ancestral (blue). (B and C) As in (A) but for moderately expressed and zero expressed microRNAs, respectively. (D) Regression between absolute latitude in samples human populations and the target allele frequency associated to SNP rs2470102. Selection against MicroRNA Target Sites . doi:10.1093/molbev/msaa155 MBE was ancestral, and therefore we did not find statistical signal supporting selection to maintain existing microRNA target sites. Than doesn't mean that there is no purifying selection to keep functional sites, but that this is comparably small compare to selection to avoid new, potentially deleterious, microRNA target sites. As a matter of fact, novel microRNA target sites has been identified in various studies as disease causing mutations (Abelson et al. 2005;Dusl et al. 2015).
The most popular microRNA target prediction programs rely on target site conservation to reduce the number of false positives (Agarwal et al. 2015) and/or do not provide a standalone version to run on custom data sets (Paraskevopoulou et al. 2013). Therefore, we used a naive microRNA target prediction method that reports canonical targets and neartarget sites (Marco 2018b). That allowed us to study pairs of alleles segregating at target sites without any other constraint. On the other hand, we would expect a high number of false positive in target predictions (reviewed in Alexiou et al. 2009). Remarkably, we found a significant pattern of selection against microRNA target sites. This reinforces our initial hypothesis and suggests that, if we would be able to restrict the analysis to bona fide target sites, the signal might be stronger. One possibility is to evaluate experimentally validated target sites. However, these experiments are based on reference genomes, so segregating target sites whose target allele is not in the reference genome will be lost from the analysis. The way forward may be to perform highthroughtput microRNA target experiments, like HITS-CLIP (Chi et al. 2009), in cells derived from different populations. The continuous drop in the costs of sequencing and highthroughput experiments may allow this in the near future. Indeed, high-throughput experimental evaluation of segregating alleles at regulatory motifs (transcription factor binding sites, RNA binding sites, etcetera) is a promising area of research which will help us to move from a typological (reference genome) to a population view of gene regulation.
Another way to study the effect of selection in populations is to evaluate the population differentiation (Coop et al. 2009;Li et al. 2012). We found that, at microRNA target sites in general, there is an enrichment in high population  Li et al. (2012). However, we found that this trend only holds for those sites at which the ancestral allele was a microRNA target site and the derived allele is a nontarget. The loss of a microRNA target (as in the examples reported in a previous work [Li et al. 2012]) may be relatively frequent. It follows that the nontarget allele might be neutral or even advantageous in some of these cases. It is noticeable that in the examples in which the derived allele reaches a high frequency, that occurs in the non-African populations, which is the pattern that would be expected if a neutral derived allele spread by genetic drift during founder events. Loss of target sites could also have advantageous effects, though the complex interactions that occur in regulatory networks. For example, it has been proposed that in the human lineage the loss of microRNA target sites contributed to an increase in the expression levels of some genes (Gardner and Vinther 2008). We also recently reported the loss of multiple target sites in Out-of-African populations . Our work suggests that this loss of targets may be continuing now in human populations. Selection in favor of new target sites appears to be rarer: we found a strong signal of purifying selection against novel microRNA target sites. It is worth mentioning that our result as well as other works are based on precomputed F st values that may be affected by systematic biases in allele imputation. As more genome sequences of high quality are released, and new F st estimations become available, it will be important to reevaluate the impact of population differentiation in microRNA target sites. In general, our results also depend on the ability to predict microRNA target sites from primary sequences. As the number of polymorphic microRNA target sites experimentally validated is very limited, at the moment our approach is the only possible. However, it'll be important to re-evaluate our results in due course as more data sets become available. Also, our statistical analysis relies on two types of null models: microRNAs with not detected expression, and the analysis of the reverse complement sequence of 3 0 -UTRs. It is possible that microRNAs expressed in only a few cells are wrongly attributed to the "null" class. Likewise, the reverse complement of 3 0 -UTR could also be potentially transcribed (antisense transcripts) and, hence, wrongly selected as "null" class. In any case, the use of complementary null models leading to similar results supports our hypothesis, and indicates that the use of better annotated data sets in the future may even increase the statistical evidence in favor of selection against microRNA target sites.
Our results suggest that new microRNA target site generating mutations (the derive allele is the target) are selected against. This is a case of the classic selection-mutation balance in which the mutations are deleterious. Selection against deleterious mutations has been extensively studied in population genetics (reviewed in Charlesworth 2012). For instance, strong purifying selection produces a phenomenon called background selection, in which loci linked to the selected site experience a reduction in their effective population size (Charlesworth et al. 1993). That is, purifying selection reduces the influence of selection at linked sites. For weakly selected sites, a similar process has been described: weak selection Hill-Robertson interference (wsHR [Hill and Robertson 1966;McVean and Charlesworth 2000]). Under wsHR, multiple alleles are under a weak selective force, very close together so that recombination is small or negligible between sites, interfering with other selective pressures in the area. Multiple weakly deleterious mutation at transcription factor binding sites has been reported indeed (Abecasis et al. 2012). We believe that this is the case for the selection against microRNA target sites here described: weak selection against multiple target/near-target sites will shape the evolutionary landscape of the entire untranslated region.
Our study suggests that the mutation rate in humans may be high enough to produce a significant selective pressure against novel microRNA target sites. New target sites will emerge at a significant rate because many mutations can potentially introduce a new site for one of the many microRNAs. More specifically, there are $2,000 microRNA families described in TargetScan (see Materials and Methods), defined by 7-nt seed sequences. Assuming that 3 0 -UTRs are composed of nonoverlapping 7-mers (a simplifying yet conservative assumption) the expected number of near-target sites per kilobase (kb) is $75. With a mutation rate of 2.5 Â 10 À5 per kb (Nachman and Crowell 2000) and a total length of the genome that encode 3 0 -UTRs of 34 Mb, it can be shown that there will be on average one novel microRNA target site on a 3 0 -UTR per genome per generation. That is, one potential deleterious microRNA target site per person per generation.
FIG. 5. Allele preference for ancestral targets and nontargets. The shift of the distribution of derived allele frequencies is measured as the D statistic form the Kolmogorov-Smirnov test (x-axis). When the distribution is biased toward the derived allele the D statistic is negative (-D -), and if the bias is to the ancestral allele the D value is D þ . The yaxis plots the -log 10 of the P-value obtained from the corresponding one-tailed Kolmogorov-Smirnov test. For each tissue, the graph represents the DAF bias when the ancestral allele is a target (blue circles) and when it is a nontarget (red circles).
Selection against MicroRNA Target Sites . doi:10.1093/molbev/msaa155 MBE It is expected that other regulatory motifs influence the evolution of 3 0 -UTRs. For instance, Savisaar and Hurst (2017) have described selection against RNA-binding motifs. The selective avoidance of transcription factor binding sites (Hahn et al. 2003;Babbitt 2010) and of mRNA/ncRNA regulatory interactions in bacteria (Umu et al. 2016) have been also described. These works are based on comparative genomics between different species and not on variation within populations. However, some theoretical models take into account the selective pressure against regulatory motifs (Berg et al. 2004;Stewart et al. 2012). It is likely that on top of all the selective forces that are usually taken into account, there is a layer of selection against weakly deleterious regulatory motifs that will be influencing the evolution of the genome. In conclusion, selection against microRNA target sites in prevalent in human populations, and it may constrain other selective forces in posttranscriptional regulatory regions.

Materials and Methods
MicroRNA mature sequences were downloaded from miRBase v.22, considering only sequences annotated with "high confidence" (Kozomara et al. 2018). MicroRNA target and near-target sites were predicted with seedVicious (v.1.1 [Marco 2018b]) against 3 0 -UTR as annotated in Ensembl version 96 (Kinsella et al. 2011) for the human genome assembly hg38. SNPs for the 1000 Genomes project (1000Genomes Project Consortium et al. 2015 were retrieved from dbSNP (build 151) (Sherry et al. 2001) and mapped to our target predictions. Ensembl sequences and polymorphism data were downloaded using the BiomaRt R package (Durinck et al. 2009). By mapping SNPs to targets and to near-targets (as described in Marco 2018b) we are able to identify pairs of alleles in which only one of the allele is a target site, so allele frequencies can be computed as target allele frequencies . When plotting allele frequencies ( fig. 2) we only considered segregating alleles in which the minor allele is present in at least 1% of the sampled population, as reported dbSNP (see Hatlen et al. 2019 for details). The ancestral allele status was also retrieved from the precomputed values available from dbSNP. To compute the background (randomly expected) allele distributions we repeated the process but finding targets in the reverse complement strand of the 3 0 -UTR, to control for sequence length and composition. In the analysis of allele frequencies we used a second microRNA target prediction program based on a different principle, miRanda (Enright et al. 2003), which takes into account the binding energy of the RNA: RNA duplex, and we use the default parameters. About 25% of polymorphic canonical target sites were also predicted as targets by miRanda.
Expression information for microRNAs was obtained our own database PopTargs (https://poptargs.essex.ac.uk; Hatlen et al. 2019). In summary, high-throughput experiments from Meunier et al. (2013) and miRMine (Panwar et al. 2017) were downloaded, reads were mapped to miRBase mature microRNA sequences and the average reads per million for each microRNAs was consider to measure the expression level in a tissue. Coding genes were considered to be expressed in a given tissue as precomputed in the Bgee database ("gold" set, version 14; Bastian et al. 2008). We studied the following tissues: lung, blood, placenta, liver, heart, brain, kidney, cerebellum, breast, and testis. Importantly, most microRNA and mRNA extractions were done in the same commercial samples by the same lab (Brawand et al. 2011;Meunier et al. 2013), which suggest that coexpressed microRNA/mRNA pairs were present in the same tissue. MicroRNAs with >50 RPM (reads per million) were considered moderately expressed, and those with over 500 RPM were labeled as highly expressed. For transcripts, we did not make distinctions between different expression levels as transcript levels are affected by the action of microRNAs (Nam et al. 2014). Coexpressed microRNA: gene pairs were those whose microRNA was moderately or highly expressed in a tissue (depending on the analysis described) and the transcript targeted is detected in the same tissue. For "blood" and "liver" tissues we perform additional analysis considering the tissue specific polyadenilation signals as reported in APADB version 2 (Müller et al. 2014). MicroRNAs were grouped into evolutionary conservation categories depending on the species spread (primates, mammals, and animals) of the seed family in TargetScan 7.2 (Agarwal et al. 2015).
F st values were retrieved from the 1000 Genomes Selection Browser 1.0 (Pybus et al. 2014). In figure 4, the fold enrichment was computed as the logarithm of the ratio between the proportion of SNPs at target sites and the proportion of SNPs at a background site for each F st bin. The latitude of specific populations was computed as in Helmy et al. (2019). First, the geographical locations of the studied populations were obtained from the sample descriptions at https://www. coriell.org/1/NHGRI/Collections/1000-Genomes-Collections/ 1000-Genomes-Project, using the location of the capital of the country in cases where the location was ambiguous or nonspecific. For multiple collection points we computed the centroid middle point. Populations that migrated in recent times were discarded (CHD, CEU, ASW, ACB, GIH, STU, and ITU as described at IGSR: The International Genome Sample Resource [http://www.internationalgenome.org/category/ population/]).
All statistical analyses were done with R (v. 3.4.3, R Development Core Team 2009). All processed data sets and online tools to compute the tests here reported are available at our dedicated web server PopTargs (https://poptargs.essex. ac.uk; Hatlen et al. 2019). All the code use for the generation of the results, figures and tables is freely available from FigShare at https://doi.org/10.6084/m9.figshare.9539645.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.
Performance Computing Facility (Ceres) and its associated support services at the University of Essex in the completion of this work. This research was supported by the Wellcome Trust (Grant No. 200585/Z/16/Z).