Effective Population Size Predicts Local Rates but Not Local Mitigation of Read-through Errors

Abstract

In correctly predicting that selection efficiency is positively correlated with the effective population size (Ne), the nearly neutral theory provides a coherent understanding of between-species variation in numerous genomic parameters, including heritable error (germline mutation) rates. Does the same theory also explain variation in phenotypic error rates and in the abundance of error mitigation mechanisms? Translational read-through provides a model to investigate both issues, as it is common, mostly nonadaptive, and has a good proxy both for rate (TAA being the least leaky stop codon) and for potential error mitigation via “fail-safe” 3′ additional stop codons (ASCs). Prior theory of translational read-through has suggested that, when population sizes are large, weak selection for local mitigation can be effective, thus predicting a positive correlation between ASC enrichment and Ne. Contrary to this prediction, we find that ASC enrichment is not correlated with Ne. ASC enrichment, although highly phylogenetically patchy, is, however, more common both in unicellular species and in genes expressed in unicellular modes in multicellular species. By contrast, Ne does correlate positively with TAA enrichment. These results imply that local phenotypic error rates, but not local mitigation rates, are consistent with a drift barrier/nearly neutral model.

Pruned phylogenetic tree describing the eukaryotic species used in the PGLS analysis. Fifteen species were used in our phylogenetically controlled tests for correlation, pruned from the full set of 24 species to remove species separated by short divergence times. The tree was derived using TimeTree, which requires a species list to be uploaded.
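The caption does not state the divergence-time cutoff or the rule used to decide which member of a closely related pair to drop. Purely as an illustration, a greedy pruning pass over pairwise divergence times (for instance, read off a TimeTree-derived tree) could look like the sketch below. The function name, the `min_mya` cutoff and the tie-breaking rule are hypothetical and are not the procedure used for this figure.

```python
from itertools import combinations


def prune_close_species(species, divergence_mya, min_mya):
    """Greedily drop species until every remaining pair diverged >= min_mya ago.

    species: list of species names (order is used to break ties).
    divergence_mya: dict mapping frozenset({a, b}) -> divergence time in
        millions of years (hypothetical input, e.g. taken from a TimeTree tree).
    min_mya: hypothetical cutoff below which two species count as too close.
    """
    kept = list(species)
    while True:
        # All remaining pairs that diverged more recently than the cutoff
        close = [(a, b) for a, b in combinations(kept, 2)
                 if divergence_mya[frozenset((a, b))] < min_mya]
        if not close:
            return kept
        # Count how many too-close pairs each species takes part in
        counts = {}
        for a, b in close:
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
        # Drop the species involved in the most close pairs;
        # max() returns the first such species in list order on ties
        worst = max(kept, key=lambda s: counts.get(s, 0))
        kept.remove(worst)
```

Removing the species involved in the most too-close pairs first tends to retain more taxa than dropping species at random, but other rules (for example, keeping the better-annotated genome of each pair) would be equally defensible.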

Supplementary fig. S5. Correlation analysis between Ne and four measures of ASC enrichment in 24 eukaryotes.
To investigate a possible relationship between Ne and ASC enrichment, we calculate ASC enrichment scores for each genome using two methods. First, we take the average of the ASC enrichment values at each position from +1 to +6 downstream of the primary stop (see Methods of the main paper). Second, given that genes already possessing an ASC are unlikely to be under selection for a third stop, we take just the maximum enrichment at any position from +1 to +6. There is, however, an argument that position +1 should be ignored when considering ASC enrichment, because extended termination motifs immediately proximal to the primary stop may be under selection. For each method, we therefore calculate one score that includes position +1 and one that excludes it. We find that all four measures of genome ASC enrichment are positively correlated with log(Ne) before Bonferroni correction (enrichment score including +1: p = 0.0080; enrichment score excluding +1: p = 0.0090; max score including +1: p = 0.025; max score excluding +1: p = 0.010).
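The four genome-level summaries described above can be sketched as follows, assuming the per-position ASC enrichment values for positions +1 to +6 (taken here to be in-frame codon positions) have already been computed as described in the main paper's Methods; that per-position calculation is not reproduced. A plain Bonferroni adjustment (each p multiplied by the number of tests, capped at 1) is included for reference. The function names and dictionary keys are illustrative only.

```python
def asc_enrichment_scores(enrichment_by_position):
    """Summarise per-position ASC enrichment into four genome-level scores.

    enrichment_by_position: dict {position: enrichment} for positions 1..6
    downstream of the primary stop (assumed in-frame codon positions).
    """
    with_plus1 = [enrichment_by_position[p] for p in range(1, 7)]
    without_plus1 = [enrichment_by_position[p] for p in range(2, 7)]
    return {
        # Method 1: average enrichment across positions
        "mean_incl_+1": sum(with_plus1) / len(with_plus1),
        "mean_excl_+1": sum(without_plus1) / len(without_plus1),
        # Method 2: maximum enrichment at any single position
        "max_incl_+1": max(with_plus1),
        "max_excl_+1": max(without_plus1),
    }


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1, for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Each of the four scores would then be correlated with log(Ne) across genomes, with `bonferroni` applied to the resulting set of four p-values.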
Supplementary fig. S6. Additional stop codon (ASC) frequency comparison between bacterial genomes with and without an annotated ArfA gene. ArfA is associated with ribosome rescue on bacterial mRNAs that lack a stop codon; we therefore predict that genomes without an annotated ArfA gene experience stronger selection for fail-safe ASCs. To test this prediction, we calculate ASC frequencies for all ArfA-absent genomes available for download from EMBL (n = 212), for ArfA-absent genomes that are relatively phylogenetically independent (one genome per genus, n = 6) and for similarly independent ArfA-present genomes (n = 639). Considering all ArfA-absent genomes, ASC frequencies are significantly lower than in the ArfA-present group (p = 2.9 × 10⁻¹⁵, Wilcoxon signed-rank test). This is corroborated when using only the independent ArfA-absent genomes (p = 0.0060, Wilcoxon signed-rank test).
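A minimal sketch of the per-genome quantities involved is given below. It assumes, for illustration only, that a gene "has an ASC" when an in-frame TAA, TAG or TGA occurs at codon +1 to +6 downstream of the primary stop, and that a genome's ASC frequency is the fraction of its genes that do. It also shows a simple way to keep one genome per genus. The window size, function names and input layout are assumptions, and the statistical comparison between groups is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_asc(downstream, max_codon=6):
    """True if an in-frame stop occurs at codon +1..+max_codon.

    downstream: nucleotide sequence starting immediately 3' of the
    primary stop codon (assumed input layout).
    """
    seq = downstream.upper()
    for i in range(max_codon):
        codon = seq[3 * i:3 * i + 3]
        if len(codon) < 3:  # ran off the end of the available sequence
            break
        if codon in STOP_CODONS:
            return True
    return False


def asc_frequency(downstream_seqs, max_codon=6):
    """Fraction of genes in a genome with at least one in-frame ASC."""
    if not downstream_seqs:
        raise ValueError("no downstream sequences supplied")
    hits = sum(has_asc(s, max_codon) for s in downstream_seqs)
    return hits / len(downstream_seqs)


def one_per_genus(genomes):
    """Keep the first genome listed for each genus.

    genomes: list of (genus, genome_id) tuples.
    """
    chosen = {}
    for genus, genome_id in genomes:
        chosen.setdefault(genus, genome_id)  # ignore later genomes of a genus
    return list(chosen.values())
```

For example, `asc_frequency(["GCTTAAGGC", "GGGCCCAAATTT", "TGA"])` gives 2/3: the first and third sequences contain an in-frame stop within the window, whereas the second does not.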
Supplementary table T2. Consensus sequences for TGA-terminating highly expressed genes in 19 eukaryotes. A nucleotide (A, T, G or C) is called if there is significant enrichment (p < 0.05) for that base compared with null expectations (generated from genes of all expression levels in the genome) according to χ² tests. A minus sign indicates significant under-enrichment compared with the null. An 'N' is called if no base at this position shows a significant deviation in either direction.

Supplementary table T3. Consensus sequences for TAG-terminating highly expressed genes in 19 eukaryotes. A nucleotide (A, T, G or C) is called if there is significant enrichment (p < 0.05) for that base compared with null expectations (generated from genes of all expression levels in the genome) according to χ² tests. A minus sign indicates significant under-enrichment compared with the null. An 'N' is called if no base at this position shows a significant deviation in either direction.
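The calling rule in Tables T2 and T3 can be sketched for a single position as below. The sketch assumes each base is tested separately with a one-degree-of-freedom χ² goodness-of-fit test (that base versus all other bases in the highly expressed genes, against the genome-wide base frequency). It uses the identity that the χ² (1 df) survival function equals erfc(√(x/2)). The exact form of the tests and the "A/-G/-C" output format for positions with several significant bases are assumptions; base counts and null frequencies are taken as already tallied.

```python
import math

BASES = "ATGC"


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def base_test(observed, n, null_freq):
    """Goodness-of-fit test of one base (base vs. not-base) against the null.

    Returns (direction, p): direction is +1 for enrichment, -1 otherwise.
    """
    if not 0.0 < null_freq < 1.0:
        raise ValueError("null frequency must lie strictly between 0 and 1")
    expected = n * null_freq
    expected_other = n * (1.0 - null_freq)
    other = n - observed
    stat = ((observed - expected) ** 2 / expected
            + (other - expected_other) ** 2 / expected_other)
    direction = 1 if observed > expected else -1
    return direction, chi2_1df_pvalue(stat)


def consensus_at_position(high_counts, null_freqs, alpha=0.05):
    """Call one consensus position from base counts in highly expressed genes.

    high_counts: {base: count} in highly expressed genes at this position.
    null_freqs: {base: frequency} across genes of all expression levels.
    Returns e.g. "A", "-C", "A/-G/-C" (hypothetical format), or "N".
    """
    n = sum(high_counts[b] for b in BASES)
    calls = []
    for b in BASES:
        direction, p = base_test(high_counts[b], n, null_freqs[b])
        if p < alpha:
            calls.append(b if direction > 0 else "-" + b)
    return "/".join(calls) if calls else "N"
```

With counts of 50 A, 20 T, 15 G and 15 C against a uniform null, A is significantly enriched while G and C are significantly under-enriched (p ≈ 0.021 each), so the position would be called "A/-G/-C".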