Genome-Wide Biases in the Rate and Molecular Spectrum of Spontaneous Mutations in Vibrio cholerae and Vibrio fischeri

The vast diversity in nucleotide composition and architecture among bacterial genomes may be partly explained by inherent biases in the rates and spectra of spontaneous mutations. Bacterial genomes with multiple chromosomes are relatively unusual but some are relevant to human health, none more so than the causative agent of cholera, Vibrio cholerae. Here, we present the genome-wide mutation spectra in wild-type and mismatch repair (MMR) defective backgrounds of two Vibrio species, the low-%GC squid symbiont V. fischeri and the pathogen V. cholerae, collected under conditions that greatly minimize the efficiency of natural selection. In apparent contrast to their high diversity in nature, both wild-type V. fischeri and V. cholerae have among the lowest rates for base-substitution mutations (bpsms) and insertion–deletion mutations (indels) that have been measured, below 10−3/genome/generation. Vibrio fischeri and V. cholerae have distinct mutation spectra, but both are AT-biased and produce a surprising number of multi-nucleotide indels. Furthermore, the loss of a functional MMR system caused the mutation spectra of these species to converge, implying that the MMR system itself contributes to species-specific mutation patterns. Bpsm and indel rates varied among genome regions, but do not explain the more rapid evolutionary rates of genes on chromosome 2, which likely result from weaker purifying selection. More generally, the very low mutation rates of Vibrio species correlate inversely with their immense population sizes and suggest that selection may not only have maximized replication fidelity but also optimized other polygenic traits relative to the constraints of genetic drift.


Introduction
In 2000, the complete genome sequence of Vibrio cholerae, the first bacterial genome with multiple chromosomes to be completed, was published in Nature to acclaim (Heidelberg et al. 2000). Deemed a "treasure trove" for microbiology and genomics researchers (Waldor and RayChaudhuri 2000), the V. cholerae genome revealed several intriguing asymmetries between the chromosomes. The larger chromosome 1 (chr1) is $3 Mb in size and contains most essential genes and the pathogenicity elements required to cause the disease cholera, whereas the smaller chromosome 2 (chr2) is $1 Mb and contains few essential genes but many undefined genes acquired by horizontal transfer (Heidelberg et al. 2000;Cooper et al. 2010). The plasmid-like origin of replication on chr2 implies that it originated as a megaplasmid that became entrapped by the translocation of essential genes, and the greater variation in gene content among strains on chr2 has led many to speculate that the small chromosome somehow confers an evolutionary advantage in varied environments, perhaps by being enriched for conditionally useful traits (Schoolnik and Yildiz 2000). Indeed, this two-chromosome structure is found throughout the Vibrionaceae family, suggesting an uncertain evolutionary advantage in their aquatic habitats.
Subsequent studies of the evolution of Vibrio genomes have shown that not only is chr2 more variable in its content, its conserved genes tend to evolve more rapidly (Cooper et al. 2010). These elevated evolutionary rates likely stem from a gene dosage bias between chr1 and chr2 that emerges from their replication timing. Specifically, replication of chr2 occurs later than chr1, leading to synchronous termination of replication of both chromosomes, even though chr1 is larger (Egan and Waldor 2003;Duigou et al. 2006;Rasmussen et al. 2007). The earlier replication of two-thirds of chr1 generates a gene dosage bias that can be greatly compounded during rapid growth in which multiple replication forks are active (Stokke et al. 2011). Consequently, genes replicated early tend to be expressed more, experience greater purifying selection, and hence evolve more slowly (Cooper et al. 2010;Morrow and Cooper 2012). Effects of translocations between chromosomes also support this model: genes moving between chromosomes tend to exhibit altered expression rates that match their new neighbors (Morrow and Cooper 2012), and orthologs found on secondary chromosomes are predisposed to more rapid evolution even when found in related genomes with a single chromosome (Cooper et al. 2010).
While these comparative-genomic studies suggest that replication timing and its effects on gene dosage may influence the strength of purifying selection (and hence the fate of genetic variation), evidence also exists that replication timing positively correlates, albeit weakly, with mutation rates (the origin of variation) in all domains of life (Stamatoyannopoulos et al. 2009;Chen et al. 2010;Agier and Fischer 2012;Martincorena et al. 2012). Mutation rates may be nonuniform because of greater rates of damage, asymmetric nucleotide pools, structural differences affecting polymerase fidelity, biases in gene expression or replication, or variation in the effectiveness of DNA repair (Sharp et al. 1989;Zhang and Mathews 1995;Mira and Ochman 2002;Cooper et al. 2010;Lee et al. 2012). If late-replicated genes experience both weaker purifying selection and greater mutation rates relative to early-replicated genes, selection could therefore act on genome organization to influence both the origin and fate of genetic variation.
The goal of this study was to explore the properties of spontaneous mutation in two species of Vibrio, with emphasis on how mutation rates and spectra vary within and among chromosomes based on existing models of replication timing. Both V. cholerae and V. fischeri are globally significant and are often compared with one another because of their marine habitats and similar genome structures, but contrasting roles in pathogenesis and symbiosis (Goldberg and Murphy 1983;Thompson et al. 2004;Ruby et al. 2005). We also evaluated how the mismatch repair (MMR) system influenced the rates and spectra of spontaneous mutations in both organisms by removing the mutS gene from each genome. Our approach used the well-documented method of mutation accumulation (MA) paired with whole-genome sequencing (WGS) (Behringer and Hall 2016). Using the MA-WGS method, a single clonal ancestor is used to initiate many replicate lineages that are passaged through hundreds of single cell bottlenecks before each being subjected to WGS. The bottlenecking regime minimizes the efficiency of natural selection on mutations, and when the genome sequences are compared, a nearly unbiased picture of the natural mutation rates and spectra of the ancestor is revealed. A growing body of MA-WGS studies in microbes have revealed some general properties of spontaneous mutation (Lynch et al. 2008;Denver et al. 2009;Ossowski et al. 2010;Lee et al. 2012;Sung, Ackerman, et al. 2012;Sung, Tucker, et al. 2012;Schrider et al. 2013;Heilbron et al. 2014;Long et al. 2014Long et al. , 2015Zhu et al. 2014;Dillon et al. 2015;Sung et al. 2015;Dettman et al. 2016), but unique properties of the spontaneous mutation rates and spectra in many taxa have revealed the value of conducting detailed MA-WGS studies in a more comprehensive collection of species, particularly those with multiple chromosomes.
Here, we performed four detailed MA-WGS experiments on both wild-type and MMR-deficient strains of V. fischeri and V. cholerae. We identified a total of 439 wild-type mutations and 5,990 DmutS mutations distributed across both chromosomes of V. fischeri and V. cholerae, producing the first genome-wide perspective of the bpsm and indel biases in wild-type and MMR-deficient strains of these two significant bacterial species. In the MMR-deficient strains, mutation rates were nonuniform among genome regions and varied in patterns that appear to be associated with replication timing. Both wild-type genomes also exhibited distinct basesubstitution mutation spectra that correlated with their different %GC content and also experienced a surprising number of multi-nucleotide indel mutations. Remarkably, the loss of MMR caused both mutation spectra to converge towards similar patterns dominated by transition mutations and small indels, which suggests that variation in the MMR system may be partly responsible for producing differences in genome composition.

Results
The two ancestral genomes for this study share the twochromosome architecture common to Vibrio but differ in other important aspects. Vibrio fischeri strain ES114 was isolated from the light organ of a sepiolid squid, Euprymna scolopes, and has since served a model light organ symbiont (Boettcher and Ruby 1990;Ruby et al. 2005). The genome of V. fischeri ES114 harbors two chromosomes of 2.90 and 1.33 Mb that differ slightly in their %GC content (38.96% for chr1 and 37.02% for chr2). In addition, V. fischeri ES114 has a 45.85-kb plasmid with a %GC content of 38.44% (Ruby et al. 2005). Vibrio cholerae strain 2740-80 is a nonpathogenic, environmental V. cholerae isolate related to US epidemic strains (Kaper et al. 1982). The genome of V. cholerae 2740-80, which was previously deposited as a draft assembly, but was closed and polished for this study using a combination of PacBio sequencing, Illumina sequencing, and consensus assemblies (see "Materials and Methods" section). This V. cholerae genome includes a 2.99-Mb primary chromosome (chr1) and a 1.10-Mb secondary chromosome (chr2), with %GC contents of 47.85% and 46.83%, respectively. Four MA experiments were conducted with these strains using daily singlecell bottlenecks produced by isolation-streaking on agar plates, which limits the efficiency of natural selection to purge deleterious and enrich beneficial mutations. For the two wildtype (wt) experiments, V. fischeri ES114 (Vf-wt) and V. cholerae 2740-80 (Vc-wt) colonies were used to found 75 MA lineages, each of which was propagated for 217 days. For the two mutator (mut) experiments, V. fischeri ES114 (Vfmut) and V. cholerae 2740-80 (Vc-mut) strains lacking a mutS gene were used to found 48 MA lineages, each of which was propagated for 43 days. Because Vibrio species lack a functional very short patch (VSP) repair pathway (Bhakat et al. 1999;Banerjee and Chowdhury 2006), these mutSstrains are expected to abolish the activity of MMR without Dillon et al. . doi:10.1093/molbev/msw224 MBE also affecting other major repair pathways. The properties of each of these experiments, the observed mutation spectra, and the lack of genetic parallelism suggest that few of these mutations were subject to the biases of natural selection (see "Supplementary Material online" text). Summary parameters of each MA experiment, including the numbers of mutations identified in all final isolates, generations of growth, and overall mutation rates are summarized in table 1. Owing to the fitness cost of increasing mutational load, generations of growth per day declined over the course of each MA experiment, particularly in the mutator lineages (supplementary fig. S1, Supplementary Material online).

Wild-Type Base-Substitution Mutation Rates and Spectra
The wild-type base-substitution mutation rates for these Vibrio species are among the lowest per-generation rates that have been observed in any organism (supplementary fig. S2, Supplementary Material online). The bpsm rate for V. fischeri is 2.07 (0.207) • 10 À10 /bp/generation (SEM), which is approximately twice the rate of bpsm in V. cholerae, 1.07 (0.094) • 10 À 10 /bp/generation (SEM). These per-site bpsm rates correspond to genome-wide bpsm rates of 0.0009/genome/ generation for V. fischeri and 0.0004/genome/generation for V. cholerae (table 1). Bpsm rates differ somewhat between the chromosomes of V. fischeri, with chr2 experiencing significantly greater rates than chr1 (v 2 ¼ 4.799, d.f. ¼ 1, P ¼ 0.028), but no difference was found between the chromosomes of V. cholerae 1A). The increased bpsm rate on chr2 of V. fischeri cannot be explained by the relative nucleotide contents of the chromosomes (%GC: chr1, 39.0%; chr2, 37.0%). In fact, when we account for AT-biased mutation in V. fischeri, the difference between the expected and observed bpsms between the two chromosomes increased (v 2 ¼ 5.747, d.f. ¼ 1, P ¼ 0.017). Correcting for AT-biased mutation in V. cholerae (%GC: chr1, 47.9%; chr2, 46.8%) revealed no significant difference between the bpsm rates of chr1 and chr2 Base-substitution mutation spectra in both V. fischeri and V. cholerae were significantly AT-biased (Vf: 1C). Interestingly, the AT-bias is stronger in V. fischeri, which has the lower genome-wide %GC-content. However, consistent with previous MA studies of bacteria (Lind and Andersson 2008;Lee et al. 2012;Sung, Ackerman, et al. 2012;Dillon et al. 2015;Sung et al. 2015;Dettman et al. 2016), AT-mutation bias alone fails to explain realized genome-wide %GC-contents. For V. fischeri, the expected %GC-content under mutation-drift equilibrium is 0.20260.034 (SEM), 0.182 lower than the genome-wide %GC-content (0.384). This AT-bias is generated by high relative bpsm rates of both G:C > A:T transitions and G:C > T:A transversions, but it is the relative G:C > T:A transversion rate that is especially high in comparison to MA studies in other species. With a rate of 9.19 • 10 À9 /bp/generation, G:C > T:A transversions occur at a higher rate than any other bpsm, generating a transition/transversion ratio (T S /T V ) of 0.851. Similarly, the expected %GC content of V. cholerae under mutation-drift equilibrium is 0.28160.044 (SEM), 0.195 lower than the genome-wide %GC content (0.476). However, in V. cholerae, the AT-bias is generated mostly by G:C > A:T transitions rather than G:C > T:A transversions, resulting in a T S /T V of 1.59.
Despite the rarity of A:T > C:G and A:T > T:A transversions in both wild-type experiments, there was still a significant bias for A:T transversions to occur at the canonical DNA adenine methyltransferase (Dam) methylation site, GATC. In V. fischeri, only 1.00% of A:T bp occurred in GATC sequences, but 7/30 (23.33%) of the A:T transversions were observed at these sites (v 2 ¼ 119.910, d.f. ¼ 1, P < 0.0001). Similarly, in V. cholerae, 1.76% of A:T bp occurred in GATC sequences, whereas 6/23 (26.09%) of the A:T transversions occurred at these sites (v 2 ¼ 59.961, d.f. ¼ 1, P < 0.0001). This is consistent with prior observations in Escherichia coli , but is not universal across bacteria (Long et al. 2015;Sung et al. 2015). We also observed a moderately significant enrichment of bpsms on lagging strand genes in V. fischeri, where 86/176 (48.86%) coding bpsms occurred in regions containing lagging strand genes, despite only 40.41% of coding bases being on the lagging strand (v 2 ¼ 5.223, d.f. ¼ 1, P ¼ 0.022). However, this was not the case in V. cholerae, where 39/107 (36.45%) coding bpsms were on the lagging strand, consistent with the 40.89% of lagging strand coding bases genome-wide To test whether the bpsm spectra on chr1 were significantly different from those on chr2, we used the conditional bpsm rates (corrected for %GC-content) of chr1 to generate

Wild-Type Insertion-Deletion Mutation Rates
Indels occurred at approximately one-fifth the rate of bpsm and occurred predominantly in simple sequence repeats (SSRs). Cumulative indel mutation rates for V. fischeri and V. cholerae were 5.68 (0.691) • 10 À 11 /bp/generation and 1.71 (0.337) • 10 À 11 /bp/generation (SEM), respectively (table  1). These rates correspond to genome-wide indel rates of 0.0002/genome/generation for V. fischeri and 0.00007/genome/generation for V. cholerae (table 1). The indel spectra of both species were biased towards deletions, with deletions occurring at approximately twice the rate of insertions in V. fischeri (v 2 ¼ 4.267, d.f. ¼ 1, P ¼ 0.039), and 3 times the rate of insertions in V. cholerae (v 2 ¼ 6.546, d.f. ¼ 1, P ¼ 0.011) ( fig. 1). Indels were also significantly more frequent on chr1 in V. fischeri (v 2 ¼ 9.066, d.f. ¼ 1, P ¼ 0.003) and slightly more In contrast with previous reports from bacterial MA-WGS studies, we find many multi-nucleotide indels in both the Vfwt and Vc-wt MA experiments. In Vf-wt, only 20.00% of indels involved the insertion or deletion of a single nucleotide, whereas 36.36% of the Vc-wt indels involved a single nucleotide. Furthermore, the distribution of the number of indels identified for each length <10 bp demonstrates that short multi-nucleotide indels were relatively common, particularly in V. fischeri ( fig. 2).
More strikingly, in V. fischeri, the indel rates of SSRs scaled positively with the repeat unit length and the number of repeated units. 41 of the 60 indels (68.33%) observed in V. fischeri occurred in SSRs containing three or more repeated units, which is significantly more than we expect based on the frequency of bases in SSRs in the V. fischeri genome (v 2 ¼ 82.915, d.f. ¼ 1, P < 0.0001). When SSRs are categorized by their repeat unit length (mono-, di-, and tri-nucleotide), SSRs with longer repeat units experience elevated mutation rates ( fig. 3). Importantly, these rates vary over orders of magnitude and their observed frequencies differ significantly from the null expectation derived from their target sizes (v 2 ¼ 2.120 • 10 5 , d.f. ¼ 8, P < 0.0001). SSR indel rates also scale positively with the number of repeats in the SSR and again differ significantly from the null expectation derived from their genome target size (chi-squared test, repeat numbers 3-10; v 2 ¼ 5.590 • 10 2 , d.f. ¼ 7, P < 0.0001). Lastly, a few SSRs appear to be especially mutagenic because the same locus mutated independently in multiple lineages (supple mentary dataset S2, Supplementary Material online). We cannot ascertain whether similar SSR biases exist in V. cholerae because only 22 indels were observed and only eight of those were in SSRs with three or more repeats (36.36%). However, the elevated occurrence of indels in SSRs in V. cholerae is consistent with our observations in V. fischeri

Effects of Losing DNA Mismatch Repair
The deletion of the mutS gene results in a faulty MMR system and is expected to increase the rates and alter the spectra of bpsms and indels. In V. fischeri, DmutS produced a 317-fold increase in the bpsm rate and a 102-fold increase in the indel rate (table 1). This DmutS mutation also eliminated differences between chromosomes in both bpsm and indel rates observed in the Vf-wt lineages (bpsm: In V. cholerae, the mutS deletion produced an 85-fold increase in the bpsm rate and a 142-fold increase in the indel rate (table 1). However, in this case, slightly more bpsm occurred on chr1 in Vc-mut lineages than expected (bpsm: Losing functional mismatch repair caused the bpsm spectra of both V. fischeri and V. cholerae to change dramatically and converge ( fig. 4). Despite the significantly different bpsm spectra between wild-type species (v 2 ¼ 32.788, d.f. ¼ 5, P < 0.0001), the loss of mutS caused Vf-mut and Vc-mut lineages to accumulate a remarkably similar spectrum of bpsms (v 2 ¼ 10.178, d.f. ¼ 5, P ¼ 0.070). Specifically, both sets of mutator lines were dominated by A:T > G:C and G:C > T:A transitions, which would yield significantly more %GC-rich genomes of 0.48260.016 (SEM) in Vf-mut and 0.47560.011 (SEM) in Vc-mut at mutation-drift equilibrium. In addition, whereas V. fischeri continues to have a slight deletion bias with defective MMR, insertions actually occur at a slightly higher rate than deletions in the Vc-mut lineages, although not significantly so (Vf-mut: Lastly, consistent with the wild-type lineages, AT transversions occur at significantly higher rates at Dam methylation sites (Vf-mut: there was no enrichment of bpsms in lagging strand genes (Vf-mut: Relative frequency of insertion-deletion mutations (indels) of different lengths observed in the wild-type (wt) and mismatch repair deficient (mut) strains of Vibrio fischeri (A) and Vibrio cholerae (B). Note that the overall indel rates of Vf-mut and Vc-mut are 102-and 142-fold higher than the wild-type rates, respectively, but it is the relative frequencies of different indel lengths that are shown here. The increase in the number of single-nucleotide indels in mutator lines is significant (Vf-mut: Mutation Biases in Vibrio Species . doi:10.1093/molbev/msw224 While multi-nucleotide indels were relatively common in the wild-type MA lines, the vast majority of indels involved a single nucleotide in both the Vf-mut (93.62%) and Vc-mut (85.66%) lines ( fig. 2). The mutS deletion increased the single-nucleotide indel rate 473-fold but the multinucleotide indel rate only 13-fold in V. fischeri. Similar patterns were seen in V. cholerae, where DmutS increased the single-nucleotide indel rate 334-fold and the multinucleotide indel rate only 64-fold. Most of the increase in the multi-nucleotide mutation rate resulted from di-and trinucleotide indels that were rarely observed in the wild-type MA experiments ( fig. 2). Yet all indels detected in the mutator MA experiments up to 10 bp in length were also significantly over-represented in mutator MA experiments (supplementary table S2, Supplementary Material online), implying that mutS and the MMR system in general are involved in the repair of all indel mutations up to 10 bp in length.
Most single-nucleotide indels in Vf-mut and Vc-mut lines occurred in homopolymeric runs. The single-nucleotide indel mutation rate in both Vf-mut and Vc-mut correlates positively with the length of the homopolymer (fig. 5) and differs significantly from the null expectation derived from the genome target size (chi-squared test, repeat numbers 3-11; Vfmut: v 2 ¼ 3.189 • 10 4 , d.f. ¼ 8, P < 0.0001; Vc-mut: v 2 ¼ 6.850 • 10 4 , d.f. ¼ 8, P < 0.0001). Because indels in SSRs with longer repeated units were rarely observed in the mutator lineages, we cannot confirm whether the length of the repeated unit in an SSR also correlates positively with its indel rate in either of the mutator lineages. Wild-type insertion-deletion mutation (indel) rates per run per generation and frequencies in simple-sequence repeats (SSRs) containing three or more repeats in Vibrio fischeri. SSRs were categorized by the length of the repeated unit and indel rates per run per generation were calculated as the number of observed indels in that SSR category, divided by the product of the occurrence of that SSR type in the genome, the number of generations, and the number of MA lineages analyzed. Expected frequencies were calculated based on the target size of each SSR category in the genome. To this point we have only discussed inter-chromosomal differences in mutation rates between two autonomously replicating chromosomes. However, the distribution of bpsms and indels in V. fischeri and V. cholerae may also vary among regions within these circular chromosomes. We analyzed the genome-wide distribution of bpsms by dividing each chromosome into 100-kb intervals extending bi-directionally from the origin of replication. Despite apparent intra-chromosome variation in the bpsm rate of both wild-type ancestors ( fig. 6A and B, inner rings), the observed number of bpsms in 100-kb intervals does not differ significantly from random expectations based on the number of analyzed sites (Vf-wt Chr1: However, bpsms were not distributed evenly on chr1 in either of the mutator experiments (Vf-mut: v 2 ¼ 132.970, d.f. ¼ 29, P < 0.0001; Vc-mut: v 2 ¼ 102.420, d.f. ¼ 29, P < 0.0001), resulting in a mirrored wave-like pattern of bpsm rates on the right and left replichores ( fig. 6A and B; outer rings). Indeed, we find a significant positive relationship between the bpsm rates on the right replichore and concurrently replicated regions on the left replichore in both genomes (Linear In contrast to chr1, the observed distribution of bpsms on chr2 of the V. fischeri and V. cholerae mutator lineages does not differ from a null, constant rate among regions (Vf-mut: , suggesting that bpsm rates are more consistent across chr2 than they are on chr1 ( fig. 6A and B).
Our observation of an apparent effect of replication timing on bpsm rates on chr1 of the mutator lineages but not the wild-type lineages begs the question-is this effect a MMRspecific phenomenon, or is our wild-type dataset just too small to detect an effect? First, we performed a power analysis to estimate the number of mutations required on chr1 of both V. fischeri and V. cholerae to detect small (w ¼ 0.1), medium (w ¼ 0.3), and large (w ¼ 0.5) effects using a chisquared test at a significance level of 0.05 and a power of 0.95 (Cohen 1988). The degrees of freedom are 29 for both genomes when separated into 100-kb intervals because they are both $3 Mb in length. Therefore, estimates of the number of mutations required to reject the null hypothesis of a uniform distribution of spontaneous bpsms are the same for both V. fischeri and V. cholerae-3,508 for small effects, 390 for medium effects, and 140 for large effects. These values for Mutation Biases in Vibrio Species . doi:10.1093/molbev/msw224 chr1 would correspond to %5,174, 575, and 206 genome-wide bpsms to detect this effect with confidence in V. fischeri and %4,802, 787, and 281 genome-wide bpsms to detect this effect with confidence in V. cholerae, so it is not unsurprising that we failed to detect significant variation in genome-wide bpsm rates in the wild-type genomes, where there were only 219 and 138 genome-wide bpsms for V. fischeri and V. cholerae, respectively. Furthermore, by using the effect size observed in the mutator lineages, we can estimate that we would need a minimum of 932 bpsms on chr1 (%1,375 total) in V. fischeri and 323 bpsms on chr1 (%442 total) in V. cholerae to reject the null hypothesis of a uniform distribution of bpsms on chr1, which again, is substantially more bpsms than we observed in either wild-type experiment. Indels occur predominantly in SSRs, which are not uniformly distributed across the genome, and are thus also expected to vary significantly between genome regions. As was the case for bpsms, we tested for this variation by separating each chromosome into 100-kb intervals extending bidirectionally from the origin of replication. The observed distribution of indels in 100-kb intervals differs significantly from random expectations in all experiments except Vc-wt, likely because only 22 indels were detected (Vf-wt: v 2 ¼ 83.079, d.f. ¼ 43, P < 0.0001; Vc-wt: v 2 ¼ 42.224, d.f. ¼ 41, P ¼ 0.418; Vf-mut: v 2 ¼ 96.383, d.f. ¼ 43, P < 0.0001; Vc-mut: v 2 ¼ 157.240, d.f. ¼ 41, P < 0.0001). However, indel rates were not conserved on opposing replichores of either chromosome and there is no clear pattern to their genome-wide distribution beyond that of the strong SSR bias ( fig. 6C and D).

Causes and Effects of High Genetic Diversity within Vibrio
The unique and broadly significant biology of both V. cholerae and V. fischeri motivated this comprehensive study of their mutational processes in the near-absence of natural selection. Vibrio cholerae is one of the most significant human pathogens, infecting 3-5 million people and causing approximately 100,000 deaths annually from side effects of the profuse diarrhea caused by toxigenic strains (Lozano et al. 2010). V. cholerae also lives in a broad range of aquatic habitats associated with zooplankton and is often highly abundant, demonstrating that only a small number of strains are capable of causing cholera. This ecological diversity is borne out in the genetic diversity seen in sequenced genomes, namely a high nucleotide diversity at silent sites (p s ¼ 0.110) (Sung et al. 2016). This nucleotide diversity implies that the effective population size (N E ) of this species is %4.78Â10 8 individuals.
Vibrio fischeri, in apparent contrast, is well known as a beneficial symbiont of sepiolid squid and monocentrid fishes, in which it produces light that enables the host to evade predators by counter-illumination (Soto et al. 2014). The hosts it colonizes are distributed in different oceans around the world, but it can also be found as a free-living member of the ocean bacterioplankton (Soto et al. 2014). Here also the ecological diversity is borne out by high genetic diversity at a number of loci, with estimated p s ¼ 0.067 and an inferred N E of %1.62Â10 8 (Wollenberg and Ruby 2012   MBE and for the Vibrionacae more generally, what forces lead to such high genetic diversity within species? Both mutation and horizontal-gene transfer (HGT) are expected to contribute to Vibrio biodiversity, but evidence exists that mutation is the primary force driving diversification within Vibrio clades (Thompson et al. 2004;Sawabe et al. 2009;Vos and Didelot 2009). It has therefore been tempting to invoke high mutation rates in Vibrio species to explain their high genetic diversity. However, we show here that both bpsm and indel rates in V. fischeri and V. cholerae are low, even for bacteria (supplementary fig. S2, Supplementary Material online). In fact, V. cholerae has one of the lowest recorded genome-wide rates of bpsms (0.0004/genome/generation) and indels (0.00007/genome/generation) of any bacterial species (Sung, Ackerman, et al. 2012;Sung et al. 2016). Although a number of ecological and population genetic factors, like the extent of host-colonization in small, fragmented populations, may increase biodiversity despite low base-line mutation rates, it is nearly impossible to quantify their impact on an entire species complex. This is largely because both V. cholerae and V. fischeri both include many host-associated and nonhost-associated strains, and the relative contribution of these strains to the global diversity of these species is largely unknown. Instead, we suggest that the high genetic diversity and low mutation rates in these Vibrio species can be reconciled by the drift-barrier hypothesis, which states generally that any trait, including replication fidelity, may be refined by natural selection only to the point at which further improvement becomes overwhelmed by the power of genetic drift (Lynch 2010(Lynch , 2011Sung, Ackerman, et al. 2012;Sung et al. 2016). Natural selection is most powerful in large populations of organisms with genomes composed of a high amount of coding sequence. Although Vibrio genomes are not exceptionally large, most sites are coding and their effective population sizes are among the highest recorded (Wollenberg and Ruby 2012;Sung et al. 2016). Thus, both high amounts of coding sequence and high effective population size increase the ability of natural selection to reduce both bpsm and indel rates (Lynch 2010(Lynch , 2011Sung, Ackerman, et al. 2012), yet yield enormous allelic diversity at any given time in both of these Vibrio species (Thompson et al. 2004;Vos and Didelot 2009). If the extremely low mutation rates of these Vibrio species are a product of powerful selection enabled by massive populations to refine the machinery of DNA replication, it follows that many other heritable traits may have also been optimized in these organisms. One of the best known Vibrio traits is their extremely rapid growth rates, with several species achieving doubling times of 10-15 min (Brenner et al. 2005). Growth rate is a broad polygenic character requiring the coordination of essentially all systems, including ribosome synthesis, catabolic and anabolic pathways, and of course, replication. Vibrio species are also typified by having genomes with two chromosomes, which provide two origins of replication that could conceivably accelerate DNA replication. This genome architecture has clearly been refined by selection for rapid growth: not only does the experimental reduction of the V. cholerae genome from two chromosomes to one decrease growth rates by 25-40% (Val et al. 2012), the early-replicating chr1 is enriched relative to chr2 in genes required for rapid growth (Cooper et al. 2010). Other Vibrio traits (metabolism, regulatory capacity in fluctuating environments, stress tolerance) in addition to their high-fidelity replication and rapid growth may also conceivably have been exceptionally refined by selection. Put more broadly, the driftbarrier hypothesis may be extended and applied to many traits, rendering the prediction that DNA replication fidelity and the optimality of many other phenotypes may be positively correlated-the lower the mutation rate, the more selectively optimized the polygenic trait.

Mutation Rate and Spectra Variation among Genome Regions
This hypothesis that Vibrio replication has been exquisitely optimized relative to other bacterial genomes because of their high coding content and high effective population sizes does not mean that all regions of Vibrio genomes have been equivalently refined relative to one another. Both bpsm and indel rates and spectra varied among genomic regions of V. fischeri and V. cholerae, which may systematically affect genome composition. In the Vf-wt lines, both G:C > A:T transitions and G:C > T:A transversions occurred at higher rates on chr2, although only the rate of G:C > T:A transversions was significantly higher ( fig. 1C). Using experiment-wide estimates of each conditional bpsm rate in V. fischeri, we estimate that the %GC-content should be 0.184 in the absence of selection and recombination. Yet, the %GC-content of chr2 is expected to be 0.156 at mutation-drift equilibrium, 0.047 lower than expectations for chr1 (0.203). The actual %GC content of chr2 of the V. fischeri ES114 genome is also lower than chr1 (chr1: 0.390; chr2: 0.370) suggesting that bpsm biases on these chromosomes have contributed to this pattern. Even stronger biases differentiating the chromosomes are seen in the Vcwt lines, driven by significantly higher A:T > G:C transition rates on chr1 and by nonsignificant increases in G:C > A:T and G:C > T:A rates on chr2 ( fig. 1C). These spectra predict %GC-contents of 0.293 for chr1 and 0.201 for chr2 at mutation-drift equilibrium and likely contribute to the lower realized %GC content of chr2 in V. cholerae 2740-80 (chr1: 0.479; chr2: 0.468). Overall, these findings suggest that bpsm pressures contribute to genome-wide and intra-genome variation in %GC contents, but indel biases (Dillon et al. 2015), selection (Hershberg and Petrov 2010;Hildebrand et al. 2010), and/or biased gene conversion (Hershberg and Petrov 2010;Lassalle et al. 2015) must also contribute to the realized %GC contents in V. fischeri and V. cholerae.
Prior MA studies have reported that the majority of indels involve the loss or gain of a single nucleotide and occur predominantly in simple-sequence repeats (SSRs), where the number of repeated units scales positively with the indel rate (Xu et al. 2000;Lee et al. 2012;Long et al. 2014;Dettman et al. 2016 MBE of genes related to host colonization and disease (Moxon et al. 1994(Moxon et al. , 2006Field et al. 1999), begging the question of whether these mutation-prone sequences have indirectly evolved to enable this plasticity.
In V. fischeri, the insertion-deletion rates in SSRs correlated positively with both the number of repeated units and the length of the repeated unit ( fig. 2). While the correlation between indel rates of SSRs and the number of repeated units has been reported previously (Xu et al. 2000;Lee et al. 2012;Long et al. 2014;Dettman et al. 2016;Schroeder et al. 2016), we are not aware of any previous MA-WGS experiments that have reported a positive correlation between SSR indel rates and the length of each repeated unit in the SSR. One possible reason for this discrepancy might be an increased occurrence of SSRs with longer repeats in the V. fischeri ES114 genome (Ruby et al. 2005), which we find to be highly mutagenic ( fig.  3). There are 100 SSRs of three or more units in the V. fischeri ES114 genome where the repeated unit is at least 4 bp in length. A second possibility is that larger indels have gone undetected by prior MA-WGS analyses focused on MMRdeficient strains, in which single-nucleotide indels are evidently more common (supplementary dataset S2, Supplementary Material online). Further, longer indels, especially those in SSRs, are subject to increased false-negative rates due to limitations in the ability of short-read sequencing to resolve them. The majority of multi-nucleotide indels that we identified were supported with very low consensus in the initial alignments because of reads that only partly covered the SSR. Only when we filtered out reads that were not anchored by bp on both sides of the SSR did we achieve high consensus for these indels (supplementary dataset S2, Supplementary Material online). It will be interesting to apply these sensitive detection methods for long SSR-associated indels to future experiments to see whether other species also experience elevated indel rates in SSRs with longer repeat units. We emphasize that this experiment also demonstrates that the loss of MMR shifts the spectrum of indel mutations from multi-nucleotide indels in SSRs with longer repeated units towards single nucleotides in homopolymeric runs, a shift with potentially broad phenotypic consequences.
Our study adds to the theory that SSRs may generate localized hyper-mutation in coding regions that may serve as "contingency loci", enabling potentially beneficial population-level variation in both function and expression of the affected genes (van Belkum et al. 1998;Moxon et al. 2006). These experiments, particularly those conducted in a DmutS background, identified clusters of frameshift mutations found within genes that could yield adaptive mutant phenotypes in Vibrio populations (table 2). Some in fact have previously been reported to be hypermutable in experimental studies of bacterial function. In V. cholerae, tcpH is an important co-regulator that links temperature and pH to control expression of the major virulence regulon (Carroll et al. 1997). During experimental infections of the rabbit model, tcpH frameshift mutations occurred at a frequency of 10 À 4 owing to mutations in the same poly-G tract that accumulated in our experiment and they enhanced fitness in culture, which the authors speculated could reflect a benefit during dispersal from the host into the environment (Carroll et al. 1997). Another locus, mshQ, encodes a pilus-associated adhesion protein, in which mutants have reported to contribute to the rapid transition between opaque and translucent colonies in Vibrio species (Enos-Berlage et al. 2005). These colony types associate with different dynamics of biofilm formation in the laboratory and may reflect different capacity for attachment to natural substrates or other cells, facilitating biofilm differentiation (Yildiz and Visick 2009). Yet another mutagenic homopolymeric tract potentially generating population heterogeneity in biofilm production was found in vpsT in V. fisheri, a transcriptional activator that regulates polysaccharide production (Yildiz and Visick 2009). Other hypermutable mononucleotide tracts were observed in 14 other loci in the MBE V. cholerae and V. fischeri DmutS lines and were enriched in genes responsible for biofilm production, including putative diguanylate cyclases that affect the regulation of biofilm traits via the messenger molecule cyclic diguanylate (Yildiz and Visick 2009). Taken together, this study shows that despite inherently very low mutation rates, Vibrio populations likely harbor genetic variation at SSR's that affect traits in which plasticity may be beneficial-such as virulence or adherence mechanisms-and this is especially the case for populations with defective mismatch repair.

The Role of Mismatch Repair in Genome Evolution
The mismatch repair pathway is a primary DNA repair pathway in organisms across the tree of life (Kunkel and Erie 2005), but strains lacking MMR are common in nature (Hazen et al. 2009), chronic infections (Hall and Henderson-Begg 2006;Mena et al. 2008;Oliver 2010;Marvig et al. 2013), and longterm evolution experiments (Sniegowski et al. 1997). Loss of a functional MMR system can elevate mutation rates anywhere from 5-to 1000-fold, depending on both the defective component of the pathway and the genetic background (Lyer et al. 2006;Long et al. 2015;Reyes et al. 2015). The primary proteins involved in the MMR pathway in Vibrio include the MutS protein, which binds mismatches to initiate repair, the MutL protein, which coordinates multiple steps of MMR synthesis, and the MutH protein, which nicks the unmethylated strand to remove the replication error (Kunkel and Erie 2005). Importantly, because Vibrio species lack a functional VSP repair pathway (Bhakat et al. 1999;Banerjee and Chowdhury 2006), which also utilizes MutS and MutL (Bhagwat and Lieb 2002), we expect few pleiotropic consequences of our mutS knockout. The removal of the mutS gene in this study resulted in a 317-fold increase in the bpsm rate and a 102-fold increase in the indel rate in V. fischeri ES114. The removal of the mutS gene in V. cholerae had a less dramatic effect on the bpsm rate (85-fold increase), but a more dramatic effect on the indel rate (142-fold increase). Overall, this suggests that MMR is more central to the repair of bpsms in V. fischeri, whereas it is more important for the repair of indels in V. cholerae.
Despite the relatively wide range in the consequences of losing a functional MMR system for bpsm rates, the resulting bpsm spectra were remarkably similar in V. fischeri and V. cholerae. Specifically, unlike the wild-type experiments, the bpsm spectra of the Vf-mut and Vc-mut MA experiments were not significantly different, as both were dominated by G:C > A:T and A:T > G:C transitions. Furthermore, the vast majority of indels in the both the Vf-mut and Vc-mut experiments involve only a single nucleotide, most of which occurred in homopolymers, where their rates scaled positively with the length of the homopolymer (fig. 5). These observations are consistent with previous reports in other bacterial MA experiments using MMR-deficient strains (Lee et al. 2012;Long et al. 2014;Sung et al. 2015;Dettman et al. 2016), but reveal distinct mutation biases from genotypes with functional MMR. In fact, the strong site-specific biases in the mutation spectra generated by the loss of MMR (table 2) may help to explain the prevalence of mutator alleles in environmental and clinical settings, despite the inevitable fitness costs of the mutational load. Couce et al. have found that mutator alleles can modify the distribution of fitness effects of individual beneficial mutations by enriching a specific spectrum of spontaneous mutations, and impact the evolutionary trajectories of different strains (Couce et al. 2013(Couce et al. , 2015, although the generality of how mutator genotypes influence adaptive dynamics requires further study. The loss of MMR also helps reveal mutation biases associated with the replicative polymerase that cannot be observed using the low number of mutations generated in wild-type MA-WGS experiments (Lee et al. 2012;Sung et al. 2015;Dettman et al. 2016). A mirrored wave-like pattern of bpsm rates on opposing replichores has now been observed in multiple MMR-deficient species studied by MA-WGS, although the exact shape and strength of the pattern varies between species (Foster et al. 2013;Long et al. 2014;Dettman et al. 2016). We find the same mirrored wave-like pattern of bpsm rates on the opposing replichores of chr1 in MMR-deficient V. fischeri and V. cholerae ( fig. 6A and B) and show that bpsm rates of concurrently replicating regions of this chromosome are significantly correlated in both genomes. This suggests that bpsm rates are impacted by genome location and that regions replicated at similar times on opposing replichores experience similar bpsm rates, at least in MMR-deficient strains. However, we do not observe any significant variation in the bpsm rates on chr2 in the MMR-deficient experiments or on either chromosome in the wild-type experiments ( fig.  6A and B). We suggest that bpsm rates are less variable on chr2 because of their delayed replication. Specifically, chr2 replication is not initiated until a large portion of chr1 has already been replicated (Egan and Waldor 2003;Duigou et al. 2006;Rasmussen et al. 2007), which means that chr2 is not replicated during the primary peaks in the bpsm rate on the opposing replichores of chr1, and thus experience more consistent bpsm rates across the chromosome. A power analysis also reveals that the lack of significant variation in the bpsm rates in the wild-type lineages should not be taken as evidence that this wave-like pattern is characteristic of only MMR-deficient strains, as we need considerably more bpsms than we collected in the wild-type experiments to observe effects of this size. Rather, future MA experiments designed to test for an effect of genome location on bpsm rates should seek to accumulate more bpsms, either by increasing the length of the MA study or increasing the number of lineages. In sum, these patterns of variation in mutation rates among genome regions suggest that it is conceivable that selection may act to position some genes to avoid regions of higher mutation pressure, especially in populations of the size and diversity of Vibrio species.

Concluding Remarks
Mutation-accumulation experiments paired with wholegenome sequencing enable an unprecedented view of genome-wide mutation rates and spectra and reveal the underlying biases of spontaneous mutation. These underlying biases can explain why some genome regions evolve more rapidly than others, why the coding content of different Mutation Biases in Vibrio Species . doi:10.1093/molbev/msw224 MBE genome regions varies, and perhaps how clonal populations may generate adaptively diverse progeny. The primary properties of V. fischeri and V. cholerae spontaneous mutation that we have identified in this MA-WGS study are: (1) basesubstitution and insertion-deletion mutation rates are low, even relative to other bacterial species; (2) base-substitution mutation biases vary between chromosomes, but are consistently lower than their realized %GC contents; (3) both the length of repeat units and the number of repeated units in simple-sequence repeats correlate positively with the insertion-deletion rate of the SSR; (4) loss of a proficient mismatch repair system has inconsistent effects on basesubstitution and indel mutation rates in different taxa, but consistently generates convergent mutation spectra that are dominated by transitions and short indels; and (5) basesubstitution mutations in strains deficient in mismatch repair vary in a mirrored wave-like pattern on opposing replichores on chromosome 1, but variation is limited on chromosome 2. As we uncover properties of spontaneous mutation in diverse microbes, we can continue to assess the generality of mutational biases and more accurately evaluate the role of mutation bias in molecular evolution.

Bacterial Strains and Culture Conditions
The two wild-type MA experiments were founded from a single clone derived from V. fischeri ES114 and V. cholerae 2740-80, respectively. All MA experiments with V. fischeri were carried out on tryptic soy agar plates supplemented with NaCl (TSAN) (30 g/l tryptic soy broth powder, 20 g/l NaCl, 15 g/l agar) and were incubated at 28 . Frozen stocks of each MA lineage were prepared at the end of the experiment by growing a single colony overnight in 5 ml of tryptic soy broth supplemented with NaCl (TSBN) (30 g/l tryptic soy broth powder, 20 g/l NaCl) at 28 and freezing in 8% DMSO at À80 . For V. cholerae, all MA experiments were carried out on tryptic soy agar plates (TSA) (30 g/l tryptic soy broth powder, 15 g/l agar) and were incubated at 37 . Similarly, frozen stocks were prepared by growing a single colony from each lineage overnight in 5 ml of tryptic soy broth (TSB) (30 g/l tryptic soy broth powder) at 37 and were stored in 8% DMSO at À80 .
Mutator strains of V. fischeri ES114 and V. cholerae 2740-80 were generated by replacing the mutS gene in each genome with an erythromycin resistance cassette, as described previously (Datsenko and Wanner 2000;Heckman and Pease 2007;Val et al. 2012). Briefly, we used splicing by overlap extension (PCR-SOE) to generate two erythromycin resistance cassettes, one of which was flanked by %750 bp of the upstream and downstream regions of the mutS gene in V. fischeri ES114, whereas the second was flanked by %750 bp of the upstream and downstream regions of the mutS gene in V. cholerae 2740-80 (Heckman and Pease 2007). Both the V. fischeri ES114 and V. cholerae 2740-80 DmutS fragments were then cloned into the R6K c-ori-based suicide vector pSW7848, which contains a ccdB toxin gene that is arabinoseinducible and glucose-repressible (P BAD ) (Val et al. 2012).
Both of these pSW7848 plasmids, henceforth referred to as pSW7848-VfDmutS and pSW7848-VcDmutS, were transformed into Escherichia coli pi3813 chemically competent cells and stored at À80 (Datsenko and Wanner 2000).
Conjugal transfer of the pSW7848-VfDmutS and pSW7848-VcDmutS plasmids was performed using a triparental mating with the E. coli pi3813 cells as the donors (Val et al. 2012), E. coli DH5a-pEVS104 as the helper (Stabb and Ruby 2002), and V. fischeri ES114 and V. cholerae 2740-80 as the respective recipients. For V. fischeri ES114, the chromosomally inserted pSW7848-VfDmutS plasmid resulting from a crossover at the DmutS gene was selected on LBS plates (Graf et al. 1994) containing 1% glucose and 1ug/ml chloramphenicol at 28 . Selection for loss of the plasmid backbone from a second recombination step was then performed on LBS plates containing 0.2% arabinose at 28 , which induces the P BAD promoter of the ccdB gene and ensures that all cells that have not lost the integrated plasmid will die (Val et al. 2012). For V. cholerae, the chromosomally inserted pSW7848-VcDmutS plasmid was selected on LB plates (Sambrook et al. 1989) containing 1% glucose and 5ug/ml chloramphenicol at 30 . Selection for loss of the plasmid backbone was performed on LB plates with 0.2% arabinose at 30 . Replacement of the mutS gene in V. fischeri ES114 and V. cholerae 2740-80 were verified by conventional sequencing, and V. fischeri ES114 DmutS and V. cholerae 2740-80 DmutS were used to found the two mutator MA experiments, under identical conditions to those described above for the wildtype experiments.

Ancestral Reference Genomes
Prior to this study, the genome of V. fischeri ES114 was already in completed form and annotated, consisting of three contigs representing chr1, chr2, and the 45.85-kb plasmid (Ruby et al. 2005). Further, the location of the oriC on both chromosomes was available in dOriC 5.0, a database for the predicted oriC regions in bacterial and archaeal genomes (Gao et al. 2013). Fortunately, the oriC region on both chromosomes had been placed at coordinate zero, allowing us to proceed with this V. fischeri ES114 reference genome for all subsequent V. fischeri analyses. In contrast, when we initiated our MA experiment the V. cholerae 2740-80 genome was still in draft form, consisting of 257 scaffolds with unknown chromosome association. Therefore, to reveal inter-chromosomal variation and assess the effects of genome location on bpsm and indel rates, we used single molecule, real-time (SMRT) sequencing to generate a complete assembly separated into the two contigs of V. cholerae 2740-80.
The Pacific Biosciences RSII sequencer facilitates the completion of microbial genomes by producing reads of multiple kilobases (kb) that extend across repetitive regions and allow whole-genomes to be assembled at a relatively limited cost (Koren and Phillippy 2015). Genomic DNA (gDNA) was prepared using the Qiagen Genomic-Tip Kit (20/G) from overnight cultures of V. cholerae 2740-80 grown in LB at 37 using manufacturer's instructions. Importantly, this kit uses gravity filtration to purify gDNA, which limits shearing and increases the average fragment size of the resulting gDNA sample. Long Dillon et al. . doi:10.1093/molbev/msw224 MBE insert library preparation and SMRT sequencing was performed on this V. cholerae 2740-80 gDNA at the Icahn School of Medicine at Mount Sinai according to the manufacturer's instructions, as described previously (Beaulaurier et al. 2015). Briefly, libraries were size selected using Sage Science Blue Pippin 0.75% agarose cassettes to enrich for long-reads, and were assessed for quantity and insert size using an Agilent DNA 12,000 gel chip. Primers, polymerases, and magnetic beads were loaded to generate a completed SMRTbell library, which was run in a single SMRT cell of a Pacific Biosciences RSII sequencer at a concentration of 75 pM for 180 min.
As expected, our long insert SMRT sequencing library generated mostly long reads, with an average sub-read length of 8,401 bp and an N50 of 11,480 bp. We used the hierarchical genome-assembly process workflow (HGAP3) to generate a completed assembly of V. cholerae 2740-80 and polished our assembly using the Quiver algorithm (Chin et al. 2013). The resultant assembly consisted of two contigs representing chr1 and chr2, with an average coverage of 128x. We annotated this assembly using prokka (v1.11), specifying Vibrio as the genus (Seemann 2014). We then identified the location of the oriC on both contigs using Ori-finder, which applies analogous methods to those used by dOriC 5.0 to identify oriC regions in bacterial genomes (Gao and Zhang 2008;Gao et al. 2013). Of course, these oriC regions were not located at coordinate zero of the V. cholerae 2740-80 reference genome, so we re-formatted the reference genome to place each oriC region at the beginning of the chr1 and chr2 contigs, then stitched the contigs back together and re-polished the genome using Quiver. Prokka was then run a second time to update the location of all genes, and this re-formatted V. cholerae 2740-80 genome was used as the ancestral reference genome for all subsequent V. cholerae analyses.

MA-WGS Process
For the two wild-type MA experiments, 75 independent lineages were founded by single cells derived from a single colony of V. fischeri ES114 and V. cholerae 2740-80, respectively. Each of these lineages was then independently propagated every 24 h onto fresh TSAN for V. fischeri and fresh TSA for V. cholerae. This cycle was then repeated for a total of 217 days. For the two mutator MA experiments, 48 independent lineages were founded and propagated as described above from a single colony each of V. fischeri ES114 DmutS and V. cholerae 2740-80 DmutS, respectively. However, because of their higher mutation rates, these lineages were only propagated for a total of 43 days. At the conclusion of the four MA experiments, each lineage was grown overnight in the appropriate liquid broth at the appropriate temperature (see above), and stored at À80 in 8% DMSO.
Daily generations were estimated monthly for the wildtype lineages and bi-monthly for the mutator lineages by calculating the number of viable cells in a representative colony from 10 lineages per MA experiment following 24 h of growth. During each measurement, the representative colonies were placed in 2 ml of phosphate buffer saline (80 g/l NaCl, 2 g/l KCl, 14.4 g/l Na 2 HPO 4 • 2H 2 O, 2.4 g//l KH 2 PO 4 ), serially diluted, and spread plated on TSAN or TSA for V. fischeri and V. cholerae, respectively. These plates were then incubated for 24 h at 28 or 37 , and the daily generations per colony were calculated from the number of viable cells in each representative colony. The average daily generations were then calculated for each time-point using the ten representative colonies, and the total generations elapsed between each measurement were calculated as the product of the average daily generations and the number of days before the next measurement. The total of number of generations elapsed during the duration of the MA experiment per lineage was then calculated as the sum of these totals over the course of each MA study (supplementary fig. S1, Supplementary Material online).
At the conclusion of each of the four MA experiments, gDNA was extracted using the Wizard Genomic DNA Purification Kit (Promega) from 1 ml of overnight culture (TSBN at 28 for V. fischeri; TSB at 37 for V. cholerae) inoculated from 50 representative stored lineages for Vf-wt and Vc-wt experiments, and all 48 stored lineages for the Vf-mut and Vc-mut experiments. For the wild-type MA experiments, gDNA from the ancestral V. fischeri ES114 and V. cholerae 2740-80 strains was also extracted. All libraries were prepared using a modified Illumina Nextera protocol designed for inexpensive library preparation of microbial genomes (Baym et al. 2015). Sequencing of the Vf-wt and Vc-wt lineages and their respective ancestors was performed using the 101-bp paired-end Illumina HiSeq platform at the Beijing Genome Institute (BGI), whereas sequencing of the Vf-mut and Vc-mut lineages was performed using the 151-bp pairedend Illumina HISeq platform at the University of New Hampshire Hubbard Center for Genomic Studies.
Our raw fastQ reads were analyzed using fastQC, and revealed that 48 Vf-wt lineages, 49 Vc-wt lineages, 19 Vf-mut lineages, and 22 Vc-mut lineages were sequenced at sufficient depth to accurately identify bpsm and indel mutations. Our failure to successfully sequence a high proportion of Vf-mut and Vc-mut lineages was mostly generated by a poorly normalized library, leading to limited sequence data for several of the mutator lineages. For the successfully sequenced lineages, all reads were mapped to their respective reference genomes with both the Burrows-Wheeler Aligner (BWA) ) and Novoalign (www.novocraft.com). The average depth of coverage across the successfully sequenced lineages of each MA experiment was 100Â for Vf-wt, 96Â for Vc-wt, 124Â for Vf-mut, and 92Â for Vc-mut.

Base-Substitution Mutation Identification
For all four MA experiments, bpsms were identified as described previously in an MA experiment with Burkholderia cenocepacia (Dillon et al. 2015). Briefly, we used SAMtools to convert the SAM alignment files from each lineage to mpileup format ), then in-house perl scripts to produce the forward and reverse read alignments for each position in each line. A three-step process was then used to detect putative bpsms. First, pooled reads across all lines were used to generate an ancestral consensus base at each site in the reference genome. This allows us to correct for any Mutation Biases in Vibrio Species . doi:10.1093/molbev/msw224 MBE differences that may exist between the reference genomes and the ancestral colony of each our MA experiments. Second, a lineage specific consensus base was generated at each site in the reference genome for each individual MA lineage using only the reads from that line. Here, a lineage specific consensus base was only called if the site was covered by at least two forward and two reverse reads and at least 80% of the reads identified the same base. Otherwise, the site was not analyzed. Third, each lineage specific consensus base that was called was compared with the overall ancestral consensus of the MA experiment and a putative bpsm was identified if they differed. This analysis was carried out independently with the alignments generated by BWA and Novoalign, and putative bpsms were considered genuine only if both pipelines independently identified the bpsm and they were only identified in a single lineage.
Using relatively lenient criteria for identifying lineage specific consensus bases, we were able to analyze the majority of the genome in all of our lineages, but increase our risk of falsely identifying bpsms at low coverage sites. Therefore, we generated a supplementary dataset for all genuine bpsms identified in this study, which includes the read coverage and consensus at each site where a bpsm was identified (supple mentary dataset S1, Supplementary Material online). We do not see clusters of bpsms at the lower limits of our coverage or consensus requirements. In fact, the vast majority of bpsms in all of four MA experiments were covered by more than 50 reads and were supported by > 95% of the reads that covered the site. Furthermore, we verified that none of the bpsms that we identified were present in the ancestral V. fischeri ES114 and V. cholerae 2740-80 strains that we sequenced, so we are confident that nearly all of the bpsms identified in this study were spontaneous bpsms that arose during the mutation accumulation experiments.

Insertion-Deletion Mutation Identification
All indels identified in this study were also identified using similar requirements to those described in our MA experiment with B. cenocepacia (Dillon et al. 2015), with a few modifications. First, we extracted all putative indels from the BWA and Novoalign alignments for each lineage under the requirements that the indel was covered by at least two forward and two reverse reads, and 30% of those reads identified the exact same indel (size and motif). All putative indels that were independently identified by both BWA and Novoalign, where 80% of the reads identified the exact same indel were considered genuine. Next, we noticed that nearly all putative indels that had been identified with <80% consensus were in SSRs. Therefore, for indels where only 30-80% of the reads identified the exact same indel, we parsed out only reads that had bases on both the upstream and downstream region of the SSR (if the indel was in an SSR), and on both the upstream and downstream region of the indel itself (if the indel was not in an SSR). Using only this subset of reads, we reassessed the number of reads that identified the exact same indel (size and motif), and considered these initially low confidence putative indels genuine if > 80% of these sub-reads identified the exact same indel. In addition, we passaged our alignment output through the patterngrowth algorithm PINDEL to identify any large-scale genuine indels using paired-end information (Ye et al. 2009). Here, we required a total of 20 reads, with at least six forward and six reverse reads, and 80% of the reads to identify the exact same indel for the indel to be considered genuine. This summative collection of genuine indels was then compared with the analysis of the ancestral V. fischeri ES114 and V. cholerae 2740-80 strains, and any indels that were identified in their corresponding ancestor, or > 50% of the analyzed lineages from the corresponding MA experiment were excluded from subsequent analyses.
Our initial filters for indels were even more lenient than those for bpsms, which could have led to false positive indel identification in the putative indel phase. However, we subsequently required at least 80% consensus for all genuine indels identified in this study among reads that had bases covering both the upstream and downstream regions of putative indels that were not in an SSR. For genuine indels that were in an SSR, we required at least 80% consensus among reads that had bases covering both the upstream and downstream regions of the SSR. Further, we verified that all indels identified were not present in our ancestral V. fischeri ES114 or V. cholerae 2740-80 strains and were not identified in > 50% of the other lineages analyzed in the same MA experiment. As with our bpsms, we generated a supplemen tary dataset containing all genuine indels analyzed in this study, which includes read coverage and consensus of each site where an indel was identified, as well as the read coverage and consensus among reads with bases covering both the upstream and downstream regions of the indel or SSR if the initial consensus among reads covering the indel was <80% (supplementary dataset S2, Supplementary Material online). Thus, we are confident that nearly all indels identified in this study were genuine spontaneous indels that arose during the mutation accumulation process.

Mutation-Rate Analyses
Overall bpsm and indel rates were calculated for each lineage using the equation: l ¼ m=nT, where l represents the mutation rate, m represents the number of mutations observed, n represents the number of ancestral sites analyzed, and T represents the total number of generations elapsed per lineage. Conditional bpsm rates for each lineage were calculated using the same equation, but with m representing the number of bpsms of the focal bpsm type, and n representing the number analyzed ancestral sites that could generate the focal bpsm type. All summative bpsm and indel rates presented for each MA experiment were calculated as the average mutation rate across all analyzed lineages, whereas summative SEs were calculated as the SD of the mutation rate across all lines (s), divided by the square root of the total number of lines in the corresponding MA experiment (N): SE pooled ¼ s= ffiffiffi ffi N p . For our analysis of bpsm and indel rates within chromosomes, we divided each chromosome into 100-kb intervals, starting at the origin of replication and extending bidirectionally to the replication terminus. Bpsm rates in each interval were measured by dividing the total number of bpsm Dillon et al. . doi:10.1093/molbev/msw224 MBE or indel mutations from this study, by the product of the total number of sites analyzed in each interval across all lines and the number of generations per line, using the same formula described above for genome wide mutation rates: l ¼ m=nT. Because none of the chromosomes were exactly divisible by 100 kb, the final intervals on each replichore were always <100 kb, but their mutation rates were calibrated to the number of bases analyzed.
Potential effects of selection on observed mutations were assessed as follows. For each MA experiment, the expected ratio of coding to noncoding mutations was determined directly from each ancestral reference genome, and the expected ratio of synonymous to nonsynonymous bpsms was calculated from each ancestral reference genome, after accounting for codon usage and %GC content of synonymous and nonsynonymous sites.

Data Availability
Separate Bioprojects have been generated on NCBI for all sequencing data that pertains to these V. fischeri (PRJNA256340) and V. cholerae (PRJNA256339) MA-WGS projects. Under these Bioprojects, Illumina DNA sequences for all the V. fischeri and V. cholerae MA-WGS lines are available under the following Biosamples: Vf-wt ¼ SAMN05366916, Vc-wt ¼ SAMN05366910, Vf-mut ¼ SAMN05366917, Vc-mut ¼ SAMN05366911. The completed V. cholerae 2740-80 genome assembly generated in this study is also available under the Biosample SAMN05323685. All strains are available upon request.

Statistical Analyses
All statistical analyses were performed in R Studio Version 0.99.489 using the Stats analysis package (R Core Team 2016).

Supplementary Material
Supplementary figures S1 and S2, tables S1 and S2, and data set S1 and S2 are available at Molecular Biology and Evolution online.