A Near-Deterministic Mutational Hotspot in Pseudomonas fluorescens Is Constructed by Multiple Interacting Genomic Features

Abstract Mutation—whilst stochastic—is frequently biased toward certain loci. When combined with selection, this results in highly repeatable and predictable evolutionary outcomes. Immotile variants of the bacterium Pseudomonas fluorescens (SBW25) possess a “mutational hotspot” that facilitates repeated occurrences of an identical de novo single nucleotide polymorphism when re-evolving motility, where ≥95% independent lines fix the mutation ntrB A289C. Identifying hotspots of similar potency in other genes and genomic backgrounds would prove valuable for predictive evolutionary models but to do so we must understand the genomic features that enable such a hotspot to form. Here, we reveal that genomic location, local nucleotide sequence, gene strandedness, and presence of mismatch repair proteins operate in combination to facilitate the formation of this mutational hotspot. Our study therefore provides a framework for utilizing genomic features to predict and identify hotspot positions capable of enforcing near-deterministic evolution.

the rhaSR and prhaBAD construct by restriction-ligation cloning. This allowed the native operon structure, promoter, enhancer and terminator of ntrBC to be maintained to avoid alterations to native co-regulation and co-transcription of these genes. Inserts were generated for ntrBC, and ntrBC-sm which includes the DNA-hairpin abolishing ntrB synonymous mutations detailed previously (Horton et al. 2021). These plasmids were transformed into E.
Transposon insertions were selected for on LB supplemented with Gentamicin sulphate and Kanamycin sulphate. Chromosomal Transposon insertion downstream of glmS was confirmed by colony PCR.
Manipulation of ntrBC and ntrBC-sm gene strandedness was conducted by taking advantage of the reliable insertional orientation of miniTn7 in the SBW25 chromosome where the Tn7R site is always proximal to the 3' end of the glmS gene (Choi and Schweizer 2006;Liu et al. 2014). Swapping the positions of the two restriction sites used for cloning the ntrBC insert into the miniTn7 transposon plasmid allowed the ntrBC inserts to be situated on the opposite strand in relation to Tn7R, and therefore in relation to glmS. This allowed reliable insertion of the ntrBC open reading frame (ORF) onto the opposite strand downstream glmS (Fig.1A). In this new orientation, the Tn7R site is situated upstream of ntrBC genes, so that they are on the same strand as glmS, which is the leading strand with respect to the origin of replication. Strand orientation of the ntrBC ORF sequence relative to the glmS ORF was confirmed by PCR ( Supplementary Fig.S2). Ancestral strains were checked for mutations introduced during construction by whole genome resequencing (detailed below). AR2 miniTn7[ntrBC-Lead] was found to have gained a SNP in the ntrC gene resulting in the amino acid change D468N. This change was outside of any functional protein motif and had no significant effect on motility phenotype for subsequent mutations ( Supplementary Fig.S3).

Motility evolution experiments
AR2 and AR2 miniTn7[ntrBC] variants were challenged to rescue motility in the absence of the FleQ master flagellar regulator on soft agar, as described previously (Taylor et al. 2015;Horton et al. 2021). Pure colonies were inoculated into 0.25% agar LB plates as described in Alsohim et al., 2014, and incubated at 27°C. At least 20 replicates were performed for each condition. Plates were checked a minimum of twice daily for motility, recording time to emergence. The leading edge of motile zones was sampled immediately after emergence by streaking onto LB agar. Single pure colony were re-streaked, and stored at -80°C as glycerol stocks of LB overnight cultures. All subsequent analysis was conducted on these pure motile isolates. Experiment was run for six weeks and any replicates without motility after this cutoff recorded as having not evolved.

Colony PCR and Sanger sequencing to identify mutations
Motility-conferring mutations were identified by colony PCR and subsequent Sanger sequencing (provided by Eurofins Genomics). PCR amplicons were purified using the Monarch® PCR & DNA Cleanup Kit (New England Biolabs). The genes ntrB, glnK, glnA, and PFLU1131 were screened, which were selected based on being mutational targets previously known to rescue motility in the AR2 background (Taylor et al. 2015;Horton et al. 2021;Shepherd et al. Unpublished Data). Returned sequences were aligned against the P. fluorescens SBW25 reference genome (Silby et al. 2009) using NCBI BLAST to identify mutations. All motile isolates were checked for presence or absence of ntrB mutation, and those without an ntrB mutation were checked for glnK, glnA and PFLU1131 mutation in that order until a mutation was identified.

Whole genome resequencing and calling of SNPs/Indels
Genomic DNA was extracted using the ThermoScientific GeneJET Genomic DNA Purification Kit from ancestral strains, and for motile strains for which no mutation could be identified by Sanger sequencing. Purified gDNA was quality checked using BR dsDNA Qubit spectrophotometry and nanodrop spectrophotometry. Illumina NextSeq 2000 sequencing was provided by the Microbial Genome Sequencing Center (MIGS, Pittsburgh USA), with a minimum 30x coverage. Returned paired-end reads were aligned to the P. fluorescens SBW25 reference genome (Silby et al. 2009) using the Galaxy platform (Afgan et al. 2018). Indels were identified using the integrated genomics viewer (Robinson et al. 2011). For SNP identification, the variant calling software SNIPPY was used with default parameters (Seemann, 2015).

Phenotypic assays and analysis
Motility phenotype of AR2 miniTn7[ntrB'C] strains measured as distance moved after 24 h of incubation in 0.25% agar LB plates. Six biological replicates were grown as separate overnight cultures. OD595 was adjusted so that soft agar plates were inoculated with 1μL of OD595 =1 suspension of cells in PBS. The surface of the agar was pierced with the pipette tip, and sample effused into the gap left by the tip. Photographs were taken of motile zones after 24hrs incubation. Surface area moved was calculated from the radius of the concentric motile zone measured from these images (A = π r²). Values were square root transformed before plotting.
Growth phenotype in shaking LB broth was measured by inoculating 99 μL of sterile LB broth with 1 μL of the OD595 =1 PBS cell suspensions for each replicate in a 96-well plate. Plates were incubated in a plate reader, recording OD595 every ten minutes. Area under the bacterial growth curve was calculated as a measure of fitness using the growthcurver package in R and plotted (Sprouffske and Wagner 2016).

RNA extraction, cDNA preparation and RT-qPCR
RNA was extracted from 20 OD units of P. fluorescens cultures in mid-log phase growth (OD595 ~1.5) in biological triplicates of each strain. At the desired OD, growth and RNA expression was halted by addition of a ½ culture volume of ice-cold killing buffer (20 mM NaN3, 20 mM Tris-HCl, 5 mM MgCl2). Cells were pelleted, and the killer buffer removed.
Pellets were resuspended in buffer RLT from the Qiagen RNeasy extraction kit with βmercaptoethanol added, and cells lysed by bead-milling at 4500 r.p.m. for 45 s with lysing matrix B. Lysates were added to Qiagen RNeasy extraction columns, and the extraction completed following the kit protocol. A DNase I treatment step (RNase-free DNase kit -Qiagen) was included between buffer RW1 washes, by adding DNase I directly to the column.
RNA was eluted in nuclease-free water, and subsequently treated with TURBO DNase from the Turbo DNA-free kit (Invitrogen) following kit protocol.
Purified RNA concentration was measured by Qubit RNA BR assay (ThermoScientific), RNA quality by nanodrop spectrophotometry, and RNA integrity and agarose gel electrophoresis.
Production of cDNA for subsequent qPCR was performed from extracted RNA using the Protoscript-II First strand cDNA synthesis kit (New England Biolabs) with random hexamer priming following kit protocols. RT-qPCR was used to measure gene expression of ntrB and ntrC by the comparative Ct (ΔΔCt) method with gyrB as an endogenous reference. Reaction plates were set up using SYBR green PCR master mix (applied biosystems), with cDNA template preps diluted to 10 -2 in nuclease-free water.

∆MutS strain assays
AR2∆mutS strains were assembled via two-step allelic exchange as outlined above. Primers used (Supplementary Table S3) produced a knockout construct that deleted nucleotides 760-2554 in the coding region of mutS. Three individual ∆mutS mutants were isolated and all three utilised at equal frequencies in subsequent experiments and data pooled for analysis. This mitigated the impact of any secondary mutations occurring after knockout construction during -80 banking and routine culturing. To test for mis-match repair deficiency, a fluctuation assay using rifampicin was performed to estimate relative mutation rates between AR2 mutS and ancestral AR2. This approach is outlined in (Vogwill et al. 2014), and guided by parameters outlined by (Foster 2006).
In brief, the pooled AR2 ∆mutS lines and three AR2 colonies were used to inoculate precultures in 10mL LB liquid broth until early-mid log phase (OD595 = 0.4). Precultures were harvested and corrected to OD595 = 1. In a 96-well plate, 1μL of cells was added to 99μL LB broth with ten biological replicates for both AR2ΔmutS and AR2. The assay was incubated for 23hrs free from antibiotic selection. The contents of each well were plated onto LB plates supplemented with 30μg/mL rifampicin, incubated, and the number of resistant colonies counted. Mutation rate was estimated from the frequency of spontaneous mutations in rpoB conferring rifampicin resistance, as a proxy for rates across the genome (Krašovec et al. 2019).

Statistical analyses
All statistical analysis and data handling was performed using R core statistical packages aside from the Dunn.test package. Shapiro-Wilks normality tests were performed to confirm nonnormality of datasets. To compare distributions across two groups, a Wilcoxon rank sum test with continuity correction was performed, with a P ≤ 0.05 taken to indicate significance. To compare group medians for more than two groups, a Kruskal-Wallis test with post-hoc Dunn test and Benjamini-Hochberg correction was performed, with a P ≤ 0.025 taken to indicate significance. To compare frequency counts containing low values, a Pearson's Chi-squared test was used with a Monte Carlo simulation (replicates = 20000) to generate p-values, with P ≤ 0.05 taken to indicate significance. To compare observed counts against those expected at given probabilities, a Bootstrap test was performed by drawing a subset of values at defined likelihoods 20 times for n iterations to form an expected distribution.

Data availability
The data underlying this article are available in the Open Science Framework, at (AR2 miniTn7[ntrBC-Lead] strain S-1) and the illicit NtrC D468N is compared with an AR2 miniTn7[ntrBC-sm-Lead] strain with the same NtrB D228G mutation but lacking NtrC D468N (AR2 miniTn7[ntrBC-sm-Lead] strain S-25). No significant difference was found between the two strains for motility in soft agar (A) and growth in shaking LB broth (B) assays (P= 0.06209 and P= 0.2777 respectively, Kruskal-Wallis tests).