Dissecting and tuning primer editing by proofreading polymerases

Abstract Proofreading polymerases have 3′ to 5′ exonuclease activity that allows the excision and correction of mis-incorporated bases during DNA replication. In a previous study, we demonstrated that in addition to correcting substitution errors and lowering the error rate of DNA amplification, proofreading polymerases can also edit PCR primers to match template sequences. Primer editing is a feature that can be advantageous in certain experimental contexts, such as amplicon-based microbiome profiling. Here we develop a set of synthetic DNA standards to report on primer editing activity and use these standards to dissect this phenomenon. The primer editing standards allow next-generation sequencing-based enzymological measurements, reveal the extent of editing, and allow the comparison of different polymerases and cycling conditions. We demonstrate that proofreading polymerases edit PCR primers in a concentration-dependent manner, and we examine whether primer editing exhibits any sequence specificity. In addition, we use these standards to show that primer editing is tunable through the incorporation of phosphorothioate linkages. Finally, we demonstrate the ability of primer editing to robustly rescue the drop-out of taxa with 16S rRNA gene-targeting primer mismatches using mock communities and human skin microbiome samples.


INTRODUCTION
Starting with the discovery of Taq polymerase (1) and the advent of the Polymerase Chain Reaction (PCR) (2,3), lowering the error rate and improving the efficiency of DNA amplification have been ongoing experimental goals (4).
These goals have been achieved through the discovery of additional naturally occurring polymerases with desirable properties, such as proofreading activity, as well as through engineering polymerases with desirable properties such as improved fidelity, specificity, efficiency, thermostability, inhibitor tolerance, reduced GC bias, and higher processivity (5).
Proofreading polymerases have 3 to 5 exonuclease activity, which allows the detection, excision, and replacement of mis-incorporated bases (6,7). Such proofreading activity serves to minimize the rate of PCR errors and modern engineered enzymes have reported substitution error rates as low as 10 -6 -10 -7 (8). We previously found that in addition to reducing the rate of amplification errors during PCR, proofreading polymerases are able to edit primer sequences to correct mismatches between the primer and template during amplification (9).
Primer editing activity can be used to prevent the dropout of taxa due to mismatches between the amplification primers and the DNA template in amplicon-based microbiome profiling. In such experiments, PCR primers are designed that target highly conserved regions of the 16S ribosomal RNA (rRNA) gene, the 18S rRNA gene, the Internal Transcribed Spacer (ITS) region, the mitochondrial Cytochrome Oxidase I (COI) gene, or other marker genes of interest. However, the binding sites for these 'universal' primers are not perfectly conserved and even with the incorporation of degenerate bases, mismatches between the amplification primer and bacterial template sequences particularly in the critical last 3-4 bases of the primer can dramatically reduce amplification efficiency or even cause taxa to be undetected (10,11). In silico analyses have found that for a set of eight commonly used 16S rRNA gene-targeting primers, rates of primers having a mismatch in the last four nucleotides were >10% in the majority of shotgun metagenomic datasets analyzed (12). In addition, the number of microbial taxa identified through shotgun metagenomic sequencing continue to grow (13,14), leading to the identifica-tion of new taxa with divergent 16S rRNA gene sequences (15). Thus, methods to improve the recovery of taxa in spite of mismatches in marker gene amplification primers will help improve the accuracy of amplicon-based microbiome profiling.
In this study, we develop a set of synthetic DNA standards to report on primer editing activity and use them to characterize this phenomenon in more detail. Using this next-generation sequencing-based enzymology approach, we find that many proofreading polymerases are able to mediate primer editing, though to different extents, that primer editing can be tuned by adjusting enzyme concentration or through the incorporation of phosphorothioate bonds, and we demonstrate that primer editing can robustly and quantitatively rescue the dropout of key taxa in 16S rRNA gene microbiome profiles from human skin microbiome samples.

Design, synthesis and cloning of primer editing standards
Primer editing standards were designed consisting of 332 bp of the Escherichia coli 16S rRNA gene spanning variable region 4, the V4 515F and V4 806R primer sites, and 20 bp of flanking sequence. Thirty-one different constructs were designed, with either the wild-type E. coli V4 515F primer binding site or each possible single base mismatch within the last 10 bp of the E. coli V4 515F primer binding site (Figure 1). In addition, the constructs contain an I-SceI site for plasmid linearization, as well as a REcount PCR-free quantification barcode (16) for accurately measuring the relative construct abundance within the standard pool. See Supplemental File 1 for primer editing construct designs. Constructs were synthesized as DNA tiles by SGI-DNA and assembled and inserted into a pUCGA 1.0 backbone using the BioXP instrument. Successful assembly was verified by gel electrophoresis and plasmids were transformed into E. coli 5-alpha cells (NEB). Plasmid DNA from multiple independent clones was isolated for each construct using a Zymo Zyppy 96-well plasmid prep kit. Clones were sequenced by Sanger sequencing using the following sequencing primers and mutation-free clones for each construct were selected for further experiments: pUCGA1.0-Sanger For: CGACTCTAGAGGATCGAG CACA pUCGA1.0-Sanger Rev: TTCGAGCTCGGTACCCGC AT Sequence-verified clones were consolidated onto a single 96-well plate, normalized to 10 ng/l with nuclease-free water, and pooled evenly (4 l of each construct). The primer editing standard pool was diluted to a working concentration of 1 million molecules/construct/l by diluting 5 l of the 10 ng/l standard pool in 452.95 l of nuclease-free water.

REcount-based quantification of primer editing standard pool
The following restriction digest was used to liberate the REcount PCR-free quantification barcodes: 17 l primer editing standard pool DNA (10 ng/l), 2 l NEB CutSmart buffer, 1 l MlyI (NEB). The digest was incubated at 37 • C for 1 h. 30 l of water was added to the digest (to bring the volume up to 50 l). Next, 30 l of AMPureXP (Beckman Coulter) beads (0.6×) were added, beads were collected on a magnetic stand and transferred supernatant to new tube (discarded beads). Purified the supernatant using 1.8x AMPureXP beads (Beckman Coulter) and eluted in 25 l of EB (Qiagen). The resulting sample was quantified using a Quant-iT PicoGreen dsDNA assay (Thermo Fisher Scientific), fragment sizes were assessed using an Agilent Bioanalyzer High Sensitivity assay and normalized to 2 nM for sequencing. The library was denatured with NaOH and prepared for sequencing according to the protocols described in the Illumina MiSeq Denature and Dilute Libraries Guide and sequenced in a portion of a MiSeq 600 cycle v3 run. Demultiplexed fastq files were generated using Illumina's bcl2fastq software. REcount data were analyzed as previously described (16) using custom Python scripts and BioPython (17). The first 20 bp of the sequencing reads was mapped against a barcode reference file (Supplemental File 2), with a maximum of two mismatches allowed, using an analysis script which is available on GitHub (https://github.com/darylgohl/REcount).
Phusion: 2.5 l DNA template, 3.7 l nuclease-free water, 2 l 5x Phusion buffer (NEB), 0.2 l 10 mM dNTPs (NEB), 0.5 l DMSO (Fisher Scientific), 0.1 l Phusion Polymerase (1× polymerase condition, NEB), 0.5 l forward primer (10 M, IDT), 0.5 l reverse primer (10 M, IDT). For the Phusion enzyme concentration tests the amount of enzyme added was adjusted appropriately (0.025 l was added for the 0.25× condition, 0.05 l was added for the 0.5× condition, and 0.2 l was added for the 2× condition) and the amount of nuclease-free water was adjusted to compensate for the missing or added volume. Vent: 2.5 l DNA template, 4.7 l nuclease-free water, 1 l 10x Vent buffer (NEB), 0.2 l 10 mM dNTPs (NEB), 0.5 l DMSO (Fisher Scientific), 0.1 l Vent Polymerase (1× polymerase condition, NEB), 0.5 l forward primer (10 M, IDT), 0.5 l reverse primer (10 M, IDT). For the Vent enzyme concentration tests the amount of enzyme added was adjusted appropriately (0.025 l was added for the 0.25× condition, 0.05 l was added for the 0.5× condition, and 0.2 l was added for the 2× condition) and the amount of nuclease-free water was adjusted to compensate for the missing or added volume.
Pfu 0.25× condition, 0.1 l was added for the 0.5× condition, and 0.4 l was added for the 2× condition) and the amount of nuclease-free water was adjusted to compensate for the missing or added volume.
Accuprime Taq: 2.5 l DNA template, 4.55 l nucleasefree water, 1 l 10× Accuprime buffer (Thermo Fisher Scientific), 0.2 l 10 mM dNTPs (NEB), 0.5 l DMSO (Fisher Scientific), 0.25 l Accuprime Polymerase (1× polymerase condition, Thermo Fisher Scientific), 0.5 l forward primer (10 M, IDT), 0.5 l reverse primer (10 M, IDT). For the Accuprime Taq enzyme concentration tests the amount of enzyme added was adjusted appropriately (0.0625 l was added for the 0.25× condition, 0.125 l was added for the 0.5× condition, and 0.2 l was added for the 2× condition) and the amount of nuclease-free water was adjusted to compensate for the missing or added volume.
NEB Taq For the NEB Taq enzyme concentration tests the amount of enzyme added was adjusted appropriately (0.0125 l was added for the 0.25× condition, 0.025 l was added for the 0.5× condition, and 0.1 l was added for the 2× condition) and the amount of nuclease-free water was adjusted to compensate for the missing or added volume.
Qiagen Taq: 2.5 l DNA template, 4.47 l nuclease-free water, 1 l 10× Taq buffer (Qiagen), 0.08 l 10 mM dNTPs (NEB), 0.5 l DMSO (Fisher Scientific), 0.4 l MgCl 2 , 0.05 l Taq Polymerase (1x polymerase condition, Qiagen), 0.5 l forward primer (10 M, IDT), 0.5 l reverse primer (10 M, IDT). For the Qiagen Taq enzyme concentration tests the amount of enzyme added was adjusted appropriately (0.0125 l was added for the 0.25× condition, 0.025 l was added for the 0.5× condition, and 0.1 l was added for the 2× condition) and the amount of nuclease-free water was adjusted to compensate for the missing or added volume.
The following PCR cycling conditions were used: KAPA HiFi: 95 • C for 5 min, followed by 30 cycles of 98 • C for 20 s, 55 • C for 15 s, 72 • C for 1 min, followed by 72 • C for 10 min.
Primary PCRs were then diluted 1:100 in sterile, nucleasefree water, and a second PCR reaction was set up to add the Illumina flow cell adapters and indices. All samples were indexed using KAPA HiFi polymerase using the following KAPA HiFi indexing PCR recipe: 5 l 1:100 DNA template, 5 l template DNA, 1 l nuclease-free water, 2 l 5× KAPA HiFi buffer (Kapa Biosystems), 0. The indexing PCR reactions were then purified and normalized using a SequalPrep normalization plate (Thermo Fisher Scientific), followed by elution in 20 l of elution buffer. An even volume of the normalized libraries was pooled and concentrated using 1× AmpureXP beads (Beckman Coulter). Pooled libraries were quantified using a Quant-iT PicoGreen dsDNA assay (Thermo Fisher Scientific), fragment sizes were assessed using an Agilent Bioanalyzer High Sensitivity assay, and libraries were normalized to 2 nM for sequencing. The library was denatured with NaOH and prepared for sequencing according to the protocols described in the Illumina MiSeq Denature and Dilute Libraries Guide and sequenced in a portion of a MiSeq 600 cycle v3 run.

Testing effects of annealing temperature
To test the effect of annealing temperature on primer editing, reactions were set up in triplicate for each of the eight temperature conditions tested with KAPA HiFi HotStart or Qiagen Taq at 1× enzyme concentration using the recipes above. The primer editing standard pool was used at a template concentration of 10,000 template molecules/construct/l (25,000 template molecules per construct in the amplification reaction). Reactions were run using the above cycling conditions for either KAPA HiFi or Qiagen Taq but using a gradient of annealing temperatures from 50 to 60 • C. The eight temperatures tested were 50, 50.7, 52.1, 54, 56.2, 58.1, 59.4 and 60 • C. The resulting amplicons were indexed using KAPA HiFi, normalized, quantified, and sequenced as described above.

Amplification of wild type E. coli template with variant primers
A set of Nextera-tailed primers containing all 31 variants corresponding to those in the primer editing standards were designed (Supplemental File 3). These forward primers were pooled evenly and used to amplify a plasmid containing a tagged E. coli 16S rRNA template with wild-type primer binding sites, together with the E coli V4 806R reverse primer with either KAPA HiFi polymerase or Qiagen Taq polymerase across a gradient of annealing temperatures using the PCR recipes and cycling conditions described above. The resulting amplicons were indexed using KAPA HiFi, normalized, quantified, and sequenced as described above.

Amplification with phosphorothioate-modified primers
To test the phosphorothioate primers, the primer editing standards were amplified at a template concentration of 10,000 template molecules/construct/l (25,000 template molecules per construct in the amplification reaction) with either KAPA HiFi, Q5, or Phusion polymerase, at the 1x enzyme concentration using the recipes and cycling conditions above. The following exonuclease-protected primers were used, together with the the E coli V4 806R reverse primer (16S-specific sequences in bold, '*' indicates the position of a phosphorothioate bond): The resulting amplicons were indexed using KAPA HiFi, normalized, quantified, and sequenced as described above.

Amplification of HM-276D mock community using V1V3, V4, V3V4, and V4V6 primers
The HM-276D mock community DNA was obtained through BEI Resources, NIAID, NIH: Genomic Mock Community B (HM-276D, Even, High Concentration, v5.1H). Reactions were set up in triplicate for each of the four primer sets with KAPA HiFi HotStart or Qiagen Taq at 1× enzyme concentration using the recipes and cycling conditions described above, with the following differences: For V1V3 amplification with KAPA HiFi and Qiagen Taq, a 5 minute 72 • C final extension step was used as opposed to 10 min. For V4 amplification with KAPA HiFi and Qiagen Taq, no 72 • C final extension step was used as opposed to 10 min.
The following primers were used (16S-specific sequences in bold): The resulting amplicons were indexed using KAPA HiFi, normalized, quantified, and sequenced as described above.

Collection and extraction of skin microbiome samples
Skin samples from different body sites (scalp, face, armpit, toe web, and forearm) were collected using either prototype or commercially released OMR-140 OMNIgene•SKIN collection devices, according to manufacturer's instructions. Samples were treated with 5 l of Proteinase K (80 mg/ml) and incubated at 50 • C in a water bath for 1h, then DNA was extracted either using the Qiagen PowerSoil Pro Kit or an optimized bead beating based low biomass extraction method co-developed by DNA Genotek and Diversigen. Briefly, the entire volume (1 ml) of an OMR-140 collected sample is transferred to a bead beating tube containing a mix of 0.5 and 0.1 mm glass beads and then homogenized for 10min. Following inhibitor removal, the DNA is purified on a silica column, washed and finally eluted in nuclease free water.

Preparing 16S rRNA gene variable region V4 or V1V3 sequencing libraries from skin microbiome samples
Skin sample 16S rRNA gene libraries were prepared and sequenced by Diversigen, Inc. Briefly, samples were quantified via qPCR using primers for variable region 4 of the 16S rRNA gene (V4 515F/V4 806R). Samples were amplified as previously described (9) using either KAPA HiFi polymerase or Qiagen Taq and primers for either the V4 (V4 515F/V4 806R) or V1V3 (V1 27F/V3 534R) variable regions. The resulting amplicons were indexed using KAPA HiFi polymerase, normalized, quantified, and sequenced as described above.

Data analysis
Primer editing analysis. Analysis of edits to primer sequences was carried out as previously described (9). Mismatches to the V4 primer sequences were identified using custom Python scripts and BioPython (17). Illumina adapters were trimmed using cutadapt (18) and paired reads were merged using PANDAseq (19). The first 19 bases (V4 F primer) or 20 bases (V4 R primer) were then compared to the reference sequences and mismatching bases were enumerated. In order to filter out noise from indels in the primer regions, a threshold of a maximum of three mismatches per read was used for this analysis. The proportion of mismatched bases observed at each position was divided by the proportion of expected variants in the primer editing standard plasmid pool as assessed by REcount measurements (16) to generate the Observed edits/Expected edits metric. Sequencing data were alternatively analyzed by counting the number of each base at each position in the primer sequence. The results of these analyses were consistent with the Observed edits/Expected edits metric reported in the paper. Scripts used to analyze the primer editing standards are available on GitHub (https://github.com/ darylgohl/PrimerEditing).
HM-276D mock community analysis. Fastq files were evenly subsampled down to a maximum of 50,000 reads per sample. Primer and adapter sequences were trimmed off the ends of reads using cutadapt (18). Forward and reverse reads were stitched using PANDAseq (19). The resulting reads were mapped to an HMP mock community reference file (20) using the BURST aligner (version: embalpha 0.99.3 mac) -see: https://dx.doi.org/10.1101/2020.09. 08.287128 (21).

Design and construction of primer editing standard pool
In order to characterize the phenomenon of primer editing in more detail, we designed and constructed a set of 31 primer editing standard plasmids containing the wild type Escherichia coli (E. coli) V4 515F primer binding site as well as every possible single base mismatch within the last 10 nucleotides of the E. coli V4 515F primer binding site ( Figure  1A and B). The primer editing standard plasmids also include the E. coli 16S rRNA gene variable region 4 sequence and the wild type V4 806R primer binding site. In addition, each of these plasmids contains a unique REcount barcode construct which allows highly-accurate PCR-free measurement of the composition of the standard pool (16). The primer editing plasmids were sequence-verified and pooled evenly. The composition of the primer editing pool was verified by digesting the pool with MlyI to liberate the REcount barcode constructs, sequencing, and determining the percent abundance for each plasmid barcode ( Figure 1C).
The primer editing standard plasmid pool was then used to create an amplicon sequencing library with KAPA HiFi polymerase, which we previously demonstrated has primer editing activity (9), using primers corresponding to the wildtype E. coli V4 515F and V4 806R primer binding sites. In the resulting sequencing reads, substantial editing of the forward amplification primer was observed for the last 7 bases of the primer sequence. Between 8-13% of the sequencing reads contained a non-wild-type base within the last 7 bases of the amplification primer ( Figure 1C), consistent with the composition of the template sequences where variants were present at each position in 3/31 plasmids (9.68%). The percentage of edits observed within the last 6 bases of the primer sequence tracked the REcount abundance measurements, suggesting that editing at these positions was near complete, while the 7th base from the 3 end was likely incompletely edited. Primer editing was barely detected 8 bases from the 3 end of the primer. As expected, no appreciable primer editing was observed for the wild type reverse primer reads, in which the template sequence perfectly matched the amplification primer in all of the primer editing standard plasmids ( Figure 1D).

Ability of different proofreading polymerases to mediate primer editing
In previous work, we demonstrated that both KAPA HiFi and NEB Q5 polymerases exhibit primer editing activity (9). We used the primer editing standards to explore whether other commercially available enzymes were able to mediate primer editing (Figure 2A). In addition to KAPA HiFi and Q5, we tested three additional proofreading enzymes (Pfu, Phusion, and Vent polymerase) and three nonproofreading polymerases (NEB Taq, Qiagen Taq, and Accuprime Taq). All five proofreading polymerases exhib- ited some level of primer editing activity, though the extent of editing observed varied for each proofreading polymerase, ranging from near complete editing of up to 6 bases for KAPA HiFi polymerase to near complete editing of only the last 2 bases for Phusion polymerase ( Figure  2A). No appreciable primer editing activity was observed for any of the non-proofreading Taq polymerases tested (Figure 2A).

Primer editing is dependent on enzyme concentration
We previously hypothesized that the primer editing activity of KAPA HiFi polymerase was dependent on enzyme concentration (9). We generated sequencing libraries by amplifying the primer editing standard plasmid pool with each of the enzymes tested above across a range of enzyme concentrations (0.25×, 0.5×, 1× and 2× the manufacturer's recommended concentration). Consistent with our previous observations, the extent of primer editing exhibited a dependence on the concentration of the KAPA HiFi polymerase in the reaction ( Figure 2B). A similar concentration dependence was observed with the other four proofreading polymerases tested (Supplemental Figure S1).

Primer editing exhibits minimal sequence-specificity
Minimal sequence specificity was observed in primer editing ( Figure 2C). In general, there was no clear pattern in the base composition of the primer edits, with the possible exception that a slightly increased proportion of G and C edits were observed at the 3 end of the primer, relative to T edits ( Figure 2C). It is possible that this reflects a small increase in amplification efficiency due to the presence of a stronger G:C bond in the 3 end of the primer.

Effect of primer mismatches on amplification efficiency
Next, we examined the extent to which primer editing is expected to rescue the amplification of templates that have primer mismatches. Mismatches within the last 3 bases of the primer sequence have been shown to be the most deleterious to amplification efficiency (10,11). To examine the effect of the individual primer mismatches on amplification efficiency, we used a pool of primers that contained the same 31 variants as the primer editing standard plasmids (variant primers) to amplify an E. coli template with wild type V4 515F and V4 806R primer binding sites with either KAPA HiFi polymerase or a non-proofreading Taq polymerase. For KAPA HiFi polymerase, we observed edits of the variant primers to match the wild type template sequence in a pattern that was roughly complementary to the editing of the wild type primer to match the variants encoded by the primer editing standard plasmids ( Figure  3A). For the non-proofreading Taq polymerase, where no editing of the wild-type primer to match the primer editing standards is seen, we examined the proportion of variant primers able to amplify the wild-type template as a measure of the amplification penalty incurred by each mismatch ( Figure 3B). Variant primers at all positions were permissive for amplification by Taq at some level. However, variants within the last five bases of the forward amplification primer exhibited notable amplification penalties as reads containing these variants were present at 0.922, 0.774, 0.404, 0.318, 0.189-fold their expected values. We also tested whether primer editing or the permissiveness of amplification of primer mismatches were sensitive to annealing temperature by performing the above amplifications across a gradient of annealing temperatures ranging from 50 to 60 • C. Within this range of annealing temperatures, there were no effects on the amount of primer editing or the amount of amplification by variant primers (Supplemental Figure S2). Thus, primer editing helps to overcome reduced amplification efficiencies due to primer mismatches within the last 4-5 bases of the amplification primer, while in the case of the V4 515F primer variants tested, variants beyond 5 bases from the 3 end of the primer were generally permissive for amplification.

Primer editing is tunable through the incorporation of phosphorothioate bonds in amplification primers
Phosophorothioate bonds (in which a non-bridging oxygen within the oligonucleotide phosphate backbone is replaced with a sulfur) have been previously shown to block the activity of various exonucleases and are commonly used to prevent exonuclease degradation of oligonucleotides (24)(25)(26). Using the primer editing standard plasmid pool, we tested whether phosphorothioate bonds could block primer editing by preventing the 3 to 5 exonuclease activity of proofreading polymerases. For KAPA HiFi polymerase, Q5 polymerase, and Phusion polymerase, a single phosphorothioate bond was able to block primer editing beyond the position of the phosphorothioate bond (Figure 4, Supplemental Figure S3). Thus, the extent of primer editing can be tuned through the strategic placement of phosphorothioate bonds in the amplification primers. This ability to limit the extent of primer editing can be used to prevent the formation of undesirable reaction products such as primer dimers. We tested the incorporation of phosphorothioate bonds into an ITS1 (Internal Transcribed Spacer) primer set used for fungal microbiome profiling where we previously saw extensive primer dimer formation during amplification with KAPA HiFi polymerase. The addition of phosphorothioate bonds between the third and fourth to last bases of the forward and reverse primers eliminated the formation of primer dimers for this ITS1 primer set ( Figure 5).

Effect of primer mismatches on amplification of DNA mock community samples
Using DNA-based mock microbial communities and other non-human primate and human samples, we previously demonstrated that primer editing could mitigate the dropout of taxa due to primer mismatches in amplicon-based microbiome profiling (9). We amplified a mock microbial community made by the Human Microbiome Project (HM-276D) using each of the eight polymerases tested in Figure 2. As expected, the three non-proofreading polymerases (AccuPrime Taq, NEB Taq, Qiagen Taq) were unable to amplify Cutibacterium acnes, which has mismatches between the 16S rRNA gene template and the 3 end of the V4 variable region amplification primers ( Figure 6A). All five proofreading enzymes were able to amplify C. acnes, despite the primer mismatches; the recovery of C. acnes was reduced for the sample amplified with Phusion polymerase (Figure 6A), consistent with the weaker primer editing activity observed with this enzyme with the synthetic primer editing standards (see Figure 2A). Other 16S rRNA gene variable regions besides the V4 variable region are often used for specific sample types, as different research communities strive to optimize the taxonomic resolution or minimize the drop-out of key taxa. For instance, the V1V3 variable regions have been recommended for the profiling of human skin associated communities on the basis of improved resolution of key taxa and due to the fact that drop-out of C. acnes occurs when the V4 variable region is used in the absence of primer editing (27,28). We examined the effect of using primers targeting different variable regions on the amplification of the Human Microbiome Project mock community (HM-276D) ( Figure 6B). The detection of near-expected levels of C. acnes was dependent on using a proofreading polymerase for primer sets targeting the V4 variable region and also the V3V4 and V4V6 variable regions, both of which contain 3 mismatches in one of the two amplification primers ( Figure  6C, Supplemental Figure S4). As previously reported (27), C. acnes template was amplified by V1V3 primers, both in the presence or absence of primer editing ( Figure 6C, Supplemental Figure S4). The V1V3 reverse primer contains a single mismatch to the C. acnes template near the 5 end primer which would not be expected to interfere with primer annealing or extension in the absence of primer editing (see Figure 3B).

Primer editing enables robust detection of Cutibacterium in skin microbiome samples with the 16S rRNA gene V4 variable region
In order to determine whether primer editing by a proofreading polymerase can overcome the drop-out of C. acnes observed in human skin microbiome samples with primers targeting the 16S rRNA gene V4 region, we compared the V4 and V1V3 microbiome profiles from a number of skin body sites amplified using either the proofreading KAPA HiFi polymerase or the non-proofreading Taq polymerase ( Figure 7). As expected (27), substantial levels of the Cutibacterium genus was observed for a variety of skin body sites, including face ( Figure 7A), forearm ( Figure   7B), armpit ( Figure 7C), and scalp ( Figure 7D) when these samples were amplified with the V1V3 primers. This was true for amplification using both a proofreading polymerase (KAPA HiFi) and a non-proofreading polymerase (Qiagen Taq). A greater proportion of Cutibacterium was observed in the case of amplification of the V1V3 region with Taq polymerase (Supplemental Figure S5) and for these samples, it is not certain which enzyme is more reflective of the actual composition of the samples. However, a similar effect was seen for the HMP mock community, where the amount of C. acnes observed when the V1V3 variable region was amplified with Taq polymerase was 40% higher than when amplified with KAPA HiFi ( Figure 6B). In the case of the HMP mock community, the KAPA HiFi measurements were closer to the expected 5% abundance of C. acnes, suggesting that for the V1V3 amplicon the abundance of Cutibacterium may be overestimated when amplified with Taq polymerase (Supplemental Figure S5).
When this set of skin microbiome samples was amplified using the 16S rRNA gene V4 primers, the abundance of Cutibacterium was substantially decreased for samples amplified with Taq polymerase, likely due to the reduced amplification efficiency caused by the mismatches between the V4 primers and the Cutibacterium 16S template ( Figure 7A-D). In contrast, when the same skin samples were amplified using the 16S rRNA gene V4 primers and KAPA HiFi polymerase, similar levels of Cutibacterium were observed as when the V1V3 region was amplified with KAPA HiFi (Figure 7A-E). Thus, the incorporation of primer editing into the design of amplicon-based microbiome experiments can mitigate the effect of taxa dropout due to primer mismatches and in the case of human skin microbiome samples enables the recovery of similar levels of C. acnes using 16S rRNA gene V4 and V1V3 primers.

DISCUSSION
Here, we report a novel set of synthetic standards that allowed us to examine the phenomenon of primer editing by proofreading polymerases in detail. We use these syn-  thetic standards to demonstrate that a variety of proofreading polymerases, including KAPA HiFi, NEB Q5, Phusion, Pfu and Vent, can all mediate primer editing to enable efficient amplification of templates which have mismatches to the amplification primers ( Figure 2). The amount of primer editing observed varies between the different enzymes, is sensitive to enzyme concentration, and exhibits minimal sequence specificity (Figure 2).
Kinetic studies of proofreading polymerases such as the T7 DNA polymerase have demonstrated that selective exonucleolytic repair of mismatches relies on kinetic partitioning between the polymerase and exonuclease sites (29) where incorporation of a mismatch slows the rate of poly-merization (30), allowing transfer of the mismatched DNA to the exonuclease site from the normally thermodynamically favored polymerase site (29,31). Modern engineered polymerases such as KAPA HiFi and NEB Q5 polymerase are the result of targeted protein engineering and directed evolution (32)(33)(34). During these processes, enzymes are selected on the basis of a number of desirable properties, such as error rate, GC bias, and processivity. It is possible that the kinetic parameters that determine the balance of exonuclease and polymerase activity were intentionally or inadvertently altered during this selection process. While determining the detailed mechanisms of the differences in primer editing exhibited by different enzymes will be an interest- ing topic for future studies, we speculate that enzymes such as KAPA HiFi may have increased exonuclease activity or accessibility that explain their enhanced primer editing activity and may also correlate with other desirable enzyme properties (such as lower error rates).
Using the synthetic primer editing standards, we determined that for some enzymes (KAPA HiFi and Pfu) nearcomplete editing of mismatches could extend as far as 6 bases from the end of the primer (Figures 1 and 2). Kinetic studies have demonstrated that mismatches or lesions in the n-1 to n-4 positions can induce polymerase stalling (35)(36)(37) and structural studies using the proofreading Bacillus stearothermophilus DNA polymerase large fragment indicated that mismatches as far as 4 bases from the point of incorporation can disrupt the polymerase active site, leading to a 'short-term memory' of mis-incorporated bases (38). To enable primer editing at the n-6 position, it is possible that enzymes such as KAPA HiFi and Pfu have active sites that are sensitive to perturbation by mismatches over longer ranges. Alternatively, non-specific degradation or excision at the 3 end of the primer could help explain the detection and editing of n-6 mismatches and is also consistent with the elevated levels of primer dimer formation often seen with proofreading enzymes.
The extent of primer editing is tunable through the incorporation of phosphorothioate bonds in the amplification primers, which block the polymerase 3 to 5 exonuclease activity ( Figure 4). Thus, the primer editing standard plasmid pool enabled us to carry out DNA sequencing-based enzymology and further characterize and refine the utility of primer editing. The incorporation of strategically placed phosphorothioate bonds allows one to take advantage of the benefits of primer editing while reducing detrimental side-effects such as formation of primer dimers ( Figure 5), or degradation of primer specificity.
The introduction of a phosphorthioate bond creates a chiral center in the oligonucleotide backbone and during phosphoramidite synthesis roughly equal proportions of Rp and Sp diastereomers are incorporated (24). Many nucleases are sensitive to the chirality of the phosphorothioate e87 Nucleic Acids Research, 2021, Vol. 49, No. 15 PAGE 12 OF 13 bond and multiple consecutive phosphorothioate bonds are often used to completely block exonuclease activity (25,26). The fact that a single phosphorothioate bond was able to block essentially all primer editing suggests that the 3 to 5 exonuclease activity of the proofreading polymerases we tested is either not sensitive to the chirality of the phosphorothioate bond or that the cumulative effect of random incorporation of the blocking stereoisomer over the course of multiple PCR cycles is able to substantially inhibit the efficiency of amplifying edited primers beyond the site of phosphorothioate incorporation.
Using both mock microbial communities and human skin microbiome samples we show that primer editing can be used to minimize the dropout or under-amplification of taxa with mismatches to amplification primers across a number of commonly used microbiome primer sets. In particular, we demonstrated that primer editing can overcome a previously documented shortcoming of the V4 515F/V4 806R primer set in amplifying C. acnes from skin microbiome samples (27,28), recovering similar levels of Cutibacterium as those observed with V1V3 primers (Figure 7). It should be noted that suitable sequencing strategies are also required to successfully employ primer editing in microbiome experiments, as we previously demonstrated that using custom sequencing primers can negate the beneficial effects of primer editing (9). In silico analysis and comparison to shotgun microbiome data suggests that primer mismatches in the critical final 3-4 bases of frequently used 16S rRNA gene amplification primers are relatively common (12), and the discovery of new phyla with divergent 16S rRNA gene sequences (15) further support the use of amplification strategies that can overcome primer mismatches as a means to improve the accuracy of amplicon-based microbiome measurements.