Arrayed in vivo barcoding for multiplexed sequence verification of plasmid DNA and demultiplexing of pooled libraries

Abstract Sequence verification of plasmid DNA is critical for many cloning and molecular biology workflows. To leverage high-throughput sequencing, several methods have been developed that add a unique DNA barcode to individual samples prior to pooling and sequencing. However, these methods require an individual plasmid extraction and/or in vitro barcoding reaction for each sample processed, limiting throughput and adding cost. Here, we develop an arrayed in vivo plasmid barcoding platform that enables pooled plasmid extraction and library preparation for Oxford Nanopore sequencing. This method has a high accuracy and recovery rate, and greatly increases throughput and reduces cost relative to other plasmid barcoding methods or Sanger sequencing. We use in vivo barcoding to sequence verify >45 000 plasmids and show that the method can be used to transform error-containing dispersed plasmid pools into sequence-perfect arrays or well-balanced pools. In vivo barcoding does not require any specialized equipment beyond a low-overhead Oxford Nanopore sequencer, enabling most labs to flexibly process hundreds to thousands of plasmids in parallel.


Introduction
Sequence verification of plasmid DNA is a cornerstone of many cloning and molecular biology workflows.Sanger sequencing, a method developed in 1977 ( 1 ) and commercialized in 1986, remains one of the most commonly used methods.In its current form, Sanger sequencing uses a DNA primer, a DNA polymerase, and fluorescent dideoxy chain terminating nucleotides to produce DNA fragments, which are then separated by capillary electrophoresis to 'read' which nucleotide caused a termination at each position.The method requires a relatively clean plasmid template (e.g. a miniprep) or amplicon and results in reads of < 1 kb for ∼$2-5 / reaction, not including the cost of DNA purification ( ∼$2) or a sequencing primer ( ∼$8).Although essential for small-scale studies, Sanger sequencing can be expensive and time-intensive for applications that require sequencing many plasmid clones or long DNA molecules: each clone requires a separate plasmid purification and each ∼500-600 bp region of a long DNA molecule requires a separate sequencing primer and Sanger reaction.Because of this, alternative methods that leverage high-throughput sequencing technologies are being developed.
In vitro methods have been developed to multiplex highthroughput sequencing by introducing DNA barcode 'indices' via PCR (2)(3)(4)(5) or Tn5 transposase tagmentation ( 6 ,7 ) (see also https:// www.octant.bio/blog-posts/ octopus-v3 ), enabling the short-read (e.g.Illumina, MGI, Element) or longread (e.g.Oxford Nanopore Technology (ONT), PacBio) sequencing platforms to sequence many barcoded samples at once.Illumina provides the high throughput and accuracy at low cost, but maximum read lengths of 125-300 bp make barcoding and read assembly schemes more challenging and can miss some common plasmid features such as long repeated elements, structural variation, or plasmid multimers ( 8 ,9 ).By contrast, the ONT platform generates long reads spanning entire plasmids at reasonable quality and offers better discrimination between these long DNA features ( 10 ,11 ).Additionally, the ONT instrument is inexpensive enough to be purchased and run by most academic labs, and the ability to vary the user-defined runtime enables labs to scale the sequencing throughput and cost to meet variable demand.This flexibility means that a highthroughput plasmid barcoding and sequencing method built on the ONT platform could be more widely accessible.Indeed various in vitro protocols ( 3 ,7 ) and commercial services ( https: // www.plasmidsaurus.comand https:// www.primordiumlabs.com ) have been developed to more efficiently multiplex ONT plasmid sequencing.However, all these methods require that each plasmid is purified from cells and barcoded individually, with these library preparation steps having an outsized impact on the overall cost and throughput of plasmid sequencing.
Here, we develop a B acterial P ositioning S ystem ( BPS ): a platform that uses bacterial conjugation, in vivo DNA cutting, and in vivo recombination to barcode and index plasmids in large bacterial arrays.This platform enables different samples to be pooled before plasmid isolation, library preparation, and ONT sequencing, greatly increasing throughput of routine plasmid sequencing.We show that BPS can sequence, with high accuracy and recovery rate, tens of thousands of plasmids in parallel at a cost between $0.12 and $1.40 per plasmid (a 5-to 70-fold cost reduction relative to existing protocols).To demonstrate new capabilities that come with this increased scale, we show that BPS can be used to transform overdispersed error-containing oligonucleotide and gene library pools into sequence-verified arrays and well-balanced pools.

BPS protocol for in vivo barcoding and pooled sequencing of plasmid arrays
For the most current version of a 'wetbench' protocol, visit http:// darachm.gitlab.io/bps/ and navigate to the 'BPS Proto-col'.For questions, suggestions, or issues, please open an Issue at http:// gitlab.com/darachm/ bps/ -/ issues .

Construction of barcoded donor plasmids and arrayed donor clones
The backbone of donor vectors (pSL438 and pSL439, Supplementary Table S4 ) were constructed to contain (i) KanR (kanamycin resistance), (ii) oriT (origin of transfer), (iii) R6K ori γ (conditional replication origin depending on the phage-derived pir1 expression) and (iv) a swapping region, I-SceI-HU-HD-I-SceI, where I-SceI is the recognition site of the endonuclease I-SceI, and HU (5 -ttgccctctctcttca ttcagggtcatgagaggcacgccattcaaggggagaagtgagatc-3 ) and HD (5 -aagaacttttctatttctgggtaggcatcatcaggagcagga-3 ) are the upstream and downstream homology regions for recombination.In the swapping region, a selection cassette containing HygR-SacB was cloned between HU and HD.To insert random barcodes into donor backbones (pSL438 and pSL439), an oligonucleotide library (pXL633, Supplementary Table S5 ) that contains a NotI restriction site, a barcode region including a random 15 nucleotides, and a region of homology to both donor backbones, was ordered from IDT. pXL633, paired with pXL585, was used to PCR the barcodes with ∼1 ng of either pSL438 or pSL439 as template.The resulting PCR products were restriction digested and ligated into the corresponding donor vector via NotI and XmaI sites.Following the same cloning protocol above, the ligation products were transformed into competent BUN20 donor cells and the barcoded donor clones were selected on the LB + Kan agar plates at 37ºC.Transformant clones were then randomly selected and arrayed to generate two 96-well barcoded donor collections: pSL438_BC and pSL439_BC ( Supplementary Table S4 ).To identify the barcode sequences in the arrayed donor collec-PAGE 3 OF 13 tions, the regions containing the barcodes were amplified by colony PCR ( 15 ) using pXL583 and pXL584 ( Supplementary Table S5 ) as primers.The amplicons were then purified and Sanger sequenced using pXL583.Barcodes were then extracted to compile two lists of donor barcode collections.

Construction of barcoded recipient plasmids and arrayed recipient clones
Plasmid pSL937, which is used as the backbone to insert the random barcodes to generate the arrayed and barcoded recipient collection, was constructed from the following sources: (i) plasmid backbone / origin of replication from pBR322 ( 16 ,17 ), (ii) GmR (gentamicin resistance marker) from pUC18-mini-Tn7T-Gm ( 18), (iii) homology sequences HU and HD, and two I-SceI recognition sites in a HU-I-SceI-I-SceI-HD configuration, (iv) a rhamnose-inducible toxin relE (P rhaBAD -relE) from pSLC-217 ( 19 ) , which was cloned between two I-SceI sites.To insert random barcodes into the recipient backbone (pSL937), an oligonucleotide library (pXL631, Supplementary Table S5 ) that contains an XhoI restriction site, a barcode region including 20 random nucleotides, and a region of homology to pSL937, were ordered from IDT. pXL631, paired with pXL154 ( Supplementary Table S5 ), was used to generate barcodes via PCR with ∼1 ng pSL937 as template.The resulting PCR products were digested and ligated into pSL937 using MluI and XhoI restriction sites.The ligation products were then transformed into competent BW28705 cells that contain a spectinomycin-resistant helper plasmid pSL361, which was constructed by integrating a multi-cloning site and the pBAD-I-SceI (Addgene) endonuclease gene into pML104 ( Supplementary Table S4 ).Barcoded recipient clones were selected on the LB + Sp + Gm + 2% glucose at 30ºC.Transformants were then randomly selected and arrayed into ten 96-well plates.
The previously constructed two donor barcode plates were repeatedly used to acquire position and sequence identity of unknown barcodes on recipient plasmids to construct recipient positional barcodes plates.Because sequencing chimeras may generate and mis-associate donor barcodes to recipient barcodes, each of these 96-array recipient barcode plates were mated with two donor plates to allow cross-validations and ensure accurate parsing results.The resultant recombinants containing known donor barcodes and unknown recipient barcodes were sequenced by Illumina MiSeq platform.A total of 831 unique recipient barcodes were detected and 768 (requiring a pairwise hamming distance > 5) were randomly rearrayed into 8 × 96-arrayed and 2 × 384-arrayed plates, which serve as positional barcode plates for parsing unknown DNA blocks in donor plasmids.
Construction of donor plasmid libraries containing oligonucleotide pools and arrayed donor clones pSL1071 and pSL1064, which contain the NsrR-PheS or HygR-SacB cassettes, respectively, two I-SceI sites, and two homology regions for recombination (HU and HD), were used as the backbone to insert oligonucleotide pools.The 300-base oligonucleotide pools were ordered from IDT and Twist according to the following design, GCTTA TTCGTGCCGTGTTA TGGCGCGCCNN ... NNG CGGCCGCGGGC AC AGCAATCAAAAGTA, where GCTT A TTCGTGCCGTGTT A T and GGGC AC AGC AAT-CAAAAGTA are priming sites for the forward and reverse primers (skpp-101-F and skpp-101-R, Supplementary Table S5 ) to amplify the oligonucleotide pool, GGCGCGCC and GCGGCCGC are recognition sites for restriction enzymes AscI and NotI, and NN…NN denotes the 244-base sequences randomly selected from the human genome assembly (GRCh38) by using a custom Python script ( Supplementary Table S6 ).The amplification of the oligonucleotide pool was performed with ∼5 ng of template DNA and KAPA HiFi polymerase (Roche) with the annealing temperature at 53ºC and extension time of 15 s for 14-20 cycles.PCR products were purified using DNA Clean & Concentrator-5 (Zymoresearch).To clone PCR products into the donor plasmid pSL1071 or pSL1064, AscI and NotI restriction enzymes were used.The digestion reactions of PCR products and pSL1071 / 1064 were performed at 37ºC for 4 hours.Digested products were then size selected by running a 1.2% agarose gel and recovered using Zymoclean Gel DNA Recovery Kit (Zymoresearch).The ligation reaction was performed with 25 ng of digested vector and 3.8 ng of inserts using T4 DNA ligase (NEB) at 16ºC for 15 h.Donor plasmids were then transformed into the donor cells by heat shock at 42ºC for 60 s and recovery at 37ºC for 45 min.Resulting donor clones were randomly arrayed on PlusPlates.384-arrayed donor clones were generated by PIXL colony picker.

Pooled amplicon sequencing on the Illumina platform
To extract the recombinant plasmids, cells were scraped from each 96-position array selection plate and a pooled plasmid extraction was performed using Plasmid Plus Mini Kit (QI-AGEN).The plasmid DNA was quantified and diluted to ∼ 1 ng / μl, which is approximately 1.5 × 10 6 copies of each unique barcode-barcode pair per 96-array plate.A two-step PCR was performed, as described ( 20 ) with modifications.First, 4-5 cycles of PCR with OneTaq polymerase (New England Biolabs) was performed using the forward (pBPS_fwr) and reverse (pBPS_rev) primers listed in Supplementary Table S9 .∼1 ng of recombinant plasmid DNA was amplified in a single 50 μl PCR reaction with the annealing temperature at 55ºC and extension time of 20 s.To increase the multiplexity of sequencing samples, a unique pair of 1st and 2nd PCR primers (see Supplementary Table S9 ) were used to amplify the plasmid DNA from each mated plate, which uniquely barcodes each amplification reaction and enables pooling of multiple mated plates together in one sequencing library.
Primers for the first step PCR have this general configuration and sequences are listed in Supplementary Table S9 : pBPS_fwr: A CA CTCTTTCCCTA CA CGA CGCTCTTCC-GATCTNNNNNNNNXXXXXXttcggttagagcggatgtg pBPS_rev: GTGA CTGGA GTTCA GA CGTGTGCTCTTC-CGATCTNNNNNNNNXXXXXXXXXaggtaacccatatgcatggc.
The Ns in these sequences correspond to any random nucleotides and are used in the downstream analysis to remove skew in the counts caused by PCR jackpotting.The Xs correspond to one of several multiplexing tags, which allows different plasmid pools to be distinguished when loaded on the same sequencing flow cell.The lowercase sequences correspond to the priming sites on the recombinant plasmids.The uppercase sequences correspond to the Illumina Read 1 or Read 2 sequencing primer.The PCR products were purified using NucleoSpin columns (Macherey-Nagel) and eluted into 33 μl water.A second 23-25 cycles PCR was performed with PrimeStar HS polymerase (Takara) or KAPA HiFi DNA Polymerase (Roche), with 33 μl of cleaned product from the first PCR as template and 50 μl total volume per tube.The annealing temperature is 69ºC and extension time is 20 s.Primers for this reaction were the standard Illumina TruSeq dual-indexed primers (D501-D508 and D701-D712) listed in Supplementary Table S9 .
PCR products were cleaned using NucleoSpin columns.Amplicons from each mating plate were uniquely labeled with our customized primer indexes (first step PCR) as well as standard Illumina indices (second step PCR).This quadrupleindexed strategy increases the multiplexing capacity for sequencing.Cleaned amplicons were pooled and paired end sequenced at ∼800 reads per barcode-barcode pair on an Illumina MiSeq, HiSeq or NextSeq with 25% PhiX genomic DNA spike-in.

Illumina sequencing data analysis for demultiplexing and sequence verification
Donor-recipient double barcode amplicon sequencing data was analyzed by customized Python scripts and Bartender ( 21 ).First, Illumina reads were demultiplexed using the Illumina indices.Any sequence without an exact match to two Illumina indices was discarded.Barcodes were extracted from demultiplexed sequences using the regular expressions ' Unique molecular identifiers (UMIs, the Ns in pBPS_fwr and pBPS_rev) were also extracted based on their expected position in the Illumina reads.Barcode reads, which contain a mix of true barcode sequences and sequences that contain errors stemming from PCR or sequencing, were next clustered into consensus sequences using Bartender ( 21 ).Each barcode cluster was next examined for replicate UMIs (indicating PCR duplicates) using Bartender, and all duplicates were removed to generate final counts of each barcode pair.The double barcodes with < 20 reads were excluded, many of which are expected to be PCR chimeras (barcodes fused by PCR amplification).The remaining reads were used to ascertain the position of each donor barcode from each corresponding recipient barcode.Customized scripts are available at https:// github.com/Li-WY/ BPS-data-analysis .

E. coli pre-LASSO probe design
The pre-LASSO probes ( ∼160-180 nt) used in this study were designed from the E. coli str.k-12 substr.mg1655 reference ORFeome (RefSeq: NC_000913.3)by using a custom Biopython script.The algorithm was set up to select probes that capture E. coli ORFs ranging from 999 to 2000 bp in size.The ligation and extension arms of pre-LASSO probes had similar melting temperatures, in the 65-70 • C range ( 22 ).
The complete list of the pre-LASSO probes with the targeted ORFs are included in Supplementary Table S8 .The 417 pre-LASSO probes were obtained as pooled oligonucleotides from Twist Bioscience and used for the assembly of mature LASSO probes, as described ( 22 ).The pre-LASSO design was: 5´CA GA CGA CGGCCA GTGTCGA C, Ligation Arm, AA CA CTTCTTGCGGCGATGGTTCCTGGCTCTTC-GATC, Extension Arm, GGATCCTACGGTC ATTC AGC 3´.

Capture and cloning of E. coli ORFs by LASSO probes
Capture of E.coli ORFs was performed as described ( 23 ).Briefly, the 417 LASSO probe library was hybridized to E. coli genomic DNA in 15 μl of 1 × Ampligase DNA Ligase Buffer (Epicentre) containing 250 ng of unsheared E. coli K12 total genomic DNA and 5 ng of the LASSO probe pool.In the hybridization reaction, the concentration of E. coli chromosomes was approximately 10 pM.The reaction (15 μl) containing the LASSO probe pool and the E. coli genomic DNA was denatured for 5 min at 95 • C in a PCR thermocycler (Eppendorf Mastercycler), then incubated at 65 • C for 60 min.After hybridization, 5 μl of freshly prepared gap filling mix ( Supplementary Table S7 ) was added into the hybridization solution while maintaining the reaction at 65 • C in the thermal-cycler.Gap filling and ligation was performed for 30 min at 65 • C.After capture, the DNA samples were denatured for 3 min at 95 • C, and the temperature was reduced to 37 • C. Next, 4 μl of linear DNA digestion solution was added.Digestion was performed for 1 h at 37 • C, followed by 20 min at 80 • C.
A post-capture PCR was performed using AttB1CaptF and AttB1CaptR ( Supplementary Table S5 ) as described ( 22 ).The post capture PCR product was purified by using AMPure XP Beads (Beckman Coulter) and mixed with the Gateway 'donor vectors' (pDONR221 ( 24 )) and the BP Clonase enzyme mix (Invitrogen).The BP reaction was purified and used for electroporation in NEB 10-beta Electro-competent E. coli (c3020K) to generate cloned libraries.

Integration of captured ORF pools into BPS donor plasmids
Plasmid pSL1064, which contains the HygR-SacB cassette, two I-SceI sites, and two homology regions for recombination (HU and HD), was used as the backbone to insert pooled E. coli ORFs.The E. coli ORFs were integrated into the plasmid pDONR221.oSL1581 and oSL1582 ( Supplementary Table S5 ) were used to introduce AscI and NotI recognition sites to these ORFs for cloning into pSL1064 by PCR.The amplification of the pooled ORFs was performed with 10 ng of template ORFs and KAPA HiFi polymerase (Roche) for 20-cycle PCR with the annealing temperature at 57ºC and extension time of 3.5 min.Amplified ORFs were purified using DNA Clean & Concentrator-5 (Zymoresearch).To clone amplified products into the donor plasmid pSL1064, AscI and NotI restriction enzyme recognition sites were used.The digestion reaction of amplified products and pSL1064 were performed at 37ºC for 4 h.Digested products, ranging from 1-2 kb, were size selected by cutting bands from a 1.2% Agarose gel and PAGE 5 OF 13 isolating DNA using Zymoclean Gel DNA Recovery Kit (Zymoresearch).The ligation reaction was performed with 100 ng of digested vector backbone and 70.8 ng of inserts using T4 DNA ligase (NEB) at 16ºC for 15 h.

Arrayed mating for barcoding DNA with BPS
Each donor plate was mated to each barcoded recipient plate in an arrayed format on agar plates.The donor arrays were grown on LB + Kan + Hyg / clonNat plates overnight at 37ºC; the recipient arrays were grown on LB + Sp + Gm + 2% glucose overnight at 30ºC.The agar media for arrayed mating was LB + Ara + IPTG, pre-warmed in 37ºC for 1 hour.Both donor and recipient clones were transferred onto the mating plates using SINGER RO T OR HDA robot with 96-or 384position pin pads, and grown for 3-6 h at 30ºC.The mated cells were then transferred onto the selection plates containing LB + Ara + Rha + Gm + Hyg / clonNat.Recombinant clones were then selected at 37ºC overnight.

Pooled sequencing of whole plasmid backbones on the Oxford Nanopore platform
Bacterial clones on selection plates were scraped and pooled to extract recombinant plasmids containing the recipient barcodes and DNA blocks (donor barcodes, oligonucleotides, and E. coli ORFs) using Plasmid Plus Mini Kit (QIAGEN).The pooling capacity is determined by the number of unique positional barcodes.In this study, 768 positional barcodes were routinely used and up to 768 clones (2 × 384-array plates) containing unique positional barcodes can be pooled.
Two fragmentation approaches were used to generate linearized plasmids for nanopore library constructions.One is to use the restriction enzyme PmlI (NEB) to cut circular plasmids by incubation at 37ºC for 2 h.Linearized plasmids were size selected by running a 1.2% Agarose gel and recovered using Zymoclean Gel DNA Recovery Kit (Zymoresearch).The second approach is to tagment circular plasmids using the transposome complex from Rapid Barcoding Kit (SQK-RBK114.96).

BPS analysis pipeline
A bioinformatics pipeline was devised to achieve appropriate, performant, and scalable analysis of BPS experiments, and this pipeline uses a flexible configuration interface to enable diverse applications.The aim of the pipeline is to (i) gather long-read sequencing data from multiple runs, (ii) extract a small barcode from each read, (iii) use this small barcode to separate reads from each colony, (iv) perform one of two assembly strategies and (v) assess the 'purity' and 'correctness' of the assembly at that position.
Basecalled FASTQ files are filtered for size and a known contaminant file is used to optionally filter out reads using alignment (minimap2 ( 25 ) and samtools ( 26 )).Within each sequencing library pool, demultiplexing barcodes that are introduced during ONT library preparation are used to assign sample membership to each read.Alignment to unique 'signature' sequences are used to separate out different plasmids within each sample (minimap2 and awk).Alignment to known 'trimming' sequences is used to remove most of the backbone (min-imap2 and Python), and fuzzy regular expressions are used to extract exactly the barcode from a known sequence context (itermae, https:// gitlab.com/darachm/ itermae/ ).Barcodes are clustered and assigned to an optionally provided list of known barcodes (starcode ( 27 ) and Python), then each combination of sample and positional barcodes is used to separate raw FASTQ records into separate files (awk).These may be assembled using one of two options (A or B).(A) To assemble long ( > 1 kb) target sequences, read-length distributions per well are analyzed with a gaussian-mixture model to separate different species of plasmid in each well (Python), then each cluster of reads is used for de novo assembly (trycycler ( 28 ) and flye ( 29 )) and polishing of the assembly (medaka).(B) To assemble of short ( < 1 kb) targets, raw reads are trimmed using alignment to known trimming sequences (minimap2 and Python) before multiple-sequence alignment (kalign3 ( 30 )), merging to a draft consensus (Python), and polishing of the consensus (racon ( 31 ) and medaka).The purity of either assembly is assessed by aligning each raw input sequence to the resulting assembly (either with minimap2 or with pairwise alignment using Biopython ( 32 )).The resulting assemblies are subject to another round of post-assembly processing using options of raw output, a 'rotation' to begin all of the circular assemblies in a similar location (minimap2 and Python), trimming of known sequences (minimap2 and Python), and / or exact extraction using fuzzy regular expressions (itermae).Positions with at least 5 reads for which at least 90% of the reads matched the consensus reference with at least 90% of end-toend identity were considered pure.Final processed assemblies are optionally compared to a known target sequence file to assess 'correctness', with a perfect match between the consensus and reference sequences considered to be correct (minimap2 or bwa ( 33 )).Each successfully considered position is analyzed (using R ( 34 )) to output a per-position per-sample call of purity and correctness (in matching a particular reference in the provided set).The entire pipeline is written in Nextflow ( 35 ), uses Singularity ( 36 ) to execute Docker containers, and makes extensive use of GNU utilities (including GNU parallel ( 37 )).
The pipeline is available on GitLab under the BSD 3-clause license, documentation is available at darachm.gitlab.io/bps, and we will support users via Issues at gitlab.com/darachm/bps/ -/ issues .

Illumina sequencing to evaluate the uniformity of oligonucleotides before and after amplification
To approximate the abundance of the original single stranded oligonucleotides generated from service providers, second strands were synthesized using a primer annealing and extension approach (38)(39)(40)(41)(42).The second strand synthesis reaction was performed with ∼50 ng (IDT) or ∼8 ng (Twist) of template DNA and KAPA HiFi polymerase (Roche) with the annealing temperature at 53ºC and extension time of 30 sec for 2 (IDT) or 10 (Twist) cycles.For second strand synthesis only one primer (skpp-101-R) was used.
The resultant dsDNAs, together with amplicons of oligonucleotides generated from 14 (IDT) or 20 (Twist) cycles of PCR, were ligated with xGen™ UDI-UMI Adapters (IDT) which contain full length sequencing primers, i7 / 5 indices, and Unique Molecular Identifiers (UMIs).PE150 reads were generated by Illumina iSeq platform.The absolute copy number of oligonucleotides / amplicons were estimated by counting the number of unique UMIs associated with each type of oligonucleotide / amplicon.

Results
Overview of the in vivo barcoding platform MAGIC cloning ( 12 ), developed as an in vivo alternative to in vitro GATEWAY cloning ( 43 ), enables rapid subcloning of a DNA block from one plasmid to another using bacterial conjugation and in vivo recombination.A DNA block in a donor plasmid is conjugated to a recipient cell containing a recipient plasmid.A genetic program in the recipient cell recombines the DNA block from the donor plasmid to the recipient plasmid using the endonuclease I-SceI to cut both plasmids and the recombinase lambda Red to stitch (recombine) the donor DNA block into the recipient plasmid backbone.We extended this platform for use in multiplexed plasmid sequencing by constructing arrays of cells with a barcode that is unique to a position on an array (positional barcodes) in either the donor plasmids or the recipient plasmids (Figure 1 A).In addition, we made two major modifications to improve the reliability and portability of the platform.First, the PheS negative selection marker is replaced by relE , with flanking terminators to ensure deliberate control of expression ( 19 ).Second, the homing endonuclease I-SceI is placed in a helper plasmid (pSL361) with a temperature-sensitive replication origin (pSC101 -ori ts ), providing a convenient way to cure (remove) the helper plasmid prior to isolation of recombinant plasmid DNA.
In one design (Figure 1 B), DNA constructs are sequenced by integrating them into donor plasmids that are subsequently conjugated to arrays of cells containing recipient plasmids with positional barcodes.In another design (Figure 1 C), the whole backbone of recipient plasmids are sequenced by conjugating positional barcodes from arrays of donor cells and recombining the barcodes into the recipient plasmids.The resultant recombinant plasmids from either workflow, including positional barcodes and to-be-sequence-verified plasmid DNA, can be pooled and processed for ONT sequencing with a single DNA miniprep and a single ONT library prep.To process these data, we developed a flexibly-configured Nextflowbased pipeline to enable read partitioning and scalable de novo assembly of tens of thousands of plasmids.

Accuracy and recovery rate
To test the accuracy and recovery rate of the in vivo barcoding and sequencing platform, we generated 192 clones of barcodes in donor cells / plasmids and 768 clones of barcodes in recipient cells / plasmids in 2 × and 8 × 96-position arrays, verified the barcode sequence at each position using a combination of Sanger and Illumina sequencing (Materials and Methods), and used these arrays as a test set.We next mated each donor array to each recipient array (96 × 8 matings per donor array), selected for colonies that contain barcodebarcode recombinant plasmids, and performed two minipreps (one for each donor array mating).For each plasmid pool (768 clones), we tagmented plasmids using the ONT Rapid Barcoding Kit 96 (V14).Each tagmentation reaction introduces one of 96 unique sample indices to the plasmid pool enabling up to 73 728 (96 × 768) plasmids to be processed in parallel with our 768-recipient barcode array.We sequenced plasmid pools on an ONT MinION (R10.4.1) at average sequencing depth of 130 reads per position.A custom BPS bioinformatics pipeline (Materials and Methods) was developed to partition ONT sequencing data by both the positional barcodes (introduced in vivo via conjugation) and the sample indices (introduced in vitro during ONT library preparation), and to assemble (using Trycycler ( 28 ) and Flye ( 29)) and polish (using medaka https:// github.com/nanoporetech/ medaka ) the consensus sequence for each position on each 96-position array.Consensus ONT sequences of barcode-barcode pairs in all cases matched the sequence determined by Illumina and Sanger sequencing (Materials and Methods).Using these ONT data, we calculated the recovery rate (percent of known positions that were detected) and accuracy (percent of detections that were correct) of donor barcode sequences (Figure 1 D and Supplementary Figure S1 ).We found perfect accuracy and a high recovery rate ( > 98%) that improves when the same donor barcode is assayed multiple times (Figure 1 D).Missing positions were, in all cases, due to a lack of sufficient sequencing coverage ( Supplementary Figure S2 ).ONT reads also enabled de novo (reference-free) assembly of the whole recombinant plasmids to detect various types of sequence variation from the expected recipient backbone sequence (Figure 1 E).

Low-cost whole-plasmid backbone sequencing with a fast turnaround time
The number of plasmids that can be processed in parallel by in vivo barcoding is determined by the number of unique positional barcodes across arrays.Here, we constructed arrays of 768 positional barcodes to enable DNA isolation and library construction of pools of 768 plasmids in parallel.At this scale of multiplexing, using off-the-shelf robotics for colony picking and arrayed bacterial conjugation / mating, we estimate that thousands of plasmids can be sequenced in parallel for between $0.12 (100 × sequencing depth) and $0.53 (1000 × sequencing depth), including all consumables and labor costs (high throughput in Figure 1 F and Supplementary Table S1 ).At lower scales of multiplexing, we estimate that manual colony picking and mating can be used to sequence hundreds of plasmids in parallel for between ∼$1.00 (100 × sequencing depth) and $1.40 (1000 × sequencing depth), including all consumables and labor costs (low throughput in Figure 1 F and Supplementary Table S2 ).Because most steps of the procedure are performed with cell or plasmid pools, both lowthroughput and high-throughput protocols can be performed with ONT reads ready as soon as the next day (Figure 1 G).

Demultiplexing and sequence verification of oligonucleotide pools
Generation of oligonucleotide pools using arrayed synthesis technologies can be several orders of magnitude less expensive than one-at-a-time column-based DNA synthesis ($0.0005-0.035/ base versus $0.07-0.50/ base) ( 44 ,45 ).Parallelized methods that capture long blocks of natural DNA can offer similar cost savings relative to de novo gene synthesis.Yet, many testing modalities require arrays of DNA designs (e.g.mass spectrometry , microscopy , and enzymatic assays) and are unable to take advantage of these sources of low-cost DNA.We next explored whether the higher throughput of the BPS plasmid sequencing platform could be used to demultiplex such DNA pools.S1 and S2 .( G ) Experimental timeline for sequence verification by BPS.We assume a MinION flow cell contains 250 active pores generating data at 100 bases / second, enabling ∼500 (7 kb) plasmids to be sequenced at 100 × depth in 4 h.

PAGE 8 OF 13
Nucleic Acids Research , 2024, Vol.52,No. 10,e47 To test the capability of BPS to demultiplex oligonucleotide pools, we designed a library of 1100 oligonucleotides, each containing a 244-base sequence chosen randomly from the human reference genome GRCh38 (Figure 2 A) ( 46 ).Differences in the efficiency of synthesis across different nucleotide sequences may result in oligonucleotide pools that are more or less dispersed in frequency (defined here as pool dispersion).Higher pool dispersion would require picking and sequencing more clones per design (higher sampling depths) to recover the same number of designs across an array ( 20 ).To minimize potential pool dispersion in our first test, we subset the 1100 oligonucleotide designs into one 100 randomlychosen oligonucleotide pools that were scored as 'low complexity' by the IDT online sequence complexity analysis tool ( https:// www.idtdna.com/site/ order/ plate/ gblocks ).This pool was synthesized by IDT as an oPool, and inserted into donor plasmids / cells by PCR, digestion, and ligation.Randomly arrayed clones were generated from this pool at a ∼20 × sampling depth (the average number of clones picked per DNA block in the pool) for in vivo barcoding and sequencing, as described above, to generate a consensus sequence at each position on each plate.Given the relatively high error rate of ONT simplex reads, we needed to distinguish positions that contain only sequencing errors from those that contain a mixture of two or more plasmids.For each position, we determined a purity score (the fraction of ONT reads that are > 90% end-to-end identical to the consensus sequence at that position) and assayed, across a range of purity scores, which clones were indeed pure by examining Sanger sequencing traces ( Supplementary Figure S3 ).We found that purity scores > 0.8 were reliably pure clones (15 / 15) but that those with purity scores < 0.8 were frequently mixed clones (7 / 8).Based on these results, we set a conservative 0.9 purity score threshold for calling a clone 'pure' and flagged putatively 'impure' positions.Illumina sequencing was performed on a subset of samples to validate the correctness of consensus sequences (Materials and Methods), and for unflagged positions the consensus Illumina and ONT sequences agreed 99.7% of the time (with most differences presumably stemming from errors in PCR during the Illumina library prep).After discarding flagged positions, we recovered a sequence-perfect clone for 83% of the oligonucleotide designs in this pool.
We next scaled-up the method by demultiplexing, at ∼20 × sampling depth, a pool containing all 1100 oligonucleotide designs, synthesized by Twist.This full set includes 100 oligonucleotides that were scored as 'intermediate complexity' by the IDT online oligonucleotide analysis tool, indicating that it may be more dispersed than the 100 oligonucleotide pools synthesized by IDT.We recovered sequenceperfect clones for 51.5% (567 / 1100) of the oligonucleotide designs in the full pool, and 52.0%(52 / 100) of the oligonucleotide designs in the subset with the same designs as the subpool synthesized by IDT.

Examination of factors impacting the recovery rate of demultiplexing
Given the high recovery rate of BPS using pre-arrayed colonies as inputs (see Accuracy and Recovery Rate experiments above), we sought to understand what factors impact the recovery rate of commercial oligonucleotide pools.The recovery rate could be influenced by sequence errors introduced by (i) synthesis and / or assembly of linear DNA, (ii) PCR amplifica-tion or (iii) introduction into plasmids and cells.The vendorreported oligonucleotide synthesis error rate of ∼ 4 × 10 −4 / nt predicts that ∼ 11.3% ( 1 − 0 .9996 300 ) of oligonucleotides in the pool contain erroneous sequences.Additional sequence errors introduced during amplification and cloning of the BPS protocol appeared to have little impact on the recovery rate: we observed a base substitution error rate that is roughly the same as the rate reported by the vendors ( 4 .75 × 10 −4 / nt and 3 .81 × 10 −4 / nt for IDT and Twist pools respectively, Supplementary Figure S4 ).
We next explored the impact of variation in sequence abundance on the recovery rate by examining the level of pool dispersion at different stages of the protocol for both the IDT (Figure 2 B) and Twist (Figure 2 C) pools: following second strand synthesis (Original dsDNA), PCR amplification (Amplicon dsDNA), introduction into plasmids and plating of cell colonies (Randomly Plated Clones), and construction of colony arrays and barcoding by BPS (Arrayed Clones).The lowest abundance correlations were observed between the Amplicon dsDNA and Randomly Plated Clones steps.Using Gini coefficients as a measure of pool dispersion, we found that this step also resulted in the greatest increase in pool dispersion, resulting from large changes in abundance of particular oligonucleotide designs (Figure 2 D and F).To determine whether the pool dispersion at this cloning step was due to inadequate clone sampling, we estimated the expected recovery rate based on the frequency distribution we observed in the Amplified dsDNA pools (Figure 2 E and G).We found that, at a 7 × sampling depth, > 93% of sequence perfect designs are expected to be present at the Randomly Plated Clones step, while we could only recover 86% and 49%, for IDT and Twist pools respectively, by resampling the observed Randomly Plated Clones (Figure 2 E and G).These data suggest that the cloning step, which includes introduction of oligonucleotide designs into plasmids and cells, has the greatest impact on frequency dispersion and therefore demultiplexing performance.

Demultiplexing a captured open reading frame (ORF) library
To demonstrate the capability of BPS to demultiplex pools of longer DNA (1-2 kb) with variable sizes, we parsed a library of ORFs captured using long-adapter single-strand oligonucleotide (LASSO) probes ( 47 ).This DNA capture technique uses pools of long inversion probes to selectively hybridize and amplify multiple target regions from genomic DNA.While LASSO probes have been demonstrated to capture > 3000 E. coli ORFs in parallel ( 47 ), off-target sequences are also captured, desired on-target sequences may contain mutations from PCR, and, as expected for any pooled technique, the relative abundance of each type of sequence can vary significantly.These features can limit the cost-efficiency and data quality of downstream pooled assays.To determine if BPS could be used with LASSO probes to generate ORF arrays or well-balanced sequence-perfect pools, we captured 417 E. coli ORFs (1-2 kb) by LASSO probes, cloned them as a pool into the BPS donor plasmid backbone, and picked a total of 10752 clones ( ∼25 × sampling depth) for in vivo barcoding and sequence verification (Figure 3 A).Of the 8562 pure clones isolated by BPS, we found that 30.7% were sequence perfect, 61.2% had at least one mismatch, and 7.5% were off-target DNA.The high number of sequences that contain at least one mismatch  is likely a result of the many cycles of PCR required for the process: 25 cycles for post ORF capture amplification and 20 cycles for cloning into the BPS donor plasmids.Sequence perfect clones contained 52.3% (218 / 417) of targeted ORFs (Figure 3 B).The distribution of DNA lengths of these error-free ORFs suggests that our protocol can capture ORFs in all size ranges in the original pool (Figure 3 C).
When DNA pools contain sequence errors or are overdispersed, the power of sequencing-based pooled functional assays (e.g.massively parallel reporter assays ( 48)) can suffer because: (i) constructs with errors may not provide useful data; (ii) more reads are needed to assay low abundance constructs and (iii) some sequence errors (such as errors result in premature stop codons) may contaminate selection schemes.To demonstrate the capability of our platform to address these issues by constructing error-free well-balanced pools, we arrayed 111 clones carrying sequence perfect ORFs, grew replicates of these arrays both in liquid multi-well plates and on agar pads, and pooled each replicate separately.For each replicate, we extracted plasmids and ONT sequenced the library (without an additional amplification).We found that demultiplexing with BPS and re-pooling drastically improved the uniformity of the pool (Figure 3 D).ORFs that were highly overrepresented following LASSO capture were not overrepresented in the balanced libraries.The relative abundance of each ORF in a balanced pool was highly correlated between replicate pools using the same outgrowth procedure [Pearson's r = 0.73, P < 2.2 × 10 -16 in liquid vs. liquid, Pearson's r = 0.82, P < 2.2 × 10 -16 in agar versus agar] or different outgrowth procedures [Pearson's r = 0.82, P < 2.2 × 10 -16 in liquid versus agar] (Figure 3 E), suggesting that abundance differences are due to reproducible growth rate differences between clones.

Discussion
We have developed a Bacterial Positioning System (BPS): an in vivo plasmid barcoding platform for high-throughput sequence validation of plasmid DNA.In contrast to in vitro barcoding methods that require each sample to be prepared independently (using microwell plates, seals, thermocycler time, high fidelity polymerase, and primers for individual barcoding reactions), BPS barcodes plasmid DNA in vivo, enabling arrays of cells to be pooled prior to sample processing.This dramatically reduces the cost and hands-on time required, and PAGE 11 OF 13 using BPS with low-overhead ONT sequencing enables most labs to flexibly process hundreds to thousands of plasmids.
One application of high-throughput plasmid sequencing is demultiplexing of DNA pools that are the products of nextgeneration DNA synthesis ( 49 ), pooled DNA assembly (50)(51)(52) or pooled DNA capture ( 47 ,53 ) methods.While construction of these pooled libraries is inexpensive relative to constructing each design independently, demultiplexed plasmids are required for many testing modalities (e.g.mass spectrometry, microscopy and enzymatic assays) and for downstream DNA engineering.We have previously developed an arrayed in vivo barcoding platform in S. cerevisiae and used it to demultiplex and sequence verify pools of oligonucleotides encoding gRNA variable regions ( 54 ).However, that platform requires that both the barcode and the DNA-to-be-sequenced are placed at a similar location in the yeast genome.This limitation, in combination with the slower growth rate relative to E. coli , makes the yeast platform too cumbersome for most applications.Another demultiplexing solution is dial-out PCR (55)(56)(57)(58), which uses pre-designed unique tags to prime specific sequences from a pool.Although this in vitro approach is expected to recover designs at low relative abundance, it is expensive and time consuming to scale up: isolation of each design requires a PCR with a unique set of primers.Byproducts in the retrieved sequences by dial-out pcr also limit its applications ( 56 ).Scalable low-cost plasmid sequencing with technologies such as BPS offers an alternative 'shotgun' approach (59)(60)(61)(62) to demultiplexing: instead of isolating each design by bespoke methods, clones are oversampled from a pool and sequenced, with the aim of recovering a large fraction of designs.
However, shotgun demultiplexing approaches have several constraints that limit their utility.Similar to the high sequencing depths required for shotgun sequencing, the number of clones sampled (sampling depth) must be several fold the size of a library to have a good chance of recovering most designs, even when the frequency dispersion is low.Frequency dispersion was not low in some of our experiments, meaning that, even at high cloning depths, we would be unable to recover many designs (Figure 2 G).Frequency dispersion can be introduced at several steps before and during the BPS protocol: DNA library construction (e.g.pooled DNA synthesis), PCR amplification, integration into a plasmid backbone, transformation into host cells, and cell outgrowth.In our experiments, we find that the measures of oligonucleotide design abundances have the lowest correlation before and after the cloning step.Several factors such as the size, initial concentration, complexity of a DNA library ( Supplementary Figure S5 ), differences in the efficiency of plasmid integration between designs, undersampling of transformants, and jackpotting of some designs following transformation may all contribute to dispersion.Despite these potential sources of variation, several groups have achieved pooled design libraries with relatively low levels of dispersion and high recovery rates ( 5 ,50 ).More research is needed to determine which differences between protocols contribute to this cloning variance and how it can be minimized.Nevertheless, shotgun demultiplexing is a strategy made viable by inexpensive plasmid screening as described here, and even overdispersed pools could be made useful when an investigator only needs to sparsely sample the design space ( 4 ) (e.g.randomly sampling 10 3 designs from a pool of 10 6 ).In addition, demultiplexing and subsequent pooling of sequence-verified clones could be used to transform error-prone overdispersed pools into error-free low-dispersion pools that can be more cost-effectively assayed by sequencing readouts.
BPS provides two methods for sequencing of plasmid DNA: (i) a DNA block of interest is transferred to become adjacent to a positional barcode (Figure 1 B) or (ii) a positional barcode is transferred into a plasmid of interest (Figure 1 C).Both methods use ONT long-read sequencing, meaning that large DNA blocks and / or plasmids with repetitive features (tandem repeats and long interspersed repeats) can be more accurately characterized than by short read methods.The first method is readily applicable to routine sequence validation of DNA blocks (e.g.oligonucleotides, genes and variants) that have been synthesized, captured, or assembled.Practicing this method only requires cloning of DNA blocks of interest into the donor plasmid using standardized approaches.The provided donor plasmid backbone contains a multicloning site including 8-mer AscI and NotI restriction sites to insert DNA blocks, and we routinely insert DNA blocks into linearized plasmid using ligation or Gibson assembly in 96-position arrays.The donor plasmid also contains a required R6K ori γ , facilitating plasmid propagation in the donor cell carrying pir-116 or pir + gene, and an o riT , enabling the transfer to the recipient cell through bacterial conjugation.One limitation of the first BPS approach (Figure 1 B) is potential incompatibilities between a DNA block sequence and the donor plasmid or strain.For example, a DNA block that contains an additional origin of replication, OriT , or an element that is toxic to the donor strain may not behave properly, although we expect these cases to be rare.
With the second BPS method (Figure 1 C), the entire plasmid is sequenced, enabling not only verification of a DNA part of interest, but also detection of undesired changes to the plasmid backbone, such as point mutations, insertions, deletions, duplications, and rearrangements.However, plasmid backbones sequenced by this method must be engineered to contain a BPS 'landing pad' composed of homology regions (HRs), I-SceI restriction sites, and the relE negative selection marker.The plasmid of interest is also required to be transformed into the recipient cell that contains a helper plasmid with a temperature sensitive replication origin (pSC101 ori ) and spectinomycin resistance marker, which restricts the use of this origin or marker in the recipient plasmid.With further development of the BPS platform, it may be possible to remove the requirement for a landing pad on the recipient plasmid or the use of a helper plasmid.For example, new BPS barcode cassettes could be designed with different selection markers and / or gRNA cut sites that integrate directly at sequences that are common to most plasmids in lab use (i.e.natural landing pads).Furthermore, genetic elements that facilitate homologous recombination (I-SceI endonuclease and lambda red system) could be engineered into the chromosome of recipient cells.With these advances, BPS could be used on most plasmids without any modifications, dramatically reducing the cost and hands-on time for sequencing of most plasmid DNA.

PAGE 7 OF 13 Figure 1 .
Figure 1.Workflow and performance of BPS, an in vivo barcoding platform.( A ) Donor cell arrays (blue) containing donor plasmids are conjugated to recipient cell arra y s (gra y) containing recipient plasmids.A DNA casset te from the donor plasmid is recombined into the recipient plasmid bac kbone to join an indexing DNA barcode with a sequence of interest.Cells from one or more plates can subsequently be pooled and prepared for sequencing.( B ) In one design, a cassette with a DNA block on a donor plasmid is recombined into a recipient plasmid backbone containing a positional barcode.( C ) In another design, where a whole plasmid backbone needs to be sequenced, a 'landing pad', including (homology region for recombination) HR, I-SceI cut sites and relE negative selection marker, is firstly engineered into the plasmid backbone of interest.A donor positional barcode is then recombined into a recipient plasmid.In both ( B ) and ( C ), a scissor icon is an I-SceI cut site, HR is a region of sequence identity that mediates homologous recombination, dashed lines are homologous recombination e v ents, and + or -icons are selection (Hyg / clonNat) or negative selection (relE) markers.( D ) The recovery rate (percent of positions that were detected, orange) and accuracy (percent of detections that were sequence correct, blue) for the design in (B).Number of matings (x-axis) indicates the number of times the same DNA block was mated to a barcode and sequenced.Error bars indicate standard errors calculated by bootstrapping.( E ) In the design illustrated in (C), donor barcodes were mated to recipient plasmids and plasmid sequences at each position were assembled de novo by sequencing pools of clones.Variation from the a priori reference expectation, including insertions, deletions, and substitutions are shown by colored dots.Successful de novo assemblies are ranked by decreasing ONT read coverage.( F ) The cost of plasmid sequencing when BPS is performed at high-and low-throughput.Low-throughput assumes barcoding is performed in 96-well plates and 4608 plasmids are sequenced at 10 0-40 0 × co v erage per flo w cell.High-throughput assumes barcoding is perf ormed on 384-position agar arra y s and 9216 plasmids are sequenced at 10 0-20 0 × co v erage per flow cell.Detailed cost assumptions are listed in Supplementary TablesS1 and S2. ( G ) Experimental timeline for sequence verification by BPS.We assume a MinION flow cell contains 250 active pores generating data at 100 bases / second, enabling ∼500 (7 kb) plasmids to be sequenced at 100 × depth in 4 h.

Figure 2 .
Figure 2. Sequence verification and demultiplexing of oligonucleotide pools.( A ) Oligonucleotide pools from vendors were cloned into donor plasmids and donor cells, and randomly arra y ed.For experiments here, oligonucleotides are designed with fixed priming sites (red) and restriction endonuclease sites (blue, AscI and NotI) to facilitate cloning.( B , C ) The distribution of oligonucleotide abundances from the IDT (B) and Twist (C) pools, which contain 100 and 1100 oligonucleotide designs, respectively, assessed at multiple stages during the in vivo barcoding workflow: (i) original oligonucleotide pools after the second strand synthesis (original dsDNA), (ii) f ollo wing 14-20 rounds of PCR amplification (Amplicon dsDNA), (iii) random clones after inserting the oligonucleotides into plasmid backbones and cells and randomly plating for clones (Randomly Plated Clones), (iv) and parsed clones after in vivo barcoding and sequencing (Arra y ed Clones).Pairwise comparisons of the abundance of each oligonucleotide design at different stages are displa y ed in scatter plots with Pearson correlation coefficients ( r ).The level of corresponding significance is indicated by asterisks (*** P < 10 −7 ).Histograms represent frequency distributions of oligonucleotide designs with different abundance le v els.Gini coefficients, G , are used to quantify the degree of pool dispersion.A uniform distribution has a Gini coefficient of 0. ( D , F ) The change of normalized abundance of each oligonucleotide design from amplicons to clones for the IDT and Twist pools, respectively.G denotes the increase in Gini coefficients from the Amplified dsDNA to Randomly Plated Clones pools.( E , G ) R eco v er y cur v es f or all oligonucleotide designs at different sampling depths f or the IDT and Twist pools, respectiv ely.T he e xpected reco v ery rate at different in silico sampling depths is calculated by assuming the probability of sampling one oligonucleotide design equals its empirical frequency observed in the corresponding pool.

Figure 3 .
Figure 3. Sequence validation and construction of a balanced ORFs library.( A ) Pooled capture of E. coli ORFs and cloning into the BPS donor cells.ORF libraries were amplified from E. coli genomic DNA using Long-Adaptor Single-Stranded Oligonucleotide (LASSO) probes and inserted into a Gateway v ector.T hen, ORFs w ere amplified b y PCR and cloned into BPS donor plasmids and cells.( B ) A Sank e y diagram sho wing the sequencing results of the ORF library f ollo wing BPS.Mixed clones (orange, purity score < 0.9, see Materials and Methods) indicate that the same position on an array is likely to contain more than one ORF.Pure clones (blue, purity score ≥ 0.9) are further classified into sequence perfect ORFs, ORFs with errors (point mutations or indels), off-target DNA (cannot be mapped to the target ORFs but can be mapped to the E. coli genome), and others (cannot be mapped to any region of the E. coli genome).( C ) The length and abundance of sequence-perfect ORFs identified from pure donor clones.( D ) The distribution of normalized ORF abundance in the original amplicon pools after PCRing from genomic DNA (original amplicons), after cloning into Gate w a y v ector plasmids (original plasmids), and after re-pooling f ollo wing gro wth of sequence perfect clones in liquid or on agar.ORFs w ere rank ed according to their abundance from lo w est to highest.Gini coefficients were calculated to quantify the degree of uniformity of ORF abundances in a pool.A lower value indicates a higher degree of uniformity.( E ) The correlation of the relative abundance of each ORF from balanced libraries constructed using agar and liquid protocols.