Christophe Gaillochet, Ward Develtere, Thomas B Jacobs, CRISPR screens in plants: approaches, guidelines, and future prospects, The Plant Cell, Volume 33, Issue 4, April 2021, Pages 794–813, https://doi.org/10.1093/plcell/koab099
Abstract
Clustered regularly interspaced short palindromic repeat (CRISPR)-associated systems have revolutionized genome engineering by facilitating a wide range of targeted DNA perturbations. These systems have resulted in the development of powerful new screens to test gene functions at the genomic scale. While there is tremendous potential to map and interrogate gene regulatory networks at unprecedented speed and scale using CRISPR screens, their implementation in plants remains in its infancy. Here we discuss the general concepts, tools, and workflows for establishing CRISPR screens in plants and analyze the handful of recent reports describing the use of this strategy to generate mutant knockout collections or to diversify DNA sequences. In addition, we provide insight into how to design CRISPR knockout screens in plants given the current challenges and limitations and examine multiple design options. Finally, we discuss the unique multiplexing capabilities of CRISPR screens to investigate redundant gene functions in highly duplicated plant genomes. Combinatorial mutant screens have the potential to routinely generate higher-order mutant collections and facilitate the characterization of gene networks. By integrating this approach with the numerous genomic profiles that have been generated over the past two decades, the implementation of CRISPR screens offers new opportunities to analyze plant genomes at deeper resolution and will lead to great advances in functional and synthetic biology.
Introduction
The emergence of genome editing tools has offered great opportunities for deciphering gene function and engineering beneficial traits in a wide range of biological systems. Among these tools, the clustered regularly interspaced short palindromic repeat (CRISPR)-associated (Cas) nuclease system has been harnessed to selectively introduce mutations across life kingdoms (Knott and Doudna, 2018). The simplicity and modularity of CRISPR systems have led to a great expansion of the genetic toolbox and the diversification of its applications in various organisms (Knott and Doudna, 2018). In particular, the building of large CRISPR libraries has driven the implementation of forward genetic screens in which mutations are permanently introduced at the genome-scale with high precision (Smith et al., 2017). These recent improvements in screening methodologies have primarily been used for functional genomics and for drug target discovery in mammalian cell systems (Shalem et al., 2015; Behan et al., 2019). Furthermore, saturation mutagenesis has led to the development of high-throughput protein engineering (Ma et al., 2017; Kweon et al., 2020; Li et al., 2020).
Besides its applications in animal systems, the CRISPR-Cas system has been optimized for plant genome editing, and a large toolbox is now available to engineer many plant species including crops (Manghwar et al., 2019; Zhang et al., 2019b). Unlike small-scale gene knockout experiments, however, high-throughput screening in plants is still in its infancy, particularly when compared with animal cell culture systems. We anticipate that CRISPR screens will become a widely used functional genomics tool for plant research in the coming years and will significantly expand our understanding of plant gene function and facilitate the engineering of crop traits. In this review, we discuss the principles and methods for building high-throughput CRISPR screening platforms as well as the specific challenges and advantages when implementing this technology in plants. We describe recent studies that have used this approach to build plant knockout mutant collections or to engineer and evolve new genetic variants. Finally, we discuss the future potential of CRISPR screening and its applications in various plant species.
CRISPR-Cas genome editing tools
At the molecular level, the CRISPR-Cas system is composed of two core components that form a ribonucleoprotein complex: the Cas9 nuclease and the guide RNA (gRNA) (Figure 1). The gRNA is a chimeric molecule composed of the trans-activating CRISPR RNA (tracrRNA), which is required for binding Cas9, and the CRISPR RNA (crRNA), a 20-nt sequence complementary to the target DNA (protospacer; Deltcheva et al., 2011; Jinek et al., 2012; Jiang and Doudna, 2017). Base pairing of the gRNA with DNA forms an RNA-DNA heteroduplex, which promotes Cas9-mediated cleavage of double-stranded DNA through the activity of two catalytic domains: the HNH and RuvC domains, which cleave the target and non-target strands, respectively (Jinek et al., 2012; Nishimasu et al., 2018). The Streptococcus pyogenes Cas9 (SpCas9) binds DNA at NGG protospacer adjacent motif (PAM) sequences and typically introduces a blunt DNA double-strand break (DSB) between the third and fourth nucleotides upstream of the PAM (Figure 1; Jinek et al., 2012; Sternberg et al., 2014). In a cell, the endogenous DNA repair machinery is then activated to repair the DSBs. Imperfect repair introduces insertions or deletions (indels) at the break site. When these indels appear in coding sequences, frameshifts or premature stop codons can disrupt the expression of the gene product, whereas deletions in regulatory regions can significantly alter gene expression (Jinek et al., 2012; Cong et al., 2013; Canver et al., 2015; Rodríguez-Leal et al., 2017). The DNA recognition capability and nuclease activity of Cas9 can be uncoupled by inactivating its two catalytic domains through two amino acid substitutions (D10A, H840A; Jinek et al., 2012). The resulting protein, referred to as dead Cas9 (dCas9), binds to target DNA without generating a DNA break and has been used as a platform to recruit enzymes or protein tags at specific genomic sites and introduce modifications such as point mutations, changes in transcriptional responses, or changes in chromatin accessibility (Gilbert et al., 2013; Deng et al., 2015; Komor et al., 2016; Liu et al., 2016).
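The SpCas9 targeting rules described above (a 20-nt protospacer followed by an NGG PAM, with the break falling three nucleotides upstream of the PAM) can be illustrated with a short script. This is only a minimal sketch: the function name and the toy sequence are hypothetical, only the given strand is scanned, and no attempt is made to score gRNA efficiency or specificity.

```python
import re

def find_spcas9_sites(seq, protospacer_len=20):
    """List candidate SpCas9 sites (protospacer + NGG PAM) on the given strand only."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for match in pattern.finditer(seq):
        start = match.start()
        sites.append({
            "start": start,                      # 0-based index of the first protospacer base
            "protospacer": match.group(1),
            "pam": match.group(2),
            # The blunt break falls between seq[cut_index - 1] and seq[cut_index],
            # i.e. between the 4th and 3rd nucleotides upstream of the PAM.
            "cut_index": start + protospacer_len - 3,
        })
    return sites

# Toy sequence (hypothetical): 2-nt flank, 20-nt protospacer, TGG PAM, 2-nt flank.
toy = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
sites = find_spcas9_sites(toy)
for site in sites:
    print(site)
```

To cover both strands, the same function could be run on the reverse complement of the sequence, with coordinates converted back accordingly.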

Figure 1. CRISPR-Cas genome editing tools. A, The Cas9–gRNA complex, which targets DNA complementary to the gRNA sequence 3′ of the PAM. Following imperfect DNA repair, nucleotide deletions or insertions can be generated at the target loci. B, Base editors are built by fusing Cas9 (D10A; nCas9) or Cas9 (D10A H840A; dCas9) to cytidine or adenine deaminase domains. CBEs preferentially introduce C to T mutations, while ABEs catalyze A to G mutations. Base editors deaminate nucleotides in a restricted editing window (green nucleotides). C, CRISPRi or CRISPRa. dCas9 is fused to transcriptional repressors or activation domains and can modulate RNA levels. Red triangle: nuclease-cutting site. D10A and H840A are Cas9 mutations that inactivate nuclease domains.
Base editors have been designed by fusing dCas9 or nickase Cas9 (Cas9(D10A); nCas9; Figure 1B) with cytidine deaminases; the resulting base editors catalyze cytidine (C) to thymine (T) nucleotide transitions (Komor et al., 2016). Upon binding of the complex to the target DNA, Cs on the nontarget strand are deaminated into uracil (U) (Komor et al., 2016). After preferential repair of the nonedited strand by the DNA repair machinery, a stable transition from C to T is introduced. The transversion of C to guanine (G) can also result from incorrect repair (Hess et al., 2016; Komor et al., 2016). Along the same lines, adenine (A) base editors mutate A into G by deaminating A into inosine (I), which is then recognized by the cellular repair machinery as G (Gaudelli et al., 2017). Base editors operate within a restricted editing window on the DNA, and the choice and combination of nuclease variants with deaminases have an impact on which nucleotides are targeted (Rees and Liu, 2018). For example, the cytidine deaminase base editor BE3, which uses nCas9 fused to APOBEC1, shows strong activity on nucleotides +4 to +8 of the protospacer (Komor et al., 2016). In contrast, CRISPR-X uses a two-component system comprising dCas9 and a modified gRNA that recruits the AIDΔ deaminase and can introduce point mutations within a 100-bp window centered on the PAM (Hess et al., 2016). Therefore, the choice of the base editor can be tailored for precise or broad editing windows.
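The notion of an editing window can be made concrete with a small example. The sketch below assumes a BE3-like cytidine base editor acting on positions +4 to +8 of the protospacer, counted from the PAM-distal end, and an idealized outcome in which every C in that window is converted to T. The function names and the example protospacer are hypothetical, and real editing outcomes are typically partial and context dependent.

```python
def cbe_editable_positions(protospacer, window=(4, 8)):
    """Return 1-based positions of C residues inside the editing window."""
    low, high = window
    return [
        pos
        for pos, base in enumerate(protospacer.upper(), start=1)
        if low <= pos <= high and base == "C"
    ]

def apply_cbe(protospacer, window=(4, 8)):
    """Idealized product in which every C inside the window becomes T."""
    targets = set(cbe_editable_positions(protospacer, window))
    return "".join(
        "T" if pos in targets else base
        for pos, base in enumerate(protospacer.upper(), start=1)
    )

example = "GACCTCAGCCTTGGATCCAA"  # hypothetical 20-nt protospacer
print(cbe_editable_positions(example))  # Cs at positions 4 and 6 fall in the window
print(apply_cbe(example))
```

Widening the `window` argument mimics a broader-acting editor, at the cost of more bystander edits per target site.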
In addition to introducing stable genetic mutations, dCas9 can also be used to modulate the transcription of specific genes (Figure 1C). The CRISPR activation (CRISPRa) system uses transcriptional activators, while CRISPR interference (CRISPRi) utilizes repressors (Gilbert et al., 2013). Transcriptional activators are designed by fusing dCas9 with a transactivation domain such as VP64 (Gilbert et al., 2013; Konermann et al., 2015; Lowder et al., 2018), p65, or Rta (Chavez et al., 2015), which can be used in tandem or in repeats to increase efficiency. The successful use of the CRISPRa system has been demonstrated in plants, providing a complementary strategy for investigating gene function by enhancing gene expression (Li et al., 2017; Lowder et al., 2018). On the other hand, a repressor domain such as KRAB (Gilbert et al., 2013) or the plant SRDX (Piatek et al., 2015; Tang et al., 2017) fused to dCas9 can efficiently suppress the expression of the targeted genes.
Comparative approaches for high-throughput screening
Functional genetics aims to understand gene function by characterizing how genes interact with the environment and each other to result in specific phenotypes. Forward genetic screens are powerful tools for deciphering gene function without a priori knowledge. This approach starts with a population containing genetic variation (natural or mutagenized), followed by the selection of individuals with phenotypes deviating from the wild-type (Weigel and Glazebrook, 2006; Alonso-Blanco et al., 2016; McCouch et al., 2016). Genes are then associated with specific phenotypes using a variety of genetic mapping techniques and tools (Jander et al., 2002; Schneeberger et al., 2009; Ogura and Busch, 2016). This approach has been used successfully for several decades across all fields of biology. Building and characterizing large mutagenized populations are critical for the success of this approach. This has largely been done in plants by randomly generating point mutations via physico-chemical treatments, transforming plant populations with T-DNA or transposon systems, or gene silencing (Table 1; Alonso et al., 2003; Kolesnik et al., 2004; Hauser et al., 2013; Wang et al., 2013; Lu et al., 2018).
Method | Genetic Perturbation | Mutagenesis Induction | Off-targets | Mutant Population Size | Ease of Gene Isolation | Overcomes Redundancy |
---|---|---|---|---|---|---|
EMS-TILLING | SNV | Untargeted | High (>100 / line)ᵃ | Small to medium | − | − |
T-DNA Insertion | Insertion | Untargeted | Low (1.5 insertions / line)ᵇ | Large | + | − |
amiRNA | Transcriptional repression | Targeted | Low and predictableᶜ | Large | ++ | + |
CRISPR | Indels, SNV, transcriptional activation, and repression | Targeted | Low and predictableᵈ | Large | ++ | + |
Chemical mutagenesis in plants most often involves the use of ethyl methanesulfonate (EMS) to induce multiple mutations (primarily C to T transitions) to create M1 plant populations (Table 1). It has been estimated that ∼50,000 M1 lines are required for saturation mutagenesis at all G:C bases in the Arabidopsis thaliana genome; this number can vary depending on genome size and mutation rate (Jander et al., 2003; Henry et al., 2014). Mutagenesis in forward genetic screens can nevertheless be used at sub-saturation levels to identify several mutant lines per gene from a relatively small population (<5,000 M1 plants for Arabidopsis; Page and Grossniklaus, 2002; Østergaard and Yanofsky, 2004; Tsai et al., 2013; Henry et al., 2014; Lu et al., 2018). As the mutant lines contain multiple mutations in addition to the causal mutation, it is necessary to perform several backcrosses and select only the progeny carrying the desired phenotype. The technical simplicity of this method allows it to be used on many plant species with limited genome sequence information. This method has also been successfully implemented to identify mutations that enhance or suppress various mutant plant phenotypes (Page and Grossniklaus, 2002). Because the changes in DNA sequences produced by this method are not targeted to specific regions of the genome, the identification of causal mutations requires genetic mapping. The line of interest is crossed with a plant from a different genetic background to obtain a mapping population in the progeny. Using genetic markers and massively parallel sequencing-based mapping, the causal genetic region for the phenotype can then be efficiently identified (Jander et al., 2002; Page and Grossniklaus, 2002; Schneeberger et al., 2009). However, mapping causal mutations using this strategy can be time-consuming, especially when analyzing plant species with long generation times (Watson et al., 2018).
In contrast to chemical mutagenesis, plant populations have also been mutagenized by inserting Agrobacterium tumefaciens T-DNA carrying herbicide or antibiotic resistance genes or vectors carrying Ac/Ds or Mu transposons and a transposase (May et al., 2003; Kolesnik et al., 2004). In general, mutants generated with T-DNAs or transposons contain few insertions, and thus the establishment of a near saturation population using this approach requires larger populations than with EMS mutagenesis (Table 1; Østergaard and Yanofsky, 2004). One main advantage of this approach is the relative ease of causal gene identification. As the T-DNA or transposon sequence is known, insertion sites can be identified using thermal asymmetric interlaced polymerase chain reaction (TAIL-PCR) or other sequencing methods that do not require the establishment of a mapping population (Alonso et al., 2003; Williams-Carrier et al., 2010).
One limitation of both EMS screening and insertion mutagenesis methods is their low capacity to simultaneously target multiple genes of the same pathway in a single mutant line and thus their inability to identify functionally redundant genes (Tsai et al., 2013; Henry et al., 2014; Lu et al., 2018). In contrast, RNA interference and artificial microRNA (amiRNA) systems can be used to silence multiple genes by targeting conserved regions, thereby allowing the knockdown of potentially functionally redundant genes (Mlotshwa et al., 2002; Schwab et al., 2006). Genome-scale RNAi and amiRNA collections have been built and can be used for forward genetic screening with the possibility to identify redundant genes (Hauser et al., 2013; Zhang et al., 2018). However, these strategies generally result in incomplete knockout phenotypes, as some protein is still produced, and are prone to off-targets (Echeverri et al., 2006; Jackson et al., 2006).
In recent years, the programmable nature of the CRISPR-Cas system has enabled large knockout mutant libraries to be built for forward genetic screens. The main advantages of using CRISPR for screening are the ease of identifying causal genes and its relatively high specificity (Table 1; Lu et al., 2017; Meng et al., 2017; Liu et al., 2020). Furthermore, the possibility of multiplexing allows multiple genes to be targeted simultaneously and can thus overcome genetic redundancy and enable saturation mutagenesis using much smaller populations. Finally, the versatility of the CRISPR toolbox enables researchers to generate not only indels and loss-of-function mutations, but also a range of genome perturbations such as specific single nucleotide variants (SNVs) or transcriptional changes that are not possible with other mutagenic or knockdown technologies (Gilbert et al., 2013; Komor et al., 2016; Gaudelli et al., 2017).
Applications of CRISPR screening in plants
Generation of mutant collections
In contrast to animal cell systems, in which CRISPR screens are widely utilized for forward genetic studies, only a few such studies have been reported in plants (Table 2). Currently, the most extensive pooled CRISPR mutant collections have been reported in rice (Oryza sativa) using T-DNAs containing a single gRNA per vector. Lu et al. targeted 34,234 nontransposable element protein-coding genes (∼83% of all genes) of the rice genome in variety Zhonghua 11 using a library of 88,541 gRNAs and 2 to 3 gRNAs per gene. They generated a T0 mutant population of 91,004 individuals, corresponding to a 1X gRNA coverage, with a mutation rate of ∼80% (Lu et al., 2017). In parallel, another rice CRISPR mutant collection in the same variety was created with approximately 14,000 T0 mutant plants. The library targeted 12,802 highly expressed genes in shoot tissues using two gRNAs per gene (Meng et al., 2017). Based on their sequence assessment of 139 plants, the mutation rate was 63.3%, including 33.1% homozygous, 10.8% biallelic, and 19.4% heterozygous mutations in T0 plants, confirming the efficiency and specificity of their gRNAs. Both groups identified lines with mutant phenotypes and were able to associate them with previously characterized genes. Novel phenotypes were also observed, and candidate genes could be identified by gRNA and target site analyses.
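The relationship between population size and library coverage reported for these rice collections can be worked through numerically. The sketch below is an illustration rather than an analysis from the cited studies: it assumes that each T0 plant carries a single gRNA drawn independently and uniformly at random from the library, so that the number of plants per gRNA is approximately Poisson distributed. The function names are hypothetical. Under these assumptions, a 1X coverage leaves a substantial fraction of gRNAs unrepresented.

```python
import math

def library_coverage(n_plants, n_grnas):
    """Average number of plants per gRNA (the 'X' coverage)."""
    return n_plants / n_grnas

def expected_fraction_represented(n_plants, n_grnas):
    """Poisson approximation of the fraction of gRNAs found in at least one plant."""
    return 1.0 - math.exp(-n_plants / n_grnas)

def plants_needed(n_grnas, target_fraction):
    """Plants required so that the expected represented fraction reaches the target."""
    return math.ceil(-n_grnas * math.log(1.0 - target_fraction))

# Figures reported for the two rice libraries in the text.
for label, plants, grnas in [("Lu et al., 2017", 91004, 88541),
                             ("Meng et al., 2017", 14000, 25604)]:
    cov = library_coverage(plants, grnas)
    frac = expected_fraction_represented(plants, grnas)
    print(f"{label}: {cov:.2f}X coverage, ~{frac:.0%} of gRNAs expected at least once")

print(plants_needed(88541, 0.95))  # plants for ~95% representation under the same assumptions
```

Unequal gRNA representation in the plasmid pool, multiple T-DNA insertions per plant, and incomplete editing would all shift these estimates, so the numbers should be read as a rough guide only.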
Screening Purpose | Editing Tool | Species | gRNA Library Size | Number of Target Genes | Number of gRNAs per Vector | Mutant Collection Size | References |
---|---|---|---|---|---|---|---|
Gene Knockout | SpCas9 | Tomato | 165 | 54 | 1 | 31 (T0) | Jacobs et al., 2017 |
Gene Knockout | SpCas9 | Tomato | 36 | 18 | 3 | 59 (T0) | Jacobs et al., 2017 |
Gene Knockout | SpCas9 | Rice | 88,541 | 34,234 | 1 | 91,004 (T0) | Lu et al., 2017 |
Gene Knockout | SpCas9 | Rice | 25,604 | 12,802 | 1 | 14,000 (T0) | Meng et al., 2017 |
Gene Knockout | SpCas9 | Soybean | 70 | 102 | 1 | 407 (T0) | Bai et al., 2019 |
Gene Knockout | SpCas9 | Maize | 1,368 | 1,244 | 1 | 4,356 (T0) | Liu et al., 2020 |
Sequence Diversification | SpCas9 | Tomato | 8 | 1 (CLV3) | 8 | 1,152 (F1) | Rodríguez-Leal et al., 2017 |
Sequence Diversification | SpCas9 | Tomato | 8 | 1 (S) | 8 | 326 (F1) | Rodríguez-Leal et al., 2017 |
Sequence Diversification | SpCas9 | Tomato | 8 | 1 (SP) | 8 | 81 (F1) | Rodríguez-Leal et al., 2017 |
Sequence Diversification | SpCas9 | Rice | 119 | 1 (SF3B1) | 1 | 6 (T0) | Butt et al., 2019 |
Sequence Diversification | ABE and CBE | Rice | 63 | 1 (OsALS1) | 1 | 113 (T0) | Kuang et al., 2020 |
Sequence Diversification | STEME-1, STEME-NG | Rice | 200 | 1 (OsACC) | 1 | 19 (T0) | Li et al., 2020 |
At the same time, a CRISPR library was developed in tomato (Solanum lycopersicum) to investigate the functions of leucine-rich repeat receptor-like kinase subfamily XII (LRR-XII) genes (Jacobs et al., 2017). Fifty-four subfamily members were targeted in a pooled transformation using three gRNAs per gene, and 31 T0 plants were recovered with a mutation rate of 62.5%. Nearly half of all subfamily members were mutagenized in a single transformation. To increase throughput and overcome potential redundancy within gene families, multiplexing CRISPR vectors were then used to target 18 genes encoding putative transporters identified by differential expression analysis. Using combinatorial vector assembly with three gRNAs per vector and pooled plant transformation, T0 plants were recovered with up to five mutated genes, thus demonstrating the potential of generating higher-order mutant plants by combinatorial targeting (Jacobs et al., 2017). Fourteen of the 18 genes were mutated at least 3 times, demonstrating that higher replication was possible using the multiplex approach. This replication made it possible to associate two developmental phenotypes with the underlying mutated genes.
Multiplex mutagenesis can also be achieved by designing gRNAs that target homologous regions of paralogous genes. Such a CRISPR library was created in soybean (Glycine max) to target duplicated genes (Bai et al., 2019). The authors observed a 60% editing efficiency and generated double and triple homozygous mutant lines in the T1 generation, which displayed altered nodule numbers (Bai et al., 2019). As many plant species (notably many of the important crops) have undergone extensive gene and/or genome duplication, this study highlights the potential of using CRISPR libraries to overcome genetic redundancy in plants.
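The idea of designing a single gRNA against a region conserved among paralogs can be sketched as a set intersection. The example below is a simplified illustration and not the design pipeline used in the soybean study: it considers only the given strand, requires perfect identity across all paralogs, and ignores off-target and efficiency scoring. The function names, gene labels, and sequences are hypothetical.

```python
def candidate_sites(seq, k=20):
    """Set of k-nt protospacers that are followed by an NGG PAM on the given strand."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k - 2)
        if seq[i + k + 1:i + k + 3] == "GG"
    }

def shared_grnas(paralogs, k=20):
    """Protospacers present, with perfect identity, in every paralog sequence."""
    site_sets = [candidate_sites(seq, k) for seq in paralogs.values()]
    return set.intersection(*site_sets) if site_sets else set()

# Two toy paralogs sharing one conserved 20-nt stretch, each followed by an NGG PAM.
conserved = "ATGGCTTCAGGACTTCAAGC"
paralogs = {
    "geneA": "CCTA" + conserved + "AGG" + "TTAC",
    "geneB": "GGAT" + conserved + "TGG" + "CATA",
}
shared = shared_grnas(paralogs)
print(shared)
```

In practice, designs of this kind usually tolerate a small number of mismatches and are checked against the whole genome, which this sketch does not attempt.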
Another CRISPR mutant collection was recently created in maize (Zea mays; Table 2; Liu et al., 2020). A set of 1,244 candidate genes was selected by quantitative trait locus (QTL) analysis or comparative genomics. Designing gRNAs for CRISPR screens requires previous knowledge of genome sequences. As the authors used the nonreference maize variety KN5585, they first obtained the gene coding sequences using de novo genome assembly. They then designed 1,368 gRNAs and individually cloned them in a vector to target single genes. A total of 4,356 T0 plants were isolated, and subsampling showed that the mutation rate was ∼85% (449 of 531 plants carried mutations). Of the 119 T1 families analyzed, they isolated 107 plants showing developmental phenotypes in the field (such as altered flowering time, plant height, or architecture) and identified the putative causal genes (Liu et al., 2020). This study confirmed the activities of previously known genes and identified novel mutants with interesting phenotypes. This work provides a model for performing a CRISPR screen on cultivars or species with limited public genomic information.
Targeted sequence diversification
The engineering of new genetic variants involves diversifying gene sequences and creating allelic series to identify proteins with novel functions. Traditionally, this has been conducted in plants using two different approaches based on random mutagenesis: Targeting Induced Local Lesions in Genomes (TILLING) and error-prone PCR. The TILLING method, which involves mutagenizing plants by treating them with chemicals (e.g. EMS) or irradiation, has been applied to many crops including wheat (Triticum aestivum), rice, and tomato (Jacob et al., 2018). The first generation (M1) is self-fertilized and propagated to create individual M2 lines. DNA is isolated from multiple siblings of individual M2 lines and pooled. Gene-specific PCR amplification and massive parallel sequencing can then be used to identify genetic variants in the desired gene or region of interest and the mutations are traced back to M2 lines for further characterization. As such, TILLING collections can be used as a resource to repeatedly screen for mutant genes and phenotypes. The TILLING approach has been widely applied to crops to identify genetic variants underlying phenotypic traits such as pathogen resistance and pod shattering (Acevedo-Garcia et al., 2017; Jacob et al., 2018; Stephenson et al., 2019). Genetic variants can also be created by error-prone PCR followed by genetic transformation. The gene of interest is amplified under suboptimal buffer conditions with a DNA polymerase lacking proofreading activity to promote the introduction of SNVs. The resulting gene variants are then transformed into plants to test their activity (McCullum et al., 2010). This method has been used to engineer broad-spectrum resistance in R-genes, such as the NB-LRR Rx gene that confers pathogen resistance in potato, or to further characterize protein structure–function (Farnham and Baulcombe, 2006; Harris et al., 2013).
In contrast to the random mutagenesis performed using these two approaches, CRISPR-Cas allows the user to specifically target gene sequences and can encourage certain repair outcomes in vivo. Rodríguez-Leal et al. (2017) made use of this feature by targeting the regulatory regions of the tomato CLAVATA3 (SlCLV3) promoter using Cas9. SlCLV3 expression controls locule number and fruit size in tomato (Xu et al., 2015; Rodríguez-Leal et al., 2017). Using eight gRNAs targeting a 2-kb region, they generated multiple T0 lines carrying deletions and subsequently used one of them as a parent for backcrossing with the wild-type. They obtained a mutant collection of 1,152 F1 plants and took advantage of continued CRISPR activity on the wild-type DNA to create an allelic series within the SlCLV3 promoter. This led to the generation of plants showing a continuum of variation in the number of fruit locules (Rodríguez-Leal et al., 2017). Remarkably, they then used this editing strategy to engineer plant architecture variation by targeting the COMPOUND INFLORESCENCE (S) and SELF PRUNING (SP) promoters using smaller segregating populations of 326 and 81 F1 plants, respectively, demonstrating that their strategy can be applied to a number of genes to fine-tune quantitative trait variation (Rodríguez-Leal et al., 2017).
Along similar lines, Butt et al. used Cas9-generated indels to evolve the spliceosome and identify variants that are resistant to splicing inhibitors in rice (Butt et al., 2019). They transformed 119 gRNAs targeting all possible sites on the SPLICING FACTOR 3B1 (SF3B1) coding sequence into 15,000 rice calli and applied negative selective pressure with the splicing inhibitor GEX1A during regeneration. As knockout deletions of SF3B1 are lethal, only in-frame or heterozygous mutants were recovered. From this scheme, they recovered 21 resistant rice shoots and identified three amino acid deletions that conferred resistance to GEX1A (Butt et al., 2019). They were also able to recover three additional alleles with only 3,000 rice calli using a focused approach targeting known drug-interacting domains with four gRNAs.
Another emerging strategy for sequence diversification makes use of base editors. A cytidine base editor system based on the hyperactive AID deaminase (Hess et al., 2016) and an adenine base editor were used on separate vectors to target the rice ACETOLACTATE SYNTHASE1 (OsALS1) gene to generate genetic variants that confer resistance to the herbicide bispyribac-sodium (Kuang et al., 2020). The authors constructed a library of 63 gRNAs targeting the entire OsALS1 coding sequence and observed substitutions in 20% of T0 plants. Interestingly, editing was observed in a window broader than the gRNA targeting region, suggesting that this system can be used to generate mutations more broadly on the target sequence (Kuang et al., 2020). After applying negative selection with bispyribac-sodium on approximately 3,600 calli, they recovered 113 resistant calli and identified six OsALS1 variants conferring resistance. One new variant (P171F) was never selected in previous rice breeding programs but corresponds to amino acid substitutions in a structural motif previously identified in Arabidopsis and wheat. Using the same gRNAs and base editors, they introduced the P171F substitution into the elite commercial rice cultivar Nangeng 46 and obtained herbicide-resistant plants, demonstrating the ability of this strategy to rapidly identify genetic variants and their relevance for targeted crop breeding (Kuang et al., 2020).
In parallel to this study, Li et al. developed a dual base editor system to simultaneously target genomic regions with cytidine and adenine deaminases (Li et al., 2020). The system is composed of a base editor that carries both deaminases in combination with nCas9 or the variant nCas9-NG, which recognizes NG, GAA, and GAT PAM sites (Hu et al., 2018; Nishimasu et al., 2018). This greatly broadens the number of genomic sites that can be targeted by simultaneously generating A to G and C to T mutations (Li et al., 2020). The authors targeted the ∼1.2-kb CT domain of the rice ACETYL-COENZYME A CARBOXYLASE (OsACC) gene, which is the target of acetyl-CoA carboxylase (ACC) herbicides. Two hundred gRNAs were subdivided into 27 subpools and delivered into approximately 6,500 rice calli. They observed an average editing efficiency of 13%, with 53% of the amino acids altered. Depending on the subpool, the target coverage varied from 70% to 100%, indicating that most of the amino acids were edited. After applying negative selection with the ACC inhibitor haloxyfop, they identified multiple seedlings with various alleles at four different amino acids. One previously known amino acid substitution (W2125C) and three novel substitutions (P1927F, S1866F, and A1884P) conferred herbicide tolerance (Li et al., 2020).
These protein-engineering studies in rice lay the foundations for using CRISPR screens for synthetic biology applications such as protein structure–function studies or for accelerated plant breeding. Using transformation populations of 3,000–15,000 rice calli, the authors were able to obtain evolved allele variants. Thus far, all of these studies targeted genes that confer resistance to drugs or herbicides, which greatly facilitates phenotypic selection (Butt et al., 2019; Kuang et al., 2020; Li et al., 2020). Despite these exciting results, significant limitations must still be overcome. First, the mutation spectra of deaminases are largely restricted by the deaminase window and the transitions that can be made. Furthermore, only a restricted number of amino acid substitutions can be generated using base editors (Tong et al., 2019). One main challenge is to also evolve traits that cannot be negatively selected in tissue culture or are quantitative in nature (plant height, drought tolerance, flowering time, etc.). Furthermore, two of the groups were able to use prior knowledge to more specifically target functional domains. While the results of Butt et al. clearly demonstrate an increase in efficiency using a targeted approach, the required knowledge is generally lacking for many genes and species.
Designing a pooled CRISPR knockout screen
Having described the current applications of CRISPR screens in plants, we will now discuss the design criteria needed to carry out a CRISPR knockout screen in plants (Figure 2). We draw from our experiences in generating CRISPR libraries as well as recently published reports. It is important to first think through the entire workflow before ordering any reagents. Which type of editing tool will I use and what is its efficiency? How will I transform/transfect my plants? Can I generate, grow, and analyze a population large enough for a genome-wide screen, or is a targeted approach more realistic? If a focused screen is used to investigate specific regulatory pathways or quantitative phenotypes, how will I select the target genes? Are they gene candidates from differential expression analysis, QTL/Genome-wide Association Studies (GWAS), phylogenetic relationships, and so on? Once the library is made, how will I phenotype and identify candidate genes and mutations? How will I confirm the “hits”?

Figure 2. Workflow and design parameters for CRISPR knockout screening in plants. A CRISPR knockout screen starts with the design and preparation of the gRNA library. The size of the library is based on the final population size. Genotyping assays must be designed and tested for the eventual genotyping of the mutant population. Once the design is completed, the experimental part of the screening begins. The Cas enzymes and gRNA library are cloned into the appropriate plasmids and the library is evaluated for gRNA distribution and coverage. The library is then transformed into the target tissues and mutant plants recovered. Plants in the primary or subsequent generations can then be selected based on their phenotype, and their DNA is isolated, amplified, and sequenced. Alternatively, the entire collection may be genotyped, resulting in a population with known mutations and/or gRNAs. The resulting sequences are analyzed using sequence alignment software to associate phenotype with genetic perturbations. Finally, the causal mutation is confirmed using independent genome editing assays and functional characterizations. We provide key design parameters to be considered as well as some quality control steps.
All of these questions must have clear answers before starting the screen. Otherwise, one might generate a large collection of transgenic materials with no space to grow them, no simple way of knowing whether the phenotype they are seeing is caused by genome editing (or conversely, why they are not observing any phenotype), and no understanding of how mutagenized their population is.
Choice of editing tool
A large panel of genome-editing tools is currently available for plants (Zhang et al., 2019b). Editing efficiencies are frequently increased by changing promoters and Cas9-codon usage (Hanson and Coller, 2018; Zhang et al., 2019b). For most knockout screens, SpCas9 is the simplest choice, as most tools have been designed based on this enzyme, and it is still more active than any of the variants with relaxed PAM requirements (Endo et al., 2019; Zhong et al., 2019; Legut et al., 2020; Walton et al., 2020). The limitation of the NGG PAM does not present considerable issues when targeting most plant genomes for simple knockouts. However, it may be limited in some circumstances, such as base editing or transcriptional control where there are greater targeting restrictions (specific amino acids or TA-rich regions). Cas9 variants such as xCas9 and Cas9-NG recognize NG, GAA, and GAT PAMs and can accommodate a greater variety of targets (Hu et al., 2018; Nishimasu et al., 2018). The target range and indel spectrum are further expanded using Cas12a, which recognizes T-rich PAMs and introduces a staggered DNA cut (Zetsche et al., 2015; Tang et al., 2017). Lastly, the DNA repair outcomes may be a factor worth considering. The use of the exonuclease Trex2, in combination with Cas9, has been shown to produce larger deletions than Cas9 alone, and therefore could be used to investigate non-coding RNA sequences where small indels may not result in a significant effect (Ho et al., 2014; Čermák et al., 2017; Tan et al., 2020).
Choice of screening format
The method-of-choice for all plant CRISPR screens to date has been stable plant generation with pooled vectors. The main advantages of this approach are that it is relatively simple to implement using existing plant transformation protocols and that it enables plant phenotypes to be analyzed over multiple generations, across development, and under various environmental conditions. This approach also allows one to easily isolate causative alleles from the mutagenized population by genetic segregation and/or crossing. Pooled transformations have thus far been performed using Agrobacterium or direct-DNA delivery via biolistics.
With multiple transformation options available such as electroporation, biolistics, and polyethylene glycol- or Agrobacterium-mediated transformation, it is important to consider how genome editing reagents are delivered into the cells. The use of pools of Agrobacteria typically results in the introduction of T-DNAs from only one or a few bacteria into an individual plant cell (Depicker et al., 1985). Consequently, if plasmids are pooled, one should expect that ∼80% of events will contain a single T-DNA and the rest will contain two, three, or even more different T-DNAs. Such distributions of T-DNAs have been observed in pooled screens (Jacobs et al., 2017; Bai et al., 2019; Kuang et al., 2020). The one report using biolistically delivered DNA suggests that co-transformation might occur more frequently using biolistics compared to Agrobacterium (Kuang et al., 2020). Such co-transformations are in many cases beneficial to the researcher, as they increase the throughput of the screen, thereby decreasing the population size needed. However, the presence of multiple vectors can also make the downstream data analysis more complex.
Another important factor to consider when designing a plant transformation strategy is that transient expression of CRISPR reagents is sufficient to induce inheritable mutations (Woo et al., 2015; Jacobs et al., 2017; Kuang et al., 2020). In the transporter library reported by Jacobs et al., all target loci were checked for indels in all transgenic lines in addition to sequencing the gRNAs. This analysis revealed that 20% of the mutations were induced by gRNAs that were not integrated, suggesting that transient expression of T-DNAs without integration led to these mutations. Likewise, Kuang et al. observed base edits where the corresponding gRNAs were not integrated, leading them to suggest that transiently expressed T-DNAs induced the edits in these lines. These are important observations, as they show that relying solely on sequencing the gRNAs from pooled transformations will likely cause mutations to be missed. The other pooled CRISPR libraries generated in plants (Lu et al., 2017; Meng et al., 2017; Bai et al., 2019; Liu et al., 2020) were likely more mutagenized than reported. It is important to note that this effect might be genotype and/or transformation method dependent. PacBio sequencing all 54 genes in 6 lines of an LRR-XII library revealed no obvious indels from transiently expressed T-DNAs (Jacobs et al., 2017). On one hand, the presence of transiently expressed T-DNAs that induce mutations is beneficial, as it can increase the mutagenesis rate within the population. Nevertheless, this means that a significant amount of genotyping and/or crossing will need to be performed on lines of interest to properly associate phenotypes with the underlying genotypes. Alternatively, researchers may wish to identify conditions that limit the transient expression of T-DNAs.
Producing large numbers of transgenic plants is also crucial when performing an in planta CRISPR screen. The majority of studies in plants reported to date have been performed in species that are highly amenable to transformation, such as rice or tomato. In a tour de force study, Liu et al. (2020) generated over 4,000 independent transgenic maize events. Very few groups will have the capacity to generate a screening population of this size in maize or other species recalcitrant to transformation and tissue culture. Fortunately, recent studies have demonstrated that overexpressing morphogenetic regulators such as WUSCHEL2 and BABY-BOOM can greatly improve the throughput to generate transgenic plants in several monocotyledon species by stimulating callus regeneration or by inducing somatic embryogenesis (Lowe et al., 2016, 2018). Thus, the improvement of plant regeneration capability might allow CRISPR screening to be scaled up.
Protoplasts are isolated from multiple plant tissues, such as roots or leaves, after their surrounding cell walls are digested, easily providing a large source of cells that can be used for transient assays (Yoo et al., 2007). Genome editing reagents are frequently tested in protoplast systems, and the analysis is typically limited to DNA or RNA profiles (Bargmann et al., 2013; Shan et al., 2014). Their ease of isolation from a wide range of plant species and the high numbers of transfection events make protoplasts an extremely attractive model, especially for genome-wide screens. Yet, there are currently no reports of pooled CRISPR screens using plant protoplast systems. One potential issue when using protoplasts is that as micrograms of plasmids are typically used in protoplast transfections, every single gRNA in the library could enter each and every cell during a pooled transformation. Such a possibility needs to be experimentally determined, but this would defeat the purpose of a screen where the objective is to create a variable population. Thus, protoplasts may be better suited for arrayed screens on an automated robotized platform that can transfect large numbers of independent samples (Vanden Bossche et al., 2013).
To overcome these limitations, plant cell suspension cultures offer an alternative that shares comparable features with lentiviral delivery in mammalian cell cultures: they are easily maintained under aseptic conditions and can display rapid growth (18–24 h doubling time for tobacco (Nicotiana tabacum) BY-2 cells; Santos et al., 2016). Furthermore, cell suspensions can be stably transformed with Agrobacterium (Forreiter et al., 1997; Santos et al., 2016), which, as previously mentioned, leads to a majority of cells containing a single plasmid. These features can allow pooled vector libraries to be transformed, propagated, and potentially selected. Although the process of creating a plant cell suspension culture de novo might be time consuming (Mustafa et al., 2011), plant cell cultures are currently available for several species (Moscatiello et al., 2013; Santos et al., 2016). Obtaining and using established material might therefore represent an efficient strategy for pooled CRISPR screening.
Number of targets
Population size
The first variable to consider is the size of a reasonable population one can generate, maintain, and phenotype. Even for a small plant like Arabidopsis, generating and phenotyping thousands of lines requires a significant investment of time and resources. For larger plant species, such population sizes might become impractical and too expensive very quickly. It is important to keep in mind that it is often possible to generate many more transgenic lines than one can fit in the greenhouse.
Mutagenic efficiency
Mutagenic efficiency is the percentage of individuals within a population that will carry a knockout mutation. Several considerations feed into this: the efficiency of transgene expression, the efficiency of the gRNA, the type of indel generated, the effect of that indel on gene function, and the inheritability of mutant alleles (if going to the next generation). Researchers should know the efficiency of their system. Most screens thus far have a mutagenesis frequency ranging from 60% to 80% (Jacobs et al., 2017; Lu et al., 2017; Meng et al., 2017; Bai et al., 2019; Liu et al., 2020). Having a highly efficient system is likely to be more important when targeting multiple genes at the same time (Bai et al., 2019). The use of transgenic lines stably expressing Cas9 may increase the mutagenic efficiency by reducing the typical variation in expression among transgenic events (Osakabe et al., 2016). In cases where a novel gene-editing system is used, the efficiency of the editing system together with a subsample of gRNAs should be tested and validated on a few target sites (Li et al., 2020; Liu et al., 2020).
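To illustrate why efficiency becomes more important when several genes are targeted at once, the following is a minimal sketch. It assumes, as a simplification that is not taken from the cited studies, that each gene is knocked out independently and with the same per-gene probability; the function name is hypothetical.

```python
def prob_all_knocked_out(per_gene_efficiency: float, n_genes: int) -> float:
    """Probability that every one of n_genes is knocked out in a single line.

    Assumption (ours): genes are mutated independently, each with the same
    per-gene efficiency. Real gRNAs differ in activity, so this is only a guide.
    """
    if not 0.0 <= per_gene_efficiency <= 1.0:
        raise ValueError("per_gene_efficiency must be between 0 and 1")
    if n_genes < 1:
        raise ValueError("n_genes must be at least 1")
    return per_gene_efficiency ** n_genes


# Compare the lower and upper ends of the 60-80% range reported for screens so far.
for efficiency in (0.6, 0.8):
    for n_genes in (1, 2, 3):
        p = prob_all_knocked_out(efficiency, n_genes)
        print(f"efficiency={efficiency:.0%}, genes={n_genes}: {p:.1%} of lines fully mutated")
```

Under these assumptions, the fraction of lines carrying the complete higher-order knockout drops quickly with each additional gene, which is why a modest gain in per-gene efficiency matters more in multiplexed designs.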
Representation factor
The representation factor considers the number of independent lines that should have mutations in the same gene(s) to provide confidence when correlating genotypes and phenotypes. If a spectacular phenotype is observed in only one individual line, the researcher should be wary of jumping to a hypothesis (or calling the patent attorney) regarding the gene(s) mutated in that line. In Jacobs et al., developmental phenotypes segregated in T1 lines but were not associated with the mutant alleles and thus were likely due to somaclonal effects (Jacobs et al., 2017). Having multiple lines carrying the same mutated gene(s) all showing the same phenotype is important for determining which lines to follow up on. As with most experimental setups, three independent events are a reasonable number to aim for. However, pooled screens in animal cells typically aim for 300–1,000 cells per gRNA (Miles et al., 2016). Similar numbers will likely be needed if such pooled screens are to be attempted in plant cells.
Sampling distributions
Libraries that are evenly distributed do not contain perfectly equal portions of individual gRNAs. Instead, the number of any individual gRNA will fall somewhere on a distribution curve and could vary by 10-fold in abundance (Lu et al., 2017; Liu et al., 2020). Sampling distributions are critical when determining the correct population size. As a rule-of-thumb in such binomial sampling distributions, to have 95% confidence of observing a single gRNA at least once, one needs to sample 3 times the number of gRNAs in the population.
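The rule-of-thumb can be checked numerically. The sketch below assumes that every transgenic event draws one gRNA uniformly at random from the library, which is an idealization given the up to 10-fold variation in abundance noted above; the function names are hypothetical.

```python
import math


def prob_seen_at_least_once(n_grnas: int, n_samples: int) -> float:
    """Chance that one particular gRNA is drawn at least once in n_samples draws.

    Assumption (ours): each draw picks one of n_grnas with equal probability.
    """
    return 1.0 - (1.0 - 1.0 / n_grnas) ** n_samples


def samples_needed(n_grnas: int, confidence: float = 0.95) -> int:
    """Smallest number of draws that reaches the requested confidence."""
    if n_grnas == 1:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - 1.0 / n_grnas))


for library_size in (100, 1000):
    three_fold = 3 * library_size
    p = prob_seen_at_least_once(library_size, three_fold)
    print(f"{library_size} gRNAs, {three_fold} events: P(seen) = {p:.3f}")
    print(f"  events needed for 95% confidence: {samples_needed(library_size)}")
```

Sampling three times the library size leaves a miss probability of roughly e^-3, or about 5%, which matches the 95% figure. Underrepresented gRNAs in a skewed library would need proportionally more events.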
Given these parameters, if, for example, one is able to generate a population of 1,000 transgenic events, they can in principle target: genes for a single-gene knockout screen.
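As a rough sketch of how the four parameters above might be combined, the calculation below assumes that the number of targetable genes is the population size multiplied by the mutagenic efficiency, then divided by the representation factor and by the sampling multiplier. This combination is our assumption for illustration and is not a formula confirmed by the text; the function and parameter names are hypothetical.

```python
def max_target_genes(
    population_size: int,
    mutagenic_efficiency: float,
    representation_factor: int = 3,
    sampling_multiplier: int = 3,
) -> int:
    """Estimate how many genes a single-gene knockout screen can cover.

    Assumed model (ours): usable mutant lines = population x efficiency;
    each gene then needs representation_factor x sampling_multiplier lines.
    """
    usable_lines = population_size * mutagenic_efficiency
    lines_per_gene = representation_factor * sampling_multiplier
    return int(usable_lines // lines_per_gene)


# 1,000 transgenic events at the low and high ends of the reported efficiency range.
for efficiency in (0.6, 0.8):
    genes = max_target_genes(1000, efficiency)
    print(f"efficiency={efficiency:.0%}: about {genes} genes")
```

Under these assumptions, a population of 1,000 events supports a screen of several dozen genes rather than several hundred, which shows how quickly the representation and sampling requirements consume the population.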
Selecting target sites
Some of the more uncertain aspects of designing a CRISPR screen include selecting the gene targets and selecting the target sites in these genes. If a genome-wide screen is not being performed, it is necessary to generate a sub-selection of genes. Even when performing a genome-wide screen, researchers still subpool the gRNAs to create more focused collections (Lu et al., 2017). Datasets such as QTL (Liu et al., 2020), GWAS, or (differentially) expressed genes (Jacobs et al., 2017; Meng et al., 2017) can guide the selection of target genes, whereas if the critical functional domains of the target proteins are known, the screen can focus on specific regions (Butt et al., 2019; Kuang et al., 2020; Li et al., 2020). For knockout screens, we follow these general recommendations: target constitutive exons (Doench, 2018); avoid regions close to the start site to avoid alternative start codons or regions close to the stop codon to prevent the creation of truncated proteins; and, if possible, target the regions of the gene important for its function. These general guidelines often cause us to target the middle ∼75% of gene-coding sequences.
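The "middle ∼75%" guideline can be expressed as a simple positional filter. The sketch below assumes that candidate cut sites are given as 0-based coordinates along the coding sequence and that the excluded margin is split equally between the two ends; both choices, and the function names, are assumptions for illustration.

```python
from typing import List


def in_central_window(site_pos: int, cds_length: int, central_fraction: float = 0.75) -> bool:
    """True if a cut site falls inside the central fraction of the coding sequence.

    Assumption (ours): the excluded margin is divided equally between the
    5' end (near the start codon) and the 3' end (near the stop codon).
    """
    margin = (1.0 - central_fraction) / 2.0 * cds_length
    return margin <= site_pos <= cds_length - margin


def filter_sites(sites: List[int], cds_length: int, central_fraction: float = 0.75) -> List[int]:
    """Keep only the candidate sites located in the central window."""
    return [s for s in sites if in_central_window(s, cds_length, central_fraction)]


# Hypothetical candidate cut sites on a 1,000-nt coding sequence.
candidate_sites = [50, 125, 500, 875, 900]
print(filter_sites(candidate_sites, cds_length=1000))
```

A filter like this only handles position along the coding sequence. Restricting targets to constitutive exons or to known functional domains requires gene-model and annotation data that are not represented here.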
When selecting the target site(s), it is critical to establish genotyping assays for subsequent steps. Primers used to amplify the target regions and/or gRNAs should be tested with the desired sequencing technology before ordering any cloning reagents. While many researchers may wish to skip this step, as testing primers is rather mundane, performing these tests saves time and limits unnecessary stress and anxiety during later steps. Genotyping errors of a target site due to the presence of adjacent homopolymers or the generation of amplicons that do not match the reference genome sequence can significantly stall a project. Target sites, or even whole genes, that are not compatible with later genotyping steps should be removed from the library; it makes little sense to target a gene/region if the analysis will be blind to it in later steps.
gRNA library construction
Only a few genome-scale libraries have been designed for plants. Therefore, although premade knockout libraries are available for rice (Lu et al., 2017; Meng et al., 2017), building a new library is necessary for other plant species or if a more targeted screen is desired. To increase the chances of generating loss-of-function mutations, target regions in constitutive exons should be selected and all possible gRNAs identified. Targets containing homopolymers and/or extreme GC contents (<25% and >80%) are generally avoided because they show lower efficiency on average (Doench et al., 2014; Shalem et al., 2014; Wang et al., 2014; Lu et al., 2017). As researchers are often still left with a large number of potential gRNAs, predicted gRNA efficiency is frequently used to further select gRNAs with potential maximum on-target activity. This step can be automated using computational tools such as CRISPR library designer, which offers the option to use plant genomes as input (Heigwer et al., 2016). However, it is worth pointing out that all gRNA activity tools reported thus far are based on animal cell models, and their results in plants have been mixed (Naim et al., 2020). A simple approach to overcome inevitable gRNA inefficiencies is to design three to four gRNAs per target gene (Shalem et al., 2014; Lu et al., 2017). Note that gRNA selection must be linked to the selection of target sites, and these two decisions often have to be made in tandem.
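The sequence-based exclusions described above can be sketched as a pre-filter applied before any efficiency scoring. The GC thresholds come from the text; the homopolymer cutoff of four identical bases and the example spacer sequences are assumptions made for illustration only.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a spacer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_homopolymer(seq: str, min_run: int = 4) -> bool:
    """True if any base is repeated min_run or more times in a row.

    The default of 4 is an assumed cutoff, not a value given in the text.
    """
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.search(pattern, seq.upper()) is not None


def passes_filters(seq: str, gc_min: float = 0.25, gc_max: float = 0.80, min_run: int = 4) -> bool:
    """Keep spacers with moderate GC content and no long homopolymer."""
    return gc_min <= gc_fraction(seq) <= gc_max and not has_homopolymer(seq, min_run)


# Hypothetical 20-nt spacers.
spacers = [
    "GACCTGAAGCTCGATCCGTA",  # moderate GC, no long run
    "ATATTTTATAATATTAAATA",  # contains TTTT and has very low GC
    "GCGCGGCCGCGCGGCCGCGC",  # GC content above 80%
]
for spacer in spacers:
    print(spacer, passes_filters(spacer))
```

Because the published activity predictors have given mixed results in plants, a conservative pre-filter of this kind is best combined with designing three to four gRNAs per gene rather than used as a substitute for it.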
Cloning can begin following target and gRNA selection. To begin, gRNAs are synthesized as DNA oligo pools. Two types of formats are currently available for oligo pools. Oligo pools generated by array-based synthesis are the least expensive per gRNA and can generate tens of thousands of unique sequences. However, there are some drawbacks: the minimum number of sequences that must be ordered is usually in the thousands; they require PCR amplification to generate sufficient starting material for cloning (care must be taken to avoid amplification bias); and they have relatively high error rates (1 in 200 to 1 in 2,000 nt). These considerations make array-based oligos more suitable for large-scale genome-wide screens requiring tens of thousands of unique gRNAs. Standard column-synthesized de-salted oligos can also be used. Any standard provider can generate very large quantities of oligos that are sufficient for cloning and contain few errors (typically <1 in 2,000 nt). However, these oligos become very expensive when ordering large numbers and often require manual steps for pooling, making this option better suited for smaller-scale screens requiring tens to hundreds of gRNAs. There is often a dilemma when creating libraries somewhere in the middle of these two extremes. Fortunately, oligo providers are now starting to bridge the gap and are offering large numbers of relatively inexpensive pooled, column-synthesized oligos.
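To put the quoted synthesis error rates in perspective, the sketch below estimates the fraction of oligos that are entirely error-free. It assumes that errors occur independently at each nucleotide and uses a hypothetical oligo length of 60 nt (spacer plus flanking cloning sequences); neither assumption comes from the text.

```python
def fraction_error_free(oligo_length_nt: int, errors_per_nt: float) -> float:
    """Expected fraction of oligos with no synthesis error at any position.

    Assumption (ours): errors are independent and equally likely at every nucleotide.
    """
    return (1.0 - errors_per_nt) ** oligo_length_nt


OLIGO_LENGTH = 60  # hypothetical length in nt, chosen only for illustration

for label, rate in (("1 in 200 nt", 1 / 200), ("1 in 2,000 nt", 1 / 2000)):
    clean = fraction_error_free(OLIGO_LENGTH, rate)
    print(f"error rate {label}: {clean:.1%} of oligos error-free")
```

Under these assumptions, the higher error rate leaves roughly a quarter of oligos carrying at least one error, which is one reason to sequence the cloned library before transformation.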
The oligos or PCR products are then cloned into expression vectors using various methods such as restriction-ligation (Lu et al., 2017), Gibson assembly (Jacobs et al., 2017; Meng et al., 2017), or homologous recombination (Liu et al., 2020). This can be done either individually or with pooled samples. It is important for the cloning strategy to be highly accurate (≥95% correct clones) and efficient. Otherwise, the presence of large numbers of incorrect clones will reduce the mutagenic efficiency of the library. Large numbers of independent clones are also needed to maintain gRNA distribution. gRNA coverage of 25× (Jacobs et al., 2017) or 166× (Shalem et al., 2014) has been used for different libraries, requiring tens of thousands to millions of independent clones. Thus, very large numbers of clones must be generated, and this is a step where many researchers can get stuck. Any new cloning strategy should be thoroughly tested, and the use of cloning sites with the ccdB negative selection marker is highly encouraged. After cloning, the quality of the library is evaluated by Sanger sequencing for single clones or amplicon sequencing for pooled clones. In addition to checking for correct gRNA sequences, the distribution of the library is assessed by analyzing the relative abundance of each gRNA with a histogram, which can be generated with computational tools such as the tool developed by Joung et al. (2017). gRNAs should be normally distributed without individual gRNAs being overly enriched in the library (Doench, 2018). gRNAs that are enriched in the Escherichia coli library will likely be enriched in the transgenic population (Liu et al., 2020) and can have a negative impact on the quality of the screen if the goal is to obtain an evenly distributed mutant collection. Limiting the duration of bacterial growth after gRNA library transformation is also important for limiting bias (Joung et al., 2017). 
When using Agrobacterium-mediated transformation, the final pool of bacteria used for plant transformation can also be analyzed (Lu et al., 2017; Meng et al., 2017; Liu et al., 2020). After all quality control measures have been met, the library is ready to be transformed into plant tissues.
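The library quality checks described above can be sketched with two small helpers: one for the number of independent clones implied by a given coverage, and one that summarizes amplicon read counts per gRNA. The gRNA names, read counts, and library size are hypothetical, and the two summary metrics are our choice for illustration rather than those of a specific published tool.

```python
from typing import Dict, List


def clones_needed(n_grnas: int, coverage: int) -> int:
    """Independent clones required to reach the requested coverage per gRNA."""
    return n_grnas * coverage


def library_summary(read_counts: Dict[str, int], designed: List[str]) -> Dict[str, float]:
    """Report how many designed gRNAs were detected and how uneven they are.

    fold_range is the ratio between the most and least abundant detected gRNA.
    """
    detected = [read_counts.get(g, 0) for g in designed if read_counts.get(g, 0) > 0]
    if not detected:
        raise ValueError("no designed gRNA was detected in the read counts")
    return {
        "fraction_detected": len(detected) / len(designed),
        "fold_range": max(detected) / min(detected),
    }


# Hypothetical example: a 1,000-gRNA library cloned at 25x coverage.
print(clones_needed(1000, 25))

# Hypothetical read counts from amplicon sequencing of the pooled plasmid library.
designed_grnas = ["gRNA_1", "gRNA_2", "gRNA_3", "gRNA_4"]
counts = {"gRNA_1": 120, "gRNA_2": 40, "gRNA_3": 400}
print(library_summary(counts, designed_grnas))
```

The same summary can be computed on the E. coli plasmid pool and again on the final Agrobacterium pool to see whether the distribution shifted between steps.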
Data acquisition and screening validation
The main aim of generating a knockout population is to uncover gene function. At least two approaches can be used to screen a mutant library: (1) Screen for phenotypes and then genotype interesting lines; (2) Genotype everything first, organize the lines into a collection, and then phenotype and make associations. Genotyping can be performed by sequencing the integrated gRNAs and/or target genes. A single primer pair is used to PCR amplify the gRNA(s) before submitting the product to sequencing. The identity of the sequenced gRNA(s) is then used as a proxy for the mutated gene(s), although the presence of a gRNA does not necessarily imply that the corresponding mutation has been introduced in the generated line (Jacobs et al., 2017; Lu et al., 2017). As noted earlier, transient expression of gRNAs can also lead to target mutations without insertion of the gRNA (Jacobs et al., 2017; Kuang et al., 2020), and thus additional genes may be mutated.
To confirm the mutation and assess indel efficiencies, genomic regions surrounding the target site are amplified and sequenced by Sanger or massively parallel sequencing (Fauser et al., 2014). Sequence alignment software such as TIDE, ICE, crispRVariant, CRISPResso2, or amplican can then be used to measure indel efficiencies and determine the mutant alleles (Brinkman et al., 2014; Lindsay et al., 2016; Clement et al., 2019; Hsiau et al., 2019; Labun et al., 2019). These genotyping assays must be established during the design phase (see above).
Following successful phenotyping and genotyping, the association between genetic perturbations and the mutant phenotype must be validated. We recommend the use of at least two independent mutant lines generated using independent gRNAs to validate any observed phenotypes. Alternatively, the analysis of progeny from at least two independent lines segregating for the phenotype can help confirm causative alleles (Jacobs et al., 2017).
The phenotyping of whole organisms in plant CRISPR screens is a highly complex endeavor compared to phenotyping using animal cell culture systems. At the same time, the use of whole plants presents a significant advantage, as many phenotypes can be uncovered by analyzing multiple traits in a mutant collection under a range of environmental conditions (Yang et al., 2020). Accordingly, except for a few CRISPR mutant collections in Drosophila (Meltzer et al., 2019; Port et al., 2020), whole organisms have generally not been used in CRISPR screens. Depending on the trait analyzed and the species used, plant phenotyping can be labor intensive and subjected to important environmental variation when conducted in the field. Using genetic reporter lines as background material for screening can facilitate phenotyping and provide further insight into the regulation of particular molecular pathways (Marquès-Bueno et al., 2016; Pusapati et al., 2018; Feldman et al., 2019; Wright and Nemhauser, 2019). Hence, it is important to consider how the phenotyping will be performed once the library is generated several months to years later. It may be desirable to perform pilot experiments in which a variety of reporter lines are tested to find the most suitable one.
Future prospects for CRISPR screens in plants
Since the first reports of CRISPR screening in mammalian cells in 2014 (Shalem et al., 2014; Wang et al., 2014), CRISPR screens have slowly been adopted by the plant research community. As relatively few such experiments have been performed, many open questions remain on the best way to use and implement this technology in a research program. We and others are continuing to develop this technology, and we anticipate that in the near future, performing a CRISPR screen or ordering a CRISPR library will become as common as performing an EMS screen or requesting T-DNA/transposon insertion lines. In this section, we highlight some of the current limitations that should be overcome to make this technology easier to use, as well as some exciting possibilities that are not currently achievable with existing technologies.
Development of prediction tools for gRNA efficiency and editing outcome
Designing gRNAs is a key step in setting up a CRISPR experiment and greatly influences the quality of a CRISPR library. Having more active gRNAs improves the efficiency of the system and reduces the size of the necessary population. Bai et al. demonstrated that a gRNA pre-screening step increased the mutagenesis frequency of their library in soybean (Bai et al., 2019). However, such a pre-screen becomes less practical as the number of gRNAs increases. Therefore, it is desirable to have an in silico screening pipeline to select highly active gRNAs. Predictive tools have been established in mammalian cell systems that use scoring algorithms to predict on- and off-target activities (Hsu et al., 2013; Doench et al., 2014, 2016). The core rules of gRNA efficiency depend on the position, number, and identity of mismatches to the genomic target. While the various models can facilitate gRNA design in mammalian experimental systems, their predictive accuracy in plants is still unclear. In an effort to compare gRNA efficiency scores generated by different algorithms, Haeussler et al. observed substantial variation in the performance of predictive tools, suggesting that individual models that have been optimized for a particular dataset and model species are not fully transferable to other experimental systems (Haeussler et al., 2016). Along the same line in plants, Naim et al. compared eight design tools to score the experimental editing efficiency of gRNAs and observed little uniformity or predictive accuracy (Naim et al., 2020). This finding suggests that these models need to be adapted to increase their predictive power for plants. Screening approaches similar to those used for mammalian cell systems (Kim et al., 2018, 2020; Wang et al., 2019) could be used to improve the predictability of gRNA design algorithms for plants (Figure 3).

Figure 3. Development of gRNA prediction tools in plants using CRISPR screens. Prediction tools for gRNA efficiency can be generated by probing the activity of gRNA libraries. In the arrayed strategy, protoplasts are isolated, arrayed, and transfected with a ribonucleoprotein complex (Cas–gRNA) or with expression vectors. DNA is then isolated from individual transfections and analyzed using amplicon sequencing. Alternatively, a pooled strategy offers the advantage of being easily scalable for large datasets. A library of synthetic gRNA-target sites is built on a common vector backbone, pooled, and transformed into plants. Mutagenesis of the transformed vector occurs in plant cells and can be recorded using a common primer pair that binds to backbone sequences of the vector and can thus be used to amplify and record target regions for the whole library, followed by sequencing. For both the arrayed and pooled strategies, sequences of mutated target sites and gRNAs are used to train a machine-learning algorithm to generate gRNA design rules and a scoring system.
Forecasting the type of mutation generated at a target locus is another desirable component of gRNA design. Genome editing outcomes from CRISPR systems are nonrandom and are dependent on sequence features adjacent to the target site (van Overbeek et al., 2016; Shou et al., 2018). For example, in most knockout screens, it would be desirable to remove the gRNAs that primarily lead to in-frame deletions. In contrast, studies such as Butt et al. could benefit by utilizing gRNAs that only generate in-frame deletions. Multiple studies have investigated target sequence determinants by analyzing editing outcomes for up to 40,000 gRNA-target pairs from synthetic or endogenous target regions and have used these large editing datasets to train deep learning programs (inDelphi, FORECasT, or SPROUT) to predict mutational outcomes associated with gRNA sequences (Shen et al., 2018; Allen et al., 2019; Leenay et al., 2019). A large proportion (72%) of editing outcomes obtained from Cas9 activity in a maize CRISPR screen were predicted by the FORECasT model (Allen et al., 2019; Liu et al., 2020). This suggests that although the DNA repair machinery is not identical between mammalian and plant systems, the rules determining editing outcomes are largely transferable (Liu et al., 2020). These tools, together with the development of prediction tools for gRNA on- and off-target efficiency, would provide valuable resources for designing gRNAs and selecting those that generate the desired type of mutation.
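Once predicted or measured indel spectra are available for each gRNA, selecting gRNAs by outcome type reduces to a frame calculation. The sketch below assumes that an outcome is summarized by its net indel length (negative for deletions, positive for insertions) and its relative frequency. The data are hypothetical and do not reproduce the output format of any specific prediction tool.

```python
from typing import Dict


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame when its length is not a multiple of 3."""
    return indel_length % 3 != 0


def frameshift_fraction(outcomes: Dict[int, float]) -> float:
    """Share of edited alleles that are frameshifts.

    outcomes maps net indel length (nt) to relative frequency.
    Entries with length 0 (unedited) are ignored.
    """
    edited = {length: freq for length, freq in outcomes.items() if length != 0}
    total = sum(edited.values())
    if total == 0:
        raise ValueError("no edited outcomes provided")
    shifted = sum(freq for length, freq in edited.items() if is_frameshift(length))
    return shifted / total


# Hypothetical outcome spectra for two gRNAs.
spectra = {
    "gRNA_A": {-1: 0.5, 1: 0.3, -3: 0.2},  # mostly frameshifts
    "gRNA_B": {-3: 0.6, -6: 0.3, -2: 0.1},  # mostly in-frame deletions
}
for name, outcomes in spectra.items():
    print(f"{name}: {frameshift_fraction(outcomes):.0%} frameshift")
```

A knockout screen would favor gRNAs with a high frameshift fraction, whereas a protein-evolution screen of the kind reported by Butt et al. would favor the opposite. The frame test alone does not capture in-frame changes that still disrupt function or create stop codons.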
Recording gene editing outcome at the single-cell level
The emergence of single-cell sequencing technologies has revolutionized the resolution at which cell identities can be characterized (Macosko et al., 2015; Tanay and Regev, 2017). By analyzing and integrating the regulatory interactions of individual cells from a specific tissue, gene regulatory networks underlying dynamic developmental processes such as cell differentiation can be inferred (Efroni et al., 2015).
Microfluidic systems have been used to process large numbers of single cells and have successfully been applied to plants by multiple independent groups to profile root tissues (Denyer et al., 2019; Jean-Baptiste et al., 2019; Shulse et al., 2019; Zhang et al., 2019a, 2019b). Profiling single Arabidopsis root cells revealed new cell-type-specific marker genes as well as how cell differentiation trajectories proceed from the proximal meristem to the differentiation zone (Denyer et al., 2019; Shulse et al., 2019). The influence of environmental factors such as sucrose or heat shock on cell identity has also been analyzed (Jean-Baptiste et al., 2019; Shulse et al., 2019). Together, these studies firmly established single-cell profiling protocols and analysis pipelines for Arabidopsis roots, which will facilitate functional genetic analysis in plants at high resolution.
Combining gene perturbations with single-cell profiling as a phenotypic readout will allow gene function to be probed in precise cellular contexts (Tanay and Regev, 2017). Multiple reports in mammalian cells have established experimental pipelines and computational frameworks for conducting CRISPR screens followed by single-cell sequencing (Perturb-seq, CROP-seq, and CRISP-seq; Adamson et al., 2016; Dixit et al., 2016; Datlinger et al., 2017). In these types of studies, the gRNA library is cloned in a vector that carries barcodes to label individual gRNAs that can subsequently be detected by RNA sequencing. By targeting a set of regulatory factors and recording the resulting transcriptional changes at the cellular level, the regulatory signatures of transcription factors, as well as the effects of cell cycle genes on cellular programs, have been investigated in various systems (Adamson et al., 2016; Dixit et al., 2016).
As both CRISPR screens and single-cell sequencing have been established in plants, we envision that these methods will be combined to further characterize genetic programs in plants (Figure 4). However, several challenges remain before this technology can successfully be applied to plants. The heterogeneity in plant cell size and the requirement to digest cell walls strongly decrease the efficiency of the single-cell isolation procedure. To date, single-cell sequencing studies in Arabidopsis roots have analyzed 3,000 to 12,000 cells (Denyer et al., 2019; Jean-Baptiste et al., 2019; Shulse et al., 2019; Zhang et al., 2019a), whereas Perturb-seq experiments have analyzed 26,000–104,000 cells (Dixit et al., 2016). Thus, scaling up the number of cells analyzed will be crucial before these methods can be applied to plants. This could be accomplished by using cell suspension cultures or protoplasts (Yoo et al., 2007; Moscatiello et al., 2013), as scaling up their culture would generate enough material for transformation with Cas9 and a gRNA library to be processed for single-cell sequencing (Figure 4). Finally, the use of a computational framework such as the one developed in Dixit et al. might also allow the number of cells analyzed to be downscaled to identify core regulatory signatures resulting from genetic perturbations (Dixit et al., 2016).

Mapping and characterization of plant gene regulatory networks using CRISPR screens. A, CRISPR screen followed by Drop-seq (Macosko et al., 2015) can decipher gene regulatory networks at single-cell resolution. Vectors encoding Cas9 and a library of barcoded constructs carrying gRNAs are transformed into plant protoplasts or cell suspension cultures. Protoplasts are then loaded on a microfluidic device and encapsulated with barcoded beads for single-cell sequencing. Single-cell profiles are generated and associated with individual gRNAs, thereby allowing transcriptional responses to be investigated upon gene perturbation. These profiles are used to infer gene regulatory networks. B, Multiplexed CRISPR screens can characterize redundant gene function or gene interactions. The number of possible gRNA combinations is first assessed and fitted to the desired size of the mutant population. A multiplexed gRNA library is then built and transformed into plants to generate a mutant collection. Individual lines are then genotyped and phenotyped to generate a dataset that can be used to associate phenotype with causal gene perturbations or to infer gene interaction maps.
Combinatorial CRISPR screens
The multiplexing capabilities of the CRISPR-Cas system combined with its simple programmability allow researchers to simultaneously target multiple genes to investigate genetic interactions or redundant genes. In mammalian and yeast cell cultures, combinatorial CRISPR screens have been conducted to test genetic interactions in a high-throughput manner, systematically interrogating thousands of pairwise genetic interactions using paired-gRNA libraries and facilitating the dissection of complex genetic networks (Wong et al., 2016; Shen et al., 2017; Najm et al., 2018; Roy et al., 2018; Zhao et al., 2018, 2019; Guo et al., 2019; Gonatopoulos-Pournatzis et al., 2020).
In contrast, large-scale combinatorial CRISPR screens have not yet been reported in plants. Although CRISPR-based single-gene knockout collections provide a valuable resource for plant functional genetics (Lu et al., 2017; Meng et al., 2017; Liu et al., 2020), they do not provide more insight than EMS mutagenesis or T-DNA/transposon insertion collections for investigating gene function and putative interaction partners. Furthermore, polyploidy is a common phenomenon in plants that can lead to extensive functional redundancy, making the analysis of genetic screens challenging, especially when multiple multi-allelic genes are targeted and segregate over several generations (Comai, 2005; Zaman et al., 2019). Combinatorial CRISPR screens have the potential to overcome these challenges by simultaneously targeting all genes in a particular family, regulatory module, or metabolic pathway of interest. Additionally, by stacking multiple gene targets into a single plasmid, far fewer individuals need to be generated to obtain mutations in all target genes.
Thus far, two reports have laid the foundation for combinatorial CRISPR screens in plants (Jacobs et al., 2017; Bai et al., 2019). In principle, several strategies can be used to simultaneously target genes (McCarty et al., 2020). First, individual genes can be targeted via co-transformation of two or more plasmids carrying different individual gRNAs (Bai et al., 2019). This process results in the production of transgenic lines containing primarily one gRNA; however, two, three, or even more gRNAs can be integrated and lead to mutations. As the integration of plasmids is a random process, populations of plants containing various combinations of indels can be produced.
In a second approach, multiple guides can be expressed from a single plasmid (Ma et al., 2015; Zhang et al., 2016; Jacobs et al., 2017). This can be done either by expressing each gRNA from an individual promoter or by expressing all gRNAs as a single RNA transcript. Polycistronic RNA transcripts can be processed by components of the native CRISPR-Cas system such as Cas12 (Wang et al., 2017; Hu et al., 2019; Pu et al., 2019) or by flanking the gRNAs with RNA cleavage sites, for example, self-cleaving ribozymes (Tang et al., 2016; He et al., 2017), Csy4 recognition sites (Čermák et al., 2017; Yingzhu Liu et al., 2019a), or tRNA processing signals (Xie et al., 2015; Čermák et al., 2017; Figure 4B). This method requires the creation of large gRNA libraries, as all gRNA combinations need to be cloned, which can be facilitated using multiplexed cloning strategies (Čermák et al., 2017; Jacobs et al., 2017). The position of the gRNA in a guide array may affect its efficiency when multiple Pol III promoters are used (Jacobs et al., 2017). A position-dependent effect was also observed using a polycistronic transcript containing ten gRNAs that was processed by Csy4 (Kurata et al., 2018). In both studies, the first guides generated mutations at higher efficiencies than the last (Kurata et al., 2018). However, other studies have not observed positional effects (Lowder et al., 2015; Čermák et al., 2017), suggesting that the positional effects of gRNAs might be dependent on the vector, species, or transformation method used.
Lastly, designing non-specific gRNAs that target closely related sequences can increase the number of targeted genes without delivering additional gRNAs (Bai et al., 2019). The bioinformatics tool CRISPys can be used to design gRNAs that target sets of genes (preferably from the same gene family) based on the observation that specific mismatches between the gRNA and target are tolerated (Hyams et al., 2018). This strategy is particularly suited for investigating genetic redundancy within gene families or gene function in polyploid plants (Bai et al., 2019) but can be harder to implement in species with less sequence homology, such as Arabidopsis.
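The idea of a single gRNA covering several related genes can be illustrated with a simplified Python sketch. This is not the CRISPys algorithm: it only counts mismatches (Hamming distance) between a guide and one 20-nt candidate site per gene and treats a gene as covered when the count stays below a chosen threshold. The gene names, sequences, and the two-mismatch cutoff are hypothetical, and real designs would also weigh mismatch position and PAM context.

```python
def count_mismatches(guide, site):
    """Number of positions at which the guide and the target site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(g != s for g, s in zip(guide, site))


def covered_genes(guide, family_sites, max_mismatches=2):
    """Genes whose candidate site differs from the guide by at most max_mismatches."""
    return sorted(
        gene
        for gene, site in family_sites.items()
        if count_mismatches(guide, site) <= max_mismatches
    )


# Hypothetical 20-nt target sites in four members of a gene family.
family_sites = {
    "GENE1": "GACCTGAAGTTCATCTGCAC",  # perfect match to the guide
    "GENE2": "GACCTGAAGTTCATCTGCAT",  # 1 mismatch
    "GENE3": "GACTTGAAGTACATCTGCAC",  # 2 mismatches
    "GENE4": "TTCCAGAAGTACATGTGCAC",  # 5 mismatches, assumed not tolerated
}
guide = "GACCTGAAGTTCATCTGCAC"

targets = covered_genes(guide, family_sites)
print(targets)
```

Under these assumptions, one guide would cover three of the four family members, which mirrors why the approach suits families with high sequence similarity.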
Although the genetic tools needed to generate multiplexed gRNA libraries are available, many challenges remain before high-throughput pooled, combinatorial CRISPR screens can be conducted in plants. First, the ideal size of the screen (such as the number of target genes per pool/batch and number of gRNAs per vector) needs to be empirically determined. While using smaller gene pools and fewer gRNAs per vector might facilitate downstream analysis, this will also reduce the throughput of the screen. Another consideration is that the chance for lethality or severe phenotypes might increase with higher-order knockouts. Approximately 10% of all Arabidopsis genes are essential (Lloyd and Meinke, 2012), and as ever more gRNAs are stacked in a plasmid, it becomes more likely that highly mutagenized lines will either not be recovered at all (if the effects are very severe) or be too strongly affected to be useful in a screen. Therefore, a balance will likely need to be struck between the efficiency of generating higher-order mutants and the likelihood that these mutants will provide useful phenotypes in a screen.
Furthermore, it is important to consider the combinations of gRNAs (genes) to target when multiplexing. For large libraries, it is not possible to target every gene in every possible combination. Even with the most efficient transformation systems, it is difficult to generate the number of independent events required to produce all higher-order mutants. For a genome with 30,000 protein-coding genes, there are n (n−1)/2 or 450 million possible pairwise combinations (Figure 4B). Even for a gene family with only 20 family members, there are 190 possible pairwise combinations. Thus, the number of individuals quickly becomes impossible to obtain for most systems when higher-order combinations are considered.
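The combinatorial numbers quoted above can be reproduced with a few lines of Python. The helper name `n_combinations` is ours, and the triple-mutant figure is an added illustration of how quickly the count grows beyond pairs.

```python
from math import comb


def n_combinations(n_genes, order=2):
    """Number of distinct gene sets of size `order` drawn from n_genes genes."""
    return comb(n_genes, order)


genome_pairs = n_combinations(30_000)     # n(n-1)/2 for a 30,000-gene genome
family_pairs = n_combinations(20)         # pairwise combinations in a 20-gene family
family_triples = n_combinations(20, 3)    # triple combinations in the same family

print(genome_pairs, family_pairs, family_triples)
```

For the 20-member family, moving from pairs to triples raises the count from 190 to 1,140 combinations, before any replication of independent transformation events is considered.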
Another critical challenge is to associate genotypes with phenotypes in complex mutant collections resulting from multiplexed gene knockout (Costanzo et al., 2019). This has so far been done by manually interpreting the data, as there are currently no established automated analysis methods in plants for higher-order libraries (Hauser et al., 2013; Jacobs et al., 2017; Zhang et al., 2018; Bai et al., 2019). There is a need for an efficient method to link genes and combinations thereof to observed phenotypes and to calculate the contribution of each allele to the phenotype. Ideally, such a method could use the data gathered by the combinatorial CRISPR screen to predict combinatorial gene effects not identified in the screening population (Figure 4B; Liu et al., 2019b). High-throughput genotyping platforms have been established, and the development of robotized phenotyping platforms coupled to association mapping analysis methods could also be used to correlate complex traits with mutant genotypes (Yang et al., 2020). Although these methods are not directly related to CRISPR methodologies, we anticipate that their further development and integration with genetic screens will greatly advance the potential for using CRISPR screens for plant functional genetics.
Conclusions
The use of large-scale CRISPR screens in plants is starting to reveal the great potential of this method for both functional genomics and synthetic biology through targeted, in planta gene engineering. Thus far, this method has primarily been used to generate knockout mutant collections or to diversify gene sequences (Lu et al., 2017; Meng et al., 2017; Rodríguez-Leal et al., 2017; Kuang et al., 2020; Li et al., 2020; Liu et al., 2020). Together, these studies have provided a general framework for designing and conducting CRISPR screens in various plant species and have given insight into the current limitations. One of the main challenges is to transform and maintain whole plant mutant collections over multiple generations, which, compared to cell cultures, requires significantly more time and resources. On the other hand, the ability to grow and perturb mutants throughout development is a tremendous advantage for uncovering a wide range of genetic effects and should be fully exploited by the plant research community. Strategies to accelerate the creation of transgenic plants (Maher et al., 2020) or to shorten generation time in crops (Watson et al., 2018) will greatly facilitate the construction and screening of CRISPR libraries.
Based on established methods in animal cell cultures, we also envision that CRISPR screens could be combined with machine learning to design gRNAs with higher efficiency in plants and to predict genome editing outcomes, thereby increasing the mutagenesis efficiency and decreasing the necessary population size. Furthermore, in combination with single-cell genomics, CRISPR mutant collections could be used for high-resolution mapping of gene regulatory networks that function in plants in response to environmental or developmental cues.
Finally, the multiplexing capabilities of CRISPR genetic tools open new avenues to investigate molecular processes that could hardly be approached with classical screening methods. Combinatorial gene knockout can interrogate gene function in functionally redundant gene families or polyploid plants. Hypothesized gene regulatory networks inferred from genomic datasets could be tested and their models further refined (Krouk et al., 2013). Taken together, we anticipate that the continued development of CRISPR screens will tremendously increase the pace of plant functional genetics and synthetic biology.
Acknowledgments
We would like to thank Laurens Pauwels, Rafael Andrade Buono, and Ward Decaestecker for critically reading the manuscript.
Funding
This work was funded in part by VLAIO (Flanders Innovation & Entrepreneurship) WHEAT-BEG (project number HBC.2018.2152) and BASF Agricultural Solutions Belgium NV (T.B.J. and C.G.).
Conflict of interest statement. The author declares that there is no conflict of interest.
C.G., W.D., and T.B.J. conceived and wrote the manuscript.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (https://academic.oup.com/plcell) is: Thomas B. Jacobs ([email protected]).
References
Čermák T, Curtin SJ, Gil-Humanes J, Čegan R, Kono TJY, Konečná E, Belanto JJ, Starker CG, Mathre JW, Greenstein RL, Voytas DF (2017) A Multipurpose Toolkit to Enable Advanced Genome Engineering in Plants. Plant Cell
Endo M, Mikami M, Endo A, Kaya H, Itoh T, Nishimasu H, Nureki O, Toki S (2019) Genome editing in plants by engineered CRISPR-Cas9 recognizing NG PAM. Nat Plants 5: 14–17
Legut M, Daniloski Z, Xue X, McKenzie D, Guo X, Wessels H-H, Sanjana NE (2020) High-Throughput Screens of PAM-Flexible Cas9 Variants for Gene Knockout and Transcriptional Modulation. Cell Rep 30: 2859–2868.e5
Walton RT, Christie KA, Whittaker MN, Kleinstiver BP (2020) Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science 368: 290
Author notes
Senior author.