Cutting back malaria: CRISPR/Cas9 genome editing of Plasmodium

Abstract
CRISPR/Cas9 approaches are revolutionizing our ability to perform functional genomics across a wide range of organisms, including the Plasmodium parasites that cause malaria. The ability to deliver single point mutations, epitope tags and gene deletions at increased speed and scale is enabling our understanding of the biology of these complex parasites, and pointing to potential new therapeutic targets. In this review, we describe some of the biological and technical considerations for designing CRISPR-based experiments, and discuss potential future developments that broaden the applications for CRISPR/Cas9 interrogation of the malaria parasite genome.


Introduction
To functionally interrogate a genome requires the capability to delete, insert, rewrite and modify not just the specific nucleotides that compose the genome of an organism, but also to alter gene expression and the epigenetic marks that contribute to how the genome is used. Our ability to experimentally tinker with these aspects has been dramatically enabled in the past few years by the development of CRISPR/Cas9 approaches, which have transformed the speed and scale with which genome 'editing' can be achieved. This is true across fields, including parasitology and the study of apicomplexans that include Plasmodium, Toxoplasma and Cryptosporidium. In the latter case, CRISPR editing has been the key that has unlocked the ability to genetically manipulate Cryptosporidium in the lab [1], whereas the more genetically facile Toxoplasma has been taken to the next level of genome-scale exploration through forward genetic screens [2]. The Plasmodium parasite, on which this review is focused, lies somewhere in between these two extremes, with in vitro transfection methods in use for Plasmodium falciparum since 1995 and in vivo rodent malaria models with reasonable transfection efficiencies for genome scale approaches [3,4]. Nonetheless, the introduction of CRISPR approaches has accelerated our collective ability to test essential pathways, generate conditional alleles, dissect domains and motifs and transplant single-nucleotide variants. In this review, we aim to highlight how CRISPR is being used in the malaria community, as well as synthesise our collective practical experience to date on how the system has worked, and the challenges for when it has not.

A critical capability
Site-specific nucleases, including homing endonucleases (e.g. I-SceI), zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs) and CRISPR/Cas9, share the ability to selectively trigger a double-strand break at a defined site in the genome. Genome engineering has simply exploited the desire of the cell to repair this adverse event. In many organisms, this repair can occur by the potentially error-prone pathway of non-homologous end joining, with the resulting indels generating a gene disruption [5]. Plasmodium species lack this ability, a double-edged sword that removes one facile method for generating gene disruptions (now used so effectively in Toxoplasma), but decreases the concern about potential off-target effects from cleavage at unintended loci. To our knowledge, off-target lesions have not been reported for Plasmodium after ZFN or CRISPR editing experiments, although a systematic examination of this question has yet to be performed. Furthermore, it should be noted that Plasmodium does possess the capability for microhomology-mediated end joining, which as the name suggests uses very short regions of homology flanking the double-strand break to repair the lesion, leading to potential indels. Nonetheless, this pathway appears to be relatively inefficient [6,7], and more work needs to be done to understand how it might be exploited for targeted gene disruptions.

CRISPR/Cas9 and its use in Plasmodium
CRISPR/Cas systems evolved in prokaryotes as a protective mechanism against invading bacteriophage and encode guide RNAs that program the Cas nuclease to bind and cut a specific target. Through evolution, prokaryotes have created a wide array of solutions to the same bacteriophage problem by inventing Cas proteins and guide RNAs of different compositions, structures and nucleic acid targets (reviewed in [8]). These nuclease-active Cas proteins have been effectively repurposed for gene editing applications (gene deletion, tag insertion and targeted mutations), whereas nuclease-dead variants (e.g. dCas9) have been used for gene regulation purposes (transcriptional activation or repression, epigenetic modification). While many Cas proteins have now been bioinformatically and experimentally defined, Streptococcus pyogenes Cas9 (SpCas9) is the most widely used, and to date all published work with Plasmodium parasites uses only SpCas9 and its variants. Given the restriction in Plasmodium species to homology-directed repair pathways, CRISPR/Cas9 is effectively a three-component system. Regardless of the specific approach, editing requires the delivery into the parasite of the Cas9 nuclease, the guide RNA(s) and a donor template for the cellular repair machinery to utilise. How each of these three components is delivered, and the design of the donor template to produce the intended modification, are discussed in more detail below.

Species-specific CRISPR systems
The first reported CRISPR tools for P. falciparum were those by Ghorbal et al. [9] and Wagner et al. [10], which use a two-plasmid system to deliver Cas9, guide RNA (gRNA) and donor template. One critical feature for gRNA expression is the need for a precise transcription start at the first nucleotide of the gRNA (corresponding to the start of the target homology sequence), and thus in the majority of CRISPR systems, transcription has been driven from an RNA polymerase III promoter (although see below for alternate possibilities). The Ghorbal and Wagner studies solved this in two different ways, utilising respectively a parasite U6 snRNA promoter or a T7 phage promoter with corresponding co-expression of the T7 RNA polymerase. Variants of both the U6- and T7-based approaches have also been developed (e.g. [11,12]), with one of our labs (Lee) exploiting a short U6 promoter to generate an all-in-one plasmid for delivering all three components that is suitable for relatively small (<1.5 kb) donors.
The option of alternative positive selection (e.g. Blasticidin [13-15]) or negative selection markers, either on the donor plasmid [9] or the Cas9-gRNA plasmid [16], exists if counter-selection is desired. Additional methods to prevent the establishment of replicating episomes are to linearise the plasmid by restriction digest prior to transfection (as done by [9]) or to incorporate the specific gRNA site at the end of the donor sequence such that expression of the Cas9-gRNA within the parasite results in plasmid linearisation (Lee, unpublished data).
In addition to P. falciparum, CRISPR reagents for the zoonotic malaria parasite Plasmodium knowlesi have also been developed and are effective, which, coupled with the higher transfection efficiency of this species, should facilitate scaling of genetic modifications [17]. Plasmodium knowlesi CRISPR approaches can be used to functionally analyse genes and mechanisms relevant to the genetically intractable Plasmodium vivax parasite, to which P. knowlesi is closely related evolutionarily.
The rodent malaria species, Plasmodium yoelii, is also increasingly well resourced with CRISPR/Cas9 reagents. Jing Yuan and colleagues [18] developed a system that also uses RNA polymerase III (via a minimal U6 promoter) to transcribe gRNAs constitutively and at high levels. As only one drug-selectable marker, dihydrofolate reductase (DHFR), is commonly used in P. yoelii and the related Plasmodium berghei parasite, all of the necessary CRISPR DNA sequences were either packaged into a single plasmid, or separated across two plasmids (only one of which could be selected). Initial work demonstrated that gene editing by SpCas9 could be done efficiently in P. yoelii, but that the elimination of the plasmid was exceedingly challenging. The introduction of a negative drug selectable marker, the bifunctional yeast fusion cytosine deaminase/uracil phosphoribosyltransferase (yFCU) gene, allowed for the eventual elimination of parasites retaining the plasmid sequences and resulted in edited parasites that regained sensitivity to anti-folate drugs and could be edited again [19]. With this system, systematic interrogations of the ApiAP2 protein family and proteins related to ookinete motility have revealed key similarities and differences between P. yoelii, P. berghei and P. falciparum [19,20]. Further developments in P. yoelii have included the production of a male/female reporter line expressing sex-enriched fluorescent proteins, as well as a parasite line constitutively expressing SpCas9 that would reduce the size of plasmids required [21,22]. Recently, Walker and Lindner have reported the use of RNA polymerase II promoters to transcribe a ribozyme-guide-ribozyme system (CRISPR-RGR) in P. yoelii that can achieve high editing efficiencies for gene deletions and tag insertions [23]. This work also demonstrated that the number of gRNAs used influences gene editing outcomes, where using one gRNA can result in parasites bearing either plasmid integration or locus replacement gene edits, while using two gRNAs produces parasites with only locus replacement events. Moreover, this study also demonstrated that CRISPR interference (CRISPRi) is possible by placing nuclease-dead variants of SpCas9 upstream of an endogenous gene. Because both of these systems from the Yuan and Lindner laboratories include DNA elements from both P. yoelii and P. berghei that have high sequence conservation, these plasmids should be functional in both species, although this remains to be tested. In addition, the development of CRISPR reagents specifically for P. berghei for gene deletion, editing and tagging is ongoing (B.Roberts and A.Waters, personal communication). A summary of CRISPR-based genome-editing experiments performed to date is provided in Table 1.

Purified Cas9 ribonucleoprotein
The majority of approaches described to date for Plasmodium species rely on delivery of the Cas9, gRNA and donor components on plasmids. Increasingly, however, CRISPR editing in mammalian systems is employing purified Cas9-gRNA ribonucleoprotein (RNP) that is complexed prior to delivery into the cell.
In addition to potential increases in efficiency, the RNP approach does not consume any selectable markers and the short lifetime of the Cas9-gRNA RNP in the cell may limit off-target damage. Cas9 protein can be purchased from a number of commercial vendors, or expressed in bacteria [24]. Similarly, gRNAs can be generated by in vitro transcription from oligonucleotide templates (for example [25]) or commercially synthesised. To date, there has been only one report of using Cas9 RNP for editing in Plasmodium, which described the use of a Cas9-gRNA RNP co-electroporated with a 200-nucleotide single-stranded oligonucleotide as a donor to deliver a drug-resistance point mutation into the pfatp4 gene [26]. Although parasites were recovered with the expected mutation, enrichment of the desired drug-resistant mutant required treatment of the culture with a PfATP4 inhibitor, reflecting the relative inefficiency of the editing event. Thus, the broad utility of the purified RNP method remains unclear, despite the potential advantages for streamlining of the design workflow.

Considerations for designing a CRISPR/Cas9 experiment
One of the positive features of the Cas9 system is that, unlike ZFNs and TALENs, the nuclease does not require modification to alter target specificity, greatly simplifying the design phase. Specificity instead is conferred by the first 20 nucleotides of the gRNA (Figure 1a). Nonetheless, careful selection of the appropriate gRNAs and consideration of donor template design can greatly increase the chances of a successful outcome.
Below are several parameters that factor into experimental design.

Guide RNA design
The identification in the target genome of potential gRNAs that conform to a 20-nucleotide sequence followed by a (-NGG) protospacer adjacent motif (PAM) is most easily accomplished using one of the many freely available tools, and users may initially wish to evaluate the output from multiple sources. Some programs commonly used by our labs include Benchling (Biology Software, 2018), Protospacer [27], CHOPCHOP [28] and EuPaGDT [29], all of which score gRNAs on the basis of the number and position of mismatches at potential off-target sites in the genome. Notably, mutations in the 'seed' region (Figure 1a), the 12 nucleotides directly upstream of the PAM, are most disruptive to binding. This effect also factors into the choice of silent 'shield' mutations that are inserted into the donor template, described below. In addition to the off-target score, some gRNA prediction tools provide an on-target score that aims to predict gRNA activity. The on-target score is modeled from a large-scale survey of the activity of several thousand gRNAs on a set of mammalian target genes [30,31]. Whether this model is reflective of gRNA activity on the more AT-rich genomes found in some Plasmodium species is not clear, and our collective experience to date indicates that even gRNAs with low on-target scores can be successfully used. Nonetheless, a recent study by Ribeiro et al. [32] suggests that on-target scores are predictive of success even in P. falciparum, and reports an annotated list of all 662,795 potential gRNA sites in the P. falciparum genome. Another consequence of AT-rich genomes is the potential for poly-T stretches within the gRNA sequence, which could result in premature termination by RNA pol III and T7 RNAP, and thus should be avoided unless an RNA pol II expression system, like CRISPR-RGR, is used [23,33]. Finally, recent ATAC-seq data on P. falciparum [34,35] may be useful to inform gRNA selection to bias towards open chromatin or to troubleshoot unsuccessful editing events, as chromatin accessibility has been shown to affect editing in other systems [36].
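The basic search described above (a 20-nucleotide protospacer followed by an NGG PAM, with poly-T stretches flagged) can be illustrated with a minimal Python sketch. This is not a substitute for the dedicated design tools cited above: it performs no off-target or on-target scoring, the function name `find_spcas9_guides` is hypothetical, and the poly-T threshold (`max_t_run`, here four or more consecutive T residues) is an assumption chosen for illustration rather than a value taken from the cited studies.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_guides(seq: str, guide_len: int = 20, max_t_run: int = 3):
    """List candidate SpCas9 sites (protospacer + NGG) on both strands.

    'start' is the 0-based offset on the strand that was scanned.
    'poly_t' is True when the protospacer contains more than
    max_t_run consecutive T residues (threshold is an assumption).
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    guides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead lets overlapping candidate sites all be reported.
        for m in pattern.finditer(s):
            protospacer, pam = m.group(1), m.group(2)
            guides.append({
                "strand": strand,
                "start": m.start(),
                "protospacer": protospacer,
                "pam": pam,
                "poly_t": "T" * (max_t_run + 1) in protospacer,
            })
    return guides


if __name__ == "__main__":
    demo = "AAAAA" + "GATTACAGATTACAGATTAC" + "AGG" + "AAAAA"
    for g in find_spcas9_guides(demo):
        print(g)
```

Candidates flagged with `poly_t` would be deprioritised for RNA pol III or T7 expression, but could still be considered with an RNA pol II system such as CRISPR-RGR.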
A general rule of thumb for gRNA selection is to find guides that bind as close to the desired site of modification as possible, with the fewest predicted off-target effects [37]. In the case of editing a single point mutation, the gRNAs should ideally be located within 100-200 basepairs (bp) of the target site, with the frequency of capturing the desired mutation by the repair event decreasing with distance from the cut site. Similarly, insertion of tags or regulatory elements at the 5′ and 3′ ends of a gene constrains the choice of gRNAs available. For the creation of both point mutations and tag insertions, silent 'shield' mutations at the gRNA-binding site allow preservation of the coding sequence while preventing Cas9 cleavage of the donor plasmid or the correctly repaired genomic locus (Figure 1c). Most disruptive to gRNA binding is mutation of the PAM, or if not possible, the introduction of mutations in the seed region. Gene deletions, on the other hand, afford the use of any gRNA within the deleted region (Figure 1b).
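As a rough illustration of the shield-mutation logic, the following Python sketch compares a 23-nucleotide genomic target (20-nucleotide protospacer plus NGG PAM) with the corresponding window of a donor, and reports whether the PAM is disrupted and how many mismatches fall in the seed region. The function name `shield_status` is hypothetical, the 12-nucleotide seed length follows the definition given in the text, and the sketch does not check whether the mutations are actually silent in the reading frame, which would need to be verified separately.

```python
def shield_status(target: str, donor: str, seed_len: int = 12) -> dict:
    """Summarise how donor mutations affect a gRNA-binding site.

    Both arguments are 23-nt strings in the gRNA orientation:
    positions 0-19 are the protospacer, 20-22 the NGG PAM.
    """
    if len(target) != 23 or len(donor) != 23:
        raise ValueError("expected a 20-nt protospacer plus a 3-nt PAM")
    target, donor = target.upper(), donor.upper()
    if target[21:23] != "GG":
        raise ValueError("target does not end in an NGG PAM")

    seed_start = 20 - seed_len
    # The PAM counts as disrupted when either G of NGG is changed.
    pam_disrupted = donor[21:23] != "GG"
    seed_mismatches = sum(
        a != b for a, b in zip(target[seed_start:20], donor[seed_start:20])
    )
    distal_mismatches = sum(
        a != b for a, b in zip(target[:seed_start], donor[:seed_start])
    )
    return {
        "pam_disrupted": pam_disrupted,
        "seed_mismatches": seed_mismatches,
        "distal_mismatches": distal_mismatches,
    }


if __name__ == "__main__":
    site = "GATTACAGATTACAGATTACAGG"
    print(shield_status(site, site[:22] + "A"))             # PAM mutated
    print(shield_status(site, site[:18] + "G" + site[19:]))  # seed mutated
```

Following the preference stated above, a donor for which `pam_disrupted` is true would be favoured, with seed-region mismatches as the fallback when the PAM cannot be changed silently.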
The selection of more than one gRNA per target is recommended to increase the odds of obtaining at least one active gRNA, with our labs typically selecting two gRNAs per editing event. In addition to the conventional use of these gRNAs individually, an alternate approach is to express multiple gRNAs in a single cell to improve efficiency and as a hedge against having one poorly active gRNA. A simple approximation of this approach is used by the Lee lab by co-transfecting two separate gRNA plasmids, relying on the propensity of P. falciparum to take up multiple plasmids during transfection (our unpublished data). However, a variety of more sophisticated methods for multiplex gRNA expression have been developed for mammalian cells, Drosophila, plants and other organisms. The approaches range from tiling multiple gRNA expression cassettes within a single plasmid, to the use of a single polycistronic transcript that flanks each gRNA with tRNA [38] or ribozyme sequences [39], resulting in liberation by endogenous nucleases or self-cleavage, respectively. The advantage of the polycistronic approach is that the gRNAs are expressed from a single promoter, with promoter choice not restricted to RNA pol III-based expression, enabling stage-specific gRNA expression. Approaches for multiplex gRNA expression in P. yoelii using hammerhead and hepatitis delta virus ribozymes have now been developed (described above).

Donor design
The specifics of donor design are as diverse as the potential uses of CRISPR: gene disruption, single nucleotide modification, tagging or marker-free insertion of fluorescent reporters or conditional control elements. However, some general factors are relevant for all homology-directed repair approaches, with one primary consideration being the length of the homology region. In the absence of a nuclease-triggered double-strand break, long homology arms can assert a strong influence on the efficiency of gene targeting. For example, the PlasmoGEM large-scale knockout project in P. berghei examined the efficiency of integration with homology arms ranging from 0.4 to 14 kb, and observed improved efficiency above 1.25 kb up to 10 kb [4]. In contrast, the majority of Cas9-based templates used in Plasmodium are at or below 1 kb, likely reflecting the stimulating effect of an induced double-strand break. Our collective experience to date indicates that homology regions of roughly 250-1000 bp are sufficient (Figure 1b), a similar range also observed by Ribeiro et al. [32], with efficient editing of P. yoelii with homology arms as short as 80-100 bp reported by Walker and Lindner [23]. However, to date there have been no reports of success using the very short homology regions (<50 bp) that are effective in Toxoplasma [40,41]. In a recent study, Kudyba et al. (2018) [42] tested insertion of a PCR-produced marker flanked by 50-100 bp homology regions, but saw little to no editing. To our knowledge, the smallest single-stranded oligonucleotide donor used successfully to date is the 200-nt repair template to introduce a drug-resistance mutation in PfATP4 [26]. However, it is unclear whether the relative inefficiency of this editing event is derived from the use of a short oligonucleotide template or the Cas9-RNP approach.
Another factor that influences donor design is the conversion tract length of the repair process from the site of the double-strand break, in other words, how far along the donor template sequence changes are captured. This effect varies between organisms [43,44], and at a practical level will inform how close the gRNA-binding site should be to the site of the desired mutation. Ideally, the double-strand break should be triggered directly at the site of the desired mutation; however, in practice, this is often not possible. In P. falciparum, we have noted that when the desired point mutation or tag lies further than 100 bp from the shield mutations at the gRNA binding site, we observe variable capture (from 0-100%) of the desired event even if the shield mutations are edited with 100% efficiency. One solution is simply to perform replicate transfections in the hope that at least one will result in a conversion tract that covers the mutation of interest. However, an alternate strategy is to 'recodonise' the region between the gRNA site and the desired mutation, essentially disrupting the homology in the intervening space with silent mutations (Figure 1c). This stretch of silent mutations does not alter the protein coding sequence, but will ensure that the repair process is driven beyond the desired mutation before homology is encountered. Given the number of potential mutations to be introduced, the recodonising approach is best achieved using gene synthesis of the donor.
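The recodonising idea can be sketched in a few lines of Python: each codon in an in-frame stretch is swapped for the synonymous codon that differs from it at the most positions, so that the protein sequence is preserved while nucleotide homology is reduced. This is only an illustrative sketch under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, and codon usage (which matters for the AT-rich genomes of some Plasmodium species) is deliberately ignored, so a real design for gene synthesis would need additional constraints.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate an in-frame DNA sequence with the standard code."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recodonise(cds: str) -> str:
    """Replace each codon with its most divergent synonymous codon.

    Codons without synonyms (ATG, TGG) are returned unchanged.
    Codon usage bias is ignored in this sketch.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]
        )
        # Pick the synonym with the most mismatches to the original codon.
        best = max(synonyms, key=lambda c: sum(a != b for a, b in zip(c, codon)))
        recoded.append(best)
    return "".join(recoded)


if __name__ == "__main__":
    original = "ATGAAATTATCTGGT"
    recoded = recodonise(original)
    print(original, translate(original))
    print(recoded, translate(recoded))
```

The recoded stretch would sit between the gRNA-binding site and the desired mutation in the donor, leaving the flanking homology arms untouched.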

Challenges
A general caution for any genome-editing strategy should be the consideration of unexpected deletions and rearrangements that may be difficult to identify by standard PCR-based genotyping. Such events have been reported in mammalian systems [45], and the ability to perform whole-genome sequencing, including long-read sequencing, will be valuable in resolving these potential events. In addition, there are a number of Plasmodium-specific considerations that may impact editing outcomes. The Plasmodium genome contains a large number of multigene families, some with hundreds of members. If there is sufficient homology between family members, a potential challenge in targeting one specific member may be unintended repair from the paralogous gene sequences rather than the provided donor, as well as identifying unique gRNA sites in the first place. The latter point is addressed by the recent study of Ribeiro et al. [32], who annotate gRNAs that range in their ability to target a single family member, or that bind universally to all members.
Representatives of some of the largest multigene families, such as the var, rifin and stevor genes, are clustered in the subtelomeric regions, which presents an additional challenge for genome editing. When targeting the var2csa member of the var gene family, Bryant et al. [46] observed that the majority of editing events were not of the specific intron deletion that was desired, but rather the loss of the entire chromosome end downstream of the double-strand break to the telomere. This deleted region contained other non-essential genes and resulted in viable parasites that repaired the chromosome end with additional telomere repeats, in a process that is likely akin to telomere healing after spontaneous double-strand breaks [47]. Bryant et al. also noted that attempts to target other var gene members resulted in a similar outcome, suggesting that editing non-essential genes in subtelomeric regions will be challenging.
For more centrally located non-essential genes, an additional challenge arises when attempting to make non-disruptive edits such as point mutations and tag insertions. Rather than the desired edit, integration of the entire plasmid may occur in a manner similar to a conventional single-crossover recombination. This can result in apparent introduction of the silent mutations, and PCR genotyping of the 5′ and 3′ borders may appear correct; however, amplification across the locus may fail due to the insertion of the entire plasmid backbone. As this results in a gene disruption, it is not observed for essential genes; however, it appears to be a frequent competing outcome for non-essential genes, suggesting that extra care should be taken in genotyping these targets, with the isolation of clonal lines of particular importance (E.Hitz and T.Voss, personal communication, and [23]). Potential countermeasures could include making two cuts using two gRNAs, linearising the donor vector or the use of PCR or oligo donors, as well as negative selection on the plasmid backbone, although none of these measures are foolproof.

CRISPR modulation of gene expression
The utility of the CRISPR system for interrogating the genome is not restricted to alteration of the nucleotide sequence, but for mammalian systems now extends to a dizzying array of potential options for regulating gene expression and modifying epigenetic marks. These alternate CRISPR activities rely on the DNA binding, but not cleaving, function of Cas9. By disabling the nuclease activity of Cas9, the resulting 'dead' Cas9 (or dCas9), when bound to the target gene, can interfere with RNA polymerase-mediated transcription [48]. Beyond simple steric interference, however, dCas9 has become a sophisticated platform for targeted delivery to genomic sites of a variety of add-on effector domains, either via direct fusion with dCas9, recruitment to a dCas9-linked epitope array such as the SunTag [49], or by using a modified gRNA that presents a protein-binding aptamer [50]. For a detailed review, see [51].
Enabling these methods for Plasmodium will require the development of parasite-specific tweaks to how these tools are deployed. For example, the strong viral transactivators such as VP64 that are routinely employed to increase transcription in mammalian systems ('CRISPR activation', 'CRISPRa') [52] were not thought to be able to function effectively in Plasmodium parasites. However, recent work from Lubin Jiang and colleagues demonstrated that a fusion of VP64 with the P65 and RTA transactivation domains can increase transcription of a targeted gene and affect related functions in the parasite [53]. Parasite transactivators, such as the Tati-2 [54] and TRAD4 domains [55], have been described; however, these have yet to be validated in the dCas9 context, and more robust transactivators may ultimately be required. Transactivation domains are likely to be found in transcription factors, such as the ApiAP2 proteins, although effective transactivators remain elusive in Plasmodium. Effective stimulation of expression in mammalian systems requires the delivery of the transactivation domain near to the transcription start site [56], and the availability of genome-wide maps of transcription start sites for P. falciparum [57] will aid in this endeavour once effective systems are developed.
An alternate approach for gene regulation is to modify the epigenetic landscape through the recruitment of 'writers' and 'erasers' of epigenetic marks. Early examples of this approach in mammalian systems are the dCas9-mediated recruitment of the LSD1 histone demethylase to remove enhancer marks [58], and the core domain of the p300 histone acetyltransferase to deliver H3K27ac activation marks [59], with corresponding alterations in gene expression levels. Although the nature of the histone modifications in the parasite will differ from those employed in mammalian cells, our understanding of the types of marks, the proteins that deposit them and their regulatory consequences is increasing thanks to a number of genome-scale profiling studies [60-62]. Recently, the fusion of GCN5 or Sir2a to dCas9 was shown to activate or repress transcription of a target gene, respectively [53]. We anticipate that additional advances will continue to yield new CRISPR-mediated gene regulation approaches in the near future.
An additional challenge that may need to be overcome to improve CRISPR-mediated gene regulation of the AT-rich genome of Plasmodium species is the difficulty in identifying unique gRNA binding sites with the canonical (-NGG) PAM in intergenic regions, which have the highest AT-content of the genome. However, a wide variety of Cas9 variants with altered PAM specificities continue to be developed (see [63] for a comprehensive list), and new CRISPR nucleases are likely to emerge. One such nuclease that is well suited to AT-rich genomes is Cas12a (originally called Cpf1), which has a (TTTN-) PAM [64], and work by our groups is exploring whether variants of this nuclease (e.g. LbCas12a and AsCas12a) might provide a suitable system for P. falciparum, despite reported indiscriminate activity against single-stranded DNA [65]. Another intriguing possibility is the development of RNA-targeting nucleases, such as Cas13, that may allow post-transcriptional regulation by RNA degradation as variants are developed that lack the non-specific RNase activity (reviewed in [66]).
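The effect of PAM choice on site availability in AT-rich sequence can be illustrated with a small Python sketch that counts candidate sites for the two PAMs named above on both strands: SpCas9 sites with a 3′ NGG PAM, and Cas12a sites with a 5′ TTTN PAM. The function name `count_pam_sites` is hypothetical, the 20-nucleotide protospacer length applied to Cas12a is an assumption made for simplicity, and no uniqueness or off-target filtering is attempted.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_pam_sites(seq: str, guide_len: int = 20) -> dict:
    """Count candidate sites on both strands for two PAM types.

    SpCas9: guide_len-nt protospacer followed by NGG (PAM on the 3' side).
    Cas12a: TTTN followed by a guide_len-nt protospacer (PAM on the 5' side);
    the protospacer length used for Cas12a here is an assumption.
    """
    seq = seq.upper()
    # Lookaheads count overlapping sites; N plus the protospacer is
    # guide_len + 1 unconstrained bases in both patterns.
    cas9 = re.compile(r"(?=[ACGT]{%d}GG)" % (guide_len + 1))
    cas12a = re.compile(r"(?=TTT[ACGT]{%d})" % (guide_len + 1))
    counts = {"SpCas9_NGG": 0, "Cas12a_TTTN": 0}
    for strand in (seq, reverse_complement(seq)):
        counts["SpCas9_NGG"] += len(cas9.findall(strand))
        counts["Cas12a_TTTN"] += len(cas12a.findall(strand))
    return counts


if __name__ == "__main__":
    at_rich = "TTTA" * 10
    print(count_pam_sites(at_rich))
```

On a toy AT-rich repeat such as the one above, no NGG sites are found while several TTTN sites are available, which mirrors the motivation given in the text for exploring Cas12a.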

Conclusions
CRISPR/Cas9 advances are accelerating the pace of Plasmodium research like never before. However, several important questions and challenges remain. First, it is not always possible to achieve complete editing of all parasites in the population of transfected parasites. What is the limiting event, and can it be overcome? We and others frequently observe that genome-editing outcomes from the same gRNA-donor pairing can be highly variable across multiple transfections. Looking ahead, the donor-free approaches of CRISPR-mediated transcriptional regulation will permit transfection of libraries of gRNAs for genetic screens. However, transfection efficiencies for some Plasmodium species remain low compared to other eukaryotes. The development of methods to improve transfection efficiency and reduce editing variability would be highly beneficial. Nonetheless, CRISPR approaches have heralded gains in the speed and efficiency of gene tagging and replacement, and are enabling precise genome modifications that are paving the way for testing active site mutations in enzymes and transcription factors (e.g. Hsp70X and AP2-I), the introduction of drug resistance alleles (e.g. Kelch13) and the dissection of redundant gene function. Rarely in the brief history of molecular parasitology has a single advance so greatly improved our ability to leapfrog forward. We can now envision systematic whole-genome methods to knock out all non-essential genes and create reagents to query all essential genes. Once conditional gene knockdown systems based on dCas9 are successfully implemented, they will greatly expand our capacity to explore gene essentiality and the function of numerous unknown genes. We encourage the community to continue to develop and openly share these new tools for wide dissemination and adoption by anyone interested in molecular parasitology the world over.

Key Points
• CRISPR/Cas9 systems have now been developed for most experimentally-tractable Plasmodium species, expanding the range and precision of genome modification.
• A variety of approaches has been developed for delivery of the key components: the Cas9 nuclease, the guide RNA and the donor. Donor homology length and guide RNA selection are among the key considerations for experimental design.
• Challenges for experimental design include targets located in subtelomeric regions and the AT-rich genomes of some Plasmodium species.