Exploration of cotton leaf curl virus resistance genes and their screening in Gossypium arboreum by targeting resistance gene analogues

Abstract
Cotton leaf curl virus (CLCuV) disease is one of the major limiting factors in cotton production, particularly in widely cultivated Gossypium hirsutum varieties that are susceptible to attack by this virus. Several approaches have been employed to explore putative resistance genes in another cotton species, G. arboreum. However, the exact mechanisms conferring disease resistance in cotton are still unknown. In the current study, we used various approaches to identify possible resistance genes against CLCuV infection. We report the identification and isolation of a set of genes involved in the resistance response to viral infection. PCR on genomic DNA with degenerate primers gave multiple amplifications with a single primer in most reactions, and 38 fragments were cloned from G. arboreum and G. hirsutum. The sequences of cloned fragments belonged to various pathway genes and uncharacterized proteins. However, five amplified fragments (RM1, RM6, RM8, RM12 and RM31) showed similarity with R genes. Maximum homology (94 %) was observed with G. raimondii toll/interleukin receptor-like protein. BLAST search showed the homology of all resistance gene analogues (RGAs) with more than one chromosome, and multiple hits were observed on each chromosome for each RGA. Expression analysis through RT-PCR identified variable expression levels of the different RGAs in all tested genotypes. The expression level of RGAs differed between symptomatic and asymptomatic plants, with the exception of RGA 395, whose expression level was the same in both diseased and healthy plants. Knowledge of the interaction of these genes with various cotton pathogens could be utilized to improve the resistance of susceptible G. hirsutum and other plant species.


Introduction
Analogous to the vertebrate immune system, the resistance (R) genes in plants recognize pathogen effectors (avirulence factors) and awaken the defence system of plants to combat attack (Ellis and Jones 1998; Chen et al. 2015). Many R genes have been isolated (Zhang and Zhou 2010), characterized and used in crop improvement programmes with varying degrees of success (Gururani et al. 2012). R genes are race specific and exhibit resistance only in the presence of cognate pathogen effectors. Resistance is often manifested as a hypersensitive response at the site of invasion, which restricts pathogen entry (Heath 2000).
Five different classes of R genes are known (Chen et al. 2015). The largest class of R genes identified to date is the nucleotide-binding site leucine-rich repeat (NBS-LRR) class (Dangl and Jones 2001; Chen et al. 2015). Arabidopsis thaliana is reported to contain 149 NBS-LRR genes (Meyers 2003; Khan et al. 2016), and 653 genes of this class have been reported in Oryza sativa (Shang et al. 2009). TIR-NBS-LRR and CC-NBS-LRR are two subclasses of NBS-LRR that contain N-terminal toll/interleukin-1 receptor (TIR) and coiled-coil (CC) domains, respectively (DeYoung and Innes 2006). Other classes of R genes produce surface-localized pattern recognition receptors (PRRs). These include receptor-like kinases and receptor-like proteins (Monaghan and Zipfel 2012). Sixty-three resistance gene analogue (RGA) clusters have been reported in the diploid D-genome of G. raimondii, while Wang et al. (2013) and Wei et al. (2013) identified 355 NBS-encoding genes in the G. raimondii genome.
Expressed sequence tags (ESTs) are an important tool for gene discovery (Hughes and Friedman 2005), molecular marker identification (Michalek et al. 2002), microarray development (Alba et al. 2004) and comparative genomics (Schlueter et al. 2004). Expressed sequence tags can be assigned a known function on the basis of homology with known genes found through BLAST search. Expressed sequence tag homologues of R genes can be used to design markers linked to functional R genes (Brugmans et al. 2008). Various types of resistance-gene-based markers include resistance gene analogue polymorphism (RGAP), NBS profiling and inter small RNA polymorphism (iSNAP) (Poczai et al. 2013). Markers based on ESTs have genic function and show linkage to transcriptional regions (Bozhko et al. 2003). Expressed sequence tag- and RGA-based markers have also been used in cotton (Chee et al. 2004; Buyyarapu et al. 2011). Cotton leaf curl is a disease of viral origin, transmitted by the whitefly Bemisia tabaci. This disease is difficult to control owing to the occurrence of several virulent viral strains or related species (Rahman et al. 2017).
In this study, we used RGA and EST homologues of R genes to reveal their expression pattern in cotton leaf curl virus (CLCuV)-resistant and -susceptible cotton genotypes. The RGA and EST homologues expressed in resistant genotypes can be further studied to find genes involved in CLCuV resistance. The aim of the current study was to explore the disease resistance genes in the published literature/GenBank and to screen for them in G. arboreum using degenerate primers on genomic DNA and RT-PCR on RGAs detected from ESTs.

Materials and Methods
Degenerate primer design
R genes of different classes were searched in cotton species at the NCBI. The nucleotide sequences of R genes were aligned using the multiple sequence alignment tool in the CLCBIO workbench. The conserved regions of alignments were used to design degenerate primers. Primer length was restricted to 24 mer. A maximum of four degeneracies was allowed per primer. Degeneracy was not allowed at the 3′ end of the primer. The primer sequences are shown in Supporting Information- Table S1.
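The three design rules stated above (length, number of degenerate positions, no degeneracy at the 3′ end) can be expressed as a simple check. The following is a minimal sketch, not the procedure used in the CLCBIO workbench. The function name is hypothetical, the 24-mer restriction is read here as an upper limit (an assumption), and degenerate positions are taken to be any IUPAC code other than A, C, G or T.

```python
# IUPAC nucleotide codes that stand for more than one base
DEGENERATE_CODES = set("RYSWKMBDHVN")
VALID_CODES = set("ACGT") | DEGENERATE_CODES


def check_degenerate_primer(primer, max_length=24, max_degenerate=4):
    """Return a list of rule violations for a primer written 5'->3'.

    max_length is treated as an upper limit; this reading of
    "restricted to 24 mer" is an assumption.
    """
    seq = primer.upper()
    problems = []
    invalid = sorted(set(seq) - VALID_CODES)
    if invalid:
        problems.append("invalid characters: " + "".join(invalid))
    if len(seq) > max_length:
        problems.append(f"length {len(seq)} exceeds {max_length}")
    n_degenerate = sum(base in DEGENERATE_CODES for base in seq)
    if n_degenerate > max_degenerate:
        problems.append(f"{n_degenerate} degenerate positions exceed {max_degenerate}")
    if seq and seq[-1] in DEGENERATE_CODES:
        problems.append("degenerate base at the 3' end")
    return problems


# Hypothetical sequences for illustration only (not primers from Table S1)
print(check_degenerate_primer("GGAATGGGNGGNGTNGGAAARACG"))
print(check_degenerate_primer("GGAATGGGAGGAGTAGGAAAAACN"))
```

An empty list means the primer satisfies all three rules; each string in a non-empty list names one violated rule.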

Cloning of PCR fragments in TA vector
All the PCR fragments amplified in CLCuV-resistant G. arboreum and the fragments differentially expressed in tolerant G. hirsutum genotypes were cloned in TA vector. For this purpose the PCR bands were gel eluted and ligated in TA vector (pTZ57R/T; Thermo Fisher Scientific, USA). The ligation was used to transform Escherichia coli competent cells (DH5α). The transformed cells were spread on Luria-Bertani (LB) agar plates and the white colonies were cultured to isolate plasmid. The presence of clones was confirmed by restriction analysis using XbaI, SmaI, HindIII, EcoRI, BamHI, PstI, EcoRV and SacI.

Analysis of sequencing data
Ninety-eight cloned fragments were sent for sequencing to Macrogen (Korea). M13 forward primer was used for sequencing. Vector sequences were trimmed from the reads, which were then analysed for the presence of forward and reverse primer sites. The sequences were BLAST searched at NCBI to find their homology with sequences in the NCBI database. The putative biotic stress resistance genes were subsequently BLAST searched in the sequenced Gossypium genomes at CottonGen (www.cottongen.org) to determine their chromosomal location.

Retrieval of R gene sequences and primer design
Cotton RGAs of the NBS-LRR class reported by Azhar et al. (2010) were used to design specific primers. Gene sequences representing all five classes of R genes (including N, L6, RPP5, I2, RPS2, RPM1, Cf9, Xa21, RPW8, RRS1R, Ve1 and Pto) were retrieved from GenBank and BLAST searched in G. arboreum ESTs. The homologous ESTs were translated into amino acid sequences to find the open reading frame (ORF). The protein sequences of RGAs in G. arboreum and their homologues in G. hirsutum were aligned using the ClustalW program. Only those G. arboreum ESTs that differed from their homologues in G. hirsutum were selected. Primers were designed from the coding regions of ESTs using Beacon Designer software. Primer parameters were set as follows: primer length 18-24 nucleotides, GC content 40-50 %, melting temperature 55-60 °C and amplicon size 200-250 bp. The list of primers designed using the NBS class of previously reported RGAs is shown in Supporting Information- Table S2. Primers designed from the EST homologues of disease resistance genes are listed in Supporting Information- Table S3.
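The primer parameters listed above can be checked programmatically. The sketch below covers only primer length, GC content and amplicon size. Melting temperature is deliberately left out because its value depends on the thermodynamic model, and the model used by Beacon Designer is not described here. The function names and example sequences are hypothetical.

```python
def gc_percent(seq):
    """GC content of a nucleotide sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def primer_pair_ok(forward, reverse, amplicon_bp):
    """Check a primer pair against the stated length, GC and amplicon limits.

    Melting temperature (55-60 deg C in the text) is not checked here.
    """
    for primer in (forward, reverse):
        if not 18 <= len(primer) <= 24:
            return False
        if not 40 <= gc_percent(primer) <= 50:
            return False
    return 200 <= amplicon_bp <= 250


# Hypothetical primers for illustration only (not taken from Table S2 or S3)
fwd = "ATGGCTAGCTTGACCGATCA"
rev = "TTGACCATGCAAGTCTAGGA"
print(gc_percent(fwd), gc_percent(rev))
print(primer_pair_ok(fwd, rev, amplicon_bp=220))
```

A pair is accepted only when both primers and the amplicon size fall inside the stated windows; a melting-temperature check would be added with whichever model the design software actually applies.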

Plant material and RNA isolation
Six cotton genotypes were used for the study, including one from G. arboreum (Ravi, resistant to CLCuV) and five from G. hirsutum, namely Coker (highly susceptible to CLCuV), NIBGE-2, N-253, IR-3701 and NIBGE-115. Young top leaf samples of CLCuV-susceptible and -non-susceptible plants (40-45 days old) of each genotype were taken from the National Institute for Biotechnology and Genetic Engineering (NIBGE) cotton field. Leaf samples were ground in liquid nitrogen and Invitrogen RNA purification reagent was used to isolate total RNA. The quality of RNA was checked by running on 1 % agarose gel. High-quality RNA was used to make first-strand cDNA using oligo dT primers. The cDNA was stored at −70 °C until further use.

RT-PCR analysis
The concentration of the cDNA was measured on a spectrophotometer and equal concentrations were prepared for the reaction mixture. The RT-PCR reaction consisted of 5 µL of cDNA template at an appropriate dilution, 5 µL of 10× (NH4)2SO4 Taq buffer, 4 µL of 25 mM MgCl2, 1 µL of 10 mM dNTPs, 1 µL each of forward and reverse primer (50 ng µL−1), 1 µL of Taq DNA polymerase (2 units) and 36 µL of PCR water. PCR was performed on a BioRad thermal cycler with the following profile: initial denaturation at 94 °C for 4 min, followed by 40 cycles of denaturation at 94 °C for 1 min, annealing at 50 °C for 1 min and extension at 72 °C for 30 s, and a final extension at 72 °C for 10 min.
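As a worked check of the recipe above, the final concentration of each stock follows from C_final = C_stock × V_added / V_total, and the programmed hold time follows from the cycling profile. The sketch below uses only the volumes and times listed in the text; the variable and function names are hypothetical. Taken as listed, the component volumes add up to 54 µL, and the calculation uses that total.

```python
# Component volumes in microlitres, as listed in the text
volumes_ul = {
    "cDNA template": 5,
    "10x Taq buffer": 5,
    "25 mM MgCl2": 4,
    "10 mM dNTPs": 1,
    "forward primer": 1,
    "reverse primer": 1,
    "Taq DNA polymerase": 1,
    "PCR water": 36,
}
total_ul = sum(volumes_ul.values())


def final_conc(stock_conc, volume_ul, total=total_ul):
    """Dilution formula: C_final = C_stock * V_added / V_total."""
    return stock_conc * volume_ul / total


mgcl2_mM = final_conc(25, volumes_ul["25 mM MgCl2"])
dntp_mM = final_conc(10, volumes_ul["10 mM dNTPs"])
buffer_x = final_conc(10, volumes_ul["10x Taq buffer"])
primer_ng_per_ul = final_conc(50, volumes_ul["forward primer"])

# Programmed hold times only; ramping between temperatures is not included
cycle_s = 60 + 60 + 30  # denaturation + annealing + extension
run_min = (4 * 60 + 40 * cycle_s + 10 * 60) / 60

print(f"total volume: {total_ul} uL")
print(f"MgCl2: {mgcl2_mM:.2f} mM, dNTPs: {dntp_mM:.3f} mM, buffer: {buffer_x:.2f}x")
print(f"each primer: {primer_ng_per_ul:.2f} ng/uL, hold time: {run_min:.0f} min")
```

The same function can be reused to see how the final concentrations shift if the water volume is adjusted to reach a different total volume.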

Analysis of PCR products amplified with degenerate primers
The use of degenerate primers in PCR with genomic DNA gave multiple amplifications with a single primer in most reactions. The fragments appearing in CLCuV-resistant G. arboreum and partly tolerant G. hirsutum genotypes that differed with respect to totally susceptible Coker were selected and cloned for sequencing. In this way 38 fragments were cloned from G. arboreum and G. hirsutum. The details of the cloned fragments including their appearance in the different cotton genotypes and BLASTX results are presented in Table 1.
The sequences of cloned fragments belonged to a variety of different pathway genes and uncharacterized proteins. However, five fragments (RM1, RM6, RM8, RM12 and RM31) showed similarity with R genes. RM1, RM6 and RM8 were cloned from G. arboreum and RM12 and RM31 were cloned from G. hirsutum genotype NIBGE-115. RM1 belongs to the NBS class and showed homology with TMV resistance like protein N. Maximum homology was found with a G. raimondii gene. RM6 showed homology with the receptor-like protein kinase of Jatropha curcas. RM8 showed the presence of TIR-2 family signatures in BLASTX results and homology with the TIR class of proteins in different plant species. Maximum homology (94 %) was observed with G. raimondii toll/interleukin receptor-like protein. RM12 and RM31 showed homology with the serine/threonine protein kinase class of cell surface receptors of G. raimondii.

BLAST of putative R genes in sequenced Gossypium genomes
The identified putative RGAs were BLAST searched (at cottongen.org) in the sequenced genomes of Gossypium species to determine their chromosomal location (Table 2). All the RGAs showed homology with more than one chromosome, and multiple hits were observed on each chromosome for each RGA. (Table 4). The ESTs were further used for BLASTX search to determine their similarity with other genes in different species. The E-value and percent similarity were significant, inferring that the selected ESTs were representative of the R genes. The primer design was based on G. arboreum ESTs and thus it was expected that all primers would result in amplification in at least G. arboreum. However, the RT-PCR results (Table 5) indicated that primers for RM3, RM20, RM22 and RM27 did not result in amplification in G. arboreum or in genotypes of G. hirsutum. All other ESTs were expressed in G. arboreum and G. hirsutum genotypes. RM1, RM2, RM10, RM14 and RM26 (homologues of STKs, RLKs and Rar1 resistance genes) are important ESTs as they are expressed in G. arboreum and tolerant G. hirsutum genotypes, whereas no expression was detected in susceptible Coker (Fig. 1.3-1.7). The ESTs expressed in G. arboreum were also expressed in all genotypes of G. hirsutum except for RM28, which was expressed at low levels only in some G. hirsutum genotypes. The expression level of ESTs ranged from low to high in Ravi. Relatively high EST expression was observed in symptomatic and asymptomatic plants of Coker. The expression pattern of ESTs was similar and high in symptomatic and asymptomatic plants of NIBGE-2, N-253 and IR-3701. Higher expression levels were detected in NIBGE-115, with more ESTs in symptomatic plants compared with asymptomatic ones.
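The selection criterion applied to the RT-PCR results (detected in resistant G. arboreum and in tolerant G. hirsutum genotypes, not detected in susceptible Coker) amounts to a presence/absence filter. The sketch below illustrates that logic only. The EST labels and expression calls are invented placeholders, not data from Table 5, and the function name is hypothetical.

```python
def candidate_ests(expression, resistant, tolerant, susceptible):
    """Return ESTs detected in the resistant genotype and in every tolerant
    genotype but not detected in the susceptible genotype."""
    picked = []
    for est, detected_in in expression.items():
        in_resistant = resistant in detected_in
        in_all_tolerant = all(genotype in detected_in for genotype in tolerant)
        in_susceptible = susceptible in detected_in
        if in_resistant and in_all_tolerant and not in_susceptible:
            picked.append(est)
    return sorted(picked)


# Placeholder presence/absence calls, NOT results from this study
expression = {
    "EST_A": {"Ravi", "NIBGE-2", "NIBGE-115"},
    "EST_B": {"Ravi", "NIBGE-2", "NIBGE-115", "Coker"},
    "EST_C": {"NIBGE-2"},
}

print(candidate_ests(expression, "Ravi", ["NIBGE-2", "NIBGE-115"], "Coker"))
```

With real data, each set would hold the genotypes in which a band was scored for that EST, and the list of tolerant genotypes would follow the classification used in the study.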

Discussion
Degenerate PCR primers can be used to isolate genes from species on the basis of conserved domains within the genes. This approach has been used in cotton. Gao et al. (2006) cloned RGAs of the NBS class and also defence gene analogues from G. barbadense using the degenerate primer approach. Similarly, Tan et al. (2003) reported the cloning of 33 RGAs from G. hirsutum using degenerate primers based on conserved domains of NBS motifs. In this study, degenerate primers were designed using the conserved regions of nucleotide sequence alignments of the reported plant R genes of various classes deposited in GenBank. PCR with these degenerate primers amplified 38 fragments from genomic DNA, and these were cloned from G. arboreum and G. hirsutum (including the genotype NIBGE-115). Out of these amplified fragments, five fragments corresponded to RGAs encoding the TIR and STK classes of R genes. Resistance gene analogues from cotton belonging to the NBS class have also been reported (Tan et al. 2003; Gao et al. 2006).

Prior sequence information and the annealing of primers at multiple complementary but non-specific sites are some of the limitations of PCR (Garibyan and Avashia 2013). In this study, PCR on genomic DNA using degenerate primers resulted in non-specific amplifications due to non-specific primer binding. The forward and reverse primer sequences were observed in almost all of the cloned fragments; however, few of them represented R genes.
Gossypium arboreum has a very sturdy defence mechanism to combat CLCuV (Khan et al. 2016). The ESTs identified from G. arboreum show conserved domains (ESTs No. 2, 3, 4, 5, 6, 10, 14, 9, 17, 18, 19, 20, 21, 27 and 28 contained only the PKc domain), while the LRR domain was represented by seven ESTs (ESTs No. 12, 13, 15, 16, 23, 24 and 25). PKc is a member of the AGC group in the protein kinase superfamily (Newton and Messing 2010). These enzymes are involved in the phosphorylation of serine and threonine amino acids found in proteins (Trotta et al. 2016). Serine/threonine kinase is one of the PKc enzymes involved in the regulation of cell proliferation, programmed cell death (apoptosis), cell differentiation and embryonic development (Newton and Messing 2010; Trotta et al. 2016). The BLASTX results indicated that the G. arboreum ESTs contained PKc-like domains and had similarity with STKs and the LRR-kinase class of genes in other crop plants. Two G. arboreum ESTs had domains of PKc-SPS1 (ESTs No. 8 and 11). This is an important class of kinases that are known to activate the p38 MAP kinase pathway of stress-induced signal transduction in mammals. The presence of a PKc-like domain in G. arboreum ESTs was interesting and it can be inferred that these ESTs were from the protein kinase class of genes in G. arboreum.
Leucine-rich repeat receptor kinases (LRR-RKs) are the largest subfamily of transmembrane receptors, with over 200 members in Arabidopsis (Shiu and Bleecker 2001). Leucine-rich repeat receptor kinases are mediators of plant defence and developmental processes. Some ESTs in this study (ESTs No. 12, 13, 15, 16, 23 and 24) contained the domain for LRR-8, which is an important domain of LRR-RK proteins that plays a critical role in ligand recognition in the CLV1 and BAM family of cell receptors (Shinohara et al. 2012). Expressed sequence tags 23, 24 and 25 containing LRR domains showed significant homology with polygalacturonase-inhibiting protein (PGIP) in the BLASTX results. Plant PGIPs help confer resistance against cell wall-degrading phytopathogens like fungi, bacteria and nematodes (Juge 2006; Kikot et al. 2009; Danchin 2011).
One EST (26) showed significant similarity with the resistance signalling gene Rar1 bearing the zinc-binding CHORD domain that is found in all eukaryotes except yeast. Muskett et al. (2002) reported that Rar1 acts as a rate-limiting factor for R gene-triggered defence activation. It regulates the extent of pathogen containment, hypersensitive plant cell death and oxidative burst at primary infection sites. The selected ESTs were therefore representative of the important genes in plant defence mechanisms.
Resistance gene analogues have been used to study the genetics of resistance in plant species using R gene specific or degenerate primers (Leister et al. 1996; Chen et al. 1998). Azhar et al. (2010) identified 24 RGAs of the NBS class in cotton and utilized these RGAs as markers in interspecific hybrid selection for analysing G. arboreum genome contribution. In the present study, RGAs were used to reveal the expression pattern in diploid and tetraploid cotton. Expression analysis of cell surface receptors was also conducted by utilizing their conserved domain regions of LRR and STK collected from G. arboreum ESTs. The selected ESTs were confirmed to contain the LRR and STK domains of resistance genes through BLASTX and DELTA-BLAST search as shown in Table 4. ESTs represent the more conserved part of the genome and EST-based markers can be studied in other species of cotton (Scott et al. 2000). Gossypium arboreum ESTs were not found in the BLAST search of the NBS class of genes. The expression of STK and LRR ESTs was higher than that of the NBS class of RGAs. It can be deduced that in G. arboreum R genes of classes other than NBS-LRR might be more active to counter biotic and abiotic stresses. Besides the importance of the NBS class of R genes, transmembrane receptors might also play a role in the recognition of and defence against CLCuV in G. arboreum. The kinase domain of disease resistance genes might play a role in phosphorylation, that is, in the transfer of phosphate from high-energy phosphate donor molecules to target proteins.
Using the same set of primers designed using G. arboreum ESTs, amplification in RT-PCR was seen in both G. arboreum and G. hirsutum. However, there was a variable level of expression among the tested genotypes. Expression of functional genes is mostly variable in the sense that some genes are constitutively expressed and some are induced only in particular conditions and thus expression may be stage specific. Resistance gene analogues that are not expressed at all might be nonfunctional as they were identified from genomic DNA. Expressed sequence tags with no expression in the genotypes under study might be influenced by the environment, and these may also be stage or tissue specific, because cotton ESTs from the NCBI were mainly fibre specific. The RGAs and ESTs expressed in G. hirsutum but not G. arboreum might be D-genome specific. The RGAs and ESTs expressed in both G. arboreum and G. hirsutum may be A-genome specific.
Resistance gene analogues and ESTs expressed only in G. arboreum and in asymptomatic plants of G. hirsutum could be useful in the study of resistance against CLCuV. Resistance gene analogues and ESTs not expressed in Coker might be helpful in the study of CLCuV resistance. In this context, RGA 383 and 384 of the NBS-LRR class and the ESTs RM1, RM2, RM10, RM14 and RM26 (homologues of STKs, RLKs and Rar1 resistance genes) may be important for further studies on CLCuV resistance and the role of the kinase domain of R genes under similar conditions of inoculum and biotic stresses. As mentioned earlier these genes are known to be important players in plant defence, especially in the context of causing programmed cell death in infected tissues. These RGAs and ESTs may therefore be useful for isolating full-length genes, and their particular functions should be explored in response to biotic or abiotic stresses.

Supporting Information
The following additional information is available in the online version of this article:
Table S1. List of degenerate primers designed on the conserved regions of R genes retrieved from GenBank.
Table S2. The list of primers designed on nucleotide-binding site (NBS) class of previously reported resistance gene analogues (RGAs).
Table S3. The list of primers designed from the expressed sequence tag (EST) homologues of disease resistance genes.