Analysis of the PEBP gene family and identification of a novel FLOWERING LOCUS T orthologue in sugarcane

Abstract Sugarcane (Saccharum spp.) is an important economic crop for both sugar and biomass, the yields of which are negatively affected by flowering. The molecular mechanisms controlling flowering in sugarcane are nevertheless poorly understood. RNA-seq data analysis and database searches have enabled a comprehensive description of the PEBP gene family in sugarcane. It is shown to consist of at least 13 FLOWERING LOCUS T (FT)-like genes, two MOTHER OF FT AND TFL (MFT)-like genes, and four TERMINAL FLOWER (TFL)-like genes. As expected, these genes all show very high homology to their corresponding genes in Sorghum, and also to FT-like, MFT-like, and TFL-like genes in maize, rice, and Arabidopsis. Functional analysis in Arabidopsis showed that the sugarcane ScFT3 gene can rescue the late flowering phenotype of the Arabidopsis ft-10 mutant, whereas ScFT5 cannot. High expression levels of ScFT3 in leaves of short day-induced sugarcane plants coincided with initial stages of floral induction in the shoot apical meristem as shown by histological analysis of meristem dissections. This suggests that ScFT3 is likely to play a role in floral induction in sugarcane; however, other sugarcane FT-like genes may also be involved in the flowering process.

In Sorghum, five possible isoforms have been predicted for the SbFT13 locus ("LOC8080507"). However, isoforms X1, X2, X3 and X4 have an additional exon located in intron 3, diverging from the canonical 4 exons present in PEBP genes, and the resulting translated proteins have C-terminal sequences very dissimilar to those of any other PEBP gene family member. This is not the case for isoform X5, which shows good homology to known PEBP proteins over the full sequence and is therefore the best candidate for determining the ORF positions of ScFT2.
Two contigs containing sequences homologous to the predicted coding regions of Sorghum SbFT13-X5 were found in the Sugarcane Genome Hub database (Sh_219F24 and Sh_220O20). However, the two sugarcane contigs differ considerably in their non-coding regions: Sh_220O20 is similar to the Sorghum gene in these regions, whereas Sh_219F24 is quite different. The Sucest-Fun database contains another 9 contigs similar to Sh_219F24 (only 2 are represented here; see Supp Table 2 for the others) and another 3 similar to Sh_220O20. The promoter and 5'UTR of Sh_220O20 and related contigs also show some rearrangements when compared to the Sorghum sequence or Sh_219F24, possibly due to mobile elements, which could affect the production of a valid transcript.
Bottom half. Gene structure of the sugarcane ScFT2 gene. The Sorghum SbFT13-X5 isoform is the closest Sorghum homologue to ScFT2: its coding sequence is 90% similar and it has a conserved intron-exon structure relative to the sugarcane gene. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The sequence of our expressed transcript_110985 matches the sequence of Sh_219F24 and several other Sucest-Fun contigs, all of which are distinctly recognisable by several SNPs in the 3' region. It is also worth noting that this transcript may not have been fully spliced at the time of detection, since it still contains the first intron sequence.
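The similarity percentages quoted throughout these legends compare aligned sugarcane and Sorghum sequences. As a minimal sketch of how such a figure can be derived, the snippet below computes percent identity over the ungapped columns of a pairwise alignment. The function name, the gap convention and the toy sequences are assumptions made for illustration; the legends do not specify which tool or similarity definition was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical positions among columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences invented for illustration only (not real ScFT2 or SbFT13 data).
toy_sugarcane = "ATGGCTAGTC"
toy_sorghum = "ATGGCAAGTC"
print(percent_identity(toy_sugarcane, toy_sorghum))
```

Other conventions (for example, counting gapped columns as mismatches) give different values, so the convention should be stated alongside any reported percentage.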

Supplementary Fig. S3
Representation of the ScFT4 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment. The blue lines indicate matching genomic sequence (>90% similarity), with the black arrow indicating the start point of alignment of the contig. Contig "scga7_uti_cns_0115631" from the Sucest-Fun database covers the complete ScFT4 transcript. Three contigs containing regions matching ScFT4 (Sh_246B09, Sh_227M09, and Sh_248B09) are present in the Sugarcane Genome Hub database; their sequences are identical over the coding sequence region.
Bottom half. Gene structure of the sugarcane ScFT4 gene. SbFT1 is the closest homologue in Sorghum, with 98% similarity in its coding sequence but only 20% in the 3'UTR, and it has a conserved intron-exon structure relative to the sugarcane gene. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons.

Supplementary Fig. S4
Representation of the ScFT5 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment. The blue lines indicate matching genomic contig sequences (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. Three contigs containing the coding sequence from the Sucest-Fun database are shown (see Supp Table 1). No sequence with significant similarity was found in the Sugarcane Genome Hub database.
Bottom half. Gene structure of the sugarcane ScFT5 gene. SbFT5 is the closest homologue from Sorghum (96.5% coding sequence similarity, but only 35% and 15% in the 5' and 3'UTRs respectively) and it has a conserved intron-exon structure relative to the sugarcane gene. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons.

Supplementary Fig. S5
Representation of the ScFT6 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment (including insertions). The blue lines indicate matching genomic sequence (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. Indels (>5bp) are represented by red rectangles, with the pointed end marking a deletion in one contig and the opposite side corresponding to an insertion in the other. The sequence of the ScFT6 RNAseq transcript detected in this study is present in contigs "scga7_uti_cns_0150350" and "scga7_uti_cns_0347422", but the latter is missing the 3' end of the gene. No sequence with sufficient similarity was found in the Sugarcane Genome Hub database.
Bottom half. Gene structure of the sugarcane ScFT6 gene. SbFT3 is the closest homologue in Sorghum, with 97% coding sequence similarity (85% and 88% in the 5' and 3'UTR regions respectively) and a conserved intron-exon structure relative to the sugarcane gene. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The long 5' and 3' UTRs include a number of noticeable indels: 300bp upstream of the start codon, a 45bp insertion is present in contig "scga7_uti_cns_0347422" and in the ScFT6 transcript, and is also present in the Sorghum SbFT3 at the same position. In SbFT3, however, another 25bp insertion is symmetrically nested within the 45bp insertion, possibly the result of mobile element activity at this site following the split between the sorghum and sugarcane lineages. A recently published transcript of ScFT6 (GenBank ref MN458470.1, from cultivar SP80-3280) is mostly identical to our ScFT6 transcript. The only difference in the coding sequence is a single base pair change (T to G at position 416 of MN458470.1), which causes an amino acid change from V to F at position 129 (a variable amino acid position across all FT-like genes).
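A single base change can exchange valine and phenylalanine because valine codons (GTN) and the phenylalanine codons (TTT, TTC) differ only at the first base. The sketch below illustrates this with the standard genetic code. The toy coding sequence, the codon chosen and the helper names are assumptions for illustration and are not taken from the ScFT6 or MN458470.1 sequences.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon (trailing partial codons are ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def substitute(seq: str, position: int, new_base: str) -> str:
    """Return seq with the base at a 1-based position replaced by new_base."""
    return seq[:position - 1] + new_base + seq[position:]


# Hypothetical toy CDS: ATG GTC AAA encodes M, V, K.
toy_cds = "ATGGTCAAA"
variant = substitute(toy_cds, 4, "T")  # first base of the valine codon changed
print(translate(toy_cds), translate(variant))
```

Here the first base of codon 2 carries the substitution, so only that residue differs between the two translations.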

Supplementary Fig. S6
Representation of the ScFT7 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment. The blue lines indicate matching genomic contig sequences (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. The sequence of ScFT7 was obtained by searching for a homolog of SbFT7 in the Sugarcane Genome Hub database. Subsequently, three contigs containing the gene were found in the Sucest-Fun database. Their coding sequence has 76.8% similarity to that of SbFT7, with most of the difference accounted for by sequence divergence in the last exon, which introduces a premature stop codon 144bp before the 3' end of SbFT7.
Bottom half. Gene structure of the sugarcane ScFT7 gene. ScFT7 has a 97.8% protein sequence similarity to SbFT7, with most of the difference explained by the shorter C-terminus of ScFT7 (48aa fewer). Apart from the last exon, the two genes have a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. The 5' and 3' UTRs are depicted but show low sequence similarity to sorghum (<50%) and have not been confirmed by sequencing of a transcript. The red arrows correspond to translational starts, and the red stars to stop codons.
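The two figures given for the ScFT7 truncation are consistent with each other: a stop codon 144bp upstream of the reference stop removes 144 / 3 = 48 codons. The sketch below shows one way to measure such a truncation by counting codons before the first in-frame stop. The toy sequences and the function name are hypothetical and are not the real ScFT7 or SbFT7 coding sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(cds: str) -> int:
    """Number of sense codons before the first in-frame stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    for i in range(0, usable, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return usable // 3  # no stop found: every complete codon counts


# Hypothetical toy sequences for illustration only.
toy_full = "ATGGCTGGTAAACCCTAA"       # ATG GCT GGT AAA CCC TAA
toy_truncated = "ATGGCTTGAAAACCCTAA"  # ATG GCT TGA ... (premature stop)

lost_codons = codons_before_stop(toy_full) - codons_before_stop(toy_truncated)
print(lost_codons, lost_codons * 3)  # residues lost and the equivalent length in bp

# The same arithmetic applied to the figures in the legend: 144bp is 48 codons.
print(144 // 3)
```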

Supplementary Fig. S7
Representation of the ScFT8 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment (including insertions). The blue lines indicate matching genomic contig sequences (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. The sequence of ScFT8 was obtained by searching for a homolog of SbFT8 in the Sugarcane Genome Hub database. The BAC "Sh208I14" contains the full gene, whereas "Sh220P06" contains a deletion of 74bp removing the second exon as well as a second deletion of 246bp in the 3'UTR. Two other contigs containing the gene were found in the Sucest-Fun database, both matching the complete "Sh208I14" sequence.
Bottom half. Gene structure of the sugarcane ScFT8 gene. ScFT8 has a 96% coding sequence similarity to SbFT8 and a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The 5' and 3' UTRs are depicted but have not been confirmed by sequencing of a transcript. The 3'UTR sequence from sugarcane shows low similarity to sorghum.

Supplementary Fig. S8
Representation of the ScFT9 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment. The blue lines indicate matching genomic contig sequences (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. The sequence of ScFT9 was obtained by searching for a homologue of SbFT4 in the Sugarcane Genome Hub database. Subsequently, two contigs containing the gene (scga7_uti_cns_0418904 and scga7_uti_cns_0408569) were also found in the Sucest-Fun database.
Bottom half. Gene structure of the sugarcane ScFT9 gene. ScFT9 has a 95% sequence similarity to SbFT4 over the full gene sequence (coding and non-coding) and a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The 5' and 3' UTRs are depicted but have not been confirmed by sequencing of a transcript.

Supplementary Fig. S9
Representation of the ScFT10 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment. The blue lines indicate matching genomic contig sequences (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. The sequence of ScFT10 was obtained by searching for a homolog of SbFT10 in the Sugarcane Genome Hub database. Subsequently, three contigs containing the gene were found in the Sucest-Fun database.
Bottom half. Gene structure of the sugarcane ScFT10 gene. ScFT10 has a 93.7% protein sequence (77.7% coding sequence) similarity to SbFT10 and a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The 5' and 3' UTRs are depicted, but show low sequence similarity to sorghum (<50%) and have not been confirmed by sequencing of a transcript.

Supplementary Fig. S10
Representation of the ScFT11 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment. The blue lines indicate matching genomic contig sequences (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. The sequence of ScFT11 was obtained by searching for a homolog of SbFT11: "Sh_249H13_t000010" was identified in the Sugarcane Genome Hub database, and three other contigs that include the full gene were then found in the Sucest-Fun database.
Bottom half. Gene structure of the sugarcane ScFT11 gene. ScFT11 has a 94% coding sequence similarity to SbFT11 and a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The 3' and 5' UTRs are depicted and show high sequence similarity to sorghum, but have not been confirmed by sequencing of a transcript.

Representation of the ScFT13 gene
Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment (including insertions). The blue lines indicate matching genomic contig sequences (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. Gaps inserted into the aligned genomic sequences are represented by parallel diagonal lines and indicate large insertions. The sequence of ScFT13 was obtained by searching for a homolog of SbFT12; five contigs containing all, or parts, of the gene were found in the Sucest-Fun database. No homologous sequence was found in the Sugarcane Genome Hub database.
Bottom half. Gene structure of the sugarcane ScFT13 gene. SbFT12 and ScFT13 share 97% protein sequence similarity and a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons.

Supplementary Fig. S13
Representation of the ScMFT1 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment (including insertions). The blue lines indicate matching genomic sequence (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. Diagonal cross lines followed by a hatched line indicate the end of homology amongst contigs. The Sugarcane Genome Hub database contains the full-length predicted transcript (Sh_204E12_t000040). Three contigs from the Sucest-Fun database containing this coding sequence are represented (see Supp Table 2 for contig details). Scga7_uti_cns_0255087 contains the full gene, and its sequence similarity to Sh_204E12_t000040 extends for several kilobases in the 5' and 3' directions. The sequence of Scga7_uti_cns_0158804 diverges 100bp after the stop codon. Scga7_uti_cns_0345261 diverges before the end of the gene, 1.7kb after the start codon.
Bottom half. Gene structure of the sugarcane ScMFT1 gene. SbMFT1 is the closest homologue in Sorghum: its coding sequence is 95% similar (but with only 5% similarity in the 3'UTR) and it has a conserved intron-exon structure relative to the sugarcane gene. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The 3' and 5' UTRs are depicted but have not been confirmed by sequencing of a transcript.

Supplementary Fig. S14
Representation of the ScMFT2 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignment (including insertions). The blue lines indicate matching genomic sequence (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs. The Sugarcane Genome Hub database contains the full-length predicted transcript (Sh_210E07). Three other contigs containing the gene were also found in the Sucest-Fun database, all of them containing the same deletion of 260bp in the first intron, and two of them further deletions in the third intron, but with no change in their coding sequence.
Bottom half. Gene structure of the sugarcane ScMFT2 gene. ScMFT2 has 99.5% protein sequence similarity and 97% coding sequence similarity to SbMFT2, and a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. Their full sequences (including UTRs) are, however, very divergent (only 40% similarity). The red arrows correspond to translational starts, and the red stars to stop codons.

Supplementary Fig. S15
Representation of the ScTFL2 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignments (including insertions). The blue lines indicate matching genomic sequence (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs and the small red triangles indicating single nucleotide changes. The 7 contigs containing ScTFL2 were found by screening the Sucest-Fun database using the sequence of SbTFL1_4. No matching gene was found in the Sugarcane Genome Hub database. The first four contigs shown here have identical coding region sequences; the changes seen in the remaining contigs could be sequencing errors.
Bottom half. Gene structure of the sugarcane ScTFL2 gene. ScTFL2 has 97.8% coding sequence similarity to SbTFL1_4, with a conserved intron-exon structure. The sizes of exons (Ex) and introns (In) are indicated. The red arrows correspond to translational starts, and the red stars to stop codons. The 3' and 5' UTRs of ScTFL2 are depicted but show low similarity to Sorghum and have not been confirmed by sequencing of a transcript.

Supplementary Fig. S16
Representation of the ScTFL3 gene. Top half. Alignment of contigs containing the predicted transcript sequence. The top grey line indicates the distances relative to the consensus sequence of the alignments (including insertions). The blue lines indicate matching genomic sequence (>90% similarity), with the black arrows indicating the point of alignment of each of the contigs.

Classification of TFL-like genes in sugarcane
This gene tree was obtained by aligning nucleotide coding sequences for TFL-like genes found in publicly available databases. The 4 TFL-like genes from Sorghum (SbTFL1_1 to SbTFL1_4) are indicated with dots. The published sequences from sugarcane (ScTFL1, ScTFL3 & ScTFL4) are indicated with stars. In the case of ScTFL2, the four entries indicated by diamonds have identical sequences and require the fewest changes relative to SbTFL1_4 (the most parsimonious choice); they were therefore chosen as the reference sequence. The three other entries have only a single nucleotide difference each. The various copies found in the Sucest-Fun database are designated by the last 4 digits of their contig number (see Supp Table 2 for full reference). ScTFL_202_P16 is the only TFL-like gene found in the Sugarcane Genome Hub database. All sequences obtained from contigs fall within defined single clades, each containing one of the sugarcane TFL-like genes (ScTFL1 to ScTFL4), suggesting that they are redundant copies of the same gene with little variation. The copies within each clade were therefore considered as a single gene in this study. The single nucleotide variations observed between entries within the clade of each gene have very little effect on the amino acid sequence.
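Grouping database entries by how many nucleotides separate them from a reference coding sequence can be sketched as follows. This is an illustrative assumption about how identical and near-identical copies might be tallied, not the procedure used to build the tree. The entry names and sequences are invented, and the sequences are assumed to be already aligned and of equal length.

```python
from typing import Dict, List


def count_differences(reference: str, sequence: str) -> int:
    """Number of mismatched positions between two equal-length aligned sequences."""
    if len(reference) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(reference.upper(), sequence.upper()) if a != b)


def group_by_differences(reference: str, entries: Dict[str, str]) -> Dict[int, List[str]]:
    """Map each mismatch count to the names of the entries showing that count."""
    groups: Dict[int, List[str]] = {}
    for name, sequence in entries.items():
        groups.setdefault(count_differences(reference, sequence), []).append(name)
    return groups


# Hypothetical toy data: two identical copies and one copy with a single change.
toy_reference = "ATGGCTAGTC"
toy_entries = {
    "entry_A": "ATGGCTAGTC",
    "entry_B": "ATGGCTAGTC",
    "entry_C": "ATGGCTAGTA",
}
print(group_by_differences(toy_reference, toy_entries))
```

Entries in group 0 are identical to the reference, while entries in group 1 differ at a single position.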

Expression of ScFT3 in SD and LD over the developmental timecourse experiment.
Expression of ScFT3 is calculated as the average 2^-(∆Ct) relative to two reference genes (ScTUB and ScUBQ1). The overall difference in ScFT3 expression in SD compared to LD, as calculated by REST, is shown in the manuscript (Fig. 5b).
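A minimal sketch of the 2^-(∆Ct) calculation is given below. It assumes that ∆Ct is the target Ct minus the mean Ct of the two reference genes, and that the reported value is the average of 2^-(∆Ct) across replicates. Both points are assumptions, since the legend does not state how the two references were combined, and the Ct values shown are invented. An alternative reading would compute 2^-(∆Ct) against each reference gene separately and average the two results.

```python
from statistics import mean
from typing import Sequence, Tuple


def relative_expression(ct_target: float, ct_references: Sequence[float]) -> float:
    """2^-(dCt), with dCt = target Ct minus the mean Ct of the reference genes."""
    delta_ct = ct_target - mean(ct_references)
    return 2.0 ** (-delta_ct)


def average_relative_expression(
    replicates: Sequence[Tuple[float, Sequence[float]]]
) -> float:
    """Average 2^-(dCt) over replicates given as (target Ct, reference Cts) pairs."""
    return mean(relative_expression(target, refs) for target, refs in replicates)


# Invented Ct values for illustration: (ScFT3 Ct, (ScTUB Ct, ScUBQ1 Ct)).
toy_replicates = [
    (25.0, (20.0, 22.0)),  # dCt = 4, so 2^-4
    (24.0, (20.0, 22.0)),  # dCt = 3, so 2^-3
]
print(average_relative_expression(toy_replicates))
```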