The Rhodoexplorer Platform for Red Algal Genomics and Whole-Genome Assemblies for Several Gracilaria Species

Abstract
Macroalgal (seaweed) genomic resources are generally lacking as compared with other eukaryotic taxa, and this is particularly true in the red algae (Rhodophyta). Understanding red algal genomes is critical to understanding eukaryotic evolution given that red algal genes are spread across eukaryotic lineages from secondary endosymbiosis and red algae diverged early in the Archaeplastids. The Gracilariales is a highly diverse and widely distributed order including species that can serve as ecosystem engineers in intertidal habitats and several notorious introduced species. The genus Gracilaria is cultivated worldwide, in part for its production of agar and other bioactive compounds with downstream pharmaceutical and industrial applications. This genus is also emerging as a model for algal evolutionary ecology. Here, we report new whole-genome assemblies for two species (Gracilaria chilensis and Gracilaria gracilis), a draft genome assembly of Gracilaria caudata, and genome annotation of the previously published Gracilaria vermiculophylla genome. To facilitate accessibility and comparative analysis, we integrated these data in a newly created web-based portal dedicated to red algal genomics (https://rhodoexplorer.sb-roscoff.fr). These genomes will provide a resource for understanding algal biology and, more broadly, eukaryotic evolution.


Introduction
Red algae (Rhodophyta) represent a lineage of photosynthetic eukaryotes in the Archaeplastids that diverged from green algae around 1,700 Ma (Yang et al. 2016). Within the Rhodophyta, the Cyanidiophyceae were the earliest to diverge ∼1,200 Ma, while the Florideophyceae diverged more recently (i.e., 412 Ma; Yang et al. 2016) and constitute the most speciose group (Graham et al. 2016). In this context, the genomic resources currently available (supplementary table S1, Supplementary Material online) represent only a fraction of the diversity of red algae, limiting our capacity to reconstruct the evolutionary history of the unique features of this group.
The Florideophyceae have a life cycle in which haploid male and female gametophytes alternate with a diploid tetrasporophyte (but see supplementary fig. S1, Supplementary Material online). Many species have "isomorphic" gametophytes and tetrasporophytes, which are hard to distinguish without the aid of molecular tools (e.g., sex-linked markers, Martinez et al. 1999; Guillemin et al. 2012; or microsatellites, Krueger-Hadfield et al. 2016).
Here, we focus on four Gracilaria species spanning roughly 170 Myr of evolution (Lyra et al. 2021). There is controversy over the systematics of Gracilaria Greville, but for the purposes of this paper, we consider the four species as belonging to the genus Gracilaria (sensu Lyra et al. 2021; Guiry and Guiry 2022). These species were chosen based on their evolutionary, ecological, and/or economic importance. Species in the genus Gracilaria produce agars in their cell wall (Popper et al. 2011); they can be propagated vegetatively and serve as ecosystem engineers in the intertidal zone (Kain and Destombe 1995). The four taxa chosen can be divided into three clades based on their molecular divergence: 1) Gracilaria chilensis and Gracilaria vermiculophylla, 2) Gracilaria caudata, and 3) Gracilaria gracilis (Lyra et al. 2021). Gracilaria gracilis and G. caudata are evolutionarily more distinct than the phylogenetic group that contains G. chilensis and G. vermiculophylla. Gracilaria chilensis C.J. Bird et al. is an important crop along the Chilean coastline, where it has been both harvested and subsequently planted after a crash in natural stands likely due to overharvesting (Buschmann et al. 2001). The artificial selection for tetrasporophytes has resulted in early stages of domestication (Valero et al. 2017) and loss of sexual reproduction (Guillemin et al. 2008). Gracilaria vermiculophylla (Ohmi) Papenfuss is a successful invader in many of the bays and estuaries of the Northern Hemisphere (Krueger-Hadfield et al. 2017). These invasions were likely facilitated by adaptive shifts in temperature and salinity tolerance (e.g., Sotka et al. 2018) and in response to biofoulers (e.g., Bonthond et al. 2020), as well as by the ability to fragment (Krueger-Hadfield et al. 2016). Gracilaria caudata J. Agardh can form dense stands in the intertidal zone (Plastino and Oliveira 1997) and has been subjected to intense harvesting pressure, leading to declines in native populations (Hayashi et al. 2014; see also Ayres-Ostrock et al. 2019). Finally, G. gracilis (Stackhouse) Steentoft, L.M. Irvine & Farnham is a long-lived species that inhabits tide pools along European coastlines. This species serves as a model species to test hypotheses related to the evolution of sex (e.g., alternation of haploid and diploid phases in life cycles, Destombe et al. 1989, 1992, 1993; Hughes and Otto 1999; mating system and sexual selection, Richerd et al. 1993; Engel et al. 1999).
The availability of genomic and genetic resources for these four Gracilaria species should aid in our understanding of the evolutionary ecology of red algae in their dynamic environment, during invasions of new habitats, under cultivation practices, and in response to climate change. Moreover, these new resources will add to the existing genomic data and illuminate key processes in eukaryotic evolution. The Rhodoexplorer Red Algal Genome Database currently includes the Gracilaria species discussed here but will include all the high-quality genomic resources available for the Rhodophyta (e.g., genomes and transcriptomes), thereby providing a unique resource for comparative analyses.

Results and Discussion
Genome Assembly
Genome assembly sizes were 72 and 76 Mb for G. gracilis and G. chilensis, respectively. In addition, we created a draft genome assembly based on Illumina sequencing only for G. caudata (30 Mb) and reassembled the genome of G. vermiculophylla (Flanagan et al. 2021) to a final 45 Mb after bacterial contamination removal. These assembly sizes were comparable with those of the Gracilaria domingensis (78 Mb; Nakamura-Gouvea et al. 2022) and Gracilaria changii (36 Mb; Ho et al. 2018) genomes. PacBio assemblies of G. chilensis and G. gracilis produced here (138 and 279 contigs, respectively; N50 of 1.56 and 0.56 Mb, respectively) are the most contiguous red macroalgal genomes presently available in public databases, apart from G. vermiculophylla and Pyropia yezoensis where the addition of a HiC library enabled scaffolding nearly at the chromosome level (Wang et al. 2020; Flanagan et al. 2021). In G. vermiculophylla, however, despite the high N50 of 2.56 Mb, the total number of contigs/scaffolds was also high (7,753/4,240). The G. caudata assembly was fragmented with a low N50 of 21 kb and 55,767/5,535 contigs/scaffolds. Despite the differences in assembly size, BUSCO scores were similar across the long-read-sequenced G. gracilis and G. chilensis (83.6% and 81.6% of conserved proteins present) and the more fragmented G. caudata genome (81.6%, Eukaryota_odb10; Manni et al. 2021; Simão et al. 2015; table 1). The reassembled genome of G. vermiculophylla contained 71.8% of the conserved proteins. Given the diversity of Rhodophyta and the lack of lineage-specific databases, these results are in the expected range. A recent study estimated the presence of conserved eukaryotic genes (Eukaryota_odb10) in red algal genomes at a median level of 69% (Hanschen et al. 2020).
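For readers less familiar with the contiguity metric used above, the following is a minimal sketch of how N50 can be computed from a list of contig or scaffold lengths. The function name and the toy lengths are illustrative assumptions, not values or code from the assemblies reported here. The second toy example shows how a high N50 can coexist with a large number of sequences, as observed for G. vermiculophylla.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy data (made up for illustration only).
toy_contigs = [100, 80, 50, 30, 20, 10, 10]
print(n50(toy_contigs))  # 80

# A few long scaffolds plus many tiny ones: N50 stays high
# even though the assembly contains 503 sequences.
many_small = [1000] * 3 + [1] * 500
print(n50(many_small))  # 1000
```

Because N50 is weighted by length, it says little about the number of short sequences in an assembly, which is why the contig and scaffold counts are reported alongside it.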
Red algal genomes are repeat rich, with half or more of their genomic sequence being constituted by repetitive elements, as reported previously for Porphyra umbilicalis

Gene Prediction and Annotation
Gene prediction yielded a total of 7,943, 8,737, and 9,460 protein-coding sequences for G. chilensis, G. caudata, and G. gracilis, respectively (table 1), which was comparable with other red macroalgal genomes, such as Chondrus crispus (9,815 genes; Collén et al. 2013) and G. changii (10,912 genes; Ho et al. 2018). In addition, we annotated the reassembled genome of G. vermiculophylla, which yielded fewer genes (6,807). Among these genes, 70.6-76.6% did not contain any introns, as is typical for the compact genomes of red algae (Qiu et al. 2015). Most Gracilaria genes had homologous sequences in the UniProt database (84.2-89.7%) and were annotated with at least one InterPro hit (91.7-93.6%). Between 47.9% and 54.4% of genes were associated with gene ontology (GO) annotations.
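As an illustration of how the proportion of intronless genes can be derived from a structural annotation, the sketch below counts exon features per transcript in GFF3-formatted lines. It is a simplified example and not the procedure used for the annotations reported here: it assumes one transcript per gene and that every exon carries a Parent attribute. The function name and the toy records are hypothetical.

```python
from collections import Counter


def intronless_fraction(gff3_lines):
    """Fraction of transcripts that have exactly one exon (no intron)."""
    exons_per_transcript = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon records
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                exons_per_transcript[parent] += 1
    if not exons_per_transcript:
        raise ValueError("no exon features with a Parent attribute found")
    single = sum(1 for n in exons_per_transcript.values() if n == 1)
    return single / len(exons_per_transcript)


# Toy annotation (made up): t1 and t3 have one exon, t2 has two.
toy_gff3 = [
    "##gff-version 3",
    "ctg1\tsketch\texon\t1\t300\t.\t+\t.\tID=e1;Parent=t1",
    "ctg1\tsketch\texon\t500\t700\t.\t+\t.\tID=e2;Parent=t2",
    "ctg1\tsketch\texon\t800\t950\t.\t+\t.\tID=e3;Parent=t2",
    "ctg2\tsketch\texon\t10\t400\t.\t-\t.\tID=e4;Parent=t3",
]
print(f"{intronless_fraction(toy_gff3):.1%}")  # 66.7%
```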
OrthoFinder analyses identified 4,666 orthogroups present in all four genomes (supplementary fig. S2, Supplementary Material online) versus 408-620 orthogroups or orphan genes specific to only one of the sequenced species (supplementary fig. S2, Supplementary Material online). Among the species-specific sequences, the rate of GO annotation was lower than for the entire data set, ranging from 12.7% for G. chilensis to 18.2% for G. caudata. Both the annotated and the unknown species-specific genes constitute attractive targets to study their role in adaptation and speciation.
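The distinction between shared and species-specific orthogroups can be illustrated with a short sketch that classifies rows of a tab-separated gene-count table. The layout assumed here (an "Orthogroup" column, one column per species, and a "Total" column) mirrors the gene-count table that OrthoFinder writes, but the species labels and counts below are invented, and orphan genes that are not assigned to any orthogroup would need to be tallied separately. This is not the authors' analysis code.

```python
import csv
import io


def classify_orthogroups(handle):
    """Split orthogroups into those shared by all species and those
    found in exactly one species, based on per-species gene counts."""
    reader = csv.DictReader(handle, delimiter="\t")
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    shared = []
    specific = {sp: [] for sp in species}
    for row in reader:
        present = [sp for sp in species if int(row[sp]) > 0]
        if len(present) == len(species):
            shared.append(row["Orthogroup"])
        elif len(present) == 1:
            specific[present[0]].append(row["Orthogroup"])
    return shared, specific


# Toy table (made-up counts, hypothetical column labels).
toy_table = io.StringIO(
    "Orthogroup\tG_chilensis\tG_gracilis\tG_caudata\tG_vermiculophylla\tTotal\n"
    "OG0000000\t2\t1\t1\t1\t5\n"
    "OG0000001\t3\t0\t0\t0\t3\n"
    "OG0000002\t1\t1\t0\t0\t2\n"
)
shared, specific = classify_orthogroups(toy_table)
print(len(shared), {sp: len(ogs) for sp, ogs in specific.items()})
```

Orthogroups present in two or three species (such as the third toy row) fall into neither category.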

Rhodoexplorer Red Algal Genome Database
In addition to depositing the raw reads and sequenced genome in a public repository, we integrated the data into the newly created Rhodoexplorer Red Algal Genome Database (https://rhodoexplorer.sb-roscoff.fr), which will include more red algal genomes in the future. The services provided include the following:
• Information about the sequenced strains, with links to external databases (NCBI, WoRMS, and Algaebase).
• Assembly and annotation metrics.
• Data downloads: genomic, genes and proteomic data sets, structural and functional annotations, orthology clusters, etc.
• A Blast interface with a selection of red algal genomes, predicted and de novo assembled transcriptomes and proteomes.
• Visualization tools: a genome browser to visualize the predicted genes and the RNA-sequence (RNAseq) data mapped on the genome and a web interface to visualize functional annotations and retrieve individual protein sequences.

Culture Conditions
Cultures were initiated either from lab crosses or from tetraspores released by field-collected tetrasporophytes. Gracilaria caudata was grown in the modified von Stosch nutrient solution (Ursi and Plastino 2001) diluted to 25% in seawater (32 practical salinity units [psu]), with weekly renewals. The algae were kept in culture chambers at 25 °C under fluorescent illumination of 70 μmol m⁻² s⁻¹ with a 14-h photoperiod, following previously established optimal growth conditions (Oliveira 1992a, 1992b). Gracilaria chilensis was grown in Provasoli medium (McLachlan 1973), changed weekly during the first 2 months and twice a week thereafter. Cultures were kept at 13 °C under 40-60 μmol m⁻² s⁻¹ of light with a 12-h day length.

Nucleic Acid Extraction, Library Preparation, and Sequencing
Genomic DNA (gDNA) was extracted using DNeasy PowerPlant Pro Kit for G. caudata or an in-house protocol based on Faugeron et al. (2001)
All code used for genome assembly and annotation is available on the GitLab Pages site dedicated to the genome database project (https://abims-sbr.gitlab.io/rhodoexplorer/doc/data_process/).

Rhodoexplorer Red Algal Genome Database
The main web portal (https://rhodoexplorer.sb-roscoff.fr) has been implemented using the Python web framework Django, with data stored in a relational database (PostgreSQL).
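For orientation, a Django project is typically pointed at a PostgreSQL database through the DATABASES setting. The fragment below is a generic sketch of such a configuration and is not taken from the Rhodoexplorer code base: the environment variable names and default values are hypothetical placeholders.

```python
import os

# Hypothetical settings fragment for a Django project backed by PostgreSQL.
# None of the names or defaults below come from the Rhodoexplorer portal.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # Django's PostgreSQL backend
        "NAME": os.environ.get("PORTAL_DB_NAME", "portal"),
        "USER": os.environ.get("PORTAL_DB_USER", "portal"),
        "PASSWORD": os.environ.get("PORTAL_DB_PASSWORD", ""),
        "HOST": os.environ.get("PORTAL_DB_HOST", "localhost"),
        "PORT": os.environ.get("PORTAL_DB_PORT", "5432"),
    }
}
```

Reading credentials from the environment rather than hard-coding them keeps secrets out of the repository, which is a common choice for containerized deployments.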
For each red algal species, an integrated environment of visualization tools has been deployed based on the Galaxy Genome Annotation (GGA) project (Bretaudeau et al. 2019). Each GGA environment deployed for the Rhodoexplorer database includes the following: Chado, a PostgreSQL relational database schema for storing biological data (Mungall et al. 2007); JBrowse, a web-based genome browser (Buels et al. 2016); Tripal, a Drupal-based application for creating biological websites (Sanderson et al. 2013); Elasticsearch, a distributed, free, and open search and analytics engine for all types of data (https://www.elastic.co/products/elasticsearch); and Galaxy, a browser-accessible workbench for scientific computing used as a data loading orchestrator for administrators (The Galaxy Community 2022). To facilitate the deployment and the administration of the GGA service, a set of Python tools has been developed (http://gitlab.sb-roscoff.fr/abims/e-infra/gga_load_data) allowing mass deployment of Docker containers and automated data loading through Galaxy with the BioBlend API (Sloggett et al. 2013).
The documentation website for navigating the platform web portal and resources (https://abims-sbr.gitlab.io/rhodoexplorer/doc/) is published from a GitLab repository, with Pages and MkDocs, a static site generator.
The entire computing infrastructure is deployed and maintained on the ABiMS bioinformatics platform of the Roscoff Biological Station, part of the French Institute of Bioinformatics (Institut Français de Bioinformatique) national infrastructure.

Supplementary material
Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

Acknowledgments
This project was supported by start-up funds from the College of Arts and Sciences at the University of Alabama at Birmingham to S.A.K.-H.; ANID NCN2021-033 and FONDECYT 1221456 and 1221477 to M.L.G., J.B., and S.F.; the International Research Networks DEBMA "Diversity, Evolution and Biotechnology of Marine Algae" (CNRS GDRI 0803) and DABMA "Diversity, Adaptation, and Biotechnology of Marine Algae" (CNRS IRN 00022); the ERC (grant number 864038 to S.M.C.); and the ANR project IDEALG (ANR-10-BTBR-04, "Investissements d'Avenir, Biotechnologies-Bioressources"). We are grateful to the Roscoff Bioinformatics platform ABiMS (http://abims.sb-roscoff.fr), part of the Institut Français de Bioinformatique (ANR-11-INBS-0013) and BioGenouest network, and the Max Planck Institute for Biology Tübingen for providing computational resources. We also wish to thank Kristy Hill-Spanik, Rosário Petti, and Vivian Viana for field and technical support.

Data Availability
Sequencing data have been deposited in the SRA database under BioProjects PRJNA936482, PRJNA931233, PRJNA938301, and PRJNA938403. The accession numbers for the raw sequence data are provided in supplementary table S2, Supplementary Material online.
The Gracilaria chilensis, G. gracilis, and G. caudata Whole Genome Shotgun projects have been deposited at DDBJ/ENA/GenBank under the accessions JARGXX000000000, JARGSG000000000, and JASCIV000000000, respectively. The updated Gracilaria vermiculophylla assembly has been deposited under JAHNZQ000000000.