First Complete Genome of the Thermophilic Polyhydroxyalkanoates-Producing Bacterium Schlegelella thermodepolymerans DSM 15344

Abstract
Schlegelella thermodepolymerans is a moderately thermophilic bacterium capable of producing polyhydroxyalkanoates, biodegradable polymers that represent an alternative to conventional plastics. Here, we present the first complete genome of the type strain S. thermodepolymerans DSM 15344, assembled by a hybrid approach using both long (Oxford Nanopore) and short (Illumina) reads. The genome consists of a single 3,858,501-bp-long circular chromosome with a GC content of 70.3%. Genome annotation identified 3,650 genes in total, whereas 3,598 open reading frames belonged to protein-coding genes. Functional annotation of the genome and division of genes into clusters of orthologous groups revealed a relatively high number of 1,013 genes with unknown function or unknown clusters of orthologous groups, which reflects the fact that little is known about thermophilic polyhydroxyalkanoates-producing bacteria at the genome level. On the other hand, 270 genes involved in energy production and conversion were detected. This group covers genes involved in catabolic processes, which suggests that S. thermodepolymerans DSM 15344 can utilize and biotechnologically convert various substrates such as lignocellulose-based saccharides, glycerol, or lipids. Its genome indicates that S. thermodepolymerans DSM 15344 is a metabolically versatile bacterium with great biotechnological potential.


Introduction
Polyhydroxyalkanoates (PHA) are polyesters of hydroxyalkanoic acids. Because PHA are produced naturally by microbial fermentation, they can be regarded as an environmentally friendly alternative to petroleum-based polymers (Muhammadi et al. 2015; Sabapathy et al. 2020). Some facts about PHA fermentation are known: for example, microorganisms use PHA to store surplus energy and carbon in the cytoplasm in the form of intracellular granules, and these granules help the organism cope with stressors (Obruca et al. 2018). Nevertheless, additional basic knowledge is needed to establish viable industrial processes. Although production of bioplastics is considered the way of the future and an inseparable part of the circular economy, less than 1% of total plastic production comes from the bioplastics industry (Shogren et al. 2019).
The type strain S. thermodepolymerans DSM 15344 is a thermophilic, Gram-negative bacterium that was originally investigated for its ability to degrade extracellular PHA materials such as copolymers of 3-hydroxybutyrate and 3-mercaptopropionate (Elbanna et al. 2003). So far, two draft genome assemblies of the strain have been published. The assembly available under the GenBank accession number GCA_002933415.1, submitted by Zhejiang Academy of Agricultural Sciences, consists of 48 contigs with an N50 length of 174,537 bp, and the assembly GCA_003349825.1 by DOE Joint Genome Institute contains 28 scaffolds with an N50 length of 324,832 bp. Although these are relatively high-quality draft assemblies, other important features of the strain remained hidden, probably because a high-quality complete genome assembly and functional annotation of the genome were missing. Only recently was its ability to produce PHA reported, together with the unique capability of xylose utilization (Kourilova et al. 2020). The optimal growth temperature of S. thermodepolymerans DSM 15344, 55 °C, reduces the risk of microbial contamination; therefore, the strain is an ideal organism for use in the "Next Generation Industrial Biotechnology" concept, in which the biotechnological process is conducted under nonsterile conditions (Chen and Jiang 2018). In this article, we present its first high-quality complete genome sequence, which is currently the reference sequence for the S. thermodepolymerans species in the GenBank database. We annotated the genome, predicted the operon structure, and searched for prophage DNA and CRISPR arrays.
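The N50 values quoted for the two draft assemblies follow the usual definition: the length of the shortest contig (or scaffold) in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that calculation is given below; the function name and the example lengths are illustrative assumptions and are not taken from either draft assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest, and the length at which
    the running total first reaches half of the assembly size is returned.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical contig lengths (bp), used only to illustrate the calculation.
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))
```

For a complete single-chromosome assembly such as the one presented here, the N50 is simply the chromosome length, which is why the statistic is mainly informative for fragmented drafts.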
Genomic DNA was extracted using the MagAttract HMW DNA Kit (Qiagen, NL). The DNA purity was checked using NanoDrop (Thermo Fisher Scientific), the concentration was measured using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific), and the proper length of the isolated DNA was confirmed using an Agilent 4200 TapeStation (Agilent Technologies). The sequencing library for Oxford Nanopore sequencing was prepared using the Ligation Sequencing 1D Kit (Oxford Nanopore Technologies, UK). The sequencing was performed using the R9.4.1 flowcell and the MinION platform (Oxford Nanopore Technologies). The library for short-read sequencing was prepared using the KAPA HyperPlus kit, and sequencing was carried out using the MiSeq Reagent Kit v2 (500 cycles) on the Illumina MiSeq platform (Illumina).

Genome Annotation and Analysis
Genome annotation was done through the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) (Tatusova et al. 2016).
Operon prediction was performed using Operon-mapper (Taboada et al. 2018), and the results were manually inserted into the genome record. Functional annotation of the protein-coding genes was performed by classifying them into clusters of orthologous groups (COG) from the eggNOG database using eggNOG-mapper (Huerta-Cepas et al. 2019). A chromosomal map of the circular genome was subsequently produced with DNAPlotter (Carver et al. 2009), which is integrated in Artemis (Rutherford et al. 2000). Prophage DNA was searched for using Prophage Hunter (Song et al. 2019) and PHASTER (Arndt et al. 2016). Finally, the annotated genome sequence was analyzed for the presence of CRISPR loci using the CRISPRDetect tool (Biswas et al. 2016).

Genome Assembly and Properties
The initial genome assembly of Schlegelella thermodepolymerans DSM 15344 was reconstructed from nearly 1.8 million Oxford Nanopore Technologies reads with a median read length of 4.9 kb and was finalized by mapping more than 2.4 million high-quality (average Phred score Q ≈ 35) Illumina read pairs (88% of all Illumina reads) to the initial assembly. The whole process resulted in a final assembly consisting of one circular chromosome with coverage exceeding 5,500×. The genome has been deposited at DDBJ/EMBL/GenBank under accession number CP064338.1.
The genome is 3,858,501 bp long and contains 3,650 genes in total, divided into 1,729 operons. Most of the genes are protein-coding sequences (CDSs), but 33 pseudogenes were also found, which is fewer than the 44 and 50 pseudogenes detected in the previously published draft genome sequences PSNY00000000.1 and QQAP00000000.1, respectively. The GC content is 70.28%, which is higher than the average for Gram-negative bacteria (Li and Du 2014). However, it met our expectations, as it corresponds to the value of 70.3% in the previously published draft genomes. High GC content can be associated with the adaptation of the bacterium to high-temperature environments. Although only single copies of rRNA genes were detected in the draft genomes of S. thermodepolymerans DSM 15344, the complete genome sequence contains the 5S, 16S, and 23S rRNA genes in duplicate. Moreover, the copies of the 16S and 23S rRNA genes differ at three positions and one position, respectively. Such information is useful for future identification of S. thermodepolymerans in metagenomic studies and for quantification of its abundance in microbial studies based on amplicon sequencing. The overall sequence features are summarized in table 1.
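GC content is the share of G and C bases among all unambiguous bases of the sequence. The Python sketch below shows one way such a value could be computed from a sequence string; it is an illustration under stated assumptions, not the procedure used to obtain the 70.28% reported here. The function name is hypothetical, and ambiguous symbols such as N are assumed to be excluded from the denominator.

```python
def gc_content(sequence):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only A, C, G, and T are counted; ambiguous symbols (for example N)
    are ignored, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


# Toy sequence for illustration only; it is not part of the genome.
print(round(gc_content("GCGCATAT"), 2))
```

Applied to the full chromosome sequence, the same function would return a single genome-wide percentage; applying it in sliding windows is a common way to obtain a GC-content track along a circular genome map.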

Functional Annotation
The protein-coding genes were classified according to COG into 22 categories. In total, 2,576 CDSs were assigned a COG category of known function. The most prevalent groups were E (amino acid metabolism and transport), containing 7.80% of the total number of CDSs (280 out of 3,589), and C (energy production and conversion), containing 7.52% of the total number of CDSs (270 out of 3,589). This suggests that S. thermodepolymerans has a functional apparatus capable of utilizing a wide range of substrates, as reported recently (Kourilova et al. 2020). Unfortunately, 9.33% (335 genes) were not assigned any COG and 18.89% (678 genes) were assigned to group S, which has an unknown function. Such a result was expected, as little is known so far about genomes of thermophilic bacteria capable of PHA synthesis. (For details of each group, including the number of assigned genes, see supplementary table S1, Supplementary Material online.) The position of individual features in the circular genome is shown in figure 1. Each COG is marked with a different color. Moreover, RNAs are divided into tRNA, rRNA, and ncRNA categories and displayed in the fourth outermost circle.
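The percentages in this paragraph can be reproduced from the gene counts and the total of 3,589 CDSs. The short Python sketch below does so; the counts are the ones stated in the text, whereas the variable and function names are illustrative assumptions.

```python
# Counts taken from the text; names are illustrative.
TOTAL_CDS = 3589
cog_counts = {
    "E (amino acid metabolism and transport)": 280,
    "C (energy production and conversion)": 270,
    "S (unknown function)": 678,
    "no COG assigned": 335,
}


def percent_of_cds(count, total=TOTAL_CDS):
    """Return the share of CDSs as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


for label, count in cog_counts.items():
    print(f"{label}: {count} CDSs, {percent_of_cds(count):.2f}%")

# CDSs with a COG category of known function: everything except
# group S and the unassigned genes.
known = TOTAL_CDS - cog_counts["S (unknown function)"] - cog_counts["no COG assigned"]
print(f"CDSs with a known COG category: {known}")
```

The last line also shows how the figures relate to one another: the 678 group S genes and the 335 unassigned genes together give the 1,013 genes of unknown function, and subtracting them from 3,589 leaves the 2,576 CDSs with a category of known function.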
The search for viral DNA yielded only inconclusively identified prophages. Although Prophage Hunter identified five putative prophages, PHASTER reported a single incomplete prophage that overlapped with one candidate identified by Prophage Hunter. None of these phages was identified as active. This is in line with our expectations, as temperature is a crucial factor for the survival of phages (Nasser and Oman 1999). The optimal growth temperature of the strain (55 °C) is too high for most phages (Farrell and Campbell 1969). Although a group of thermophilic phages also exists, they usually occur in specific environments (Jończyk et al. 2011) and were not identified in the S. thermodepolymerans DSM 15344 genome. Only a single 164-bp-long CRISPR array containing two spacer units was found in the S. thermodepolymerans DSM 15344 genome. Unfortunately, no cas or cas-like genes were found in its neighborhood. Nevertheless, this does not prevent the use of CRISPR-Cas9 for S. thermodepolymerans DSM 15344 genome editing, as a foreign system could be used.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.