Genome features of a novel hydrocarbonoclastic Chryseobacterium oranimense strain and its comparison to bacterial oil-degraders and to other C. oranimense strains

Abstract For the first time, we report the whole genome sequence of a hydrocarbonoclastic Chryseobacterium oranimense strain isolated from Trinidad and Tobago (COTT) and, through functional annotation, identify its genes involved in the biotransformation of hydrocarbons and xenobiotics. The assembly consisted of 11 contigs with 3,959 predicted protein-coding genes, 2,794 of which were assigned a function, including a diverse group of gene families involved in aliphatic and polycyclic hydrocarbon degradation. Comparative genomic analyses were carried out against 18 crude oil-degrading bacteria and two C. oranimense strains not associated with oil. The data revealed important differences in annotated genes involved in hydrocarbon degradation that may explain the molecular mechanisms of hydrocarbon and xenobiotic biotransformation. Notably, many gene families were expanded, which may explain COTT's competitive ability to manage habitat-specific stressors. Gene-based evidence of the metabolic potential of COTT supports the application of indigenous microbes for the remediation of polluted terrestrial environments and provides a genomic resource for understanding how to optimize these characteristics for more effective bioremediation.


Introduction
The Chryseobacterium genus consists of Gram-negative bacteria assigned to the family Weeksellaceae, phylum Bacteroidetes, 1 and comprises 160 species and two subspecies (http://www.bacterio.net/). Member species of this genus are adapted to survive in a range of ecosystems, and three Chryseobacterium species have been reported to survive in oil-polluted soil: C. hungaricum, 2,3 C. indoltheticum 4 and C. nepalense. 5 Recently, and for the first time, the isolation and identification of a crude oil-degrading C. oranimense strain, COTT, from chronically polluted soil in Trinidad was reported. 6 The novelty of strain COTT, compared with other C. oranimense strains, lies in its long-term exposure to petroleum, and its evolution of enhanced capabilities for petroleum biotransformation, in an extreme terrestrial environment: one that has been chronically polluted with crude oil from natural oil seeps that have existed for more than 100 years. 7 Given the short generation times of some microbes, which allow up to tens of generations of evolution daily, indigenous microbes can undergo selective enrichment and genetic modification that enable higher degradation rates. 8 Additionally, artificially introduced microbes have difficulty competing with the indigenous microbial community that has evolved to survive in a particular habitat. 9 While whole genome sequencing of different Chryseobacterium strains is expected to reveal both niche-specific and convergent genome features, COTT is hydrocarbonoclastic, and it is not known which genomic traits support the specific strategies it employs in the metabolism and catabolism of hydrocarbons and other xenobiotics. 9 Additionally, the degree to which these unique features are required to sustain long-term survival in naturally occurring oil seeps is unknown.
The main objective of this study was to characterize the genome of the hydrocarbonoclastic C. oranimense strain COTT. We sequenced, assembled, annotated and characterized the COTT genome. We also generated an inventory of genes unique to the COTT genome compared with the high-quality genomes of 18 other crude oil-degrading bacterial species and two other strains of C. oranimense. The findings of this study extend current knowledge of genome sequence diversity in the Chryseobacterium genus and of intraspecific genome variation in C. oranimense. Genome analysis gave new insights into this strain's capacity to degrade petrogenic and xenobiotic pollutants, in addition to some of its unique mechanisms for coping with different

Genome assembly and annotation
Reads were assembled into contigs with Shovill (Galaxy v 1.1.0+galaxy0) using the SPAdes assembler. The resulting contigs were validated using QUAST (Galaxy v 5.0.2+galaxy3). Annotation was initially performed with the BV-BRC Rapid Annotation using Subsystem Technology toolkit (RASTtk), 13 and the genome data were submitted to NCBI. The Integrated Microbial Genomes Expert Review (IMG/ER) (https://img.jgi.doe.gov/cgi-bin/mer/main.cgi) server of the Joint Genome Institute (JGI) was used for expert annotation of the assembled genome and was the primary source for genome predictions and functional analysis. This server is used for functional annotation and curation of microbial genomes of interest prior to release to GenBank. 14 The detailed annotation system used by IMG/ER can be viewed in Additional file 2: Supplementary Table S3. Proksee (https://proksee.ca/) was used to generate the circular map of the COTT genome.
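QUAST summarizes an assembly with metrics such as the number of contigs, the total length and the N50 (the contig length at which contigs of that length or longer first cover at least half of the assembly). As an illustration only, the sketch below computes these three values in Python. The per-contig lengths are hypothetical values chosen so that 11 contigs sum to the reported 4,265,502 bp; they are not the actual COTT contig lengths.

```python
def assembly_stats(contig_lengths):
    """Return (number of contigs, total length in bp, N50 in bp)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # N50: the contig at which the cumulative length first reaches half the total
        if 2 * running >= total:
            n50 = length
            break
    return len(lengths), total, n50


# Hypothetical contig lengths (bp) that sum to the reported 4,265,502 bp
example_contigs = [1_500_000, 1_000_000, 800_000, 500_000, 300_000,
                   100_000, 40_000, 15_000, 5_000, 3_000, 2_502]
print(assembly_stats(example_contigs))  # (11, 4265502, 1000000)
```

In a real analysis these values would simply be read from the QUAST report; the function only makes the definition of N50 explicit.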

Genome analysis-COG and KEGG analysis and comparison
Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations of the COTT genome were analysed using the abundance profile search tool and the statistical analysis tool on JGI IMG/ER. COG and KEGG CDSs of the COTT genome were compared with the three genomes available for C. oranimense: DSM 19055, G311 287168971 (referred to as G311_A in the text) and G311 2602041528 (referred to as G311_B in the text) (Additional file 2: Supplementary Table S4). Venn diagrams comparing the unique gene inventories of the C. oranimense genomes were generated using OmicsBox (https://www.biobam.com/omicsbox). Fisher's exact test was used to evaluate statistically significant differences in gene abundance in each COG and KEGG category between the COTT strain and other reference genomes deposited in the IMG bacterial genome database. The COTT genome was compared with 18 genomes of crude oil-degrading bacteria, which were obtained via the Genomes by Ecosystem search tool on JGI (Additional file 2: Supplementary Table S5).
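Fisher's exact test compares two proportions in a 2×2 contingency table using the hypergeometric distribution. The sketch below is a minimal two-sided implementation, assuming a table of the form [[genes in a category in COTT, other COTT genes], [genes in that category in a reference genome, other reference genes]]. How IMG/ER actually builds its tables and handles multiple testing is not described here, so this is not presented as the IMG/ER procedure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = table_prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Sum over every table with the same margins that is no more probable than the
    # observed one (a small relative tolerance guards against floating-point ties)
    p_value = sum(
        table_prob(x)
        for x in range(low, high + 1)
        if table_prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Classic 2x2 example: [[3, 1], [1, 3]] gives P = 34/70
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

For production use, `scipy.stats.fisher_exact` provides the same two-sided test; the hand-written version is shown only to make the calculation transparent.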

Antibiotic susceptibility
Antibiotic susceptibility testing (AST) was performed using HiMedia Antimicrobial Sensitivity Test single test discs according to the manufacturer's guidelines. Zone sizes were interpreted according to Clinical & Laboratory Standards Institute (CLSI) (https://clsi.org/) and European Committee on Antimicrobial Susceptibility Testing (EUCAST) (http://www.eucast.org) criteria. The experiments were performed in duplicate. Antimicrobial resistance genes were subsequently identified using JGI IMG/ER. antiSMASH was used for rapid genome-wide identification, annotation and analysis of secondary metabolite biosynthesis gene clusters. 15
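Disc-diffusion results are read by comparing the inhibition zone diameter with drug-specific breakpoints. The sketch below shows that comparison for CLSI-style breakpoints (susceptible at or above one diameter, resistant at or below another, intermediate in between). Averaging the duplicate measurements, the function name and the breakpoint values in the example are assumptions for illustration; real breakpoints must come from the current CLSI or EUCAST tables for the organism and agent, and EUCAST defines its resistant cut-off slightly differently.

```python
from statistics import mean


def interpret_zone(diameters_mm, s_breakpoint_mm, r_breakpoint_mm):
    """Return 'S', 'I' or 'R' for replicate disc-diffusion zone diameters (mm).

    CLSI-style reading: susceptible if zone >= s_breakpoint_mm,
    resistant if zone <= r_breakpoint_mm, otherwise intermediate.
    Averaging replicates is an assumption made for this sketch.
    """
    if r_breakpoint_mm >= s_breakpoint_mm:
        raise ValueError("resistant breakpoint must be below the susceptible breakpoint")
    zone = mean(diameters_mm)
    if zone >= s_breakpoint_mm:
        return "S"
    if zone <= r_breakpoint_mm:
        return "R"
    return "I"


# Hypothetical breakpoints (not real CLSI/EUCAST values): S >= 21 mm, R <= 15 mm
print(interpret_zone([22, 24], 21, 15))  # S
print(interpret_zone([6, 6], 21, 15))    # R
print(interpret_zone([18, 17], 21, 15))  # I
```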

Phylogenetic analysis
An axenic culture of strain COTT, which grew on medium amended with 3% crude oil in a previous study, was used for whole genome sequencing (WGS) (Fig. 1). A 16S rRNA phylogenetic tree was constructed to show the phylogenetic position of the hydrocarbonoclastic COTT strain and to authenticate the COTT genome assembly (Additional file 1: Supplementary Fig. S1). 16S rRNA sequence comparison confirmed that COTT was most similar to the C. oranimense type strain H8 (= LMG 24030 = DSM 19055; NR_044168; 100% query coverage/99.47% identity) 16 and C. oranimense strain FSB_HA2 (MG322209; 100% query coverage/99.05% identity). 17 The protein sequences of 1,000 housekeeping genes from C. oranimense and other species were aligned using the bacterial phylogenetic tree building service in BV-BRC. The tree showed multiple highly supported clades (Fig. 2). Although COTT was most similar to G311 and DSM 19055, it was positioned separately within the C. oranimense clade; there appeared to be cladogenetic splitting into three distinct branches, which may have arisen from unique morphotypes and/or microhabitat relationships within a given community structure. 18 Cladogenesis, as seen here, may indicate an evolutionary advantage in utilizing new resources, which perhaps also implies rapid diversification of ecotypes as a trade-off for improved competitive ability in new ecological niches.
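The coverage/identity pairs quoted for the 16S rRNA matches are alignment summary statistics. A minimal sketch, assuming a single gapped pairwise alignment with gaps written as '-', computes them in the way BLAST reports them for one alignment: query coverage as the share of the query residues that take part in the alignment, and identity as identical positions divided by the alignment length including gap columns. The toy sequences are invented and are not COTT 16S rRNA data.

```python
def coverage_and_identity(aligned_query, aligned_subject, query_length):
    """Return (query coverage %, percent identity) for one gapped pairwise alignment.

    aligned_query, aligned_subject: equal-length strings, '-' marks a gap.
    query_length: length of the full, ungapped query sequence.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    # Identical, non-gap positions
    matches = sum(1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-")
    # Query residues that take part in the alignment
    query_aligned = sum(1 for q in aligned_query if q != "-")
    coverage = 100.0 * query_aligned / query_length
    identity = 100.0 * matches / len(aligned_query)  # alignment length includes gap columns
    return coverage, identity


# Toy alignment (invented sequences): one gap and one mismatch over 9 columns
cov, ident = coverage_and_identity("ACGT-ACGT", "ACGTTACGA", query_length=8)
print(f"{cov:.2f}%/{ident:.2f}% QC/ID")  # 100.00%/77.78% QC/ID
```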

Genome project history
A summary of the project and information about the genome sequence are shown in Additional file 2: Supplementary Table S6. A complete report on sequencing data quality control, mapping statistics, and SNP, InDel, SV and CNV detection and annotation can be viewed in Additional file 3: Notes and Analysis S1.

Genome features
The genome annotation from JGI IMG/ER was used as the final and primary source for genome predictions and comparisons. Accordingly, the detailed JGI annotation data are presented in the main text, and the BV-BRC annotation data and analysis can be viewed in Additional file 3: Notes and Analysis S2.
The genome features are summarized in Table 1 and Fig. 3. The COTT genome has 11 contigs, a total length of 4,265,502 base pairs (bp), a G+C content of 37.91% and a total of 4,040 genes. Biological roles were assigned to 2,794 of the 3,959 predicted protein-coding genes (CDS), whereas 1,165 proteins had no predicted function. In total, 2,690 CDS were assigned to clusters of orthologous groups (COGs) and 936 CDS were connected to KEGG pathways. No plasmids were detected in this strain, as was also the case for G311 and DSM 19055. 19 All CDS can be viewed in Additional file 2: Supplementary Table S7. The annotated genes and their protein products, as assigned to COGs and KEGG Orthology (KO), are listed in Additional file 2: Supplementary Tables S8 and S9, respectively.
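The gene counts above can be cross-checked with simple arithmetic. The sketch below uses only values stated in this section and derives the proportions they imply; the G+C base count is an approximation because the 37.91% figure is itself rounded, and the breakdown of the genes that are not protein-coding is not given here.

```python
# Values reported for the COTT genome in this section
total_genes = 4040
protein_coding = 3959
with_function = 2794
without_function = 1165
with_cog = 2690
with_kegg = 936
genome_length_bp = 4_265_502
gc_percent = 37.91


def pct(part, whole):
    """Percentage rounded to two decimal places."""
    return round(100.0 * part / whole, 2)


# The functional and non-functional CDS should add up to all CDS
assert with_function + without_function == protein_coding

print("Genes that are not protein-coding:", total_genes - protein_coding)              # 81
print("CDS with a predicted function (%):", pct(with_function, protein_coding))        # 70.57
print("CDS without a predicted function (%):", pct(without_function, protein_coding))  # 29.43
print("CDS assigned to COGs (%):", pct(with_cog, protein_coding))                      # 67.95
print("CDS linked to KEGG pathways (%):", pct(with_kegg, protein_coding))              # 23.64
# Approximate: the G+C percentage is itself rounded to two decimals
print("Approximate G+C bases:", round(genome_length_bp * gc_percent / 100))            # 1617052
```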

Comparison of COTT to 18 crude oil-degrading bacteria
Comparative analysis was performed to determine differences in gene abundance between the COTT genome and the genomes of 18 crude oil-degrading bacteria for which high-quality genomes were available on JGI (Additional file 2: Supplementary Table S10 a-c; an abundance of 0 means that a specific gene, identified by COG ID and COG name, was absent from the species, and any number greater than 0 is the gene count for that gene). Among the COG categories, the COTT genome had a significantly higher mean abundance (Fisher's exact test; P < 0.001 or P < 0.05; Additional file 2: Supplementary Table S11) than the other 18 genomes in categories such as cell wall/membrane/envelope biogenesis; post-translational modification, protein turnover and chaperones; replication, recombination and repair; and translation. Other COG categories, e.g. amino acid metabolism, energy production and conversion, and lipid metabolism, had significantly lower abundance in COTT than the mean levels in the other genomes (Fisher's exact test; P < 0.001; Additional file 2: Supplementary Table S11).
We then examined gene abundance and found that the COTT genome has 24 characterized CDS in COG, in addition to 16 'uncharacterized' proteins, that were absent from the other 18 genomes (Table 2 and Additional file 2: Supplementary Table S12). There are also 21 proteins present in multiple copies (≥5 copies) in the COTT genome that were present in fewer than five copies in the other 18 genomes (Additional file 2: Supplementary Table S13). Comparative analysis of COG genes between COTT and all 18 genomes (Fisher's exact test; P < 0.001 or P < 0.05) showed 28 significantly different genes, of which 25 had significantly higher and 3 significantly lower mean abundance in COTT than in the 18 genomes (Additional file 2: Supplementary Table S14).
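Both screens described above (genes present only in COTT, and genes with five or more copies in COTT but fewer elsewhere) are simple filters over a COG abundance table in which 0 means absent. The sketch below assumes a hypothetical table layout (COG ID mapped to a count per genome) and placeholder IDs; it also assumes that "fewer than five copies" in the other genomes includes zero, which is one possible reading of the criterion.

```python
def genes_only_in(abundance, focal="COTT"):
    """COG IDs with count > 0 in the focal genome and count 0 in every other genome."""
    return sorted(
        cog for cog, counts in abundance.items()
        if counts.get(focal, 0) > 0
        and all(n == 0 for genome, n in counts.items() if genome != focal)
    )


def expanded_in(abundance, focal="COTT", min_copies=5):
    """COG IDs with >= min_copies in the focal genome but < min_copies in every other genome."""
    return sorted(
        cog for cog, counts in abundance.items()
        if counts.get(focal, 0) >= min_copies
        and all(n < min_copies for genome, n in counts.items() if genome != focal)
    )


# Hypothetical abundance table (COG ID -> gene count per genome); 0 means absent
abundance = {
    "COG_A": {"COTT": 2, "ref1": 0, "ref2": 0},
    "COG_B": {"COTT": 6, "ref1": 1, "ref2": 4},
    "COG_C": {"COTT": 1, "ref1": 1, "ref2": 0},
    "COG_D": {"COTT": 5, "ref1": 5, "ref2": 2},
}
print(genes_only_in(abundance))  # ['COG_A']
print(expanded_in(abundance))    # ['COG_B']
```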
The COTT genome also had a significantly higher (Fisher's exact test; P < 0.001 or P < 0.05) mean abundance of genes mapped by KEGG (Additional file 2: Supplementary Table S15). Comparative analysis of KO genes between COTT and all 18 genomes (Fisher's exact test; P < 0.001 or P < 0.05) showed 22 significantly different KO genes, all of which had significantly higher mean abundance in COTT than in the 18 genomes (Additional file 2: Supplementary Table S16); notably, proteins of higher abundance were related to multidrug resistance. 20

Comparison of COTT to other C. oranimense genomes
The average nucleotide identity (ANI) score was calculated to be 98.25636% (ANI calculator on JGI IMG/ER) for each of the COTT-G311 and COTT-DSM19055 and for G311-DSM19055 pairwise comparisons. 21
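ANI summarizes genome-wide nucleotide identity between two genomes. JGI IMG/ER computes it with its own pipeline, which is not reproduced here; instead, the sketch below illustrates the general idea with a simplified version of the fragment-based ANIb convention: the query genome is cut into fragments (conventionally 1,020 nt), each fragment's best hit is kept only if it has more than 30% identity over at least 70% of the fragment, and ANI is the mean identity of the retained fragments. The fragment hits are invented. Values above roughly 95-96% are commonly taken to indicate the same species, which is consistent with the 98.25636% reported for the C. oranimense genomes.

```python
def ani_from_fragments(fragment_hits, min_identity=30.0, min_aligned_fraction=0.70):
    """Simplified ANIb-style ANI: mean identity (%) of fragments whose best hit passes both filters.

    fragment_hits: iterable of (percent_identity, aligned_fraction) pairs, one per
    query fragment (aligned_fraction = aligned length / fragment length).
    """
    kept = [
        identity
        for identity, fraction in fragment_hits
        if identity > min_identity and fraction >= min_aligned_fraction
    ]
    if not kept:
        raise ValueError("no fragment passed the identity/coverage filters")
    return sum(kept) / len(kept)


# Invented best hits for four fragments; the last two fail the filters
hits = [(99.0, 0.95), (98.0, 0.90), (97.0, 0.50), (25.0, 0.99)]
ani = ani_from_fragments(hits)
print(ani)          # 98.5
print(ani >= 95.0)  # True: above the commonly used ~95% species threshold
```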

COG comparisons
Gene abundance in the COG categories for COTT and the other C. oranimense genomes, DSM 19055 and two assemblies of G311, G311_A (G311 2871689712) and G311_B (G311 2602041528), is shown in Fig. 4(a)-(c) and in Additional file 2: Supplementary Table S17. A comparison of enzymes of industrial interest predicted for COTT is shown in Fig. 4(d). Investigation of the COGs revealed genes that were found only in the COTT genome (Table 3 and Additional file 2: Supplementary Table S18). Fisher's exact test did not reveal any significant differences (P < 0.05) in gene abundance by COG category (Additional file 2: Supplementary Table S19), although COTT had higher gene abundances than the other C. oranimense genomes in 9 of 24 COG categories (Additional file 2: Supplementary Table S19). There were significant differences (P < 0.05) in the abundance of COG0454 N-acetyltransferase, GNAT superfamily (includes histone acetyltransferase HPA2), COG0477 MFS family permease and COG0457 tetratricopeptide (TPR) repeat (Additional file 2: Supplementary Table S20). These differences may reflect COTT's need for metabolic self-regulation and survival in soil chronically contaminated with crude oil. 22

KEGG comparisons
Genes associated with functional KEGG categories were also examined (Fig. 4(b) and Additional file 2: Supplementary Table S21); differences in gene abundance by KEGG category and by KO function were not significant (Additional file 2: Supplementary Tables S22 and S23, respectively).

Genes only found in the COTT genome
Bacteria involved in xenobiotic degradation have robust biocatalytic systems that allow them to function as microbial factories. 23 The ability of COTT to survive in its environment was predicted from its unique gene inventories (Fig. 5, Table 3 and Additional file 2: Supplementary Table S18). The genes found in COTT but not in the other Chryseobacterium genomes fall into the following COG categories (number of genes in parentheses): general function prediction only (1); replication, recombination and repair (6); post-translational modification, protein turnover, chaperones (1); inorganic ion transport and metabolism (3); carbohydrate transport and metabolism (3); coenzyme transport and metabolism (5); cell wall/membrane/envelope biogenesis (2); amino acid transport and metabolism (2); nucleotide transport and metabolism (2); mobilome: prophages, transposons (2); and function unknown (8) (Additional file 3: Notes and Analysis S3-Table N12 describes each of these functions). These gene-based predictions suggest specific areas for functional validation.
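A Venn-style "unique to COTT" inventory amounts to a set difference between the COTT gene set and the union of the other genomes' gene sets, followed by a tally per COG category. The sketch below shows that logic with hypothetical COG memberships and category assignments; only the per-category counts in `reported` are taken from the text above, and they total 35 genes.

```python
from collections import Counter


def unique_to(gene_sets, focal):
    """Genes (e.g. COG IDs) in the focal genome that occur in none of the other genomes."""
    others = set().union(*(genes for name, genes in gene_sets.items() if name != focal))
    return gene_sets[focal] - others


# Hypothetical COG memberships for the four C. oranimense genomes
gene_sets = {
    "COTT": {"COG_1", "COG_2", "COG_3", "COG_4"},
    "DSM19055": {"COG_1"},
    "G311_A": {"COG_2"},
    "G311_B": {"COG_1", "COG_5"},
}
# Hypothetical COG category for each COTT-only gene
category_of = {
    "COG_3": "Replication, recombination and repair",
    "COG_4": "Function unknown",
}

only_cott = unique_to(gene_sets, "COTT")
print(sorted(only_cott))                           # ['COG_3', 'COG_4']
print(Counter(category_of[g] for g in only_cott))  # per-category tally

# Per-category counts reported in the text for COTT-only genes
reported = {
    "General function prediction only": 1,
    "Replication, recombination and repair": 6,
    "Post-translational modification, protein turnover, chaperones": 1,
    "Inorganic ion transport and metabolism": 3,
    "Carbohydrate transport and metabolism": 3,
    "Coenzyme transport and metabolism": 5,
    "Cell wall/membrane/envelope biogenesis": 2,
    "Amino acid transport and metabolism": 2,
    "Nucleotide transport and metabolism": 2,
    "Mobilome: prophages, transposons": 2,
    "Function unknown": 8,
}
print(sum(reported.values()))  # 35
```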

Xenobiotic and petrogenic hydrocarbon degradation in COTT
The COTT genome specifies 34 genes encoding proteins associated with xenobiotic biodegradation and metabolism (Additional file 2: Supplementary Table S24), including those associated with benzoate degradation, chloroalkane and chloroalkene degradation, chlorocyclohexane and chlorobenzene degradation, ethylbenzene degradation, metabolism of xenobiotics by cytochrome P450, naphthalene degradation, nitrotoluene degradation, steroid degradation and styrene degradation. Genes encoding enzymes associated with non-specific alkane metabolism were also detected in the COTT genome (Additional file 2: Supplementary Table S25).
Bacteria are known to carry out ring cleavage in the oxidative biotransformation of crude oil. 24 The COTT genome encodes numerous proteins involved in the degradation of aromatic compounds: 2-hydroxy-6-oxonona-2,4-dienedioate hydrolase (mhpC) of the phenylpropionate degradation pathway, as in the petroleum oil-degrader Pseudomonas aeruginosa; 25 alcohol dehydrogenase (adh) and propanol-preferring alcohol dehydrogenase (adhP), which act on aliphatic alcohols and on xenobiotic aromatic and aliphatic hydroxyls via similar pathways; 26 and gluconolactonase (gnl, RGN), which is involved in the degradation of cyclohexanols, compounds that are moderately resistant to biodegradation. 27 2-Haloacid dehalogenase is also encoded in the COTT genome, which suggests that this strain can potentially degrade and/or detoxify recalcitrant halogenated xenobiotic pollutants. 28,29 Novel sources of haloacid dehalogenase may offer biotechnological applications ranging from waste treatment to the synthesis of stereoisomers 30 and bioremediation. 31 Additional file 2: Supplementary Table S24 contains this data.
Members of the NAD(P)H:FMN oxidoreductase family are thought to participate in oxidative stress responses and in the degradation of xenobiotic compounds, including a wide array of nitroalkanes and nitroaromatics. 32 Genomic evidence of 2,4,6-trinitrotoluene (TNT) degradation via an anaerobic pathway (http://eawag-bbd.ethz.ch/tnt2/tnt2_map.html) was found for the COTT strain. COTT may use this anaerobic mechanism, as its genome possesses genes encoding an Fe-S oxidoreductase, sulphite reductase alpha subunit (flavoprotein) and beta subunit (haemoprotein), carbon monoxide dehydrogenase and two dienelactone hydrolases involved in anaerobic TNT degradation. Additional file 2: Supplementary Tables S8 and S24 contain this data.

Mono-and di-oxygenases for aromatic degradation
The degradation of aromatic hydrocarbons requires ring activation, which is catalyzed by monooxygenases or dioxygenases/dehydrogenases. 33 Genes encoding nitronate monooxygenase (ncd2, npd) were detected in COTT; detoxification of nitroalkane compounds used in industry is of considerable interest. 34 Cholesterol oxidase, which is involved in styrene degradation, is also encoded in the COTT genome and has applications as an industrial biocatalyst. 35 The COTT genome encodes catechol 2,3-dioxygenase (C23O), which catalyzes the meta (extradiol) ring cleavage of catechol and its alkyl derivatives under aerobic conditions. C23O genes are essential for the degradation of monoaromatic hydrocarbons, e.g. BTEX and phenol, [36][37][38] as well as of PAHs, e.g. naphthalene, phenanthrene 39 and pyrene. 40,41 Six genes encoding quercetin 2,3-dioxygenase/redox-sensitive bicupin YhaK of the pirin family were detected in COTT; these genes have been implicated in hydrocarbon-pollutant degradation. 42 Additional file 2: Supplementary Tables S8 and S24 contain this data.
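For reference, the meta-cleavage step catalyzed by C23O can be written as a single reaction: one molecule of O2 is inserted and the aromatic ring of catechol is opened next to the two hydroxyl groups (extradiol cleavage). The equation below shows the unsubstituted substrate only; alkylcatechols give the corresponding substituted semialdehydes.

```latex
\text{catechol} + \mathrm{O_2} \xrightarrow{\text{C23O (extradiol cleavage)}} \text{2-hydroxymuconate semialdehyde}
```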
In addition, oxidative enzymes involved in pollutant detoxification, e.g. catalases (katG and katE) and dye-decolourizing peroxidases (DyP-type peroxidases), were identified in the COTT genome. Porphyrinogen peroxidase (yfeX) and chloroperoxidase (cpo), which are of industrial interest as biological decolourizers of synthetic dyes, were also encoded. Additional file 2: Supplementary Tables S7, S8 and S24 contain this data.
Sixteen uncharacterized proteins detected in COTT were absent from the other 18 hydrocarbonoclastic bacterial strains; COG data for these proteins are shown in Additional file 2: Supplementary Table S12.

Propanoate metabolism
Propionate, a short-chain fatty acid (SCFA), is a major component of subsurface petroleum reservoirs. 43 Propanoate metabolism has been found to be stimulated by the presence of crude oil. 44 SCFAs produced by hydrocarbonoclastic microbes are involved in the biodegradation of petroleum and have been shown to be crucial for petroleum recovery and energy extraction. 43,45,46 In COTT, the succinate pathway appears to be the dominant pathway for propionate production. Additional file 2: Supplementary Table S24 contains this data.

Lipid metabolism
The COTT genome has 72 lipid metabolism genes (Additional file 2: Supplementary Table S24). Lipid metabolism plays an integral role in glycolipid biosurfactant production and in the formation of the lipopolysaccharide and phospholipid scaffolds of the outer membrane of Gram-negative bacteria. 47 Notably, enzymes of the esterase and lipase superfamily are economically important to a range of industries. 48,49 The COTT genome contains two genes encoding esterase/lipase (COG4782), which were absent from the DSM 19055 genome and present as a single copy in G311. Additional file 2: Supplementary Table S24 contains this data.

Chaperones in protein folding
Chaperones, many of which are heat shock proteins, assist in the correct three-dimensional folding of proteins and stabilize or protect unfolded polypeptides during normal cell growth; they are induced by stresses such as high temperature and the presence of pollutants and heavy metals. 50 As expected, the COTT genome harbours a comparatively high number of chaperone genes, e.g. those encoding chaperone modulatory protein CbpM, the heat shock proteins GroEL, GroES, HtpG/Hsp90 and HSP20, and DnaK, DnaJ and GrpE. Additional file 2: Supplementary Table S24 contains this data.

Quorum sensing and biofilm formation
Quorum sensing (QS) enables communication and the exchange of chemical signals in dense bacterial populations, especially in heavily polluted soil, 51,52 and helps to regulate antibiotic production. 53 QS-mediated biofilm formation is advantageous because biofilms provide greater resistance to stress, antimicrobial agents, predation and toxic chemicals. 54 The removal of xenobiotic compounds, including crude oil and PAHs, by biofilm-forming strains has been reported; 55 biofilms of Burkholderia sp. NK8 and P. aeruginosa PA01 have been reported to be involved in the degradation of chlorinated benzoates, and the biofilm of P. stutzeri T102 is involved in the emulsification of naphthalene, 54 phenanthrene and pyrene in activated sludge. 56
Twenty-six genes encoding enzymes that function in QS were detected in the COTT genome. COTT also encodes thiol-activated cytolysin (slo), which discriminates among lipid environments during QS in order to bind optimally, fulfil specific functions and target certain cell types. 57 Among the two-component systems (TCSs), the extracellular polysaccharide/exopolysaccharide proteins EspA/wza, EspP and EspB/etk/wzc were identified in the COTT genome. These gene products are required for bacterial attachment to solid surfaces and to other bacteria, and for biofilm formation. 58 Thirty-two biofilm formation genes were detected in COTT, including exopolysaccharide (EPS) genes that are known to provide the three-dimensional structure of the biofilm 59 (Additional file 2: Supplementary Tables S18 and S24). Cells use EPS to navigate into the more nutrient- and oxygen-rich regions at the air-liquid interface. 60
In addition to biofilm formation, genes in the COTT genome such as anthranilate synthase component 1 (trpE) and component 2 (trpG) may serve as novel drug targets and contribute to antibiotic tolerance and virulence. 61,62 Genes encoding the multidrug efflux system outer membrane protein (oprM) were detected in the QS pathway for COTT. Resistance-nodulation-division efflux pumps have versatile physiological functions, e.g. multidrug resistance, environmental adaptability, pathogenesis and organic solvent tolerance. 63,64 In the PAH-degrading strain P. putida B6-2, efflux pumps were critical for relieving the toxicity caused by intermediates of PAH degradation. 65 Although the clinical functions of multidrug resistance efflux pumps (MDREPs) are well understood, their roles in the degradation of PAHs require additional characterization. 66 Additional file 2: Supplementary Table S24 contains this data.

Membrane transport
Nutrient uptake and the elimination of toxic by-products via transporters, e.g. ABC transporters, are essential to microbial survival in polluted environments. 67 The COTT genome has a total of 27 genes that encode various ABC transporters (Additional file 2: Supplementary Table S24), in addition to the ABC subfamily B multidrug efflux pump (mdlA, mdlB), which plays a role in lipid A and possibly glycerophospholipid transport. 68

Secretion systems
Bacterial secretion systems facilitate protein export, 69 attachment to eukaryotic cells and scavenging for energy resources. 70 Among the genes for type I, II, V and VI secretion systems present in the COTT genome (Additional file 2: Supplementary Table S24), the type VI secretory protein gene vgrG is implicated (i) in bacterial interactions and community structure in a range of environmental niches, (ii) in the ability of many pathogenic bacteria to outcompete rival bacteria and (iii) in the direct interaction of symbionts with their eukaryotic hosts. 71

Regulation patterns
The ability to sense and respond to environmental stimuli is important for bacterial survival, particularly in highly dynamic, competitive niches. There are several recognized classes of sensory signal transduction systems in prokaryotes. 72 Based on KEGG analysis, 91 CDSs in the COTT genome were classified into signal transduction pathways; of these, 66 were classified into TCSs, which consist of two types of signal transducer, a sensor kinase and its cognate response regulator (Additional file 2: Supplementary Table S24). The COTT genome contains 18 genes that encode OmpR-family TCS proteins associated with phosphate limitation and assimilation, cell envelope protein folding and protein degradation, serine protease, copper/silver efflux, potassium transport, DNA replication, iron acquisition, lantibiotic biosynthesis and immunity, adhesion, autolysis, multidrug resistance and virulence. Phosphorus plays a major role in bacterial biological activity in oil reservoirs. 74 The PhoP and PhoR phosphate regulon response regulator and sensor, respectively, were detected in the COTT genome.
Table 3 note: 57 uncharacterized proteins were detected for COTT that were absent from the other C. oranimense strains; COG data for these proteins are shown in Additional file 2: Supplementary Table S18.
Response regulator YesN is also present in the COTT genome. YesN proteins belong to the AraC/XylS family of transcriptional regulators, which control the expression of genes with diverse biological functions. 75 The apparent redundancy of regulatory genes may indicate a flexible mechanism associated with nitrogen fixation and assimilation. 76 RadC, a DNA repair protein, is commonly associated with microorganisms that inhabit harsh environments. 77 The radC gene was more abundant in the COTT genome than in the other C. oranimense genomes. The atoB gene, which is present in COTT, encodes acetyl-CoA C-acetyltransferase, which is involved in short-chain fatty acid metabolism and induces modulation of chemotactic behaviour, as well as other, as yet undefined, responses. 78 Also detected in this family were the C4-dicarboxylate transport protein DctA (similar to DctP above) and the RNA polymerase σ (sigma)-54 factor RpoN. 79 COTT has four gloA genes encoding type I glyoxalase, which may serve as an oxygen sensor and regulator in the transcription of the Fe-Mo transport operon for nitrogen fixation, 80 as well as three genes encoding type II glyoxalase (gloB), which may be important in minimizing the effects of environmental stressors on protein stability or enzymatic activity in bacteria. 81
cbb3 oxidases were found in COTT. cbb3 oxidases have a high affinity for oxygen and are encoded by the tetracistronic ccoNOQP-1 and ccoNOQP-2 operons. 82,83 The enzymatic and transcriptional characteristics of cbb3 oxidases help bacteria (e.g. P. aeruginosa) to grow in low-oxygen environments. 84,85 Cytochrome bd quinol oxidase was present in COTT; this enzyme has been reported to become active under hypoxic conditions 86,87 and under high H2S concentrations. 88

Nutrient uptake (Fe, S, P, N)

Iron
Most prokaryotes regulate the transcription of metal ion-responsive genes to coordinate metal homeostasis; e.g. the ferric uptake regulator, Fur, responds to changes in iron availability. 88 The fur/zur gene is present in the COTT genome, and studies suggest that Fur downregulates the production of iron-sequestering compounds, e.g. through siderophore biosynthesis. 88 The ferric enterobactin receptors (fepA, pfeA, iroN, pirA) are also present in the COTT genome. Additionally, COTT has several genes essential for iron uptake into the cytoplasm, including the energy transduction genes exbB, exbD, tonB and tolQ/exbB and the TonB-dependent receptor (TBDR). 89 Additional file 2: Supplementary Table S24 contains this gene inventory.

Sulphur
The COTT genome has 15 genes encoding enzymes involved in sulphur utilization and one gene encoding a sulphate permease of the SulP family. 90 Less attention has been paid to the mechanisms of sulphur (sulphate) uptake from the environment and its subsequent assimilation into organic compounds such as cysteine. 91 There are no data on the mechanism of sulphate uptake in crude oil environments, where high concentrations of sulphate exist, as would be the case for the oil-polluted habitat of the COTT strain. The cysH,D,N,E,K,G,J,I genes, which are implicated in the assimilation of sulphur into cysteine, are present in the COTT genome. Additional file 2: Supplementary Table S24 contains this gene inventory.

Phosphorus
The phoA,B,H,L,R,B1,P genes, including phosphate response regulators of the OmpR family, are present in COTT. These pho genes may play a vital role in phosphate/phosphonate transport systems by way of organophosphonate biosynthesis and catabolism. Phosphonates are catabolized by phn gene products, and COTT had one phosphodiesterase (phnP), one phosphonoacetate hydrolase (phnA) and two PhnB protein (phnB) genes. 92,93 The genome also had the phosphate transport system substrate-binding protein gene (pstS), which encodes a periplasmic phosphate-binding protein, and exopolyphosphatase (ppx), which allows phosphate to be liberated with no expenditure of energy when conditions change and phosphate becomes limiting; ppx has also been described as being involved in heavy metal resistance and efflux. Polyphosphate kinase (ppk), which is involved in polyphosphate storage, is also present in COTT. 94 Phosphate starvation-inducible phoH-like protein (phoH/L) was also detected in COTT. GDP-α-D-mannose is a key substrate in glycoprotein formation and is produced by the enzyme mannose-1-phosphate guanylyltransferase (manC/cpsB), which transfers phosphorus-containing groups; the encoding gene was more abundant in the COTT genome than in the other C. oranimense genomes. cpsB expression is increased by osmotic shock, a physiological response that may be relevant for environmental applications. 95 Additional file 2: Supplementary Table S24 contains this gene inventory.
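The complementary roles of ppk and ppx described above can be summarized by the two standard polyphosphate reactions: the kinase stores the terminal phosphate of ATP in a growing polyphosphate chain, and the exopolyphosphatase hydrolyses one terminal residue at a time, releasing orthophosphate without consuming ATP.

```latex
\begin{aligned}
\mathrm{ATP} + (\mathrm{polyP})_n &\xrightarrow{\text{PPK}} \mathrm{ADP} + (\mathrm{polyP})_{n+1} \\
(\mathrm{polyP})_n + \mathrm{H_2O} &\xrightarrow{\text{PPX}} (\mathrm{polyP})_{n-1} + \mathrm{P_i}
\end{aligned}
```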

Nitrogen
In the nitrogen regulatory protein C (NtrC) family, three genes, glnA, ntrY and ntrX, which are involved in nitrogen metabolism when nitrogen availability is low, were found in the COTT genome. COTT glutamine synthetase (glnA) is involved in glutamate metabolism under limiting conditions. 96 In the symbiotic nitrogen-fixing bacteria Azorhizobium caulinodans and Azospirillum brasilense, ntrY expression is upregulated in response to hypoxic conditions, which in turn signals the regulator ntrX to increase the expression of nitrogen respiration enzymes. 97 This system thus functions as a redox sensor. A draG gene encoding ADP-ribosyl-[dinitrogen reductase] hydrolase was identified in COTT; draG is a key player in the regulation of nitrogenase activity. 98 Glutaminase, encoded by glsA and present in COTT, may be integral to supplying the nitrogen required for the biosynthesis of a variety of metabolic intermediates. 99,100 COTT had the highest abundance of subtilisin-family serine protease genes, and it is predicted that these exoproteases break down proteins present in the environment in response to low levels of available nitrogen. 101 Additional file 2: Supplementary Table S24 contains this gene inventory.

Heavy metal resistance
Crude oil and heavy metal pollution can occur simultaneously in soil. 102 Toxic concentrations of cobalt (Co), nickel (Ni) and zinc (Zn) are detoxified by cation efflux mechanisms in bacteria. 103,104 The COTT genome contains genes associated with the cobalt-zinc-cadmium cation antiporter CzcCBA; Czc enabled the survival of C. metallidurans in metal-rich environments. 105 The zinc and cadmium transporter ZipB, a ZIP family protein that controls the influx of zinc into the cytoplasm from outside the cell as well as the transport of cadmium, 106,107 was identified in COTT. One means of relieving copper excess is Cu2+ export, facilitated by different systems, e.g. Cus, Cop and Cut, which contribute to copper homeostasis (Cu2+ import, export and detoxification) in bacteria. 108,109 Importantly, high levels of Cu2+ disrupt Fe-S clusters, which in turn affects disulphide bond formation and correct protein folding. 110 The multiprotein complex that controls copper efflux, CusCBA, was detected in COTT. The copper resistance phosphate regulon response regulator (CusR), which activates transcription of the cus operon under anaerobic conditions, 109 was also detected in COTT. COTT also possesses genes of the copper-inducible cop operon, which helps to regulate copper resistance and detoxification as well as heavy metal tolerance. 111 Additional file 2: Supplementary Table S24 contains this data.

Oxidative stress
Environmental pollutants can act as oxidants that undergo biotransformation to produce free radical oxides, cations and anions. 112 The COTT genome has three genes that code for Cu/Zn superoxide dismutase SOD2 and one that codes for Fe/Mn superoxide dismutase SOD1. After SOD converts superoxide radicals to H2O2, catalase breaks H2O2 down into harmless H2O and O2. 113 An alkyl hydroperoxide reductase gene (ahpC) was found in the COTT genome; ahpC is known to be directly involved in H2O2 detoxification and also plays an important role in biofilm formation. 114 The COTT genome had three diheme cytochrome c peroxidase genes, which are responsible for H2O2 reduction. 115 These oxidative stress proteins may enable COTT to cope with shifts in sediment oxygen concentration. 116
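The two-step detoxification described above can be written out as reactions. The stoichiometries below are the standard ones for superoxide dismutase and catalase, given as general chemistry for illustration rather than as a measured property of COTT.

```latex
\begin{align}
2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} &\xrightarrow{\text{SOD}} \mathrm{H_2O_2} + \mathrm{O_2} \\
2\,\mathrm{H_2O_2} &\xrightarrow{\text{catalase}} 2\,\mathrm{H_2O} + \mathrm{O_2} \\
4\,\mathrm{O_2^{\bullet-}} + 4\,\mathrm{H^+} &\longrightarrow 2\,\mathrm{H_2O} + 3\,\mathrm{O_2} \quad \text{(net)}
\end{align}
```

The net line is twice the first reaction plus the second: the two H2O2 produced by SOD are exactly consumed by catalase, so superoxide ends up as water and oxygen, which is why the two enzymes are usually discussed as a pair.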

Antibiotic resistome of COTT
There is an ongoing search for new antibiotics to treat multidrug-resistant (MDR) bacterial infections. Based on antiSMASH analysis, the COTT genome contains genes predicted to encode novel antimicrobial compounds (Additional file 2: Supplementary Table S26). Flexirubin and carotenoid clusters were also identified by antiSMASH (Additional file 1: Supplementary Fig. S2); these are likely expressed to produce the pigments that give colonies of this genus (chryseos, golden) their yellow to orange colour. 117 In in vitro disc assays, the COTT strain was resistant to aztreonam, kanamycin, chloramphenicol, ampicillin and erythromycin but susceptible to imipenem-EDTA, streptomycin, ciprofloxacin and trimethoprim (Additional file 1: Supplementary Fig. S3 and Additional file 2: Supplementary Table S27). The resistome of COTT consisted of 21 β-lactam resistance genes, 13 cationic antimicrobial peptide (CAMP) resistance genes, 5 vancomycin resistance genes and 2 tetracycline resistance genes (Additional file 2: Supplementary Table S28). The COTT genome was the only C. oranimense genome containing the bla regulator protein BlaR1 (K02172), which largely controls β-lactam antibiotic resistance, 118 and the membrane carboxypeptidase (penicillin-binding protein), which may render COTT resistant to penicillin (Additional file 2: Supplementary Tables S18 and S28). Three phenicol and two bacteriocin genes were also identified in COTT (Additional file 2: Supplementary Table S28).
The COTT genome has 16 genes encoding permeases of the major facilitator superfamily (MFS) (Additional file 2: Supplementary Table S18). Notably, these include the DHA1 and DHA2 families of multidrug resistance proteins, which confer resistance to tetracycline and bicyclomycin/chloramphenicol; the FSR family of transporters, which enables resistance to fosmidomycin; the EmrB/QacA subfamily drug resistance transporter; and the PAT family β-lactamase induction signal transducer AmpG, which allows ampicillin resistance (Additional file 2: Supplementary Tables S7 and S9). A summary of the AMR and multi-drug resistance genes annotated in this genome, with their corresponding mechanisms, is provided in Table 4. In this context, this report may have additional clinical value.
This work predicted metabolic capabilities unique to COTT (Fig. 5), and analysis of the COTT resistome suggested that these findings may have clinical value. Notably, we also provided gene-based evidence that COTT is capable of producing several biocatalysts that may be important to different industrial processes.

annotation and bioinformatic analysis were supported by the U.S. Department of Energy Joint Genome Institute.

Figure 2. Codon tree of COTT and 29 related Chryseobacterium species. The RAxML tree is based on an alignment of 1,000 CDSs comprising 949,272 aligned nucleotides and 316,424 aligned amino acids. The tree was redrawn and edited in iTOL.

Figure 3. Genome features of Chryseobacterium oranimense from Trinidad (COTT). (a) Circular chromosome map of C. oranimense TT showing the distribution of CDSs, tRNAs, rRNAs and transfer-messenger RNAs (tmRNAs), and the GC content and skew. From the innermost to the outermost circle: circle 1 represents the scale; circle 2, CDSs on the reverse strand; circle 3, GC content; circle 4, GC skew (+/-); circle 5 (the blue-green ring), the backbone of the sequence; and circle 6, CDSs on the forward strand and the tRNA, rRNA and tmRNA distribution. The map was generated using Proksee (https://proksee.ca/). (b) COG statistics by category for the COTT genome. (c) KEGG statistics by module for the COTT genome.

Figure 4. Comparative analysis of gene load according to functional categories among the Chryseobacterium oranimense genomes. (a) COG categories and (b) KEGG modules. Each bar indicates the number of genes in COTT, DSM 19055, G311_A and G311_B, respectively. (c) Venn diagram comparing gene inventories in the three C. oranimense genomes, generated using OmicsBox (https://www.biobam.com/omicsbox). (d) Comparison of COTT genes encoding enzymes of industrial interest.
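As a hedged illustration of what a gene-inventory Venn comparison such as panel (c) computes (a generic sketch, not the OmicsBox procedure), each gene identifier can be assigned to the exact set of genomes that carry it; each such set is one Venn region. The genome labels other than COTT and all gene IDs below are hypothetical placeholders, not data from this study.

```python
def venn_regions(inventories):
    """Partition gene IDs by the exact set of genomes that carry them.

    inventories maps a genome label to its set of gene/ortholog IDs.
    Returns {frozenset_of_genome_labels: set_of_gene_IDs}, one entry per
    non-empty Venn region; the regions are disjoint by construction.
    """
    all_genes = set().union(*inventories.values())
    regions = {}
    for gene in all_genes:
        # The key is the exact combination of genomes containing this gene.
        owners = frozenset(label for label, genes in inventories.items() if gene in genes)
        regions.setdefault(owners, set()).add(gene)
    return regions


def unique_to(regions, label):
    """Gene IDs found only in the genome called `label`."""
    return regions.get(frozenset({label}), set())


def shared_by_all(regions, labels):
    """Gene IDs present in every genome listed in `labels`."""
    return regions.get(frozenset(labels), set())


if __name__ == "__main__":
    # Hypothetical toy inventories for three genomes.
    inventories = {
        "COTT": {"g1", "g2", "g3", "g4"},
        "genome_2": {"g2", "g3", "g5"},
        "genome_3": {"g3", "g5", "g6"},
    }
    regions = venn_regions(inventories)
    for owners, genes in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(owners), sorted(genes))
    print("unique to COTT:", sorted(unique_to(regions, "COTT")))
```

Keying regions by frozensets of genome labels works for any number of genomes, whereas a drawn Venn diagram becomes hard to read beyond four or five sets.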

Figure 5. Features of the hydrocarbonoclastic COTT strain. Predicted catabolic pathways (text in ovals with blue outlines) for the biotransformation of petrogenic compounds (text in boxes with black outlines) are indicated by dashed arrows. Export or import of solutes is designated by the direction of the arrow at the extracellular and cytoplasmic regions of the transporter, respectively. The genome-based model of cellular metabolism and catabolism includes: biosurfactant production; toxin export systems; GCN5-related N-acetyltransferases (GNAT) involved in acetylation reactions, e.g., xenobiotic metabolism; two-component systems (TCS) for bacterial acclimation; CBS domain-containing proteins that sense cellular energy levels; heat shock proteins (Hsp and CspA) that act as chaperones in the correct folding and refolding of proteins and modulate responses to temperature changes; an NAD(P)H-hydrate (*NAD(P)H) repair enzyme that prevents hydration of NAD(P)H to NAD(P)HX, which inhibits the activity of several dehydrogenases including those involved in xenobiotic metabolism; export and import systems; phosphate-selective porins for aerobic and denitrifying rates of oil degradation; periplasmic repair proteins; membrane secretion channels; siderophore production and export for iron uptake under the iron-limiting conditions of crude oil-contaminated soil; antioxidants; DNA repair and uptake proteins; energy-coupling factor (ECF) transporters, a unique group of ATP-binding cassette (ABC) transporters for micronutrient uptake; MFS permeases (uniporters, symporters or antiporters); phosphotransacetylase (Pta) proteins that prevent accumulation of pyruvate/acetyl-CoA, where pyruvate serves as an intermediate of central metabolism; an AraC protein for relieving sugar-phosphate stress and interacting with histidine kinase sensors in TCS signal transduction; antimicrobial peptides (AMPs) that inhibit or kill bacterial competitors; and extracellular polymeric substances (EPS) and biofilm functions.

Table 1. General genome features of COTT

Table 2. Proteins present in COTT but absent in the 18 crude oil-degrading bacterial genomes

Table 4. Antimicrobial and multi-drug resistance genes
The k-mer-based AMR gene detection method was used; it utilizes PATRIC's curated collection of representative AMR gene sequence variants and assigns to each AMR gene a functional annotation, a broad mechanism of antibiotic resistance, a drug class and, in some cases, the specific antibiotic to which it confers resistance. Note that the presence of AMR-related genes (even full-length ones) in a given genome does not directly imply an antibiotic-resistant phenotype; it is important to consider specific AMR mechanisms and, especially, the presence or absence of SNP mutations conferring resistance. MDR genes were mined from the KEGG database. 1
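The k-mer idea behind the detection method in the Table 4 note can be illustrated with a minimal sketch. This is not PATRIC's implementation: the reference sequences, gene names, annotation fields, k = 8 and the 0.8 cut-off below are hypothetical toy values. Each reference AMR variant is split into overlapping k-mers, and a query gene inherits the annotation of the reference that shares the largest fraction of its k-mers, provided that fraction clears the threshold.

```python
from collections import defaultdict

# Hypothetical toy reference variants; real pipelines use curated databases.
REFERENCES = {
    "ref_A": "ATGCGTACGTTAGCCGATCGATCGGCTA",
    "ref_B": "TTGACCGGTAAGCTTGCAGGCATGCAAC",
}
# Hypothetical annotation attached to each reference variant.
ANNOTATIONS = {
    "ref_A": {"mechanism": "hypothetical", "drug_class": "beta-lactam"},
    "ref_B": {"mechanism": "hypothetical", "drug_class": "tetracycline"},
}


def kmers(seq, k):
    """Return the set of overlapping k-mers in seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map every k-mer to the names of the reference genes containing it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def classify(query, index, k, min_fraction=0.8):
    """Return (best_reference_or_None, fraction_of_query_kmers_shared)."""
    query_kmers = kmers(query, k)
    if not query_kmers:  # query shorter than k
        return None, 0.0
    hits = defaultdict(int)
    for km in query_kmers:
        for name in index.get(km, ()):
            hits[name] += 1
    if not hits:  # no k-mer matches any reference
        return None, 0.0
    best = max(hits, key=hits.get)
    fraction = hits[best] / len(query_kmers)
    return (best if fraction >= min_fraction else None), fraction


if __name__ == "__main__":
    K = 8
    index = build_index(REFERENCES, K)
    # An exact copy of a reference matches it on every k-mer.
    name, frac = classify(REFERENCES["ref_A"], index, K)
    print(name, frac, ANNOTATIONS.get(name))
    # A single substitution destroys up to K k-mers, lowering the fraction.
    mutated = REFERENCES["ref_A"][:14] + "A" + REFERENCES["ref_A"][15:]
    print(classify(mutated, index, K))
```

As the note above stresses, a match of this kind only reports that a gene resembles a known AMR variant; it does not establish expression, function or the presence of resistance-conferring SNPs.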