CHOmine: an integrated data warehouse for CHO systems biology and modeling

Gerstl, Matthias P.; Hanscho, Michael; Ruckerbauer, David E.; Zanghellini, Jürgen; Borth, Nicole

doi:10.1093/database/bax034

Abstract

The last decade has seen a surge in published genome-scale information for Chinese hamster ovary (CHO) cells, which are the main production vehicles for therapeutic proteins. While a single access point is available at www.CHOgenome.org, the primary data are distributed over several databases at different institutions. Currently, research is frequently hampered by a plethora of gene names and IDs that vary between published draft genomes and databases, making systems biology analyses cumbersome and elaborate. Here we present CHOmine, an integrative data warehouse that connects data from various databases and links to others. Furthermore, we introduce CHOmodel, a web-based resource that provides access to recently published CHO cell line-specific metabolic reconstructions. Both resources allow users to query CHO-relevant data and find interconnections between different types of data, and thus provide a simple, standardized entry point to the world of CHO systems biology.

Database URL: http://www.chogenome.org

Introduction

Chinese hamster ovary (CHO) cells have been used for the production of biotherapeutic proteins since 1985 (1), and the biopharmaceutical market for CHO-derived products had grown to > 100 billion US$ by 2013 (2). Owing to the importance of this cell line for the biopharma industry, a plethora of -omics data has been generated in recent years. Today a sequenced CHO-K1 genome (3) and two Chinese hamster genomes (4, 5) are available. Unfortunately, these draft genomes do not use consistent gene IDs for annotation, while other databases, such as UniProt, use yet other IDs. To overcome some of the difficulties associated with connecting such diverse and large data sets, dedicated data warehouse systems such as BioMart (6) and InterMine (7) were developed, and instances already exist for important model organisms such as mouse (8) and fly (9). These solutions provide user-friendly interfaces for searching information and make it possible to connect different databases for a given gene, thus providing all available information from a single entry point. Currently, the CHO community accesses relevant data via www.CHOgenome.org (10), which hosts all published information but does not provide links between different data types. Therefore, we introduce CHOmine, an InterMine-based data warehouse for CHO data that connects gene information across sources and provides links to external websites. The resource also fully integrates a recently published consensus genome-scale metabolic reconstruction of different CHO cell lines (11).

Materials and methods

CHOmine is based on the latest stable version of InterMine (7) and runs on the latest stable Debian operating system. Data are stored in a PostgreSQL database, which is installed directly from the Debian repository together with the Apache Tomcat® web server. The Java Development Kit was downloaded from Oracle. InterMine provides many predefined data loaders, which we used for importing data from UniProt (12), InterPro (13), KEGG (14) and PubMed (15) as well as for loading sequence ontologies (16) and gene ontologies (17). To handle the unusual situation of loading three genomes for one organism, some preprocessing steps were required, as well as the creation of extended or new importer classes (Figure 1). As the GFF3 files from (3) and (5) contain the same IDs for different genes, artificial IDs were assigned to all genes. Furthermore, we created a file that links the genes of the three genomes to their corresponding UniProt proteins. Newly created importer classes enabled us to load all three genomes and to import the gene-to-protein links from this linkage file. Gene ontology information provided by Brinkrolf et al. (4) was extracted from the GFF3 file and formatted so that the InterMine GO-annotation data loader could import it. Furthermore, we downloaded miRNAs from miRBase (18), aligned them to the three genomes using Bowtie (19) and imported the results into CHOmine with another new importer class. The pipeline for building the current version of CHOmine can be found at https://github.com/chomine/chomine.
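The ID clash between GFF3 files can be illustrated with a minimal Python sketch. It assumes that gene features carry an `ID=` attribute in the ninth column and that each genome receives its own prefix. The function name and the `CHOM_<prefix>_<number>` naming scheme are hypothetical illustrations, not the scheme used in the CHOmine pipeline, and `Parent=` attributes of child features would need the same remapping in a real preprocessing step.

```python
import re


def assign_artificial_ids(gff3_lines, prefix):
    """Give every gene feature a genome-specific artificial ID.

    Returns the rewritten lines and a mapping from original to artificial ID.
    The ID format is a hypothetical illustration, not CHOmine's actual scheme.
    """
    mapping = {}  # original gene ID -> artificial ID
    rewritten = []
    for line in gff3_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Comments, directives, malformed rows and non-gene features pass through.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "gene":
            rewritten.append(line)
            continue
        match = re.search(r"(?:^|;)ID=([^;]+)", cols[8])
        if match:
            artificial = f"CHOM_{prefix}_{len(mapping) + 1:06d}"
            mapping[match.group(1)] = artificial
            # Swap only the ID value; all other attributes stay untouched.
            cols[8] = cols[8][:match.start(1)] + artificial + cols[8][match.end(1):]
        rewritten.append("\t".join(cols))
    return rewritten, mapping
```

Calling the function once per genome with a different prefix yields IDs that cannot collide across genomes, while the returned mapping keeps the original identifiers traceable.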

Figure 1. CHOmine building pipeline. The figure distinguishes automatically downloaded files or links for every new CHOmine version; files published or created in a preprocessing step; CHOmine-specific preprocessing steps; CHOmine-specific data loaders; and InterMine data loaders. Dashed arrows indicate preprocessing steps. All other arrows indicate CHOmine building steps.

Genome-scale metabolic reconstructions

To connect the consensus model for the Chinese hamster with the metabolic models of different CHO cell lines (11), a data loader for reading the SBML files was added to CHOmine. To further improve the user experience when analyzing the metabolic reconstructions, a second website, called CHOmodel, was created. CHOmodel uses the PHP framework Laravel and a separate PostgreSQL database. Materialized views were prepared in the PostgreSQL database to allow efficient browsing of the information. As CHOmodel was developed in parallel with CHOmine, we added links from CHOmodel to CHOmine and vice versa.
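The kind of information an SBML data loader has to extract can be sketched in a few lines of Python. This is only an illustration using the standard library XML parser. The CHOmine loader itself is an InterMine importer, and a production pipeline would more likely rely on a dedicated SBML library. The reaction and species identifiers in the example document are made up for demonstration.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical SBML document; the identifiers are illustrative only.
EXAMPLE_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfReactions>
      <reaction id="R_HEX1" reversible="false">
        <listOfReactants>
          <speciesReference species="M_glc" stoichiometry="1"/>
          <speciesReference species="M_atp" stoichiometry="1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="M_g6p" stoichiometry="1"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def local_name(tag):
    """Strip the XML namespace so the code works across SBML levels."""
    return tag.rsplit("}", 1)[-1]


def read_reactions(sbml_text):
    """Map each reaction ID to the species IDs of its reactants and products."""
    root = ET.fromstring(sbml_text.strip())
    reactions = {}
    for elem in root.iter():
        if local_name(elem.tag) != "reaction":
            continue
        entry = {"reactants": [], "products": []}
        for child in elem:
            key = {"listOfReactants": "reactants",
                   "listOfProducts": "products"}.get(local_name(child.tag))
            if key is None:
                continue  # skip notes, annotations, kinetic laws, etc.
            entry[key] = [ref.get("species") for ref in child
                          if local_name(ref.tag) == "speciesReference"]
        reactions[elem.get("id")] = entry
    return reactions
```

Running `read_reactions(EXAMPLE_SBML)` returns one entry for the toy reaction, listing its reactant and product species. A warehouse loader would turn such entries into linked reaction and metabolite records.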

Discussion and conclusion

Although the amount of data does not yet reflect the entire published dataset for CHO cells, CHOmine provides a comprehensive overview and is thus a valuable resource for finding CHO-relevant data. As CHOmine is based on InterMine, all of InterMine's features can be used: data can be searched and downloaded through the web interface, or accessed from scripts through different APIs. CHOmine already includes different data types, such as genome information, proteins, miRNAs and metabolic models, with links to many external databases. CHOmine will be actively improved, and new data types will be included in future versions. Raw data for older versions of CHOmine will be kept for at least two years and will be made available via the CHOmine contact form. We are convinced that this resource will become the first place to search for information when working with CHO cells.
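Script access to an InterMine instance typically goes through its web service, which accepts a path query written as XML. The following Python sketch only builds such a request URL and does not send it. The base URL is a placeholder rather than the actual CHOmine address, and the model name `genomic` and the fields `Gene.primaryIdentifier` and `Gene.symbol` are assumptions based on the InterMine core data model that may differ in CHOmine.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Placeholder address; replace with the service root of the mine to query.
BASE_URL = "https://example.org/chomine"


def build_query_url(symbol, base_url=BASE_URL):
    """Build a query-results URL that looks up genes by symbol.

    Field and model names are assumed from the InterMine core model.
    """
    query_xml = (
        '<query model="genomic" view="Gene.primaryIdentifier Gene.symbol">'
        # quoteattr escapes the value and adds the surrounding quotes.
        f'<constraint path="Gene.symbol" op="=" value={quoteattr(symbol)}/>'
        "</query>"
    )
    params = urlencode({"query": query_xml, "format": "json"})
    return f"{base_url}/service/query/results?{params}"
```

The resulting URL can be fetched with any HTTP client. For longer scripts, the client libraries distributed by the InterMine project offer a more convenient interface than hand-built URLs.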

Funding

Austrian BMWFW, BMVIT, SFG, Standortagentur Tirol, Government of Lower Austria and ZIT through the Austrian FFG-COMET-K2 Funding Program.

Conflict of interest. None declared.

References

1. Kaufman R.J., Wasley L.C., Spiliotes A.J. et al. (1985) Coamplification and coexpression of human tissue-type plasminogen activator and murine dihydrofolate reductase sequences in Chinese hamster ovary cells. Mol. Cell. Biol., 5, 1750–1759.

2. Walsh G. (2014) Biopharmaceutical benchmarks 2014. Nat. Biotechnol., 32, 992–1000. doi:10.1038/nbt.3040.

3. Xu X., Nagarajan H., Lewis N.E. et al. (2011) The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell line. Nat. Biotechnol., 29, 735–741.

4. Brinkrolf K., Rupp O., Laux H. et al. (2013) Chinese hamster genome sequenced from sorted chromosomes. Nat. Biotechnol., 31, 694–695.

5. Lewis N.E., Liu X., Li Y. et al. (2013) Genomic landscapes of Chinese hamster ovary cell lines as revealed by the Cricetulus griseus draft genome. Nat. Biotechnol., 31, 759–765.

6. Smedley D., Haider S., Durinck S. et al. (2015) The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res., 43, W589–W598.

7. Smith R.N., Aleksic J., Butano D. et al. (2012) InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics, 28, 3163–3165.

8. Motenko H., Neuhauser S.B., O’Keefe M., Richardson J.E. (2015) MouseMine: a new data warehouse for MGI. Mamm. Genome, 26, 325–330.

9. Lyne R., Smith R., Rutherford K. et al. (2007) FlyMine: an integrated database for Drosophila and Anopheles genomics. Genome Biol., 8, R129.

10. Hammond S., Kaplarevic M., Borth N. et al. (2011) Chinese hamster genome database: an online resource for the CHO community at www.CHOgenome.org. Biotechnol. Bioeng., 109, 1353–1356.

11. Hefzi H., Ang K.S., Hanscho M. et al. (2016) A consensus genome-scale reconstruction of Chinese hamster ovary cell metabolism. Cell Syst., 3, 434–443.e8.

12. The UniProt Consortium. (2014) UniProt: a hub for protein information. Nucleic Acids Res., 43, D204–D212.

13. Mitchell A., Chang H.Y., Daugherty L. et al. (2014) The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res., 43, D213–D221.

14. Kanehisa M., Sato Y., Kawashima M. et al. (2015) KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res., 44, D457–D462.

15. Roberts R.J. (2001) PubMed Central: The GenBank of the published literature. Proc. Natl. Acad. Sci. U. S. A., 98, 381–382.

16. Eilbeck K., Lewis S.E., Mungall C.J. et al. (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol., 6, R44.

17. Ashburner M., Ball C.A., Blake J.A. et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet., 25, 25–29.

18. Kozomara A., Griffiths-Jones S. (2013) miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res., 42, D68–D73.

19. Langmead B., Trapnell C., Pop M., Salzberg S.L. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol., 10, R25.

Author notes

*Corresponding author: Tel: +43 1 47654 79064, Fax: +43 1 47654 79009, Email: nicole.borth@boku.ac.at

Citation details: Gerstl,M.P., Hanscho,M., Ruckerbauer,D.E. et al. CHOmine: an integrated data warehouse for CHO systems biology and modeling. Database (2017) Vol. 2017: article ID bax034; doi:10.1093/database/bax034

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
