Abstract

The 1,000 plants (1KP) project is an international multidisciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyze these data in conjunction with data of their own that they upload. Computationally estimated protein-protein interactions and biochemical pathways can be visualized at a separate website. Finally, we comment on our future plans and how they fit within this scalable system for the dissemination, visualization, and analysis of large multi-species data sets.

Introduction

The 1,000 plants (1KP) project is an international multidisciplinary consortium that has now generated transcriptome data from over 1,000 plant species. One of the goals of our species selection process was to provide exemplars for all of the major lineages across the Viridiplantae (green plants), representing approximately one billion years of evolution, including flowering plants, conifers, ferns, mosses, and streptophyte green algae. Whereas genomics has long strived for completeness within species (e.g., every gene in the species), we focused on completeness across an evolutionary clade: obviously not every species, but one representative species for every lineage at some phylogenetic level (e.g., one species per family, and perhaps more than one when the family is especially large). Because many of our species had never been subjected to large-scale sequencing, 2 gigabases (Gb) of data per sample was sufficient to increase the number of plant genes approximately 100-fold relative to the totality of the public databases.

The 1KP project began as a public-private partnership, with 75% of the funding provided by the Government of Alberta and 25% by Musea Ventures. Significant in-kind contributions were provided by BGI-Shenzhen in the form of reduced sequencing costs and by the NSF-funded iPlant collaborative [1] in the form of computational informatics support. Many plant scientists from around the world were involved in the collection of live tissue samples and in the extraction of RNA. Additional computing resources were provided by Compute Canada and by the China National GeneBank. Despite the constraints of this funding model, we released our data (on a collaborative basis) to scientists who approached us with goals that did not compete with ours. For the general community, access was provided through a BLAST portal [2].

We believed that data of this nature would have intrinsic value beyond anything we could imagine. For the initial publication, however, we agreed on two objectives. First, by adopting a phylogenomics approach, we hoped to resolve many of the lingering uncertainties in species relationships, especially in the early lineages of streptophyte green algae and land plants, where previous analyses had been based on comparatively sparse taxon sampling. Second, despite the limitations of these data, we hoped to identify some of the gene changes associated with the major innovations in Viridiplantae evolution, such as multicellularity, transitions from marine to freshwater or terrestrial environments, maternal retention of zygotes and embryos, complex life histories involving haploid and diploid phases, vascular systems, seeds, and flowers.

Our RNA extraction protocols [3] and our RNA-Seq transcriptome assembly algorithms [4] have already been published. This is the second of two linked papers. The first reviews the current state of knowledge of Viridiplantae species relationships and presents our initial phylogenomic analysis of a subset of the 1KP data [5]. This paper describes the websites that we created to provide access to the data (from raw reads to computed results), to visualize the results, and to perform custom analyses in conjunction with external data that users can upload. An initial gene annotation is also provided, which focuses on the functional relationships between proteins and their associated metabolites.

Review

Access to raw and processed data

Our initial phylogenomics effort used sequences from multiple sources: transcriptomes from 1KP representing 85 species, transcriptomes from other sources representing 7 species, and genomes representing an additional 11 species. These data sources are summarized in Table 1. We submitted all of the unassembled reads from the 1KP transcriptomes to the Sequence Read Archive (SRA) under project accession PRJEB4921, "1000 Plant (1KP) Transcriptome: The Pilot Study." Note that, with the exception of Eschscholzia californica, we sequenced only one sample per species.
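As a minimal sketch (not part of the 1KP tooling), the sequencing runs registered under PRJEB4921 can be listed programmatically. This assumes that the public ENA Portal API `filereport` endpoint returns tab-separated text with `run_accession` and `fastq_ftp` columns, and that paired-end files are separated by `;`; the helper names are illustrative only.

```python
import csv
import io
import urllib.parse
import urllib.request

# Assumption: ENA's public Portal API "filereport" endpoint. Verify the URL and
# column names against current ENA documentation before relying on them.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a filereport query listing the sequencing runs under an accession."""
    query = urllib.parse.urlencode(
        {"accession": accession, "result": "read_run", "fields": ",".join(fields)}
    )
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Map each run accession to its FASTQ locations.

    Paired-end files are assumed to be separated by ';' in the fastq_ftp column.
    """
    runs = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        locations = row.get("fastq_ftp") or ""
        runs[row["run_accession"]] = [loc for loc in locations.split(";") if loc]
    return runs


def fetch_runs(accession):
    """Download and parse the report for one accession (needs network access)."""
    with urllib.request.urlopen(filereport_url(accession)) as response:
        return parse_filereport(response.read().decode("utf-8"))
```

With network access, `fetch_runs("PRJEB4921")` would return a mapping from each run accession (e.g. ERR364395) to its FASTQ file locations. Separating URL construction and parsing from the network call keeps the parsing testable offline.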

To make it easier for others to reproduce our phylogenomics analyses, we are releasing our intermediate computations, not just the final results. Everything is hosted at the iPlant Data Store, a high-performance, large-capacity, distributed storage system. The contents include transcriptome assemblies, putative coding sequences, orthogroups (i.e., from the 11 reference genomes), and gene and species trees with their associated sequence alignments. The release comprises more than 68,000 files totalling roughly 23,000 Mb, so before users begin to download these files, we suggest that they consult Table 2 for a description of what to expect.
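After (or during) a download, the summary columns of Table 2 can be recomputed locally to confirm that a directory is complete. The helper below is a hypothetical sketch, not one of the released tools. It assumes that Table 2 counts files recursively, which its totals are consistent with (48,053 + 19,919 + 276 + 5 = 68,253 files under onekp_pilot), and that "Mb" means 10^6 bytes.

```python
import statistics
from pathlib import Path

MB = 1_000_000  # assumption: Table 2's "Mb" read as 10^6 bytes


def summarize(directory):
    """Return file count and median/average/largest/total size in MB for all
    files under ``directory`` (recursively), mirroring Table 2's columns."""
    sizes = [p.stat().st_size / MB for p in Path(directory).rglob("*") if p.is_file()]
    if not sizes:
        return {"count": 0, "median": 0.0, "average": 0.0, "largest": 0.0, "total": 0.0}
    return {
        "count": len(sizes),
        "median": statistics.median(sizes),
        "average": sum(sizes) / len(sizes),
        "largest": max(sizes),
        "total": sum(sizes),
    }
```

For example, `summarize("onekp_pilot/orthogroups")` on a complete local copy should report a count close to the 48,053 files listed in Table 2.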

At the simplest level, anonymous downloads are permitted from a designated area of the iPlant Data Store [6]. However, much greater functionality is available through the iPlant resources that we describe in the following sections.

Visualization and custom analyses

To take full advantage of the iPlant computational infrastructure, users must first register at [7]. Accounts are free, and in addition to the 1KP data, users will find high-performance computing and cloud-based services. Multiple access modalities are supported: anonymous and secure web interfaces, desktop clients, and high-speed command-line tools. For most users, however, we feel that the best option is the iPlant Discovery Environment (DE), a web-based interface that provides high-performance computing resources and data storage. Most contemporary web browsers are supported, including Safari v. 6.1, Firefox v. 24, and Chrome v. 34. Note, however, that some of the functions described below require Java 1.6.

To guide users through its resources, iPlant is constantly producing new tutorials and teaching materials, including live and recorded webinars. The full catalog can be found at [8]. Here, we describe the new resources specifically created for 1KP.

Discovery environment (DE)

For access to the 1KP files, users should visit [9] and search for a folder called Community Data/onekp_pilot (Figure 1).

From the data window it is possible to download individual files, or to perform bulk downloads of multiple files and directories through a Java plugin. Note that, for security reasons, some operating systems will not allow users to run Java applets. In this case, a window will pop up to report the problem, and the user should follow the instructions given there to configure an iDrop desktop client [10] (Figure 2).

It is possible to perform analyses directly in the DE using any of the 1KP files as input; for example, users can recompute the sequence alignments and gene trees using different algorithms and parameters [11] (Figure 3). More generally, users can select from a variety of applications in the Apps catalogue, which is constantly growing and includes many popular bioinformatics tools for large-scale phylogenetics, genome-wide association studies, and next-generation sequence analysis.
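Before re-running an alignment or tree-inference tool on one of the orthogroup files, it can help to inspect the input. The sketch below assumes that the files in the FAA and FNA directories (read here as amino-acid and nucleotide alignments) are aligned FASTA with "-" or "?" as gap characters; we have not confirmed the exact file format, and the function names are illustrative.

```python
def read_fasta(text):
    """Parse FASTA text into {identifier: sequence}, keeping file order.

    The identifier is the first word of each '>' header line."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def alignment_summary(records, gap_chars="-?"):
    """Report sequence count, column count and overall gap fraction."""
    if not records:
        raise ValueError("no sequences found")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    columns = lengths.pop()
    cells = columns * len(records)
    gaps = sum(seq.count(c) for seq in records.values() for c in gap_chars)
    return {
        "sequences": len(records),
        "columns": columns,
        "gap_fraction": gaps / cells if cells else 0.0,
    }
```

A high gap fraction is one hint that a different aligner or trimming step might be worth trying before recomputing a gene tree.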

Species and gene trees can be explored with the iPlant tree viewer, Phylozoom, a newly developed web-based phylogenetic tree viewer that supports trees with hundreds of thousands of leaves and allows semantic zooming (Figure 4). To open the tree viewer, users need only click on a tree file. This opens a preview window with two tabs: one shows the tree's newick string (a format for graph-theoretical trees as defined at [12]), and the other gives the web link that opens the tree display in a new window. Note that pop-ups must be enabled in the user's browser.

To zoom in and expand collapsed clades, click on the node of interest. To zoom out, click and drag the tree figure to the left. To zoom out completely, press the space bar. The web address is a unique identifier that can be shared with others so that they can view the same tree.
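Because each tree is also available as a newick string, simple checks (such as listing the taxa in a gene tree) can be scripted without the viewer. The following is a minimal recursive-descent Newick parser written for this illustration. It handles unquoted labels, internal node labels (such as support values) and branch lengths, but not quoted labels or bracketed comments, and Python's recursion limit makes it unsuitable for very deep trees.

```python
def parse_newick(s):
    """Parse a Newick string into nested (label, children) tuples.

    Branch lengths are skipped; quoted labels and [...] comments are not supported."""
    s = s.strip()
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:;":
            pos += 1
        label = s[start:pos].strip()
        if pos < len(s) and s[pos] == ":":  # skip the branch length
            pos += 1
            while pos < len(s) and s[pos] not in "(),;":
                pos += 1
        return label

    def read_node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(read_node())
            while s[pos] == ",":
                pos += 1
                children.append(read_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        return (read_label(), children)

    tree = read_node()
    if s[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return tree


def leaves(node):
    """Return leaf labels in left-to-right order."""
    label, children = node
    if not children:
        return [label]
    return [leaf for child in children for leaf in leaves(child)]
```

For large species trees, an established library such as Biopython's `Bio.Phylo` is a more robust choice than this sketch.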

Table 1

Data sources for phylogenomics analyses

Species | Type | Accession | iPlant ID
Arabidopsis thaliana | genome | n/a | n/a
Brachypodium distachyon | genome | n/a | n/a
Carica papaya | genome | n/a | n/a
Medicago truncatula | genome | n/a | n/a
Oryza sativa | genome | n/a | n/a
Physcomitrella patens | genome | n/a | n/a
Populus trichocarpa | genome | n/a | n/a
Selaginella moellendorffii | genome | n/a | n/a
Sorghum bicolor | genome | n/a | n/a
Vitis vinifera | genome | n/a | n/a
Zea mays | genome | n/a | n/a
Aquilegia formosa | meta-assembly | PlantGDB | AQUI
Cycas rumphii | meta-assembly | SRX022306, SRX022215 | CYCA
Liriodendron tulipifera | meta-assembly | PRJNA46857 | LIRI
Persea americana | meta-assembly | PRJNA46857 | PERS
Pinus taeda | meta-assembly | PRJNA79733 | PINU
Pteridium aquilinum | meta-assembly | PRJNA48473 | PTER
Zamia vazquezii | meta-assembly | PRJNA46857 | ZAMI
Acorus americanus | OneKP meta-assembly | ERR364395, PRJNA46857 | ACOR
Amborella trichopoda | OneKP meta-assembly | ERR364329, PRJNA46857 | AMBO
Catharanthus roseus | OneKP meta-assembly | ERR364390, PRJNA79951, PRJNA236160 | CATH
Eschscholzia californica | OneKP meta-assembly | ERR364338, ERR364335, ERR364336, ERR364337, ERR364334, SRX002988, SRX002987, PlantGDB | ESCH
Ginkgo biloba | OneKP meta-assembly | ERR364401, PlantGDB | GINK
Nuphar advena | OneKP meta-assembly | ERR364330, PRJNA46857 | NUPH
Ophioglossum petiolatum | OneKP meta-assembly | ERR364410, SRX666586 | OPHI
Saruma henryi | OneKP meta-assembly | ERR364383, PRJNA46857 | SARU
Welwitschia mirabilis | OneKP meta-assembly | ERR364404, PRJNA46857 | WELW
Allamanda cathartica | OneKP | ERR364389 | MGVU
Angiopteris evecta | OneKP | ERR364409 | NHCM
Anomodon attenuatus | OneKP | ERR364349 | QMWB
Bazzania trilobata | OneKP | ERR364415 | WZYK
Boehmeria nivea | OneKP | ERR364387 | ACFP
Bryum argenteum | OneKP | ERR364348 | JMXW
Cedrus libani | OneKP | ERR364342 | GGEA
Ceratodon purpureus | OneKP | ERR364350 | FFPD
Chaetosphaeridium globosum | OneKP | ERR364369 | DRGY
Chara vulgaris | OneKP | ERR364366 | CHAR
Chlorokybus atmophyticus | OneKP | ERR364371 | AZZW
Colchicum autumnale | OneKP | ERR364397 | NHIX
Coleochaete irregularis | OneKP | ERR364367 | QPDY
Coleochaete scutata | OneKP | ERR364368 | VQBJ
Cosmarium ochthodes | OneKP | ERR364376 | STKJ
Cunninghamia lanceolata | OneKP | ERR364340 | OUOI
Cyathea (Alsophila) spinulosa | OneKP | ERR364412 | GANB
Cycas micholitzii | OneKP | ERR364405 | XZUY
Cylindrocystis brebissonii | OneKP | ERR364378 | YOXI
Cylindrocystis cushleckae | OneKP | ERR364373 | JOJQ
Dendrolycopodium obscurum | OneKP | ERR364346 | XNXF
Dioscorea villosa | OneKP | ERR364396 | OCWZ
Diospyros malabarica | OneKP | ERR364339 | KVFU
Entransia fimbriata | OneKP | ERR364372 | BFIK
Ephedra sinica | OneKP | ERR364402 | VDAO
Equisetum diffusum | OneKP | ERR364408 | CAPN
Gnetum montanum | OneKP | ERR364403 | GTHK
Hedwigia ciliata | OneKP | ERR364352 | YWNF
Hibiscus cannabinus | OneKP | ERR364388 | OLXF
Houttuynia cordata | OneKP | ERR364332 | CSSK
Huperzia squarrosa | OneKP | ERR364407 | GAON
Inula helenium | OneKP | ERR364393 | AFQQ
Ipomoea purpurea | OneKP | ERR364392 | VXKB
Juniperus scopulorum | OneKP | ERR364341 | XMGP
Kadsura heteroclita | OneKP | ERR364331 | NWMY
Klebsormidium subtile | OneKP | ERR364370 | FQLP
Kochia scoparia | OneKP | ERR364385 | WGET
Larrea tridentata | OneKP | ERR364386 | UDUT
Leucodon brachypus | OneKP | ERR364353 | ZACW
Marchantia emarginata | OneKP | ERR364417 | TFYI
Marchantia polymorpha | OneKP | ERR364416 | JPYU
Mesostigma viride | OneKP | ERR364365 | KYIO
Mesotaenium endlicherianum | OneKP | ERR364377 | WDCW
Metzgeria crassipilis | OneKP | ERR364359 | NRWZ
Monomastix opisthostigma | OneKP | ERR364362 | BTFM
Mougeotia sp. | OneKP | ERR364374 | ZRMT
Nephroselmis pyriformis | OneKP | ERR364363 | ISIM
Netrium digitus | OneKP | ERR364379 | FFGR
Nothoceros aenigmaticus | OneKP | ERR364356 | DXOU
Nothoceros vincentianus | OneKP | ERR364357 | TCBC
Penium margaritaceum | OneKP | ERR364382 | AEKF
Podophyllum peltatum | OneKP | ERR364384 | WFBF
Polytrichum commune | OneKP | ERR364413 | SZYG
Prumnopitys andina | OneKP | ERR364343 | EGLZ
Pseudolycopodiella caroliniana | OneKP | ERR364345 | UPMJ
Psilotum nudum | OneKP | ERR364411 | QVMR
Pyramimonas parkeae | OneKP | ERR364361 | TNAW
Rhynchostegium serrulatum | OneKP | ERR364355 | JADL
Ricciocarpos natans | OneKP | ERR364358 | WJLO
Rosmarinus officinalis | OneKP | ERR364391 | FDMM
Rosulabryum cf. capillare | OneKP | ERR364351 | XWHK
Roya obtusa | OneKP | ERR364380 | XRTZ
Sabal bermudana | OneKP | ERR364400 | HWUP
Sarcandra glabra | OneKP | ERR364333 | OSHQ
Sciadopitys verticillata | OneKP | ERR364344 | YFZK
Selaginella stauntoniana | OneKP | ERR364347 | ZZOL
Smilax bona-nox | OneKP | ERR364398 | MWYQ
Sphaerocarpos texanus | OneKP | ERR364360 | HERT
Sphagnum lescurii | OneKP | ERR364414 | GOWD
Spirogyra sp. | OneKP | ERR364375 | HAOX
Spirotaenia minuta | OneKP | ERR364381 | NNHQ
Tanacetum parthenium | OneKP | ERR364394 | DUQG
Taxus baccata | OneKP | ERR364406 | WWSS
Thuidium delicatulum | OneKP | ERR364354 | EEMJ
Uronema sp. | OneKP | ERR364364 | ISGT
Yucca filamentosa | OneKP | ERR364399 | ICNN

Meta-assembly refers to a transcriptome assembled from more than one sequenced sample. Some of these were a combination of 1KP and other data; some were entirely non-1KP. Accession numbers (SRA or otherwise) are given for all of the transcriptomes that we used.
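The Accession column mixes several identifier types. As a hedged illustration, the helper below classifies them using the standard INSDC prefixes: ERR/SRR/DRR for sequencing runs, ERX/SRX/DRX for sequencing experiments, and PRJNA/PRJEB/PRJDB-style identifiers for BioProjects. Entries such as "PlantGDB" name a source database rather than an accession, so they fall through to "other". The function names are our own.

```python
import re

# Standard INSDC accession prefixes (shared by NCBI, ENA and DDBJ).
_PATTERNS = [
    (re.compile(r"^[SED]RR\d+$"), "sequencing run"),
    (re.compile(r"^[SED]RX\d+$"), "sequencing experiment"),
    (re.compile(r"^PRJ[NED][A-Z]\d+$"), "BioProject"),
]


def classify_accession(acc):
    """Return the kind of identifier, or 'other' (e.g. a database name)."""
    acc = acc.strip()
    for pattern, kind in _PATTERNS:
        if pattern.match(acc):
            return kind
    return "other"


def classify_cell(cell):
    """Split a Table 1 accession cell such as 'ERR364395, PRJNA46857'."""
    return {a.strip(): classify_accession(a) for a in cell.split(",") if a.strip()}
```

Run accessions (ERR...) can be fetched directly as reads, whereas BioProject identifiers (PRJNA...) group many runs and must be expanded first.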

For more advanced users who want to perform more complex procedures, iPlant capabilities are also available from the command line, based on the integrated Rule-Oriented Data System (iRODS) [13]. Users need only install iCommands, a set of command-line utilities that mimic UNIX commands and enable high-speed parallel data transfers. Instructions are available at [14].
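A bulk download with iCommands might be scripted as in the following Python wrapper. It assumes that iCommands is installed and that `iinit` has already been run to configure the connection. The flag choices (`-r` recursive, `-K` checksum verification, `-N` number of transfer threads, `-P` progress reporting) should be checked against `iget -h` for the installed version, and the Data Store path in the usage note is hypothetical; use `ils` to find the actual location of onekp_pilot.

```python
import shutil
import subprocess


def iget_command(remote_path, local_dir=".", threads=4, recursive=True, verify=True):
    """Build the argument list for an iget download from the Data Store."""
    cmd = ["iget", "-P"]  # -P: report transfer progress
    if recursive:
        cmd.append("-r")  # fetch whole collections (directories)
    if verify:
        cmd.append("-K")  # verify checksums after transfer
    cmd += ["-N", str(threads), remote_path, local_dir]  # -N: parallel threads
    return cmd


def run_iget(remote_path, local_dir="."):
    """Run iget, failing early if iCommands is not on the PATH."""
    if shutil.which("iget") is None:
        raise RuntimeError("iget not found; install iCommands and run iinit first")
    subprocess.run(iget_command(remote_path, local_dir), check=True)
```

For example, `run_iget("/iplant/home/shared/onekp_pilot/tools", "onekp_tools")` would fetch the small tools directory (hypothetical path) before committing to the much larger orthogroups collection.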

Interactions and pathways

In addition to the tree-based species and gene relationships at the iPlant site, functional relationships between proteins and their associated metabolites are available from the Computational Biology Group at the University of Washington, developers of CANDO [15]. Sequence similarity-based methods are used to map 1KP proteins to curated repositories of protein-protein interactions (i.e., BioGRID [16]) and biochemical pathways (i.e., the Kyoto Encyclopedia of Genes and Genomes [KEGG] [17]). Users can select any metabolic pathway defined by KEGG and, within this context, see all of the 1KP proteins from their chosen species, with functional annotations inferred from KEGG. This website is available at [18] (Figure 5).
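The exact similarity-based mapping used by the CANDO developers is not described here. As a generic illustration only, the sketch below performs best-hit annotation transfer: each query protein inherits the KEGG Orthology (KO) identifier of its highest-scoring BLAST hit that passes an e-value cutoff. The 12-column BLAST tabular format (`-outfmt 6`) is standard; the 1e-10 cutoff and the `subject_to_ko` mapping are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    evalue: float
    bitscore: float


def parse_blast_tab(text):
    """Parse BLAST tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns 11 and 12 (0-based 10 and 11) are evalue and bitscore.
        hits.append(Hit(f[0], f[1], float(f[10]), float(f[11])))
    return hits


def transfer_annotations(hits, subject_to_ko, max_evalue=1e-10):
    """Give each query the KO of its highest-bitscore hit that passes the
    e-value cutoff and whose subject has a known KO assignment."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue or h.subject not in subject_to_ko:
            continue
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: subject_to_ko[h.subject] for q, h in best.items()}
```

Queries with no qualifying hit are simply left unannotated, which is usually preferable to forcing a weak assignment.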

Note that the transcriptome assemblies were improved considerably over the course of this project. The phylogenomics work (now being published) used assemblies produced with the SOAPdenovo algorithm. A second set of assemblies was subsequently produced with the newer SOAPdenovo-Trans algorithm and was incorporated into the more recent interactions and pathways work. Both sets of assemblies are available through the iPlant Data Store.
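When choosing between the SOAPdenovo and SOAPdenovo-Trans assemblies of the same sample, a quick first comparison is contig count, total length, and N50. The sketch below is illustrative and is not the method used in the project; N50 is only a rough guide for transcriptome assemblies, where contig lengths also reflect the underlying transcript lengths.

```python
def contig_lengths(fasta_text):
    """Return the length of each sequence in FASTA text."""
    lengths, current = [], None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least
    half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(fasta_text):
    lengths = contig_lengths(fasta_text)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

Calling `assembly_stats` on each of the two assembly files for one species gives a side-by-side summary before deciding which set to use.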

Conclusions

The rest of the 1KP data will be released, on much the same platform, along with our analyses of all one thousand species. Our scientific objectives are given at [19]. We have always been open about our intentions, because we wanted to avoid conflict among the scientists who were already working with 1KP and offer early pre-publication access to other non-competing scientists. As soon as we see a draft of a paper, we track its progress through the review process at [20]. Some of these papers have already been published, and more than a few required years of follow-up experiments, resulting for example in fundamental discoveries for molecular evolution [21] and (surprisingly) new tools for mammalian neurosciences [22].

Many of these studies were not anticipated when 1KP was conceived. We only knew that, just as there was value in sequencing every gene in a genome despite not knowing a priori what many of the genes might do, there would be value in sequencing across an ancient and ecologically dominant clade, even when many of the species have no obvious economic or scientific value that would justify a genome sequencing effort. Transcriptomes offered a less expensive way to explore plant diversity and to demonstrate value beyond the obvious species.

Table 2

Number and size of data files on websites

File count | Median size (Mb) | Average size (Mb) | Largest size (Mb) | Total size (Mb) | Similar directories | iPlant directory name
68,253 | 0.0 | 0.3 | 481.1 | 23,116.6 |  | onekp_pilot
48,053 | 0.0 | 0.3 | 481.1 | 14,956.7 |  | onekp_pilot/orthogroups
19,220 | 0.1 | 0.7 | 243.8 | 13,276.5 |  | onekp_pilot/orthogroups/alignments
9,610 | 0.1 | 0.3 | 79.8 | 3,289.6 |  | onekp_pilot/orthogroups/alignments/FAA
9,610 | 0.2 | 1.0 | 243.8 | 9,986.9 |  | onekp_pilot/orthogroups/alignments/FNA
28,833 | 0.0 | 0.1 | 481.1 | 1,680.2 |  | onekp_pilot/orthogroups/gene_trees
9,611 | 0.0 | 0.1 | 481.1 | 583.3 |  | onekp_pilot/orthogroups/gene_trees/FAA
9,610 | 0.0 | 0.0 | 0.5 | 102.2 |  | onekp_pilot/orthogroups/gene_trees/FAA/trees
19,222 | 0.0 | 0.1 | 458.0 | 1,096.8 |  | onekp_pilot/orthogroups/gene_trees/FNA
9,611 | 0.0 | 0.1 | 458.0 | 556.6 |  | onekp_pilot/orthogroups/gene_trees/FNA/12_codon
9,610 | 0.0 | 0.0 | 0.5 | 98.5 |  | onekp_pilot/orthogroups/gene_trees/FNA/12_codon/trees
9,611 | 0.0 | 0.1 | 439.1 | 540.3 |  | onekp_pilot/orthogroups/gene_trees/FNA/all_codon
9,610 | 0.0 | 0.0 | 0.5 | 101.2 |  | onekp_pilot/orthogroups/gene_trees/FNA/all_codon/dna_tree
19,919 | 0.0 | 0.2 | 175.2 | 3,468.8 |  | onekp_pilot/phylogenetic_analysis
2,556 | 0.1 | 0.1 | 1.0 | 292.7 |  | onekp_pilot/phylogenetic_analysis/alignments
852 | 0.0 | 0.0 | 0.3 | 41.8 |  | onekp_pilot/phylogenetic_analysis/alignments/FAA
852 | 0.1 | 0.1 | 1.0 | 125.5 |  | onekp_pilot/phylogenetic_analysis/alignments/FNA
852 | 0.1 | 0.1 | 0.9 | 125.4 |  | onekp_pilot/phylogenetic_analysis/alignments/FNA2AA
17,197 | 0.0 | 0.1 | 0.4 | 1,827.3 |  | onekp_pilot/phylogenetic_analysis/gene_trees
1,704 | 0.0 | 0.1 | 0.4 | 238.3 |  | onekp_pilot/phylogenetic_analysis/gene_trees/FAA
2 | 0.3 | 0.1 | 0.4 | 0.3 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/FAA/raxmlboot.####
1,704 | 0.0 | 0.1 | 0.4 | 238.3 |  | onekp_pilot/phylogenetic_analysis/gene_trees/FNA
2 | 0.3 | 0.1 | 0.4 | 0.3 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/FNA/raxmlboot.####
3,408 | 0.0 | 0.1 | 0.4 | 476.7 |  | onekp_pilot/phylogenetic_analysis/gene_trees/FNA2AA
2 | 0.3 | 0.1 | 0.4 | 0.3 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/FNA2AA/raxmlboot.####
2 | 0.3 | 0.1 | 0.4 | 0.3 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/FNA2AA/raxmlboot.####.c1c2
10,381 | 0.0 | 0.1 | 0.4 | 874.0 |  | onekp_pilot/phylogenetic_analysis/gene_trees/filtered
2,548 | 0.0 | 0.1 | 0.4 | 169.3 |  | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FAA
1 | 0.0 | 0.0 | 0.0 | 0.0 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FAA/raxmlboot.####.f25
1 | 0.2 | 0.1 | 0.4 | 0.2 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FAA/raxmlboot.####.filterlen33
852 | 0.0 | 0.0 | 0.0 | 3.8 |  | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA
1 | 0.0 | 0.0 | 0.0 | 0.0 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA/raxmlboot.####.f25
6,980 | 0.0 | 0.1 | 0.4 | 700.9 |  | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA2AA
2 | 0.3 | 0.1 | 0.4 | 0.3 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA2AA/raxmlboot.####.GAMMA.2
2 | 0.3 | 0.1 | 0.4 | 0.3 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA2AA/raxmlboot.####.c1c2.GAMMA.2
1 | 0.0 | 0.0 | 0.0 | 0.0 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA2AA/raxmlboot.####.c1c2.f25
1 | 0.0 | 0.0 | 0.0 | 0.0 | 852 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA2AA/raxmlboot.####.f25
2 | 0.2 | 0.1 | 0.4 | 0.2 | 844 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA2AA/raxmlboot.####.filterlen33
1 | 0.3 | 0.3 | 0.4 | 0.3 | 180 | onekp_pilot/phylogenetic_analysis/gene_trees/filtered/FNA2AA/raxmlboot.####.filtered25.GAMMA.2
166 | 0.0 | 8.1 | 175.2 | 1,348.8 |  | onekp_pilot/phylogenetic_analysis/species_level
50 | 15.0 | 27.0 | 175.2 | 1,348.1 |  | onekp_pilot/phylogenetic_analysis/species_level/alignments
15 | 14.7 | 14.3 | 58.3 | 214.2 |  | onekp_pilot/phylogenetic_analysis/species_level/alignments/FAA
35 | 29.4 | 32.4 | 175.2 | 1,133.9 |  | onekp_pilot/phylogenetic_analysis/species_level/alignments/FNA
116 | 0.0 | 0.0 | 0.0 | 0.6 |  | onekp_pilot/phylogenetic_analysis/species_level/trees
276 | 10.0 | 17.0 | 157.4 | 4,691.1 |  | onekp_pilot/taxa
3 | 9.7 | 17.0 | 157.4 | 51.0 | 92 | onekp_pilot/taxa/####-############
1 | 30.8 | 17.0 | 157.4 | 36.0 | 92 | onekp_pilot/taxa/####-############/assemblies
2 | 9.7 | 7.5 | 45.2 | 15.0 | 92 | onekp_pilot/taxa/####-############/translations
5 | 0.0 | 0.0 | 0.0 | 0.1 |  | onekp_pilot/tools
File count | Median size (Mb) | Average size (Mb) | Largest size (Mb) | Total size (Mb) | Similar directories | Contents at SRA (PRJEB4921)
178 | 1,915.0 | 2,045.5 | 3,371.0 | 364,100.0 |  | Total of all short reads (uncompressed; downloads are compressed to about a quarter of these sizes)
2 | 1,915.0 | 2,045.5 | 3,371.0 | 4,091.0 | 89 | Expected per sample (uncompressed; downloads are compressed to about a quarter of these sizes)

In some instances, users will find many directories with similar names; these are indicated in the table by hash (#) marks, and the number of such directories is given in the "Similar directories" column.
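The file counts in this table appear to be recursive: for onekp_pilot, 48,053 + 19,919 + 276 + 5 = 68,253 files, the sum over its four subdirectories. The sketch below shows one way to recompute the five size columns for a local copy of the data, and to estimate download sizes from the SRA rows' note that compressed downloads are roughly a quarter of the uncompressed sizes. It is a minimal illustration under stated assumptions, not the script used to build the table: the function names are hypothetical, "Mb" is assumed to mean 10^6 bytes, and the 0.25 ratio is only the rule of thumb quoted in the table.

```python
import os
import statistics
from dataclasses import dataclass

# Assumption: the table's "Mb" is interpreted here as 10**6 bytes.
BYTES_PER_MB = 1_000_000


@dataclass
class SizeSummary:
    file_count: int
    median_mb: float
    average_mb: float
    largest_mb: float
    total_mb: float


def summarize_sizes(sizes_bytes):
    """Hypothetical helper: summarize file sizes (bytes) with the table's five columns."""
    if not sizes_bytes:
        return SizeSummary(0, 0.0, 0.0, 0.0, 0.0)
    mb = [s / BYTES_PER_MB for s in sizes_bytes]
    return SizeSummary(
        file_count=len(mb),
        median_mb=round(statistics.median(mb), 1),
        average_mb=round(statistics.mean(mb), 1),
        largest_mb=round(max(mb), 1),
        total_mb=round(sum(mb), 1),
    )


def summarize_tree(root):
    """Hypothetical helper: recursively collect every file size under `root`
    (counts include all subdirectories, matching the table's apparent convention)."""
    sizes = []
    # os.walk silently yields nothing if `root` does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            sizes.append(os.path.getsize(os.path.join(dirpath, name)))
    return summarize_sizes(sizes)


def estimated_download_mb(uncompressed_mb, ratio=0.25):
    """Rough compressed download size; 0.25 encodes the 'a quarter' rule of thumb."""
    return uncompressed_mb * ratio
```

For the SRA data, `estimated_download_mb(364_100.0)` gives 91,025 Mb for all short reads and `estimated_download_mb(4_091.0)` gives about 1,023 Mb per sample.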

Table 2

Number and size of data files on websites


Figure 1

iPlant DE data window.

Figure 2

Bulk download window shown when Java is disabled. Click the circled link for instructions on installing and configuring iDrop Desktop.

Figure 3

Realigning a group of sequences using MUSCLE.

Figure 4

Phylozoom display of the 1KP species phylogeny.

Figure 5

Phenylpropanoid synthesis pathway for Colchicum autumnale. Labelled rectangles are proteins and small circles are metabolites. Black lines show the KEGG pathway; red lines show the BioGRID interactions emanating from the interactively selected protein (K12355). Right-clicking a protein displays its inferred function and a link to its sequence(s).

Abbreviations

     
  • 1KP: 1,000 Plants project
  • DE: Discovery Environment
  • KEGG: Kyoto Encyclopedia of Genes and Genomes
  • NSF: National Science Foundation
  • SRA: Sequence Read Archive

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

CWD, BRR, NWM, SWG, S Ma, BS, MM, DES, PSS, CR, LP, JAS, LD, DWS, JCV, TC, TMK, MR, RSB, MKD, and JLM collected the plant samples. NM, NJW, S Mi, NN, TW, SA, MB, JGB, MAG, EW, JPD, CWD, BR, HP, BRR, and JLM performed the phylogenomic analyses. NM, LHH, ZY, and EJC set up and maintained the web resources used to communicate data. LHH and RS performed the protein and KEGG pathway analyses. EJC, ZT, XW, XS, YZ, JW, and GKW generated the sequence data. GKW and JLM designed and oversaw the research. All authors read and approved the final manuscript.

Acknowledgments

The 1,000 Plants (1KP) initiative, led by GKW, is funded by the Alberta Ministry of Innovation and Advanced Education, Alberta Innovates Technology Futures (AITF), Innovates Centre of Research Excellence (iCORE), Musea Ventures, BGI-Shenzhen and China National GeneBank (CNGB). We thank the many people responsible for sample collection on 1KP and the staff at BGI-Shenzhen for doing our sequencing. Phylogenomic analyses were supported by the US National Science Foundation through the iPlant Collaborative. CANDO was funded by an NIH Director's Pioneer Award 1DP1OD006779-01.

References

1. Goff SA, Vaughn M, McKay S, Lyons E, Stapleton AE, Gessler D, Matasci N, Wang L, Hanlon M, Lenards A, Muir A, Merchant N, Lowry S, Mock S, Helmke M, Kubach A, Narro M, Hopkins N, Micklos D, Hilgert U, Gonzales M, Jordan C, Skidmore E, Dooley R, Cazes J, McLay R, Lu Z, Pasternak S, Koesterke L, Piel WH, et al. The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Front Plant Sci. 2011;2:34.
3. Johnson MT, Carpenter EJ, Tian Z, Bruskiewich R, Burris JN, Carrigan CT, Chase MW, Clarke ND, Covshoff S, dePamphilis CW, Edger PP, Goh F, Graham S, Greiner S, Hibberd JM, Jordon-Thaden I, Kutchan TM, Leebens-Mack J, Melkonian M, Miles N, Myburg H, Patterson J, Pires JC, Ralph P, Rolf M, Sage RF, Soltis D, Soltis P, Stevenson D, Stewart CN Jr, et al. Evaluating methods for isolating total RNA and predicting the success of sequencing phylogenetically diverse plant transcriptomes. PLoS One. 2012;7:e50226. doi:10.1371/journal.pone.0050226.
4. Xie Y, Wu G, Tang J, Luo R, Patterson J, Liu S, Huang W, He G, Gu S, Li S, Zhou X, Lam TW, Li Y, Xu X, Wong GK, Wang J. SOAPdenovo-Trans: De novo transcriptome assembly with short RNA-Seq reads. Bioinformatics. 2014;30:1660-1666. doi:10.1093/bioinformatics/btu077.
5. Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker M, Burleigh JG, Gitzendanner MA, Ruhfel B, Wafula E, Der JP, Graham SW, Mathews S, Melkonian M, Soltis DE, Soltis PS, Miles NW, Rothfels C, Pokorny L, Shaw AJ, deGironimo L, Stevenson DW, Surek B, Villarreal JC, Roure B, Philippe H, dePamphilis CW, Chen T. A phylotranscriptomics analysis of the origin and early diversification of land plants. Proc Natl Acad Sci U S A. In press.
7. iPlant User Registration.
9. iPlant Discovery Environment.
11. Matasci N, McKay SJ. Phylogenetic analysis with the iPlant discovery environment. Curr Protoc Bioinformatics. 2013;6:Unit 6.13.
13. iRODS Data Management Software.
15. Minie M, Chopra G, Sethi G, Horst J, White G, Roy A, Hatti K, Samudrala R. CANDO and the infinite drug discovery frontier. Drug Discov Today. 2014;19:1353-1363. doi:10.1016/j.drudis.2014.06.018.
16. BioGRID Interactions.
17. Kyoto Encyclopedia of Genes and Genomes (KEGG).
18. 1KP Protein-Protein Interactions Mapped to Metabolic Pathways.
21. Sayou C, Monniaux M, Nanao MH, Moyroud E, Brockington SF, Thévenon E, Chahtane H, Warthmann N, Melkonian M, Zhang Y, Wong GK, Weigel D, Parcy F, Dumas R. A promiscuous intermediate underlies the evolution of LEAFY DNA binding specificity. Science. 2014;343:645-648. doi:10.1126/science.1248229.
22. Klapoetke NC, Murata Y, Kim SS, Pulver SR, Birdsey-Benson A, Cho YK, Morimoto TK, Chuong AS, Carpenter EJ, Tian Z, Wang J, Xie Y, Yan Z, Zhang Y, Chow BY, Surek B, Melkonian M, Jayaraman V, Constantine-Paton M, Wong GK, Boyden ES. Independent optical excitation of distinct neural populations. Nat Methods. 2014;11:338-346. doi:10.1038/nmeth.2836.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.