ApicoTFdb: The comprehensive web repository of apicomplexan transcription factors and regulators

Despite significant progress in apicomplexan genome sequencing and genomics, the current list of experimentally validated TFs in these genomes is incomplete and consists mainly of the AP2 family of proteins, with only a limited number of non-AP2 family TFs and TAFs. We have performed a systematic, bioinformatics-aided prediction of TFs and TAFs in apicomplexan genomes and developed the ApicoTFdb database, which contains experimentally validated as well as computationally predicted TFs and TAFs in 14 apicomplexan species. The predicted TFs are manually curated to complement the existing annotations. The current version of the database includes 1310 TFs, of which 833 are novel, computationally predicted TFs, representing 22 distinct families across 14 apicomplexan species. The predictions include TFs of the TUB, NAC, BSD, CCAAT, HTH, Cupin/Jumonji, winged-helix, and FHA families, not reported earlier in these genomes. Apart from TFs, ApicoTFdb also classifies TAFs into three main subclasses (TRs, CRRs and RNARs), covering the 3047 TAFs in 14 apicomplexan species analyzed in this study. The database is equipped with a set of tools for comparative analysis of a user-defined list of proteins. ApicoTFdb will be useful to researchers interested in the less-studied gene regulatory mechanisms mediating the complex life cycle of apicomplexan parasites. The database will aid the discovery of novel drug targets, which are much needed to combat the growing drug resistance in the parasites.


Transcription regulation is a key process that facilitates cellular responses to different environmental conditions. The underlying transcriptional regulatory machinery is more complex in eukaryotes than in prokaryotes due to [...] TFs can bind to specific DNA sequences upstream of promoter regions, controlling the rate of transcription and thus the transfer of genetic information [4,5]. Here, the sequence diversity among DBDs also ensures precise regulation of various cellular processes in response to external and internal perturbations [6]. In fact, even in well-annotated organisms, numerous TFs have obscure DNA binding sequences which can still direct complex transcription regulation [7]. In addition, transcription co-factors such as chromatin remodeling factors also control the direction of gene regulation by assisting general TFs [8]. Moreover, even for the best-examined classes of DBDs, the diversity in the proteins as well as in their recognition sequences makes precise prediction of the regulators a challenging task [9]. [...] that otherwise remain obscure with existing resources. We believe that the presented database will be extremely useful for the scientific community interested in deducing the regulatory molecules and the mechanisms that govern the complex life cycle of any of these 14 parasites. ApicoTFdb is freely accessible at http://bioinfo.icgeb.res.in/PtDB/.

The protein sequences for TF identification across 14 apicomplexan species (see Table 1) were retrieved from PlasmoDB ( [...] In order to classify a given protein sequence into a TF family, we exploited its DBD profiles using the methods mentioned before. For a given protein, we independently obtained its domain information, predicted with InterProScan 5, and its GO-based biological function, if available. We performed careful manual curation of each sequence, assigning a TF family to it on the basis of conserved DBDs and by integrating the above-mentioned sources of information. Since a protein sequence may contain more than one DBD, for such proteins we assigned the TF family on [...]
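The domain-based family assignment described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the domain labels, the mapping table, the protein identifiers and the function name `assign_tf_families` are all hypothetical, and because the rule applied to proteins with several DBDs is not recoverable from the text, the sketch simply flags such proteins for manual curation.

```python
from collections import defaultdict

# Hypothetical mapping from DBD labels to TF families (illustrative only;
# the family names appear in the text, the domain labels are assumptions).
DBD_TO_FAMILY = {
    "AP2": "AP2",
    "Tub": "TUB",
    "BSD": "BSD",
    "FHA": "FHA",
}


def assign_tf_families(domain_hits):
    """Propose a TF family per protein from its domain hits.

    domain_hits: iterable of (protein_id, domain_label) pairs, for example
    parsed from a domain-annotation table.
    Returns {protein_id: (family_or_None, needs_manual_review)}.
    """
    families = defaultdict(set)
    for protein_id, domain in domain_hits:
        found = families[protein_id]  # creates an empty set on first sight
        family = DBD_TO_FAMILY.get(domain)
        if family is not None:
            found.add(family)

    result = {}
    for protein_id, found in families.items():
        if len(found) == 1:
            # Exactly one candidate family: accept it.
            result[protein_id] = (next(iter(found)), False)
        elif not found:
            # No recognised DBD: not assigned to any TF family.
            result[protein_id] = (None, False)
        else:
            # Several candidate families: defer to a curator (assumption).
            result[protein_id] = (None, True)
    return result


# Hypothetical protein identifiers, for illustration only.
hits = [("p1", "AP2"), ("p1", "AP2"), ("p2", "Tub"), ("p2", "FHA"), ("p3", "Kinase")]
assignments = assign_tf_families(hits)
```

In this sketch a protein with repeated hits to the same DBD (`p1`) is assigned directly, whereas a protein carrying DBDs from two families (`p2`) is left unassigned and marked for review.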

Using the in silico approach, we predicted and classified the TFs and TAFs for 14 apicomplexan species. ApicoTFdb thus provides a unique platform to analyze several new classes of TFs/TAFs not reported earlier in the parasite genomes. [...] [Table S1]. The results include 578 Plasmodium spp. TFs, 77 Cryptosporidium spp. TFs, and overall 648 TFs from Toxoplasma spp., Eimeria spp., Cyclospora spp., Neospora spp., and Babesia spp., as shown in Table 1. [...] Within our predicted set of TFs, we observed a large number of validated TFs (Table 1). For instance, out of the 28 known P. falciparum TFs, our pipeline predicted 23. Since we were able to recover the majority of known TFs, we extended our analysis to classify this list of TFs according to their respective domains.
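As a worked check of the P. falciparum figures quoted above, the fraction of known TFs recovered by the prediction can be computed directly. The helper name `recovery_rate` is an illustrative assumption and not part of ApicoTFdb.

```python
def recovery_rate(recovered: int, known: int) -> float:
    """Fraction of known TFs that are also present in the predicted set."""
    if known <= 0:
        raise ValueError("known must be a positive count")
    return recovered / known


# Counts taken from the text: 23 of the 28 known P. falciparum TFs.
rate = recovery_rate(23, 28)
print(f"{rate:.1%}")  # 82.1%
```

That is, roughly 82% of the known P. falciparum TFs are recovered.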

Genome wide analysis of transcription factors in apicomplexans
Using the TF prediction pipeline, we assigned functions to 279 proteins previously annotated as hypothetical, uncharacterized, unspecified product, or conserved proteins with unknown function, placing them in putative TF classes according to their DBDs [Table S2].

Interestingly, the analysis also identified TFs of 9 families: TUB, [...] knowlesi. To the best of our knowledge, this is the first attempt to classify TAFs into TRs, [...]

Web interface and annotations in ApicoTFdb
The ApicoTFdb project is presented as a user-friendly web interface for retrieving information on apicomplexan TFs and TAFs, as shown in Figure 3. The ApicoTFdb database project is organized according to the above-described [...] Only a limited number of databases provide information on predicted as well as experimentally validated apicomplexan TFs and TAFs, e.g. DBD and CISBP.

Though the database is useful, we observed that the information available in CISBP [...]

Table 1.