Abstract

Chemoreception is a biological process essential for the survival of animals, as it allows the recognition of important volatile cues for the detection of food, egg-laying substrates, mates, or predators, among other purposes. Furthermore, its role in pheromone detection may contribute to evolutionary processes, such as reproductive isolation and speciation. This key role in several vital biological processes makes chemoreception a particularly interesting system for studying the role of natural selection in molecular adaptation. Two major gene families are involved in the perireceptor events of the chemosensory system: the odorant-binding protein (OBP) and chemosensory protein (CSP) families. Here, we have conducted an exhaustive comparative genomic analysis of these gene families in 20 Arthropoda species. We show that the evolution of the OBP and CSP gene families is highly dynamic, with a high number of gains and losses of genes, pseudogenes, and independent origins of subfamilies. Taken together, our data clearly support the birth-and-death model for the evolution of these gene families with an overall high gene turnover rate. Moreover, we show that the genome organization of the two families is significantly more clustered than expected by chance and, more important, that this pattern appears to be actively maintained across the Drosophila phylogeny. Finally, we suggest the homologous nature of the OBP and CSP gene families, dating back their most recent common ancestor after the terrestrialization of Arthropoda (380--450 Ma) and we propose a scenario for the origin and diversification of these families.

Introduction

Chemoreception is a widely used mechanism across animal species for perception of the surrounding environment, from communication between conspecifics to detection of predators and location of food or hosts, playing a critical role in an organism’s fitness (Krieger and Ross 2002; Matsuo et al. 2007; Asahina et al. 2008; Whiteman and Pierce 2008; Smadja and Butlin 2009). Moreover, its role in reproduction may contribute to a number of evolutionary processes, such as reproductive isolation and speciation. Thus, understanding the evolution of genes involved in sensorial perception may provide valuable insight into the role of natural selection in molecular adaptation.

The first step in the recognition of chemical signals (peripheral events) is accomplished by binding and membrane receptor proteins that recognize external ligands and translate this interaction into an electrical signal to the central nervous system. In the Insecta, there are three different types of chemosensory receptors, the odorant (OR), the gustatory (GR), and the ionotropic (IR) receptors, which are located in the dendritic membrane of chemosensory neurons (Kaupp 2010). The dendrites of these neurons are positioned inside the sensilla, which are hair-like hollow structures that is filled with an aqueous fluid, the sensillar lymph. The chemical signals enter the sensilla lumen through the sensilla pores of the chitin wall, diffuse through the lymph, and activate the receptors (for a review, see Sanchez-Gracia et al. 2009). The sensillar lymph is secreted by nonneuronal support cells and contains a variety of proteins, including the odorant-binding proteins (OBPs) and chemosensory proteins (CSPs) (Vogt and Riddiford 1981; Steinbrecht 1998). These proteins are small (10 to 30 kDa), globular and highly abundant water-soluble proteins, characterized by a specific domain of six α-helices, joined by either two or three disulfide bonds (Leal et al. 1999; Tegoni et al. 2004). Although the full range of functions of these molecules has not been well established, there is increasing evidence of their importance in chemosensory perception (Pophof 2004; Xu et al. 2005; Grosse-Wilde et al. 2006; Matsuo et al. 2007). Most likely, OBP and CSP proteins are involved in the solubilization and transport of odorants, which are generally hydrophobic (Kaissling 2001; Leal et al. 2005). Recent studies, however, have revealed that OBP and CSP genes are not restricted to the olfactory tissues and may, in fact, participate in other physiological functions (Kaissling 2001; Graham et al. 2003; Pophof 2004; Findlay et al. 2008) (for a review, see Pelosi et al. 2006). Despite carrying out a similar physiological role, vertebrate OBPs are not homologous to their insect counterparts and actually differ in structure and size (Pelosi and Maida 1990). In fact, these genes belong to a large superfamily of carrier proteins, the lipocalins, that usually consist of a β-barrel structure and a carboxy-terminal α-helix (Flower 1996).

Comprehensive analysis of the complete genome sequences of Drosophila and a number of other insects (Anopheles gambiae, Bombyx mori, Triboliumcastaneum, and Apis mellifera) has revealed that the OBP and CSP gene repertoires differ markedly across species. In fact, the OBP family comprises from 21 (in A. mellifera) to 66 genes (in A. gambiae), whereas the CSP gene family ranges from 3 members (in Drosophila) to 20 (in T. castaneum) (Foret and Maleszka 2006; Foret et al. 2007; Gong et al. 2007, 2009; Vieira et al. 2007; Kirkness et al. 2010). Interestingly, these genes are unevenly distributed throughout the genome, with many of them (69% of the OBP genes in Drosophila) being arranged in small clusters (from 2 to 6 OBP genes) (Vieira et al. 2007). The Drosophila OBP gene family has been classified into several phylogenetic subfamilies on the basis of distinctive structural features, functional information, and phylogenetic relationships: the Classic, Minus-C, Plus-C, Dimer, PBP/GOBP, ABPI and ABPII, CRLBP, and D7 subfamilies (Hekmat-Scafe et al. 2002; Valenzuela et al. 2002; Vieira et al. 2007; Gong et al. 2009; Kirkness et al. 2010). Interestingly, these subfamilies are unequally distributed across arthropods, even among the dipterans, and they are totally absent in some species. In contrast, the CSP gene family is much more conserved across insects, without distinctive phylogenetic clades. It has been suggested that the OBP and CSP gene families may have shared a most recent common ancestor (MRCA) near the origin of the arthropods, though the evidence for this is controversial (Pelosi et al. 2005; Zhou et al. 2006).

In the present study, we used the complete genome sequence data from 20 Arthropoda species to conduct a fine and exhaustive comparative genomic analysis of the OBP and CSP gene families. In particular, we aimed to gain insights into the origin and evolutionary fate of OBP and CSP duplicates and to determine their role in the adaptive process. Our exhaustive analysis allowed us to identify new genes and several gene contractions and expansions in different lineages. Interestingly, we also identified two OBP genes that are present in almost all the analyzed species, indicating a putative critical role in chemosensation. Overall, our results are clearly consistent with the birth-and-death (BD) evolutionary model (Nei and Rooney 2005), with estimates for the birth (β) and death (δ) rates of β = 0.0049 and δ = 0.0010 for OBP and β = 0.0028 and δ = 0.0007 for CSP. We also found that the organization of the members of these gene families into clusters is not a by-product of their tandem origin but, instead, is actively maintained by natural selection. Finally, we point to the homologous nature of the OBP and CSP gene families, estimating their MRCA to have occurred 380–450 Ma, and we propose a scenario for the origin and diversification of these two families.

Materials and Methods

Genomic Data

Genome sequence data and gene annotations were downloaded from public data repositories: Drosophilidae (release FB2008_08) from FlyBase (Drysdale 2008), A. gambiae (release AgamP3.46) from Ensembl (Flicek et al. 2008), B. mori (release April 2008) from SilkDB (Wang et al. 2005), T. castaneum (release V3.0) from BeetleBase (Wang et al. 2007), A. mellifera (release 4.0) from National Center for Biotechnology Information (ftp://ftp.ncbi.nih.gov/genomes), Pediculus humanus (release PhumU1.1), and Ixodes scapularis (release IscaW1.1) from VectorBase (Lawson et al. 2007), Acyrthosiphon pisum (release June 2008) from AphidBase (http://www.aphidbase.com), and Daphnia pulex (release jgi060905) from wFleaBase (http://iubio.bio.indiana.edu/daphnia).

Gene Identification

We identified putative OBP and CSP members through several rounds of exhaustive searches using information from already known OBP and CSP proteins as queries (Foret and Maleszka 2006; Foret et al. 2007; Gong et al. 2007, 2009; Vieira et al. 2007; Flicek et al. 2008; Zhou et al. 2010). First, we searched the preliminary predicted gene set using BlastP (Altschul et al. 1997) (BLOSUM45 matrix with an e value threshold of 10−5), HMMER (http://hmmer.wustl.edu/) (e value domain threshold of 10−5), and HHsearch (Soding 2005) (e value threshold of 10−5). The HMMER and HHsearch searches used PFAM (Finn et al. 2006), PBP/GOBP (for OBP; PF01395), and OS-D (for CSP; PF03392) HMM profiles. Furthermore, because OBP family members are highly divergent, we also built four extra custom HMM profiles (used in all HMMER and HHsearch searches). We built these profiles after clustering all known OBP protein sequences (only D. melanogaster and D. mojavensis from the Drosophila genus) with BlastClust (ftp://ftp.ncbi.nih.gov/genomes) (e value threshold of 10−5, length coverage “-L” of 0.5 and score density “-S” of 0.6). We selected the four clusters with the highest numbers of sequences, aligned the clusters separately with MAFFT (Katoh et al. 2005) (E-INS-i with BLOSUM30 matrix, 10,000 maxiterate, and offset “0”) and, for each cluster, built an HMM profile using HMMER. Because HHsearch only makes comparisons between HMM profiles, it was necessary to transform the proteome of each species into a set of HMM profiles. For this, we clustered the proteomes for each species separately with BlastClust and built an HMM profile from each cluster separately. We followed a similar HMM profile-building approach as described above, with the exception of the BlastClust parameters (e value threshold of 10−6, length coverage -L of 0.7 and score density -S of 1.0). All profiles used by HHsearch included secondary structure information predicted with PSIPRED (McGuffin et al. 2000). Second, we searched the raw DNA sequence data using TBlastN (BLOSUM45 with e value threshold of 10−3), EXONERATE (Slater and Birney 2005) (50% of the maximum store threshold), and HMMER (e value domain threshold of 10−10). For the latter analysis, we searched against all 6-frames using PFAM’s and our four custom HMM profiles as queries. All searches were performed exhaustively until no new hit was found, adding always all newly identified members to the queries.

We manually checked all putative positive hits, specifically looking for the presence of a signal peptide (predicted by PrediSi [Hiller et al. 2004]), the characteristic “cysteine domain” (Pelosi et al. 2006; Vieira et al. 2007) and a secondary structure including six α-helices (predicted by PSIPRED [McGuffin et al. 2000]). We used the Artemis (Rutherford et al. 2000) genome annotator with the putative splice sites predicted by Genesplicer (Pertea et al. 2001) to assist with the annotation process.

Gene Clustering Analysis

We have tested, by computer simulations, whether OBP or CSP genes are actually physically closer in the chromosomes than expected by chance. This analysis was conducted separately for each species and gene family (either OBP or CSP), excluding species with poorly assembled genomes or families with less than ten members. Specifically, we computed for each genome a statistic based on the average physical distance (in base pairs) between neighboring genes (within a given chromosome). This observed value was contrasted against the null empirical distribution of this statistic generated by computer simulations (based on 10,000 replicates). In each replicate, we randomly chose a fixed number of genes (the same number than that observed OBP or CSP members in a particular genome) and calculated the statistic (table 2).

To try to gain insight into the biological meaning of such chromosome clusters, we analyzed whether the observed OBP clusters are more conserved across the phylogeny than expected by chance. The analysis was conducted using the MCMuSeC algorithm (Ling et al. 2009) that examines, using the “gene teams” model (Luc et al. 2003), the distribution pattern of gene clusters across the phylogeny. The method uses as statistic the branch length score (BLS) to measure the evolutionary time (the total lengths of the phylogenetic tree) where the gene cluster is conserved. Therefore, the longer the BLS value the more likely it will be under functional constraint. For such analysis, we used the cluster definition as in Vieira et al. (2007). The statistical significance of the test was obtained by comparing the observed BLS value (for each OBP cluster) against the null empirical distribution of the same cluster size generated by computer simulations (based on 1,000,000 replicates).

Phylogenetic Analysis

We performed a phylogenetic analysis including all complete OBP and CSP genes and partial coding sequences with more than 85 and 78 amino acids, respectively (the size of the smallest full coding sequences in each gene family). Because the signal peptide portion of OBPs has a high substitution rate, we removed these regions (identified using the PrediSi program [Hiller et al. 2004]) before conducting the analyses. The protein sequences were multiply aligned using MAFFT v6.624b (Katoh et al. 2005) (E-INS-i with BLOSUM30 matrix, 10,000 maxiterate and offset 0). We estimated the phylogenetic relationships by maximum likelihood using the software RAxML v7.2.3 (Stamatakis 2006), assuming the WAG evolutionary model (Whelan and Goldman 2001) and fixing the amino acid frequencies (-f d -e 0.0001 -d -N 30 -m PROTGAMMAWAG). The genetic distances (number of amino acid changes per site) were estimated using MEGA software (Tamura et al. 2007) with the pairwise deletion option and assuming the Jones, Taylor, and Thorton evolutionary model (Jones et al. 1992).

We inferred the OBP and CSP orthology groups using the OrthoMCL software (inflation of 1.5 and e value threshold of 10−5), which is based on reciprocal best hits within and between proteomes. These orthology relationships were used to estimate the OBP and CSP birth (β) and death (δ) rates (events per gene and per million years) by maximum likelihood (Librado P, Vieira FG, Rozas J, unpublished results) using the divergence times from Tamura et al. (2004) and Hedges et al. (2006). Briefly, for each orthology group, we inferred the number of genes in each internal node using those numbers in extant species and the phylogenetic branch lengths. This information allow us to further estimate the number of gene gain and loss events in each phylogenetic branch and the global birth and death rates following equations (1) and (2) in Vieira et al. (2007). The half-life for a gene to be lost from the genome (t½) was estimated assuming that the death rate follows an exponential decay curve. In particular, forumla

For all analyses, customized in-house scripts were written in Perl, with extra modules from BioPerl (Stajich et al. 2002); these scripts are available upon request.

Results

Identification and Characterization of OBP and CSP Genes

We performed an exhaustive and manually curated search that allowed us to identify the complete set of putative functional OBP and CSP genes across the 20 Arthropoda species analyzed (fig. 1 supplementary data 1 and 2), improving currently published data. In addition, we also found some scattered fragments that likely correspond to incomplete sequences and pseudogenes (table 1). Almost all the identified genes have the characteristic hallmarks of the OBP and CSP gene families: the signal peptide, the six α-helix pattern, and the highly conserved cysteine profile. However, despite the highly conserved secondary structure of OBP proteins, the OBP family members are highly divergent (average per-site amino acid divergence of d = 2.99; overall sequence identity of 16.71%), exhibiting a wide range of protein lengths (from 85 to 329 amino acids) and cysteine profiles. The CSP gene family shows lower divergence values (d = 1.51, with overall identity of 34.04%), with the four cysteine profiles (forming the two disulfide bridges) being completely conserved and exhibiting fairly constant gene lengths (60% of the mature proteins have lengths between 97 and 119 amino acids).

Table 1

Number of OBP and CSP Genes in the Arthropoda Species Analyzed

 OBP
 
CSP
 
 Putative Functional
 
Pseudogenes Putative Functional
 
Pseudogenes 
 Complete Sequence Low Coverage Sequencea Complete Sequence Low Coverage Sequencea 
Dmel 52 
Dsim 52 
Dsec 51 
Dyak 55 
Dere 50 
Dana 50 
Dpse 45 
Dper 45 
Dwil 62 
Dmoj 43 
Dvir 41 
Dgri 46 
Agam 81 (66) 2 (0) 8 (7) 
Bmor 43 (44) 3 (0) 1 (0) 19 (16) 3 (2) 2 (0) 
Tcas 49 (46) 1 (0) 19 (20) 1 (0) 
Amel 21 
Phum 
Apis 14 (11) 1 (0) 10 2 (3) 1 (0) 
Dpul 
Isca 
 OBP
 
CSP
 
 Putative Functional
 
Pseudogenes Putative Functional
 
Pseudogenes 
 Complete Sequence Low Coverage Sequencea Complete Sequence Low Coverage Sequencea 
Dmel 52 
Dsim 52 
Dsec 51 
Dyak 55 
Dere 50 
Dana 50 
Dpse 45 
Dper 45 
Dwil 62 
Dmoj 43 
Dvir 41 
Dgri 46 
Agam 81 (66) 2 (0) 8 (7) 
Bmor 43 (44) 3 (0) 1 (0) 19 (16) 3 (2) 2 (0) 
Tcas 49 (46) 1 (0) 19 (20) 1 (0) 
Amel 21 
Phum 
Apis 14 (11) 1 (0) 10 2 (3) 1 (0) 
Dpul 
Isca 

NOTE.—The four-letter code used for the species is: Drosophila melanogaster (Dmel), D. simulans (Dsim), D. sechellia (Dsec), D. erecta (Dere), D. yakuba (Dyak), D. ananassae (Dana), D. pseudoobscura (Dpse), D. persimilis (Dper), D. willistoni (Dwil), D. mojavensis (Dmoj), D. virilis (Dvir) and D. grimshawi (Dgri), Anopheles gambiae (Agam), Bombyx mori (Bmor), Tribolium castaneum (Tcas), Apis mellifera (Amel), Pediculus humanus (Phum), Acyrthosiphon pisum (Apis), Daphnia pulex (Dpul), and Ixodes scapularis (Isca). The numbers of the OBP and CSP genes reported in previous works are given in parenthesis (only in cases with discrepancies).

a

Genes with truncated coding sequences due to incomplete genome assembly.

FIG. 1.—

Accepted tree topology for the Arthropoda species surveyed. Blue shadowed boxes depict an aquatic environment. Divergence times are given in millions of years (Tamura et al. 2004; Hedges et al. 2006). Right: number of members of the OBP and CSP gene families classified into subfamilies and the presence of the OR and GR gene families.

FIG. 1.—

Accepted tree topology for the Arthropoda species surveyed. Blue shadowed boxes depict an aquatic environment. Divergence times are given in millions of years (Tamura et al. 2004; Hedges et al. 2006). Right: number of members of the OBP and CSP gene families classified into subfamilies and the presence of the OR and GR gene families.

In spite of the intensive analyses that have been previously conducted in D. melanogaster, our HMM-based searches allowed the identification of a new OBP gene (Obp73a). It is likely that the high divergence of this gene from the other OBP members prevented its previous identification by similarity based methods. Interestingly, this gene has a 1:1 orthology not only in the 12 Drosophila genomes but also in almost all insect species analyzed (except in Hymenoptera). In fact, there are only two OBP members with clear orthology relationships across insects: Obp73a and Obp59a (Zhou et al. 2010). This high conservation across a large number of arthropod species suggests a critical function for these proteins.

Chromosomal Organization

We studied the evolutionary meaning of the organization in chromosome clusters of the OBP and CSP genes. We have found that within species OBP and CSP genes are physically closer in the genome (significantly clustered) than expected by chance (P < 0.0064 and P < 0.0008, respectively) (table 2). In contrast, the OR and GR gene families of D. melanogaster, which have a similar number of genes to OBP, are more scattered across the genome (Robertson et al. 2003) and do not exhibit such clear structuring (P = 0.194 and P = 0.023 for OR and GR, respectively).

Table 2

P Values of the Chromosomal Clusters Analysis

 OBP
 
CSP
 
 Observed Distance Average Distance P Value Observed Distance Average Distance P Value 
Dmel 1322846.7 2026501.8 <0.0001 — — —a 
Dsim 1268476.5 2006452.9 <0.0001 — — —a 
Dsec 719946.2 1478286.5 <0.0001 — — —a 
Dyak 1229276.6 1993428.8 <0.0001 — — —a 
Dere 1271164.5 2057290.3 0.0004 — — —a 
Dana 1245187.1 1982158.1 0.0004 — — —a 
Dpse 1102439.9 2146580.2 <0.0001 — — —a 
Dper 331607.1 1467330.8 <0.0001 — — —a 
Dwil 306109.5 1680312.9 <0.0001 — — —a 
Dmoj 1528514.0 2917407.1 <0.0001 — — —a 
Dvir 774464.4 2483148.4 <0.0001 — — —a 
Dgri 1112257.6 2172345.2 <0.0001 — — —a 
Agam 2478428.4 3020631.7 0.0064 — — —a 
Bmor 124328.2 2220702.0 0.001 33582.7 1835740.8 0.0008 
Tcas 2267791.0 3736858.6 0.0052 832475.9 5529128.4 <0.0001 
Amel — — —b — — —a 
Phum — — —a — — —a 
Apis — — —b — — —b 
Dpul — — —a — — —a 
Isca — — —a — — —a 
 OBP
 
CSP
 
 Observed Distance Average Distance P Value Observed Distance Average Distance P Value 
Dmel 1322846.7 2026501.8 <0.0001 — — —a 
Dsim 1268476.5 2006452.9 <0.0001 — — —a 
Dsec 719946.2 1478286.5 <0.0001 — — —a 
Dyak 1229276.6 1993428.8 <0.0001 — — —a 
Dere 1271164.5 2057290.3 0.0004 — — —a 
Dana 1245187.1 1982158.1 0.0004 — — —a 
Dpse 1102439.9 2146580.2 <0.0001 — — —a 
Dper 331607.1 1467330.8 <0.0001 — — —a 
Dwil 306109.5 1680312.9 <0.0001 — — —a 
Dmoj 1528514.0 2917407.1 <0.0001 — — —a 
Dvir 774464.4 2483148.4 <0.0001 — — —a 
Dgri 1112257.6 2172345.2 <0.0001 — — —a 
Agam 2478428.4 3020631.7 0.0064 — — —a 
Bmor 124328.2 2220702.0 0.001 33582.7 1835740.8 0.0008 
Tcas 2267791.0 3736858.6 0.0052 832475.9 5529128.4 <0.0001 
Amel — — —b — — —a 
Phum — — —a — — —a 
Apis — — —b — — —b 
Dpul — — —a — — —a 
Isca — — —a — — —a 

NOTE.—P values calculated by computer simulations. The species four-letter code is as in table 1.

a

Species not analyzed for having less than ten gene members.

b

Species not analyzed for having a fragmented genome (probably due to poor coverage or assembling).

These chromosomes clusters, however, could be just a consequence of the origin of genes by tandem gene duplication rather than having some functional significance. To gain insight into the functional meaning of this clustering, we have analyzed whether these clusters have been maintained throughout evolution despite of the breaks produced by inevitable chromosomal rearrangements. This analysis was conducted using only OBP data from the 12 Drosophila species because there are few orthologous clusters among species sharing large divergence times. Our results show that OBP genes are significantly clustered across the Drosophila evolution (P < 0.033), suggesting the existence of some functional constraints maintaining the clusters (Quijano et al. 2008).

Phylogenetic Analysis

Our phylogenetic analysis shows that the evolution of OBP and CSP gene families is highly dynamic, though to a lesser degree in the CSP gene family, exhibiting a number of taxa-specific subfamilies, several branch-specific expansions, and almost no groups of orthologous genes shared across Arthropoda (figs. 2–4).

FIG. 2.—

OBP orthologous groups shared across species. Venn diagrams indicate the inferred number of groups of orthologous genes (OG) shared among different insect species. (A) Drosophila, (B) Diptera, (C) Diptera and Lepidoptera, (D) Diptera, Lepidoptera, and Coleoptera, (E) Endopterygota, and (F) Hexapoda.

FIG. 2.—

OBP orthologous groups shared across species. Venn diagrams indicate the inferred number of groups of orthologous genes (OG) shared among different insect species. (A) Drosophila, (B) Diptera, (C) Diptera and Lepidoptera, (D) Diptera, Lepidoptera, and Coleoptera, (E) Endopterygota, and (F) Hexapoda.

FIG. 3.—

Phylogenetic relationships of the OBP proteins. Unrooted phylogenetic tree of OBP protein sequences from Drosophila melanogaster and D. mojavensis (red branches), Anopheles gambiae (blue branches), Bombyx mori (brown branches), Tribolium castaneum (green branches), Apis mellifera (orange branches), Pediculus humanus (pink branches), and Acyrthosyphon pisum (cyan branches). Inner and outer rings indicate phylogenetic subfamilies (Classic in black, Minus-C in green, Plus-C in blue, Dimer in red, D7 in yellow, ABPI in cyan, ABPII in gray, and PBP/GOBP in pink) and the secondary structure information (box: α-helix; arrow: β-sheet), respectively. The scale bar represents 1 amino acid substitution per site. The image was created using the iTOL web server (Letunic and Bork 2007).

FIG. 3.—

Phylogenetic relationships of the OBP proteins. Unrooted phylogenetic tree of OBP protein sequences from Drosophila melanogaster and D. mojavensis (red branches), Anopheles gambiae (blue branches), Bombyx mori (brown branches), Tribolium castaneum (green branches), Apis mellifera (orange branches), Pediculus humanus (pink branches), and Acyrthosyphon pisum (cyan branches). Inner and outer rings indicate phylogenetic subfamilies (Classic in black, Minus-C in green, Plus-C in blue, Dimer in red, D7 in yellow, ABPI in cyan, ABPII in gray, and PBP/GOBP in pink) and the secondary structure information (box: α-helix; arrow: β-sheet), respectively. The scale bar represents 1 amino acid substitution per site. The image was created using the iTOL web server (Letunic and Bork 2007).

FIG. 4.—

Phylogenetic relationships of the CSP proteins. Unrooted phylogenetic tree of CSP protein sequences from Drosophila melanogaster and D. mojavensis (red branches), Anopheles gambiae (blue branches), Bombyx mori (brown branches), Tribolium castaneum (green branches), Apis mellifera (orange branches), Pediculus humanus (pink branches), Acyrthosyphon pisum (cyan branches), and Daphnia pulex (black lines). Outer ring indicates the secondary structure information (box: α-helix; arrow: β-sheet). The scale bar represents 1 amino acid substitution per site. The tree was displayed using the iTOL web server (Letunic and Bork 2007).

FIG. 4.—

Phylogenetic relationships of the CSP proteins. Unrooted phylogenetic tree of CSP protein sequences from Drosophila melanogaster and D. mojavensis (red branches), Anopheles gambiae (blue branches), Bombyx mori (brown branches), Tribolium castaneum (green branches), Apis mellifera (orange branches), Pediculus humanus (pink branches), Acyrthosyphon pisum (cyan branches), and Daphnia pulex (black lines). Outer ring indicates the secondary structure information (box: α-helix; arrow: β-sheet). The scale bar represents 1 amino acid substitution per site. The tree was displayed using the iTOL web server (Letunic and Bork 2007).

The Drosophila OBP gene family has been classified into several groups on the basis of distinctive structural features, functional information, and phylogenetic relationships: the Classic, Minus-C, Plus-C, Dimer, PBP/GOBP, ABPI and ABPII (formerly known as ABPX), CRLBP, and D7 subfamilies (Hekmat-Scafe et al. 2002; Valenzuela et al. 2002; Vieira et al. 2007; Gong et al. 2009). The atypical subfamily, which has so far been identified only in mosquitoes (Xu et al. 2003; Zhou et al. 2008), is in fact a Dimer OBP clade (supplementary fig. 1, Supplementary Material online). These proteins have a double domain profile that most likely originated from a fusion of two Classic OBP genes. Our results show that the basal OBP group seems to be the Classic, whereas all other groups are internal clades of the Classic subfamily which is, in fact, paraphyletic (fig. 3). The Plus-C subfamily, present in all Hexapoda species, has been lost in the Hymenoptera. Interestingly, some subgroups of the Classic subfamily, such as Dimer, Minus-C, and CRLBP, appear to have had independent origins. The Dimer OBP originated independently in the Culicidae and Drosophilidae lineages, the Minus-C appeared in the Drosophilidae, Bombyx/Tribolium, and Apis lineages, whereas the CRLBP members are highly scattered across the tree and appear to lack any phylogenetic meaning (throughout this study we have included CRLBPs within the Classic subfamily). Furthermore, we also identified in A. gambiae a putative new OBP member (AgamOBP78) of the D7 subfamily, a widespread subfamily in blood-sucking Diptera (Valenzuela et al. 2002).

The CSP gene family consistently has fewer members than the OBP family, exhibiting only two lineage-specific expansions (in B. mori and T. castaneum; fig. 4). The genes in this family also exhibit lower genetic distances, although its members are present across all Arthropoda species, including Crustacea (D. pulex) and Chelicerata (I. scapularis). Overall, the CSP gene family has an evolutionary pattern that is less dynamic than the OBP family, with fewer and more conserved members that are not grouped into distinctive phylogenetic clades.

We observed that the number of groups of orthologous genes that are shared among different species quickly decreases with increasing divergence time (fig. 2). For example, the number of groups of orthologous genes ranges from 34 OBP and 3 CSP within the genus Drosophila to 2 OBP and 2 CSP across Hexapoda, and no OBP nor CSP groups shared across all the Arthropoda. Noticeably, only two OBP genes have orthologs across all insects except in Hymenoptera: Obp59a and Obp73a.

Despite the high divergence that is seen among paralogs, some genes have unexpected features that may indicate important functions or, alternatively, that may be the result of misannotation. For instance, the Obp59a gene has an unusually long sequence and a unique cysteine pattern. BmorOBP41, a Plus-C subfamily member, has a pattern of cysteine residues that is unusual for this family (fig. 3). Furthermore, we also identified three CSP genes (TcasCSP6, ApisCSP1, and ApisCSP9) with a markedly different secondary structure (fig. 4).

Common Origin of OBP and CSP

The common origin of the OBP and CSP gene families is a controversial issue (Pelosi et al. 2005; Zhou et al. 2006). To attempt to detect a putative remote homology between the OBP and CSP gene families, we performed a series of similarity searches using different approaches. With a standard BlastP-based search (e value threshold < 1), we did not detect any significant similarity. Using more powerful approaches, like HMM-based analyses (HMMER software), together with PFAM (Finn et al. 2006) and our four specific custom profiles (see Materials and Methods) allowed us to detect some slight indications of sequence similarity between the PFAM CSP profile (OS-D: PF03392) and the OBP TcasOBP16 (e value of 0.0049), but the analysis also detected some false positives (data not shown). Because the degree of functional constraint on the tertiary structure of proteins is probably higher than their primary structure, we studied the similarity among OBP and CSP protein structures to gain insight into their putative remote homology. For that, we generated rigid structural alignments using FATCAT (Ye and Godzik 2004) between all OBP and CSP proteins present in the RCSB Protein Data Bank (www.pdb.org) (Berman et al. 2000). We found that the majority of OBP-CSP structure alignments are statistically significant (P = 0.0089 for the lowest P value) (table 3; fig. 5). Moreover, using OBP and CSP protein sequences as a query in additional BlastP searches against all PDB sequences, we detect no proteins (other than OBP and CSP) with significant structural similarity (on the top scoring 10 hits).

Table 3

Structural Alignments between OBP and CSP Proteins

 OBP
 
CSP
 
 BmorGOBP2 ApolPBP1 AgamD7r4 BmorPBP AgamOBP1 AmelASP2 BmorPBP Lush BmorCSP1 SgreCSP4 MbraCSP2 MbraCSPa6 
 2WC5 2JPO 2QEV 2P70 2ERB 1TUJ 1GM0 1T14 2JNT 2GVS 1K19 1KX9 
2WC5  1.66 × 10−07 1.39 × 10−05 2.18 × 10−14 5.05 × 10−08 3.39 × 10−06 8.51 × 10−06 1.21 × 10−07 NS NS NS 2.98 × 10−02 
2JPO   5.41 × 10−04 1.29 × 10−08 2.12 × 10−05 6.04 × 10−05 1.90 × 10−14 1.76 × 10−05 NS NS 4.80 × 10−02 1.70 × 10−02 
2QEV    1.89 × 10−05 9.01 × 10−05 5.15 × 10−05 3.85 × 10−04 4.67 × 10−06 4.04 × 10−02 3.30 × 10−02 3.81 × 10−02 6.28 × 10−03 
2P70     3.90 × 10−08 1.21 × 10−07 3.26 × 10−08 2.52 × 10−08 2.72 × 10−02 NS 4.40 × 10−02 1.81 × 10−02 
2ERB      3.43 × 10−06 1.12 × 10−04 2.59 × 10−10 4.39 × 10−02 4.35 × 10−02 3.31 × 10−02 1.51 × 10−02 
1TUJ       5.66 × 10−05 1.62 × 10−06 1.61 × 10−02 NS 4.23 × 10−02 2.84 × 10−02 
1GM0        2.14 × 10−04 NS NS NS NS 
1T14         3.15 × 10−02 1.16 × 10−02 3.21 × 10−02 8.88 × 10−03 
2JNT          6.01 × 10−06 6.80 × 10−05 5.45 × 10−09 
2GVS           6.84 × 10−06 1.30 × 10−10 
1K19            1.70 × 10−07 
 OBP
 
CSP
 
 BmorGOBP2 ApolPBP1 AgamD7r4 BmorPBP AgamOBP1 AmelASP2 BmorPBP Lush BmorCSP1 SgreCSP4 MbraCSP2 MbraCSPa6 
 2WC5 2JPO 2QEV 2P70 2ERB 1TUJ 1GM0 1T14 2JNT 2GVS 1K19 1KX9 
2WC5  1.66 × 10−07 1.39 × 10−05 2.18 × 10−14 5.05 × 10−08 3.39 × 10−06 8.51 × 10−06 1.21 × 10−07 NS NS NS 2.98 × 10−02 
2JPO   5.41 × 10−04 1.29 × 10−08 2.12 × 10−05 6.04 × 10−05 1.90 × 10−14 1.76 × 10−05 NS NS 4.80 × 10−02 1.70 × 10−02 
2QEV    1.89 × 10−05 9.01 × 10−05 5.15 × 10−05 3.85 × 10−04 4.67 × 10−06 4.04 × 10−02 3.30 × 10−02 3.81 × 10−02 6.28 × 10−03 
2P70     3.90 × 10−08 1.21 × 10−07 3.26 × 10−08 2.52 × 10−08 2.72 × 10−02 NS 4.40 × 10−02 1.81 × 10−02 
2ERB      3.43 × 10−06 1.12 × 10−04 2.59 × 10−10 4.39 × 10−02 4.35 × 10−02 3.31 × 10−02 1.51 × 10−02 
1TUJ       5.66 × 10−05 1.62 × 10−06 1.61 × 10−02 NS 4.23 × 10−02 2.84 × 10−02 
1GM0        2.14 × 10−04 NS NS NS NS 
1T14         3.15 × 10−02 1.16 × 10−02 3.21 × 10−02 8.88 × 10−03 
2JNT          6.01 × 10−06 6.80 × 10−05 5.45 × 10−09 
2GVS           6.84 × 10−06 1.30 × 10−10 
1K19            1.70 × 10−07 

NOTE.—P values obtained using FATCAT; statistical significant values between OBP and CSP structures are depicted in bold; NS, non significant.

FIG. 5.—

Tertiary structure alignments. Representation of the significant alignments between OBP and CSP structures. PDB protein structures are represented as nodes in yellow (OBP) and green (CSP). Significant alignments are depicted as edges between nodes; edge thickness and color range (ranging from gray, blue to red) indicate increasing significance levels.

FIG. 5.—

Tertiary structure alignments. Representation of the significant alignments between OBP and CSP structures. PDB protein structures are represented as nodes in yellow (OBP) and green (CSP). Significant alignments are depicted as edges between nodes; edge thickness and color range (ranging from gray, blue to red) indicate increasing significance levels.

Birth-and-Death Evolution

Overall, our phylogenetic analyses showed that the OBP and CSP families fit well with a BD evolutionary model (figs. 3, 4, and 6) based on the following results: 1) phylogenetic trees based on orthologous genes fit well with the accepted species phylogeny; 2) there is no evidence of gene conversion between paralogous genes (data from Drosophila); 3) paralogous genes have higher divergence times compared with orthologs; 4) several gene gain and loss events can be identified in numerous phylogeny lineages; 5) several nonfunctional members (pseudogenes) were found (mainly in the terminal branches); 6) many orthology groups can be seen among closely related species, and this number gradually decreases with increasing divergence times; and 7) there is an uneven phylogenetic subfamily distribution across species. Hence, OBP and CSP genes appear to have evolved independently from the time of their origin by gene duplication until their loss by deletion or transiently as pseudogenes.

FIG. 6.—

OBP and CSP gene gains and losses. The inferred numbers of genes at each phylogenetic node are depicted in red. Values above and below the branches indicate the number of gene gains and losses, respectively. Subfamily gains (▴) and losses (×) are color-coded (Classic in black, Minus-C in green, Plus-C in blue, Dimer in red, D7 in yellow, ABPI in cyan, ABPII in gray, and PBP/GOBP in pink).

FIG. 6.—

OBP and CSP gene gains and losses. The inferred numbers of genes at each phylogenetic node are depicted in red. Values above and below the branches indicate the number of gene gains and losses, respectively. Subfamily gains (▴) and losses (×) are color-coded (Classic in black, Minus-C in green, Plus-C in blue, Dimer in red, D7 in yellow, ABPI in cyan, ABPII in gray, and PBP/GOBP in pink).

To gain insight into the specific BD dynamics of these families, it is important to quantify the magnitude of this process. Previous reports have addressed this issue using automatic annotations, surveying a set of too closely related species or applying less accurated statistical models (Hahn et al. 2005, 2007; Demuth et al. 2006; Guo and Kim 2007; Vieira et al. 2007). Here, we have estimated BD rates using a manually curated data set covering several species across the Arthropoda phylum and using more accurate gene turnover models, which allowed us to separately estimate birth (β) and death (δ) rates. Our BD estimates for the OBP gene family are β = 0.0049 and δ = 0.0010, whereas for the CSP family, they are β = 0.0028 and δ = 0.0007 (fig. 6).

Discussion

OBP and CSP Gene Family Evolution

The OBP and CSP gene families exhibit a highly dynamic evolutionary history. For instance, the number of members of these families is quite variable across Arthropoda species (OBP ranges from 0 to 83 genes and CSP from 1 to 22 in table 1), and its members are highly diverse, with divergent proteins exhibiting a wide range of gene lengths and encoding different cysteine profiles. As a result and despite the exhaustive studies that have been performed in recent years, we have still been able to identify a new OBP member (Obp73a) in the 12 Drosophila species which, in addition, is conserved across Arthropoda (except in Hymenoptera). Interestingly, there are only two genes with a clear 1:1 orthology relationship across insects: Obp73a and Obp59a. This conservation pattern is highly suggestive, reminiscent of the Or83b gene, an essential and highly conserved OR member present in all sequenced Hexapoda species (Larsson et al. 2004).

The OBP and CSP genes in Drosophila, A. gambiae, Aedes aegypti, B.mori, and T. castaneum are frequently organized in clusters (Zhou et al. 2006, 2008; Foret et al. 2007; Gong et al. 2007, 2009). However, no stringent statistical analysis has been conducted to determine their evolutionary significance. We have found that the members of these families are actually significantly clustered across the genome and, moreover, that the OBP cluster distribution has been maintained across the Drosophila evolution. This conservation across ∼400 My of evolution (the total branch lengths) suggests the action of natural selection in preventing cluster brake up. Indeed, this conservation could be explained by the existence of shared regulatory elements among members (Boutanaev et al. 2002; Gong et al. 2007; Matsuo et al. 2007; Quijano et al. 2008). Since chromosomal rearrangement breakpoints are unevenly distributed across the genome, the current clustering of OBP genes might also reflect the existence of the so-called fragile regions, regions with a propensity to breakage (Pevzner and Tesler 2003; von Grotthuss et al. 2010). This feature, nevertheless, would not provide the best explanation since our null empirical distribution already reflects the actual spatial distribution of genes in the genomes. The OBP clusters, therefore, likely have a functional meaning.

Our phylogenetic analysis uncovered a highly dynamic mode of OBP and CSP gene family evolution, although to a lesser extent for the CSP family. Both families exhibit lineage-specific expansions and a high number of orthology groups at short evolutionary times that gradually disappear with increasing divergence (fig. 2). Our results also indicate that the Dimer and Minus-C OBP subfamilies are polyphyletic and, therefore, have no phylogenetic significance. The striking fact that a similar cysteine pattern arose independently several times during the evolution of these genes is intriguing and suggests that these conformations may be advantageous. Because OBP genes form dimers in vitro (Andronopoulou et al. 2006), the Dimer OBP gene structure might be functionally equivalent to two single-domain OBP genes. In the case of Minus-C, the loss of one disulfide bridge might also have functional relevance, as it could generate a more flexible structure (like CSPs) (Angeli et al. 1999; Leal et al. 1999; Scaloni et al. 1999).

Overall, our results clearly support the BD model of evolution for these two gene families. Hence, the model of evolution described for the OBP family of Drosophila also holds for the evolution of OBP and CSP families and for both short and long period of times (across Arthropoda). The BD model, therefore, is neither incidental nor specific to the Drosophila genus but rather it is a more general model of evolution. Interestingly, the estimated birth rates of both families are higher than that estimated for the whole Drosophila genome (λ = 0.0012) (Hahn et al. 2007), reflecting a highly dynamic evolution. Indeed, the half-life estimates of a given gene are t1/2 = 693 My and t1/2 = 990 My for the OBP and CSP genes, respectively. Nevertheless, and in spite of using complete genome data, our current estimates should be viewed with caution. The species we surveyed belong to a phylogenetic tree with some large branches (e.g., branches leading to T. castaneum or B. mori) that can lead to inaccurate estimates. In the future, these estimates can be further improved by using genome information from species that are more homogeneously distributed across the tree.

Current rates of birth and death suggest a very high gene turnover rate, placing gene gain and loss events as one of the most important processes in the evolution of these gene families. These high rates can have a significant adaptive value, due to the function of these families in the contact with the exterior environment. During adaptation to a changing environment, newly arisen genes can play an important role as raw material for the action of natural selection. The actual OBP and CSP family sizes would result from a balance between the effect of the BD process (or random genomic drift [Nei 2007]), the maintenance of a core number of genes required for basal chemosensory performance and the requirement of newly arisen genes which diverged into species-specific activities.

Origin and Evolutionary History of the Chemosensory System

The putatively remote homology between OBP and CSP proteins suggest that these gene families belong to a larger superfamily of general binding proteins. The OBP and CSP gene families, together with the two major chemosensory receptor families (OR and GR), show a suggestive parallel distribution across Arthropoda. OBP and OR genes are found only in Hexapoda, whereas CSP and GR genes have been identified in all major Arthropoda groups: Hexapoda, Crustacea, Myriapoda (just CSP), and Chelicerata (Pelosi et al. 2006; Wanner et al. 2007; Wanner and Robertson 2008; Penalva-Arana et al. 2009; Sanchez-Gracia et al. 2009; Smadja et al. 2009). This suggests that the OBP and OR gene families originated after the Hexapoda–Crustacea split (∼470 Ma), whereas the CSP and GR families were already present in the MRCA of these two groups and Chelicerata (∼700 Ma) (Hedges et al. 2006). Because the earliest fossil evidence of terrestrial animal activity that that has been found comes from the Ordovician (∼450 Ma [Labandeira 2005]), the common ancestor of these three groups is expected to be aquatic. This scenario agrees with other studies proposing the independent terrestrialization of Hexapoda, Chelicerata, and Myriapoda lineages (380–420 Ma; fig. 1) (Ward et al. 2006).

According to our results, the aquatic ancestor of the extant major Arthropoda groups would have had chemoreceptors tuned to the perception of soluble components (proto-GR) and also a generic gene family of binding proteins (proto-CSP) with diverse physiological roles. The colonization of the hostile terrestrial environment by Hexapoda, Chelicerata, and Myriapoda (but not Crustacea) led to diverse adaptations. For example, Arthropoda species overcame the challenges of water supply and desiccation by the development of an impermeable cuticle. Because the neurons must be connected with the exterior, they developed a porous sensillar cuticular wall and, to avoid desiccation, they also developed an aqueous lumen around their chemosensory neurons. The new aerial environment also changed the perceived chemical signals from essentially hydrophilic (in aqueous solution) to mainly hydrophobic (in gaseous phase) molecules (Freitag et al. 1998). Hence, two major problems emerged with terrestrialization: 1) the new aqueous lumen prevented the access of hydrophobic molecules to chemoreceptors and 2) likely the chemoreceptors were unable to perform a fine detection of these new molecules. The origin of new specialized protein families to mediate the transport and detection of these new hydrophobic odorants solved these problems. Generalist binding proteins might have evolved and further specialized to bind odorants and pheromones and, in parallel, the ancestral aquatic-specific receptors evolved into a new class of receptors specialized for sensing airborne compounds (olfactory receptors). Because the split of the four major Arthropoda groups occurred before their terrestrial colonization, the evolutionary novelty representing the origin of the odorant-binding molecules, and olfactory receptors must have occurred independently in the Hexapoda, Myriapoda, and Chelicerata lineages. These independent origins imply that these molecules might have evolved from different ancestral gene families: whereas in Hexapoda, a proto-CSP gene family would have given rise to the OBP genes, in the other two groups might have derived from different (and still unknown) ancestral proteins. A similar scenario would have occurred with the olfactory receptors, which likely evolved from the GR family in the Hexapoda (Robertson et al. 2003) and from other protein families in the two other taxa. This hypothesis would explain the presence of GR but absence of OBP and OR (even pseudogenes) in the Daphnia (Penalva-Arana et al. 2009) and Ixodes (Vieira FG, Rozas J, unpublished results) genomes. Nevertheless, the reasons might be different: whereas in Ixodes, olfactory genes probably evolved from different ancestral families, Crustacea remained largely aquatic with no need for airborne detection.

This scenario is further supported by a number of convergent evolution cases affecting the olfactory system. The IR, a new and structurally divergent chemoreceptor gene family, has recently been discovered in Drosophila (Benton et al. 2009; Brigaud et al. 2009; Croset et al. 2010). The robber crab (Birgus latro) is an attractive example of the changes that have occurred during the adaptive process to the terrestrial environment. This land-living crustacean has developed a complex olfactory sense with organs very similar to the insect sensilla (Stensmyr et al. 2005; Krieger et al. 2010). Another example occurs in the vertebrate olfactory system. In spite of having equivalent physiological functions, vertebrates exhibit phylogenetically unrelated chemoreceptor and odorant-binding molecules. Vertebrate receptors belong to the GPCR family, whereas OBP genes belong to a large superfamily of carrier proteins, the lipocalins (Flower 1996; Pelosi et al. 2006; Nei et al. 2008). Curiously, GPCR and lipocalins are also present in Hexapoda, though with different biochemical functions. In Drosophila, GPCRs function as neurotransmitters and hormone receptors or in axon guidance during embryonic nervous system development (Brody and Cravchik 2000; Sanchez et al. 2000); lipocalins function as salivary anticlotting proteins in Rodnius prolixus (Montfort et al. 2000), whereas the anticlotting proteins of blood-sucking Diptera belong to the D7 OBP subfamily (Valenzuela et al. 2002).

Taking all data together, we can hypothesize a scenario for the evolution of the chemosensory system (fig. 7). We can assume the existence of some general molecule-binding and receptor genes before the Vertebrata–Arthropoda split (∼900 My [Hedges et al. 2006]), such as proto-lipocalins and proto-OBP/CSP or proto-GPCR and proto-GR genes, among others (“A” in fig. 7). After the split, the two taxa developed functionally equivalent gustatory receptor proteins tuned for soluble chemicals: the GR in Arthropoda (“B” in fig. 7) and gustatory-GPCR in Vertebrata (“C” in fig. 7). These two lineages later terrestrialized (380–420 Ma [Arthropoda] and ∼340 Ma [Vertebrata]) (Ward et al. 2006) and the new selective pressures led to the independent functional diversification of existing gene families to mediate the transport and detection of volatile molecules. In Crustacea, most lineages remained aquatic with no need for such evolutionary innovations (“D” in fig. 7). The new odorant binding and transport activities were taken over by olfactory lipocalins in vertebrates, OBP/CSP in Hexapoda, and likely by some (but unknown) binding protein family in Chelicerata (“E” in fig. 7). A parallel scenario could have occurred during chemoreceptor evolution (“F” in fig. 7): the GR would have evolved into the Hexapoda OR (as proposed by Robertson et al. [2003]; Penalva-Arana et al. [2009]), gustatory-GPCR into vertebrate olfactory-GPCR receptors, and some unknown receptor gene family into the Chelicerata olfactory chemoreceptors. Interestingly and further supporting this idea, mammals have experienced the reverse adaptive changes during the transition from a terrestrial to a fully aquatic habitat (Hayden et al. 2010) with large-scale pseudogenizations resulting in major reductions (in some cases total) of the OR repertoire (“G” in fig. 7) (McGowen et al. 2008). The diversification of olfactory-binding and receptor gene families in Arthropoda and Vertebrata seems to have occurred at roughly the same time, after the terrestrialization of each taxon. The nearly contemporary but independent origin of basic molecular elements of the olfactory system suggests that the involved gene families may have coevolved (OBP with OR; olfactory GPCR with lipocalins). In this sense, it is highly suggestive the similar distribution pattern of selective constraints (Sanchez-Gracia et al. 2009) and birth-and-loss rates (Sanchez-Gracia et al. 2011) between Hexapoda OBP and OR genes (but not between OBP and GR genes).

FIG. 7.—

Putative scenario for the evolution of the chemosensory system. Shaded in blue boxes represent the aquatic lifestyle. Right: presence or absence of the chemosensory gene families in extant species. Branch lengths are not to scale. Letters from A to F stand for the different evolutionary events (see text).

FIG. 7.—

Putative scenario for the evolution of the chemosensory system. Shaded in blue boxes represent the aquatic lifestyle. Right: presence or absence of the chemosensory gene families in extant species. Branch lengths are not to scale. Letters from A to F stand for the different evolutionary events (see text).

Supplementary Material

Supplementary figure S1 ,data 1 and 2 are available at Genome Biology and Evolution online (http://gbe.oxfordjournals.org/).

We would like to thank P. Librado for allowing us the use of an early version of the BadiRate program and for all his help, support, and valuable discussions. We are also grateful to A. Sánchez-Gracia and J. M. Ranz for their constructive input and comments on the manuscript. This work was supported by the Ministerio de Ciencia y Innovación (Spain) (grants BFU2007-62927 and BFU2010-15484) and the Comissió Interdepartamental de Recerca i Innovació Tecnològica (Spain) (grant number 2009SGR-1287). F.G.V. was supported by the predoctoral fellowship from the “Fundação para a Ciência e a Tecnologia” (Portugal) (SFRH/BD/22360/2005).

References

Altschul
SF
, et al.  . 
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Res.
 , 
1997
, vol. 
25
 (pg. 
3389
-
3402
)
Andronopoulou
E
, et al.  . 
Specific interactions among odorant-binding proteins of the African malaria vector Anopheles gambiae
Insect Mol Biol.
 , 
2006
, vol. 
15
 (pg. 
797
-
811
)
Angeli
S
, et al.  . 
Purification, structural characterization, cloning and immunocytochemical localization of chemoreception proteins from Schistocerca gregaria
Eur J Biochem.
 , 
1999
, vol. 
262
 (pg. 
745
-
754
)
Asahina
K
Pavlenkovich
V
Vosshall
LB
The survival advantage of olfaction in a competitive environment
Curr Biol.
 , 
2008
, vol. 
18
 (pg. 
1153
-
1155
)
Benton
R
Vannice
KS
Gomez-Diaz
C
Vosshall
LB
Variant ionotropic glutamate receptors as chemosensory receptors in Drosophila
Cell
 , 
2009
, vol. 
136
 (pg. 
149
-
162
)
Berman
HM
, et al.  . 
The Protein Data Bank
Nucleic Acids Res.
 , 
2000
, vol. 
28
 (pg. 
235
-
242
)
Boutanaev
AM
Kalmykova
AI
Shevelyov
YY
Nurminsky
DI
Large clusters of co-expressed genes in the Drosophila genome
Nature
 , 
2002
, vol. 
420
 (pg. 
666
-
669
)
Brigaud
I
Montagne
N
Monsempes
C
Francois
MC
Jacquin-Joly
E
Identification of an atypical insect olfactory receptor subtype highly conserved within noctuids
FEBS J.
 , 
2009
, vol. 
276
 (pg. 
6537
-
6547
)
Brody
T
Cravchik
A
Drosophila melanogaster G protein-coupled receptors
J Cell Biol.
 , 
2000
, vol. 
150
 (pg. 
F83
-
F88
)
Croset
V
, et al.  . 
Ancient protostome origin of chemosensory ionotropic glutamate receptors and the evolution of insect taste and olfaction
PLoS Genet.
 , 
2010
, vol. 
6
 pg. 
e1001064
 
Demuth
JP
De Bie
T
Stajich
JE
Cristianini
N
Hahn
MW
The evolution of mammalian gene families
PLoS One
 , 
2006
, vol. 
1
 pg. 
e85
 
Drysdale
R
FlyBase: a database for the Drosophila research community
Methods Mol Biol.
 , 
2008
, vol. 
420
 (pg. 
45
-
59
)
Findlay
GD
Yi
X
Maccoss
MJ
Swanson
WJ
Proteomics reveals novel Drosophila seminal fluid proteins transferred at mating
PLoS Biol.
 , 
2008
, vol. 
6
 pg. 
e178
 
Finn
RD
, et al.  . 
Pfam: clans, web tools and services
Nucleic Acids Res.
 , 
2006
, vol. 
34
 (pg. 
D247
-
D251
)
Flicek
P
, et al.  . 
Ensembl 2008
Nucleic Acids Res.
 , 
2008
, vol. 
36
 (pg. 
D707
-
D714
)
Flower
DR
The lipocalin protein family: structure and function
Biochem J.
 , 
1996
, vol. 
318
 
Pt 1
(pg. 
1
-
14
)
Foret
S
Maleszka
R
Function and evolution of a gene family encoding odorant binding-like proteins in a social insect, the honey bee (Apis mellifera)
Genome Res.
 , 
2006
, vol. 
16
 (pg. 
1404
-
1413
)
Foret
S
Wanner
KW
Maleszka
R
Chemosensory proteins in the honey bee: insights from the annotated genome, comparative analyses and expressional profiling
Insect Biochem Mol Biol.
 , 
2007
, vol. 
37
 (pg. 
19
-
28
)
Freitag
J
Ludwig
G
Andreini
I
Rossler
P
Breer
H
Olfactory receptors in aquatic and terrestrial vertebrates
J Comp Physiol [A].
 , 
1998
, vol. 
183
 (pg. 
635
-
650
)
Gong
DP
Zhang
HJ
Zhao
P
Xia
QY
Xiang
ZH
The odorant binding protein gene family from the genome of silkworm, Bombyx mori
BMC Genomics
 , 
2009
, vol. 
10
 pg. 
332
 
Gong
DP
, et al.  . 
Identification and expression pattern of the chemosensory protein gene family in the silkworm, Bombyx mori
Insect Biochem Mol Biol.
 , 
2007
, vol. 
37
 (pg. 
266
-
277
)
Graham
LA
Brewer
D
Lajoie
G
Davies
PL
Characterization of a subfamily of beetle odorant-binding proteins found in hemolymph
Mol Cell Proteomics
 , 
2003
, vol. 
2
 (pg. 
541
-
549
)
Grosse-Wilde
E
Svatos
A
Krieger
J
A pheromone-binding protein mediates the bombykol-induced activation of a pheromone receptor in vitro
Chem Senses.
 , 
2006
, vol. 
31
 (pg. 
547
-
555
)
Guo
S
Kim
J
Molecular evolution of Drosophila odorant receptor genes
Mol Biol Evol.
 , 
2007
, vol. 
24
 (pg. 
1198
-
1207
)
Hahn
MW
De Bie
T
Stajich
JE
Nguyen
C
Cristianini
N
Estimating the tempo and mode of gene family evolution from comparative genomic data
Genome Res.
 , 
2005
, vol. 
15
 (pg. 
1153
-
1160
)
Hahn
MW
Han
MV
Han
SG
Gene family evolution across 12 Drosophila genomes
PLoS Genet.
 , 
2007
, vol. 
3
 pg. 
e197
 
Hayden
SM
, et al.  . 
Ecological adaptation determines functional mammalian olfactory subgenomes
Genome Res.
 , 
2010
, vol. 
20
 (pg. 
1
-
9
)
Hedges
SB
Dudley
J
Kumar
S
TimeTree: a public knowledge-base of divergence times among organisms
Bioinformatics
 , 
2006
, vol. 
22
 (pg. 
2971
-
2972
)
Hekmat-Scafe
DS
Scafe
CR
McKinney
AJ
Tanouye
MA
Genome-wide analysis of the odorant-binding protein gene family in Drosophila melanogaster
Genome Res.
 , 
2002
, vol. 
12
 (pg. 
1357
-
1369
)
Hiller
K
Grote
A
Scheer
M
Munch
R
Jahn
D
PrediSi: prediction of signal peptides and their cleavage positions
Nucleic Acids Res.
 , 
2004
, vol. 
32
 (pg. 
W375
-
W379
)
Jones
DT
Taylor
WR
Thornton
JM
The rapid generation of mutation data matrices from protein sequences
Comput Appl Biosci.
 , 
1992
, vol. 
8
 (pg. 
275
-
282
)
Kaissling
KE
Olfactory perireceptor and receptor events in moths: a kinetic model
Chem Senses.
 , 
2001
, vol. 
26
 (pg. 
125
-
150
)
Katoh
K
Kuma
K
Toh
H
Miyata
T
MAFFT version 5: improvement in accuracy of multiple sequence alignment
Nucleic Acids Res.
 , 
2005
, vol. 
33
 (pg. 
511
-
518
)
Kaupp
UB
Olfactory signalling in vertebrates and insects: differences and commonalities
Nat Rev Neurosci.
 , 
2010
, vol. 
11
 (pg. 
188
-
200
)
Kirkness
EF
, et al.  . 
Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle
Proc Natl Acad Sci U S A.
 , 
2010
, vol. 
107
 (pg. 
12168
-
12173
)
Krieger
J
Sandeman
RE
Sandeman
DC
Hansson
BS
Harzsch
S
Brain architecture of the largest living land arthropod, the Giant Robber Crab Birgus latro (Crustacea, Anomura, Coenobitidae): evidence for a prominent central olfactory pathway?
Front Zool.
 , 
2010
, vol. 
7
 pg. 
25
 
Krieger
MJ
Ross
KG
Identification of a major gene regulating complex social behavior
Science
 , 
2002
, vol. 
295
 (pg. 
328
-
332
)
Labandeira
CC
Invasion of the continents: cyanobacterial crusts to tree-inhabiting arthropods
Trends Ecol Evol.
 , 
2005
, vol. 
20
 (pg. 
253
-
262
)
Larsson
MC
, et al.  . 
Or83b encodes a broadly expressed odorant receptor essential for Drosophila olfaction
Neuron
 , 
2004
, vol. 
43
 (pg. 
703
-
714
)
Lawson
D
, et al.  . 
VectorBase: a home for invertebrate vectors of human pathogens
Nucleic Acids Res.
 , 
2007
, vol. 
35
 (pg. 
D503
-
D505
)
Leal
WS
, et al.  . 
Kinetics and molecular properties of pheromone binding and release
Proc Natl Acad Sci U S A.
 , 
2005
, vol. 
102
 (pg. 
5386
-
5391
)
Leal
WS
Nikonova
L
Peng
G
Disulfide structure of the pheromone binding protein from the silkworm moth, Bombyx mori
FEBS Lett.
 , 
1999
, vol. 
464
 (pg. 
85
-
90
)
Letunic
I
Bork
P
Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation
Bioinformatics
 , 
2007
, vol. 
23
 (pg. 
127
-
128
)
Ling
X
He
X
Xin
D
Detecting gene clusters under evolutionary constraint in a large number of genomes
Bioinformatics
 , 
2009
, vol. 
25
 (pg. 
571
-
577
)
Luc
N
Risler
JL
Bergeron
A
Raffinot
M
Gene teams: a new formalization of gene clusters for comparative genomics
Comput Biol Chem.
 , 
2003
, vol. 
27
 (pg. 
59
-
67
)
Matsuo
T
Sugaya
S
Yasukawa
J
Aigaki
T
Fuyama
Y
Odorant-binding proteins OBP57d and OBP57e affect taste perception and host-plant preference in Drosophila sechellia
PLoS Biol.
 , 
2007
, vol. 
5
 pg. 
e118
 
McGowen
MR
Clark
C
Gatesy
J
The vestigial olfactory receptor subgenome of odontocete whales: phylogenetic congruence between gene-tree reconciliation and supermatrix methods
Syst Biol.
 , 
2008
, vol. 
57
 (pg. 
574
-
590
)
McGuffin
LJ
Bryson
K
Jones
DT
The PSIPRED protein structure prediction server
Bioinformatics
 , 
2000
, vol. 
16
 (pg. 
404
-
405
)
Montfort
WR
Weichsel
A
Andersen
JF
Nitrophorins and related antihemostatic lipocalins from Rhodnius prolixus and other blood-sucking arthropods
Biochim Biophys Acta.
 , 
2000
, vol. 
1482
 (pg. 
110
-
118
)
Nei
M
The new mutation theory of phenotypic evolution
Proc Natl Acad Sci U S A.
 , 
2007
, vol. 
104
 (pg. 
12235
-
12242
)
Nei
M
Niimura
Y
Nozawa
M
The evolution of animal chemosensory receptor gene repertoires: roles of chance and necessity
Nat Rev Genet.
 , 
2008
, vol. 
9
 (pg. 
951
-
963
)
Nei
M
Rooney
AP
Concerted and birth-and-death evolution of multigene families
Annu Rev Genet.
 , 
2005
, vol. 
39
 (pg. 
121
-
152
)
Pelosi
P
Calvello
M
Ban
L
Diversity of odorant-binding proteins and chemosensory proteins in insects
Chem Senses.
 , 
2005
, vol. 
30
 
Suppl 1
(pg. 
i291
-
i292
)
Pelosi
P
Maida
R
Odorant-binding proteins in vertebrates and insects: similarities and possible common function
Chem Senses.
 , 
1990
, vol. 
15
 (pg. 
205
-
215
)
Pelosi
P
Zhou
JJ
Ban
LP
Calvello
M
Soluble proteins in insect chemical communication
Cell Mol Life Sci.
 , 
2006
, vol. 
63
 (pg. 
1658
-
1676
)
Penalva-Arana
DC
Lynch
M
Robertson
HM
The chemoreceptor genes of the waterflea Daphnia pulex: many Grs but no Ors
BMC Evol Biol.
 , 
2009
, vol. 
9
 pg. 
79
 
Pertea
M
Lin
X
Salzberg
SL
GeneSplicer: a new computational method for splice site prediction
Nucleic Acids Res.
 , 
2001
, vol. 
29
 (pg. 
1185
-
1190
)
Pevzner
P
Tesler
G
Human and mouse genomic sequences reveal extensive breakpoint reuse in mammalian evolution
Proc Natl Acad Sci U S A.
 , 
2003
, vol. 
100
 (pg. 
7672
-
7677
)
Pophof
B
Pheromone-binding proteins contribute to the activation of olfactory receptor neurons in the silkmoths antheraea polyphemus and Bombyx mori
Chem Senses.
 , 
2004
, vol. 
29
 (pg. 
117
-
125
)
Quijano
C
, et al.  . 
Selective maintenance of Drosophila tandemly arranged duplicated genes during evolution
Genome Biol.
 , 
2008
, vol. 
9
 pg. 
R176
 
Robertson
HM
Warr
CG
Carlson
JR
Molecular evolution of the insect chemoreceptor gene superfamily in Drosophila melanogaster
Proc Natl Acad Sci U S A.
 , 
2003
, vol. 
100
 
Suppl 2
(pg. 
14537
-
14542
)
Rutherford
K
, et al.  . 
Artemis: sequence visualization and annotation
Bioinformatics
 , 
2000
, vol. 
16
 (pg. 
944
-
945
)
Sanchez-Gracia
A
Vieira
FG
Almeida
FC
Rozas
J
Comparative genomics of the major chemosensory gene families in Arthropods
In: Encyclopedia of Life Sciences
 , 
2011
Chichester (UK)
John Wiley & Sons, Ltd
 
doi:10.1002/9780470015902.a0022848
Sanchez-Gracia
A
Vieira
FG
Rozas
J
Molecular evolution of the major chemosensory gene families in insects
Heredity
 , 
2009
, vol. 
103
 (pg. 
208
-
216
)
Sanchez
D
, et al.  . 
Characterization of two novel lipocalins expressed in the Drosophila embryonic nervous system
Int J Dev Biol.
 , 
2000
, vol. 
44
 (pg. 
349
-
359
)
Scaloni
A
Monti
M
Angeli
S
Pelosi
P
Structural analysis and disulfide-bridge pairing of two odorant-binding proteins from Bombyx mori
Biochem Biophys Res Commun.
 , 
1999
, vol. 
266
 (pg. 
386
-
391
)
Slater
GS
Birney
E
Automated generation of heuristics for biological sequence comparison
BMC Bioinformatics
 , 
2005
, vol. 
6
 pg. 
31
 
Smadja
C
Butlin
RK
On the scent of speciation: the chemosensory system and its role in premating isolation
Heredity
 , 
2009
, vol. 
102
 (pg. 
77
-
97
)
Smadja
C
Shi
P
Butlin
RK
Robertson
HM
Large gene family expansions and adaptive evolution for odorant and gustatory receptors in the pea aphid, Acyrthosiphon pisum
Mol Biol Evol
 , 
2009
, vol. 
26
 (pg. 
2073
-
2086
)
Soding
J
Protein homology detection by HMM-HMM comparison
Bioinformatics
 , 
2005
, vol. 
21
 (pg. 
951
-
960
)
Stajich
JE
, et al.  . 
The Bioperl toolkit: Perl modules for the life sciences
Genome Res.
 , 
2002
, vol. 
12
 (pg. 
1611
-
1618
)
Stamatakis
A
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models
Bioinformatics
 , 
2006
, vol. 
22
 (pg. 
2688
-
2690
)
Steinbrecht
RA
Odorant-binding proteins: expression and function
Ann N Y Acad Sci.
 , 
1998
, vol. 
855
 (pg. 
323
-
332
)
Stensmyr
MC
, et al.  . 
Insect-like olfactory adaptations in the terrestrial giant robber crab
Curr Biol.
 , 
2005
, vol. 
15
 (pg. 
116
-
121
)
Tamura
K
Dudley
J
Nei
M
Kumar
S
MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0
Mol Biol Evol.
 , 
2007
, vol. 
24
 (pg. 
1596
-
1599
)
Tamura
K
Subramanian
S
Kumar
S
Temporal patterns of fruit fly (Drosophila) evolution revealed by mutation clocks
Mol Biol Evol.
 , 
2004
, vol. 
21
 (pg. 
36
-
44
)
Tegoni
M
Campanacci
V
Cambillau
C
Structural aspects of sexual attraction and chemical communication in insects
Trends Biochem Sci.
 , 
2004
, vol. 
29
 (pg. 
257
-
264
)
Valenzuela
JG
, et al.  . 
The D7 family of salivary proteins in blood sucking diptera
Insect Mol Biol.
 , 
2002
, vol. 
11
 (pg. 
149
-
155
)
Vieira
FG
Sanchez-Gracia
A
Rozas
J
Comparative genomic analysis of the odorant-binding protein family in 12 Drosophila genomes: purifying selection and birth-and-death evolution
Genome Biol.
 , 
2007
, vol. 
8
 pg. 
R235
 
Vogt
RG
Riddiford
LM
Pheromone binding and inactivation by moth antennae
Nature
 , 
1981
, vol. 
293
 (pg. 
161
-
163
)
von Grotthuss
M
Ashburner
M
Ranz
JM
Fragile regions and not functional constraints predominate in shaping gene organization in the genus Drosophila
Genome Res.
 , 
2010
, vol. 
20
 (pg. 
1084
-
1096
)
Wang
J
, et al.  . 
SilkDB: a knowledgebase for silkworm biology and genomics
Nucleic Acids Res.
 , 
2005
, vol. 
33
 (pg. 
D399
-
D402
)
Wang
L
Wang
S
Li
Y
Paradesi
MSR
Brown
SJ
BeetleBase: the model organism database for Tribolium castaneum
Nucl Acids Res.
 , 
2007
, vol. 
35
 (pg. 
D476
-
D479
)
Wanner
KW
, et al.  . 
Female-biased expression of odourant receptor genes in the adult antennae of the silkworm, Bombyx mori
Insect Mol Biol.
 , 
2007
, vol. 
16
 (pg. 
107
-
119
)
Wanner
KW
Robertson
HM
The gustatory receptor family in the silkworm moth Bombyx mori is characterized by a large expansion of a single lineage of putative bitter receptors
Insect Mol Biol.
 , 
2008
, vol. 
17
 (pg. 
621
-
629
)
Ward
P
Labandeira
C
Laurin
M
Berner
RA
Confirmation of Romer’s Gap as a low oxygen interval constraining the timing of initial arthropod and vertebrate terrestrialization
Proc Natl Acad Sci U S A.
 , 
2006
, vol. 
103
 (pg. 
16818
-
16822
)
Whelan
S
Goldman
N
A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach
Mol Biol Evol.
 , 
2001
, vol. 
18
 (pg. 
691
-
699
)
Whiteman
NK
Pierce
NE
Delicious poison: genetics of Drosophila host plant preference
Trends Ecol Evol.
 , 
2008
, vol. 
23
 (pg. 
473
-
478
)
Xu
P
Atkinson
R
Jones
DN
Smith
DP
Drosophila OBP LUSH is required for activity of pheromone-sensitive neurons
Neuron
 , 
2005
, vol. 
45
 (pg. 
193
-
200
)
Xu
PX
Zwiebel
LJ
Smith
DP
Identification of a distinct family of genes encoding atypical odorant-binding proteins in the malaria vector mosquito, Anopheles gambiae
Insect Mol Biol.
 , 
2003
, vol. 
12
 (pg. 
549
-
560
)
Ye
Y
Godzik
A
FATCAT: a web server for flexible structure comparison and structure similarity searching
Nucleic Acids Res.
 , 
2004
, vol. 
32
 (pg. 
W582
-
W585
)
Zhou
JJ
He
XL
Pickett
JA
Field
LM
Identification of odorant-binding proteins of the yellow fever mosquito Aedes aegypti: genome annotation and comparative analyses
Insect Mol Biol.
 , 
2008
, vol. 
17
 (pg. 
147
-
163
)
Zhou
JJ
Kan
Y
Antoniw
J
Pickett
JA
Field
LM
Genome and EST analyses and expression of a gene family with putative functions in insect chemoreception
Chem Senses.
 , 
2006
, vol. 
31
 (pg. 
453
-
465
)
Zhou
JJ
, et al.  . 
Genome annotation and comparative analyses of the odorant-binding proteins and chemosensory proteins in the pea aphid Acyrthosiphon pisum
Insect Mol Biol.
 , 
2010
, vol. 
19
 
Suppl 2
(pg. 
113
-
122
)

Author notes

Associate editor: Yoshihito Niimura
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.