It is encouraging to see that the authors of the commentaries on our review share our enthusiasm about the promise of applying proteomics to behavioral ecology. We have explained how the proteomic approach can help behavioral ecologists understand the molecular basis of behavior, as well as variation in and evolution of behavior (Valcu and Kempenaers 2015). The 3 commentaries provide additional arguments and examples supporting this view (Bailey and Ly 2014; Ramm 2014; Sirot 2014). Here, we would like to comment on an interesting topic raised by Bailey and Ly (2014).
In their commentary, Bailey and Ly draw attention to a particular approach to proteomic data analysis, namely data mining. They advocate the use of whole-proteome signatures for testing hypotheses about behavior. Proteomic signatures (referred to as protein expression patterns in our review; also known as protein expression signatures, Bradley et al. 2002; protein expression profiles, Shen et al. 2013; proteomic signature profiles, Goh et al. 2012; and proteomic profiles or patterns, Petricoin et al. 2002) represent patterns of protein abundance that are indicators of particular phenotypes or biological conditions. The interpretation of these patterns does not require further information about the identity of the proteins. Proteins changing in abundance collectively contribute to the patterns, irrespective of what caused the change (e.g., gene expression up- or downregulation, protein turnover, or modification). Hence, proteomic signatures are more powerful in discriminating phenotypes than variation in any of the single proteins they comprise. This is particularly useful for heterogeneous populations (Petricoin et al. 2002) and for highly variable phenotypes such as behavior.
Proteomic signatures can be obtained through an independent selection of differentially expressed proteins based on statistical criteria or can be extracted from complex proteomic data sets using machine learning algorithms, such as those suggested by Bailey and Ly (2014). In the latter case, the choice of the pattern recognition algorithm depends on the data available and on the desired output. For example, supervised learning can be employed for pattern recognition in data with an already known structure (e.g., treatment vs. control), whereas unsupervised learning assists in the discovery of previously unknown patterns without making assumptions about a structure in the data (Thomas et al. 2006).
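The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. It is not one of the algorithms proposed by Bailey and Ly (2014). It assumes a small, entirely hypothetical abundance matrix (6 samples by 4 proteins) and uses two deliberately simple stand-ins: a nearest-centroid classifier for the supervised case and k-means clustering for the unsupervised case. All function names and values are illustrative.

```python
from math import dist

# Hypothetical abundances (rows = samples, columns = proteins); not real data.
control = [[10, 5, 8, 3], [11, 5, 7, 3], [10, 6, 8, 4]]
treatment = [[12, 5, 10, 3], [13, 6, 11, 3], [12, 5, 10, 4]]
samples = control + treatment
labels = ["control"] * 3 + ["treatment"] * 3


def centroid(rows):
    """Mean abundance of each protein across the given samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(rows, row_labels):
    """Supervised: use the known labels to learn one centroid per group."""
    groups = {}
    for row, label in zip(rows, row_labels):
        groups.setdefault(label, []).append(row)
    return {label: centroid(members) for label, members in groups.items()}


def predict(model, row):
    """Assign a new sample to the group with the closest centroid."""
    return min(model, key=lambda label: dist(model[label], row))


def kmeans(rows, k=2, iterations=20):
    """Unsupervised: group samples without using any labels."""
    # Deterministic start: first sample, then the samples farthest
    # from the centers chosen so far.
    centers = [list(rows[0])]
    while len(centers) < k:
        centers.append(list(max(rows, key=lambda r: min(dist(c, r) for c in centers))))
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: dist(centers[c], r)) for r in rows]
        updated = []
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            updated.append(centroid(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return assignment


model = fit_nearest_centroid(samples, labels)
predicted_group = predict(model, [12, 6, 10, 3])  # a new, unlabeled sample
clusters = kmeans(samples, k=2)  # cluster index per sample, labels never used
```

In the supervised case the group labels enter the analysis directly, whereas in the unsupervised case the cluster indices are arbitrary and must be interpreted afterward. Real proteomic data sets would additionally require normalization, handling of missing values, and validation on independent samples.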
Proteomic signatures have long been recognized as useful tools with applications, for example, in diagnostics and disease monitoring (Petricoin et al. 2002), pharmacology (Wenzel and Bandow 2011), toxicology (Amacher 2010), ecotoxicology (Tomanek 2011), and ecology (Renella et al. 2014). Such global proteomic signatures, identified based on either protein presence/absence (Biron et al. 2005; Ponton et al. 2006; Lefèvre et al. 2007) or protein abundance (Chan et al. 2011), have also been used in some of the behavioral ecology studies we reviewed. Data mining is a powerful approach to identify hidden phenotypes because it uses proteome-wide information on protein presence or abundance, not only subsets of proteins that satisfy certain criteria (e.g., differential expression). This can, for example, help reveal groups of individuals with diverging molecular phenotypes within otherwise (behaviorally) homogeneous groups. As Bailey and Ly (2014) also point out, whole-proteome signatures encompass many small differences in protein abundances scattered across the proteome, and this makes them a sensitive tool for investigating the molecular basis of variation in behavior and the evolution of behavior. Furthermore, whole-proteome signatures make it possible to tackle phenomic studies (i.e., genotype-to-phenotype mapping) (Houle et al. 2010; Bailey and Ly 2014).
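A simple calculation suggests why many small differences can add up to a sensitive signature. The sketch below rests on strong simplifying assumptions that are ours, not those of the cited studies: protein abundances vary independently, each has the same within-group standard deviation, and each differs between 2 groups by the same small amount. Under these assumptions, the standardized separation between group means grows with the square root of the number of proteins included. The numbers (100 proteins, a difference of 0.3 standard deviations) are hypothetical.

```python
from math import sqrt


def standardized_separation(mean_diffs, sds):
    """Distance between two group means in standard-deviation units.

    Assumes independent proteins, i.e., a Mahalanobis distance with a
    diagonal covariance matrix. Correlated proteins would need the full
    covariance matrix and would generally give a different value.
    """
    return sqrt(sum((d / s) ** 2 for d, s in zip(mean_diffs, sds)))


n_proteins = 100            # hypothetical signature size
mean_diffs = [0.3] * n_proteins  # small, identical shift in every protein
sds = [1.0] * n_proteins         # identical within-group standard deviation

single_protein = standardized_separation(mean_diffs[:1], sds[:1])
whole_signature = standardized_separation(mean_diffs, sds)
```

Here a single protein separates the groups by 0.3 standard deviations, whereas the 100-protein signature separates them by about 3. In real data, correlations among proteins and measurement noise would reduce this gain, so the figure should be read as an upper-bound illustration rather than an expected result.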
We feel, however, that a note of caution is needed here. The results of heuristic algorithms largely depend on the data being analyzed (Thomas et al. 2006), and computer scientists warn that “data mining is easy to do badly” (Larose 2014). The solutions identified may not be unique and require extensive validation (Thomas et al. 2006). From a technical perspective, gel-based approaches may suffer from incomplete separation of proteins (as discussed in our review), which makes them less suitable for data mining approaches because 1 band or spot often contains more than 1 protein. These limitations probably explain the tendency of proteomic studies to favor traditional statistical tools for data analysis. However, when carefully used, bioinformatic tools typically employed for the analysis and interpretation of proteomic data should produce consistent results whether applied to preselected protein sets or to whole-proteome data sets (Huang et al. 2009).
On the other hand, although proteomic signatures in the absence of protein identity are undoubtedly valuable tools for data analysis, they only become truly insightful when they incorporate prior knowledge of protein function (Subramanian et al. 2005). We strongly believe that the full potential of proteomic tools in helping us to understand the molecular basis of behavior will only be reached by learning the identity and the function of the proteins comprising a behavior-specific proteomic signature.
Whatever the approach undertaken for analysis and however challenging high-throughput proteomic studies may be, of one thing we can be sure: the proteome holds answers to many of the questions asked by behavioral ecologists and searching for them will be worth the effort!