Evolutionary Divergence of Gene and Protein Expression in the Brains of Humans and Chimpanzees

Although transcriptomic profiling has become the standard approach for exploring molecular differences in the primate brain, very little is known about how the expression levels of gene transcripts relate to downstream protein abundance. Moreover, it is unknown whether this relationship changes depending on the brain region or species under investigation. We performed high-throughput transcriptomic (RNA-Seq) and proteomic (liquid chromatography coupled with tandem mass spectrometry) analyses on two regions of the human and chimpanzee brain: the anterior cingulate cortex (ACC) and the caudate nucleus (CN). In both brain regions, we found a lower correlation between mRNA and protein expression levels in humans and chimpanzees than has been reported for other tissues and cell types, suggesting that the brain may engage extensive tissue-specific regulation affecting protein abundance. In both species, only a few categories of biological function exhibited strong correlations between mRNA and protein expression levels. These categories included oxidative metabolism and protein synthesis and modification, indicating that the expression levels of mRNA transcripts supporting these biological functions are more predictive of protein expression than those of other functional categories. More generally, however, the two measures of molecular expression provided strikingly divergent perspectives on differential expression between human and chimpanzee brains: mRNA comparisons revealed significant differences in neuronal communication, ion transport, and regulatory processes, whereas protein comparisons indicated differences in perception and cognition, metabolic processes, and organization of the cytoskeleton. Our results highlight the importance of examining protein expression in evolutionary analyses and call for a more thorough understanding of tissue-specific protein expression levels.


Quantitative analysis of human and chimpanzee brain proteomes
Quantitative LC/MS/MS was performed on 1 μg of protein digest per sample, using a nanoAcquity UPLC system (Waters Corporation) coupled to a Synapt G2 HDMS high resolution accurate mass tandem mass spectrometer (Waters Corporation) via a nanoelectrospray ionization source. The sample was first trapped on a Symmetry C18 300 μm × 180 mm trapping column (5 μl/min at 99.9/0.1 v/v water/acetonitrile), after which the analytical separation was performed on a 1.7 μm Acquity BEH130 C18 75 μm × 250 mm column (Waters Corporation) using a 90-min gradient of 5 to 40% acetonitrile with 0.1% formic acid at a flow rate of 300 nl/min and a column temperature of 45°C. Data collection on the Synapt G2 mass spectrometer was performed in ion-mobility assisted data-independent acquisition (HDDIA or HDMSE) mode, using a 0.6 s alternating cycle time between low (6 V) and high (27-50 V) collision energy. Scans performed at low collision energy measure peptide accurate mass and intensity (abundance), while scans at elevated collision energy allow for qualitative identification of the resulting peptide fragments via database searching. The total analysis cycle time for each sample injection was approximately 2 h.
All samples were run back-to-back in a randomized injection order. The QC pool sample was run three times throughout the block (including the first and last injections) to obtain QC reproducibility metrics (for a total of 15 quantitative analyses). Following the 15 analyses, the data were imported into Rosetta Elucidator v3.3 (Rosetta Biosoftware), and all LC-MS files were aligned based on the accurate mass and retention time of detected ions ("features") using the PeakTeller algorithm (Elucidator). Relative peptide abundance was calculated from the area under the curve of aligned features across all runs. The dataset contained 474,762 quantified features, and high collision energy (peptide fragment) data were collected in 422,513 spectra for sequencing by database searching. These MS/MS data were searched against a custom SwissProt/TrEMBL database, which contained both human (Homo sapiens; 26,107 forward entries) and chimpanzee (Pan troglodytes; 32,369 forward entries) proteins. Because chimpanzee protein sequences are less well characterized than the human proteome (despite the larger number of sequences in the online databases), combining the human and chimpanzee databases gave us the most current and detailed database for protein identification. This database was then appended with a reversed-sequence decoy of each forward entry for false positive rate determination. After individual peptide scoring using the PeptideProphet algorithm (Elucidator), the data were annotated at a <1% peptide false discovery rate. This analysis yielded identifications for 8,850 peptides and 1,348 proteins across samples, including 858 proteins with two or more peptides quantified.
For quantitative processing, the data were first curated to contain only high-quality peptides with appropriate chromatographic peak shape, and the dataset was intensity scaled to the robust mean across all samples analyzed; the final quantitative dataset was based on 8,775 peptides and contained 1,337 proteins.
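Intensity scaling of this kind can be sketched as follows. The exact definition of the "robust mean" used by Elucidator is not given here, so this example assumes a 10% trimmed mean as a stand-in: each sample is multiplied by a factor that brings its trimmed mean to the average trimmed mean across samples. The function names, the trimming fraction, and the data layout are all assumptions for illustration.

```python
from statistics import fmean


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    return fmean(ordered[k:len(ordered) - k])


def scale_to_robust_mean(samples, trim=0.1):
    """Rescale each sample so that its trimmed mean matches the across-sample target.

    `samples` maps a sample name to a list of peptide intensities
    (hypothetical layout). The target is the mean of the per-sample
    trimmed means, which is an assumption, not the documented method.
    """
    centers = {name: trimmed_mean(vals, trim) for name, vals in samples.items()}
    target = fmean(list(centers.values()))
    return {
        name: [v * target / centers[name] for v in vals]
        for name, vals in samples.items()
    }
```

After scaling, every sample has the same trimmed mean, so a global difference in loading or instrument response no longer appears as a difference in abundance.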

Reproducibility of proteomic analysis
Based on Bradford assays, the total quantity of protein ranged from 429 to 514 mg with no apparent outliers based on protein concentration ranges (Supplementary Figure 1A). To screen for potential outliers, each individual sample was plotted on a principal component analysis (PCA) plot for the top three principal components (Supplementary Figure 1B) based on z-score-transformed (measurement of the significance of change) protein intensity. The PCA did not indicate any analytical outliers in relation to the other samples, and all intra-treatment group replicates had similar PC1 (most significant factor) separations. Importantly, we noted that the QC pools (yellow dots; Supplementary Figure 1B) grouped tightly and were located in the middle of the principal components space, as would be expected since the QC pools were generated from a mixture of all samples. The tight grouping of QC pools reflects high analytical reproducibility. The reproducibility of the protein expression measurements (technical reproducibility) was assessed by measuring the variation in the protein intensities across the QC pool samples run periodically throughout the run block. In this QC pool sample, the mean CV of protein intensity was 7.4% with a median of 5.2% across the 1,337 proteins quantified. As expected, the variation in each of the four treatment groups was slightly higher due to biological and preparation variation (mean variation was 12.7% in human ACC, 20.1% in human CN, 16.0% in chimpanzee ACC, and 18.0% in chimpanzee CN). In both species, there appeared to be slightly more variation among CN samples compared to ACC samples.
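The CV summary reported above can be reproduced in outline with the sketch below. It assumes that the CV of a protein is the sample standard deviation (n - 1 denominator) divided by the mean of its intensities across replicate injections, expressed as a percentage; the denominator is not stated here, so that choice is an assumption. The function names and the data layout are hypothetical.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def summarize_cv(intensity_by_protein):
    """Return (mean CV, median CV) across proteins.

    `intensity_by_protein` maps a protein identifier to its intensities
    across replicate runs, for example the three QC pool injections.
    """
    cvs = [cv_percent(v) for v in intensity_by_protein.values()]
    return mean(cvs), median(cvs)


# Made-up intensities for three proteins across three QC injections.
qc_pool = {
    "P1": [100.0, 110.0, 90.0],
    "P2": [50.0, 50.0, 50.0],
    "P3": [200.0, 220.0, 180.0],
}
mean_cv, median_cv = summarize_cv(qc_pool)
```

Applying the same function to each treatment group instead of the QC pool gives the per-group variation, which combines technical variation with biological and preparation variation.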
As an additional way of assessing variability and demonstrating reproducibility among samples, we calculated the mean expression level of each peptide per species. Across regions and species, a similar amount of variation exists in peptide expression levels, and as expected, less variation is observed in the QC pool sample (Supplementary Figure 2A). Mean peptide expression levels for both species combined are similar in ACC and CN (Supplementary Figure 2B). Also, the log2 fold change of peptide expression between species is similar in ACC and CN (Supplementary Figure 2C).
To confirm that the protein quantifications were consistent with those of their corresponding peptides, we calculated log2 fold change ratios between humans and chimpanzees and between ACC and CN for both peptides and proteins. In each comparison, the fold changes of peptide and protein expression displayed moderately high, positive correlations (human ACC to chimpanzee ACC Spearman rank rho = 0.56, p < 0.0001; human CN to chimpanzee CN rho = 0.51, p < 0.0001; human ACC to human CN rho = 0.57, p < 0.0001; chimpanzee ACC to chimpanzee CN rho = 0.52, p < 0.0001), displaying consistency between species and regions.
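The consistency check above compares log2 fold changes with a rank correlation. The sketch below shows both steps under simple assumptions: the fold change is the base-2 logarithm of a ratio of group means, and Spearman's rho is computed as the Pearson correlation of average ranks, the standard treatment when ties are present. How peptide-level fold changes were matched to proteins is not described here, so the example only covers the arithmetic. All function names are hypothetical.

```python
from math import log2, sqrt


def log2_fold_change(mean_a, mean_b):
    """log2 of the ratio of two group means (for example human / chimpanzee)."""
    return log2(mean_a / mean_b)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In use, `x` would hold the protein-level log2 fold changes and `y` the fold changes of the corresponding peptides for one comparison, such as human ACC against chimpanzee ACC. The sketch does not compute p-values.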

Analysis of the unpaired (complete) dataset
The interindividual variation in molecular expression was assessed by comparing the distribution of the interindividual CVs for the expression of transcripts and proteins. The results of Mann-Whitney tests for differences in central tendency and Kolmogorov-Smirnov tests for differences in the shape of the distributions between transcript and protein expression, regions, and species are listed in Supplementary Table 4 and are presented in Supplementary Figure 3.
The distributions of the interindividual variation in transcript expression are very similar between the paired and unpaired datasets, indicating that coding and noncoding transcripts display similar patterns in variation in each brain region and species.
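The two tests used to compare CV distributions answer different questions, which the sketch below makes concrete. It computes only the test statistics, not p-values: the Mann-Whitney U statistic counts how often a value from one sample exceeds a value from the other (sensitive to shifts in central tendency), while the Kolmogorov-Smirnov statistic is the largest gap between the two empirical cumulative distributions (sensitive to differences in shape). The function names are hypothetical, and this is not the software used for the analysis; in practice, library routines such as `scipy.stats.mannwhitneyu` and `scipy.stats.ks_2samp` also return p-values.

```python
def mann_whitney_u(x, y):
    """U statistic for x: number of (a, b) pairs with a > b, ties counted as 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def ks_statistic(x, y):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    def ecdf(sample, t):
        # Fraction of the sample that is less than or equal to t.
        return sum(1 for v in sample if v <= t) / len(sample)

    return max(abs(ecdf(x, t) - ecdf(y, t)) for t in set(x) | set(y))
```

Here `x` and `y` would be the interindividual CVs of, for example, transcripts and proteins within one region and species. Both functions use simple quadratic-time loops, which is adequate for a sketch but slow for tens of thousands of values.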

Differential expression of transcripts in the unpaired dataset
We assayed expression from 10,400 genes that were expressed in both the ACC and CN.
We found 720 genes to be differentially expressed (DE) between humans and chimpanzees in the ACC (FDR ≤ 0.05). In contrast, far fewer genes were DE in the CN (66 genes; FDR ≤ 0.05). This might be due to large variation in this region within species, possibly due to sex-specific differences.
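The FDR threshold applied above can be illustrated with the Benjamini-Hochberg step-up adjustment. The procedure used to control the FDR is not named here, so Benjamini-Hochberg is an assumption chosen because it is the most common one; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest), and the
    results are made monotone by taking a running minimum from the
    largest p-value downward.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up per-gene p-values for illustration only.
p_values = [0.001, 0.2, 0.03, 0.04]
adjusted = benjamini_hochberg(p_values)
n_de = sum(1 for q in adjusted if q <= 0.05)
```

In this made-up example only the first gene passes FDR ≤ 0.05, even though three raw p-values are below 0.05, which shows why the number of DE genes depends on the number of tests performed as well as on the individual p-values.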
Interestingly, the top two differentially expressed genes in both regions were R-spondin (RSPO1; involved in the inhibition of the Wnt/β-catenin signaling pathway and in cell migration and polarity) and LIX1, and both genes are expressed in the brain (Kamata et al., 2004; Moeller et al., 2002). The correlation of differential expression between the ACC and the CN was also generally low (Spearman rank rho = 0.21, p < 0.0001), supporting the idea that these regions display unique specializations of biological functions between humans and chimpanzees.
Gene transcripts supporting 373 categories of biological function in ACC and 140 in CN were found to be DE between humans and chimpanzees (minimum of 10 genes per category, q ≤ involved in intracellular signaling and transport, including multicellular organismal processes, and cellular adhesion, including the categories of cell adhesion and biological adhesion, are among the most enriched genes displaying interspecific differential expression. Although there are many functional categories of genes that show differential expression in both ACC and CN, including those involved in structural development, intracellular signaling, and cell adhesion, several categories are unique to each region. Specifically, transcripts involved in neuronal communication and cellular regulatory processes are uniquely DE between species in the ACC and may underlie synaptic transmission (Uddin et al., 2004). In contrast, our analysis of interspecific differential expression finds that genes involved in cognition and perception differentiate human and chimpanzee CN. This result is not surprising given the numerous connections of primary somatosensory cortex to the CN, but it is difficult to interpret because of the unique somatotopic mapping of the region, which was not tracked in our samples (Flaherty & Graybiel, 1993; Geradin, 2003).

Differential expression of proteins in the unpaired dataset
We found 67 of the 715 homologous proteins to be differentially expressed between humans and chimpanzees in the ACC (FDR ≤ 0.05). As with the gene expression data, fewer proteins were DE in the CN (58 proteins; FDR ≤ 0.05) than in the ACC. As in the transcript data, the top two differentially expressed proteins were shared by both regions: serum albumin (ALB) and myosin light chain 6B (MYL6B), a blood plasma protein and an ATPase cellular motor protein, respectively. Differential expression of proteins was more strongly correlated between the ACC and the CN (Spearman rank rho = 0.43, p < 0.0001) than was the case for the transcript data, a result that is not surprising considering the smaller range of variability across protein expression. Table 5) revealed findings similar to those of the analysis performed on the subset of data from the paired dataset. Fewer biological functions met our threshold criteria (51 in ACC and 26 in CN) for differential expression between humans and chimpanzees in proteins, even though we used more lenient thresholds (minimum of 3 genes per category, q ≤ 0.05), because our dataset contains far fewer proteins than transcripts. Specifically, proteins involved in oxidative metabolism, including oxidative reduction and metabolic processes, and in anaerobic metabolism and biosynthesis, including carbohydrate catabolic processes and cellular carbohydrate catabolic processes, are among the most enriched proteins displaying differential expression between species in the ACC. For the CN, proteins involved in intracellular signaling and transport, including multicellular organismal processes, and in cellular adhesion, including the categories of cell adhesion and biological adhesion, are among the most enriched proteins displaying interspecific differential expression.
While categories of biological function that support biosynthesis, perception, and immune response are DE in ACC and CN, proteins involved in anaerobic metabolism are uniquely DE between species in the ACC and may assist in biomolecular turnover that may be unique to one species (Bauernfeind et al., 2014).

Supplementary Dataset Legend
Supplementary Dataset 1 - Raw peptide and protein quantifications and individual measures of gene and protein expression for human and chimpanzee ACC and CN. The species means, SDs, and interindividual CVs are included. interindividual CVs between gene and protein expression, regions of the brain, and species in the unpaired dataset. The threshold q-values and minimum numbers of genes or proteins per category are listed in each subheading. (Table is uploaded