Abstract

Social interactions make up, to a large extent, the prime material of episodic memories. We therefore asked how social signals are coded by neurons in the hippocampus. The human hippocampus is home to neurons representing familiar individuals in an abstract and invariant manner (Quian Quiroga et al. 2009). In contrast, the activity of rat hippocampal cells is only weakly altered by the presence of other rats (von Heimendahl et al. 2012; Zynyuk et al. 2012). We probed the responses of monkey hippocampal neurons to faces and voices of familiar and unfamiliar individuals (monkeys and humans). Thirty-one percent of neurons recorded without prescreening responded to faces or to voices. Yet responses to faces were more informative about individuals than responses to voices, and neuronal responses to facial and vocal identities were not correlated, indicating that in our sample identity information was not conveyed in an invariant manner as it is in human neurons. Overall, responses displayed by monkey hippocampal neurons were similar to those of neurons recorded simultaneously in inferotemporal cortex, whose role in face perception is established. These results demonstrate that the monkey hippocampus participates in the read-out of social information, contrary to the rat hippocampus, but possibly lacks the explicit conceptual coding of individuals found in humans.

Introduction

Individual recognition is achieved by identifying distinct elements such as the face, voice, name, or silhouette as belonging to one individual. For each well-known individual, a unique combination of these idiosyncratic attributes is represented at the cognitive level. At the neuronal level, this unique representation could arise either from synchronous activation of cell assemblies (Eichenbaum 1993; Hoffman and McNaughton 2002; Canolty et al. 2010) in distant unimodal brain regions (von Kriegstein et al. 2005; von Kriegstein and Giraud 2006) or from the activity of gnostic cells (Konorski 1967) located in a “Person identity node” (Bruce and Young 1986; Burton et al. 1990; Ellis et al. 1997; Campanella and Belin 2007), although these two hypotheses are not mutually exclusive (Patterson et al. 2007; Meyer and Damasio 2009). Gnostic cells, that is, cells that respond equally to different pictures of a well-known person's face and to that person's written or spoken name, have been discovered in human epileptic patients (Quiroga et al. 2005; Waydo et al. 2006; Quian Quiroga et al. 2009). Interestingly, these cells were located in the medial temporal lobe, principally in the hippocampus (but also in the amygdala, entorhinal cortex, and parahippocampal cortex). The hippocampus is an archaic structure present in all mammals with a highly conserved anatomy, known mainly for its involvement in episodic memory and navigation rather than for its role in social information processing (Bird and Burgess 2008; Clayton and Russell 2009; Clark and Squire 2013). This raises the questions of (1) whether the hippocampus in different species participates in representing social information (Becker et al. 1999; Ishai et al. 2005; Machado and Bachevalier 2006; Ku et al. 2011; Davidson et al. 2012; Allen and Fortin 2013) and (2) whether gnostic neurons representing familiar individuals can be observed in other mammals (Quiroga 2012).
Notably, in contrast to the human findings, rat hippocampal place cells have been shown to be only weakly sensitive to the presence of a nearby peer (von Heimendahl et al. 2012; Zynyuk et al. 2012), although in another rodent species, the hamster, hippocampal neurons play a role in social odor categorization (Petrulis et al. 2000, 2005).

In the present study, we aimed to characterize how hippocampal neurons, recorded without particular prescreening, are activated when monkeys are exposed to different stimuli representing single individuals. To parallel the design used in human studies as closely as possible, we presented monkeys with pictures of faces of familiar individuals from different points of view and, in place of names, with several audio extracts of the voices of these same individuals. We wondered whether monkey hippocampal neurons would represent individuals in an invariant and abstract manner, like human neurons; whether they would instead remain blind to social information, like rat neurons; or whether they would exhibit an intermediate stage of coding. With regard to this issue, rhesus monkeys embody an appealing link between humans and rats. On the one hand, (1) they rely as much on visual information as humans do (Ghazanfar and Santos 2004; Waitt and Buchanan-Smith 2006), (2) they individually discriminate (Rendall et al. 1996; Parr et al. 2000; Gothard et al. 2004; Dahl et al. 2007) and recognize (Adachi and Hampton 2011; Sliwa et al. 2011) other monkeys and humans, and (3) they possess a rich representation of other individuals encompassing both vocal and facial information (Adachi and Hampton 2011; Sliwa et al. 2011; Habbershon et al. 2013). On the other hand, monkeys lack some of the semantic and episodic complexity found in humans (Hampton and Schwartz 2004; Bayley et al. 2005; Clark and Squire 2013), which leaves open the questions of whether social concept neurons exist in their hippocampus and, if so, how they are organized.

Finally, we aimed to compare the activity of hippocampal neurons with that of neurons recorded in the inferotemporal cortex (area TE, in the anterior fundus, and anterior lateral part of the lower bank of STS), a region known for its involvement in perceptual processing of faces (Gross et al. 1972; Bruce et al. 1981; Perrett et al. 1982; Rolls 1984; Yamane et al. 1988; Tsao et al. 2006), with the goal of understanding how the representation of facial and vocal signals could be transformed along a pathway from perception to conceptualization/memory. We therefore additionally recorded neuronal activity in the anterior inferotemporal cortex in response to the same stimuli and hypothesized that (1) in both regions social stimuli would be categorized separately from nonsocial stimuli at the neuronal and population levels, (2) we would observe a hierarchical processing of information about identity, where the anterior hippocampus would process information about identity independently of the modality (similarly to the finding of Quian Quiroga et al. 2009), while inferotemporal neurons would be modality-specific, and (3) seeing or hearing familiar individuals would preferentially trigger hippocampal neurons' activity as these stimuli might prompt the recall of the whole concept of these individuals or of episodes associated with them (Yanike et al. 2004; Rutishauser et al. 2008; Viskontas et al. 2009).

Materials and Methods

Subjects

All experimental procedures were in accordance with the regulations of local authorities (Direction Départementale des Services Vétérinaires, Lyon, France), European Community standards for the care and use of laboratory animals (European Community Council Directive of 24 November 1986, 86/609/EEC) and national standards (Ministère de l'Agriculture et de la Forêt, Commission Nationale de l'expérimentation animale). Two male adult rhesus monkeys (Macaca mulatta; monkey Y, 8.5 kg and monkey O, 13 kg) were used. They were socially housed in rooms of either 3 or 4 individuals since their arrival at the housing facility.

Surgical Preparation

Animals were prepared for chronic recording of single-neuron activity in the right hippocampus and right anterior inferotemporal cortex, area TE (Fig. 1A). Anesthesia was induced with Zoletil 20 (15 mg/kg) and maintained under isoflurane (2.5%) during positioning of a cilux head-restraint post and recording chamber (Crist Instruments, Damascus, MD). Animals were given atropine (0.25 mg/kg) to prevent excessive salivation. Adequate measures were taken to minimize pain or discomfort. Analgesia was provided by a presurgical buprenorphine injection (0.2 mg/kg). The position of the recording chamber for each animal was calculated using stereotaxic coordinates derived from presurgical anatomical magnetic resonance images (MRI, 0.6 mm isometric), so as to give access to both the right hippocampus and right area TE. Postsurgical MR images were used to finely monitor recording locations within the hippocampus and TE during each experiment (Fig. 1A). Presurgical and postsurgical structural MRI scans were performed while animals were anesthetized and placed in an MRI-compatible stereotaxic frame.

Figure 1.

Electrophysiological recordings and behavioral paradigm. (A) Location of the recording sites. Left: Three-dimensional representation of the monkey brain highlighting the right hippocampus (in red) and TE region (blue) and the chamber boundaries (black lines) in one monkey. Right: Selected MRI sections along the anteroposterior axis showing the recording sites and the track of the electrode (middle section). (B) Stimuli (faces and nonface objects) are drawn from a large database in which faces are varied along 5 dimensions: identity, species (human/monkey), gender (female/male), viewpoint (−30°/30°/frontal), and familiarity (personally known/unknown). This set is mirrored by an acoustic set (voices and nonvoice sounds) presenting voices with the same properties, except the viewpoint dimension. Stimuli were either unfamiliar to the tested monkeys (left) or drawn from their environment (right). (C) Visual and acoustic stimuli presentation task. Monkeys initiated trials by fixating a central fixation spot for 500 ms. Following this, a sound was played on acoustic trials while the monkey kept fixating the spot. On visual trials, a picture was displayed which the animal could explore within its boundaries. (D) Projections of the peak amplitude of 4 units recorded from a tetrode. Each subplot represents the projection of one electrode over another. Each cluster represents a unit. (E) Mean waveforms corresponding to the clusters presented in D. Each row represents one unit while each column corresponds to one electrode of the tetrode.


Design

To parallel the design used in human studies as closely as possible, we presented monkeys with different stimuli representing familiar individuals. Human studies used different pictures of each person as well as their written and pronounced name. Here, we presented monkeys with different pictures of faces of familiar individuals from different points of view and, in place of names, with several audio extracts of the voices of these same individuals (Fig. 1B). Because studies carried out in humans did not report presenting voices of familiar individuals, it is not known whether the cells responding to faces also respond to the corresponding voices. For example, one can hypothesize that when hearing voices, these cells would have reconfigured to represent the meanings of different words rather than the identity of the speaker. However, the choice of voices was based on our own and others' previous findings that monkeys possess a memory of the association between facial and vocal identities for familiar peers (Adachi and Hampton 2011; Sliwa et al. 2011; Habbershon et al. 2013) and for familiar humans (Sliwa et al. 2011). Importantly, although we used faces and voices, the present study did not aim to study the multisensory perceptual integration between faces and voices that is at work, for example, during communication. Rather, we aimed to test whether a conceptual cross-modal association between face identity and voice identity stored in memory would be represented in an invariant way [F = V] (Stein et al. 2010) at the level of single neurons in the monkey hippocampus. Therefore, we used static pictures of faces and avoided presenting faces and voices simultaneously, because doing so would have induced the perception of congruency in stimulus dynamics and not only congruency in identity. As control stimuli, we presented the monkeys with pictures and acoustic extracts of unknown individuals.
We also presented visual and acoustic objects and synthetic patterns, which have been shown to elicit response activity in hippocampal neurons (Tamura, Ono, Fukuda and Nakamura 1992; Tamura, Ono, Fukuda and Nishijo 1992; Rolls et al. 1993; Hampson et al. 2004; Rolls et al. 2005). Finally, contrary to the studies conducted by Quiroga et al., we did not attempt to prescreen neurons, giving us the chance to document responses to facial and vocal stimuli along a large spectrum of selective activities, and not only in ultra-selective cells.

Behavioral Procedure

The animal was head restrained and placed in front of, and at eye level with, an LCD screen situated at 56 cm, in a quiet room. Adequate measures were taken to minimize pain or discomfort. The subject's gaze position was monitored with an infrared eye-tracker (ISCAN) with a 250-Hz sampling rate. Behavioral paradigms, visual displays, eye position monitoring, and reward delivery were under the control of a computer running a real-time data acquisition system (REX) (Hays et al. 1982). Visual stimuli subtended a visual angle of 10° × 13° at the center of the screen (Fig. 1C). The virtual exploration window consisted of the picture surrounded by a black border of ∼2° of visual angle. Auditory stimuli were presented in a quiet room at the intensity of a regular conversation, that is, 50–65 dB (A-weighted) sound pressure level (SPL) at the subject's ear as measured with a Brüel and Kjær 2239A Integrating Sound Level Meter (http://www.bksv.com) from 2 speakers located 56 cm in front of the subject and placed symmetrically 45 cm apart. To start the trial, the subject directed its gaze to a fixation point at the center of the screen. It was then required to fixate (within ±4°) for 0.5 s. Two types of test trials were randomly interleaved: trials with visual stimuli and trials with auditory stimuli (Fig. 1C). In visual trials, after the 0.5 s fixation, an image was presented at the center of the screen. The subject could freely explore the picture for 2 s, so long as its gaze was maintained within the boundaries of a virtual window around the picture corresponding to the black area in Figure 1B. In auditory trials, after the 0.5 s fixation, the subject was required to continue fixating (±4°) for 2 s while an audio sample was played. Juice reward was given after each trial to ensure monkeys' motivation to complete trials and their attention to the computer monitor where visual stimuli were displayed. Thus reinforcement did not depend on any particular exploration pattern.

Stimuli

Visual stimuli were color images of individuals (humans and rhesus monkeys), objects and synthetic pictures (Fractal Explorer, fractals.da.ru). Because our focus was to investigate a code for identity, we presented more stimuli of individuals than of objects, allowing testing for neuronal activity in response to social and perceptual dimensions of face stimuli (gender, view, and species). Thus, there were three pictures (−30°/30°/front view) for each of the 6 human individuals and 6 or 5 monkey individuals shown and one picture per object and fractal; leading to 18 human, 18 or 21 monkey, 6 object and 3 fractal pictures presented to each tested animal (Fig. 1B). Visual stimuli were presented on a black background. Face photographs were cropped to only include the face, head, and neck, and then adjusted to a common size (GNU Image Manipulation Program, http://www.gimp.org). Human faces were presented without masks and goggles (that they wore in the presence of the animals) since the colony rooms were equipped with windows through which animals could see the experimenters and staff without this protection. Pictures shown in the document are provided for illustrative purposes but do not correspond to the actual stimuli, which included for all animal pictures the head implants (head posts and recording chambers). Otherwise, the pictures used resemble the ones shown in all respects (color, size, and gaze direction).

This set was mirrored by an acoustic set. Three audio samples from each of the 6 human individuals and 6 or 5 monkey individuals were presented along with one audio sample per object and synthetic abstract sound; leading to 18 human voice stimuli, 18 monkey vocalizations, 6 object and 3 synthetic abstract audio samples presented to each tested animal (Fig. 1B). The mean sound pressure level over the duration of each audio sample was calculated. The 42 audio samples for each subject were then normalized to this mean acoustic intensity (MATLAB, MathWorks, Natick, MA). Voices were presented with the same properties as faces, except for the viewpoint dimension. Vocal stimuli consisted of recordings of 2 s duration, each containing either a sample of “coo” vocalization (894 ± 300 ms) or a human speech extract (877 ± 380 ms). Human speech extracts consisted of short sentences or words in French such as “Bonjour tout le monde”/“Hello everybody” or “Voilà”/“Here it is.”

Individuals and objects represented in the stimuli were either familiar to the subjects (including the subject itself), or unfamiliar. Familiar simian individuals were the 2 or 3 adult rhesus macaques housed in the same room as the subjects for 2–3 years prior to testing. Familiar human individuals were a care-giver and 2 experimenters working with the animals on a daily basis during the same 2 to 3-year period. Familiar objects were objects present in the colony rooms (hose, primate chair, and pole). In the manuscript, these stimuli are referred to as “known.” Each stimulus set was specific to each animal to ensure a high familiarity with the individuals presented in the stimuli. Unfamiliar stimuli were pictures and voice extracts of individuals and objects unknown to the monkey, as well as fractal images and synthetic audio samples. These stimuli were presented to the monkeys on a daily basis in the setup room for 1 month prior to the experiment, in order to avoid a novelty confound (Xiang and Brown 1998; Jutras and Buffalo 2010). Thus, this second group of stimuli was not personally familiar but was visually familiar to the monkeys. In the manuscript, these stimuli are referred to as “unknown.” Stimuli of both known and unknown individuals and objects had been seen or heard by the monkeys hundreds of times prior to the recording sessions. Note that based on previous results (Adachi and Hampton 2011; Sliwa et al. 2011; Habbershon et al. 2013), we considered that rhesus monkeys represent the known stimuli as associated across modalities at a cognitive level because of their personal real-life experience with the individuals they represent, contrary to the unknown stimuli, which they have only perceived as pictures or sounds in a unimodal way.

Electrophysiological Recordings

Single-neuron activity was recorded extracellularly with quartz-insulated tungsten tetrodes (1.5–2 MΩ; Thomas Recording, Giessen, Germany) or tungsten single microelectrodes (1–2 MΩ; Frederick Haer Company, Bowdoinham, ME), which were lowered into the hippocampus and area TE. The depth of recording was calculated based on the distance between the target area and the tip of a reference electrode visible on the MRI slice (Fig. 1A). The microelectrodes were inserted through 24-gauge stainless steel guide tubes set inside a delrin grid (Crist Instruments) adapted to the recording chamber. The electrodes were lowered individually with a NAN microdrive (Plexon Inc., Dallas, TX) (Fig. 1A) at a speed of 20–30 μm/s to the required depth under electrophysiological monitoring; background activity and neural signals changed at the crossing of boundaries between cortex, white matter, and gray matter at the expected depths. For the hippocampus, we also used the signal change specific to the ventricles as a landmark above the hippocampus. When the electrodes reached the dorsal border of the hippocampus and of TE cortex, their speed was reduced, and the electrodes were advanced in small increments. When both tetrodes registered well-isolated and stable spikes, the experiment began. The electrode signal was band-pass filtered from 150 Hz to 8 kHz and preamplified using a Plexon system (Plexon Inc.) and digitized at 20 kHz using an 18-bit interface card (National Instruments) controlled by custom data acquisition software (RecorderEV3, Mediane System, Le Pecq, France). Continuous signals sampled at 20 kHz were displayed online and stored with the RecorderEV3 software along with behavioral and stimulation data for off-line analysis. Spike waveforms were stored in a time window from 650 µs before to 1350 µs after threshold crossing. At the 20-kHz sampling rate, each 2000-µs waveform comprised 40 data points, which were used to examine the spike shape.
Single units were sorted offline using the Offline Sorter software (Plexon Inc.) (Fig. 1D,E). We used a semimanual sorting method in which three selected parameters (principal components, waveform patterns, and amplitude of the peak on each electrode of the tetrode) allowed us to separate units from background activity and yielded well-isolated clusters. We computed interspike interval (ISI) distributions to confirm that the units were well isolated from one another, notably by checking that no spikes fell within the 1-ms refractory period after each spike. Because with tetrodes the signal is recorded simultaneously by 4 adjacent microwires, the location of the neurons can be triangulated, and their signals separated from one another.

Cells Location

Recordings were carried out in the anterior part of the right hippocampus, mainly between 15 and 18 mm on the anteroposterior axis (AP, relative to the interaural line) in one monkey and 8–16 mm in the other one (Fig. 1A and Supplementary Fig. S1). Recordings were carried out in the superior part of right TE, that is, in the interior inferior bank of STS, mainly in its anterior fundus and anterior lateral part between 10–12 mm in one monkey and 8–12 mm AP in the other (Fig. 1A and Supplementary Fig. S1). MR images were compared with 2 monkey brain atlases (Paxinos et al. 2000; Saleem and Logothetis 2006) to identify the hippocampal subfield (CA1, CA3, dentate gyrus, and subiculum) from which neuronal activity was recorded.

Cells Selectivity Analysis

Data were analyzed with custom-written scripts and the Statistics Toolbox in Matlab (MathWorks). Neurons with very low firing rates (FRs) (<0.4 Hz) were excluded from the following analyses because the methods below are unreliable at such rates. Raster plots and peristimulus histograms in response to the stimulus set were plotted for each neuron for inspection. The baseline activity was defined as the average activity during the 500 ms of fixation preceding stimulus onset. The peak of each neuron's activity was defined separately for the auditory and visual modalities as the extremum of its trial-averaged firing rate calculated in consecutive 50-ms bins. The neuron's response to a given stimulus was then defined as the trial-averaged FR in a 500-ms window centered on the time of the peak stimulus activity. This method was used to take into account the use of dynamic acoustic stimuli, which can lead to different response time courses among different neurons. In parallel, spike trains were also smoothed by convolution with a Gaussian kernel (σ = 10 ms) to obtain the spike density function (SDF). A neuron's response was considered excitatory if it was significantly different from baseline activity (P < 0.05, t-test) and if its SDF was greater than the mean plus 4 standard deviations (SDs) of the baseline activity. A neuron's response was considered inhibitory if it was significantly suppressed for at least 100 consecutive milliseconds during a window of 500 ms after the stimulus onset. Neurons were classified according to their responses to either or both of the 2 modalities.
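As a rough illustration of the peak-centered response window described above, the following Python sketch locates the 50-ms bin whose trial-averaged rate deviates most from baseline and then averages the firing rate in a 500-ms window centered on that bin. The function names, the spike-time format (milliseconds relative to stimulus onset), and the reading of "extremum" as the largest absolute deviation from baseline are assumptions made for illustration; the original analysis was run in Matlab and may differ in detail.

```python
from statistics import mean

BIN_MS = 50      # bin width used to locate the peak (from the text)
WINDOW_MS = 500  # width of the response window (from the text)


def binned_rates(trials, t_stop_ms, bin_ms=BIN_MS):
    """Trial-averaged firing rate (Hz) in consecutive bins from 0 to t_stop_ms.

    trials: one list of spike times per trial, in ms after stimulus onset
    (hypothetical data format).
    """
    n_bins = t_stop_ms // bin_ms
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if 0 <= t < n_bins * bin_ms:
                counts[int(t // bin_ms)] += 1
    return [c / len(trials) / (bin_ms / 1000.0) for c in counts]


def peak_time_ms(trials, baseline_hz, t_stop_ms=2000):
    """Center of the bin deviating most from baseline (assumed 'extremum')."""
    rates = binned_rates(trials, t_stop_ms)
    best = max(range(len(rates)), key=lambda i: abs(rates[i] - baseline_hz))
    return best * BIN_MS + BIN_MS / 2


def window_rate_hz(trials, center_ms, window_ms=WINDOW_MS):
    """Trial-averaged firing rate (Hz) in a window centered on center_ms."""
    lo, hi = center_ms - window_ms / 2, center_ms + window_ms / 2
    return mean(
        sum(1 for t in spikes if lo <= t < hi) / (window_ms / 1000.0)
        for spikes in trials
    )


# Toy data: two trials with a burst around 600-660 ms after onset.
toy_trials = [[100, 620, 640, 660, 1500], [610, 630, 655, 1900]]
peak = peak_time_ms(toy_trials, baseline_hz=1.0)
rate = window_rate_hz(toy_trials, peak)
```

Centering the window on the peak, instead of fixing it relative to stimulus onset, is what lets responses with different latencies (for example to dynamic sounds) be compared on the same footing.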

We then asked whether, and how many, neurons would respond to social stimuli compared with nonsocial stimuli in either modality. Neurons were defined as face-, object-, voice-, or sound-selective if their activity to at least one stimulus of these categories was significant and if additionally their activity was not significant for stimuli of the other category from the same modality. We used a stringent criterion (4 SDs above baseline) because we did not have an equal number of faces and objects in our stimulus set. This criterion differs from previous reports describing activity to faces in which responses to face stimuli were 2–10 times larger than responses to objects (Rolls 1984).
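The category-selectivity rule above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it applies only the 4-SD threshold (the accompanying t-test is omitted), and the function names, labels, and per-trial baseline format are hypothetical rather than taken from the original Matlab scripts.

```python
from statistics import mean, stdev


def is_significant(response_hz, baseline_trials_hz, n_sd=4.0):
    """True if the response exceeds baseline mean + n_sd * baseline SD.

    Simplification: the t-test used alongside this threshold is not run here.
    """
    threshold = mean(baseline_trials_hz) + n_sd * stdev(baseline_trials_hz)
    return response_hz > threshold


def classify(face_responses_hz, object_responses_hz, baseline_trials_hz):
    """Label a neuron from its responses to face and object stimuli."""
    face = any(is_significant(r, baseline_trials_hz) for r in face_responses_hz)
    obj = any(is_significant(r, baseline_trials_hz) for r in object_responses_hz)
    if face and not obj:
        return "face-selective"
    if obj and not face:
        return "object-selective"
    if face and obj:
        return "responsive, not selective"
    return "unresponsive"


# Toy baseline rates (Hz) from four trials: mean 2.0, SD about 0.82.
baseline = [1.0, 2.0, 3.0, 2.0]
label_a = classify([8.0, 3.0], [2.5, 4.0], baseline)
label_b = classify([8.0], [9.0], baseline)
```

The same rule would apply to voices versus nonvoice sounds within the auditory modality.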

Second, stimuli represented either known or unknown individuals and also varied along several other categories: viewpoint (−30°/30°/frontal), species (human/monkey), gender (female/male) and identity. Therefore, to determine whether these categories preferentially drive the activity of face-selective cells, a generalized linear model (GLM) with 4 factors (familiarity, species, gender, and viewpoint) was fitted to each face-selective neuron's responses to the visual subset of stimuli. Similarly, a GLM with three factors (familiarity, species, and gender) was fitted to each voice-selective cell's responses to the acoustic subset of stimuli. Only selective neurons presenting FR increases (but not decreases) for their preferred stimuli were included in this and the following analyses. To assess whether stimuli of familiar (rather than unfamiliar) individuals drive neurons' activity, we calculated the number of familiar images/sounds generating a response in each face/voice-selective cell (Viskontas et al. 2009) and compared it, with a paired Student's t-test, to the number of unfamiliar images/sounds generating a response in the same cells.

Third, we used a receiver operating characteristic (ROC) analysis to quantify whether neuronal activity related to facial (or vocal) identity was invariant, that is, whether the average FRs to the three viewpoints (or vocal extracts) of a given person were similar. The hit rate corresponded to the median number of spikes pooled across all trials for each of the three pictures (or audio extracts) of an individual, and the false-positive rate to the median number of spikes pooled across all trials elicited by each of the other visual (or acoustic) stimuli of the set. Ninety-five percent confidence intervals were estimated using a bootstrap of the stimulus responses. The ROC analysis was performed for each facial (or vocal) identity and the maximum area under the curve (AUC) from all of them was designated as the best ROC area (Eifuku et al. 2011). In parallel, we also asked how selective each neuron's responses were, by quantifying the number of stimuli eliciting activity greater than the half-maximum activity of this neuron (Perrodin et al. 2011) and also by computing a sparseness index S (Rolls and Tovee 1995), with $r_i$ being the response of the neuron to the ith stimulus, and N being the total number of stimuli, as follows:

$S=\frac{\left(\sum_{i=1}^{N} r_i/N\right)^2}{\sum_{i=1}^{N} r_i^2/N}$

The sparseness index ranges from 0 (highly-selective or sparse coding) to 1 (nonselective or dense coding).
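As a small worked example of the sparseness index defined above, the following Python sketch computes S for two toy response vectors. The function name and the toy firing rates are illustrative assumptions.

```python
def sparseness(responses):
    """Sparseness index S = (sum(r_i)/N)^2 / (sum(r_i^2)/N)."""
    n = len(responses)
    mean_r = sum(responses) / n
    mean_sq = sum(r * r for r in responses) / n
    if mean_sq == 0:
        raise ValueError("S is undefined when all responses are zero")
    return mean_r ** 2 / mean_sq


# Dense coding: identical responses to all 4 stimuli.
s_dense = sparseness([5.0, 5.0, 5.0, 5.0])

# Sparse coding: a response to only 1 of 4 stimuli.
s_sparse = sparseness([10.0, 0.0, 0.0, 0.0])
```

With equal responses to every stimulus S equals 1, whereas a response to a single stimulus out of N gives S = 1/N (0.25 here), so the lower end of the range is approached only as the number of stimuli grows.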

Fourth, face–voice association at the neuronal level was investigated. For each cell, the correlation coefficient between the response vector to the different facial identities and the response vector to the corresponding vocal identities was calculated. To determine whether this correlation coefficient could have been obtained by chance, we compared it with the correlation coefficients obtained by permuting identities 719 times.
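The permutation test above can be sketched as follows. Note that 719 equals 6! − 1, which would be consistent with enumerating every non-identity reordering of 6 identities; that reading, the use of Pearson's correlation, the one-sided p-value, and all function names are assumptions made for this illustration.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def face_voice_permutation_test(face, voice):
    """Compare the observed face-voice correlation with a shuffled-identity null.

    face[i] and voice[i] are one neuron's mean responses to identity i.
    Returns (observed r, number of permutations, one-sided p-value).
    """
    observed = pearson(face, voice)
    identity = tuple(range(len(voice)))
    null = [
        pearson(face, [voice[j] for j in perm])
        for perm in permutations(identity)
        if perm != identity  # skip the true face-voice pairing
    ]
    n_at_least = sum(1 for r in null if r >= observed - 1e-12)
    p_value = (n_at_least + 1) / (len(null) + 1)
    return observed, len(null), p_value


# Toy neuron whose voice responses track its face responses perfectly.
r_obs, n_perm, p = face_voice_permutation_test(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
)
```

With 6 identities the smallest attainable one-sided p-value under this scheme is 1/720, reached when no reordering matches the observed correlation.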

Finally, unsupervised methods were also used to assess finer selectivity. For each neuron, a principal component analysis was performed, with observations being each presented stimulus from the set and variables being the neuron's response at each time bin for the respective stimuli. Stimulus responses were projected onto the first 2 principal components and then clustered either using a Gaussian mixture model or by calculating the Euclidean distance between the representations of the neuronal responses to every pair of stimuli in the principal component space and then linking them hierarchically into a cluster tree.

Population Analysis and Comparison Across Brain Areas

χ2 tests were used to compare across brain regions the number of cells responding to the different modalities and categories. We also compared the indexes of selectivity across modalities and across regions with Student and Wilcoxon tests. Finally, unsupervised methods were used to determine which parameters drive neuronal activity (of all the recorded neurons and not only of the selective ones) at the population level in both brain areas. For each given stimulus and each given neuron, a z-score was calculated as the neuron's mean FR for the stimulus in a 500 ms window centered at the peak of activity for this stimulus divided by the standard deviation of the FR in the same 500 ms window. The peak of activity was defined as the extremum of the trial-averaged FR in consecutive 50 ms bins for a given stimulus (if the activity was similar to that of the baseline, the z-score was close to 1/SD of the FR in the 500-ms window). The z-scores weighted the contribution of the change in FR in response to a stimulus presentation by its reliability across trials, whereby less reliable responses (i.e., with a high standard deviation) were weighted down compared with more reliable responses (i.e., with a low standard deviation). Principal component analyses on the z-scores were performed in both the hippocampus and TE, with observations being each one of the presented stimuli and variables being each neuron. Z-scores between the different neurons were also standardized so that the PCA would not be dominated by the neurons with the highest modulation depths (Cunningham and Yu 2014). The standardized z-scores were projected onto the first 2 principal components and then clustered using a Gaussian mixture model, which is fitted with an expectation–maximization (EM) algorithm. A hierarchical analysis was also performed, by calculating the Euclidean distances in the principal component space between every pair of stimuli and then linking the stimuli hierarchically into a cluster tree.
This analysis assumes the existence of some categorical structure, contrary to the unsupervised stimulus arrangements done with the PCA, but it does not assume any particular grouping into categories (Kriegeskorte et al. 2008).
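The reliability-weighted z-score and the standardization step that precede the population PCA can be illustrated with a short Python sketch. It assumes that the standard deviation is taken across trials and that standardization is applied to each neuron's z-scores across stimuli; both readings, as well as the function names and toy values, are assumptions rather than details confirmed by the text.

```python
from statistics import mean, pstdev, stdev


def response_z(trial_rates_hz):
    """Mean firing rate across trials divided by its across-trial SD.

    trial_rates_hz: one firing rate per trial, measured in the 500-ms
    window centered on the response peak (hypothetical input format).
    """
    sd = stdev(trial_rates_hz)
    if sd == 0:
        raise ValueError("z-score is undefined when all trials are identical")
    return mean(trial_rates_hz) / sd


def standardize(z_by_stimulus):
    """Rescale one neuron's z-scores to zero mean and unit SD across stimuli."""
    m, s = mean(z_by_stimulus), pstdev(z_by_stimulus)
    if s == 0:
        raise ValueError("cannot standardize a constant vector")
    return [(z - m) / s for z in z_by_stimulus]


# The same mean rate (6 Hz) yields a higher z-score when trials agree.
z_reliable = response_z([5.0, 6.0, 7.0])   # across-trial SD = 1
z_variable = response_z([4.0, 6.0, 8.0])   # across-trial SD = 2

# One neuron's z-scores for three stimuli, standardized before the PCA.
column = standardize([1.0, 2.0, 3.0])
```

After this step every neuron contributes a column with the same mean and variance, so strongly modulated neurons cannot dominate the principal components.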

Results

The activity of 343 neurons was recorded in 2 monkeys in the hippocampus (188 cells: 99 in monkey O and 89 in monkey Y; Fig. 1A; 58 in CA1, 110 in CA3, 18 in the dentate gyrus and 2 in the subiculum) and in area TE (155 cells: 82 in monkey O and 73 in monkey Y; Fig. 1A), while monkeys were presented with color images or audio recordings of individuals and objects familiar to them (including themselves) or not personally familiar (Fig. 1B). Spontaneous FR of cells ranged from 0.1 to 29 Hz in the hippocampus (median 1.3 Hz, mean 3.2 Hz) and from 0.1 to 127 Hz in area TE (median: 1.4 Hz, mean: 3.8 Hz). Thus, the population of cells recorded is likely to encompass pyramidal cells and interneurons in the hippocampus (Matsumura et al. 1999; Wirth et al. 2003; Viskontas et al. 2007; Ison et al. 2011) and area TE (Mruczek and Sheinberg 2012). Naturalistic stimuli elicited robust responses throughout the sampled regions, with 83 (44%) hippocampal neurons (26 in CA1, 50 in CA3, Fig. 2A) selective to at least one stimulus in either modality (with criteria: excitatory response, P < 0.05, median FR greater than the baseline mean plus 4 SDs and greater than 0.4 Hz) and 79 (51%) neurons in TE (Fig. 2A). In the hippocampus, 11.7% of cells showed activity that was significantly suppressed by the visual stimulus onset relative to baseline. In contrast, 25.5% of cells showed excitatory responses to the stimulus onset and 7% showed mixed responses (excitatory to some stimuli and inhibitory to others). Inhibitory responses to acoustic cues were less frequent: only 5% were inhibitions, 2% showed mixed inhibitory and excitatory responses, and 23% were excitatory responses. The same pattern was found in TE for visual cues (11.6% of cells showed inhibitory responses, while 32.2 and 11.6% showed, respectively, excitatory and mixed responses) and auditory cues (11.6% of inhibitions, 23.2% of excitatory responses, and 5.1% of mixed responses).

Figure 2.

Face-selective and voice-selective cells in the monkey hippocampus (HPC) and inferotemporal cortex (area TE). (A) Proportion of responsive cells to the different modalities. (B) Proportions of selective cells to faces, objects, voices, and sounds. (C) Examples of rasters and mean FRs (±SEM) of single cells from the hippocampus responding to faces of one or both species as compared with objects (columns 1–3), to voices of one species as compared with objects' sounds (column 4). (D) Examples of single cells in TE.

Cells Selective for Social Stimuli

We first asked if social stimuli are coded differently from nonsocial stimuli by hippocampal neurons. Neurons were defined as face-selective or object-selective if their activity to at least one stimulus of these categories was significant and if they did not respond to stimuli of the other category (a response was considered significant if it was greater than the mean plus 4 SDs of the baseline activity and significantly different from baseline activity (P < 0.05, t-test)). The proportion of face/object/voice/sound-selective neurons responding with excitation, inhibition of their FR or with a mixed response (excitatory to some stimuli and inhibitory to others) is provided in Supplementary Tables 1 and 2. Because excitatory responses represented the vast majority of the responses, we report here the proportions of selective neurons presenting an enhanced FR for their preferred stimuli (as in Supplementary Table 2 and Fig. S11A). Hence, of the 188 cells recorded in the hippocampus, 38 (20%) neurons were face-selective (Fig. 2B): 14 (24%) in CA1, 19 (17%) in CA3 (Fig. 2B). Exemplar face-selective cells from the hippocampus are presented in Figure 2C and Supplementary Figures S2 and S3 (see Supplementary Fig. S11A–C for the distribution of best responses to face stimuli against best responses to nonface stimuli). Conversely, 5 (3%) were object-selective (see Supplementary Figs S4 and S5 for examples): 1 (2%) in CA1, 3 (3%) in CA3 (Fig. 2B). The number of voice-selective neurons and sound-selective neurons was also quantified with the same criteria. Twenty-eight (15%) hippocampal neurons were voice-selective (Fig. 2B and Supplementary Fig. S11A–C): 9 (15%) in CA1, 17 (15%) in CA3. Exemplar voice-selective cells from the hippocampus are presented in Figure 2C and Supplementary Figures S6 and S7. Conversely, 6 (3%) were sound-selective: 2 (3%) in CA1, 4 (4%) in CA3 (Fig. 2B). Voice-selective cells displayed a less pronounced increase in FR relative to their baseline activity than face-selective cells (2-sided t-test, P = 0.028, Fig. 2C). Of the 155 cells recorded in TE, 32 (21%) were found to be face-selective (Fig. 2B and Supplementary Fig. S11A–C). Exemplar face-selective cells from TE are presented in Figure 2D and Supplementary Figures S8 and S9. Conversely, 10 (6%) were object-selective. These proportions did not differ from those found in the hippocampus (P > 0.05, χ2-test). Thirty-seven (24%) inferotemporal neurons were found to be voice-selective (Fig. 2B and Supplementary Figs S10, S11). Conversely, 5 (3%) were sound-selective. These proportions also did not differ between TE and the hippocampus and did not differ from the proportion of face-selective cells in TE (P > 0.05, χ2-test). As in the hippocampus, voice-selective cells in TE displayed a less pronounced increase in FR relative to their baseline activity than face-selective cells (2-sided t-test, P = 0.02). These first results show the existence of cells in the monkey hippocampus, similar to cells in TE, which respond differently to social stimuli compared with nonsocial stimuli in the absence of any conditional training of the animals. These results also suggest that acoustic signals might be less well categorized into social and nonsocial than visual signals in the monkey hippocampus and in TE.
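
The comparisons of proportions between regions can be sketched as a Pearson χ2 test on a 2 × 2 table. The version below is only an assumption about how such a test could be run (1 degree of freedom, no continuity correction; the text does not say whether a correction was applied), and the function name is hypothetical. It is applied to the counts of face-selective cells reported above (38 of 188 in the hippocampus, 32 of 155 in TE).

```python
import math

def chi2_two_proportions(k1, n1, k2, n2):
    """Pearson chi-square statistic and P value (1 df, no continuity correction)
    for comparing the proportions k1/n1 and k2/n2."""
    a, b = k1, n1 - k1          # group 1: selective, not selective
    c, d = k2, n2 - k2          # group 2: selective, not selective
    n = n1 + n2
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

chi2, p = chi2_two_proportions(38, 188, 32, 155)
print(f"chi2 = {chi2:.4f}, P = {p:.2f}")
```

With these counts the sketch gives a P value well above 0.05, in line with the reported absence of a difference between the two regions.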

Representation of Social Categories

To determine which categories preferentially drive the activity of face-selective cells, a generalized linear model (GLM) analysis investigated the selectivity of these cells with 4 factors characterizing stimuli in the experimental dataset: (1) familiarity (known/unknown), (2) species (monkey/human), (3) gender (female/male), (4) viewpoint (frontal/30°/−30°). We found that 11 (29%) of the 38 hippocampal neurons responded in a differential fashion to at least one of these 4 factors (Table 1). The species factor characterized the largest proportion of cells for which activity was significantly explained by the GLM (Table 1, Fig. 2C,D, and Supplementary Figs S2, S3 for single examples of cells responding more to monkey faces).

Table 1

Face-selective and voice-selective cells properties in the hippocampus and TE

Percentage of face-selective cells

Percentage of voice-selective cells

HPC TE HPC TE
Known–unknown 16
Monkey–human 16 25 11
Female–male 22
View (0°, 30°, and −30°) 13
Not modulated 71 56 86 89

Note: Percentage of cells modulated by one of the 4/3 categories as assessed with generalized linear model analyses carried separately on each population.

In TE, 14 (44%) of the 32 face-selective neurons were found to respond in a differential fashion to at least one of the 4 factors (familiarity, species, viewpoint, and gender, Table 1). As in the hippocampus, the species factor characterized the largest proportion of cells for which activity was significantly explained by the GLM (Table 1, Fig. 2D, and Supplementary Figs S8, S9 for single examples of cells responding more to human or monkey faces, respectively). In this case, the proportion of responses to the 4 factors as a whole differed significantly between the hippocampus and TE (P = 0.0022, χ2-test), inferotemporal cells being overall more sensitive to facial category than hippocampal cells.

A GLM analysis was also used to investigate the selectivity of voice-selective cells with three factors characterizing stimuli in the experimental dataset: (1) familiarity (known/unknown), (2) species (monkey/human), (3) gender (female/male). We found that only 4 (14%) of the 28 hippocampal cells responded in a differential fashion to at least one of these three factors (Table 1). These proportions were smaller than the proportions observed for face-selective cells, though not significantly (P = 0.29, χ2-test). In TE, 4 (11%) of the 37 neurons responded in a differential fashion to at least one of the three factors (familiarity, species, gender, Table 1). These proportions were significantly smaller than the proportions observed for face-selective cells in the same structure (P = 2 × 10^−9, χ2-test), showing that in the inferotemporal cortex, responses to voices are less informative than responses to faces, as was also the case in the hippocampus.

Representation of Familiarity

To assess if stimuli of known (rather than of unknown/not personally familiar) individuals are driving neurons' activity, we calculated the number of known images/sounds generating a response in face/voice-selective cells (Viskontas et al. 2009). In the hippocampus, there were 2.13 ± 0.37 pictures of known individuals eliciting a selective response in face-selective neurons, compared with 2 ± 0.32 pictures of unknown individuals. There were 0.64 ± 0.16 sounds of known individuals eliciting a selective response in voice-selective neurons, compared with 0.71 ± 0.21 sounds of unknown individuals. These numbers did not differ significantly (P = 0.68, P = 0.77, paired t-tests) showing that known individuals were not preferentially coded compared with unknown individuals by hippocampal neurons in the monkey. In TE, 2.03 ± 0.4 pictures of familiar individuals elicited a selective response in face-selective neurons, compared with 2.31 ± 0.37 pictures of unfamiliar individuals. There were 0.78 ± 0.12 sounds of familiar individuals that elicited a selective response in voice-selective neurons, compared with 0.84 ± 0.17 sounds of unfamiliar individuals. These amounts did not differ significantly (P = 0.29, P = 0.82, paired t-tests) showing that familiar individuals were not preferentially coded compared with unfamiliar individuals by inferotemporal neurons, as was observed in the hippocampus (P = 1, χ2-test).

Note that stimuli of both known and unknown individuals and objects had been seen or heard by the monkeys hundreds of times prior to the recording sessions (both types of stimuli thus being visually or acoustically familiar). Therefore, the results shown above do not generalize to novel images (for example, images shown at most a few times in a lifetime), which we did not present in this study; they concern only stimuli of unfamiliar (in the sense of not personally familiar) individuals presented several times prior to the experiment.

Response to Identities

Binary classifiers tested whether face-selective cells coded facial identity invariantly across viewpoints, compared with the rest of the visual stimuli. For each face-selective cell, the best AUC of the 12 ROC curves was calculated (see Supplementary Figs S2C and S8C for examples). The greater the AUC, the better a cell discriminates the three viewpoints of one individual from the other stimuli, and thus the higher the cell's invariance for facial identity. In the hippocampus, the mean invariance of face-selective cell activity to facial identities was AUCbest,HPC = 0.83 (Fig. 3A top), which did not differ significantly from the mean invariance of face-selective cells in TE (AUCbest,TE = 0.86, P > 0.05, 2-sided Wilcoxon rank sum test, Fig. 3A). The mean invariance of voice-selective cell activity to vocal identities was calculated (see Supplementary Figs S6C and S10C,D for examples) and was found to be lower than the identity invariance of face-selective cells, though not significantly so in the hippocampus (AUCbest,HPC = 0.80, Fig. 3A top). In TE, the mean invariance of voice-selective cell activity to vocal identities was found to be significantly lower than the invariance of face-selective cells to facial identities (AUCbest,TE = 0.77, P = 0.006, 2-sided Wilcoxon test, Fig. 3A bottom). Even though many cells coded for the different facial views of at least one individual in a similar fashion, these cells did not necessarily respond to a unique individual. On the contrary, an analysis of the sparseness of the cells (i.e., whether cells were active for most stimuli or for few stimuli) showed that only very few cells (e.g., Supplementary Fig. S5) displayed high selectivity, in both areas (see below).
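
The best-AUC measure can be sketched as follows. This is a minimal illustration rather than the classifier used in the study: the AUC is computed here as the probability that a response to one of the views of a given individual exceeds a response to any other stimulus (ties counted as one half), and the best value over individuals is kept. Function names, identity labels, and firing rates are hypothetical.

```python
def roc_auc(target_responses, other_responses):
    """Area under the ROC curve, computed as the fraction of (target, other)
    pairs in which the target response is larger; ties count 0.5."""
    wins = 0.0
    for t in target_responses:
        for o in other_responses:
            if t > o:
                wins += 1.0
            elif t == o:
                wins += 0.5
    return wins / (len(target_responses) * len(other_responses))

def best_identity_auc(face_responses, object_responses):
    """face_responses maps each individual to the responses to its views.
    Each individual is tested against all remaining visual stimuli."""
    best = 0.0
    for identity, views in face_responses.items():
        rest = list(object_responses)
        for other_id, other_views in face_responses.items():
            if other_id != identity:
                rest.extend(other_views)
        best = max(best, roc_auc(views, rest))
    return best

# Made-up responses (Hz): three views for each of two individuals, three objects.
faces = {"A": [9.0, 8.0, 10.0], "B": [2.0, 3.0, 1.0]}
objects = [1.5, 2.5, 0.5]
print(best_identity_auc(faces, objects))
```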

Figure 3.

Identity coding. (A) Distribution of best significant AUC of each face and voice-selective cell in the hippocampus and TE. (B) Distribution of correlation coefficients for face–voice-selective cells from the hippocampus and TE, as compared with the chance distribution obtained by permuting 719 times the vocal identities with regard to their corresponding facial identity for the same set of cells. (C). Scatter graph of the response to the best face (respectively best voice) in face-selective and face-responsive neurons (respectively, voice-selective or responsive neurons, y-axis) against the response of the same neurons to the corresponding voice (respectively corresponding face, x-axis) in hippocampus (top) and TE (bottom). Only neurons whose best response is for a familiar individual are represented. See Supplementary Figure S11D for a depiction of the comparison when the best response is for an unfamiliar individual.

Sparseness of Face-selective and Voice-selective Cells Activity

The selectivity of face and voice-selective cells was first compared by quantifying the number of stimuli eliciting an activity greater than the half-maximum activity of that cell (Perrodin et al. 2011). In the hippocampus, face-selective cells responded to an average of 51% of the visual stimuli, that is, 23/45 of the stimuli elicited responses greater than the half-maximum response (Fig. 4A,D). Similarly to the hippocampus, inferotemporal face-selective cells responded to an average of 42% of the visual stimuli, that is, 19/45 of the visual stimuli elicited responses greater than the half-maximum response (P = 0.13, Wilcoxon test, Fig. 4A,D).
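
The half-maximum measure amounts to counting the stimuli whose response exceeds half of the cell's largest response. A minimal sketch, with a hypothetical function name and made-up responses (and assuming nonnegative response values, for instance baseline-corrected rates):

```python
def fraction_above_half_max(responses):
    """Fraction of stimuli eliciting a response greater than half of the
    maximum response of the cell."""
    half_max = max(responses) / 2.0
    return sum(r > half_max for r in responses) / len(responses)

# Made-up responses of one cell to four stimuli (Hz).
example = [10.0, 6.0, 4.0, 1.0]
print(fraction_above_half_max(example))

# The reported hippocampal average, 23 of 45 stimuli, corresponds to about 51%.
print(round(100 * 23 / 45))
```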

Figure 4.

Sparseness of neurons' activity. (A) Percentage of face stimuli eliciting activity larger than half-maximum in face-selective cells from the hippocampus and area TE. Vertical lines represent the averages (*P < 0.05). (B) Average of the sparseness index for face-selective cells in hippocampus and area TE. Error bars represent SEM. (C) Same as B in areas CA1 and CA3. (D) Normalized responses of each face-selective cell from HPC (left) and TE (right) to all the face stimuli. Cells are sorted according to their mean normalized FR and responses to stimuli are sorted within each line, according to their normalized FR. (E–H) Same as in A–D, but for voice-selective cells.

In the hippocampus, voice-selective cells presented a higher selectivity than face-selective cells (average percentage of acoustic stimuli eliciting a response: 34%, that is, 15/45 stimuli, P = 0.0065, Wilcoxon test, Fig. 4E,H). In contrast, in TE, the average number of acoustic stimuli eliciting a response was similar to the average number of visual stimuli eliciting a response (45%, i.e., 20/45 stimuli, P = 0.054, Wilcoxon test, Fig. 4E,H) and thus significantly higher than in the hippocampus (P = 0.049, Wilcoxon test, Fig. 4E).

We further analyzed whether face- and voice-selective cells were active for most stimuli or for few stimuli by computing a sparseness index, which ranges from 0 (highly-selective or sparse coding) to 1 (non-selective or dense coding) and obtained the same results. In the hippocampus, the mean sparseness index was high for both types of selective cells (0.83 for face-selective and 0.77 for voice-selective; Fig. 4B,F) indicating dense coding yet sparser for vocal stimuli than for face stimuli (P = 0.030, Student test). Face selectivity did not differ between CA1 and CA3 (0.82 in CA1, 0.84 in CA3, P > 0.05, Student test; Fig. 4C); nor did selectivity for voices (0.81 in CA1, 0.74 in CA3, P > 0.05 Student test; Fig. 4G). In TE, the mean sparseness indexes were high for both face- and voice-selective cells (0.80, 0.81 respectively, P > 0.05, Fig. 4B,F).
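
The formula of the sparseness index is not restated in this section. The sketch below assumes the definition commonly used in this literature (e.g., Rolls and Tovee 1995), a = (Σ r_i / n)² / (Σ r_i² / n), which equals 1 when a cell responds equally to all n stimuli and 1/n (close to 0 for large stimulus sets) when it responds to a single stimulus. The function name and the firing rates are hypothetical.

```python
def sparseness_index(rates):
    """Assumed definition: squared mean rate divided by the mean squared rate.
    Values near 1 indicate dense coding, values near 1/n sparse coding."""
    n = len(rates)
    mean_rate = sum(rates) / n
    mean_square = sum(r * r for r in rates) / n
    return mean_rate ** 2 / mean_square

dense = [2.0, 2.0, 2.0, 2.0]    # equal response to every stimulus
sparse = [5.0, 0.0, 0.0, 0.0]   # response to a single stimulus
print(sparseness_index(dense), sparseness_index(sparse))
```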

Both indexes were similar across modalities and across regions, indicating a dense coding in the hippocampus as was observed in TE, with only a sparser coding for vocal stimuli compared with facial ones in the hippocampus. In contrast, these indexes were higher than those described in some human studies of the hippocampus (Quiroga et al. 2005), and ultra-sparse hippocampal cells were observed only anecdotally (<1%) in our study (e.g., Supplementary Fig. S5).

Face–voice Association

We investigated whether cells encoded identity across modalities. Some hippocampal neurons were activated by stimuli from both modalities (“bimodal cells”) (Fig. 2A). Twenty-three (12%) cells were bimodal, compared with 38 (20%) visual and 22 (12%) auditory cells. Of these bimodal cells, 7 responded to both faces and voices and not to sound and object stimuli (Fig. 2B). These latter cells were analyzed in search of activity invariant to modality. We found that these cells' responses to facial identities were poorly correlated with their responses to the corresponding vocal identities (ρ = 0.22, P = 0.22, Student test as compared with a distribution of correlations obtained by chance through permutations, Fig. 3B top). Figure 3C also shows that the distribution of the responses to the best familiar face compared with the corresponding voice was biased toward the face response for face-selective neurons, while the opposite pattern was found for voice-selective neurons (see Supplementary Fig. S11D for responses to unfamiliar individuals). Thus it appears that, although some cells respond to multiple views of the same individual(s) (e.g., Supplementary Figs S2 and S3), this invariance does not generalize to voice stimuli of the same individual.
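
The chance distribution of face–voice correlations can be sketched as below. The caption of Figure 3 mentions 719 permutations, which equals 6! − 1, that is, every reordering of six identity labels except the original one; that reading is an assumption here. The sketch also swaps the Student test mentioned in the text for a simpler empirical P value (the fraction of permuted correlations at least as large as the observed one). All names and firing rates are hypothetical.

```python
from itertools import permutations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def face_voice_permutation_test(face, voice):
    """face[c][i] and voice[c][i]: response of cell c to the face and to the
    voice of identity i. Voice identities are relabeled in every possible
    order except the true one to build the chance distribution."""
    n_id = len(face[0])
    flat_face = [r for cell in face for r in cell]

    def corr(order):
        flat_voice = [cell[j] for cell in voice for j in order]
        return pearson(flat_face, flat_voice)

    true_order = tuple(range(n_id))
    observed = corr(true_order)
    null = [corr(p) for p in permutations(range(n_id)) if p != true_order]
    p_value = sum(r >= observed for r in null) / len(null)
    return observed, null, p_value

# Made-up responses (Hz) of two cells to six identities.
face = [[5.0, 1.0, 2.0, 0.5, 3.0, 1.5], [0.2, 4.0, 1.0, 2.5, 0.8, 3.0]]
voice = [[1.0, 2.0, 0.5, 1.5, 0.8, 2.2], [1.2, 0.4, 2.0, 0.9, 1.6, 0.7]]
observed, null, p_value = face_voice_permutation_test(face, voice)
print(len(null))
```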

In TE, 32 (21%) cells were bimodal, compared with 34 (22%) visual and 13 (8%) auditory cells (Fig. 2A). Of these bimodal cells, 14 responded to both faces and voices and not to sound and object stimuli (Fig. 2B). These latter cells were analyzed in search of activity invariant to modality. We found that the cells' responses to facial identities were poorly correlated with their responses to the corresponding vocal identities (ρ = −0.07, P = 0.61, Student test as compared with a distribution of correlations obtained by chance through permutations, Fig. 3B, bottom), as was observed in the hippocampus. Similarly, responses to the best familiar face compared with the corresponding voice in face-selective and voice-selective neurons were biased toward the prime selectivity of the cell, as was observed in the hippocampus (Fig. 3C and Supplementary Fig. S11D). Overall, faces and voices of the same individuals are represented by distinct rather than the same neurons in the monkey hippocampus and in TE.

Conclusion 1

At the single cell level, hippocampal activity to social stimuli appeared to be similar to the activity observed in the inferotemporal cortex. In both regions, we found: (1) the existence of neurons responding differently to social stimuli compared with nonsocial stimuli, but mainly in the visual rather than acoustic modality, (2) that responses to voices are less informative than responses to faces about the subcategory they code for, (3) that neuronal responses to facial and vocal identities are poorly correlated. However compared with inferotemporal neurons, hippocampal neuron tuning was broader, in the sense that they exhibited less fine tuning toward the different social categories. We next tested if these results translate at the population level, by analyzing the activity of all the recorded neurons with unsupervised analyses.

Population Analysis

For each of the 343 recorded neurons, the z-score for each stimulus was calculated as the mean neuronal response to the stimulus divided by its standard deviation. To determine which parameters are driving neuronal activity at the population level in both brain areas, principal component analyses were performed. To this end, all the stimuli from the set were used as observations, and the z-scores from each cell for each stimulus of the set were defined as the variables. Projection of stimulus identity on the first 2 principal components of neuronal population z-scores revealed that visual and auditory stimuli are segregated by neuronal activity in both regions (Fig. 5A and Supplementary Fig. S12). This segregation was confirmed by clustering the stimulus projection maps with a Gaussian mixture model. When calculating the Euclidean distance in the principal component space between neuronal responses to every pair of stimuli and further linking the stimuli hierarchically into a cluster tree, we also found similar cluster trees for both regions, with a distinction between visual and auditory stimuli (Fig. 5B). Dynamics of neuronal activities also presented dissimilarities across modalities in both regions, with more neurons reaching their peak of activity early in the stimulus presentation period for visual stimuli, while peaks of activity were widely distributed along the auditory stimulus presentation (Fig. 5C), indicating that responses to acoustic stimuli were not time-locked to the stimulus onset.
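
The first steps of this population analysis can be sketched as follows: the z-score as defined above (mean response divided by its standard deviation) and the pairwise Euclidean distances between stimuli that feed a hierarchical cluster tree. For brevity, this sketch computes distances directly on the z-scores rather than in the principal component space used in the study, and it leaves out the PCA, the Gaussian mixture model, and the linkage step. Function names and values are hypothetical.

```python
from math import dist
from statistics import mean, stdev

def z_score(trial_responses):
    """Mean response to a stimulus divided by its standard deviation."""
    return mean(trial_responses) / stdev(trial_responses)

def stimulus_distance_matrix(z):
    """z[s][c]: z-score of cell c for stimulus s (stimuli are observations,
    cells are variables). Returns the matrix of Euclidean distances."""
    n = len(z)
    return [[dist(z[i], z[j]) for j in range(n)] for i in range(n)]

print(z_score([2.0, 4.0, 6.0]))

# Made-up z-scores of two cells for three stimuli: two images and one sound.
z = [[2.0, 0.1], [1.8, 0.3], [0.2, 2.5]]
d = stimulus_distance_matrix(z)
print(d[0][1] < d[0][2])
```

In this toy example the two images lie closer to each other than either does to the sound, which is the kind of structure a cluster tree would then expose.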

Figure 5.

Independent representation of facial and vocal signals at the population level in the hippocampus and inferotemporal cortex. (A) Projection and clustering of population responses to stimuli based on each neurons' z-scores for each event in HPC (left) and TE (right). (B) Hierarchical trees consisting of many U-shaped lines connecting stimuli. The height of each U represents the distance between the 2 stimuli being connected. A unique color is assigned to each group of nodes in the tree where the linkage is less than a threshold t = 1.3. (C) Distribution of peak latencies in HPC (left) and TE (right) for visual (red) and acoustic (blue) stimuli.

Analysis of the stimulus projection maps (Fig. 5A and Supplementary Fig. S12) and hierarchical trees (Fig. 5B) revealed that not only auditory and visual stimuli but also face and object stimuli appear segregated by neuronal population activities in both regions. The optimal clustering of stimuli, assessed by minimizing the Bayesian information criterion of the Gaussian mixture model, consisted of 3 components comprising acoustic stimuli (blue-green marks), facial stimuli (red-pink circles) and object stimuli (red-pink crosses, Fig. 5A). In the hippocampus, using the same methodology, the best clustering was obtained along the modality dimension. However, note that points corresponding to objects (red-pink crosses) were not included in the lower visual cluster, suggesting that not all visual stimuli were coded in the same manner and that they may be functionally distinguished into the categories of faces and objects (Fig. 5A). In contrast to visual stimuli, neuronal activities to voice and sound stimuli clustered together in each region (Fig. 5A), indicating that both regions map and segregate among visual stimuli rather than among auditory stimuli. It also suggests that, compared with face-selective cells, voice-selective cells did not drive population coding enough to distinguish between vocal and sound stimuli, possibly due to their less pronounced increase in FRs compared with their baseline activity (2-sided t-test, P = 0.02 in TE, P = 0.028 in the hippocampus, Fig. 5C). In the hierarchical trees (Fig. 5B), vocal stimuli also appear to cluster all together (blue-green cluster), whereas visual stimuli appear to cluster in 3 groups: a first one encompassing most of the face stimuli (magenta cluster) and 2 encompassing mainly visual objects (yellow-orange and yellow-green clusters).

Conclusion 2

At the population level, hippocampal activity to social stimuli appears to be similar to the activity observed in the inferotemporal cortex. In both regions, we find that: (1) visual stimuli cluster separately from auditory stimuli, (2) they differ in their response dynamics, and (3) faces cluster apart from visual objects while voices and nonvocal sounds cluster together.

Discussion

Faces and voices are the major cues read out and used by primates (including humans and rhesus monkeys) to maneuver in their social environment. They provide essential social information about others' status, such as identity, gender, species, familiarity, etc. (Ghazanfar and Santos 2004; Belin 2006; Leopold and Rhodes 2010). In this study, we contribute to the goal of characterizing how hippocampal neurons in the monkey are activated when monkeys are exposed to these social stimuli through pictures and audio samples. At the single cell and population levels, hippocampal activity to social stimuli appeared to be much more similar to the activity observed in the monkey inferotemporal cortex than we expected. It differed from the identity selective activity observed in human epileptic patients, but it also differed from the poor sensitivity to social stimuli observed in rats. Thus in the monkey hippocampus, we found: (1) the existence of neurons responding differently to social stimuli compared with nonsocial stimuli, (2) only poorly correlated neuronal activities for facial and vocal stimuli identity, and (3) responses to faces more informative than responses to voices about social categories.

Evidence for the Existence of Neurons Responding Differently to Social Stimuli Compared with Nonsocial Stimuli

In humans, the hippocampus might play a role in building and maintaining social relationships and networks (Allen and Fortin 2013), as evidenced by studies of human patients with hippocampal impairment who present limited social circles compared with unimpaired controls (Davidson et al. 2012) and by single-unit recordings of human hippocampal cells coding for familiar and well-known individuals (Quiroga et al. 2005; Quian Quiroga et al. 2009; Viskontas et al. 2009). In rats, both lesion studies (Becker et al. 1999) and single-cell recordings (von Heimendahl et al. 2012; Zynyuk et al. 2012) rather pointed to a noninvolvement of the hippocampus in social representation. While lesion studies in monkeys remained unclear (Machado and Bachevalier 2006), the present single-unit study reveals that the monkey hippocampus might play a role in social representation, because we show that hippocampal neurons expressed a significant enhancement of their activity to faces or voices, even though their processing was not relevant to the task. Although tested with an unequal number of items for each category, the distinction between faces and objects was sufficiently robust to also be attested at the level of the population when data were analyzed in an unsupervised manner. In this respect, the monkey hippocampus appears to participate in representing social information, and its role looks more similar to that of the human hippocampus than to that of the rat hippocampus. This result converges with observations made with another technique, functional magnetic resonance imaging (fMRI), showing the presence within the hippocampus of an area having an enhanced activity for faces compared with objects in monkeys (Ku et al. 2011; Lafer-Sousa and Conway 2013) and in humans (Ishai et al. 2005).

Different Responses to Faces and to Voices in the Hippocampus

Representation of identity across modalities has been shown to involve the hippocampus in humans, both with fMRI studies using faces and voices (Holdstock et al. 2010; Joassin et al. 2011; Love et al. 2011) and with single-cell recordings using faces and names (Quian Quiroga et al. 2009). We thus wondered if single-neuron correlates of cross-modal association of identity could be found in the monkey hippocampus. Both at the single-unit and at the population level, we found that neuronal activity to faces and voices differed significantly. Activity for faces was robust, distinct from that to objects and coding for subcategories of faces. On the contrary, activity for voices was characterized by a lower increase in the FR and was found to be poorly specific. This might come from the fact that voices have been shown to be weaker cues than faces for identifying individuals in humans, both after having been learned in conjunction with faces in a multi-modal setting, as happens in real life (Hanley et al. 1998; Hanley and Turner 2000; Damjanovic and Hanley 2007), and when learned separately from faces in a unimodal setting (Olsson et al. 1998; Joassin et al. 2004; von Kriegstein and Giraud 2006; Mullennix et al. 2009). Since hippocampal neuronal activity might represent memory recall triggered by sensory cues, this could explain why “known” voices elicited fewer responses than “known” faces in our study, as well as why “unknown” voices elicited fewer responses than “unknown” faces. However, differences in neural coding for voices compared with faces have also been observed in other studies investigating other brain regions. For example, voice-selective cells, recorded in the voice area found with fMRI in monkeys (Perrodin et al. 2011) or around it (Kikuchi et al. 2010), are less numerous than face-selective cells found in face areas; they also respond to a smaller number of voice extracts than face-selective cells do to face exemplars (Baylis et al. 1985; Hasselmo et al. 1989; Rolls and Tovee 1995), and have a weaker enhancement in the FR compared with face-selective cells recorded in TE. Thus, the less distinct responses to [vocalizations vs. object-sounds] compared with [faces vs. objects] that we observe in the hippocampus might alternatively be a perpetuation of different inputs received by the hippocampus from these lower-level regions. As another alternative, the absence of an auditory evoked response could suggest that the responses to auditory stimuli arise as feedback signals rather than from feed-forward input. Finally, the behavioral task performed by the monkeys was different during visual and auditory stimulus presentations. While monkeys could inspect the images by freely moving their gaze, they were required to fixate during sound presentation. It could be argued that fixation would lower neuronal activity to the stimuli, as has been observed in the superior colliculus (Bell et al. 2003), and conversely that free-viewing would enhance neuronal responses. However, the free-viewing responses to faces we observe in the hippocampus and the inferotemporal cortex are in the range of those seen with fixation in other studies in the inferotemporal cortex (Baylis et al. 1985; Hasselmo et al. 1989; Rolls and Tovee 1995); in addition, the study that discovered the voice area housing most of the voice-selective neurons was carried out while monkeys were required to fixate (Petkov et al. 2008).

No Evidence of a Cross-modal Coding in the Monkey Hippocampus

While visual and auditory responses appeared to be coded at different scales, they could still match qualitatively. By normalizing for mean response amplitude in each modality, we examined whether neuronal responses to facial identities had a counterpart in the neuronal responses to vocal identities. We found that activities for facial and vocal identities were poorly correlated. In this regard, our results differ from those described in human studies, in which the same neurons coded for facial identity and for the written or spoken name. It is possible that the higher activity for visual stimuli, compared with acoustic ones, arises from the location of our recording sites in the more anterior part of the hippocampus, which is densely and directly connected to higher visual areas (Rockland and Van Hoesen 1999; Yukie 2000; Zhong and Rockland 2004) and only more sparsely and mainly indirectly to higher auditory areas through polymodal areas (Zhong et al. 2005; Mohedano-Moriano et al. 2007, 2008; Munoz-Lopez et al. 2010). Auditory stimuli might be preferentially coded in more posterior parts of the hippocampus (Gil-da-Costa et al. 2004; Peters, Koch et al. 2007; Peters, Suchan et al. 2007), which we did not sample. Nevertheless, the cross-modal cells found in humans were sampled throughout the medial temporal lobe, including both anterior and posterior parts of the human hippocampus (Quiroga et al. 2005; Quian Quiroga et al. 2009). The specifically human aptitude for naming persons and things has been shown to participate in the way humans represent, memorize, and retrieve complex associations such as full autobiographical episodes (Clark and Squire 2013). One may wonder whether this aptitude also shapes the way cross-modal associations are represented in higher-level areas such as the hippocampus and, further, whether monkeys and other animals that rely on unnamed associations might lack an explicit coding at the single-cell level for cross-modal identity association. If we missed an explicit code by not prescreening the neurons, and this code nevertheless exists in neurons we did not record from, it is not observable at the population level.
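The normalize-then-correlate logic described in this section can be sketched as follows. This is a minimal illustration and not the analysis pipeline used in the study: the choice of z-scoring as the normalization, the use of a Pearson correlation, the function names, and the example firing rates are all assumptions made for the sake of the example.

```python
from statistics import mean, pstdev


def zscore(rates):
    """Normalize one neuron's responses within one modality (mean 0, SD 1).

    Assumption: z-scoring stands in for "normalizing for mean response
    amplitude"; the study may have used a different normalization.
    """
    m, s = mean(rates), pstdev(rates)
    if s == 0:
        # A flat response profile carries no identity information.
        return [0.0 for _ in rates]
    return [(r - m) / s for r in rates]


def cross_modal_correlation(face_rates, voice_rates):
    """Pearson correlation between a neuron's face and voice responses.

    Both lists must be ordered by the same individuals, so that index i
    holds the response to individual i's face and voice respectively.
    Returns 0.0 when either response profile is flat.
    """
    if len(face_rates) != len(voice_rates):
        raise ValueError("face and voice responses must cover the same individuals")
    zf, zv = zscore(face_rates), zscore(voice_rates)
    # With population z-scores, the mean of the products is Pearson's r.
    return sum(a * b for a, b in zip(zf, zv)) / len(zf)


if __name__ == "__main__":
    # Hypothetical mean firing rates (spikes/s) of one neuron to four individuals.
    faces = [12.0, 30.0, 18.0, 9.0]
    voices = [4.0, 4.5, 3.0, 5.0]
    print(cross_modal_correlation(faces, voices))
```

Because each modality is normalized separately, a neuron that fires far more strongly to faces than to voices can still yield a correlation near 1 if it ranks the individuals in the same way in both modalities; a value near 0 corresponds to the poor correspondence reported here.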

Less Sparse Coding of Identity Than in Humans

Because visual and auditory modalities appeared to be coded separately, we investigated invariant representation within each modality, across viewpoints or audio extracts. Neurons signaling facial identity were observed, as evidenced by view-invariant coding. However, most neurons responded to more than one individual. This result is in accordance with early unsupervised recordings from human hippocampal neurons showing nonsparse hippocampal categorization of visual stimuli including faces (Fried et al. 1997, 2002; Kreiman et al. 2000). As in these studies, we observe that 40% of hippocampal cells respond to stimuli and exhibit a medium range of selectivity (0.5 or 0.6). More recent findings by the same group (Quiroga et al. 2005; Waydo et al. 2006; Quian Quiroga et al. 2009) showed that, when using a supervised recording protocol designed specifically to search for this type of cell, a few ultra-sparse cells (10%) in the human hippocampus represent single individuals in a selective and invariant manner. Because we did not use this type of supervised recording protocol, it is possible that we missed conceptual coding cells, for example, cells activated in the same manner only by all the pictures of a well-known monkey's face, or only by the audio extracts of a monkey's voice, or a fortiori only by all the pictures of a well-known monkey's face and all the audio extracts of the same monkey's voice.
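To make the notion of sparse versus nonsparse coding concrete, the sketch below computes the sparseness index introduced by Rolls and Tovee (1995), a = (Σ r_i / n)² / (Σ r_i² / n), where r_i is the firing rate to stimulus i. Whether the selectivity values quoted above were computed with this exact index is an assumption; the function name and the example rates are hypothetical.

```python
def sparseness(rates):
    """Rolls and Tovee (1995) sparseness of one neuron's response profile.

    Returns 1.0 when the neuron responds equally to every stimulus and
    1/n when it responds to exactly one of n stimuli.
    """
    n = len(rates)
    if n == 0:
        raise ValueError("at least one response is required")
    mean_rate = sum(rates) / n
    mean_square = sum(r * r for r in rates) / n
    if mean_square == 0:
        raise ValueError("sparseness is undefined for an all-zero response profile")
    return (mean_rate ** 2) / mean_square


if __name__ == "__main__":
    # Hypothetical firing rates (spikes/s) to four individuals.
    broadly_tuned = [5.0, 5.0, 5.0, 5.0]    # responds to everyone alike
    ultra_sparse = [10.0, 0.0, 0.0, 0.0]    # responds to a single individual
    print(sparseness(broadly_tuned), sparseness(ultra_sparse))
```

Under this index, a cell responding to a single individual out of many approaches 1/n, whereas a cell responding to several individuals takes an intermediate value.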

A Representation of Social Stimuli Which Resembles That of the Inferotemporal Cortex

Recordings of inferotemporal cells, with the exact same stimulus set, material, animals, and protocol conditions, revealed that stimuli were coded in the monkey hippocampus in a way much more similar to the representation already known in the inferotemporal cortex than to that in the human hippocampus. First, we observed neurons that responded differently to social and nonsocial stimuli, particularly in categorizing faces from non-faces, a classical result observed in the inferotemporal cortex (Perrett et al. 1984; Hasselmo et al. 1989; Sugase et al. 1999). This result also holds at the population level, where responses to faces were separated from those to objects in both regions. The face-selective cells recorded throughout the inferotemporal cortex in classical studies tend to cluster in columns and further in groups of columns (Sato et al. 2013), which can be observed with fMRI as areas preferentially activated by faces over other visual stimuli. These areas have been principally imaged in the temporal and frontal cortex (Logothetis et al. 1999; Tsao et al. 2003; Pinsk et al. 2005; Ku et al. 2011; Lafer-Sousa and Conway 2013); however, one study using an optimized fMRI protocol in monkeys revealed additional face areas outside of this cortical machinery, one of which was located in the hippocampus (Ku et al. 2011). Other studies also observed areas more active for faces than for objects in the hippocampus (Ishai et al. 2005; Lafer-Sousa and Conway 2013). Because of our recording location in the anterior part of the hippocampus and the similarity of the stimulus sets, we hypothesize that the face-selective cells we observed in our animals could be similar to those that may have contributed to the enhanced fMRI activity observed by Ku et al. (2011). The hippocampal and inferotemporal face-selective cells we recorded were revealed using a passive viewing protocol, illustrating that they respond to the visual properties of faces and arguing for a perceptual involvement of both regions. Evidence of some nonface-preferring cells that are silent to faces but responsive to all the objects reinforces this conclusion. This illustrates that monkey hippocampal neurons can categorize nonspatial visual stimuli based on perceptual properties. A follow-up study could address whether faces are the sole correlate of these hippocampal neurons from day to day, or whether these cells' properties are ephemeral and protocol specific, in which case hippocampal face-selective cells could “remap” into another type of cell in a subsequent protocol (Colgin et al. 2008). It would also be interesting to test whether these correlates persist when the stimuli are presented at other locations in the spatial view field, as previous studies showed that many monkey hippocampal neurons do have spatial view fields (Rolls and O'Mara 1995).

A Categorization of Facial Stimuli Which Resembles That of the Inferotemporal Cortex

The tuning of hippocampal face-selective cells was broader than that of inferotemporal face-selective cells, though a proportion of them exhibited fine tuning toward facial categories, mainly species, familiarity, and viewpoint. Modulation by these characteristics has already been observed in monkey inferotemporal cortex, a result that we reproduce here (species: Sigala et al. 2011; viewpoint: Perrett et al. 1985; De Souza et al. 2005; Freiwald and Tsao 2010; familiarity: Eifuku et al. 2011). Hippocampal neurons in monkeys therefore appear to categorize visual stimuli into a space encompassing perceptual components (face–object, species, or viewpoint distinctions) as well as cognitive components (familiarity). However, this observation is not specific to the hippocampus, since familiarity also modulates the activity of a portion of face-selective cells in TE, which is in line with fMRI results in TE (Sugiura et al. 2001; Rotshtein et al. 2005) and the hippocampus (Rotshtein et al. 2005; Denkova et al. 2006). In other words, in our protocol, this part of the hippocampus does not appear to be more specialized for mnemonic processes than the inferotemporal cortex. On the contrary, facial information probably reached the hippocampus through feed-forward projections from the temporal cortex. In the hippocampus, the neuronal response to faces was globally less selective, but maintained important information about species or familiarity. Alternatively, the specifics of the hippocampal response might also arise as feedback from the amygdala, which is anatomically close and also represents facial information (Rolls 1984; Leonard et al. 1985; Gothard et al. 2007; Hoffman et al. 2007; Mosher et al. 2010; Rutishauser et al. 2011; Hadj-Bouziane et al. 2012). However, in contrast to the monkey and human amygdala, where a majority of facial responses were found to be inhibitory (Rutishauser et al. 2011) and where the “depth” of an inhibitory response can be just as informationally rich as the height of an excitatory response (Mosher et al. 2010), we found neither these proportions nor these specificities in the monkey hippocampus (or area TE).

Conclusion

Codes for social stimuli appear to be present throughout the temporal lobe up to the hippocampus: faces and voices seem to have colonized some processing resources of the monkey hippocampus. However, in this anterior part of the primate hippocampus, cognitive maps of visual (rather than auditory or bimodal) stimuli predominate, probably participating in the read-out of individuals' facial information rather than vocal information. An explicit conceptual coding of identity, as discovered in the human hippocampus, was not observed in the present study. These conclusions were not available in the existing literature, which focused mainly on secondary visual/auditory regions of the temporal cortex for face and voice processing, while monkey hippocampal neurons were studied for their involvement in memory or navigation. The hippocampus plays a crucial role in autobiographical/episodic memory, and social events make up those memories to a large extent. Thus, the presence of cells representing social cues in the hippocampus constitutes a first step toward understanding how social cues are incorporated into memories. Neurons explicitly coding identity across modalities might be found in more posterior parts of the hippocampus or in other parts of the monkey brain (Kuraoka and Nakamura 2007; Romanski 2012).

Supplementary Material

Supplementary material can be found at: http://www.cercor.oxfordjournals.org/

Funding

This work was supported by a Marie Curie reintegration grant and a salary grant from Fondation pour la Recherche Médicale to S.W.; a PhD grant cofinanced by Centre National de la Recherche Scientifique and by Direction Générale de l'Armement to J.S.; Fondation pour la Recherche Médicale; Association des Femmes Françaises Diplômées d'Université-Dorothy Leet; Fondation Bettencourt-Schueller to J.S.; and a grant by Agence Nationale de la Recherche BLAN-1431-01 and ANR-11-IDEX-0007 to J.R.D.

Notes

The authors thank S. Wiener for helpful comments on earlier versions of the manuscript; P. Baraduc for discussion on statistical analyses; J.-L. Charieau and F. Hérant for animal care. Conflict of Interest: None declared.

References

I, Hampton RR. 2011. Rhesus monkeys see who they hear: spontaneous cross-modal memory for familiar conspecifics. PLoS ONE. 6:e23345.
Allen TA, Fortin NJ. 2013. The evolution of episodic memory. 110:10379–10386.
Bayley PJ, Frascino JC, Squire LR. 2005. Robust habit learning in the absence of awareness and independent of the medial temporal lobe. Nature. 436:550–553.
Baylis GC, Rolls ET, Leonard CM. 1985. Selectivity between faces in the responses of a population of neurons in the cortex in the superior temporal sulcus of the monkey. Brain Res Cogn Brain Res. 342:91–102.
Becker A, Grecksch G, Bernstein HG, Hollt V, Bogerts B. 1999. Social behaviour in rats lesioned with ibotenic acid in the hippocampus: quantitative and qualitative analysis. Psychopharmacology. 144:333–338.
Belin P. 2006. Voice processing in human and non-human primates. Philos Trans R Soc Lond B Biol Sci. 361:2091–2107.
Bell AH, Corneil BD, Munoz DP, Meredith MA. 2003. Engagement of visual fixation suppresses sensory responsiveness and multisensory integration in the primate superior colliculus. Eur J Neurosci. 18:2867–2873.
Bird CM, Burgess N. 2008. The hippocampus and memory: insights from spatial processing. Nat Rev Neurosci. 9:182–194.
Bruce C, Desimone R, Gross CG. 1981. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J Neurophysiol. 46:369–384.
Bruce V, Young A. 1986. Understanding face recognition. Br J Psychol. 77(Pt 3):305–327.
Burton AM, Bruce V, Johnston RA. 1990. Understanding face recognition with an interactive activation model. Br J Psychol. 81(Pt 3):361–380.
Campanella S, Belin P. 2007. Integrating face and voice in person perception. Trends Cogn Sci. 11:535–543.
Canolty RT, Ganguly K, Kennerley SW, CF, Koepsell K, Wallis JD, Carmena JM. 2010. Oscillatory phase coupling coordinates anatomically dispersed functional cell assemblies. 107:17356–17361.
Clark RE, Squire LR. 2013. Similarity in form and function of the hippocampus in rodents, monkeys, and humans. 110:10365–10370.
Clayton NS, Russell J. 2009. Looking for episodic memory in animals and young children: prospects for a new minimalism. Neuropsychologia. 47:2330–2340.
Colgin LL, Moser EI, Moser MB. 2008. Understanding memory through hippocampal remapping. Trends Neurosci. 31:469–477.
Cunningham JP, Yu BM. 2014. Dimensionality reduction for large-scale neural recordings. Nat Neurosci.
Dahl CD, Logothetis NK, Hoffman KL. 2007. Individuation and holistic processing of faces in rhesus monkeys. Proc R Soc B Biol Sci. 274:2069–2076.
Damjanovic L, Hanley JR. 2007. Recalling episodic and semantic information about famous faces and voices. Mem Cognit. 35:1205–1210.
Davidson PS, Drouin H, Kwan D, Moscovitch M, Rosenbaum RS. 2012. Memory as social glue: Close interpersonal relationships in amnesic patients. Front Psychol. 3:531.
Denkova E, Botzung A, Manning L. 2006. Neural correlates of remembering/knowing famous people: an event-related fMRI study. Neuropsychologia. 44:2783–2791.
De Souza WC, Eifuku S, Tamura R, Nishijo H, Ono T. 2005. Differential characteristics of face neuron responses within the anterior superior temporal sulcus of macaques. J Neurophysiol. 94:1252–1266.
Eichenbaum H. 1993. Science. 261:993–994.
Eifuku S, De Souza WC, Nakata R, Ono T, Tamura R. 2011. Neural representations of personally familiar and unfamiliar faces in the anterior inferior temporal cortex of monkeys. PLoS One. 6:e18913.
Ellis HD, Jones DM, Mosdell N. 1997. Intra- and inter-modal repetition priming of familiar faces and voices. Br J Psychol. 88(Pt 1):143–156.
Freiwald WA, Tsao DY. 2010. Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science. 330:845–851.
Fried I, Cameron KA, Yashar S, Fong R, Morrow JW. 2002. Inhibitory and excitatory responses of single neurons in the human medial temporal lobe during recognition of faces and objects. Cereb Cortex. 12:575–584.
Fried I, Macdonald KA, Wilson CL. 1997. Single neuron activity in human hippocampus and amygdala during recognition of faces and objects. Neuron. 18:753–765.
Ghazanfar AA, Santos LR. 2004. Primate brains in the wild: the sensory bases for social interactions. Nat Rev Neurosci. 5:603–616.
Gil-Da-Costa R, Braun A, Lopes M, Hauser MD, Carson RE, Herscovitch P, Martin A. 2004. Toward an evolutionary perspective on conceptual representation: species-specific calls activate visual and affective processing systems in the macaque. 101:17516–17521.
Gothard KM, Battaglia FP, Erickson CA, Spitler KM, Amaral DG. 2007. Neural responses to facial expression and face identity in the monkey amygdala. J Neurophysiol. 97:1671–1683.
Gothard KM, Erickson CA, Amaral DG. 2004. How do rhesus monkeys (Macaca mulatta) scan faces in a visual paired comparison task? Anim Cogn. 7:25–36.
Gross CG, Rocha-Miranda CE, Bender DB. 1972. Visual properties of neurons in inferotemporal cortex of the Macaque. J Neurophysiol. 35:96–111.
Habbershon HM, Ahmed SZ, Cohen YE. 2013. Rhesus macaques recognize unique multimodal face–voice relations of familiar individuals and not of unfamiliar ones. Brain Behav Evol. 81:219–225.
F, Liu N, Bell AH, Gothard KM, Luh W-M, Tootell RBH, Murray EA, Ungerleider LG. 2012. Amygdala lesions disrupt modulation of functional MRI activity evoked by facial expression in the monkey inferior temporal cortex. USA. 109:E3640–E3648.
Hampson RE, Pons TP, Stanford TR, SA. 2004. Categorization in the monkey hippocampus: a possible mechanism for encoding information into memory. 101:3184–3189.
Hampton RR, Schwartz BL. 2004. Episodic memory in nonhumans: what, and where, is when? Curr Opin Neurobiol. 14:192–197.
Hanley JR, Smith ST, J. 1998. I recognise you but I can't place you: An investigation of familiar-only experiences during tests of voice and face recognition. Q J Exp Psychol A. 51:179–195.
Hanley JR, Turner JM. 2000. Why are familiar-only experiences more frequent for voices than for faces? Q J Exp Psychol A. 53:1105–1116.
Hasselmo ME, Rolls ET, Baylis GC. 1989. The role of expression and identity in the face-selective responses of neurons in the temporal visual cortex of the monkey. Behav Brain Res. 32:203–218.
Hays AV, Richmond BJ, Optican LM. 1982. A UNIX-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings. Bethesda: National Eye Institute, 2:1–10.
Hoffman KL, Gothard KM, Schmid MC, Logothetis NK. 2007. Facial-expression and gaze-selective responses in the monkey amygdala. Curr Biol. 17:766–772.
Hoffman KL, McNaughton BL. 2002. Coordinated reactivation of distributed memory traces in primate neocortex. Science. 297:2070–2073.
Holdstock JS, Crane J, Bachorowski JA, Milner B. 2010. Equivalent activation of the hippocampus by face–face and face–laugh paired associate learning and recognition. Neuropsychologia. 48:3757–3771.
Ishai A, Schmidt CF, Boesiger P. 2005. Face perception is mediated by a distributed cortical network. Brain Res Bull. 67:87–93.
Ison MJ, Mormann F, Cerf M, Koch C, Fried I, Quian Quiroga R. 2011. Selectivity of pyramidal cells and interneurons in the human medial temporal lobe. J Neurophysiol. 106:1713–1721.
Joassin F, Maurage P, Bruyer R, Crommelinck M, Campanella S. 2004. When audition alters vision: An event-related potential study of the cross-modal interactions between faces and voices. Neurosci Lett. 369:132–137.
Joassin F, Pesenti M, Maurage P, Verreckt E, Bruyer R, Campanella S. 2011. Cross-modal interactions between human faces and voices involved in person recognition. Cortex. 47:367–376.
Jutras MJ, Buffalo EA. 2010. Recognition memory signals in the macaque hippocampus. 107:401–406.
Kikuchi Y, Horwitz B, Mishkin M. 2010. Hierarchical auditory processing directed rostrally along the monkey's supratemporal plane. J Neurosci. 30:13021–13030.
Konorski J. 1967. Some new ideas concerning the physiological mechanisms of perception. Acta Biol Exp (Warsz). 27:147–161.
Kreiman G, Koch C, Fried I. 2000. Category-specific visual responses of single neurons in the human medial temporal lobe. Nat Neurosci. 3:946–953.
Kriegeskorte N, Mur M, Ruff DA, Kiani R, Bodurka J, Esteky H, Tanaka K, Bandettini PA. 2008. Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron. 60:1126–1141.
Ku S-P, Tolias AS, Logothetis NK, Goense J. 2011. fMRI of the face-processing network in the ventral temporal lobe of awake and anesthetized macaques. Neuron. 70:352–362.
Kuraoka K, Nakamura K. 2007. Responses of single neurons in monkey amygdala to facial and vocal emotions. J Neurophysiol. 97:1379–1387.
Lafer-Sousa R, Conway BR. 2013. Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex. Nat Neurosci. 16:1870–1878.
Leonard CM, Rolls ET, Wilson FA, Baylis GC. 1985. Neurons in the amygdala of the monkey with responses selective for faces. Behav Brain Res. 15:159–176.
Leopold DA, Rhodes G. 2010. A comparative view of face perception. J Comp Psychol. 124:233–251.
Logothetis NK, Guggenberger H, Peled S, Pauls J. 1999. Functional imaging of the monkey brain. Nat Neurosci. 2:555–562.
Love SA, Pollick FE, Latinus M. 2011. Cerebral correlates and statistical criteria of cross-modal face and voice integration. Seeing Perceiving. 24:351–367.
CJ, Bachevalier J. 2006. The impact of selective amygdala, orbital frontal cortex, or hippocampal formation lesions on established social relationships in rhesus monkeys (Macaca mulatta). Behav Neurosci. 120:761–786.
Matsumura N, Nishijo H, Tamura R, Eifuku S, Endo S, Ono T. 1999. Spatial- and task-dependent neuronal responses during real and virtual translocation in the monkey hippocampal formation. J Neurosci. 19:2381–2393.
Meyer K, Damasio A. 2009. Convergence and divergence in a neural architecture for recognition and memory. Trends Neurosci. 32:376–382.
Mohedano-Moriano A, Martinez-Marcos A, Pro-Sistiaga P, Blaizot X, Arroyo-Jimenez MM, Marcos P, Artacho-Pérula E, Insausti R. 2008. Convergence of unimodal and polymodal sensory input to the entorhinal cortex in the fascicularis monkey. Neuroscience. 151:255–271.
Mohedano-Moriano A, Pro-Sistiaga P, Arroyo-Jimenez MM, Artacho-Pérula E, Insausti AM, Marcos P, S, Martínez-Ruiz J, Muñoz M, Blaizot X, et al. 2007. Topographical and laminar distribution of cortical input to the monkey entorhinal cortex. J Anat. 211:250–260.
Mosher CP, Zimmerman PE, Gothard KM. 2010. Response characteristics of basolateral and centromedial neurons in the primate amygdala. J Neurosci. 30:16197–16207.
Mruczek RE, Sheinberg DL. 2012. Stimulus selectivity and response latency in putative inhibitory and excitatory neurons of the primate inferior temporal cortex. J Neurophysiol. 108:2725–2736.
Mullennix JW, Ross A, Smith C, Kuykendall K, Conard J, Barb S. 2009. Typicality effects on memory for voice: Implications for earwitness testimony. Appl Cognitive Psych. 25:29–34.
Munoz-Lopez MM, Mohedano-Moriano A, Insausti R. 2010. Anatomical pathways for auditory memory in primates. Front Neuroanat. 4:129.
Olsson N, Juslin P, Winman A. 1998. Realism of confidence in earwitness versus eyewitness identification. J Exp Psychol-Appl. 4:101–118.
Parr LA, Winslow JT, Hopkins WD, De Waal FB. 2000. Recognizing facial cues: individual discrimination by chimpanzees (Pan troglodytes) and rhesus monkeys (Macaca mulatta). J Comp Psychol. 114:47–60.
Patterson K, Nestor PJ, Rogers TT. 2007. Where do you know what you know? The representation of semantic knowledge in the human brain. Nat Rev Neurosci. 8:976–987.
Paxinos G, Huang X-F, Toga AW. 2000. The Rhesus monkey brain in stereotaxic coordinates. San Diego; London; Boston.
Perrett DI, Rolls ET, Caan W. 1982. Visual neurones responsive to faces in the monkey temporal cortex. Exp Brain Res. 47:329–342.
Perrett DI, Smith PA, Potter DD, Mistlin AJ, AS, Milner, Jeeves MA. 1984. Neurones responsive to faces in the temporal cortex: studies of functional organization, sensitivity to identity and relation to perception. Hum Neurobiol. 3:197–208.
Perrett DI, Smith PA, Potter DD, Mistlin AJ, AS, Milner, Jeeves MA. 1985. Visual cells in the temporal cortex sensitive to face view and gaze direction. Proc R Soc Lond B Biol Sci. 223:293–317.
Perrodin C, Kayser C, Logothetis NK, Petkov CI. 2011. Voice cells in the primate temporal lobe. Curr Biol. 21:1408–1415.
Peters J, Koch B, Schwarz M, Daum I. 2007. Domain-specific impairment of source memory following a right posterior medial temporal lobe lesion. Hippocampus. 17:505–509.
Peters J, Suchan B, Köster O, Daum I. 2007. Domain-specific retrieval of source information in the medial temporal lobe. Eur J Neurosci. 26:1333–1343.
Petkov CI, Kayser C, Steudel T, Whittingstall K, Augath M, Logothetis NK. 2008. A voice region in the monkey brain. Nat Neurosci. 11:367–374.
Petrulis A, Alvarez P, Eichenbaum H. 2005. Neural correlates of social odor recognition and the representation of individual distinctive social odors within entorhinal cortex and ventral subiculum. Neuroscience. 130:259–274.
Petrulis A, Peng M, Johnston RE. 2000. The role of the hippocampal system in social odor discrimination and scent-marking in female golden hamsters (Mesocricetus auratus). Behav Neurosci. 114:184–195.
Pinsk MA, Desimone K, Moore T, Gross CG, Kastner S. 2005. Representations of faces and body parts in macaque temporal cortex: a functional MRI study. 102:6996–7001.
Quian Quiroga R, A, Koch C, Fried I. 2009. Explicit encoding of multimodal percepts by single neurons in the human brain. Curr Biol. 19:1308–1313.
Quiroga RQ. 2012. Concept cells: the building blocks of declarative memory functions. Nat Rev Neurosci. 13:587–597.
Quiroga RQ, Reddy L, Kreiman G, Koch C, Fried I. 2005. Invariant visual representation by single neurons in the human brain. Nature. 435:1102–1107.
Rendall D, Rodman PS, Emond RE. 1996. Vocal recognition of individuals and kin in free-ranging rhesus monkeys. Anim Behav. 51:1007–1015.
Rockland KS, Van Hoesen GW. 1999. Some temporal and parietal cortical connections converge in CA1 of the primate hippocampus. Cereb Cortex. 9:232–237.
Rolls ET. 1984. Neurons in the cortex of the temporal lobe and in the amygdala of the monkey with responses selective for faces. Hum Neurobiol. 3:209–222.
Rolls ET, Cahusac PM, Feigenbaum JD, Miyashita Y. 1993. Responses of single neurons in the hippocampus of the macaque related to recognition memory. Exp Brain Res. 93:299–306.
Rolls ET, O'Mara SM. 1995. View-responsive neurons in the primate hippocampal complex. Hippocampus. 5:409–424.
Rolls ET, Tovee MJ. 1995. Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J Neurophysiol. 73:713–726.
Rolls ET, Xiang J, Franco L. 2005. Object, space, and object-space representations in the primate hippocampus. J Neurophysiol. 94:833–844.
Romanski LM. 2012. Integration of faces and vocalizations in ventral prefrontal cortex: Implications for the evolution of audiovisual speech. 109:10717–10724.
Rotshtein P, Henson RNA, Treves A, Driver J, Dolan RJ. 2005. Morphing Marilyn into Maggie dissociates physical and identity face representations in the brain. Nat Neurosci. 8:107–113.
Rutishauser U, Schuman EM, Mamelak AN. 2008. Activity of human hippocampal and amygdala neurons during retrieval of declarative memories. 105:329–334.
Rutishauser U, Tudusciuc O, Neumann D, Mamelak AN, Heller AC, Ross IB, Philpott L, Sutherling WW, R. 2011. Single-unit responses selective for whole faces in the human amygdala. Curr Biol. 21:1654–1660.
Saleem KS, Logothetis NK. 2006. Atlas of the rhesus monkey brain in stereotaxic coordinates a combined MRI and histology. London [USA].
Sato T, Uchida G, Lescroart MD, Kitazono J, M, Tanifuji M. 2013. Object representation in inferior temporal cortex is organized hierarchically in a mosaic-like structure. J Neurosci. 33:16642–16656.
Sigala R, Logothetis NK, Rainer G. 2011. Own-species bias in the representations of monkey and human face categories in the primate temporal lobe. J Neurophysiol. 105:2740–2752.
Sliwa J, Duhamel JR, Pascalis O, Wirth S. 2011. Spontaneous voice-face identity matching by rhesus monkeys for familiar conspecifics and humans. 108:1735–1740.
Stein BE, Burr D, Constantinidis C, Laurienti PJ, Alex Meredith M, Perrault TJ, Ramachandran R, Röder B, Rowland BA, Sathian K, et al. 2010. Semantic confusion regarding the development of multisensory integration: a practical solution. Eur J Neurosci. 31:1713–1720.
Sugase Y, Yamane S, Ueno S, Kawano K. 1999. Global and fine information coded by single neurons in the temporal visual cortex. Nature. 400:869–873.
Sugiura M, Kawashima R, Nakamura K, Sato N, Nakamura A, Kato T, Hatano K, Schormann T, Zilles K, Sato K, et al. 2001. Activation reduction in anterior temporal cortices during repeated recognition of faces of personal acquaintances. NeuroImage. 13:877–890.
Tamura R, Ono T, Fukuda M, Nakamura K. 1992. Spatial responsiveness of monkey hippocampal neurons to various visual and auditory stimuli. Hippocampus. 2:307–322.
Tamura R, Ono T, Fukuda M, Nishijo H. 1992. Monkey hippocampal neuron responses to complex sensory stimulation during object discrimination. Hippocampus. 2:287–306.
Tsao DY, Freiwald WA, Knutsen TA, Mandeville JB, Tootell RB. 2003. Faces and objects in macaque cerebral cortex. Nat Neurosci. 6:989–995.
Tsao DY, Freiwald WA, Tootell RB, Livingstone MS. 2006. A cortical region consisting entirely of face-selective cells. Science. 311:670–674.
Viskontas IV, Ekstrom, Wilson CL, Fried I. 2007. Characterizing interneuron and pyramidal cells in the human medial temporal lobe in vivo using extracellular recordings. Hippocampus. 17:49–57.
Viskontas IV, Quiroga RQ, Fried I. 2009. Human medial temporal lobe neurons respond preferentially to personally relevant images. 106:21329–21334.
Von Heimendahl M, Rao RP, Brecht M. 2012. Weak and nondiscriminative responses to conspecifics in the rat hippocampus. J Neurosci. 32:2129–2141.
Von Kriegstein K, Giraud AL. 2006. Implicit multisensory associations influence voice recognition. PLoS Biol. 4:e326.
Von Kriegstein K, Kleinschmidt A, Sterzer P, Giraud AL. 2005. Interaction of face and voice areas during speaker recognition. J Cogn Neurosci. 17:367–376.
Waitt C, Buchanan-Smith HM. 2006. Perceptual considerations in the use of colored photographic and video stimuli to study nonhuman primate behavior. Am J Primatol. 68:1054–1067.
Waydo S, A, Quian Quiroga R, Fried I, Koch C. 2006. Sparse representation in the human medial temporal lobe. J Neurosci. 26:10232–10234.
Wirth S, Yanike M, Frank LM, Smith AC, Brown EN, Suzuki WA. 2003. Single neurons in the monkey hippocampus and learning of new associations. Science. 300:1578–1581.
Xiang JZ, Brown MW. 1998. Differential neuronal encoding of novelty, familiarity and recency in regions of the anterior temporal lobe. Neuropharmacology. 37:657–676.
Yamane S, Kaji S, Kawano K. 1988. What facial features activate face neurons in the inferotemporal cortex of the monkey? Exp Brain Res. 73:209–214.
Yanike M, Wirth S, Suzuki WA. 2004. Representation of well-learned information in the monkey hippocampus. Neuron. 42:477–487.
Yukie M. 2000. Connections between the medial temporal cortex and the CA1 subfield of the hippocampal formation in the Japanese monkey (Macaca fuscata). J Comp Neurol. 423:282–298.
Zhong YM, Rockland KS. 2004. Connections between the anterior inferotemporal cortex (area TE) and CA1 of the hippocampus in monkey. Exp Brain Res. 155:311–319.
Zhong Y-M, Yukie M, Rockland KS. 2005. Direct projections from CA1 to the superior temporal sulcus in the monkey, revealed by single axon analysis. Brain Res. 1035:211–214.
Zynyuk L, Huxter J, Muller RU, Fox SE. 2012. The presence of a second rat has only subtle effects on the location-specific firing of hippocampal place cells. Hippocampus. 22:1405–1416.