Long-term familiarity facilitates recognition of visual stimuli. To better understand the neural basis for this effect, we measured the local field potential (LFP) and multiunit spiking activity (MUA) from the inferior temporal (IT) lobe of behaving monkeys in response to novel and familiar images. In general, familiar images evoked larger amplitude LFPs whereas MUA responses were greater for novel images. Familiarity effects were attenuated by image rotations in the picture plane of 45°. Decreasing image contrast led to more pronounced decreases in LFP response magnitude for novel, compared with familiar images, and resulted in more selective MUA response profiles for familiar images. The shape of individual LFP traces could be used for stimulus classification, and classification performance was better for the familiar image category. Recording the visual and auditory evoked LFP at multiple depths showed significant alterations in LFP morphology with distance changes of 2 mm. In summary, IT cortex shows local processing differences for familiar and novel images at a time scale and in a manner consistent with the observed behavioral advantage for classifying familiar images and rapidly detecting novel stimuli.
Familiarity with images supports visual expertise. Trained observers are better at recognizing familiar stimuli in noisy environments and under conditions of target uncertainty. For example, experienced mammographers are quicker and more accurate than trainee radiologists at recognizing potential breast cancers (Nodine et al. 1996, 1999, 2002). Visual discrimination and search are more efficient for familiar, compared with novel, items (Wang et al. 1994; Li et al. 2002; Mruczek and Sheinberg 2005).
In humans, visual expertise is accompanied by an enhancement in visual event-related potentials (ERPs). Scott et al. (2006) demonstrated increased N170 and N250 amplitudes for images in expertise categories. Similar results have also been reported for monkeys. Peissig et al. (2006) documented increases in ERP amplitudes to familiar, compared with novel, visual stimuli within 150 ms of target presentation from extracranial posterior electrodes.
Inferior temporal (IT) cortex is a high-level visual area in the ventral visual stream. IT neurons show relatively invariant visually selective responses to pictures of objects (Logothetis and Sheinberg 1996; Fujita 2002). IT is reciprocally connected to medial temporal cortices involved in visual memory (Seltzer and Pandya 1991; Suzuki and Amaral 1994) and is therefore well situated anatomically to participate in the differentiation of familiar and novel images.
IT neurons show modulations of their responses to novel and familiar images. Li et al. (1993) screened for visual activity with familiar visual stimuli and then tested monkeys' abilities to report repeated presentations of novel stimuli when there were variable numbers of intervening familiar image presentations. In general, there was a declining neuronal response to novel images with repetition. Using a serial recognition task, Xiang and Brown (1998) found that ∼40% of the visually responsive neurons in macaque temporal cortex showed a decrease in spike magnitude with stimulus familiarity. They also reported “novelty” cells that responded to first presentations of new stimuli, but not familiar stimuli. Kobatake et al. (1998) measured the response of IT neurons to complex shapes in 5 monkeys with different levels of familiarity with those shapes. Two monkeys trained to discriminate the shapes showed a higher proportion of neurons with robust responses to shapes within the training set compared with the 3 naïve animals. More recently, Freedman et al. (2006) studied the response of IT neurons to rotated versions of familiar images and between novel and familiar images. In general, they found that neurons showed greater stimulus selectivity for familiar items. For rotated versions of familiar items, stimulus selectivity was typically greatest at the familiar, learned orientation. In addition, however, the average spiking response was less to familiar stimuli than novel stimuli due to a greater sustained firing of IT neurons to novel stimuli in the phasic period after the initial transient response (approx. 150–600 ms). In the perirhinal portion of IT, Holscher et al. (2003) found that the response of single neurons increased gradually over a few weeks of experience training, and that this effect could combine with short term response reductions observed within a session.
What is the source of the familiarity related signal augmentation observed in ERPs and how is it related to the modulations of single cell activity recorded in IT? To address these questions we simultaneously recorded local field potential (LFP) and multiunit activity (MUA) in IT. The LFP evoked response is commonly believed to reflect the population response to an incoming volley of afferent activity (Mitzdorf 1987), but there is uncertainty over the spatial extent of the activity captured by local intracortical electrodes. Recent data suggest that LFP responses in visual areas might be quite local. Liu and Newsome (2006) found tuning for direction and speed in the LFP from area MT and concluded that the LFP reflected activity over cortical columns within a few hundred micrometers of the recording site. Kreiman et al. (2006) showed that the LFP signal magnitude (i.e., power in the 1- to 300-Hz band) was selective for individual images in IT. MUA is another measure of local neural activity, which is not strictly redundant to the LFP (Kreiman et al. 2006). The comparison of field potential and spiking activity may offer insights into the role that cortical areas play in information processing (Kreiman et al. 2006; Nielsen et al. 2006).
A technical challenge in addressing these questions is how to select recording sites without bias. Visually responsive recording sites cannot be detected without showing some sort of visual stimulation, but if stimulus familiarity affects neuronal activity, then screening for visual neurons with either familiar or novel images could yield a biased population. To mitigate this issue, we used an alphabet of “strokes” and textures to create a virtually inexhaustible library of novel “blobs” that were used for screening. This allowed us to select visually responsive recording sites with stimuli that were not familiar and which differed from the photographic images used for novel and familiarity testing.
The subjects were 4 adult male rhesus monkeys (Macaca mulatta; identifiers: J, M, O, and S), weighing 9–13 kg. Prior to these experiments, the monkeys were highly familiar with the general behavioral procedures and had participated in other behavioral and electrophysiological studies.
The monkeys were first surgically implanted with a single piece titanium head restraint post. In a second surgery, animals were fitted with recording chambers (Horsley–Clark coordinates: +15 anterior, +20 lateral; left hemisphere for monkeys M and S; right hemisphere for monkeys J and O). Two different recording chambers were used. For 2 monkeys (M and O), a raised closed system was used. The raised chamber contained a ball and socket joint controlling the direction of a chronic 18-gauge guide tube. For the other 2 monkeys (J and S), a local modification of the open chamber technique was used with a 16-mm-inner-diameter chamber affixed to the skull with titanium screws and dental acrylic.
All animal surgeries were performed under aseptic conditions using inhaled isoflurane anesthesia and were approved by the Institutional Animal Care and Use Committee at Brown University and carried out in accordance with the guidelines published in the National Institutes of Health (NIH) Guide for the Care and Use of Laboratory Animals (NIH publication no. 86-23, revised 1987).
Eye movements for all animals were recorded with an infrared system (EyeLink II, SR Research, Ontario, Canada) operating at 500 Hz. The analog outputs from the eye tracking hardware were sampled by the control system at 1 kHz and a moving average was stored to disk every 5 ms (200 Hz). Saccades were automatically extracted from offline eye records using a velocity-based algorithm written in C, which marked the start and end time, and start and end position for every saccade on each trial. The parameters of this algorithm were set to reliably detect saccades down to approximately 0.4° in amplitude.
For monkeys M and O we primarily used the Thomas Recording multi-channel system with a 5-channel head (Thomas Recording GmbH, Giessen, Germany). This system employs a 5-channel microdrive with integrated preamplifiers (26 dB gain). Each of 5 quartz coated tungsten/platinum fiber microelectrodes (80-μm-diameter quartz, 20-μm-diameter platinum/tungsten core) is capable of independent advancement. Signals are subsequently amplified and filtered for the simultaneous recording of both action potentials and LFP (Eckhorn and Thomas 1993). For some sessions with monkeys O and M, and always for Monkey S, we recorded with single Alpha-Omega (Nazareth, Israel) glass coated tungsten electrodes advanced with a hydraulic microdrive (Kopf Model 670, Tujunga, CA). Signals for LFP were obtained with a Grass HZP amplifier (Warwick, RI) interfaced to a Grass Model 15 Amplifier system (Grass Technologies). Lastly, data collected from Monkey J used the Grass amplifier system, but the electrodes were FHC (Bowdoin, ME) epoxy coated tungsten electrodes driven by a NAN micromanipulator system (NAN drive; NAN instruments, Nazareth, Israel).
We routinely split the neurophysiological signal after preamplification for separate handling of LFP and spike signals. LFP signals were filtered at 1–100 Hz (Thomas System) or 0.3–300 Hz (Grass System) and then digitized at 2500 Hz with a cumulative gain of 5000. Because of the preliminary low pass filtering, our data analyses were carried out on LFP signals down sampled to 500 Hz. Analog signals for spike analyses were sampled at 34 kHz and raw traces were displayed on-line along with on-line automated estimates of spike wave forms, using 2 time-amplitude discrimination windows, as an aid to the investigator during electrode advancement. All data from a trial were saved for off-line analysis. For calculating total power we used the amplitude of the sum of squared voltages after subtracting the mean offset.
IT cortex was located based on the stereotaxic placement of the recording chamber and by counting white–gray matter transitions. Because the animals are still actively participating in ongoing experiments, we do not have precise anatomical locations for the recordings. MRI images from 1 of the monkeys (S), obtained after the experiments were completed, reveal a tract left by the guide tube targeting both the lower bank of the superior temporal sulcus and the lateral convexity of the inferior temporal gyrus. Based on the noted electrode depths and single cell response properties recorded for other experiments, we believe that our results come from visually responsive anterior IT lobe but not more medial temporal lobe structures including the perirhinal cortex.
Depth Effects on Cortical LFP Evoked Responses
In our earlier work (Peissig et al. 2006), we used extracranial EEG for recording evoked responses. Extracranial EEG is a relatively poor source localizer. To evaluate the precision and specificity of LFP, we recorded the visual and auditory evoked responses from 1 monkey as the recording electrode was systematically advanced in depth.
Figure 1 shows the modulation of the evoked response at various depths during a vertical penetration passing in a dorsal to ventral direction through the superior temporal sulcus and toward the ventral surface of the temporal lobe. These recordings show a marked modulation of the response to visual stimulation when the recording electrode position changed by 2 mm. As a control for modality specificity, the auditory evoked response was also recorded. The auditory evoked response was observed dorsal to the site that produced the largest magnitude signal for visual stimulation. There was minimal spill-over between the 2 sensory modalities at the same recording depth. These data support the contention that LFP recorded by conventional extracellular recording techniques is a “local” field potential.
Each stimulus was constructed by combining 4 strokes chosen out of a set of 64 predefined elements exported as alpha masks from Adobe Illustrator (Adobe Systems Incorporated, San Jose, CA). The composed image occupied a space of 256 × 256 pixels. For a given stimulus, each of the 4 strokes was randomly assigned an x–y position no more than 96 pixels from the center. The resulting union of the 4 strokes formed a template onto which 1 of 124 previously generated texture patterns was mapped. The probability of selecting the same 4 strokes and texture for 2 different stimuli using this procedure is 1 in 78,786,624, not including the random positioning of the elements. An example stimulus is shown in Figure 2. When presented to the monkey, these images were scaled to approximately match the familiar and novel objects for visual angle (4°).
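The figure of 78,786,624 can be checked with a short calculation. The sketch below assumes that the 4 strokes are drawn without replacement from the 64 elements and that their order does not matter; the function and constant names are illustrative only.

```python
from math import comb

N_STROKES = 64            # predefined stroke elements
STROKES_PER_STIMULUS = 4  # strokes combined per stimulus
N_TEXTURES = 124          # texture patterns


def n_distinct_stimuli(n_strokes=N_STROKES, k=STROKES_PER_STIMULUS,
                       n_textures=N_TEXTURES):
    """Count stroke/texture combinations, ignoring stroke positions.

    Assumes strokes are chosen without replacement and unordered.
    """
    return comb(n_strokes, k) * n_textures


print(n_distinct_stimuli())  # 635,376 stroke sets x 124 textures = 78786624
```

Under these assumptions the count matches the value quoted in the text, so the chance that 2 independently generated stimuli share the same strokes and texture is 1 in 78,786,624.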
The familiar stimulus set contained 100 full-color images of everyday objects (Hemera Photo-Objects, Gatineau, Quebec, Canada). Objects subtended approximately 4° x 4° of visual angle and appeared on a uniform gray (50%) background. The monkeys had extensive experience with these objects during months of preliminary behavioral training in which they learned to associate a left or right button-press with each image. Each monkey saw each image hundreds of times and performed this classification task with accuracy greater than 90%. This training spanned several weeks. Thus, each monkey not only saw each image several times, but also over an extended period of time.
For Monkey S, an additional 50 objects served as a set of familiar objects without a button-response association. These objects were used for several months as distractors in separate visual search experiments (Mruczek and Sheinberg 2007a, 2007b).
The novel stimulus set contained 250 full-color images of everyday objects (Hemera Photo-Objects). Novel objects were defined as images that the monkey had never seen prior to an experimental session.
Auditory stimuli for a given session were taken from a collection (1001 Sound Effects, Sony, Tokyo, Japan) of real world sounds (e.g., a bee swarm, a balloon being blown up).
The animals were tested in experimental setups consisting of a separate animal testing room and adjoining experimenters' workstations. Each setup contained a graphics stimulator running an OpenGL based display program, a control console, and a local area network of computers running a real-time operating system (QNX; QSSL, Ontario, Canada) for experimental control. All behavioral data were available for on-line control and stored to disk for offline analysis. Each animal testing room was electrically shielded and sound isolated.
The monkeys were seated in a primate chair with their head restrained. Although monkeys gained familiarity with the familiar stimulus set in a classification task, all the novel-familiar response comparisons in this paper were made during a passive-viewing task. Stimuli were presented on a computer monitor (25.5° × 19.1° of visual angle) positioned 90 cm from the monkey. The monkeys initiated trials by fixating for 450 ms on a square fixation spot, which subtended 0.3° of visual angle in the center of the monitor. The fixation spot was removed from view once the fixation requirement was met. Next, images were presented sequentially (typically 3 per trial), each for 300 ms with a 300-ms gap in between. Following the last image, a new fixation spot appeared at 1 of 4 eccentric locations and the monkey had to fixate this new location to acquire a fluid reward. The monkey was free to make small eye movements during the trial as long as his gaze did not leave a 10° square centered on the fixation square. No manual response was permitted for these blocks. For auditory stimulation, sounds were delivered by speakers situated to the left and right of the visual display. Sound level was 70–74 dB SPL at the location of the monkey's ear.
Daily recording sites were selected using the following procedure. The recording electrode was advanced to a depth previously determined to produce visual responses. We then showed “blobs” (see Fig. 2) in a passive viewing task until we either detected visually modulated spiking or demonstrated a consistent visual evoked LFP, upon which we immediately proceeded to the passive-viewing task with novel and familiar stimuli. Subsequently, on that day, the animal might participate in other experimental tests. However, for this paper, all data sets refer to the first recording blocks of a day and where the selection of the recording location was specifically not based on screening with images similar to those in the test set.
For some of the sessions with the Thomas recording device there was more than one active channel. For these sessions, we selected the one with the largest amplitude visual evoked LFP for analysis. Thus, all data sets refer to a different day, different recording position, and different sets of novel and familiar images.
For novelty-familiarity testing we used either 4 or 10 novel and familiar images each (selected randomly for each day). Each image was presented 10 times within a block for each condition. To determine the effect of simple image manipulations on the familiarity effect we rotated images in the image plane counterclockwise 0°, 45°, 90°, 135°, or 180° or varied the contrast of each image. Contrast was manipulated by averaging the color channels with the neutral gray background in ratios of 1.0:0, 0.1:0.9, 0.02:0.98, or 0.015:0.985.
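The contrast manipulation amounts to a weighted average of each color channel with the background gray. A minimal sketch follows, assuming channel values normalized to the range 0 to 1 and a background level of 0.5 (the 50% gray); the names `blend_with_gray`, `GRAY`, and `CONTRAST_WEIGHTS` are hypothetical and not taken from the display software.

```python
GRAY = 0.5  # assumed normalized level of the 50% gray background
CONTRAST_WEIGHTS = (1.0, 0.1, 0.02, 0.015)  # image weight; gray weight is 1 - w


def blend_with_gray(pixel, image_weight, gray=GRAY):
    """Average each color channel of `pixel` with the gray background.

    `pixel` is an (r, g, b) tuple with channels in [0, 1].
    """
    return tuple(image_weight * c + (1.0 - image_weight) * gray for c in pixel)


# One saturated orange pixel at each of the 4 contrast levels.
for w in CONTRAST_WEIGHTS:
    print(w, blend_with_gray((1.0, 0.5, 0.0), w))
```

At an image weight of 1.0 the pixel is unchanged, and as the weight approaches 0 every pixel converges on the background gray.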
MUA was extracted off-line from the stored analog signal by setting a threshold to obtain an average of 40 events in a 200-ms time window (200 Hz) preceding stimulus onset across a block of trials. This procedure has been used in previous neurophysiology studies to minimize the arbitrary nature of the multi-unit signal across recording sessions (DeAngelis et al. 1998; DeAngelis and Newsome 1999).
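One way to implement such a rate-matched threshold is sketched below. It assumes that an "event" is a local voltage peak exceeding the threshold, which is an illustrative choice rather than the definition used for the recordings; with that definition the event count falls monotonically as the threshold rises, so the threshold can be read off the sorted peak heights. All function names are hypothetical.

```python
import random


def local_peaks(trace):
    """Heights of all local maxima in a list of voltage samples."""
    return [trace[i] for i in range(1, len(trace) - 1)
            if trace[i - 1] < trace[i] >= trace[i + 1]]


def count_events(trace, threshold):
    """Number of local peaks above threshold (assumed event definition)."""
    return sum(1 for p in local_peaks(trace) if p > threshold)


def threshold_for_target_count(baseline_windows, target_mean_events=40):
    """Threshold giving `target_mean_events` events per baseline window on average."""
    peaks = sorted((p for w in baseline_windows for p in local_peaks(w)),
                   reverse=True)
    n_wanted = round(target_mean_events * len(baseline_windows))
    if not 1 <= n_wanted < len(peaks):
        raise ValueError("target rate not reachable with these windows")
    # Place the threshold between the last wanted peak and the next one down.
    return 0.5 * (peaks[n_wanted - 1] + peaks[n_wanted])


# Demo on synthetic noise: 5 pre-stimulus windows of 1000 samples each.
rng = random.Random(1)
windows = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(5)]
thr = threshold_for_target_count(windows, 40)
print(sum(count_events(w, thr) for w in windows) / len(windows))
```

Because the threshold is set from the pooled pre-stimulus windows of a block, individual trials can have more or fewer than 40 baseline events while the block average meets the target.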
In order to treat the multiunit signal in a manner comparable to the LFP we first convolved the MUA events with an asymmetric kernel, where the causal:acausal ratio of 2 Gaussians was 3:1 and the combination width was 2.5 SDs. The asymmetric filter provides an estimate of the instantaneous firing rate while minimizing backward biasing of each spike (Thompson et al. 1996; Brincat and Connor 2006). The continuous MUA function was sampled every 2 ms to be equivalent to our LFP sampling rate. After this procedure, both data types were processed similarly.
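The smoothing step can be sketched as follows. The description above leaves the kernel's absolute width open, so this example assumes 2 half-Gaussians joined at the spike time, with the causal (post-spike) SD 3 times the acausal SD, each side truncated at 2.5 of its own SDs, and an acausal SD of 5 ms. These parameter values and the function names are assumptions made for illustration.

```python
import math


def asymmetric_kernel(lag_ms, sd_acausal_ms=5.0, ratio=3.0, n_sd=2.5):
    """Kernel weight (per ms) at lag = sample time - spike time.

    lag >= 0 is the causal side (after the spike); its SD is `ratio` times
    the acausal SD. Each side is truncated at `n_sd` of its own SDs.
    """
    sd = sd_acausal_ms * ratio if lag_ms >= 0 else sd_acausal_ms
    if abs(lag_ms) > n_sd * sd:
        return 0.0
    # Combined area of the 2 truncated half-Gaussians, so the kernel
    # integrates to 1 and each spike contributes exactly 1 event.
    area = (sd_acausal_ms * (1.0 + ratio) * math.sqrt(math.pi / 2.0)
            * math.erf(n_sd / math.sqrt(2.0)))
    return math.exp(-0.5 * (lag_ms / sd) ** 2) / area


def smoothed_rate_hz(spike_times_ms, t_start_ms, t_stop_ms, step_ms=2.0):
    """Continuous rate estimate sampled every `step_ms` (2 ms = 500 Hz)."""
    n = int((t_stop_ms - t_start_ms) / step_ms) + 1
    times = [t_start_ms + i * step_ms for i in range(n)]
    rates = [1000.0 * sum(asymmetric_kernel(t - s) for s in spike_times_ms)
             for t in times]
    return times, rates


times, rates = smoothed_rate_hz([100.0, 104.0, 150.0], 0.0, 300.0)
print(max(rates))
```

With this shape, most of each event's weight falls after the event time, which limits how far a response can appear to precede the spikes that produced it.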
Permutation Tests and Difference Plots
We compared the average visually evoked LFP, or MUA, to novel and familiar stimuli. Statistical significance for this difference was computed using a permutation test (Efron and Tibshirani 1993) with at least 1000 permutations. For each comparison, the difference between the “novel” and “familiar” images was calculated after randomly shuffling the “novel” and “familiar” labels. From many such permutations, we obtained a distribution of differences that would be expected to occur by chance if there was no actual difference in the response to novel and familiar images. Actual differences that lay outside the central 99% of the permuted distributions for a minimum of 10 consecutive time points (20 ms) were considered significant (i.e., 2-tailed test at an alpha-level of 0.01).
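The procedure can be sketched in a few functions. The example assumes each trial is a list of samples at 2-ms resolution and takes the bounds of the central 99% directly from the sorted permuted differences at each time point; the function names and the synthetic data are illustrative only.

```python
import random


def mean_trace(trials):
    """Average across trials at each time point."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def difference(novel, familiar):
    """Novel-minus-familiar difference of the average traces."""
    return [a - b for a, b in zip(mean_trace(novel), mean_trace(familiar))]


def permutation_bounds(novel, familiar, n_perm=1000, alpha=0.01, rng=None):
    """Per-time-point bounds of the central (1 - alpha) of shuffled differences."""
    rng = rng or random.Random(0)
    pooled = list(novel) + list(familiar)
    diffs = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffling trials is equivalent to shuffling labels
        diffs.append(difference(pooled[:len(novel)], pooled[len(novel):]))
    lo_idx = int(n_perm * alpha / 2)
    columns = [sorted(col) for col in zip(*diffs)]
    return ([c[lo_idx] for c in columns],
            [c[n_perm - 1 - lo_idx] for c in columns])


def significant_runs(observed, lower, upper, min_run=10):
    """Flag time points lying in runs of >= min_run consecutive excursions."""
    outside = [o < l or o > u for o, l, u in zip(observed, lower, upper)]
    sig, start = [False] * len(outside), None
    for i, flag in enumerate(outside + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                sig[start:i] = [True] * (i - start)
            start = None
    return sig


# Synthetic demo: familiar trials carry an added deflection at samples 20-39.
rng = random.Random(3)
novel = [[rng.gauss(0, 1) for _ in range(60)] for _ in range(30)]
familiar = [[rng.gauss(0, 1) + (5 if 20 <= t < 40 else 0) for t in range(60)]
            for _ in range(30)]
lower, upper = permutation_bounds(novel, familiar, n_perm=200)
print(significant_runs(difference(novel, familiar), lower, upper))
```

The consecutive-run requirement guards against isolated excursions, which are expected at about 1% of time points by chance alone.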
In addition, we compared the average MUA rate, the mean LFP magnitude and the total LFP power evoked by familiar and novel stimuli from 50 to 450 ms after stimulus onset. LFP magnitude for each trial (zeroed to the value at image onset) was determined by taking the sum of the rectified LFP response. To quantify the difference in these measures in response to familiar or novel stimuli we used either a Wilcoxon rank sum test or we computed empirical “receiver (or relative) operating characteristic” (ROC) curves and estimated the area under these curves (Green and Swets 1966; Swets 1996). The area under the ROC curve ranges from 0.0 to 1.0 and gives a reliable measure for the separation of 2 distributions, with 0.5 indicating no difference and values farther away from 0.5 indicating larger differences. The area under the ROC curve is a distribution-free estimate of sensitivity, and does not assume that the data are normally distributed.
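These per-trial measures and the ROC area can be sketched as below. The example relies on the standard equivalence between the area under the empirical ROC curve and the probability that a value drawn from one distribution exceeds a value drawn from the other, with ties counted as one half. The function names are hypothetical, and `total_power` reflects one reading of the sum-of-squares description given earlier.

```python
def lfp_magnitude(trace, onset_idx, start_idx, stop_idx):
    """Sum of the rectified LFP after zeroing to the value at image onset."""
    baseline = trace[onset_idx]
    return sum(abs(v - baseline) for v in trace[start_idx:stop_idx])


def total_power(trace, start_idx, stop_idx):
    """Sum of squared voltages after subtracting the window's mean offset."""
    window = trace[start_idx:stop_idx]
    mean = sum(window) / len(window)
    return sum((v - mean) ** 2 for v in window)


def roc_area(familiar_values, novel_values):
    """Area under the empirical ROC curve for 2 samples.

    Computed as P(familiar > novel) + 0.5 * P(familiar == novel),
    which makes no assumption about the shape of either distribution.
    """
    wins = 0.0
    for f in familiar_values:
        for n in novel_values:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(familiar_values) * len(novel_values))


print(roc_area([3, 4, 5], [0, 1, 2]))  # complete separation
print(roc_area([1, 2], [1, 2]))        # identical samples
```

An area of 1.0 or 0.0 means the 2 sets of trials are completely separated, and 0.5 means the measure carries no information about stimulus class.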
Selectivity Measures and Image Classification
We measured stimulus selectivity using a Broadness Index, defined as the proportion of stimuli that evoked a significant multi-unit response (Kobatake et al. 1998; Tamura et al. 2004; Freedman et al. 2006). Significance was determined by comparing the stimulus-evoked response with the background response using a Wilcoxon rank sum test and an alpha level of 0.01. We defined background as the average spike rate in the 100 ms before stimulus onset. The stimulus-evoked response was calculated from 50 to 300 ms after stimulus onset.
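A minimal sketch of the index is given below. It assumes that the test is applied per stimulus to the per-trial evoked and background rates, and it substitutes a normal approximation to the rank sum statistic without tie correction, which is only rough for 10 trials per stimulus; an exact test from a statistics library would be preferable in practice. All names are illustrative.

```python
import math


def rank_sum_p(x, y):
    """Two-sided rank sum p value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    # Mann-Whitney U, which is equivalent to the Wilcoxon rank sum statistic.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))


def broadness_index(evoked, background, alpha=0.01):
    """Proportion of stimuli whose evoked rates differ from background.

    `evoked` and `background` map stimulus name -> list of per-trial rates.
    """
    n_sig = sum(1 for s in evoked
                if rank_sum_p(evoked[s], background[s]) < alpha)
    return n_sig / len(evoked)


# Demo: 4 stimuli, 10 trials each; only A and C drive the site.
base = list(range(10))
evoked = {"A": [r + 20 for r in base], "B": base,
          "C": [r + 20 for r in base], "D": base}
background = {name: base for name in evoked}
print(broadness_index(evoked, background))
```

In the demo, 2 of the 4 stimuli evoke a significant response, giving an index of 0.5; a site that responded significantly to every stimulus would score 1.0.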
We took the LFPs from 0 to 350 ms for every trial and normalized each to unit magnitude. We classified trials by using a leave one out strategy. For each trial, for each image category (familiar or novel) and contrast level (1.0, 0.1, 0.02, 0.015), we computed the mean vector for each image (4 images per each category of novel or familiar) omitting the one trial we were trying to “guess.” We then calculated our guess by computing the angle between the test trial and each of the mean image vectors as the inverse cosine of their dot product. The image with the minimum angle was our guess for that trial. We counted correct guesses for each of the 8 (4 contrast × 2 image familiarity) subcategories and used this in a 2 factor repeated measures analysis of variance (ANOVA).
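The classifier can be sketched as follows for a single category and contrast level. The example normalizes the mean image vectors to unit length before taking the inverse cosine, so that the dot product is a true cosine; that step, the function names, and the synthetic waveforms are assumptions made for illustration.

```python
import math
import random


def unit(v):
    """Scale a vector to unit magnitude."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def angle(u, v):
    """Angle between 2 unit vectors as the inverse cosine of their dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp rounding error


def leave_one_out_accuracy(trials_by_image):
    """Fraction of trials assigned to the correct image by minimum angle."""
    normalized = {img: [unit(t) for t in trials]
                  for img, trials in trials_by_image.items()}
    correct = total = 0
    for true_img, trials in normalized.items():
        for i, test in enumerate(trials):
            best_img, best_angle = None, math.inf
            for img, others in normalized.items():
                # Omit the test trial from its own image's mean vector.
                train = [t for j, t in enumerate(others)
                         if not (img == true_img and j == i)]
                template = unit([sum(c) / len(train) for c in zip(*train)])
                a = angle(test, template)
                if a < best_angle:
                    best_img, best_angle = img, a
            correct += best_img == true_img
            total += 1
    return correct / total


# Synthetic demo: 4 images with distinct waveforms, 10 noisy trials each.
rng = random.Random(0)
demo = {f"image{f}": [[math.sin(2 * math.pi * f * k / 50) + rng.gauss(0, 0.1)
                       for k in range(50)] for _ in range(10)]
        for f in (1, 2, 3, 4)}
print(leave_one_out_accuracy(demo))
```

With 4 images per category, chance performance is 0.25, so accuracies well above that level indicate that the shape of single-trial traces carries image information.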
Novelty and Familiarity: LFP
Three monkeys (J, M, and S) participated in a total of 18 sessions (7, 6, and 5, respectively) in which 10 trials of 10 novel and 10 familiar stimuli were shown intermixed in a passive viewing task. Figure 3A shows the average visually evoked LFP separated for novel and familiar images. The shape of the average wave form was different for each monkey, which probably reflects some variation in the recording position (see Fig. 1) and the extent and location of the reference. Despite this variability, the visually evoked LFP showed consistent differences in the response to familiar and novel stimuli for all 3 monkeys. In general, the LFP was more negative early (160 ms) and more positive later (300 ms) when comparing the familiar to the novel responses.
To quantify this result statistically, we compared the difference between the average evoked LFPs to novel and familiar stimuli with permuted difference traces (see Fig. 3B). To generate a permuted difference trace, we randomly permuted the labels “novel” and “familiar” and recalculated the difference between the “novel” and “familiar” averages. If the novel and familiar labels are interchangeable, then the empirically observed difference plot simply reflects random variation within the collection of recorded responses. The permuted difference traces establish the bounds of the random variation under this null hypothesis.
For all 3 monkeys (M, J, and S) the empirically observed difference exceeds the 99% confidence interval at approximately 166 ms after image onset. For Monkey M, we also found an earlier difference around 118 ms. For all 3 monkeys, differences between the LFP evoked by familiar and novel objects lasted until approximately 375 ms after stimulus onset.
Although there are large group differences between the LFP for novel and familiar images, this grand average includes recordings from different days at different absolute depths. Could we also distinguish responses to novel and familiar images in individual data sets? In a first analysis we compared the raw magnitudes of the signal for differences between the novel and familiar trials on a data set by data set basis. For each trial we took the LFP from 50 to 450 ms after image onset (each trial zeroed to the value at image onset), rectified it (because the typical evoked response crossed the zero plane), and summed it. We then compared these values with a rank sum test. For 7 of the 18 data sets the familiar images had, on average, significantly greater signal magnitude (P < 0.05). Only 1 data set showed significantly weaker amplitude of the LFP for familiar images. In a second assessment, we conducted an ROC analysis on the total LFP power and calculated the area under the ROC curve (see Methods) for each of the 18 data sets. The overall mean ROC area was 0.59 (standard deviation [SD] = 0.15; 13 of 18 data sets had ROC areas greater than 0.5; P = 0.02 binomial distribution). The maximum ROC area was 0.86. In summary, although the effect of stimulus familiarity is most obvious when pooling data sets, it can be seen at statistically reliable levels in approximately one-third of our data sets.
Novelty and Familiarity: MUA
The results for the MUA were generally similar to those for the LFP except that the signal magnitude was greater for novel images. Figure 4A shows the average visually evoked MUA separated for novel and familiar images. Consistent with previous reports (Freedman et al. 2006; Mruczek and Sheinberg 2007b), novel stimuli elicited a stronger MUA response, particularly during the later, sustained period of the response. This difference emerged at different times for the 3 monkeys, ranging from 118 to 158 ms after stimulus onset. For Monkey J, the difference in MUA emerged sooner than the difference for the LFP (118 vs. 166 ms). However, the difference between the evoked response for familiar and novel images emerged around the same time in the MUA and LFP signals for Monkey M (MUA: 120 ms vs. LFP: 118 ms) and Monkey S (MUA: 158 ms vs. LFP: 166 ms).
To compare the MUA and LFP measures we conducted an ROC analysis on the total MUA power (50–450 ms after image onset) and computed the area under the ROC curve (see Methods) for each of the 18 data sets using the same criterion as for the LFP analysis above. The overall mean ROC area was also 0.59 (SD = 0.09), but the number of data sets with an ROC area greater than 0.5 was 15 of 18. The maximum ROC area was 0.79. An interesting difference is that the LFP has more power for the familiar stimuli, whereas the MUA signal is greater for the novel stimuli. Although both signals showed discriminative power, a comparison of the ROC areas for MUA power to those for the LFP power across each data set showed that these measures were largely independent (r = 0.04, Pearson correlation). This is consistent with prior results for visual selectivity (Kreiman et al. 2006).
Is the Response Difference due to Visual Familiarity or Motor Planning?
Our familiar targets became familiar when the monkeys learned them during a visual classification task. This task required each monkey to associate a button-press response to each image. Because the most consistent LFP and MUA differences occurred at a time when a motor response could have been observed in the active task, one potential explanation for the observed differences is that they reflect motor response processing or potentially inhibition of a learned response. We show that this is not the explanation by examining 5 additional data sets for Monkey S.
Monkey S had participated in a classification task where some images were used as distractors. That is, they were never the basis for deciding which button to press but were there to make detecting a target image more difficult. However, each distractor image was used and seen by the monkey hundreds of times. This provided us with a set of images familiar to the monkey but for which there was no learned motor association to either plan or inhibit. For each data set analyzed in this section the monkey had 10 trials for each of 10 novel, 10 familiar targets, and 10 familiar distractors. Figure 5 (left column) shows the comparison of visually evoked LFP to familiar categorized objects, familiar distractor objects, and novel objects. Both classes of familiar stimuli showed a difference between the novel and familiar stimuli that emerged around 175–200 ms. Except for the slight, but significant, differences for familiar targets and distractors in the LFP signal, the same pattern was found for the MUA (Fig. 5 [right column]). There were strong differences between both classes of familiar stimuli and the novel stimuli. The difference between both classes of familiar stimuli and the novel stimuli emerged at approximately 180 ms.
These results demonstrate that the differences in evoked responses to novel and familiar stimuli cannot be attributed to planning or inhibiting a specific motor response. However, because our monkeys have participated for years in tasks requiring motor responses to pictures it is possible that familiar images are associated with a nonspecific inhibition of a motor set even when there is no specific trained association. Because of our monkeys' behavioral history we are not in a position to directly address this explanation.
What's Familiar—Objects or Images?
Did our monkeys become familiar with these objects per se or simply with the image-specific views they saw repeatedly? Two monkeys (J and O) underwent a total of 20 sessions (12 and 8, respectively) where they viewed 10 trials of 4 familiar objects in the upright, learned configuration, as well as 10 trials with the image rotated in the picture plane 45°, 90°, 135°, or 180°. An equal number of trials and novel images for each of the rotation conditions were also shown. We included rotated versions of the novel images to demonstrate that there was nothing special about the unrotated, “natural” view of the real-world objects that comprised our stimulus set.
For both monkeys there was a pronounced difference in the visual evoked LFP between the learned view and the rotated views of the familiar objects (Fig. 6). The difference between the familiar image at the learned angle and the unrotated novel images (Fig. 6, second row) is similar to the difference between the unrotated familiar images and the same images rotated 45° (Fig. 6, third row). However, rotated versions of familiar pictures are not completely “novel,” as there were significant differences between rotated versions of familiar images and similarly rotated versions of novel images (Fig. 6, bottom row). All of these differences occurred between 170 and 240 ms for both monkeys.
Effects were similar, but less pronounced for the MUA response, especially for Monkey J. The unrotated familiar images differed from the unrotated novel images (Fig. 7, second row), as well as the familiar images rotated 45° (Fig. 7, third row). There were also small differences between the familiar and novel images when both were rotated by 45° (Fig. 7, fourth row). For Monkey O, there were small significant differences for the MUA comparison of the unrotated familiar images and the unrotated novel images (Fig. 7, second row) and the 45° rotated familiar and 45° rotated novel images (Fig. 7, fourth row), but not for the unrotated familiar images compared with the 45° rotated familiar images (Fig. 7, third row).
Are the Physiological Differences Functionally Important?
Behaviorally we are faster and more accurate at classifying familiar items. Are these behavioral advantages related to the differences in physiological activity we observe in IT when passively viewing familiar and novel images? Here we show that the image categories, familiar and novel, had significant effects on sensitivity measures for the MUA and LFP. For these analyses we used 48 data sets (M = 24, O = 16, J = 5, and S = 3) where each of 8 images (4 familiar and 4 novel) was repeated 10 times at 4 different contrast levels (see Methods above).
Previous studies have demonstrated that IT neurons are more selective for familiar compared with novel objects (Kobatake et al. 1998; Freedman et al. 2006; Mruczek and Sheinberg 2007b). For this analysis, we replicated these results for our MUA data. We quantified the stimulus selectivity of the MUA using selectivity broadness (Kobatake et al. 1998; Freedman et al. 2006), which is the proportion of stimuli that evoke a significant response compared with baseline. This means that the greater the broadness measure, the less selective the signal. In the extreme, a channel that showed a similar, significant response to all stimuli would be very broad, but would not be at all selective because the firing could not be used to differentiate which image had been shown. Figure 8 shows the broadness index calculated between 50 and 300 ms after stimulus onset as a function of stimulus class and contrast for 3 monkeys. A repeated-measures ANOVA showed a significant effect of contrast for Monkey M (F3,69 = 15.80, P < 0.000001) and Monkey O (F3,45 = 3.30, P = 0.002), indicating that at lower contrasts fewer stimuli evoked a strong MUA response. This same effect was marginally significant for Monkey J (F3,12 = 2.97, P = 0.07). For all 3 monkeys there was a significant effect of stimulus class (Monkey M, F1,23 = 12.62, P = 0.002; Monkey O, F1,15 = 6.04, P = 0.03; Monkey J, F1,4 = 9.97, P = 0.03). In general, significant MUA responses were more broadly evoked by novel images. Additionally, there was a significant interaction between stimulus class and contrast for all 3 monkeys (Monkey M, F3,69 = 13.25, P = 0.00001; Monkey O, F3,45 = 5.59, P = 0.002; Monkey J, F3,12 = 3.65, P = 0.04). The differences between familiar and novel images were more pronounced for higher contrast images. The broadness index for the familiar images is relatively flat across the contrast range we used.
From other data (Anderson and Sheinberg, forthcoming) we know that our monkeys can consistently and rapidly identify images at the lowest contrast values and that single-cell neural responses, although attenuated, are still present. Therefore, the flat broadness measure reflects a scaling down of all responses and a retaining of selectivity.
These results are consistent with previous reports (Kobatake et al. 1998; Freedman et al. 2006; Mruczek and Sheinberg 2007b) and demonstrate that neurons in IT are more selective for familiar images. Also consistent with previous reports, these effects were generally stronger for the later phase of the neural response (175–300 ms after stimulus onset) compared with the earlier phase (50–175 ms after stimulus onset; data not shown).
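The broadness measure used above (the proportion of stimuli that evoke a significant response relative to baseline) can be illustrated with a minimal Python sketch. It assumes that each stimulus has per-trial response counts from the 50 to 300 ms window and matching baseline counts, and it substitutes a simple paired t statistic with a fixed critical value for whatever significance test was actually applied. The function names, the threshold, and the toy numbers are all hypothetical.

```python
import math
from statistics import mean, stdev

def is_significant(response, baseline, t_crit=2.262):
    """Stand-in significance test: paired t statistic against a fixed cutoff.

    t_crit=2.262 is the two-sided 5% critical value for 9 degrees of
    freedom (10 trials). This test is an assumption for illustration.
    """
    diffs = [r - b for r, b in zip(response, baseline)]
    sd = stdev(diffs)
    if sd == 0:
        return mean(diffs) != 0
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return abs(t) > t_crit

def broadness_index(responses, baselines):
    """Proportion of stimuli whose response differs significantly from baseline."""
    flags = [is_significant(r, b) for r, b in zip(responses, baselines)]
    return sum(flags) / len(flags)

# Hypothetical spike counts for 10 trials (not data from the study).
baseline = [5, 6, 4, 5, 6, 5, 4, 6, 5, 4]
strong = [b + 8 + (i % 2) for i, b in enumerate(baseline)]              # clear response
weak = [b + (1 if i % 2 == 0 else -1) for i, b in enumerate(baseline)]  # no net change

# A selective site responds to 1 of 4 images; a broad site responds to 3 of 4.
selective = broadness_index([strong, weak, weak, weak], [baseline] * 4)
broad = broadness_index([strong, strong, strong, weak], [baseline] * 4)
print(selective, broad)
```

In this toy case the lower value marks the more selective site, which matches the interpretation that a greater broadness measure means a less selective signal.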
Previous studies have used LFP power to estimate stimulus responses to individual stimuli (Kreiman et al. 2006; Liu and Newsome 2006). However, this measure is extremely impoverished because it collapses a temporally varying voltage trace into a single number. Therefore, we chose to quantify the information regarding stimulus identity that was present in the shape of the LFP waveform and whether this information differed between novel and familiar images.
Figure 9 shows the percent correct classification for the LFP subdivided by image familiarity and stimulus contrast. Chance performance is 25% (1 out of 4). Across all levels of contrast, performance is better for the familiar stimuli. Even though there is marked attenuation of the magnitude of the LFP as contrast decreases (data not shown), the classification performance of this shape-based method is only modestly affected. This is consistent with behavioral results demonstrating that monkeys are able to classify such low contrast images at greater than 90% accuracy (Anderson and Sheinberg, forthcoming). Both the contrast and image familiarity effects are significant by 2-way repeated measures ANOVA (familiarity F1,47 = 10.50, P = 0.002; contrast F3,141 = 3.64, P = 0.014; interaction F3,141 = 0.27, P = 0.85).
Are Eye Movements the Basis for Differences in the Neurophysiological Signals of Familiar and Novel Images?
Studies in humans and monkeys have shown differences in the time spent examining novel images (e.g., Gothard et al. 2004). Studies that measure looking time traditionally involve pairs of images and a relatively long period of free viewing. This is a very different procedure from our task where images were centrally presented in sequences of a few hundred milliseconds for each image. However, even this short presentation period provides sufficient time for visual recognition and the generation of saccades. To examine the effects of eye movements on our results, we first computed, for each stimulus presentation, the location of the center of gaze relative to the center of each image and determined the median position during the time the image was on the screen. We then compared these median eye positions for each data set (using those data sets where all images were shown at full contrast, n = 42) by the nonparametric rank-sum test. There was no systematic difference between eye position for the novel and familiar images. Median eye position deviated further from center for novel stimuli in 2 data sets (P < 0.025) and for familiar stimuli in 2 data sets (P > 0.975), an equal number.
We did, however, observe some differences for the number of saccades when comparing trials containing either familiar or novel images. We used a χ2 test to compare the proportion of images in each familiarity category for the number of saccades made (within the ±5° fixation window) during the time the image was on the screen. Nine of the 42 data sets (21%) had a significant test (P < 0.05) and for 8 of the 9 it was because the novel images had greater proportions of zero saccade trials and familiar images had greater proportions of one or more saccades. To be sure that these differences were not responsible for the reported differences in electrophysiological activity, we repeated our analyses using only those trials for which there were no saccades during stimulus presentation. The results of this analysis did not qualitatively change the principal findings. We also looked at the timing and variability of the first saccade (see Table 1). Overall, the mean latency and variability of first saccade time, for those trials where there was at least one saccade, were similar within each monkey (although they differed across monkeys, see Table 1). Also, the times of the first saccades were, on average, later than the time at which observed differences emerged for the physiological signals. This makes the probability of executing a saccade, and the timing of that saccade, more likely a consequence of neural changes attendant on familiarity rather than a cause of those signal differences.
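As an illustration of the saccade-count comparison, the following Python sketch runs a Pearson χ2 test on a 2 × 2 table of familiarity (novel, familiar) by saccade outcome (zero saccades, one or more). Collapsing the saccade counts into two bins is a simplifying assumption; the actual test may have used more count categories. The counts and the function name are hypothetical.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With 1 degree of freedom the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical trial counts for one data set (not data from the study).
# Rows: novel, familiar. Columns: zero saccades, one or more saccades.
counts = [[30, 10], [18, 22]]
stat, p = chi2_2x2(counts)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

With these made-up counts the novel images have the greater proportion of zero-saccade trials, the pattern reported for 8 of the 9 significant data sets.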
Table 1. Timing and variability of the first saccade. Column headings: Time (ms), SD (ms), Time (ms), SD (ms).
Practice makes perfect and experience matures into expertise. How this happens is an open question. Prior work with human and monkey extracranially recorded EEG has suggested that novel and familiar material can be distinguished between 100 and 200 ms after image onset (Tanaka and Curran 2001; Rossion et al. 2002; Peissig et al. 2006). This time frame is after the first responses of primary visual cortices (Schmolesky et al. 1998) and close to the time of the initial response of higher level cortex in the visual ventral stream (Anderson et al. 2007). IT, a central component of the anterior ventral visual stream, contains neurons with highly selective visual response profiles that are robust to modest changes in rotation, position, and scale (Logothetis and Sheinberg 1996). This area also projects to medial temporal areas important for visual memory (Seltzer and Pandya 1991; Suzuki and Amaral 1994). IT is well placed anatomically to participate in the processing advantage for familiar material, and the time course of the temporal response of IT neurons to visual material is consistent with prior visual evoked potential studies.
To pursue the idea that IT supports the development of “familiarity,” we recorded LFP and MUA from IT sites in 4 monkeys while they viewed familiar and novel images. Our LFP results were broadly consistent with our earlier work using extracranially recorded EEG. Evoked potentials to familiar and novel material diverge within the first 100–200 ms after image onset, and there is an increased positivity late (>300 ms). The techniques used for this report provide greater anatomical resolution than the skull and scalp electrodes used previously (Peissig et al. 2006). Our recordings of visual and auditory evoked LFP show that the sensory modalities can be distinguished and that the shape and magnitude of the visual evoked response attenuate substantially a few millimeters away from visual cortex. Therefore, we can conclude that the source of the different LFP shapes for novel and familiar stimuli is within the visually responsive areas of anterior IT. We can also be more confident that our observed differences were not due to bias in recording site selection because we screened for visual responses with stimuli broadly different from both the familiar and novel images in our test sets.
Our visually evoked LFP responses show greater shape variation across our monkey subjects than is common for human extracranial ERPs. This is likely related to differences in recording location. Our examination of variation of the visually evoked LFP with recording depth showed that differences in position of a millimeter or 2 can have detectable effects on waveform shape, within a single subject along a single electrode trajectory. Despite our using standard skull coordinates for recording chamber placement, modest variation in the size of the monkeys and intersubject variation in brain size and shape suggest that there was likely some variation in the precise cortical location for each monkey. This variation cannot explain differences between the novel and familiar image responses because these comparisons were made for each monkey and for specific data sets. Therefore, whatever variation there was in recording location, it was common to both the familiar and novel image trials, and so the differences with image familiarity are superimposed upon the individual variation in waveform shape.
Eye movements can affect the morphology of electrophysiological signals directly. However, making a monkey maintain fixation for many seconds with minimal tolerance for excursion beyond a small virtual window may achieve one sort of experimental control, but at the cost of distorting natural behavior and the relevance for understanding natural vision. How the pattern of eye movements varies while inspecting images and scenes of varying familiarity may be related to the physiological responses. This may not be a confound, but an essential part of the phenomenon. Such findings can only be observed if animals are free to engage in more natural responses and this requires a relaxing of experimental limits.
In many studies of visual familiarity, familiarity is gained by directing actions in response to the stimuli. That is, the visual stimuli are often the source of a conscious decision for which there is a specific motor implementation. This is obviously true for the laboratory based object classification task popular in monkey research, but is also true more generally (e.g., the bird watcher who identifies a new species and immediately makes note of it in a log). We were able to demonstrate that the familiarity effects we observed were not due to explicit motor response planning. We compared familiar objects learned in a visual classification task to familiar objects for which there was no associated motor response. The latter class of stimuli was used as distractors during a visual search task and it would have been useful for the subject to ignore them. Despite these differences in training history and response association, the LFP and MUA responses to familiar stimuli of both types were consistent. Motor response planning is not the basis for the familiarity-based electrophysiological differences.
Attention and motivation can alter the magnitude of electrophysiological responses to visual stimuli (Reynolds and Chelazzi 2004). The data in this report do not address whether attention was systematically different for novel and familiar images. The monkeys' task was simply to acquire a new fixation spot after the images were shown, so novel and familiar images were not directly related to reward. Our control analysis indicates that overt eye movements did not account for the differences across familiar and novel images. Covert spatial attention may have differed across familiar and novel images, but it is unclear whether such a difference would be a cause of the neurophysiological differences or a result of those differences. This is an open question that we cannot resolve with these data.
When comparing the LFP and MUA results, an interesting difference emerges. The signal magnitude of the LFP was greater for familiar images and the signal magnitude of the MUA was generally greater for novel images. One way to reconcile these data is to emphasize that the visually evoked LFP reflects both afferent activity and its transmission within a complex of many active neurons. Changing phase relationships or altering the coordination among active elements will have a large effect on the shape of the evoked response, even if the activity level of each contributing element remains constant. If familiar images evoke fewer spikes, but more correlated spikes, then one could observe greater MUA activity but lower LFP signal magnitude for novel images.
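The reconciliation proposed above can be made concrete with a toy phasor model, sketched here in Python. Each active element contributes a unit-amplitude oscillation, the number of elements stands in for MUA, and the magnitude of the summed signal stands in for LFP amplitude. The model, the unit counts, and the phase spreads are illustrative assumptions and are not fitted to the recordings.

```python
import cmath

def summed_amplitude(n_units, phase_spread):
    """Magnitude of the sum of n_units unit phasors whose phases are
    evenly spaced across phase_spread radians (a stand-in for how
    tightly the contributing elements are coordinated)."""
    if n_units == 1:
        return 1.0
    step = phase_spread / (n_units - 1)
    phases = [-phase_spread / 2 + k * step for k in range(n_units)]
    return abs(sum(cmath.exp(1j * p) for p in phases))

# Hypothetical scenario: novel images drive more units, but with
# widely dispersed phases; familiar images drive fewer units that
# are tightly aligned.
novel_units, familiar_units = 100, 60
novel_lfp = summed_amplitude(novel_units, phase_spread=5.0)
familiar_lfp = summed_amplitude(familiar_units, phase_spread=0.5)

print(f"novel:    {novel_units} units, summed amplitude {novel_lfp:.1f}")
print(f"familiar: {familiar_units} units, summed amplitude {familiar_lfp:.1f}")
```

In this sketch the familiar condition yields the larger summed amplitude despite having fewer active units, because aligned contributions add nearly linearly whereas dispersed contributions partly cancel.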
What is it about these images that became familiar? Is it the object pictured or simply the specific 2D image array that has been viewed repeatedly? The latter would seem most likely for our monkeys. They have no hands-on experience with the type of items that made up the image set. Further, rotating familiar images by 45° made the electrophysiological responses look much more like those of the novel category. Although IT cells are relatively tolerant to object rotation, 45° of rotation would have a large impact on the firing of most IT cells (Logothetis and Pauls 1995). It remains an open question as to how much familiar images can be perturbed and still retain their electrophysiological signature as familiar. It is also a question for future work to determine how different visual dimensions (e.g., scale, position, rotation, distortion) interact with training history.
Although our results show clear differences between novel and familiar images, the evidence that IT plays a causal role in the behavioral advantage for familiar items is more limited and circumstantial. Consistent with previous reports (Kobatake et al. 1998; Freedman et al. 2006; Mruczek and Sheinberg 2007b), we were able to show that selectivity measures computed on the MUA were greater for the familiar images than the novel images. Furthermore, we were better able to classify individual trials based on the shape of the LFP within the familiar image set and we showed that this result was resistant to image degradation by contrast reduction. Therefore, there would appear to be more image specific information available for familiar images in IT responses, making it plausible that novel versus familiar distinctions could be occurring in this brain region.
The fact that LFP responses are image specific is a relatively new observation. Kreiman et al. (2006) found that the LFP in IT was often image selective and that this selectivity was poorly reflected in the spiking activity recorded at the same site. Liu and Newsome (2006) were able to deduce direction selectivity from the LFP recorded in area MT. In both of these studies the LFP was quantified using total power, or power in a few broad frequency bands. Although visual inspection of the LFP traces from novel and familiar images reveals that, on average, there are changes in response magnitude, we were able to delineate the image shown on individual trials using the LFP response after normalization. Therefore, it is not just the size of the response, but also the shape of the response that carries information about the image identity. In our classification procedure we compared each trial to the average response for each image from all other trials using a similarity measure based on the angle between vectors. This treats the entire LFP response as a vector in a high dimensional space. It accords equal weight to every voltage at every time period. Angle between vectors is a common similarity metric in high dimensional spaces, but it may be that much better classifications could occur with other approaches and our method should be viewed merely as setting a lower bound and demonstrating that there is information in the shape of the LFP response.
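The classification procedure described above (each trial compared with the average response for each image computed from all other trials, using the angle between vectors) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the study: the function names are hypothetical and the synthetic traces (sinusoidal shapes with varying gain plus a small higher-frequency perturbation) merely stand in for LFP recordings.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

def mean_trace(traces):
    """Pointwise average of a list of equal-length traces."""
    return [sum(column) / len(column) for column in zip(*traces)]

def classify_leave_one_out(trials):
    """trials maps image label -> list of traces.

    Each trial is assigned to the image whose average trace (built
    without the held-out trial) makes the smallest angle with it.
    Returns the fraction of trials classified correctly.
    """
    correct = total = 0
    for label, traces in trials.items():
        for i, trial in enumerate(traces):
            best_label, best_cos = None, -2.0
            for other, other_traces in trials.items():
                # Exclude the held-out trial from its own image's average.
                pool = [t for j, t in enumerate(other_traces)
                        if not (other == label and j == i)]
                c = cosine(trial, mean_trace(pool))
                if c > best_cos:
                    best_label, best_cos = other, c
            correct += best_label == label
            total += 1
    return correct / total

# Synthetic stand-in data: 4 images, 5 trials each, 50 samples per trace.
# Trials of one image share a shape but differ in gain, so only the
# direction of the vector (its shape) identifies the image.
n_samples = 50
gains = [0.6, 0.8, 1.0, 1.2, 1.4]
trials = {}
for m in range(4):
    trials[f"image_{m}"] = [
        [g * math.sin(2 * math.pi * i / n_samples + m * math.pi / 4)
         + 0.2 * math.sin(6 * math.pi * i / n_samples + k + m)
         for i in range(n_samples)]
        for k, g in enumerate(gains)
    ]

accuracy = classify_leave_one_out(trials)
print(f"fraction correct: {accuracy:.2f} (chance = 0.25)")
```

Because the cosine depends only on the direction of each vector, scaling a trace by a constant gain leaves its classification unchanged, which is the sense in which the method uses the shape rather than the size of the response.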
Because we only recorded from visually responsive areas of IT we cannot address whether similar changes would be found elsewhere in the visual stream. However, given when these differences are detectable in IT, we conjecture that either the distinction between novel and familiar images is directly specified by the afferent activity to IT or that it is computed local to IT cortex.
Responses to visual stimuli can occur as early as 70–80 ms after stimulus onset in anterior IT (e.g., Xiang and Brown 1998). In other work that we have conducted on the same monkeys used in the present report, we have found information about the image available from the spike counts as soon as 80 ms after image onset (Anderson et al. 2007). Because the differences we observe between familiar and novel images emerge about 125–175 ms after image onset, there is a time gap. Whether feedback from other neural areas can deliver signals in this time frame remains to be determined, but our data do provide temporal bounds for such feedback processes. The principal candidate for a possible recurrent process would be the medial temporal lobe, given its role in memory and the reciprocal connectivity between this region and IT (Miyashita 2004; Squire et al. 2007). In summary, our data show that novel and familiar images have different electrophysiological signatures in IT cortex and that there exists more information about familiar than novel images in the LFP and MUA responses.
James S. McDonnell Foundation (NIH RO1-EY014681); National Science Foundation Science of Learning Center (SBE-0542013); and National Science Foundation (CRCNS 0423031).
Conflict of Interest: None declared.