The visual system is confronted with rapidly changing stimuli in everyday life. It is not well understood how information in such a stream of input is updated within the brain. We performed voltage-sensitive dye imaging across the primary visual cortex (V1) to capture responses to sequences of natural scene contours. We presented vertically and horizontally filtered natural images, and their superpositions, at 10 or 33 Hz. At the low frequency, the encoding represented not the currently presented images, but differences in orientation between consecutive images. This was in sharp contrast to more rapid sequences, for which we found an ongoing representation of current input, consistent with earlier studies. Our finding that, for slower image sequences, V1 no longer reports actual features but instead represents their relative difference in time challenges the view that the first cortical processing stage must always transfer complete information. Instead, we show its capacities for change detection, with a new emphasis on the role of automatic computation evolving in the 100-ms range, which inevitably affects information transmission further downstream.
Characterizing the stimulus–response relationship is the most fundamental approach to accessing cortical coding behavior. This procedure starts from the assumption that neuronal populations sample sensory input and form faithful internal representations of its actual content. In fact, classical reverse-correlation techniques enable us to determine neuronal tuning properties by backtracking responses to stimulus variations across rapid sequences of presentation (Eckhorn et al. 1993; Ringach et al. 1997). Thus, these techniques build on the idea that neuronal activity is permanently updated by current stimulation, maintaining an ongoing representation of the outside world (Jonides et al. 1982).
To measure the continuous dynamics of cortical population activity, we used voltage-sensitive dye imaging, which reflects gradual changes in membrane potentials across several square millimeters of cortex with an emphasis on superficial layers (Grinvald et al. 1994; Petersen et al. 2003; Jancke et al. 2004; Chen et al. 2006; Roland et al. 2006; Berger et al. 2007; Sit et al. 2009; see Grinvald and Hildesheim 2004 for review). This method does not resolve single-neuron activity or differences between cortical layers. On the upside, it avoids biased sampling of neurons, and it captures population activity irrespective of receptive field locations and preferred feature selectivities (Lee et al. 1988; Vogels 1990; Jancke et al. 1999; Tsodyks et al. 1999; Jancke 2000; Dinse and Jancke 2001; Pouget et al. 2003; Graf et al. 2011; Gilad et al. 2012; Lewis and Lazar 2013), thereby providing the global tuning of the cortex across millions of neurons under different stimulus conditions.
Using recordings in cat primary visual cortex (V1), we report that ongoing encoding can be found for the representation of briefly presented stimulus sequences (33 Hz), consistent with earlier studies (Ringach et al. 1997; Benucci et al. 2009). However, slower image sequences (10 Hz) reveal an essential addition: population tuning in the primary visual cortex no longer represents the complete image content, but rather those orientations that were newly added or removed. We propose that such a precise detection of change across sequences of natural scene contours involves the interplay between 2 well-known neuronal behaviors, adaptation and stimulus off-responses (Movshon and Lennie 1979; Duysens et al. 1996; Müller et al. 1999; Bair et al. 2002; Dragoi et al. 2002; Felsen et al. 2002), which are important for stimulus transitions (Eriksson et al. 2008, 2012) and interact here to encode the difference to past images. In combination with eye movements at different spatiotemporal scales, the observed frequency-dependent encoding of image content might help to remove predictable input correlations in order to emphasize object borders and discontinuities within natural scenes (Desbordes and Rucci 2007; Rucci et al. 2007; Rucci 2008; Kuang et al. 2012). We conclude that input timing may entail predictive encoding (Rao and Ballard 1999; Friston 2005) at the very first cortical processing stages without the involvement of voluntary or attentional top-down mechanisms.
Materials and Methods
Visual Stimuli and Presentation
Construction of Oriented Stimuli (V, H) and Superpositions (VH)
We presented sequences of stimuli to anesthetized cats (11 males, 4 females, adult). First, we used 128 natural images (64 urban and 64 nature scenes) in grayscale to construct stimuli with dominant vertical (V) and horizontal (H) orientation. We derived these oriented stimuli by filtering the natural images in Fourier space with real-valued, polar-separable filters. The angular function of these filters was a triangular hat function, symmetric across 180°, with its maxima of 1 at either vertical (for stimulus type V) or horizontal (for stimulus type H) frequency. The half width at half maximum was 45° for the construction of broadly filtered images, and 2.8° for the construction of narrowly filtered images. The radial function (adapted from Simoncelli and Farid 1996) included a low-pass filter with cutoff at 6.6 c/° visual angle, and the DC component was set to zero. From the first harmonic up to 6.0 c/° (start of low-pass transition range), the function was 1. Thus, in this range, we preserve the relative amplitudes of the original image, including the characteristic 1/f fall-off for natural stimuli.
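The filter construction above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the square-image restriction, the convention that `center_deg` is the angle of the frequency axis where the angular gain peaks, and the raised-cosine shape of the radial roll-off between 6.0 and 6.6 c/° are all assumptions (the text only says the radial function was adapted from Simoncelli and Farid 1996).

```python
import numpy as np

def orientation_filter(size, deg_per_px, center_deg, hwhm_deg,
                       f_pass=6.0, f_cut=6.6):
    """Real-valued, polar-separable Fourier filter (illustrative sketch).

    center_deg: angle of the frequency axis where the angular gain peaks (assumed convention).
    hwhm_deg:   half width at half maximum of the triangular angular profile.
    f_pass, f_cut: start of the low-pass transition and cutoff, in cycles per degree.
    """
    freqs = np.fft.fftfreq(size, d=deg_per_px)            # cycles per degree
    fx, fy = np.meshgrid(freqs, freqs)                    # square images only
    radius = np.hypot(fx, fy)
    theta = np.degrees(np.arctan2(fy, fx))
    # Angular distance to center_deg, folded so the filter is symmetric across 180 deg.
    dist = np.abs((theta - center_deg + 90.0) % 180.0 - 90.0)
    # Triangular hat: 1 at the center, 0.5 at hwhm_deg, 0 beyond 2 * hwhm_deg.
    angular = np.clip(1.0 - dist / (2.0 * hwhm_deg), 0.0, 1.0)
    # Radial gain: 1 up to f_pass, raised-cosine roll-off (assumed shape) to 0 at f_cut.
    radial = np.ones_like(radius)
    band = (radius > f_pass) & (radius < f_cut)
    radial[band] = 0.5 * (1.0 + np.cos(np.pi * (radius[band] - f_pass) / (f_cut - f_pass)))
    radial[radius >= f_cut] = 0.0
    radial[0, 0] = 0.0                                    # DC component set to zero
    return angular * radial

def filter_image(img, filt):
    """Apply a Fourier-domain filter to a square grayscale image."""
    return np.real(np.fft.ifft2(np.fft.fft2(img) * filt))
```

With `hwhm_deg=45.0` the sketch corresponds to the broadly filtered condition and with `hwhm_deg=2.8` to the narrowly filtered one. Because the gain is 1 over the pass band, the 1/f amplitude fall-off of the input image is left untouched there.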
Next, we constructed superpositions (VH) of vertical (V) and horizontal (H) stimuli by summing them in image space. We normalized the stimuli such that the global contrast of all superpositions was the same (rms contrast: 0.71); that is, the component stimuli (V and H) were scaled by the same factor as their superposition (VH). During this normalization, we had to clip some pixels of images with a large intensity range (on average 2%, at most 15%).
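The normalization rule (one common scale factor for V, H, and VH, chosen so that VH reaches the target rms contrast) can be sketched as below. The sketch assumes that rms contrast means the standard deviation of luminance divided by the mean luminance (20 cd/m², from the stimulus description), and it assumes a displayable range of 0 to 40 cd/m² for clipping; both the contrast definition and `lum_max` are illustrative assumptions, and the function name is hypothetical.

```python
import numpy as np

def make_stimuli(v, h, mean_lum=20.0, target_rms=0.71, lum_max=40.0):
    """Scale V, H and their sum VH by one common factor (illustrative sketch).

    rms contrast is assumed to be std(luminance) / mean luminance; lum_max is an
    assumed display ceiling. Returns (stimuli, clipped_fraction, scale).
    """
    v = v - v.mean()                    # filtered images have no DC component
    h = h - h.mean()
    vh = v + h                          # superposition in image space
    scale = target_rms * mean_lum / vh.std()
    stimuli, clipped = {}, {}
    for name, comp in (("V", v), ("H", h), ("VH", vh)):
        img = mean_lum + scale * comp   # same factor for components and superposition
        clipped[name] = float(np.mean((img < 0.0) | (img > lum_max)))
        stimuli[name] = np.clip(img, 0.0, lum_max)
    return stimuli, clipped, scale
```

Because the factor is set by the superposition, each single-orientation component ends up with a lower rms contrast than VH, which is what "scaled with the same factor" implies.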
We also presented vertical (V) and horizontal (H) square-wave gratings (0.2 c/°) and their linear superpositions (VH, contrast also 0.71, phases were varied over repetitions). In addition to oriented stimuli and their superpositions, we used an isoluminant screen as blank (B) stimulus. All stimuli were gamma-corrected for the presentation monitor (100 Hz, Sony Trinitron GDM-FW900, Japan). Mean luminance of each stimulus, including blank, was 20 cd/m2. Stimuli covered a visual field of 31° × 31°.
From the 4 stimulus types, (V) vertical stimulus, (H) horizontal stimulus, (VH) superposition, and (B) blank stimulus, we created pseudorandom sequences including all 16 (4²) possible transitions (i.e., switches) between them, resulting in 17 stimuli per sequence. We constructed 64 different sequences optimizing the following criteria. Because response variance across experimental trials is relatively high, we required that all stimulus types (V, H, VH, and B) occur equally often (4 times within each 10-Hz sequence) and that every switch occur equally often (once within each 10-Hz sequence). To avoid systematic effects in the responses to particular switch types depending on their position in the sequence (start, middle, or end), we randomized the positions of both stimulus type and transition type across the 64 sequences (see Fig. 1 for a sketch of the paradigm and averaging procedure).
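The text does not say how the sequences were optimized. One standard way to obtain a sequence in which every ordered pair of stimulus types occurs exactly once is a random Eulerian circuit on the de Bruijn graph (a randomized de Bruijn sequence), sketched below with Hierholzer's algorithm. This is an assumed construction for illustration, not necessarily the authors' procedure; the function name is hypothetical.

```python
import itertools
import random

def random_debruijn_sequence(symbols, k, rng):
    """Random sequence in which every length-k word over `symbols` appears exactly
    once as consecutive items: a random Eulerian circuit on the de Bruijn graph."""
    nodes = list(itertools.product(symbols, repeat=k - 1))
    # Each (k-1)-tuple has one outgoing edge per symbol; shuffle their order.
    out_edges = {n: rng.sample(list(symbols), len(symbols)) for n in nodes}
    stack, circuit = [rng.choice(nodes)], []
    while stack:                                   # Hierholzer's algorithm
        node = stack[-1]
        if out_edges[node]:
            nxt = out_edges[node].pop()
            stack.append(node[1:] + (nxt,))
        else:
            circuit.append(stack.pop())
    circuit.reverse()
    # Expand the node path into a flat list of stimulus labels.
    return list(circuit[0]) + [node[-1] for node in circuit[1:]]

seq = random_debruijn_sequence(("V", "H", "VH", "B"), 2, random.Random(1))
```

With k = 2 the result has 17 items and contains all 16 switches once; because an Eulerian circuit returns to its start, the first type reappears as the last item. With k = 3 the same routine yields 66 items containing all 64 triplets, the length of the triplet sequences used for the electrophysiological recordings.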
These 64 sequences were shown repeatedly at 2 different presentation frequencies, 10 or 33 Hz. For 10-Hz presentation (100-ms presentation of each stimulus), one sequence was shown per trial. Here, stimulus presentation per trial lasted 1700 ms. For 33-Hz presentation (30-ms presentation of each stimulus), we showed 3 sequences in 1 trial (1530 ms per trial). In both cases, optical data were recorded for 2 s per trial, including a 200-ms baseline. The relatively short trial durations were used to avoid dye bleaching effects, photodynamic damage of the cortical tissue, and possible contamination by intrinsic signals (Grinvald and Hildesheim 2004). The intertrial interval, in which a blank stimulus was shown, was set to a minimum of 5 s. When 2 stimulus conditions were used (such as narrowly filtered and broadly filtered), these were randomized across trials. In between stimulus trials, we recorded blank conditions, in which an isoluminant gray screen was shown for 2 s, to allow correction of breathing and heartbeat artifacts (2 blanks for every 16 stimulus trials).
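The trial durations follow directly from the sequence length (17 stimuli) and the per-stimulus duration. The small check below, with hypothetical names, also confirms that both stimulus durations are whole multiples of the 10-ms refresh period of the 100-Hz monitor (so "33 Hz" corresponds to 30-ms stimuli) and that baseline plus presentation fits into the 2-s recording window.

```python
MONITOR_REFRESH_MS = 10  # 100-Hz presentation monitor

def trial_timing(stim_ms, n_sequences, stimuli_per_sequence=17,
                 baseline_ms=200, recording_ms=2000):
    """Return (presentation time per trial in ms, monitor frames per stimulus,
    whether baseline + presentation fits into the recording window)."""
    refresh_frames, remainder = divmod(stim_ms, MONITOR_REFRESH_MS)
    if remainder:
        raise ValueError("stimulus duration must be a whole number of monitor frames")
    presentation_ms = n_sequences * stimuli_per_sequence * stim_ms
    fits = baseline_ms + presentation_ms <= recording_ms
    return presentation_ms, refresh_frames, fits

print(trial_timing(100, 1))  # 10-Hz condition
print(trial_timing(30, 3))   # 33-Hz condition
```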
In the main body of imaging experiments, we used 10-Hz presentation frequency and the 2 stimulus conditions narrowly filtered and broadly filtered images (Figs 4 and 7, 12 experiments, 256–896 repetitions of each switch type). In one additional imaging experiment, we used oriented square-wave gratings for 10-Hz presentation (Fig. 4 (bottom row) and 5 (imaging trace), 236 repetitions of each switch type). To recapitulate previous optical imaging results (Benucci et al. 2009), in one experiment (Figs 2, 3, and 4 (upper row), 384 repetitions of each switch type), we used 33-Hz presentation frequency and 2 different stimulus conditions: “gratings” and narrowly filtered images. Finally, instead of long stimulus sequences, we presented 2 stimuli (i.e., isolated single switches) in 1 experiment. Here, we concentrated on the most informative switch type, superpositions to a single orientation (VH to V, and VH to H). Gratings were used in 3 different timing conditions: The superposition (VH) was shown for 33, 100, and 500 ms before switches to a single orientation (V or H) occurred (Fig. 6, 80 repetitions).
High-Contrast Moving Gratings
For calibration and mapping of orientation preference, we used moving square-wave gratings (rms contrast: 1, 0.2 c/°, 6 Hz, mean luminance 53 cd/m2 [8 hemispheres], or 38 cd/m2 [9 hemispheres]) with 4 different orientations (0°, 45°, 90°, 135°) and both motion directions. We recorded for 1 s in each trial, including 200-ms prestimulus time. These calibration trials were recorded throughout the entire experiment in between blocks of the main stimulus protocol of each experiment (specified above).
Preparations for Optical Imaging
All animal experiments were carried out in accordance with the European Union Community Council guidelines and approved by the German Animal Care and Use Committee (application number: AZ 18.104.22.168.32.07.032) in accordance with the Deutsches Tierschutzgesetz (§ 8 Abs. 1) and the NIH guidelines. For further details, see Onat, Nortmann et al. (2011). In brief, animals were initially anesthetized with ketamine (20 mg per kg i.m.) and xylazine (1 mg per kg i.m.), artificially respirated, continuously anesthetized with 0.8–1.5% isoflurane in a 1:1 mixture of O2/N2O, and fed intravenously. Both weak effects on neuronal tuning properties (e.g. Niell and Stryker 2010) and strong modulations (e.g. Adesnik et al. 2012) have been reported when comparing anesthetized and awake states. This might additionally depend on the type of anesthetics used. Therefore, it is an interesting question, which remains to be tested in general, to what extent results obtained under anesthesia hold in behavioral settings. However, our anesthetized and paralyzed preparation provides the advantage that the eyes are fixed, which allows complete control of the dynamics of the visual input. We administered 0.4 mg/kg dexamethasone i.m. and 0.05 mg/kg atropine sulfate i.m. daily and 20 mg/kg cephazolin twice a day. In a few control experiments, we used contact lenses with a 3-mm diameter pupil. Heart rate, intratracheal pressure, exhaled CO2, and body temperature were monitored. The skull was opened above area V1 (A18, occasionally parts of A17), the dura was removed, a chamber was mounted, the cortex was stained for 3 h (and occasionally re-stained) with voltage-sensitive dye (RH-1691), and unbound dye was washed out.
Data Acquisition and Preprocessing
Optical imaging was conducted with Imager 3001 (Optical Imaging, Inc., Mountainside, NY, USA). The camera was focused ∼500 µm below the cortical surface. Data acquisition onset was synchronized with heartbeat and respiration. For detection of changes in fluorescence, the cortex was illuminated with light of wavelength 630 ± 10 nm, and emitted light at wavelengths above 665 nm was collected. The frame rate was set to 100 Hz. We normalized each pixel value by dividing it by its average 200-ms prestimulus activity; heartbeat and respiration-related artifacts were removed by subtracting the average blank signal. These preprocessing steps yield a unitless relative fluorescence signal, denoted ΔF/F. For the main paradigm (10-Hz narrowly/broadly filtered), we excluded 5 hemispheres from analysis because of insufficient staining. Data were used only when the amplitude of the evoked response in the Fourier power spectrum at 10 Hz (the switch-type unspecific response) was at least 3 times larger than at surrounding frequencies (±2 Hz).
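The preprocessing and the inclusion criterion can be sketched as follows. Two points are assumptions rather than statements from the text: the blank recording is normalized in the same way as the stimulus trial before subtraction, and "surrounding frequencies (±2 Hz)" is read as the mean power of all frequency bins within ±2 Hz of 10 Hz, excluding the 10-Hz bin itself. Function names are hypothetical.

```python
import numpy as np

FRAME_RATE_HZ = 100.0   # camera frame rate stated in the text
BASELINE_FRAMES = 20    # 200-ms prestimulus period at 100 Hz

def relative_signal(trial, blank):
    """trial, blank: raw fluorescence of shape (frames, y, x); blank is the
    average over blank trials. Returns a unitless relative signal."""
    def normalize(x):
        f0 = x[:BASELINE_FRAMES].mean(axis=0)   # per-pixel prestimulus average
        return x / f0
    # Subtracting the normalized blank removes heartbeat/respiration artifacts
    # and the constant 1 left over from the division.
    return normalize(trial) - normalize(blank)

def passes_inclusion(trace, fs=FRAME_RATE_HZ, f_stim=10.0, flank_hz=2.0, ratio=3.0):
    """trace: 1-D spatially averaged response. True if power at f_stim is at
    least `ratio` times the mean power of the neighboring bins (assumed reading)."""
    trace = np.asarray(trace, dtype=float) - np.mean(trace)
    power = np.abs(np.fft.rfft(trace)) ** 2
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fs)
    peak_idx = int(np.argmin(np.abs(freqs - f_stim)))
    near = np.abs(freqs - f_stim) <= flank_hz
    near[peak_idx] = False                       # compare against the flanks only
    return bool(power[peak_idx] >= ratio * power[near].mean())
```

For a 2-s recording at 100 Hz the spectral resolution is 0.5 Hz, so the ±2-Hz neighborhood contains 8 flanking bins.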
Electrode recordings served as a control for the voltage-related responses reported by the fluorescent optical signals. The recorded units were collected at depths between 400 and 700 μm. Before electrophysiology, a vascular map of the brain was captured under illumination with green light (546 nm) from 2 optic fiber light guides. Additionally, the orientation maps obtained afterward were overlaid, and this combined map was used to guide electrode penetrations to orientation-selective domains. Spikes were sorted online by a multiple spike detector, MSD (Alpha Omega Engineering, Ltd., Israel). Cells were selected based on differences in spike waveforms. Multiple unit activity (MUA, mostly 3–4 cells, occasionally we recorded single units in addition to MUA (5 of 27 sites)) was recorded with tungsten electrodes (0.8–2 M
For these electrophysiological recordings, we presented oriented square-wave gratings at 10 Hz. We adapted our corresponding imaging paradigm (see above) to use longer trial durations. This allowed us to show sequences of 66 stimuli (6600-ms presentation duration per trial) containing all 64 possible triplets of the 4 stimulus types (V, H, VH, and B). Positions of triplet types within sequences were randomized across trials. Data were recorded in 4 hemispheres (in 2 of which we also performed optical imaging).
Maps were computed using data from high-contrast, square-wave moving gratings. After trial-wise preprocessing, we averaged data over repetitions (50–119 repetitions), motion direction, and time. The resulting dataset was spatially band-pass filtered from 1 to 3 c/mm. The vertical-horizontal (VH) orientation maps (used for correlation analysis in Fig. 6E and Supplementary Fig. S1) were obtained by subtracting the horizontal from the vertical map. We also computed additive VH maps as a control (Fig. 6E). Orientation maps, which cover the full range of orientations, were computed based on the 4 measured orientations using vector summation, and downsampled to 18 bins of 10° each.
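The map computation can be sketched as below: an ideal (hard-edged) spatial band-pass between 1 and 3 c/mm, vector summation with angle doubling over the 4 measured orientations, and assignment to 18 bins of 10°, with bin b centered at 10b° so that bin 18 covers 175°–185° and bin 9 covers 85°–95°, matching the bin convention used in the statistical evaluation. The hard frequency mask, the pixel size argument, and all function names are assumptions for illustration.

```python
import numpy as np

def bandpass(img, mm_per_px, low=1.0, high=3.0):
    """Ideal spatial band-pass (assumed hard edges) in cycles per mm."""
    fy = np.fft.fftfreq(img.shape[0], d=mm_per_px)
    fx = np.fft.fftfreq(img.shape[1], d=mm_per_px)
    radius = np.hypot(*np.meshgrid(fx, fy))
    mask = (radius >= low) & (radius <= high)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

def orientation_preference(responses, angles_deg=(0, 45, 90, 135)):
    """Vector summation: orientation is 180-deg periodic, so angles are doubled."""
    z = sum(r * np.exp(2j * np.deg2rad(a)) for r, a in zip(responses, angles_deg))
    return (np.rad2deg(np.angle(z)) / 2.0) % 180.0

def orientation_bins(pref_deg):
    """Map preferences to bins 1..18; bin b covers (10b - 5) to (10b + 5) deg."""
    return ((np.floor((pref_deg + 5.0) / 10.0).astype(int) - 1) % 18) + 1
```

A vertical-minus-horizontal (VH) map, as used for the correlation analysis, would simply be the difference of the 90° and 0° single-condition maps after the same band-pass.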
To obtain responses to each of the 16 possible switches (i.e. the pairs of 4 different stimulus types: V, H, VH, and B (blank)), we first removed responses unspecific to switch type by pixel-wise subtraction of the average response across the entire stimulus sequences. We then computed switch-triggered responses for each of the 16 switch types. This was done by aligning responses to a particular switch in time and then averaging over repetitions measured in different sequences. Thereby we averaged over responses to varying stimuli before and after a specific switch. The procedure to compute switch-triggered averages is illustrated in Figure 1 (a comparison to stimulus-triggered averaging is provided in Supplementary Fig. S2). When using filtered natural stimuli the 2 image categories (urban/nature scenes) were pooled.
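Switch-triggered averaging can be sketched as follows. The sketch interprets "the average response across the entire stimulus sequences" as the across-trial mean at each frame and pixel; that interpretation, the array layout, and the function name are assumptions. At 10 Hz with 10-ms frames, one stimulus spans 10 frames, and the 200-ms baseline puts the first stimulus onset at frame 20.

```python
import numpy as np
from collections import defaultdict

def switch_triggered_averages(data, sequences, onset_frame, frames_per_stim, window):
    """data: (trials, frames, pixels); sequences: stimulus labels per trial.
    Returns {(previous, next): average response of shape (window, pixels)}."""
    # Remove the switch-type-unspecific response (assumed: across-trial mean).
    residual = data - data.mean(axis=0, keepdims=True)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for trial, seq in enumerate(sequences):
        for i in range(1, len(seq)):
            t0 = onset_frame + i * frames_per_stim   # frame of switch seq[i-1] -> seq[i]
            if t0 + window > data.shape[1]:
                continue                              # window runs past the recording
            key = (seq[i - 1], seq[i])
            sums[key] = sums[key] + residual[trial, t0:t0 + window]
            counts[key] += 1
    # Averaging pools over whatever stimuli preceded and followed each switch.
    return {key: sums[key] / counts[key] for key in sums}
```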
The data were spatially band-passed from 1 to 3 c/mm. Population tuning was computed by averaging responses across pixels with the same orientation preference (as determined by the vector-based orientation map). Overall population tuning was then obtained by averaging across experiments (10 Hz narrowly/broadly filtered n = 12, 10 Hz gratings n = 1, 33 Hz narrowly filtered/gratings n = 1) and time (50–90 ms after switch).
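Remapping a switch-triggered response onto orientation space and averaging over the 50–90-ms window can be sketched as below. The sketch assumes 10-ms frames with frame 0 at the switch, so the window covers frames 5 to 9 inclusive; the names are hypothetical, and a bin without pixels would yield NaN.

```python
import numpy as np

def population_tuning(response, bins, n_bins=18, window_ms=(50, 90), frame_ms=10):
    """response: (frames, pixels) switch-triggered response, frame 0 = switch.
    bins: (pixels,) orientation-bin labels 1..n_bins from the orientation map.
    Returns (tuning over time of shape (frames, n_bins), time-averaged tuning)."""
    over_time = np.stack(
        [response[:, bins == b].mean(axis=1) for b in range(1, n_bins + 1)], axis=1)
    first, last = window_ms[0] // frame_ms, window_ms[1] // frame_ms
    return over_time, over_time[first:last + 1].mean(axis=0)
```

Averaging the resulting curves across experiments then gives the overall population tuning.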
Modulation Depth and Tuning Width
Modulation depth was calculated as the average difference between activity at pixels preferring 90 ± 15° and 0 ± 15° (time window 50–90 ms after switch). We tested for differences between broadly and narrowly filtered stimuli, using responses from blank (B) to vertical (V) and horizontal (H) orientations. We first averaged over vertical and horizontal conditions, sign-inverting the latter, and subsequently used a paired 2-tailed t-test to compare filter conditions within experiments (n = 12). Tuning width was estimated by fitting a Gaussian with 3 parameters (amplitude, width, and baseline level) for each experiment and filter condition, and filter conditions were compared within experiments using a paired 2-tailed t-test. For this test, we only used experiments in which the fitted Gaussian explained more than 80% of the variance (1 minus the residual variance divided by the variance of the data), which was the case in 8 of 12 experiments. In general, before conducting t-tests, we tested the respective distributions for normality using a Lilliefors test (α = 0.05).
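Both measures can be sketched as below. Because the Gaussian has only 3 free parameters, the sketch assumes that its center is fixed at a known preferred orientation and that the abscissa is the circular orientation distance from that center; the half width at half maximum is then σ√(2 ln 2). Variance explained follows the definition in the text. Function names are hypothetical, and `scipy.optimize.curve_fit` is used as the fitting routine.

```python
import numpy as np
from scipy.optimize import curve_fit

def circ_dist(a_deg, b_deg):
    """Distance between orientations in degrees (180-deg periodic)."""
    return np.abs((np.asarray(a_deg) - b_deg + 90.0) % 180.0 - 90.0)

def modulation_depth(resp, pref_deg, tol=15.0):
    """Mean response of pixels preferring 90 +/- tol minus those preferring 0 +/- tol."""
    return resp[circ_dist(pref_deg, 90.0) <= tol].mean() - resp[circ_dist(pref_deg, 0.0) <= tol].mean()

def fit_tuning_width(tuning, centers_deg, pref_deg):
    """Fit base + amp * exp(-d^2 / (2 sigma^2)) with the center fixed (assumed)."""
    d = circ_dist(centers_deg, pref_deg)
    def gauss(x, amp, sigma, base):
        return base + amp * np.exp(-0.5 * (x / sigma) ** 2)
    p0 = (tuning.max() - tuning.min(), 30.0, tuning.min())
    params, _ = curve_fit(gauss, d, tuning, p0=p0)
    resid = tuning - gauss(d, *params)
    r2 = 1.0 - resid.var() / tuning.var()       # fraction of variance explained
    hwhm = abs(params[1]) * np.sqrt(2.0 * np.log(2.0))
    return hwhm, r2
```

Under this sketch, an experiment would enter the tuning-width comparison only if `r2 > 0.8`.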
Statistical Evaluation of Population Tuning
To evaluate population tuning statistically, we used a 3-step procedure. First, we quantified how much the time-averaged orientation-tuning curves deviated from a flat orientation-tuning curve (zero baseline). To quantify this difference from zero, we computed a single χ²-test statistic. This allowed us to include the data from all experiments in an overall quantification, while at the same time accounting for differences in sample sizes, variances, and response levels across experiments. The χ²-test statistic is computed across all experiments (experiment e = 1, …, n, where n is the number of experiments) and orientation bins (bin b = 1, …, 18):

$$\chi^2 = \sum_{e=1}^{n} \sum_{b=1}^{18} \frac{\langle r_{e,b} \rangle^2}{\sigma_{e,b}^2 / N_e}$$

Here, ⟨·⟩ denotes the average over repetitions in experiment e, r_{e,b} the response in experiment e, orientation bin b, σ_{e,b}² its variance across repetitions, and N_e the number of repetitions in experiment e.

In the second step of the evaluation of population tuning, we fitted sinusoids to the population tuning curves and repeated the quantification on the residuals. We used the sinusoid function a_e cos(2πb/18) with amplitude a_e, which is fitted for each experiment e by minimizing

$$\chi_e^2 = \sum_{b=1}^{18} \frac{\left(\langle r_{e,b} \rangle - a_e \cos(2\pi b/18)\right)^2}{\sigma_{e,b}^2 / N_e}$$

Goodness of fit was then evaluated by computing the χ²-test statistic on the residuals as specified above (using the fitted sinusoids in place of the constant zero baseline).

In this second step, there are fewer degrees of freedom, df₂ = 18n − 1 − n, because we fitted n parameters (10 Hz narrowly/broadly filtered n = 12, 10 Hz gratings n = 1, 33 Hz narrowly filtered/gratings n = 1).
In the third step of the procedure, we investigated the direction of population tuning. It is characterized by the sign of the amplitude of the sinusoid, ae, which was fitted in the second step. When ae is positive, the maximum of the sinusoid is at bin 18, which covers orientation preferences 175°–185°, and its minimum is at bin 9, which covers 85°–95°. Thus, positive amplitude ae indicates horizontal population tuning. Negative amplitude, correspondingly, indicates vertical population tuning.
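The 3 steps can be sketched as follows, assuming that each bin is weighted by the inverse squared standard error of its repetition mean (variance divided by the number of repetitions) and that the sinusoid is a_e cos(2πb/18), which peaks at bin 18 and has its trough at bin 9. With these weights, the amplitude minimizing each experiment's χ² term has a closed form (weighted least squares). The degrees of freedom of the first step are not given in this section, so only the residual test uses a p-value, with df₂ from the text. Names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

BINS = np.arange(1, 19)
COS = np.cos(2.0 * np.pi * BINS / 18.0)   # maximum at bin 18, minimum at bin 9

def evaluate_population_tuning(means, sems):
    """means, sems: (n_experiments, 18) repetition means and their standard errors."""
    w = 1.0 / sems ** 2                    # assumed weights: 1 / (variance / N_e)
    n = means.shape[0]
    chi2_flat = np.sum(w * means ** 2)     # step 1: deviation from zero baseline
    # Step 2: per-experiment weighted least-squares amplitude, then residual chi2.
    amps = np.sum(w * COS * means, axis=1) / np.sum(w * COS ** 2, axis=1)
    chi2_resid = np.sum(w * (means - amps[:, None] * COS) ** 2)
    df2 = 18 * n - 1 - n
    p_resid = chi2.sf(chi2_resid, df2)
    # Step 3: positive amplitude -> horizontal tuning, otherwise vertical.
    direction = np.where(amps > 0, "horizontal", "vertical")
    return chi2_flat, chi2_resid, p_resid, amps, direction
```

A large `chi2_flat` together with a non-significant `p_resid` would indicate tuning that is well described by the sinusoid, and the sign of each `amps` entry gives its direction.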
To mimic natural viewing conditions (Betsch et al. 2004), we used rapid stimulus sequences comprising blank (B), vertical (V), horizontal (H), and superimposed orientations (VH), with different degrees of complexity (Fig. 2A–C). We set up pseudorandom sequences of these 4 stimulus types such that each sequence included all 16 possible transitions. Activity was continuously recorded in 10-ms time frames (Fig. 2D), and switch-triggered averages were computed for each individual transition.
Representation of Current Orientations
First, we verified that we could reproduce former findings within our settings. Figure 2D–F depicts cortical responses to a switch (green line) from vertical to horizontal when square-wave gratings (Fig. 2C) were presented in sequences of short 30-ms stimuli. Note the rapid change in color (bluish to reddish), which indicates the successive activation of spatially distinct populations of neurons representing each stimulus orientation briefly before and after the switch. To obtain a compact depiction of overall population tuning over time, we remapped the data onto orientation space (Fig. 2E). Figure 2F summarizes, in 10-ms time frames, how peak population activity shifted from vertical to horizontal, with a transition phase of typically one to two 10-ms frames (color bar for activity levels, tuning profiles on left and right). Thus, as described in previous electrophysiological studies (Ringach et al. 1997; Gillespie et al. 2001) and in a recent work that evaluated both voltage-sensitive dye imaging and extracellular measurements for this paradigm (Benucci et al. 2009), our data show the primary visual cortex acting as a straightforward “instantaneous decoder” (Benucci et al. 2009) that maps currently presented orientations.
Next, we tested whether this coding scheme also holds for complex stimuli that involve more than a change to a single orientation. To this end, we introduced switches from a single grating orientation (V or H) to superposition (VH plaid) and vice versa (Eriksson et al. 2010; Nortmann et al. 2011). Superimposing an orthogonal orientation on the present one resulted in relatively flat distributions, representing the average of the individual orientation patterns (Fig. 3, columns #1, stimulus conditions on top; see figure legend), indicating unbiased processing of both the sustained and the added orientation (Busse et al. 2009; MacEvoy et al. 2009). Importantly, a change back from superposition to a single orientation led to responses tuned to the orientation present after the switch (Fig. 3, columns #2, see blue and red arrows). Therefore, current orientations were again directly encoded, with a processing delay of ∼50 ms, similar to stimulus onset from blank (Fig. 3, compare columns #3). The only deviation from this scheme occurred for switches from an oriented stimulus back to blank (Fig. 3, columns #4). In this case, activity showed persistent tuning after the stimulus was turned off (Coltheart 1980; Duysens et al. 1985). In summary, our data using 33-Hz square-wave grating sequences confirmed the ongoing encoding of orientation, including the known exception of the tuned response after stimulus offset (Benucci et al. 2009).
Response characteristics obtained with simple artificial stimuli, such as the gratings of optimal spatial frequency used so far, do not necessarily generalize to input of ecological relevance (Smyth et al. 2003; David et al. 2004; David and Gallant 2005; Felsen et al. 2005; Haider et al. 2010; Fournier et al. 2011; Onat, König et al. 2011). That is, responses to natural input can deviate significantly from predictions based on simple parameterized stimuli, probably due to the extensive spatial context in natural images (see Carandini et al. 2005; Olshausen and Field 2005 for reviews). Thus, it may be important to validate findings obtained with artificial stimuli using more natural stimulus conditions (Felsen and Dan 2005). As a first step toward addressing this point, we extracted oriented contours from 128 different natural images (see Materials and Methods) by filtering them along the orientation dimension in Fourier space. In contrast to gratings, these images retain important properties of natural stimuli, such as the phase relationships and the typical 1/f fall-off of amplitudes along the spatial dimension (Simoncelli and Olshausen 2001; Geisler 2008; Hyvärinen et al. 2009; see Fig. 2B for example). Population tuning had an overall lower amplitude than for gratings (∼30%, cf. scale bars in Figs 3 and 4), most likely due to the heterogeneity of local contrasts in the natural stimuli, which may engage widespread population gain control reflected in the dye signal (Sit et al. 2009). Nevertheless, for these stimuli as well, the actual orientations after the switch were well represented (compare Fig. 4 first row with Fig. 3, columns #1 and #2).
Representation of the Difference Between Past and Present Orientations
During natural vision, changes of contour orientation can occur on relatively slow time scales (Gallant et al. 1998; Dragoi et al. 2002; Betsch et al. 2004; Kayser et al. 2004). Hence, we next contrast the above scheme of ongoing encoding with the processing characteristics found for slower sequences of natural scene contours, using 100-ms stimulus periods (Fig. 4, bottom 3 rows). Strikingly, responses to a switch from the vertical orientation to superimposed horizontal and vertical (Fig. 4, left column) represented almost exclusively the horizontal orientation, that is, the orientation that was added rather than the present superposition. Likewise, switches from superposition to vertical (Fig. 4, second column) were followed by responses tuned to horizontal, thus representing the removed orientation instead of the remaining vertical orientation (compare first-row blue with bottom red arrows). The same characteristics were found for changes in vertical orientations (Fig. 4, 2 right columns; see Supplementary Fig. S1 for the correlation between population tuning over time and a standard VH orientation map).
Most remarkably, the effect held precisely for natural scenes (Fig. 4, third row), in which the superposition of broadly filtered horizontal and vertical versions was almost identical to the original images (see example in Fig. 2, top). Therefore, even for the most complex stimuli, which contained a rich mixture of multiple orientations, we found sensitive cortical tuning for changing orientations rather than for currently presented orientations.
Population tuning to turned-off orientations after a switch from the superposition to a single orientation (VH to V or H, 0.29 ± 0.12 mad, ×10⁻⁵ ΔF/F; see Materials and Methods and Supplementary Tables S1–S3; n = 48: medians across 12 experiments, both orientations, and both filter conditions of the natural images) suggests that responses to orientations that were turned off are stronger than responses to orientations that were sustained. When these components were measured directly, tuning amplitudes were indeed higher for turned-off orientations (switches from H or V to blank, 0.58 ± 0.28 mad, ×10⁻⁵ ΔF/F) than for sustained orientations (V to V or H to H, 0.14 ± 0.16 mad, ×10⁻⁵ ΔF/F; n = 48; paired two-tailed sign test, P < 0.0001).
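The summary statistics above can be sketched as follows. The sketch assumes that "mad" denotes the unscaled median absolute deviation and that the paired sign test discards ties and compares the number of positive differences with a fair binomial; `scipy.stats.binomtest` (SciPy 1.7 or later) provides the binomial test. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import binomtest

def median_mad(x):
    """Median and (unscaled) median absolute deviation."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    return med, np.median(np.abs(x - med))

def paired_sign_test(a, b):
    """Two-tailed sign test on paired samples a, b."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    diff = diff[diff != 0]                  # ties carry no sign information
    n_pos = int(np.sum(diff > 0))
    return binomtest(n_pos, n=diff.size, p=0.5, alternative="two-sided").pvalue
```

With n = 48 pairs, even complete agreement in sign gives P = 2 · 0.5⁴⁸, which is far below the 0.0001 threshold reported above.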
When comparing responses to narrowly and broadly filtered images, we found for the latter a decrease in modulation depth (the difference between preferred and orthogonal responses) of 24 ± 7% (paired t-test, P < 0.02), whereas we did not observe differences in tuning width (broadly filtered, HWHM 46 ± 2° sem, narrowly filtered, 49 ± 3° sem, pairwise difference 3 ± 3° sem, n = 8 experiments, P = 0.33). Because broadly filtered images provide richer orientation content, divisive normalization across populations of neurons with different preferred orientations (Busse et al. 2009; MacEvoy et al. 2009) may cause the observed decrease in modulation depth. As a proof of principle, we finally presented sequences of square-wave gratings in an additional experiment (Fig. 4, bottom). As expected, modulation depth was large (cf. color bar), and for these stimuli, too, we found a dominant representation of orientation change.
Although VSD imaging is a powerful tool for measuring neuronal population dynamics with high spatiotemporal resolution, the relationship between the imaging signal and spiking activity is not entirely clear. Eriksson et al. (2008) suggest a close relationship between spike rate and the derivative of the VSD response rather than its magnitude. Such behavior might apply especially to the rising phase of the membrane potential after stimulus onset (Jancke et al. 2004; Sit et al. 2009). Moreover, combined VSD and calcium-sensitive dye imaging suggests that the relationship between spiking activity and the amplitude of the VSD response depends on stimulus intensity (Berger et al. 2007). Chen et al. (2012) propose that these relationships are well captured by a power function with an exponent of ∼4, similar to the relationship observed between average membrane potential and spike rate in single V1 neurons (Anderson et al. 2000; Finn et al. 2007). However, dependencies on individual experimental settings, on the particular stimuli used, and on likely differences between species remain largely unexplored. Hence, to address this issue, and to directly exclude the possibility that the dye signal levels reporting the difference to past orientations merely reflect subthreshold activity (Petersen et al. 2003; Jancke et al. 2004; Berger et al. 2007; Eriksson et al. 2008), we additionally performed electrophysiological recordings in 4 hemispheres.
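To make the consequence of an expansive power law concrete, the toy mapping below converts a (rectified) dye amplitude into a predicted spike rate with an exponent of 4, following the functional form attributed to Chen et al. (2012) above. The gain, the rectification at zero, and the function name are illustrative assumptions; the point is only the arithmetic: halving the dye amplitude reduces the predicted rate 2⁴ = 16-fold, so small dye signals could in principle correspond to very little spiking, which is why the direct electrophysiological check matters.

```python
import numpy as np

def power_law_rate(dye_amplitude, gain=1.0, exponent=4.0):
    """Toy mapping: rate = gain * max(amplitude, 0) ** exponent (assumed form)."""
    return gain * np.maximum(np.asarray(dye_amplitude, dtype=float), 0.0) ** exponent

rates = power_law_rate([1.0, 0.5, -0.2])
print(rates[0] / rates[1])   # ratio of predicted rates for a 2-fold amplitude ratio
```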
Figure 5A shows average multiunit activity (MUA, blue trace), recorded after imaging, together with the imaging signal (black trace) in response to grating sequences. Along the entire stimulus sequence, small bumps in the dye signal coincided precisely with the generation of spiking activity, even at low levels during the early phase of the imaged response. Meaningful neuronal signals in VSD recordings can indeed be small relative to overall activity (cf. Sharon and Grinvald 2002, for a signature of cross-inhibition suppression). Benucci et al. (2007) showed that small oscillations in the dye signal correlated with the frequency of counterphase oscillating gratings. In Onat, Nortmann et al. (2011), we showed for the first time that small bumps of activity represented exactly the retinotopic propagation of moving gratings. Very recently, using longer sequences (>10 s) of flashed gratings with different spatial phase, these formerly elusive retinotopic components in the imaging signal were shown in awake monkey (Omer et al. 2013).
For each electrode-recording site, we determined the preferred orientation of the MUA. Figure 5C depicts average spiking responses to moving gratings of different orientations. In this example, the neurons' preferred orientation was horizontal (red axis). Next, Figure 5D shows their time-resolved responses to a switch from superposition to either a horizontal or a vertical orientation (see icons). Although these neurons were tuned horizontally (Fig. 5C), responses to the switch to the nonpreferred vertical orientation were larger than to the switch to the preferred horizontal orientation (black trace shows the difference). In Figure 5E, spiking activity from all recorded units is summarized. The plot indicates that the recorded neurons responded more strongly when their preferred stimulus was removed than when it was present, similar to our imaging signals (Fig. 4, bottom 3 rows).
Finally, we used a single-switch paradigm (Fig. 6; cf. Eriksson et al. 2010, 2012), also to rule out that stimulus frames other than the switch pair under analysis (Felsen et al. 2002; see Materials and Methods) significantly affected the response behavior observed with continuous sequences. The superposition of 2 orientations was presented for 30, 100, and 500 ms, followed by a sudden switch to a single orientation (Fig. 6A). Whereas switches after a 30-ms delay produced activity tuned to the orientation present after the switch, longer delays yielded the opposite effect (Fig. 6B). Here again, population activity represented the orientation that was removed rather than the sustained orientation (Eriksson et al. 2010, 2012). This effect was stable and slightly increased (though not significantly) for 500-ms delays (Fig. 6C; in support of these findings, see Eriksson et al. (2012) for similar results with a stimulus duration of 250 ms). In Figure 6D, the time courses of global population activity are shown. For each condition, both the initial response to the superposition and the second response to a single orientation were represented by peaks of activity, which were further separated in time for longer switch delays (dark gray area). For 100- and 500-ms delays, activity to the superposition nearly adapted to baseline levels (occasionally we observed some further oscillations, see gray trace in the first plot and black trace in the third plot), followed by a strong response after the stimulus changed to a single orientation. In contrast, for the 30-ms switch, activity did not adapt to baseline, and responses to both stimuli merged into a double-peaked transient. To demonstrate that the orientation-selective part of the responses was directly traceable across the cortical activation patterns, we show time averages of the most active pixels in each condition (Fig. 6E, see maps).
Note the opposing cortical patterns in the maps, which depend on the switch direction and the delay time. The temporal evolution of the orientation-specific patterns is shown as their correlation with a standard VH orientation map over time (black and gray traces in Fig. 6E; cf. Supplementary Fig. S1). Thus, using only a single switch, these measurements confirmed our findings obtained with stimulus sequences: a short 30-ms period of stimulation resulted in the representation of current orientations (Fig. 6E, left), whereas longer stimulus periods (100 and 500 ms) caused representation of the difference from past orientations (Fig. 6E, second and third graphs).
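The correlation time course in Figure 6E can be sketched as a frame-by-frame Pearson correlation between each imaging frame and a reference orientation map. This is a minimal sketch assuming frames stored as a NumPy array of shape (T, H, W), a reference map such as a vertical-minus-horizontal differential map, and an optional boolean pixel mask; the function name, the synthetic map, and the gain values are hypothetical and do not reproduce the authors' analysis pipeline.

```python
import numpy as np

def map_correlation_timecourse(frames, ref_map, mask=None):
    """Pearson correlation of each frame with a reference orientation map.

    frames:  array (T, H, W) of imaging frames.
    ref_map: array (H, W), e.g. a vertical-minus-horizontal map (assumed).
    mask:    optional boolean array (H, W) selecting the pixels to use.
    Returns an array of length T. Frames that are constant within the mask
    have zero variance and would yield NaN.
    """
    frames = np.asarray(frames, dtype=float)
    ref = np.asarray(ref_map, dtype=float)
    if mask is None:
        mask = np.ones(ref.shape, dtype=bool)
    x = frames[:, mask]                      # (T, N) pixels inside the mask
    y = ref[mask]                            # (N,)
    x = x - x.mean(axis=1, keepdims=True)    # center each frame
    y = y - y.mean()                         # center the reference map
    return (x @ y) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y))

# Hypothetical synthetic data: each frame is a signed copy of the map plus noise.
rng = np.random.default_rng(0)
ref = rng.standard_normal((32, 32))
gains = np.array([0.0, 1.0, 1.0, -1.0, -1.0])   # no pattern, map-like, opposite pattern
frames = gains[:, None, None] * ref + 0.5 * rng.standard_normal((5, 32, 32))
print(np.round(map_correlation_timecourse(frames, ref), 2))
```

A positive value marks a pattern resembling the reference map and a negative value the opposite pattern, which is how a switch in the represented orientation would appear as a sign change of the trace.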
Our main result is that for sequences of natural scene contours presented at 10 Hz, population tuning no longer represented the actual stimulus contours. Instead, the cortical activity patterns precisely reflected the difference in orientation relative to the preceding image. Consequently, large amounts of incoming data were relatively suppressed, reminiscent of differencing methods (Fowler et al. 1995) used for video data compression in communication technology. At the higher temporal frequency (33 Hz), in contrast, activity was updated linearly, providing an ongoing representation of the current stimulus orientation (Ringach et al. 1997; Benucci et al. 2009). Because we contrasted only 2 stimulation frequencies selected from a wide range of possible sequence frequencies, the exact course of the transition between the 2 encoding schemes remains to be determined. In the same vein, Eriksson et al. (2008), using single squares of light in ferrets, showed that V1 responses to rapidly presented stimulus switches (<83 ms) were dominated by stimulus onsets, whereas both VSD imaging and spiking responses to longer stimuli (>133 ms) carried additional prominent information about stimulus offset. We speculate that joint processing across various stimulus temporal frequencies is required to produce a coherent interpretation of a visual scene (Jonides et al. 1982; Rucci et al. 2007; Belitski et al. 2008; Rucci 2008; Nikolić et al. 2009; Jurjut et al. 2011; Onat, Nortmann et al. 2011; Eriksson et al. 2012), which might be implemented through differences in coherence between neuronal signals carrying different information at different frequencies, as recently shown for orientation tuning in monkey V1 (Gilad et al. 2012; Womelsdorf et al. 2012).
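To make the analogy with differencing (delta) coding concrete, the sketch below encodes a sequence by keeping the first frame and then storing only frame-to-frame changes. It illustrates the generic compression principle, not a model of V1; the two-channel toy frames, with one channel for vertical and one for horizontal content standing in for the V, H, and VH stimuli, are a hypothetical encoding.

```python
import numpy as np

def difference_encode(frames):
    """Delta encoding: keep the first frame, then only frame-to-frame changes."""
    frames = np.asarray(frames)
    return frames[0], np.diff(frames, axis=0)

def difference_decode(first, deltas):
    """Invert difference_encode by cumulative summation of the changes."""
    first = np.asarray(first)
    return np.concatenate([first[None], first[None] + np.cumsum(deltas, axis=0)], axis=0)

# Hypothetical toy frames with channels [V, H]: VH -> H -> VH -> V.
frames = np.array([[1, 1], [0, 1], [1, 1], [1, 0]])
first, deltas = difference_encode(frames)
print(deltas.tolist())  # [[-1, 0], [1, 0], [0, -1]]
```

In this toy sequence, the switch from VH to H is encoded solely by a change in the V channel, that is, by the orientation that disappeared, while the sustained H channel contributes nothing; the switch from H to VH is encoded by the newly added V channel.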
Interaction Between Adaptive- and Off-Response Components
Because our results depended on stimulus frequency, they provide strong evidence that input history has a decisive effect on cortical orientation tuning. Several time-dependent changes in cortical orientation selectivity could be accounted for by mechanisms of adaptation. Specifically, stimulus-selective adaptation of visual cortical neurons has been shown to reduce responsiveness and to shift tuning curves away from the adapting orientation on short timescales (Movshon and Lennie 1979; Müller et al. 1999; Felsen et al. 2002). In our experiments, neuronal adaptation mechanisms (Galarreta and Hestrin 1998; Varela et al. 1999; Sanchez-Vives et al. 2000) and immediate tuned suppression (Nelson 1991) may decrease activity for the sustained component. Adaptation to a single orientation has also been shown to enhance the representation of orthogonal orientations (Dragoi et al. 2002), which in our case would further boost the on-response to the newly added orientation. Thus, for switches within sequences from a single orientation (V or H) to a superimposed (VH) stimulus, adaptation alone would explain why the resulting representation was dominated by the newly added orientation rather than by both current orientations.
These mechanisms, however, cannot entirely explain the prominent representation of the disappeared orientation after exposure to superimposed orientations, because the superposition provides no bias that could induce adaptation to a particular orientation before the switch. Thus, responses to the disappeared orientation must additionally be involved (Bair et al. 2002; Sit et al. 2009; Eriksson et al. 2010, 2012). Signals following stimulus removal are commonly referred to as visual off-responses, which tend to increase with stimulus duration (Duysens et al. 1996), as similarly found in the somatosensory cortex (Kyriazi et al. 1994). A plausible assumption is that off-responses result from post-inhibitory rebound following sustained hyperpolarization due to synaptic inhibition (Pernberg et al. 1998; but see Scholl et al. (2010) for auditory cortical neurons), presumably mediated by tuned push-pull mechanisms within the cortical circuitry (Hirsch et al. 2003). Importantly, such tuned off-responses and adaptive contributions can be disentangled in our study. A change from superimposed orientations to a single orientation has 2 underlying constituents: a switch from one orientation to blank (off-component) and an overlaid continuous presentation of the orthogonal orientation (sustained, adaptive component). Figure 7 summarizes such a composition for a switch from the superposition (VH) to a single orientation (V or H). Median fits of the responses to each of the constituent stimuli are shown (blue/red; see Results); the gray curve outlines their average (i.e., divisive normalization; MacEvoy et al. 2009). The black curve shows the median fit of the responses we obtained for the composite switch (Supplementary Tables S1–S3), indicating tuning to the orientation that was turned off.
Approximating the measured change response for this type of switch (VH to H and VH to V) by a weighted average of the 2 component responses (Busse et al. 2009; MacEvoy et al. 2009) yielded a significantly higher average contribution from the off-component (60%) than from the adaptive component (40%; see Supplementary Table S4 for details). Thus, for a 100-ms stimulus period, the off-component is facilitated (see blue arrows in Fig. 7) and overrides the response component that undergoes adaptation. Accordingly, owing to the increased contribution of orientation-selective off-responses, the combination of adaptive and off-response components results in a representation of the difference between the past and the present image. Taken together, our data suggest a mutual interaction between population responses to changing and nonchanging (i.e., sustained) features. As a consequence, after longer periods of stimulation, a change within a stimulus sequence triggers activity reporting that a particular feature has disappeared relative to what remains.
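The weighted-average approximation can be sketched as a one-parameter least-squares fit. Assuming the change response r is modeled as w·r_off + (1 − w)·r_adapt, minimizing the squared error gives w = d·(r − r_adapt)/(d·d) with d = r_off − r_adapt, and w = 0.5 corresponds to the plain average (the gray curve in Fig. 7). The von Mises-like tuning curves, the noise level, the 60/40 mixture, and the function names below are hypothetical, and this closed-form fit is an assumption rather than the procedure behind Supplementary Table S4.

```python
import numpy as np

def fit_component_weight(r_change, r_off, r_adapt):
    """Least-squares w in r_change ~ w*r_off + (1 - w)*r_adapt, clipped to [0, 1]."""
    r_change, r_off, r_adapt = (np.asarray(a, dtype=float) for a in (r_change, r_off, r_adapt))
    d = r_off - r_adapt                                   # direction separating the components
    w = np.dot(d, r_change - r_adapt) / np.dot(d, d)      # closed-form least-squares solution
    return float(np.clip(w, 0.0, 1.0))

def tuning(oris_deg, pref_deg, kappa=2.0):
    # Hypothetical von Mises-like orientation tuning curve (180-deg periodic), peak 1.
    delta = np.deg2rad(2.0 * (np.asarray(oris_deg) - pref_deg))
    return np.exp(kappa * (np.cos(delta) - 1.0))

oris = np.arange(0, 180, 7.5)
r_off = tuning(oris, 90.0)     # off-component: tuned to the removed orientation (V)
r_adapt = tuning(oris, 0.0)    # adaptive component: tuned to the sustained orientation (H)
rng = np.random.default_rng(1)
r_change = 0.6 * r_off + 0.4 * r_adapt + 0.02 * rng.standard_normal(oris.size)
w = fit_component_weight(r_change, r_off, r_adapt)
print(f"off-component weight ~ {w:.2f}")
```

Fitting a single weight constrained to sum to 1 with its complement keeps the model a weighted average, as in the text, rather than a free linear combination of the two components.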
Stimuli usually do not occur in temporal isolation but within a temporal context in which they appear in succession on timescales ranging from a few milliseconds to seconds (cf. Zucker and Regehr 2002). An early account used so-called paired-pulse stimulation (Allison 1962), showing that responses to a second stimulus were severely suppressed relative to the first response, depending on the interstimulus interval. Although the underlying mechanisms are not resolved, there is agreement that paired-pulse suppression is a cortical phenomenon with a strong GABAergic contribution (Wehr and Zador 2005). Whereas paired-pulse suppression addresses the transient behavior of cortical response properties, continuous periodic stimulation addresses response behavior under so-called steady-state conditions (Onat, König et al. 2011). Conceivably, our single-switch condition (Fig. 6) resembles paired stimulation, whereas the sequence conditions reflect what others have termed steady-state stimulation. Notably, the build-up and time course of inhibition during steady-state stimulation can differ considerably from what can be inferred from paired stimulation (Hickmott 2010). It is therefore surprising that we obtained equivalent results for both paradigms, which may hint at the involvement of widespread excitatory–inhibitory mechanisms (Markram et al. 1998). A recent study by Olsen et al. (2012) showed that neurons in cortical layer 6 have a major impact on controlling the gain of activity in upper-layer neurons, that is, in the layers imaged here. Most strikingly, this gain control occurred without changing orientation tuning. Such a mechanism may therefore be an ideal candidate for balancing the relative weights of adaptive and off-components depending on visual input frequency. This might be realized by differential gating of activity across 2 neuronal circuits acting in parallel (i.e., “competing”; Adesnik and Scanziani 2010): those including adapting neuronal populations and those that produce off-responses.
Coding of Stimulus Differences Between Past and Present
Representing differences in the primary visual cortex may attenuate redundancies (Attneave 1954) over time and increase sensitivity to dissimilar structures, such as differently oriented borders of objects (Das and Gilbert 1997; Downar et al. 2000; Dragoi et al. 2002; Desbordes and Rucci 2007; Rucci et al. 2007; Rucci 2008; Beste et al. 2011). The proposed activity dynamics of adaptive and off-response components might be viewed as short-term memory processes (Sperling 1960; DiLollo 1977; Coltheart 1980) that begin with stimulus onset (DiLollo 1977), trigger recurrent networks (McCormick et al. 2003), and may be coupled with feedback from higher areas (Rockland and Pandya 1979; Roland et al. 2006; Golomb et al. 2010; Vetter et al. 2015), allowing a prolonged influence of past activity (Coltheart 1980; Duysens et al. 1985; McCormick et al. 2003) on the processing of current input (Gould 1967; Jonides et al. 1982; Jancke 2000; Eagleman et al. 2004; Eriksson et al. 2008, 2010, 2012; Nikolić et al. 2009; Glasser et al. 2011).
During free viewing of natural scenes, intersaccadic intervals (in humans ∼250 ms on average, Kuang et al. 2012; in cats >2000 ms, Moeller et al. 2004) can be even longer than our long stimulus intervals (≥100 ms). After the onset of each new fixation (i.e., at low temporal frequencies), difference representation might facilitate cortical encoding of luminance discontinuities and edges at spatial scales larger than those covered by retinal ganglion cells and by neighboring cortical cells with similar orientation tuning (Müller et al. 1999). Specifically, cortical difference representation might compensate for the extensive luminance correlations (and thus reduce redundancies) conveyed by transient activity of retinal ganglion cell populations immediately after a saccade (Desbordes and Rucci 2007).
Highly correlated activity among ganglion cells immediately after a saccade signals long-range correlations in natural images (Kuang et al. 2012). A second regime, brought about by small fixational eye movements operating at higher temporal frequencies, has been proposed to whiten the input and decorrelate retinal activity (Kuang et al. 2012). Hence, these retinal signals emphasize fine spatial details of visual structures during fixation (Desbordes and Rucci 2007; Rucci et al. 2007; Rucci 2008; Kuang et al. 2012), including contour orientations (Rucci and Desbordes 2003). The cortical representation of current orientations at the higher frequency (33 Hz) reported here may reflect the transmission of the acquired information further downstream. In conclusion, the 2 temporal regimes of eye movements may make complementary contributions (Snodderly et al. 2001) to a motion-based coarse-to-fine processing of visual information (Parker et al. 1992; Jancke 2000; Ahissar and Arieli 2001, 2012; Geisler et al. 2001; Henning et al. 2002; Desbordes and Rucci 2007; Rucci 2008; Meirovithz et al. 2012) and could provide an efficient mechanism for the perception of salient structures in the environment. Whether these regimes act linearly at the cortical level cannot be decided on the basis of our data. Modeling at the retinal level, however, suggests that the 2 regimes can be well captured by linear approaches (Desbordes and Rucci 2007).
More generally, the time-dependent cortical coding of differences from past events can be interpreted as an early cortical signature of mismatch signals between ongoing stimulation and abrupt stimulus changes, as first described in the auditory domain (Näätänen et al. 1978). Recently, mismatch signals in the primary visual cortex of the mouse have been shown to be cooperatively influenced by motor-related input (Keller et al. 2012). We suggest that both adaptation (Jääskeläinen et al. 2004) and stimulus off-responses play an important role in generating mismatch signals.
Finally, responses to stimulus differences fit well conceptually with predictive coding principles, which propose that deviations from cortically generated predictions are propagated up the visual hierarchy as error signals (Friston 2005; Garrido et al. 2009). Given the prediction that contour orientations remain stable over prolonged periods of time, as during fixation, error signals would correspond to representations of change, as measured here. Predictive coding principles have been found as early as the retina (Srinivasan et al. 1982; Hosoya et al. 2005) and have been used in recent modeling frameworks of cortical visual responses (Rao and Ballard 1999; Friston 2005; Spratling 2010, 2012; Boerlin and Denève 2011); a combination of stimulus and error-like coding within single neurons has also been suggested (Eriksson et al. 2012). The exact timescales (Thorpe et al. 1996) on which different predictive states may evolve under natural viewing conditions need to be explored further and elaborated in computational models to account for the qualitatively different behaviors of cortical responses reported here, ranging from ongoing representation to representation of difference.
Funding to pay the Open Access publication charges for this article was provided by the DFG (Deutsche Forschungsgemeinschaft), SFB-874 (TP A2 Eysel, Jancke) and the German-Israeli Project Cooperation (DIP, JA 945/3-1, SL 185/1-1).
We thank BMBF (Bundesministerium für Bildung und Forschung), DFG (Deutsche Forschungsgemeinschaft), SFB-874 (TP A2 Eysel, Jancke), German-Israeli Project Cooperation (DIP, JA 945/3-1, SL 185/1-1), ERC-2010-AdG (269716 Multisense, König), and the Ministry for Science and Culture of Lower Saxony, Germany (Lichtenberg Scholarship) for financial support. We also thank Drs. Saskia Nagel and Andrea Benucci for their helpful comments on an early version of the manuscript and Drs. Eyal Seidemann and David Eriksson for the discussion of a later version. Conflict of Interest: None declared.