We measured the timing, areal distribution, and laminar profile of fast, wavelength-insensitive and slower, wavelength-sensitive responses in V1 and extrastriate areas, using laminar current-source density analysis in awake macaque monkeys. There were 3 main findings. 1) We confirmed previously reported significant ventral–dorsal stream latency lags at the level of V4 (V4 mean = 38.7 ms vs. middle temporal mean = 26.9 ms) and inferotemporal cortex (IT mean = 43.4 ms vs. dorsal bank of the superior temporal sulcus mean = 33.9 ms). 2) We found that wavelength-sensitive inputs in areas V1, V4, and IT lagged the wavelength-insensitive responses by significant margins; this lag increased over successive levels of the system. 3) We found that laminar activation profiles in V4 and IT were inconsistent with “feedforward” input through the ascending ventral cortical pathway; the likely alternative input routes include both lateral inputs from the dorsal stream and direct inputs from nonspecific thalamic neurons. These findings support a “Framing” Model of ventral stream visual processing in which rapidly conducted inputs, mediated by one or more accessory pathways, modulate the processing of more slowly conducted feedforward inputs.
Several groups have proposed that during visual processing, fast inputs carrying relatively sparse information modulate or “frame” the processing of slower inputs carrying more detailed information (Nowak and Bullier 1997; Schroeder and others 1998; Bar 2003; Brincat and Connor 2006). Testing these proposals requires a clear description of the flow of afferent input through the visual pathways. A number of laboratories have attempted to provide this description by quantifying response onset latencies across a large proportion of the areas comprising the cortical visual system in macaque monkeys (Nowak and Bullier 1997; Schmolesky and others 1998; Schroeder and others 1998; Ledberg and others 2006). These studies agree on the generally serial pattern of activation within the visual pathways and on the overall latency advantage of the parietal (dorsal) stream over the inferotemporal (ventral) stream (Bullier 2003), but they leave numerous questions open. For example, the differential conduction velocities of the subcortical magnocellular (M) and parvocellular systems (Marrocco 1976; Schroeder and others 1989, 1998), coupled with the fact that these systems converge within ventral stream areas (Merigan and Maunsell 1993), predict multiple waves of “feedforward” activation in these areas. The picture is further complicated by koniocellular (K) afferents that bypass V1 and project directly to extrastriate cortex (Fries 1981; Lysakowski and others 1988; Sincich and others 2004). Our first goal was to determine whether we could resolve multiple temporal activation components in V4 and inferotemporal cortex (IT). Our method exploited the finding (Givre and others 1995) that luminance-evoked responses in V1 display 2 components: a fast wavelength-insensitive component and a slower wavelength-sensitive component. We investigated whether these components could be tracked through subsequent stages of visual processing.
Another open question posed by connectional anatomy (Felleman and Van Essen 1991) is what distinguishes activations in a given area that arise from feedforward, feedback, and lateral inputs. Prior work indicates that in sensory cortical areas, standard feedforward and feedback inputs can be distinguished by their laminar activation profiles (Lipton and others 2006). As predicted by anatomy (Felleman and Van Essen 1991), standard feedforward responses begin in and near lamina 4, whereas feedback responses begin outside of lamina 4. The anatomy of lateral projections (Felleman and Van Essen 1991) predicts simultaneous activation across cortical layers, whereas that of K afferents (Hendry and Reid 2000) predicts initial activation in the superficial layers. Our second goal was to develop evidence about the types of input that drive the different activity components in extrastriate areas.
We used laminar current-source density (CSD) analysis to define the timing and laminar profile of the wavelength-sensitive and wavelength-insensitive response components in visual cortex. CSDs were derived from field potentials sampled during linear-array multielectrode recordings in V1, ventral stream areas V4 and IT, and several dorsal stream areas. This method provides distinct advantages for these studies (Schroeder and others 1998; Lipton and others 2006). First, because CSD analysis indexes the transmembrane currents that make up the first-order synaptic response, it provides a sensitive measure of synaptic input whether or not that input changes local neuronal firing. Second, because the recordings sample all layers simultaneously, we can define and quantify laminar activation profiles.
Materials and Methods
Data for this study were assembled from experiments done in 12 male macaque monkeys (Macaca fascicularis) weighing between 5 and 9 kg and prepared surgically for chronic awake electrophysiological recordings. No monkeys were used exclusively for this study; rather, they were all assigned to other primary experiments. Because all of our experiments entail functional positioning of electrodes using white- and colored-light flash stimuli (see below), the data generated by the routine methodological procedures were available for the analyses outlined below. Prior to surgery, each animal was adapted to a custom-fitted primate chair and to the recording chamber. During recording, subjects were monitored continuously using electroencephalography (EEG) and video. Animals sat in a primate chair in a dark, isolated, electrically shielded, sound-attenuated chamber with heads fixed in position. They were maintained in an alert state and required to look at the visual stimulator but were not required to discriminate stimuli (Schroeder and others 1998).
Surgical preparation of the subjects for chronic recording was conducted using aseptic techniques under general anesthesia (sodium pentobarbital, 25.0 mg/kg). Temperature and respiration rate were monitored continuously throughout surgery. The tissue overlying the calvarium was resected, and appropriate portions of the cranium were removed. To allow electrode access to the brain and to promote an orderly pattern of recording across the surface of the studied cortices, groups of 80–100 stainless steel tubes (18 gauge), glued together in a parallel matrix and sealed with surgical-grade silastic, were implanted (the neocortex and dura were left intact). These matrices were positioned normal to the surface of the brain for orthogonal electrode penetrations of the studied cortical areas and placed within appropriately shaped craniotomies (Schroeder and others 1998). The matrices, along with socketed Plexiglas bars that permitted painless head restraint, were embedded in a pedestal of dental acrylic and secured to the cranium with titanium orthopedic screws.
Light flashes (10 μs) at 2/s were generated by 1 of 2 Grass PS22 Photo Stimulators (Grass-Telefactor Inc., West Warwick, RI) and projected onto a diffuser subtending 11.8° of the visual field at a 43-cm viewing distance. Red light was produced by interposing a filter (Roscolux filter #19; Rosco Laboratories Inc., Stamford, CT) with a peak transmittance/half-amplitude bandwidth of 650/141 nm, resulting in a photometric illuminance of 1.22 × 10⁵ lux as measured with an International Light IL 1700 Research Radiometer/Photometer (International Light Inc., Newburyport, MA). White-light flash intensity was matched to this value. This type of stimulus elicits robust CSD and multiunit responses throughout the visual pathways (Schroeder and others 1998; Mehta and others 2000). Critically for the present study, the high intensity and brief duration produce responses with sharp onsets and minimal latencies, whose timing can be precisely related to the stimulus. Although precise eye position is not required with this diffuse flash stimulation, eye position was monitored in some subjects using a Stoelting Model 4100/4500 infrared system, which tracked one eye with a resolution of 1.0° of visual angle at a 60-Hz sampling rate. Stimulus presentation was gated so that the animal had to fixate within a 5° window surrounding the fixation point to receive any stimulation.
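As a worked check on the stimulus geometry, the physical extent implied by a diffuser subtending 11.8° at a 43-cm viewing distance follows from the standard visual-angle relation, extent = 2 · d · tan(θ/2). The Python sketch below assumes a flat diffuser viewed head-on and centered on the line of sight. The helper names are hypothetical, and the text does not state the diffuser's actual dimensions.

```python
import math


def extent_cm(angle_deg: float, distance_cm: float) -> float:
    """Linear extent of a flat, centered target subtending angle_deg at distance_cm."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle_deg(extent: float, distance_cm: float) -> float:
    """Inverse relation: angle (degrees) subtended by a centered target of a given extent."""
    return math.degrees(2.0 * math.atan(extent / (2.0 * distance_cm)))


size = extent_cm(11.8, 43.0)
print(f"implied diffuser extent: {size:.1f} cm")
```

Under these assumptions, the diffuser spans roughly 8.9 cm.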
Recording and Data Analysis
Data were collected bilaterally and sequentially from area V1; ventral stream areas V4 and IT; and dorsal stream areas middle temporal/medial superior temporal (collectively referred to as MT+), dorsal bank of the superior temporal sulcus (STSd), and ventral intraparietal/lateral intraparietal (collectively referred to as IP). Electrode placements were targeted using presurgical magnetic resonance imaging (MRI), and recording sites were verified histologically using methods described in detail elsewhere (Schroeder and others 1998). Figure 1 illustrates the typical electrode approach to each of the studied areas on one subject's presurgical MRI, together with histological reconstructions of recording sites in the key areas.
Laminar field potential profiles were sampled using linear-array multielectrodes positioned to span the cortical laminae at each recording site (illustrated in Fig. 2, left). Each multicontact electrode contained an array of 14 equally spaced contacts (150 μm intercontact spacing) and 1 contact 0.5–1 mm below the lowest channel of the main array. The impedance at each contact was 0.1–0.3 MΩ. Each intracortical contact was referenced to an epidural electrode at the frontal midline. Signals from each contact were passed through a unity-gain preamplifier to a Grass P5 amplifier with a bandpass of 1–3000 Hz, and all signals were stored as continuous records on a PC-based data acquisition system (Neuroscan, El Paso, TX).
CSD profiles were calculated from the averaged field potential profiles using a 3-point formula for estimating the second spatial derivative of voltage (Freeman and Nicholson 1975).
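The Python sketch below shows how the 3-point estimate works. It is a minimal sketch, not the authors' code, and it rests on several assumptions:

- The averaged profile is stored as a NumPy array with channels ordered from the top contact down.
- The differentiation step is the 150-μm intercontact spacing.
- The sketch adopts the common convention CSD = −σ ∂²φ/∂z² with a uniform conductivity σ. Here σ is set to 1, so values are in relative units and current sinks are negative.

The function name `csd_3point` is hypothetical.

```python
import numpy as np


def csd_3point(lfp, spacing_mm=0.150, conductivity=1.0):
    """Estimate CSD from a laminar field-potential profile.

    lfp: array of shape (n_channels, n_samples), channels ordered from the
         top contact (row 0) to the bottom contact.
    Returns an array of shape (n_channels - 2, n_samples): the first and
    last contacts lack 2 neighbors, so 14 channels yield 12 CSD channels.
    """
    lfp = np.asarray(lfp, dtype=float)
    if lfp.ndim != 2 or lfp.shape[0] < 3:
        raise ValueError("need a (channels, samples) array with >= 3 channels")
    # 3-point second spatial difference: phi(z+h) - 2*phi(z) + phi(z-h)
    second_diff = lfp[2:] - 2.0 * lfp[1:-1] + lfp[:-2]
    # Assumed sign convention: CSD = -sigma * d2phi/dz2 (sinks negative)
    return -conductivity * second_diff / spacing_mm ** 2
```

Dropping the first and last contacts is what turns a 14-channel field potential profile into a 12-channel CSD profile.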
A key to the optimal use of a linear-array multielectrode is a method for identifying cortical laminae. Earlier findings (Givre and others 1994; Schroeder and others 1998; Lipton and others 2006) suggested that we could functionally distinguish initial laminar activation profiles corresponding to feedforward versus “nonfeedforward” (i.e., feedback or lateral) input. V1 laminae can be identified functionally on the basis of our earlier studies (Schroeder and others 1991; Givre and others 1995; Schroeder and others 1998). By contrast, the functional characteristics that differentiate specific cortical laminae in extrastriate regions are less well understood. Because the physical dimensions of cortical laminae are well known (Lund 1988; Saleem and others 1993; Rockland 1997), we can circumvent this problem with a method that objectively and quantitatively characterizes laminar activation profiles, which we developed for this purpose (Lipton and others 2006). First, the electrode is positioned so that the contacts of the linear array bracket all of the laminae of a cortical region, using the laminar CSD profile to index the active zone of synaptic activity. This is done by iteratively recording response profiles and making small adjustments in array depth until the contacts straddle the large-amplitude sources and sinks. The low-amplitude components and null CSD waveforms then lie at the margins of the array. The CSD profile changes that accompany multielectrode penetration through a succession of visual cortical areas are illustrated in detail in an earlier paper (see Fig. 4 in Schroeder and others 1998). The electrodes used in these studies had 14 contacts in a line with 150 μm spacing, thus spanning about 2 mm. Because the first and last contacts lack the 2 neighboring contacts needed for the 3-point CSD approximation, a 14-channel field potential profile yields a 12-channel CSD profile.
Based on the known dimensions of the cortical laminae, and assuming that the multielectrode array is positioned properly (above), we then assign CSD channels 1–6 to the supragranular laminar grouping (S), channels 7–8 to the granular laminar grouping (G), and channels 9–12 to the infragranular laminar grouping (I).
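The channel-to-lamina assignment above amounts to a fixed lookup. The Python sketch below assumes that CSD channels are numbered 1–12 from the top of the array down and that the array has been positioned as described. The names `LAMINAR_GROUPS`, `laminar_group`, and `split_by_group` are hypothetical.

```python
# Hypothetical lookup of the channel assignment stated in the text.
# CSD channels are assumed to be numbered 1-12 from the top of the array.
LAMINAR_GROUPS = {
    "S": range(1, 7),   # channels 1-6: supragranular
    "G": range(7, 9),   # channels 7-8: granular
    "I": range(9, 13),  # channels 9-12: infragranular
}


def laminar_group(channel: int) -> str:
    """Return the laminar grouping (S, G, or I) of a 1-based CSD channel."""
    for name, channels in LAMINAR_GROUPS.items():
        if channel in channels:
            return name
    raise ValueError(f"CSD channel {channel} is outside 1-12")


def split_by_group(csd_rows):
    """Split a 12-row CSD profile (one waveform per row) into S, G, and I lists."""
    if len(csd_rows) != 12:
        raise ValueError("expected a 12-channel CSD profile")
    groups = {name: [] for name in LAMINAR_GROUPS}
    for channel, row in enumerate(csd_rows, start=1):
        groups[laminar_group(channel)].append(row)
    return groups
```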
After the electrode was positioned, averaged laminar response profiles to red and intensity-matched white-light flashes were recorded. Each averaged response comprised 100 single trials collected in a trial block, and trial blocks alternated between the red and white conditions. To measure absolute onset latency, all single-channel CSD waveforms of each averaged response were first full-wave rectified. “Absolute onset latency” was then defined as the earliest significant deviation (>3 standard deviation units) of any single-channel waveform of a recording from its baseline that was maintained for at least 5 ms. To avoid the stimulus onset artifact, the baseline was defined as the period from +3 to +20 ms poststimulus in each single waveform. These latency values were quantified across all recording sites within each cortical area. “Red response onset” was derived to quantify how far the wavelength-sensitive component of the response lags the wavelength-insensitive component. At each recording site, the single-channel CSDs were full-wave rectified and then averaged across channels into a single waveform (AVREC) (Givre and others 1994; Schroeder and others 1998). Within each area, the red- and white-light AVREC waveforms obtained from each electrode penetration were paired. A paired t-test across penetrations was then run at each time point to test the difference between the red- and white-light AVRECs. The earliest time point at which these waveforms diverged significantly, and remained separated for at least 5 ms, was taken as red response onset. This is the point at which the wavelength-sensitive input is injected into the ongoing evoked activity initiated by fast wavelength-insensitive inputs.
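The Python sketch below shows how the AVREC and red-response-onset computations might be implemented. It is an illustration under stated assumptions, not the analysis code used in the study:

- Waveforms are uniformly sampled and stored as NumPy arrays.
- There is one AVREC per penetration, with rows paired across the red and white conditions.
- The pointwise paired t-tests use SciPy's `scipy.stats.ttest_rel`.
- No multiple-comparison correction is applied beyond the 5-ms persistence requirement.

The function names are hypothetical.

```python
import numpy as np
from scipy import stats


def avrec(csd):
    """Average rectified CSD (AVREC): full-wave rectify every channel, then
    average across channels. `csd` has shape (n_channels, n_samples)."""
    return np.mean(np.abs(np.asarray(csd, dtype=float)), axis=0)


def red_response_onset(red_avrecs, white_avrecs, times_ms,
                       alpha=0.05, min_duration_ms=5.0):
    """Earliest time at which red and white AVRECs diverge significantly.

    red_avrecs, white_avrecs: arrays of shape (n_penetrations, n_samples);
    row i of each array comes from the same penetration (paired design).
    A paired t-test across penetrations is run at every time point; onset is
    the first sample of a run of significant points spanning at least
    min_duration_ms. Returns None if no such run exists.
    """
    red = np.asarray(red_avrecs, dtype=float)
    white = np.asarray(white_avrecs, dtype=float)
    t = np.asarray(times_ms, dtype=float)
    _, p = stats.ttest_rel(red, white, axis=0)  # one test per time point
    significant = p < alpha  # NaN p-values count as not significant
    dt = t[1] - t[0]  # assumes uniform sampling
    needed = max(1, int(np.ceil(min_duration_ms / dt)))
    run = 0
    for i, sig in enumerate(significant):
        run = run + 1 if sig else 0
        if run >= needed:
            return float(t[i - needed + 1])
    return None
```

Requiring a sustained run guards against isolated time points that cross the alpha level by chance.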
For each visual area, grand mean AVRECs, collapsing across recordings and subjects, were also computed for the red- and white-light stimulation conditions. Beyond its potential use for inferring the nature of the driving input, the “laminar activation profile” provides better sensitivity to the earliest responses within a local neuronal ensemble. To define this quantity, we selected, for each recording, the earliest single-channel CSD onset latency within each laminar division (i.e., S, G, or I), using the same method as for absolute onset latency above. These values were then quantified within each division across all recordings in each cortical area.
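The Python sketch below illustrates the onset criterion and the per-division laminar onsets. It makes several assumptions:

- CSD waveforms are uniformly sampled, with a time axis in milliseconds.
- The rectified waveform is compared against the mean plus 3 SD of its rectified +3 to +20 ms baseline. This is one reading of the criterion described above.
- Onsets are searched for only after the baseline window.
- S, G, and I correspond to 0-based rows 0–5, 6–7, and 8–11.

All function names are hypothetical.

```python
import numpy as np

# 0-based rows of a 12-channel CSD profile, following the S/G/I assignment
# in the text (CSD channels 1-6, 7-8, and 9-12).
DEFAULT_GROUPS = {"S": range(0, 6), "G": range(6, 8), "I": range(8, 12)}


def onset_latency(waveform, times_ms, baseline=(3.0, 20.0),
                  n_sd=3.0, min_duration_ms=5.0):
    """Onset of one CSD channel: first post-baseline time at which the
    full-wave-rectified waveform exceeds baseline mean + n_sd * SD and stays
    above threshold for at least min_duration_ms. Returns None if none."""
    x = np.abs(np.asarray(waveform, dtype=float))
    t = np.asarray(times_ms, dtype=float)
    in_base = (t >= baseline[0]) & (t <= baseline[1])
    threshold = x[in_base].mean() + n_sd * x[in_base].std(ddof=1)
    above = (x > threshold) & (t > baseline[1])
    needed = max(1, int(np.ceil(min_duration_ms / (t[1] - t[0]))))
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run >= needed:
            return float(t[i - needed + 1])
    return None


def laminar_onsets(csd, times_ms, groups=DEFAULT_GROUPS):
    """Earliest single-channel onset within each laminar division."""
    csd = np.asarray(csd, dtype=float)
    result = {}
    for name, rows in groups.items():
        onsets = [onset_latency(csd[r], times_ms) for r in rows]
        onsets = [o for o in onsets if o is not None]
        result[name] = min(onsets) if onsets else None
    return result
```

Under these assumptions, the absolute onset latency of a recording is simply the smallest of its three laminar values.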
To examine the differences among the absolute onset latencies of the cortical areas (Fig. 3), we used a 1-way analysis of variance (ANOVA). Cortical area was the independent variable (6 levels: V1, V4, IT, MT+, STSd, and IP), and absolute onset latency was the dependent variable. To eliminate outlier effects on the measures of central tendency, univariate outlier analyses were performed for each cortical area, using a modified version of the procedure of Van Selst and Jolicoeur (1994). In the modified procedure, onset latencies are checked against a criterion appropriate to the sample size of each cortical area. Univariate outlier analyses eliminated a total of 2 recordings for V1 (2/41, 4.9%) and 1 for STSd (1/21, 4.8%) before the ANOVA.
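For reference, the 1-way ANOVA partitions the variance in onset latency into between-area and within-area components. For k areas and N recordings, F = MS_between / MS_within, with k − 1 and N − k degrees of freedom. The Python sketch below assumes that outliers have already been removed; it does not reproduce the modified Van Selst and Jolicoeur criterion. `one_way_anova` is a hypothetical helper whose result should match `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats


def one_way_anova(groups):
    """One-way ANOVA; returns (F, p, df_between, df_within).

    groups: dict mapping a factor level (e.g., a cortical area) to a 1-D
    sequence of onset latencies, one per recording, after outlier removal.
    """
    samples = [np.asarray(v, dtype=float) for v in groups.values()]
    k = len(samples)
    n_total = sum(s.size for s in samples)
    grand_mean = np.concatenate(samples).mean()
    # Between-group and within-group sums of squares
    ss_between = sum(s.size * (s.mean() - grand_mean) ** 2 for s in samples)
    ss_within = sum(((s - s.mean()) ** 2).sum() for s in samples)
    df_b, df_w = k - 1, n_total - k
    f = (ss_between / df_b) / (ss_within / df_w)
    p = stats.f.sf(f, df_b, df_w)  # upper-tail probability of the F distribution
    return f, p, df_b, df_w
```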
Within each cortical area, differences among the absolute onset latencies of the S, G, and I laminae were also examined using a 1-way ANOVA (Fig. 5). Cortical lamina was the independent variable (3 levels: S, G, and I), and absolute onset latency of the lamina was the dependent variable. Univariate outlier analyses eliminated 2 laminar onset values for IT (2/126, 1.6%), 2 for STSd (2/63, 3.2%), and 1 for IP (1/48, 2.1%).
Because the 2 ANOVAs described above have different dependent variables, the data could not be analyzed with a single 2-way ANOVA. For effects detected by the ANOVAs, appropriate post hoc tests were performed. The Games–Howell test was used when equal variances could not be assumed, and Fisher's least significant difference (LSD) test was used otherwise. The alpha level was set at 0.05 for all statistical tests.
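The two post hoc procedures differ in how they estimate the standard error of each pairwise difference:

- Fisher's LSD uses the pooled within-group mean square from the ANOVA and the t distribution with N − k degrees of freedom.
- The Games–Howell test uses each pair's own variances with Welch–Satterthwaite degrees of freedom and takes p-values from the studentized range distribution, so it does not assume equal variances.

The Python sketch below is an illustration, not the statistics package used in the study. It assumes SciPy 1.7 or later (for `scipy.stats.studentized_range`), and the function names are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats


def fisher_lsd(groups):
    """Fisher's LSD: pairwise t-tests using the pooled within-group mean
    square (MS_within) from the omnibus ANOVA. Returns {(a, b): p}."""
    names = list(groups)
    samples = {n: np.asarray(groups[n], dtype=float) for n in names}
    df_w = sum(s.size for s in samples.values()) - len(names)
    ms_w = sum(((s - s.mean()) ** 2).sum() for s in samples.values()) / df_w
    out = {}
    for a, b in itertools.combinations(names, 2):
        xa, xb = samples[a], samples[b]
        se = np.sqrt(ms_w * (1.0 / xa.size + 1.0 / xb.size))
        t = (xa.mean() - xb.mean()) / se
        out[(a, b)] = 2.0 * stats.t.sf(abs(t), df_w)  # two-sided p
    return out


def games_howell(groups):
    """Games-Howell: Welch-type pairwise comparisons with p-values from the
    studentized range distribution (no equal-variance assumption)."""
    names = list(groups)
    samples = {n: np.asarray(groups[n], dtype=float) for n in names}
    k = len(names)
    out = {}
    for a, b in itertools.combinations(names, 2):
        xa, xb = samples[a], samples[b]
        va, vb = xa.var(ddof=1) / xa.size, xb.var(ddof=1) / xb.size
        t = (xa.mean() - xb.mean()) / np.sqrt(va + vb)
        # Welch-Satterthwaite degrees of freedom
        df = (va + vb) ** 2 / (va ** 2 / (xa.size - 1) + vb ** 2 / (xb.size - 1))
        # Studentized range statistic q = |t| * sqrt(2)
        out[(a, b)] = stats.studentized_range.sf(abs(t) * np.sqrt(2.0), k, df)
    return out
```

With only 2 groups, these reduce to the pooled-variance and Welch two-sample t-tests, respectively, which provides a quick sanity check.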
Results
The analyses are based on 41 recording sessions (6 subjects) in V1, 23 (5 subjects) in V4, 42 (8 subjects) in IT, 9 (5 subjects) in MT+, 21 (5 subjects) in STSd, and 16 (6 subjects) in IP. Each session yielded 1 multielectrode penetration. Each penetration produced 1 absolute onset latency, calculated from the single-channel waveforms, and 3 earliest single-channel latencies, 1 for each laminar grouping (S, G, and I).
Figure 2 illustrates the typical contrast between red- and white-light responses in the ventral stream, as well as the typical lack of a red/white response difference in the dorsal stream. Shown are colorized plots of averaged laminar CSD profiles sampled from STSd (left) and IT (right). Responses evoked by red-light flash appear in the lower row, and those evoked by intensity-matched white-light flash appear in the upper row. The condensed, average-rectified representation of each CSD profile (AVREC; see Materials and Methods for details) is shown at the bottom, with the white- and red-light responses superimposed. This single-case comparison illustrates several important points of contrast between dorsal and ventral stream visual areas.
First, as shown earlier (Schroeder and others 1998), diffuse light-evoked ensemble responses have a feedforward laminar profile in the dorsal stream and a contrasting nonfeedforward activation profile in the ventral stream. The response in STSd (Fig. 2) typifies the feedforward laminar profile with initial postsynaptic activity (current sinks, yellow-red) occurring in lamina 4 (arrow), followed by secondary responses in the extragranular laminae. The local ensemble response in IT under the identical circumstances is quite different, exhibiting a nonfeedforward laminar profile. That is, the initial postsynaptic responses (current sinks) occur outside of lamina 4, in this case, concurrently in the supragranular and infragranular laminae. These are followed by a configuration of current sinks and sources, centered on lamina 4. Quantitative analyses of laminar activation profiles and comparisons across areas are presented below (Fig. 5). Potential input mechanisms for the nonfeedforward (i.e., feedback or lateral) activation profile are considered later (see Discussion).
Second, for most visual stimuli, dorsal stream areas respond more rapidly than ventral stream areas (Schroeder and others 1998). Irrespective of stimulus condition, the STSd response (Fig. 2) begins at about 28 ms, whereas the IT response begins at about 43 ms poststimulus. Red- and white-light–evoked absolute onset latencies did not differ significantly either in our earlier study (Givre and others 1995) or in the present study (paired t-test, P > 0.05). We therefore represent the absolute onset latency pattern across areas using the red-light condition alone. Figure 3 quantifies the temporal response pattern across the system, giving the mean and standard error of the mean (SEM) of absolute onset latencies by area for all recording sites in this study. The pattern of significant latency differences between areas (1-way ANOVA, P < 0.001) replicates several aspects of the temporal activation pattern we have previously described for visual cortex in awake macaques under similar stimulation conditions (Givre and others 1995; Schroeder and others 1998; Mehta and others 2000). V1 latencies (mean = 30.9 ms, SEM = 0.8) are significantly shorter than V4 latencies (mean = 38.7 ms, SEM = 2.1; Games–Howell tests, P = 0.019) and IT latencies (mean = 43.4 ms, SEM = 2.2; P < 0.001). Dorsal stream latencies are also significantly shorter than ventral stream latencies at each level of the visual hierarchy. MT+ latencies (mean = 26.9 ms, SEM = 1.7) are significantly shorter than V4 latencies (P = 0.002) and IT latencies (P < 0.001), and STSd latencies (mean = 33.9 ms, SEM = 2.2) are significantly shorter than IT latencies (P = 0.042). Interestingly, mean MT+ latency does not differ significantly from that of V1 (P = 0.335).
Third, dorsal and ventral stream areas are differentially sensitive to the wavelength of the stimulus. In the case of the STSd recordings (Fig. 2, left), comparison of the response profiles with red light (lower) and intensity-matched white light (upper) shows that the neuronal population at this recording site is largely insensitive to the wavelength content of the stimulus. The overlay of the AVRECs for the red- and white-light conditions (bottom left) underscores the similarity between the 2 stimulus conditions (paired t-test, P > 0.05). In the case of the IT recordings in Figure 2 (right), the finding is dramatically different. As described above, the initial phase of response does not differentiate the red- and white-evoked response profiles, and in each, the response has a nonfeedforward laminar activation profile (i.e., initial activity begins outside of lamina 4). However, beginning at about 95 ms poststimulus, a significant difference emerges between the red- and white-light responses (see AVREC overlay, bottom right). The most obvious change is the development of a large current sink at the depth of lamina 4, with a source extending below it into the infragranular layers; this is followed by new current sinks above lamina 4 and by significant enhancement of CSD responses distributed throughout the layers, especially prominent in the lower layers. Thus, in contrast to the initial phase of activation, which has a nonfeedforward profile, the second (color-enhancement) phase has a laminar activation profile that resembles the feedforward pattern with activity beginning in lamina 4.
In the dorsal stream areas, pairwise statistical comparisons of red and white AVREC waveforms from all experiments detected no systematic effect of stimulus wavelength on processing. However, significant differences between the red and white responses were detected in V1 and in both ventral stream areas. The difference between responses began at 45 ms in V1, 54 ms in V4, and 95 ms in IT. To allow visualization of the temporal patterns of the wavelength effects, grand mean AVREC waveforms were computed for the red- and white-stimulus conditions. Figure 4 presents these grand mean AVRECs, collapsed across experiments and subjects, for the red- versus white-light conditions. They are arranged in a roughly anatomical, diagrammatic layout that shows at a glance the temporal flow of the initial wavelength-insensitive and later wavelength-sensitive inputs through the visual pathways. Consistent with the more detailed illustration in Figures 3 and 4, it is clear that:
1) irrespective of condition, onset latency generally increases from V1 through higher areas of the dorsal and ventral streams;
2) response onset is faster in the dorsal than in the ventral stream;
3) red–white response divergence is clear in V1 and in ventral stream areas V4 and IT, whereas no reliable difference is seen in the dorsal stream; and
4) red–white divergence lags response onset beginning at the V1 level (14 ms), and this lag increases over the pathway to IT (52 ms).
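The lag values in point 4 appear to follow from subtracting each area's mean absolute onset latency (Fig. 3) from its red-response onset. The text does not state this computation explicitly, so the short Python check below is an interpretation that uses only numbers reported above. The names are hypothetical.

```python
# Values reported in the text (ms): mean absolute onset latency (Fig. 3)
# and red-response onset (earliest significant red/white AVREC divergence).
mean_onset = {"V1": 30.9, "V4": 38.7, "IT": 43.4}
red_onset = {"V1": 45.0, "V4": 54.0, "IT": 95.0}


def red_white_lag(area: str) -> float:
    """Lag of the wavelength-sensitive component behind mean response onset."""
    return red_onset[area] - mean_onset[area]


for area in ("V1", "V4", "IT"):
    print(f"{area}: {red_white_lag(area):.1f} ms")
```

This gives roughly 14 ms in V1, 15 ms in V4, and 52 ms in IT, matching the V1 and IT figures quoted above.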
At first glance, the interpretation of the timing scheme appears simple: a fast, wavelength-insensitive input ascends through both the dorsal and ventral streams, and later, a long wavelength-sensitive input ascends in a second input volley through the ventral stream. However, the idea that initial activation in both streams is driven by a common feedforward input does not explain why response onset latency should be longer in the ventral than in the dorsal stream. Moreover, the present data (Fig. 2) and earlier work (Givre and others 1994; Schroeder and others 1998) repeatedly indicate that the laminar activation profile in ventral stream areas violates the prediction of an anatomical feedforward model. This is coupled with the fact that there are at least 3 nonfeedforward routes by which initial activation could reach ventral stream areas. The arrows in Figure 4 depict these possibilities: 1) lateral input from dorsal stream areas (blue arrows), 2) direct input from thalamus (green arrows), and 3) feedback from prefrontal cortex (gray arrows). These are discussed in detail below (see Discussion).
To evaluate the laminar activation profiles of the studied areas objectively, Figure 5 plots the mean and SEM of the onset latencies in the S, G, and I laminae of each area. Because red- and white-light–evoked absolute response latencies do not differ significantly, only the red-light response pattern is shown. Area V1 clearly displays a feedforward pattern (1-way ANOVA, P < 0.001). The absolute onset latency in the granular laminae of V1 (34.4 ms, SEM = 1.1) is significantly shorter than that in the supragranular laminae (42.8 ms, SEM = 1.4; Fisher's LSD tests, P < 0.001) or in the infragranular laminae (38.6 ms, SEM = 1.7; P = 0.038). In addition, the absolute onset latency in the infragranular laminae of V1 is significantly shorter than that in the supragranular laminae (P = 0.035). There were no statistically significant granular–extragranular onset latency differences in the dorsal stream areas (1-way ANOVAs; MT+, P = 0.672; STSd, P = 0.199; IP, P = 0.498). Qualitatively, however, these areas all resemble V1 (i.e., the laminar activation profile forms a “<” sign). In all cases, the earliest postsynaptic response was detected in lamina 4. The laminar activation profile in the ventral stream differs markedly from this feedforward profile. In IT, the initial activity is detected in the supragranular laminae, whereas onsets in the granular and infragranular laminae are significantly later (Games–Howell tests, P < 0.001 and P = 0.019, respectively). The profile in V4 appears similar to that in IT. Although the 1-way ANOVA for V4 does not reach significance, a post hoc Fisher's LSD test detects a significant granular–supragranular difference (P = 0.031). Overall, the laminar onset latencies of V4 and IT suggest that initial responses occurred in the extragranular laminae (1-way ANOVAs; V4, P = 0.087; IT, P < 0.001).
In sum, the laminar activation profile in V1 clearly adheres to the feedforward model and though less clear, the profile noted in the dorsal stream appears qualitatively similar. In marked contrast, the activation profile of ventral stream areas clearly violates the feedforward prediction of initial activation in lamina 4.
Discussion
A key strength of this study is its use of laminar CSD analysis. This provides a sensitive and reliable index of onset latency and laminar onset profile, as well as direct comparability with event-related potential (ERP) and magnetoencephalography measures taken from humans (Schroeder and others 1998). Another strength is that we used a few well-controlled stimulus types with uniform recording and analysis methods. Critically, we studied alert monkeys. These practices minimize the interpretive problems that have plagued many earlier studies, particularly the “meta-analyses,” which pool results across laboratories, measurement techniques and criteria, stimulation protocols, and a variety of anesthetized and awake conditions. All of these sources of variation are potentially destructive to an accurate picture of timing in visual processing.
We confirm prior studies of timing in the primate visual system in 3 respects:
1) a general increase in latency from V1 through successive levels of extrastriate cortex (Givre and others 1994; Nowak and Bullier 1997; Schmolesky and others 1998; Schroeder and others 1998; Ledberg and others 2006);
2) the general latency advantage of the dorsal over the ventral visual pathway at any given level of the hierarchy (Nowak and Bullier 1997; Schmolesky and others 1998; Schroeder and others 1998); and
3) the surprisingly early onset of responses in MT+ relative to V1 (Raiguel and others 1989; ffytche and others 1995; Nowak and Bullier 1997; Schmolesky and others 1998; Schroeder and others 1998).
The major difference between our findings and those of most other studies is that, overall, the latencies we report are much shorter. This is due mainly to our use of an extremely bright luminance-increment stimulus. Our absolute latencies in V1 compare well with those of Maunsell and Gibson (1992), who used the onset of a bright, high-contrast bar grating as a stimulus (50 cd/m² on a background luminance of 1 cd/m²). Their shortest single-unit latencies in V1 ranged from 20 to 31 ms; our mean latency for V1 was 31 ms.
Which Inputs Drive the Initial Response in Ventral Stream Areas?
The laminar activation profiles we observed in ventral stream areas do not adhere to the pattern expected for a feedforward projection, that is, one in which the initial response begins in lamina 4. Excluding feedforward, there are 3 plausible alternative routes, based on known connections (Fries 1981; Lysakowski and others 1988; Felleman and Van Essen 1991; Schall and others 1995; Hendry and Reid 2000; Stepniewska and others 2005) (see Fig. 4): lateral projections from dorsal stream areas, K-thalamic projections, and feedback from higher cortical areas. While ruling out a feedforward explanation for initial activation of the ventral stream is an important step, the laminar activation profile we observe does not immediately differentiate among the remaining alternatives. Consideration of the overall latency pattern across the system is helpful in this regard.
Any explanation must account for the temporal activation pattern across the areas of the system. In particular, within each stream, latencies appear to increase systematically from V1 to the higher cortical areas, and at each hierarchical level, the dorsal stream component is faster than the ventral stream component. This aggregate pattern fits most easily with a “lateral activation” model in which initial responses in ventral stream areas are triggered by lateral inputs from the same level of the dorsal stream (e.g., MT–V4 and IP– or STSd–IT).
In order to account for the different response onset times between V4 and IT, the K-input model would require that different populations of K-thalamic neurons with different response latencies drive initial responses in V4 and IT. Though less parsimonious than lateral activation, this explanation remains a possibility, particularly given the heterogeneity of thalamic K and other neuron populations that receive either direct or indirect retinal input (Hendry and Reid 2000).
It is more difficult to see how the aggregate timing pattern fits the feedback alternative. There is an indication that prefrontal cortex is activated almost as quickly as area MT (Schmolesky and others 1998; Ledberg and others 2006). Prefrontal cortex would thus be in a position to drive initial responses in the ventral stream through feedback projections (Bar 2003). However, prefrontal areas likely to be rapidly activated through the dorsal stream, such as areas 8 and 46, do not appear to project directly to V4 (Barbas and Mesulam 1981; Felleman and Van Essen 1991; Schall and others 1995; Stepniewska and others 2005). Prefrontal feedback would therefore activate V4 later than IT, rather than earlier. Clearly, additional experiments on this issue are necessary.
Is Initial Activity in MT+ Driven by Direct Thalamic Input?
Based on sensory manipulations along the blue/yellow color dimension, several groups have suggested that movement sensitivity in MT is partly attributable to K thalamic inputs (Seidemann and others 1999; Wandell and others 1999; Morand and others 2000). Because the color-motion stimuli used in most of these studies may not isolate K-system from M-system inputs to motion processing (Chatterjee and Callaway 2002), this remains an open question. Even if such a K thalamic input exists, it would not necessarily be the fastest input to MT. Based on the very short latency of MT responses under a variety of conditions, several groups have proposed that the initial response in MT is driven by an input that bypasses V1, arising either from the tectopulvinar pathway or from the K system of the lateral geniculate nucleus (Raiguel and others 1989; Bullier and Nowak 1995; ffytche and others 1995; Sincich and others 2004). The present findings do not rule out the possibility that the fastest inputs to MT bypass V1, but several caveats are in order. The chief alternative explanation (Schroeder and others 1998) is that the neurons contributing the earliest part of the V1 latency distribution produce the initial response in MT. In the present sample, MT and V1 latencies appear more widely separated, but the two distributions do not differ significantly. It is also noteworthy that the earliest V1 latencies in our earlier study (Schroeder and others 1998) came from the extreme peripheral retinal representation, which was not sampled in the present study. Finally, there is an indication that the laminar activation profile in the dorsal stream areas (including MT; see also Schroeder and others 1998) begins in lamina 4. This is consistent with a feedforward input from V1 or V2 (Rockland and Pandya 1979; Felleman and Van Essen 1991), whereas K inputs generally terminate above lamina 4, most densely in the vicinity of lamina 2 (Hendry and Reid 2000).
In any case, because the human ERP recordings obtained by Morand and others (2000) and the CSD analyses used in this study both index the same first-order synaptic process (i.e., transmembrane current flow), our data are directly comparable with theirs. Using a 3/5 rule for extrapolating from monkey to human sensory response latencies (Schroeder and others 1995), our mean onset latency of 27 ms (for the earliest cortical activation in MT+) extrapolates to approximately the latency value reported by Morand and others (2000).
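The extrapolation step is simple arithmetic, sketched below under one stated assumption: that the 3/5 rule means a macaque latency is roughly three-fifths of the corresponding human latency, so a human-equivalent value is obtained by multiplying the monkey latency by 5/3. The function names are hypothetical, and the sketch does not reproduce the value reported by Morand and others (2000); it only shows what the 27-ms MT+ onset maps to under that reading of the rule.

```python
def monkey_to_human_latency(monkey_ms: float) -> float:
    """Extrapolate a macaque onset latency (ms) to a human-equivalent latency.

    Assumes the 3/5 rule: monkey latency ~= (3/5) * human latency.
    """
    return monkey_ms * 5 / 3


def human_to_monkey_latency(human_ms: float) -> float:
    """Inverse mapping under the same assumed 3/5 rule."""
    return human_ms * 3 / 5


if __name__ == "__main__":
    # Mean onset latency for the earliest cortical activation in MT+ (this study).
    print(monkey_to_human_latency(27))  # 45.0
```

Because the rule is a population-level scaling heuristic, the result should be read as an approximate human-equivalent latency rather than a precise prediction for any single recording.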
Implications of Fast and Slow Stream Interactions in Visual Processing
We find that the simple procedure of comparing ensemble responses to red and to intensity-matched white light allows us to isolate a slow, long-wavelength-sensitive component in the responses of V1 and of ventral stream areas V4 and IT. In every case, the wavelength-sensitive input follows the initial, wavelength-insensitive response component described above. Quantitatively, the lag between the objectively scored initial and later (red-enhanced) components increases from 14 ms in V1 to 52 ms in IT. The present findings, as well as earlier ones (Givre and others 1994; Nowak and Bullier 1997; Schmolesky and others 1998; Schroeder and others 1998; Morand and others 2000), describe visual processing dynamics consistent with a "Frame and Fill Model" of visual processing. In this model, fast inputs reach the higher regions of the visual system rapidly via the M system and possibly also via direct projections from nonspecific thalamic neurons. By whatever anatomical route (e.g., lateral dorsal-to-ventral stream projections), these fast inputs prime or otherwise frame the processing of later-arriving inputs that carry the "fill," that is, the dense form and color information needed to construct a detailed visual representation. The Frame and Fill Model is conceptually similar to the recent proposal (Bar 2003) that fast inputs rapidly activate prefrontal cortex to trigger top–down facilitation of object recognition processing in IT cortex. The fast inputs have several potential functions; for example, coarse form information may provide an "initial guess" as to object identity (Bar 2003), and motion-sensitive inputs into the ventral stream may aid in figure-ground segregation (Ferrera and others 1994).
Because of the nature of our stimuli, we cannot address these possible functions directly; however, our data do indicate dynamic modulation of ventral stream areas by fast, nonfeedforward inputs, and this modulation is likely to play a significant role in the operation of the inferotemporal pathway.
What might this role be? Two recent findings suggest one possibility. First, sensory processing is EEG phase dependent (Kruglikov and Schiff 2003; Lakatos and others 2005). That is, ongoing EEG oscillations reflect systematic shifting of the local neuronal ensemble between high- and low-excitability states, and this significantly modulates the postsynaptic response elicited by sensory inputs. Second, transient inputs can reset the phase of the ongoing EEG so that subsequent inputs arrive during the optimal, high-excitability phase of the oscillation (Lakatos and others 2005). Thus, fast transient inputs, whether or not they contain spatially specific information, may determine how subsequent inputs carrying denser, more specific information are processed. Arieli and others (1996) described this as a process in which ongoing cortical "context" determines the processing of sensory "content." At present, we are inclined toward the view that initial, context-biasing responses in the ventral stream are driven by lateral projections from the dorsal stream rather than by direct inputs from the K-thalamic system. However, particularly given the heterogeneity of K neuron types (Hendry and Reid 2000), additional evidence will be needed to settle this issue.
This work was supported by MH060358 and T32M072288. We thank Tammy McGinnis, Noelle O'Connell, and Aimee Mills for technical assistance and Drs J.C. Arezzo and George Karmos for advice and comments. Conflict of Interest: None declared.