Subfield analysis of the receptive fields (RFs) of parafoveal V4 complex cells demonstrates directly that most RFs are tiled by overlapping second-order excitatory inputs that for any given V4 cell are predominantly selective to the same preferred values of spatial frequency and orientation. These results extend hierarchical principles of RF organization in the spatial, orientation and spatial frequency domains, first recognized in V1, to an intermediate extrastriate cortex. Spatial interaction studies across subfields demonstrate that the responses of V4 neurons to paired stimuli may either decrease or increase as a function of inter-stimulus distance across the width axis. These intra-RF suppressions and facilitations vary independently in magnitude and spatial extent from cell to cell. These results taken together with the relatively large RF sizes of V4 neurons — as compared with RF sizes of their afferent inputs — lead us to hypothesize a novel property, namely that classes of stimulus configurations that enhance areal summation while reducing suppressive interactions between excitatory inputs will evoke especially robust responses. We tested, and found support for, this hypothesis by presenting stimuli consisting of optimally tuned sine-wave gratings visible only within an annular region and found that such stimuli vigorously activate V4 neurons at firing rates far higher than those evoked by comparable stimuli to either the full-field or central core. On the basis of these results we propose a framework for a new class of neural network models for the spatial RF organizations of prototypic V4 neurons.
The relatively large receptive fields (RFs) in V4 compared with those in preceding cortical areas (Desimone and Schein, 1987) offer an opportunity to test whether there is an ordered input across subfields of the RFs of neurons in V4 that reflects the RF sizes and functional properties of projection neurons to V4 from earlier cortical areas and particularly from V2, the source of its major afferent input (Felleman and Van Essen, 1991). Moreover, the mean bandwidths for both orientation (𝛉) and spatial frequency (SF) selectivity for neurons in V4 (Desimone and Schein, 1987) are in general broader than those for neurons in V1 and V2 (DeValois et al., 1982a,b; Foster et al., 1985; Levitt et al., 1994). Consequently, these results raise the question as to whether such broader bandwidths in V4 reflect summation over multiple inputs that may be individually selective to differing 𝛉 and SF bands or, alternatively, whether there is simply a broadening of selectivity common to all subfields.
Furthermore, there is evidence consistent with the possibility that V4 neurons are selective to different orientations in different parts of the RF. For example, earlier workers (Gallant et al., 1993, 1996) studied the effects of stationary rectilinear (Cartesian), concentric, radial and hyperbolic grating patterns on neurons in V4, hoping to discover stimuli analogous to those basis functions that effectively drive neurons in dorsal MST. Such functions are specialized for processing optical flow and may on theoretical grounds also be involved in size- and rotation-invariant pattern recognition. The majority of V4 neurons studied fell into two groups that were more responsive to polar gratings than to rectilinear gratings: one sensitive to radial gratings and the other more selective to concentric or spiral gratings that collectively included all orientations.
However, because there is a virtually unlimited number of stimulus classes that can be tested, such studies can only compare the response selectivity to stimuli of one class with that of another. They cannot, in the absence of prior knowledge, establish whether any given stimulus or class is optimal. Thus, the optimal spatial selectivity of V4 neurons remains to be determined. Even so, if the optimal selectivity over the full RF of V4 neurons is not yet experimentally accessible, there is no similar limitation to determining the selectivity of the inputs to V4, at least with respect to such low-level form cues as orientation and spatial frequency, by testing such selectivity over small subfields comparable in size to the inputs to V4 from V1 and V2.
This is not to suggest that such low-level selectivity always corresponds to the optimal selectivity of neurons in V1 and V2. Indeed, it has been found (Hegdé and Van Essen, 2000) that some cells in V2 ‘although not in large numbers’ were more selective to non-Cartesian than Cartesian gratings and some other V2 cells responded especially well to complex stimuli such as acute angles. However, the initial determination of the low-level form cues, such as 𝛉 and SF selectivity over small subfields, still remains one of the most general and least arbitrary approaches to the response selectivity of the inputs to V4.
Our studies are motivated by two underlying assumptions; namely, that the RF properties of V4 neurons may be derived in some as yet unknown way from the properties of their subfields and by interactions between subfields and that the responses of individual subfields — as their size becomes small — largely, but not exclusively, reflect the properties of their afferent projections. We acknowledge that local circuitry within a V4 functional column, together with lateral connections within V4 and/or re-entrant projections from higher cortical areas, may play a critical — and perhaps stimulus-dependent — role in shaping the selectivity of V4 neurons. Moreover, we cannot exclude the possibility that the observed selectivities of any particular subfield might emerge as a result of non-linear interactions between inputs spanning overlapping subfields and that the apparent ‘optimal’ orientation selectivity of any given subfield may vary when stimuli of different test orientations are presented to other subfields within the same RF. Even so, these qualifications pose higher-order issues that can best be resolved after these initial studies have been completed.
Consequently, as a first step towards resolution of the optimal selectivity problem, we have undertaken to determine the spatial RF organization of V4 neurons, at least with respect to the low-level form cues of its inputs and, in particular, to do so by initially resolving the issue as to whether V4 subfields are homogeneous or heterogeneous with respect to their selectivity for 𝛉 and SF. Contrary to our expectations, we found support for the simple hypothesis that all subfields within any given V4 cell are predominantly selective to the same preferred values of 𝛉 and SF. We then further probed RF organization by testing lateral interactions across subfields and discovered a novel property of V4 neurons that may account for some earlier results by others. Finally, our results may have bearing upon the controversy as to whether neurons in V4 (and, by extension, those in still higher visual areas) function as rather general multi-purpose filters (Tarr and Gauthier, 2000) or as highly specialized non-linear feature detectors (Riesenhuber and Poggio, 1999; Kanwisher, 2000). This issue remains fundamental to further understanding the role of single neurons and neural networks in object recognition and visual perception (Crick and Koch, 1995; Pollen, 1999).
Materials and Methods
Anesthesia and Analgesia
Seven macaque monkeys (Macaca fascicularis) were maintained under sufentanil anesthesia. Arterial pulse and blood pressure were continuously monitored. Animal care was in accordance with institutional guidelines. The dose of sufentanil was adjusted, generally within the 2–8 μg/kg/h range, to eliminate pain, as judged by the absence of abnormal increases in pulse or blood pressure, either spontaneously or in response to tail pinch. The animals were paralyzed with Pavulon (0.2 mg/kg/h) to maintain retinal fixation and ventilated to keep the exhaled CO2 close to 4%. At the end of each experiment, the animal was killed with pentobarbital (100 mg/kg).
Cycloplegia was achieved by topical application of ophthalmic atropine. We utilized slit retinoscopy to select contact lenses that would focus each eye on the monitor set at 1 m. Trial lenses were used to adjust the refraction to within 0.25 diopters. The positions of the optic discs and foveae were back-projected using a reversible ophthalmoscope and mapped. We generally recorded from V4 in the right hemisphere and visually stimulated the left eye with the right eye covered.
Localization of V4
Eighty-two neurons were studied in the lateral prestriate cortex 2–3 mm anterior to the lunate sulcus. The sulcus was sometimes visible through the dura. Otherwise, we made a small durotomy medial to the intended recording site so that we could find the lunate sulcus. The position of the sulcus was then noted, the dura closed with a fine suture and the microelectrode placed to traverse the dura more laterally into prestriate cortex. In each experiment we made a single long penetration roughly perpendicular to the cortical surface sampling cells at 100–150 μm intervals. In the first experiment, we carried out histological verification of the electrode penetration and studied the track in relationship to established sulcal patterns (Desimone and Schein, 1987; Gattass et al., 1988; Felleman and Van Essen, 1991). Subsequently, we relied upon these sulcal patterns, which were reconfirmed at the end of each experiment while taking care that penetrations were carried no deeper than 3500 μm so as to avoid entering area V3A.
Tungsten microelectrodes fabricated to penetrate the dura (Microprobe, Inc.) and coated with parylene were used. Recording techniques were conventional.
All sine-wave gratings were modulated about a fixed mean luminance level at a drift frequency of generally 4 Hz. In studies of 𝛉 selectivity, 12 stimuli were tested at 30° intervals from 0 to 330° within each set of stimulus presentations or block. In studies of SF selectivity, SFs were tested at one octave intervals, generally from 0.25 or 0.5 cycles/deg to 8 or 16 cycles/deg within each block. Within each block, all visual stimuli and a blank were interleaved in a random order that changed from block to block. Stimulus duration was 1–2 s with 1 s between stimuli and with delays of three or more seconds between blocks. Generally, 10–20 stimulus blocks were required to achieve an acceptable standard error of the mean (SEM). Ten blocks were generally sufficient to define full-field orientation and spatial frequency selectivity, as well as to define length and width tuning. However, 20 blocks were generally required for subfield analysis and two-bar interaction studies. In these studies each bar was sinusoidally counterphased at 4 Hz so that the mean luminance over the RF was unchanged when bars of opposite contrast polarity were simultaneously counterphased. When two bars were counterphased at the same contrast polarity, there was a transient change in mean luminance for each half cycle, but with no average change in mean luminance over each full cycle of stimulation. Spatial frequencies were tested in one octave steps and orientations in 30° steps. Contrasts of 0.9–1.0 were generally used. Cubic spline interpolations were used to connect data points, except in a few cases where points were connected by straight lines. However, all statistical analyses utilized the discretely sampled data.
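The interleaving scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_blocks`, the use of `None` to mark the blank and the fixed seed are hypothetical choices, not details of the original stimulus software.

```python
import random

def make_blocks(n_blocks, orientations_deg, seed=None):
    """Build randomized stimulus blocks (hypothetical helper).

    Each block contains every orientation once plus one blank (None),
    and the presentation order is reshuffled for every block.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(orientations_deg) + [None]  # None marks the blank
        rng.shuffle(block)                       # new random order per block
        blocks.append(block)
    return blocks

# Twelve drift directions at 30 deg intervals from 0 to 330 deg.
orientations = list(range(0, 360, 30))
blocks = make_blocks(n_blocks=20, orientations_deg=orientations, seed=0)
```

Shuffling within each block, rather than across the whole session, keeps every stimulus equally represented at any point in the recording, which limits the influence of slow changes in excitability on the tuning curves.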
Stimuli were displayed on a 17′′ color monitor (EZIO XT-C7S) with D65 white point (CIE standard) and with a spatial resolution of 640 × 480 pixels and a refresh rate of 160 Hz. A Minolta CL-100 chroma meter was used to measure the tristimulus values and intensity response of the R, G and B phosphors. Monitor gamma was corrected using look-up tables.
Determination of RF Dimensions
Once we established a cell's tentative preferred 𝛉, we utilized the following technique to determine the RF dimensions and shape. To define the extent and center of the width dimension, we drifted an effective sine-wave grating of one half to one cycle, apertured within a long and narrow rectangular window (Fig. 1A, inset) that was tested in random order at a number of positions across the width axis of the RF as preliminarily mapped. To map the length dimension, we tested a drifting sine-wave grating within a suitably short and wide rectangular aperture at random positions across the length axis of the RF (Fig. 1B, inset). The borders of the RF were taken as those positions on either side of the center where the interpolated curve representing the response versus position function equalled zero after the spontaneous level of activity had been subtracted. These RFs, mapped simply by reading the field borders at the zero crossings of suitable response versus position functions, may be taken to represent minimal discharge fields. RF diameters ranged from 3 to 6° at retinal eccentricities of 4–8° and up to a diameter of 8° at a more lateral eccentricity in one experiment. All RFs were located in the contralateral inferior field consistent with previous mapping (Gattass et al., 1988).
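The zero-crossing criterion for the RF borders can be illustrated with a short Python sketch. It assumes a single central maximum and substitutes linear interpolation for the interpolated curve used in the study; the function name `rf_borders` and the sample values are hypothetical.

```python
def rf_borders(positions, rates, spontaneous):
    """Estimate RF borders as the zero crossings flanking the peak.

    positions: sorted aperture positions (deg); rates: mean firing rates;
    spontaneous: spontaneous rate, subtracted before locating crossings.
    Linear interpolation is an assumption made for this sketch.
    """
    net = [r - spontaneous for r in rates]
    peak = max(range(len(net)), key=lambda i: net[i])

    def crossing(i_in, i_out):
        # Interpolate between a point inside the field (net > 0)
        # and its neighbour outside the field (net <= 0).
        x0, y0 = positions[i_in], net[i_in]
        x1, y1 = positions[i_out], net[i_out]
        return x0 + (x1 - x0) * y0 / (y0 - y1)

    left = right = None
    for i in range(peak, 0, -1):             # walk outward to the left
        if net[i - 1] <= 0 < net[i]:
            left = crossing(i, i - 1)
            break
    for i in range(peak, len(net) - 1):      # walk outward to the right
        if net[i + 1] <= 0 < net[i]:
            right = crossing(i, i + 1)
            break
    return left, right

# Hypothetical width-axis map: positions in deg, rates in spikes/s.
left, right = rf_borders([-3, -2, -1, 0, 1, 2, 3],
                         [4, 6, 14, 24, 16, 8, 3], spontaneous=5)
width = right - left
```

A border is returned as `None` when the response never falls to the spontaneous level on that side, which signals that the sampled positions did not extend beyond the field.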
Calculation of Subfield Shifts in SF Selectivity across the RF
There would be no error in the method if our test values for SF ranged from –∞ to +∞. However, there is a limit on the lowest SF we can meaningfully test without introducing unacceptable spectral spread in the stimulus. Consequently, we set the lowest SF to be one full cycle of a sinusoidal grating across the subfield. If the peak SF of a subfield is shifted to lower values than that of the reference subfield at the RF center, then our method may underestimate the shift. The extent of such an underestimation will be evaluated after presentation of some relevant data illustrating the method and its results.
For the analysis of responses to paired stimuli, differences in responses between the reference position and the other positions were evaluated using Dunnett's t-test (Winer, 1971). For cells in which a comparison between responses to pairs of stimuli of like and unlike contrast polarity was of interest, analysis of variance for repeated measures with a factorial design (Winer, 1971) was used to evaluate the presence of stimulus, position and stimulus-by-position interaction effects. In the presence of significant interaction effects, pairwise comparisons between responses to stimuli of like and unlike contrast polarity at each position were made using paired t-tests with Bonferroni corrections to compensate for the inflation of type I error due to multiple comparisons. Pairwise comparisons between positions of responses to pairs of stimuli of like and unlike contrast polarity were made using Tukey's HSD multiple comparisons procedure (Siegel, 1988). Compliance with the distributional assumptions of these tests was evaluated by performing the Kolmogorov–Smirnov one-sample test for normality (Siegel, 1988) on residuals after fitting the appropriate analysis of variance model.
Confirming previous work (Desimone and Schein, 1987), we found that the vast majority of the 82 V4 neurons studied behaved like the complex cells of V1/V2 (Foster et al., 1985), in that they were excited by both increments and decrements of light at all positions across the RF and responded to drifting sine-wave gratings of optimal SF with increases of predominantly non-modulated activity. This result held whether the grating was drifted across the entire RF or limited by a circular aperture with a diameter of several cycles of the optimal grating to a region much smaller than the RF. These responses give little or no indication of the local contrast sign and thus are the consequence of an even-order nonlinearity.
Using the technique described above, we found that the RFs showed either a maximum across both width (Fig. 1A) and length (Fig. 1B) dimensions defining a RF center, or an apparent central minimum generally approaching zero (Fig. 1C,D), but invariably less than half the amplitude of adjacent maxima. Seventy-two of the 82 cells showed central maxima and ten showed central minima when tested with long and narrow bars across the width dimension and short and wide bars across the length dimension. However, cells showing central minima were also characterized by strong length- and width-stopping, particularly at the RF center where suitably narrow and short bars evoked strong responses. Thus, the observed minima are a consequence of our mapping technique and do not indicate a genuine minimum at the RF center.
We determined the 𝛉 selectivity over the full RF or, when necessary, over a suitably reduced region for 82 neurons. All but two of 82 V4 neurons responded at a single preferred 𝛉 with relative maxima at opposite drift directions (Fig. 2A,B) and relative minima that were orthogonal to the maxima and which were also 180° apart. The 𝛉 bandwidths were sometimes broader for one direction of motion than for the other (Fig. 2C). The degree of directional selectivity was variable, but in most cases the responses in the preferred direction were not more than twice those in the non-preferred direction. Some V4 cells responded with response minima falling to zero after the spontaneous firing level was subtracted (Fig. 2A), which was done in every study. Other cells responded with minima at the least preferred 𝛉 that were substantially above the spontaneous firing level (Fig. 2B). We chose to measure full bandwidths for the broader of the curves generated by the two directions of motion and to carry out the subsequent subfield analyses for 𝛉 in the direction yielding the broader curves because the greater breadth of the full-field 𝛉 tuning curve increased the opportunity for discriminating differences in 𝛉 tuning between subfields, if such were indeed present.
As shown by others (Desimone and Schein, 1987; Gallant et al., 1993), many V4 neurons respond weakly to full-field Cartesian gratings. Therefore, we measured full bandwidths at half maximal amplitude only in those V4 neurons for which the peak response exceeded 15 spikes/s to such full-field stimulations and where the SEMs across the sampling intervals of interest were non-overlapping between maxima and minima so as to permit a meaningful measurement (n = 51). We considered the possibility that some of the cells rejected for analysis of full-field bandwidth because of the above criteria (Fig. 3A) might not be selective to orientation. However, these same cells responded much more strongly and with pronounced orientation selectivity when circular stimuli were confined to suitably tuned subfields (Fig. 3C–F). Full bandwidths in V4 ranged from 46 to 168°, with a mean value of 97° and a median value of 88° and are substantially broader than those of V1 cells (DeValois et al., 1982b).
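As an illustration of the bandwidth measure, the following Python sketch computes a full bandwidth at half-maximal amplitude from a discretely sampled tuning curve. It assumes linear interpolation between samples and a curve from which the spontaneous activity has already been subtracted; the function name `fbwha` and the sample values are hypothetical, and the exact procedure used in the study may differ.

```python
def fbwha(xs, ys):
    """Full bandwidth at half-maximal amplitude of a sampled tuning curve.

    xs: sorted sample positions (deg for orientation, octaves for SF);
    ys: responses with spontaneous activity already subtracted.
    Returns None when the curve does not fall to half maximum on both
    sides of the peak (for example, a low pass curve).
    """
    peak = max(range(len(ys)), key=lambda i: ys[i])
    half = ys[peak] / 2.0

    def cross(indices):
        # Walk away from the peak until the response reaches half maximum,
        # then interpolate linearly between the two bracketing samples.
        prev = peak
        for i in indices:
            if ys[i] <= half:
                frac = (ys[prev] - half) / (ys[prev] - ys[i])
                return xs[prev] + frac * (xs[i] - xs[prev])
            prev = i
        return None

    lower = cross(range(peak - 1, -1, -1))
    upper = cross(range(peak + 1, len(ys)))
    if lower is None or upper is None:
        return None
    return upper - lower

# Hypothetical orientation tuning sampled at 30 deg steps around the peak.
bandwidth = fbwha([-90, -60, -30, 0, 30, 60, 90],
                  [0, 5, 20, 40, 30, 10, 2])
```

Returning `None` rather than a number for curves that never reach half maximum mirrors the exclusion of cells whose minima and maxima could not be meaningfully separated.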
The ratio of the response minimum at the least preferred 𝛉 to the response maximum ranged from 0.0 to 0.48 with a mean value of 0.2. The distribution histogram for these ratios was unimodal. Such ‘non-zero asymptotes’, so designated by McAdams and Maunsell (1999) in their orientation selectivity studies of orientation-selective V4 cells, as distinguished from non-orientation-selective cells, have not been reported at earlier cortical levels. Evidently, most V4 cells not only have broader bandwidths than neurons at early cortical levels, but may also receive weak input from all other 𝛉 bands.
Two of the 82 cells showed a different pattern, with secondary peaks to at least one 𝛉 that was orthogonal to the two principal peaks (Fig. 2C). Such strong responses at orthogonal 𝛉s were not a consequence of spectral spread in the stimulus, because enough cycles of the grating of optimal SF were always included in the test stimulus so that its spectral spread to 𝛉 bands orthogonal to the central 𝛉 was always <12%. However, because such secondary peaks were defined by only one datum and appeared only twice in 82 cases, we cannot exclude the possibility that they represent statistical outliers. Moreover, the normalized 𝛉 selectivity curves averaged for the population of V4 cells studied by McAdams and Maunsell show only two primary 𝛉 peaks at opposite drift directions, nor did these authors report any V4 cells that were not orientation-selective (McAdams and Maunsell, 1999).
Subfield Analysis of Orientation Selectivity
The question arises as to whether those V4 neurons more broadly tuned for 𝛉 than those in V1 may be tuned to different 𝛉s over different parts of the RF, or whether 𝛉 selectivity is simply broader across the RF of V4 cells in general. Consequently, we subdivided the RF into either nine subfields in a 3 × 3 array or tested three to five subfields across the width dimension. We apply the term ‘subfield’ to signify any circular subregion within the V4 RF that encompasses roughly one or two full cycles of the period of the grating of optimal SF characterizing inputs from V2 to V4. The preferred value of 𝛉 across the RF within any given cell varied little, as shown for the four strongest subfields within a cross-like array of five including a central subfield (Fig. 3C) and subfields below (Fig. 3D) and above (Fig. 3F) the central subfield. Subfields were also tested on either side of the central subfield. The subfield on the left side (Fig. 3E) produced a strong response, but the zone on the right side did not produce a sufficiently strong response to warrant analysis. Each subfield was tested 20 times at twelve 𝛉s that were randomly interleaved within each block.
This cell was one of the 30 that gave strong responses to subfield stimuli (Fig. 3C–F), but only a weak response to a circularly enclosed rectilinear sine-wave grating over the full minimal discharge field (Fig. 3A). The weak responses to full-field stimulation are presumably attributable to the strong width-stopping found for this cell (Fig. 3B).
In order to determine whether there are shifts in 𝛉 preference across subfields, we first need to apply a non-arbitrary method to estimate the preferred 𝛉 within each subfield for each cell studied. To make such estimates, we first collapsed responses to opposite drift directions. By this we mean regarding each response at an angle 𝛉 between 180 and 360° as having been measured at 𝛉 – 180°, so that all the responses are labelled by angles between 0 and 180°. We then doubled the value of each 𝛉 to complete a 360° circle so as to eliminate possible ‘edge effects’ that might otherwise be introduced by summing vectors over a 180° interval. We then computed the vector sum over each subfield and determined the preferred angle for each vector. Finally, to compensate for the prior multiplication of each value of 𝛉 by two, we divided each initially calculated value of 𝛉 by two to obtain the actual 𝛉 preference for each subfield. Thus, all the data from the vectors at all 12 evenly spaced test orientations are used to compute the preferred 𝛉 for each subfield. The differences in preferred 𝛉 between any given subfield and the subfield producing the strongest response can then be calculated for each cell.
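The doubled-angle vector sum described above can be written compactly in Python. This is a minimal sketch of the stated steps; the function names `preferred_orientation` and `orientation_shift` and the sample responses are hypothetical, and wrapping the between-subfield difference into the ±90° range is an assumption made here for illustration.

```python
import math

def preferred_orientation(directions_deg, responses):
    """Preferred orientation (0-180 deg) from direction-tuning responses."""
    sx = sy = 0.0
    for d, r in zip(directions_deg, responses):
        theta = d % 180.0              # collapse opposite drift directions
        a = math.radians(2.0 * theta)  # double the angle to span 360 deg
        sx += r * math.cos(a)          # accumulate the vector sum
        sy += r * math.sin(a)
    angle = math.degrees(math.atan2(sy, sx)) % 360.0
    return angle / 2.0                 # undo the doubling

def orientation_shift(theta_deg, reference_deg):
    """Signed difference between two orientations, wrapped to [-90, 90)."""
    return (theta_deg - reference_deg + 90.0) % 180.0 - 90.0

# Hypothetical subfield responses at the 12 test directions (spikes/s),
# peaking at 60 deg and at the opposite drift direction, 240 deg.
directions = list(range(0, 360, 30))
responses = [0, 5, 10, 5, 0, 0, 0, 4, 8, 4, 0, 0]
theta_pref = preferred_orientation(directions, responses)
```

Because every test orientation contributes to the sum in proportion to its response, the estimate can fall between sampled orientations, which is what allows shifts smaller than the 30° sampling interval to be detected.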
The peak responses at the preferred 𝛉 for each cell are then normalized to 1.0 with the lesser responses proportionally scaled. The preferred values of 𝛉 for the strongest responding subfield for each cell are then normalized to 0°. The combined results for the population of 20 cells so tested are then displayed as a scatter plot showing the deviation in subfield 𝛉 from that of the reference subfield versus the relative magnitudes of the subfields with smaller vector sums (Fig. 4A).
For subfields with normalized peak responses ≥0.4, only a few subfields have 𝛉 preferences that differ from that of the central reference subfield by >±15° or, equivalently, 0.5 sampling intervals. Only for weakly responding subfields at the fringes of the RF with normalized response amplitudes in the 0.24–0.38 range are there a few outliers with subfield shifts >30° or, equivalently, 1.0 sampling intervals.
Thus, we conclude that the subfields across most V4 neurons are predominantly selective to the same preferred values of 𝛉. Such common 𝛉 selectivity could reflect common afferent inputs, predominantly from V2 but, as noted in the Introduction, we cannot exclude the possibility that the results emerge as a result of non-linear interactions with inputs from other, i.e. overlapping, subfields. Nor, at this point, can we assess the extent to which the observed 𝛉 preferences and bandwidths reflect the properties of afferent inputs or are already shaped by lateral interactions within V4 and/or by re-entrant feedback from higher cortical areas.
We have also taken the normalized magnitude of each subfield to adjust for their relative strength and calculated the mean shift in 𝛉 across subfields for the population (Fig. 4A) taking the algebraic sums. The mean of 0.07 ± 1.23° is not statistically different from 0°. The mean shift in 𝛉 for the same population taking the normalized absolute values of the shifts in 𝛉 is relatively small, equalling 5.61 ± 0.90°.
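The distinction between the signed (algebraic) and absolute mean shifts can be made concrete with a short Python sketch. The weighting scheme is an assumption: the text states only that normalized subfield magnitudes were used to adjust for relative strength, so a simple weighted mean is used here, and the function name and sample values are hypothetical.

```python
def weighted_mean_shifts(shifts_deg, weights):
    """Weighted signed and absolute mean shifts in preferred orientation.

    shifts_deg: subfield shifts relative to the reference subfield;
    weights: normalized subfield response magnitudes (assumed weighting).
    """
    total = sum(weights)
    signed = sum(w * s for s, w in zip(shifts_deg, weights)) / total
    absolute = sum(w * abs(s) for s, w in zip(shifts_deg, weights)) / total
    return signed, absolute

# Hypothetical shifts (deg) and normalized magnitudes for four subfields.
signed, absolute = weighted_mean_shifts([-10.0, 10.0, 5.0, -5.0],
                                        [1.0, 1.0, 0.5, 0.5])
```

A signed mean near zero shows that there is no systematic bias in one rotational direction, while the absolute mean measures the typical size of the scatter around the reference orientation.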
We also calculated the mean shifts in 𝛉 bandwidths (full bandwidths at half-maximal amplitude or FBWHA) across subfields with respect to the FBWHA of the subfield yielding the strongest response for any given cell (Fig. 4B). These results show more scatter at all values of normalized response amplitude than do the subfield differences for 𝛉 preference. The mean shift for the non-normalized shifts in FBWHA from the mean value of FBWHA for all subfields (74.0 ± 7.4°) is 7.2 ± 3.3° and the mean shift for normalized response amplitudes is 3.2 ± 1.8°. It is unclear whether the measurement of FBWHA is inherently noisier than that of 𝛉 preference (perhaps because the former measurement is critically dependent on an accurate prior subtraction of the spontaneous level of activity, whereas the latter estimate is not) or has physiological significance.
Spatial Frequency Selectivity
Preferred SFs ranged from 1.0 to 4.0 cycles/deg. FBWHA for the 72% of neurons exhibiting bandpass selectivity that responded adequately over the full RF (n = 51) ranged from 1.8 to 3.9 octaves (mean, 2.2 octaves; median, 2.4 octaves). Because these bandwidths are not, in general, narrower than those of parafoveal V1/V2 complex cells (mean, 1.8 octaves; Foster et al., 1985), they are not likely attributable to a higher-order non-linearity and are presumably defined by second-order statistics as are the complex cells of V1 (Gaska et al., 1994). The remaining 28% of cells exhibited low pass SF selectivity when tested down to 0.25 cycles/deg. Cells with band-limited and low pass selectivity were interspersed within individual penetrations.
Subfield Analysis of SF Selectivity
We also carried out subfield analysis to determine whether SF preferences were the same or different across the RF. In theory, SF gradients might encode objects receding or approaching in depth and might be missed in SF studies testing single gratings over large fields. In some studies, we tested nine subfields using a 3 × 3 array or three to five subfields across the width dimension as we had for testing preferred 𝛉 and found similar SF preferences at each position (Fig. 5A–D), as shown for the same four strongest subfields for the same single cell tested for 𝛉 tuning in Figure 3C–F.
We then calculated a ‘center of mass’ (Cm) for each curve (see Materials and Methods) to assess the extent of any differences in Cm across the RF. Many SF curves with bandpass selectivity are reasonably symmetric about the peak on an octave scale (Fig. 5A–D). Here, the subfield in Figure 5C yields the strongest response, so shifts in SF selectivity of other subfields are compared with this reference Cm. The Cm for the SF selectivity curve of Figure 5A is shifted to a lower value by 0.19 octaves, whereas those for the subfields of Figure 5B,D are shifted very slightly to higher values by 0.08 and 0.05 octaves, respectively.
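One plausible form of the center of mass calculation is sketched below in Python. The formula is an assumption made for illustration, namely a response-weighted mean of log2(SF), so that Cm and its shifts are expressed in octaves; the function name and the sample curves are hypothetical.

```python
import math

def center_of_mass_octaves(sfs_cpd, responses):
    """Response-weighted mean of log2(SF), in octaves re 1 cycle/deg.

    Assumed definition for this sketch: Cm = sum(r * log2(f)) / sum(r).
    """
    total = sum(responses)
    weighted = sum(r * math.log2(f) for f, r in zip(sfs_cpd, responses))
    return weighted / total

# Hypothetical normalized SF tuning sampled at one octave intervals.
sfs = [0.5, 1, 2, 4, 8, 16]                # cycles/deg
reference = [0.2, 0.6, 1.0, 0.6, 0.2, 0.0]
subfield = [0.1, 0.4, 1.0, 0.8, 0.3, 0.0]

cm_ref = center_of_mass_octaves(sfs, reference)
delta_cm = center_of_mass_octaves(sfs, subfield) - cm_ref
```

A positive `delta_cm` indicates a subfield whose tuning is displaced towards higher SFs than the reference subfield, and a negative value a displacement towards lower SFs.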
We now assess the potential limitations in the center of mass method. The limitation on the high SF side is of no physiological significance, because at the retinal eccentricities at which we worked there is at best a minimal response at 16 cycles/deg and no responsivity at or above 32 cycles/deg. Hence, if a SF curve for a subfield shifted to higher values in the 1–16 cycles/deg range, there would be no practical limitation in detecting the shift, because all responses for SF values tested above 16 cycles/deg would be essentially 0.
However, our methods would underestimate the Cm shift for a subfield that had a SF preference shifting towards lower values. Consequently, we have calculated the extent of such underestimated shifts in Cm if, to take an extreme case, the SF selectivity curve shifted by one octave. Such shifts of one octave or greater would have been visually apparent and such results were never observed, but, even so, let us estimate the error that would have been introduced had such shifts occurred. To make this estimate, we return to the results of Figure 5. The amplitude of the normalized response at the lowest SF tested (0.5 cycles/deg) equals 0.167 and the cell was tested in one octave steps up to 16 cycles/deg. Suppose now that we shift the curve to the left by one octave, in the process of which we ‘lose’ the value of 0.167. Loss of this data point changes the balance of the original set of numbers, but only slightly so. When we calculate Cm for the remaining five values of SF, we compute a value of Cm that underestimates the imposed one octave shift by 0.15 octaves.
The underestimation is greater if the amplitude of the response at the lowest tested SF is higher. For example, for the curve of Figure 5B, the normalized value of the lowest SF tested at 0.5 cycles/deg equals 0.38. An imposed one octave shift to lower SF test values eliminates the lowest value of 0.38. Dropping this point would cause us to underestimate the imposed one octave shift by 0.37 octaves.
Of all subfields tested, 53% had normalized responses to the lowest test SF of ≤0.2, 26% had normalized responses between 0.2 and 0.35 and 21% had normalized responses between 0.35 and 0.5. Thus, for the majority of the population studied, the maximal underestimate of ΔCm, even in the worst likely case, is apt to be <0.15 octaves and ≥0.37 octaves in few cases.
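The dependence of the underestimate on the response at the lowest test SF can be explored with a small Python simulation. It assumes that Cm is a response-weighted mean of log2(SF) and uses invented tuning curves, so the printed values are not those of Figure 5; only the qualitative trend is meant to carry over.

```python
import math

def center_of_mass_octaves(sfs_cpd, responses):
    # Assumed definition: response-weighted mean of log2(SF), in octaves.
    total = sum(responses)
    return sum(r * math.log2(f) for f, r in zip(sfs_cpd, responses)) / total

def underestimate_of_one_octave_shift(sfs_cpd, responses):
    """Error in the measured Cm shift when a curve moves down one octave.

    The shifted curve loses its lowest sample (it falls below the tested
    range) and gains a zero at the highest test SF.
    """
    shifted = list(responses[1:]) + [0.0]
    measured = (center_of_mass_octaves(sfs_cpd, responses)
                - center_of_mass_octaves(sfs_cpd, shifted))
    return 1.0 - measured

sfs = [0.5, 1, 2, 4, 8, 16]  # cycles/deg, one octave steps
# Hypothetical curves that differ only in the response at 0.5 cycles/deg.
weak_low_end = [0.167, 0.6, 1.0, 0.6, 0.2, 0.0]
strong_low_end = [0.38, 0.6, 1.0, 0.6, 0.2, 0.0]

u_weak = underestimate_of_one_octave_shift(sfs, weak_low_end)
u_strong = underestimate_of_one_octave_shift(sfs, strong_low_end)
print(round(u_weak, 3), round(u_strong, 3))
```

In this sketch the curve with the larger response at the lowest test SF yields the larger underestimate, in line with the trend described above.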
Other SF selectivity curves are not so symmetric, but tend to show the same general shape across all subfields. Thus, in the asymmetric cases, the values of Cm may be shifted a bit to one side of the actual peak, but since we are interested in calculated shifts in Cm from subfield to subfield, the deviations from a purely symmetric function are not of major consequence. For greater spatial resolution along one dimension, we also tested five subfields arranged in a line across the RF for the same cell, while randomly interleaving values of both SF and spatial position to minimize the effects of changes in excitability between stimuli, and found comparable results (Fig. 5G). The latter figure displays mean values for an experiment in which we tested ten blocks of five spatial positions at six values of SF. The ΔCms were 0.03 and 0.15 octaves on one side of the central peak and –0.35 and –0.28 octaves on the other, showing no systematic change in ΔCm with position.
Similar analyses were done on 18 cells with bandpass SF selectivity and the scatter plot of ΔCm for SF in octave sampling intervals versus the normalized peak firing rates for each subfield is shown in Figure 4C. Most of the ΔCms are <±0.25 octaves. Moreover, we also tested five additional cells with low pass SF selectivity and found that the positions of the 50% high cut-off frequencies across subfields did not differ by >±0.25 octaves.
Thus, because the ΔCms for SF across the RFs are small and do not vary systematically with position, we conclude that V4 neurons are predominantly selective to common preferred values of SF at all positions across the RF. These results hold both for cells with bandpass and low pass SF selectivity. Such common SF selectivity could reflect common afferent inputs, predominantly from V2 but, as noted in the Introduction, we cannot exclude the possibility that the results emerge as a result of non-linear interactions with inputs from overlapping subfields. Nor at this point can we assess the extent to which the observed SF preferences and bandwidths reflect the properties of afferent inputs or are already shaped by lateral interactions within V4 and/or by re-entrant feedback from higher cortical areas.
However, in one exceptional case a cell exhibited low pass SF selectivity when gratings were tested over the entire RF (Fig. 5E), but showed bandpass selectivity at all spatial positions that accounted only for the higher SF range when individual subfields were tested (Fig. 5F). This cell required stimulation over most of the RF to evoke responses at low SF, but did not violate the general finding that all subfields were selective to the same SF.
We also calculated the shifts in SF bandwidths (FBWHA) for selectivity curves for subfields relative to the subfield giving the strongest response in each cell. These results (Fig. 4D) show a greater spread in bandwidths across subfields than for SF preferences. The mean absolute shift in bandwidth equalled 0.39 ± 0.06 octaves, with the mean FBWHA over all subfields averaging 2.26 ± 0.13 octaves. As for the case of the greater spread in 𝛉 bandwidths than for 𝛉 preferences across subfields, we do not know whether the measurement of SF bandwidth is inherently noisier than that for preferred SF selectivity or reflects some as yet not evident physiological property.
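A bandwidth measurement of this kind can be sketched as follows, assuming that FBWHA denotes the full bandwidth at half amplitude and that the half-amplitude points are found by linear interpolation on a log2(SF) axis. The interpolation scheme, the function name and the sample curve are assumptions made for illustration.

```python
import math

def fbwha_octaves(sfs, rates):
    """Full bandwidth at half amplitude, in octaves.

    Returns None when the curve does not fall to half amplitude on both
    sides of the peak (for example, a low pass curve).
    """
    x = [math.log2(sf) for sf in sfs]
    peak = max(rates)
    if peak <= 0:
        return None
    half = peak / 2
    k = rates.index(peak)

    def crossing(indices):
        # Walk outward from the peak and interpolate at the first sample
        # that is at or below half amplitude.
        prev = k
        for i in indices:
            if rates[i] <= half:
                frac = (rates[prev] - half) / (rates[prev] - rates[i])
                return x[prev] + frac * (x[i] - x[prev])
            prev = i
        return None

    low = crossing(range(k - 1, -1, -1))
    high = crossing(range(k + 1, len(rates)))
    if low is None or high is None:
        return None
    return high - low

# Hypothetical bandpass curve sampled in one-octave steps.
sfs = [0.25, 0.5, 1, 2, 4, 8]
bandwidth = fbwha_octaves(sfs, [0, 10, 30, 60, 30, 10])
low_pass = fbwha_octaves(sfs, [60, 55, 50, 30, 10, 0])
```

With one-octave sampling, the interpolated crossing points depend on only two samples per flank, which may be one reason a bandwidth estimate is noisier than an estimate of preferred SF.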
Lateral Interaction Studies
Single-bar and grating stimuli are not sufficient to probe the organization of lateral interactions across the axial dimensions of neurons, such as V4 cells, that are characterized by even-order nonlinearities. Such information is essential for predicting how such cells will respond to arbitrary brightness distributions. Two-bar interaction studies across space (Movshon et al., 1978) and across space and time (Gaska et al., 1994) have characterized well the second-order lateral interactions across complex cells in V1, and the same principles may be applicable for the study of neurons in higher cortical areas.
Such studies require selection of suitable bar lengths and widths for sampling across the RF, and prior knowledge of the optimal length and width tuning for single gratings may be helpful as a preliminary step. Thus, we begin this section with a consideration of conventional length and width tuning of responses to individual grating patches. V4 cells showed considerable variation with respect to their intra-RF length- and width-tuning, in agreement with earlier work (Desimone and Schein, 1987). Some cells show little if any length- and width-stopping; others show appreciable length-stopping, but little or no width-stopping; others show minimal if any length-stopping, but variable degrees of width-stopping; and still other cells show high degrees of both length- (Fig. 6A) and width-stopping (Fig. 6B). A scatter plot summarizes the ratios of the optimal lengths to the respective RF lengths along the y-axis plotted against the ratios of the optimal widths to the respective RF widths along the x-axis (Fig. 6I). Although the number of studies is relatively small (n = 23), the plot is sufficient to establish that some cells are subject to little suppression across either axis, others are subject to strong suppression across both axes and still others are subject to much more suppression along one axis than along the other, again confirming Desimone and Schein (1987). Note also that the mean response to single grating patches that are optimally tuned across both dimensions can be appreciable, approaching 100 impulses/s (Fig. 6B).
There are at least two ways in which lateral interactions between a reference stimulus and a probe can be tested. The choice of circular or ovoid patches, defining one cycle of the optimal SF, enhances response signal but at the expense of spatial resolution, whereas the choice of narrow single bars of less than one-half period of the grating of optimal SF enhances spatial resolution, but at the expense of response signal. We have employed both methods.
Altogether, we have tested lateral interactions across the width dimension of the RF in 26 V4 neurons. We tested such interactions using two spatially disparate one cycle circular or ovoid patches presented at different spatial offsets in ten bidirectionally selective V4 cells characterized by variable degrees of width-stopping. We paired the reference patch that was placed at the center of the RF together with a second test patch that was randomly interleaved at various positions to one side of the center across the width axis. Simultaneous stimulation by both patches either counterphased (Fig. 6C) or drifting in the same direction (Fig. 6D), reduced the response compared with that elicited by the control patch alone. The strength and extent of the suppression fell off with inter-stimulus distance and varied from cell to cell (Fig. 6C,D).
We next tested lateral interactions at higher spatial resolution in another 11 band-limited cells by combined stimulation with a narrow reference bar with a width not greater than half a period of the grating of optimal SF at the RF center and a second test bar or probe of equal size at a common contrast polarity. Such narrow bars produce responses substantially smaller than those evoked by the optimal sine-wave grating stimulus, but such discrete stimuli are required to test second-order spatial interaction with high spatial resolution. The second bar was randomly interleaved at various positions across the width axis on both sides of the first bar. As a control, the responses to a single bar were tested at intervals across the RF (Fig. 6E). Both stimuli were counterphased in-phase at 4 Hz, which was slow enough to permit resolution of steady state inter-stimulus interactions, yet rapid enough to yield a high signal-to-noise ratio.
This response profile (Fig. 6E), like all other single bar controls, showed a response maximum at the RF center (X = 0°) with the response declining more or less monotonically to the RF borders on each side of the center. However, the response to combined stimulation showed a statistically significant minimum at X = –0.25° (Fig. 6F), compared with the response at the reference position with P = 0.042. Here, the response to combined stimulation was only 28% of the response to the single bar at the reference position of 0°. The minima at –1.5 and 0.75° have standard errors of the mean that do not overlap those of their respective adjacent surrounding maxima, but after adjusting for multiple comparisons, these differences in response were not statistically significant (P = 0.193 and 0.21 respectively).
Even so, the curves generated in response to single bar stimuli (Fig. 6E) and paired stimuli (Fig. 6F) are not equivalent. For example, the control curve (Fig. 6E) can be easily fitted by a third-order (cubic) polynomial, but there is no polynomial regression up to the fourth order that fits the curve generated by the responses to paired stimuli. Therefore, at least some of the major differences between the two curves must reflect the effects of second-order interactions. However, because the present study is an initial descriptive exploration of paired interactions, we had not formulated any a priori hypotheses that could have been tested against the data.
The very next cell in the penetration showed response minima on either side of the center and the spacings between the minima were broader (Fig. 6G). Here, the response minima to combined stimulation at –1.1 and at 1.3° are reduced to zero. Thus, the response patterns define second-order interaction profiles that vary from cell to cell.
In order to formulate a metric to compare the maximal strength and spatial extent of the intra-receptive field suppressions across different cells, we plotted the magnitudes of the strongest suppression observed for each cell versus the spatial extents of the initial suppressive zone on each side of the RF center. Such suppressions vary widely in strength and spatial extent from cell to cell (Fig. 6H). The average suppression was 57.2 ± 7.0% and ranged from 30 to 100%, as seen in the scatter plot (Fig. 6H).
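As a minimal sketch of such a metric, the suppression at each probe position can be expressed as the percentage reduction of the paired response relative to the response to the reference bar alone. This formula is an assumption made for illustration; the function names are hypothetical.

```python
def percent_suppression(reference_rate, paired_rate):
    """Percentage by which the paired response falls below the response
    to the reference stimulus alone (assumed definition)."""
    if reference_rate <= 0:
        raise ValueError("reference response must be positive")
    return 100.0 * (1.0 - paired_rate / reference_rate)

def strongest_suppression(reference_rate, paired_rates):
    """Largest suppression observed across all probe positions."""
    return max(percent_suppression(reference_rate, r) for r in paired_rates)

# A paired response equal to 28% of the reference response, as reported
# for the minimum at X = -0.25 deg, corresponds to 72% suppression.
example = percent_suppression(100.0, 28.0)

# Hypothetical profile in which one probe position abolishes the response.
complete = strongest_suppression(50.0, [40.0, 0.0, 25.0])
```

Under this definition a response minimum that is reduced to zero corresponds to 100% suppression, the upper end of the reported range.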
In some, but not all V4 cells, two-bar interaction studies at high spatial resolution reveal closely spaced antagonistic subzones similar to those in V1. For example, the response to the central bar alone (Fig. 7A, thin arrow) was reduced when a second bar of the same contrast polarity was simultaneously presented to either side of the central zone. The reduced response to two adjacent bars suggests activation of antagonistic flanking subzones. Conversely, when the two adjacent bars were presented at opposite contrast polarities, strong response summation was observed (Fig. 7A, open circles). This result is also consistent with the activation of antagonistic subzones, perhaps initially at earlier cortical levels. The non-linear response summation observed for adjacent bars of opposite contrast polarity (Fig. 7A, thick arrow) may simply reflect the consequence of threshold nonlinearities found as early as V1 (Schumer and Movshon, 1984), i.e. activation must exceed some threshold before cell firing can commence. As in the previous studies (Fig. 6E), the responses to the single bar fail to reveal such antagonistic subzones (Fig. 7B).
Because of the limitation on the generality of the results obtained using only optimally oriented elongated bars, we also tested pairwise interaction using small circular bright and dark discs in another five cells. The unbroken lines (Fig. 8A) represent the responses to small circular discs of like contrast polarity and the broken lines indicate the responses to pairs of discs of unlike contrast polarity with respect to a central disc of either contrast polarity at the center of the RF marked as 0°. The responses to the stimuli of unlike contrast polarity reach relative maxima at –1 and at 1.5°, at which positions the responses to stimuli of like contrast polarity reach relative minima. The points in the two curves are statistically different at X = –1° (P = 0.009) and very nearly so at X = 1.5° (P = 0.072). Responses to small discs are smaller than those to oriented bars of near optimal length and width (Figs 6A,B and 7B), but generally distinguish maxima and minima between curves to pairs of discs of like and unlike contrast polarity, as noted above.
We also demonstrated statistically significant facilitation when two discs of the same contrast polarity were tested across the width axis at an appropriate non-contiguous inter-stimulus offset (Fig. 8B). Even after compensating for additive type I errors due to multiple comparisons, the response maximum to paired stimuli at 1° relative to the response at the reference position is statistically significant at the P = 0.004 level. The qualification of non-contiguity is necessary to exclude those cases where two adjacent narrow bars simply produce a single wider bar, which sometimes evokes a stronger response than that to the single bar. We observed such non-contiguous enhancement in two-bar interaction studies in seven of the 26 cells tested across the width axis.
Stimulus Configurations that Minimize Axial Inhibition and Length-stopping
The above results — especially those on length- and width-stopping — suggest that one function of V4 cells may be to extract SF and 𝛉 information over subfields of different optimal lengths and widths and to generalize such specificity over a larger region of space than is possible in V1/V2. Such encoding may be especially pertinent when an observer selectively attends to a focal region within the RF. However, this function would not, in itself, explain the strong responsivity of V4 cells to polar gratings that span the RF (Gallant et al., 1993, 1996). These results suggest that the V4 cell may not be restricted to encoding some optimal sized grating patch. Thus, in view of our result that suppressive and excitatory interactions exhibit variations with inter-stimulus distance, we wondered what would happen if we could devise global stimuli that would enhance areal summation while reducing suppressive interactions.
Our stimulus presentation system constrained us to test at most two spatially offset stimuli or two stimuli in a center–surround arrangement. Within these constraints, we tried to configure stimuli large enough to enhance areal summation and narrow enough to reduce activation of lateral (width-stopping) and/or collinear (length-stopping) inhibitory mechanisms. For example, the area between the inner and outer diameters of an annular grating can be relatively large. This circular areal arrangement provides an opportunity for spatial summation as long as the distance across the annulus is kept narrow enough to avoid suppressive interactions from RF regions, both beyond the outer diameter of the annulus and within the inner diameter.
Thus, we reasoned that an intra-RF annulus of appropriate size confining a drifting sine-wave grating of optimal 𝛉 and SF would produce a much stronger response than would either full-field or central core circular stimuli of comparable 𝛉 and SF in those V4 cells that were subject to inhibition across either the width, length or both axes, but would not produce a stronger response for cells lacking such suppressive intra-RF interactions.
One test of the prediction that such an annulus will produce a much stronger response than either a full-field grating or stimulation of the RF central core is shown in Figure 9A. The RF diameter was 6°. Responses decreased with increasing length beyond an optimal value of 1–2°, falling to 50% of the peak at 4°. The optimal width also ranged from 1 to 2°, with the response falling by >75% as the RF was fully covered. The cell was broadly tuned for a vertical 𝛉 (Fig. 3) and exhibited low pass SF selectivity with a superimposed secondary peak at 4 cycles/deg (Fig. 9A3).
When we tested a drifting grating covering the entire RF of 6° in diameter, we obtained only a minimal response (Fig. 9A, curve 1). When we decreased the outer diameter to 4°, the response became substantially larger (Fig. 9A, curve 2). We then eliminated effective visual stimulation of the central core of 2° by presenting a central stimulus at a contrast of 0.01, which was below the threshold necessary to activate this neuron.
The resultant annulus produced profound increases in activity at all SFs tested within the cell's bandpass (Fig. 9A, curve 3), suggesting that the previously activated central core had strongly reduced the cell's responses to the outer annulus. The increases in response when the central core was removed ranged from >50% at 2 cycles/deg to 100% or greater at other test SFs (cf. Fig. 9A, curve 3 with Fig. 9A, curve 2). The peak mean responses to the annulus reached 175 impulses/s for a SF of 0.25 cycles/deg, with responses as high as 100 impulses/s at SFs as high as 4 cycles/deg. At the lowest SF, the annulus alternately appeared as a bright or dark ring around the grey core. We carried out the same test in seven additional V4 cells that exhibited significant inhibitory interaction across either the length and/or width dimensions. In all these cases the responses increased robustly when we removed the central core thus creating an annulus. Such results held equally well for cells exhibiting low pass (Fig. 9A) or bandpass SF selectivity (Fig. 9C).
All eight cells so tested responded much more strongly to the annulus than to the full-field stimulus of comparable outer diameter (Fig. 10). In seven of these eight cases the results were statistically significant, with P < 0.0005 in five cases. Even though only eight cells were so studied, the results are of such high statistical significance that the substantially greater response of these cells to an annulus than to a full-field stimulus can scarcely be in doubt.
We make no claim that an annulus is an optimal stimulus for any V4 neuron. It is much more likely, particularly in view of the wide range of responses to annuli in different cells (Fig. 10), that these results simply confirm the general principle that global stimuli that enhance areal summation while reducing suppressive interactions produce much stronger responses than stimuli that do not. We suspect that much more complex stimulus configurations than could be tested here, but which conform to the above principle, will better approximate the as yet unknown optimal stimulus configurations.
Moreover, when we replaced the very low contrast stimulus to the central core with a high contrast grating, but at a SF of 2 cycles/deg for the example shown in Figure 9A, we continued to find very strong responses when the SFs within the annulus differed by an octave or more from the value of 2 cycles/deg stimulating the central core (Fig. 9B). However, when both the central core and the annulus were stimulated together at 2 cycles/deg, so that we were, in effect, again stimulating at a single SF with a single 4° diameter stimulus, then the response dropped to 50 ± 12 impulses/s. This value is comparable to that found under identical stimulus conditions in a previous test (Fig. 9A, curve 2). Thus, texture discontinuities as well as contrast discontinuities between the annulus and the central core can produce robust responses. These results also suggest that activation of the suppression, presumably by inhibitory interneurons, is not only orientation-selective (Carandini et al., 1998), but is at least in part also SF-selective. However, we have not excluded the possibility that texture discontinuities across subfields in the 𝛉 and SF domains may modify 𝛉 and SF selectivities so as to contribute to the observed strong responses.
We also carried out several studies to determine how a cell responds as a function of the inner and outer diameter of an annulus. As for the cell of Figure 9A, we tested SF selectivity at the preferred 𝛉 across both the full RF and within the test annuli over a broad range of SFs. The cell responded weakly, but with an increasing response as the diameter of the circular aperture was increased from 1 to 6° (Fig. 9D1). We then set the outer diameter at 6° and varied the inner diameter from 1 to 5° in 1° steps, leaving the central core at zero contrast. The cell responded three times more strongly at the preferred SF to the annulus with an inner diameter of 3° (Fig. 9C2) than to a full stimulus of 6° in diameter (Fig. 9C1). Moreover, the cell responded much more strongly to this annulus than to a central core stimulus of any size tested. The responses at the preferred SF of 2 cycles/deg are plotted versus the inner diameter of the annulus and show that the response falls off for annuli with inner diameters larger or smaller than 3° (Fig. 9D2). Thus, a selective spacing between the outer and inner diameter of the annulus seems essential for the strong response to an annulus.
On the other hand, when we carried out control studies on ten neurons that lacked both width and length intra-RF inhibitory interactions, the response to the newly created annulus decreased, often by 50% or more, when we eliminated the central core. Thus, removing a suitable central core from a larger circularly bound rectilinear grating to generate an annulus robustly increases the response in cells characterized by strong intra-receptive field suppressive zones and markedly decreases response in cells lacking such zones.
The Packing Density of Inputs to V4 Neurons
We would like to formulate a qualitative model of the V4 RF that accounts for the results presented up to this point. However, to do so we will also need to estimate the packing densities of the inputs to prototypic V4 cells. Such estimates can be obtained by dividing the number of cycles of the optimal grating that extends across the RF by the probable number of cycles characterizing each input to V4. Therefore, as a first step we multiplied the RF diameters by the respective preferred SFs to determine the number of cycles of the grating of optimal SF that would span the RF without overlap. These numbers typically ranged from 12 to 16 cycles. We then divided these estimates by the probable number of cycles per input. In V1 and V2, most complex cells confine an envelope of 1.5–3 cycles of the grating of optimal SF (Foster et al., 1985), although most of the response is defined by the central 1.5 cycles as seen in typical second-order kernels of V1 complex cells (Gaska et al., 1994). Thus, we divided the total number of cycles spanning the RF, roughly 12–16, by 1.5 cycles, to obtain the number of inputs, namely 8–12, that would span the RF assuming no overlap. However, because the ‘envelopes' of the RF are generally smoothed or at most show only several changes in slope across the RF (Fig. 7B), we increased the number of inputs across the length and width dimensions by roughly 1.5-fold to account for overlap. This calculation yields 12–16 inputs across each dimension of the RF and roughly 144–256 inputs over the entire RF assuming that the aspect ratio, i.e. the ratio of length to width of the RFs of the inputs to V4, is roughly unity, as holds for many V1 complex cells (Gaska et al., 1994). Such an arrangement of inputs to V4, as derived from early cortical levels, is schematically shown in Figure 10A.
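The arithmetic of this estimate can be traced in a few lines. The sketch below uses only the quantities stated above (1.5 cycles per input, a roughly 1.5-fold increase for overlap, unity aspect ratio); the function names and the example RF diameter and preferred SF are hypothetical.

```python
def cycles_across_rf(rf_diameter_deg, preferred_sf_cpd):
    """Number of cycles of the optimal grating spanning the RF."""
    return rf_diameter_deg * preferred_sf_cpd

def inputs_per_dimension(cycles_across, cycles_per_input=1.5, overlap_factor=1.5):
    """Inputs along one RF dimension: the count without overlap,
    scaled up by the overlap factor."""
    without_overlap = cycles_across / cycles_per_input
    return without_overlap * overlap_factor

def inputs_over_rf(cycles_across):
    """Total inputs, assuming unity aspect ratio (a square array)."""
    return inputs_per_dimension(cycles_across) ** 2

# Hypothetical cell: 6 deg RF diameter, preferred SF of 2 cycles/deg.
n_cycles = cycles_across_rf(6.0, 2.0)   # 12 cycles
low_estimate = inputs_over_rf(12.0)     # 12 inputs per dimension
high_estimate = inputs_over_rf(16.0)    # 16 inputs per dimension
```

Because the number of cycles per input and the overlap factor are both taken as 1.5, they cancel, and the estimated number of inputs per dimension equals the number of cycles spanning the RF. The intermediate count without overlap for 16 cycles is about 10.7, which lies within the rounded range of 8–12 given above.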
Classes of Stimuli Evoking Strong Responses in V4
Although some of the following results were discussed earlier in a different context, it is convenient to review the classes of stimuli that produced robust activation of V4 neurons within a separate section.
When inhibitory interactions across both length and width dimensions are minimal, strong summation may occur for full-field gratings, accounting for mean responses >150 impulses/s (Fig. 2C). Moreover, when there is minimal length suppression — whether or not there is width suppression — a long narrow bar may also evoke responses reaching 150 impulses/s (Fig. 7B). When there is strong suppression along both length and width axes, an oval grating patch of optimal length and width may evoke responses approaching 100 impulses/s (Fig. 6A,B). Moreover, concentric annuli configured so as to enhance areal summation while reducing suppressive interactions, can evoke activity approaching 200 impulses/s (Fig. 9A3). All of these robust responses were obtained in sufentanil-anesthetized macaques in response to achromatic monocular stimuli. Responses would likely be even higher in alert macaques selectively attending binocularly (McAdams and Maunsell, 2000) to these same classes of stimuli additionally tuned to optimal chromaticity.
One major result of this study is that subfields of the V4 neurons that we studied receive inputs that for any given cell are predominantly selective to single common values of 𝛉 and SF. Could these results be a consequence of inadequate sampling? This possibility seems remote, given that the sizes of the subfields tested were taken to include one to two cycles of the grating of optimal SF — a choice taken to match the size of the likely inputs to V4 from V2 and to minimize spectral spread in both the 𝛉 and SF domains. SFs were sampled in one octave steps and orientation in 30° steps. Moreover, in addition to testing 3 × 3 arrays, we also tested five or six smaller subfields across the RF diameter for improved spatial resolution. Further reduction in stimulus size would have improved spatial resolution, but at the cost of poorer resolution in the 𝛉 and SF domains. Had there been punctate regions across the RF receiving narrowly tuned inputs that fell between our sampling parameters, we might have missed such inputs in studies of any given cell. However, the chances are remote that we would not have found more narrowly tuned inputs had they been present in any of the 20 subfield analyses for 𝛉 or the 23 for SF. Thus, our conclusion that the V4 neurons we sampled are predominantly selective to single common values of 𝛉 and SF appears well founded. Even so, we cannot exclude the possibility that subdivisions of V4 may exist within which such findings do not apply.
Moreover, the present results may account for conclusions by others. For example, it has been shown (Gallant et al., 1995) that V4 neurons are not selectively sensitive to 3-D texture patterns, but rather show complex, non-linear responses to stimulus properties related to SF and 𝛉 content. Gallant et al. also state that many cells are insensitive to the global spatial positions of patterns (i.e. their non-Cartesian stimuli), a result also consistent with other findings (Pasupathy and Connor, 1999) with respect to their set of feature contours. These invariances could be obtained in different ways, perhaps most simply if neurons were very broadly tuned for both 𝛉 and SF. Our results, however, account for these findings on the basis of common preferences for 𝛉 and SF across the RF, with band-limited response characteristics for 𝛉 and in the majority of cases for SF as well.
Our first result, schematically depicted in Figure 11A, extends the hierarchical principle of RF organization first recognized to hold for simple to complex cell projections in the orientation domain in V1 (Hubel and Wiesel, 1962), to the spatial, orientation and spatial frequency domains within an intermediate extrastriate cortex. It may also be noteworthy that the neural correlates of the psychophysical ‘channels’ (Campbell and Robson, 1968; Blakemore and Campbell, 1969) sensitive to individual 𝛉 and SF bands retain their selective identities from V1 at least through V4. Moreover, in MT/V5 directional selectivity across the RF appears to be position invariant (Raiguel et al., 1995), suggesting that some common principles of RF elaboration at intermediate levels, at least with respect to the positional invariance of preferred lower-level cues, may hold within both dorsal and ventral streams.
The open and shaded subzones characterizing the inputs to V4 (Fig. 11A) represent second-order inputs from complex cells at early cortical levels that would respond equally to increments or decrements of light at both central (open bar) and flanking regions (shaded bars). Such cells respond most strongly when the central and flanking zones are stimulated at opposite contrast polarities. These zones are assumed to be modulated sinusoidally, but are schematically shown as square waves for illustrative simplicity.
The second major result is that the responses of V4 cells to a reference stimulus may be either reduced by intra-RF inhibitions or enhanced by intra-RF facilitations across the width axis and that these effects vary independently in magnitude and spatial extent from cell to cell. There have been previous studies in V4 (Reynolds et al., 1999), MT and MST (Recanzone et al., 1997) and IT (Rolls and Tovee, 1995) in which the response to one stimulus has been suppressed by the presence of a second. Such test stimuli were bars of variable color and orientation in V4, objects in MT and MST, and faces in IT. However, our studies differ from these both in technical aspects and in motivation. Each of our paired stimuli has been tuned to reflect the properties of very local maxima. Moreover, unlike previous studies, we systematically explored the dependence of inter-stimulus interactions on distance with high spatial resolution. Furthermore and also unlike previous studies, we have tested pairs of stimuli of both like and unlike contrast polarity. All of this has been necessary because we are not simply testing the effect of a ‘distractor’ upon the response to an attended stimulus, but rather we are exploring at high resolution the second-order spatial interaction profile across the RF for the reasons given earlier.
Our third result is that V4 cells respond strongly not only to local grating patches of optimal 𝛉, SF, length and width, but also to global stimulus configurations that enhance circumferential areal summation while reducing suppressive interactions between adjacent excitatory inputs. Several electrophysiological studies (Gallant et al., 1993, 1996; Kobatake and Tanaka, 1994; Pasupathy and Connor, 1999) and human psychophysical studies (Wilson et al., 1997; Gallant et al., 2000) have suggested that V4 neurons participate in the encoding of curvature. Moreover, recent fMRI studies (Wilkinson et al., 2000) have found that concentric and radial gratings activate V4 significantly more strongly than conventional sinusoidal gratings.
Our results support these views with respect to the strong responsivity to concentric gratings, not only because we have found especially robust responses to annular stimuli, but because we can relate their response strength both to the cell's selectivity for SF and 𝛉 and to reductions in intra-RF suppression. Our annular stimuli were always presented within apertures at the preferred 𝛉. However, the relatively broad 𝛉 tuning of many V4 cells and the frequently encountered non-zero responses at non-preferred 𝛉s would make the response of the V4 cell more tolerant to segments at non-preferred 𝛉s that deviate from lines at preferred 𝛉s that together comprise angles and curves as studied by others (Pasupathy and Connor, 1999).
However, we emphasize that our results do not prove that an annulus is the optimal stimulus for end- and width-stopped neurons. Rather, the enhanced responses of these cells to annuli over those to full-field gratings support the principle that stimulus configurations that enhance areal summation while reducing the opportunity for suppressive interactions will produce especially robust responses. Thus, at least some V4 neurons may respond much more strongly to as yet unspecified global stimulus configurations than to either single annuli or to localized features that may evoke only local maxima.
Nor do our results suggest that all V4 cells are selective only to curvature. Both we and earlier workers (Kobatake and Tanaka, 1994) found cells that respond strongly to long narrow bars, and we found several cells lacking end- and width-stopping that responded well to extended rectilinear gratings. In our view, the afferent excitatory inputs set the preferred 𝛉 and size scale (SF), but the spatial selectivity of V4 neurons is sculpted by intra-RF inhibitory and facilitatory interactions of variable spatial extent and magnitude. If this is so, the sets of optimal stimuli encoded by V4 neurons within any 𝛉 and SF pairing may be extensive.
Preliminary Model of the Spatial Organization of the V4 Neuron
At the outset we acknowledge that there are not yet enough data available to formulate a fully comprehensive model of the spatial RF of the type of V4 neurons we have studied. However, any such model should accommodate our key new findings and provide an early opportunity to discern those deficiencies in existing information that must be remedied before a more complete model can be formulated. It is in this spirit that we propose a preliminary model. Nor can we know at this point how many levels of the visual system contribute to the intra-RF suppressions observed in V4. We assume that divisive inhibitions (Reichardt et al., 1983) and/or contrast gain control renormalizations (Heeger, 1992) occur at earlier cortical levels and perhaps between V4 cells with overlapping RFs as well. Consequently, we take the experimentally observed intra-RF two subfield interaction as a measure of the combined result of all feed forward, feedback and intracortical interactions. Thus, we formulate an ‘equivalent’ rather than a literal model (Fig. 10B) for the inhibitory interactions between inputs.
We then model the V4 cell as if each subfield receives direct excitatory inputs from earlier cortical levels that for any given cell are selective to common preferred 𝛉s and SFs. We further assume that adjacent inputs mutually inhibit or excite each other along width, length and oblique axes, but generally with different strengths that fall off with distance (but not necessarily uniformly so) along these axes (Fig. 10B). We also assume that suppressive and facilitatory strengths at different loci are scaled according to the strength of the response to single stimuli at these loci. This assumption is consistent with the principle that response selectivity, at least for low level form cues, would be invariant across the RF but for magnitude.
We also assume that all excitatory afferent inputs are characterized by second-order inputs with a width of 1.5 cycles and length:width aspect ratios of roughly unity (Gaska et al., 1994). We further assume that the major afferent excitatory inputs to V4 originate from V2 (Felleman and Van Essen, 1991) and we omit minor projections from V1 to V4 that may be eccentricity dependent (Zeki, 1971) and ‘notably sparse and/or inconsistent’ (Van Essen et al., 1986). For now, we also omit the weak inputs at non-preferred 𝛉s that produce the non-zero asymptotes in the 𝛉 selectivity curves of some V4 neurons.
We tentatively assume that the RF is roughly circular, with the responses of the inputs falling off according to a common Gaussian function across length and width dimensions and that the subfields of inputs along these dimensions overlap by ~50%. We assume a rectangular packing architecture, but acknowledge that other architectures, such as hexagonal packing, have not been excluded. We also assume that inputs would summate linearly until saturation occurred, were it not for divisive inhibition or gain renormalizations between inputs to adjacent subfields. We assume, following earlier work (Wilson, 1999), that the spatial spread of inhibitory interactions is generally greater than that of excitatory.
A model incorporating the key elements described above accounts at least qualitatively for our results. Long narrow bars evoke strong responses when there is minimal length stopping. Grating patches of optimal 𝛉, SF, length and width take advantage of local areal summation until surrounding inhibitory interactions win out. A concentric annulus might activate areal summation over a circular perimeter — assuming less suppression than summation along the annular perimeter — while reducing inhibitory interactions across both inner and outer borders along length or width axes. Thus, the formulation of even this preliminary model identifies a problem area where relevant physiologic data are lacking; namely, what are the summation properties for interactions to stimuli along a circular perimeter at positions where the two stimuli are neither in an axial nor a length axis collinear relationship?
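The assumptions listed above can be made concrete in a minimal numerical sketch. The code below is an illustration only, not the model of Fig. 10B. It places inputs on a rectangular lattice with a spacing of half a subfield width (about 50% overlap), weights each input by a common Gaussian envelope, and divides each input by a pool of neighbouring activity before linear summation up to a ceiling. All function names, the specific divisive form and every parameter value (`sigma_rf`, `sigma_inh`, `k_inh`, `r_max`, the annulus radii) are hypothetical choices made for illustration; none is fitted to the data reported here, and facilitatory interactions are omitted.

```python
import numpy as np


def subfield_centres(n=9, spacing=0.5):
    """Centres of n x n inputs on a rectangular lattice.

    Units are subfield widths, so spacing=0.5 gives ~50% overlap
    between neighbouring subfields (an assumed value).
    """
    offsets = (np.arange(n) - (n - 1) / 2.0) * spacing
    xx, yy = np.meshgrid(offsets, offsets)
    return np.column_stack([xx.ravel(), yy.ravel()])


def gaussian(d, sigma):
    """Unnormalised Gaussian fall-off with distance d."""
    return np.exp(-0.5 * (d / sigma) ** 2)


def model_response(stim, centres, sigma_rf=1.0, sigma_inh=1.0,
                   k_inh=1.0, r_max=np.inf):
    """Summed response of the sketch model to a stimulus mask.

    stim holds 0/1 per input: whether a grating at the preferred
    orientation and SF covers that subfield.
    """
    # Single-stimulus response falls off with a common Gaussian envelope.
    ecc = np.linalg.norm(centres, axis=1)
    drive = stim * gaussian(ecc, sigma_rf)

    # Pairwise suppressive weights fall off with inter-input distance;
    # sigma_inh exceeds the lattice spacing so suppression reaches
    # beyond immediate neighbours. No self-inhibition.
    pair_dist = np.linalg.norm(
        centres[:, None, :] - centres[None, :, :], axis=2)
    w_inh = gaussian(pair_dist, sigma_inh)
    np.fill_diagonal(w_inh, 0.0)

    # Suppression at each locus scales with the single-stimulus
    # responses at the other loci (divisive form is an assumption).
    pool = w_inh @ drive
    summed = np.sum(drive / (1.0 + k_inh * pool))

    # Linear summation until saturation at r_max.
    return float(min(summed, r_max))


def stimulus_masks(centres, r_inner=1.0, r_outer=2.0):
    """Full-field disc, central core and annulus as 0/1 masks."""
    ecc = np.linalg.norm(centres, axis=1)
    full = (ecc <= r_outer).astype(float)
    core = (ecc < r_inner).astype(float)
    return {"full": full, "core": core, "annulus": full - core}


centres = subfield_centres()
masks = stimulus_masks(centres)
responses = {name: model_response(mask, centres)
             for name, mask in masks.items()}
for name, value in responses.items():
    print(f"{name:8s} {value:.3f}")
```

In a sketch of this kind, the ordering of the full-field, core and annular responses depends on the assumed suppression strength and spatial spread, so the output should be read as a way of exploring parameter regimes rather than as a prediction. With `k_inh=0` the model reduces to pure linear summation, in which case the full-field response equals the sum of the core and annulus responses.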
Geometric considerations of RF substructures and the assumption of symmetrical interactions about the RF center suggest that V4 neurons with summation along one major axis and suppression along the other may also respond well to oval or ellipsoidal annuli elongated with respect to the axis associated with non-suppression. It also remains to be determined whether concentric or ellipsoidal annuli with multiple rings will activate neurons more strongly than single-ringed annular stimuli. If subfield interactions about the RF center are asymmetric, then the classes of stimuli that strongly activate V4 neurons should reflect corresponding asymmetries. Moreover, since V4 also projects back to V1 and V2 (Felleman and Van Essen, 1991) the possibility should also be considered that such recurrent projections enhance the contrast gain of neurons at these earlier levels sharing common preferences for 𝛉 and SF, just as corticofugal projections from V1 enhance the contrast gain of LGN neurons (Przybyszewski et al., 2000).
In conclusion, we have formulated a preliminary qualitative model of a prototypic V4 neuron based on subfield analysis that offers prospects for further experimental and theoretical refinement. The results that provide the basis for this model indicate that the afferent excitatory inputs to V4 neurons from earlier cortical levels are homogeneous with respect to 𝛉 and SF selectivity. This simplification in input specificity, at least with respect to these low-level form cues, brings a concurrent reduction in the size of the parameter space that must be explored. It thereby opens the way for predicting response properties of V4 neurons from 2-D second-order interaction profiles determined across a number of RF axes using the methods described here. In future, it may also be possible to apply 3-D (i.e. two dimensions of space and one of time) spatiotemporal reverse correlation to V4 neurons, using methods computationally analogous to those used to study complex cells in V1 (Gaska et al., 1994). Thus, in time it should be possible to determine the extent to which second-order interactions are predictive of the responses of V4 neurons to arbitrary visual stimuli. Moreover, the classes of stimuli that may be revealed by these methods to evoke the most selective and robust responses in V4 will likely be potent stimuli, either individually or in combination, to neurons in TEO and IT that receive projections from V4. Thus, the further determination of the response specificity of V4 neurons building upon the present findings may provide the opening wedge for discovery of the principles of RF organization at still higher visual cortical levels.
D.A.P. and A.W.P. received support from the Department of Neurology, University of Massachusetts Medical School. M.A.R. was supported in part by Defense Advanced Research Projects Agency and Office of Naval Research grant N00014-95-1-0409. We thank Kumadini Misra for technical assistance, Dr Robert Lew and Stephen Baker for statistical analyses and David Pollen and Dr Jian-Bin Mao for helpful discussions. We are grateful to Drs Jack Gallant, Tomaso Poggio and Hugh Wilson for reading a pre-submission draft and for their constructive suggestions. We are especially grateful to the two anonymous reviewers for their constructive comments.