A fundamental question in sensorimotor control concerns the transformation of spatial signals from the retina into eye and head motor commands required for accurate gaze shifts. Here, we investigated these transformations by identifying the spatial codes embedded in visually evoked and movement-related responses in the frontal eye fields (FEFs) during head-unrestrained gaze shifts. Monkeys made delayed gaze shifts to the remembered location of briefly presented visual stimuli, with delay serving to dissociate visual and movement responses. A statistical analysis of nonparametric model fits to response field data from 57 neurons (38 with visual and 49 with movement activities) eliminated most effector-specific, head-fixed, and space-fixed models, but confirmed the dominance of eye-centered codes observed in head-restrained studies. More importantly, the visual response encoded target location, whereas the movement response mainly encoded the final position of the imminent gaze shift (including gaze errors). This spatiotemporal distinction between target and gaze coding was present not only at the population level, but even at the single-cell level. We propose that an imperfect visual–motor transformation occurs during the brief memory interval between perception and action, and further transformations from the FEF's eye-centered gaze motor code to effector-specific codes in motor frames occur downstream in the subcortical areas.
One of the most fundamental, yet elusive, questions in sensorimotor neuroscience concerns where, and how, signals defined in sensory space become spatially tuned commands for motor effectors (Sparks 1986; Flanders et al. 1992; Andersen et al. 1993; Pouget and Snyder 2000; Wurtz et al. 2001; Kakei et al. 2003; Smith and Crawford 2005; Crawford et al. 2011). This question has proved particularly difficult to answer in the gaze control system because of the normally high spatial correlation between gaze parameters, such as target location versus gaze end point location (Platt and Glimcher 1998; Snyder 2000), retinal coordinates versus gaze displacement coordinates (Crawford and Guitton 1997; Klier et al. 2001), and (in the head-unrestrained condition) gaze, eye, and head motion (Guitton 1992; Freedman and Sparks 1997a, 1997b; Gandhi and Katnani 2011; Knight 2012). This leaves three computational questions unanswered: When and where does the spatial transformation from coding stimulus location to coding movement-related parameters occur? When and where is visual information transformed from retinal coordinates into motor coordinates? And, how and where are gaze commands split into signals that result in the coordinated movement of the end-effectors, namely, eye and head?
One way to approach these general questions is to establish the specific signals encoded in key gaze control areas. One such area is the frontal eye fields (FEF), a dorso-lateral frontal lobe structure with reciprocal projections to many striate and extrastriate cortical areas including area V4, the lateral intraparietal cortex (LIP), supplementary eye field, and prefrontal cortex (PFC). FEF also makes reciprocal connections with subcortical areas involved in rapid gaze shifts, including the superior colliculus (SC) and brainstem reticular formation (Schiller et al. 1979; Stanton et al. 1988; Dias et al. 1995; Dias and Segraves 1999; Schall et al. 1995; Sommer and Wurtz 2000; Munoz and Schall 2004). Low current stimulation of the FEF in alert cats and monkeys evokes short-latency saccades in head-restrained conditions (Robinson and Fuchs 1969; Guitton and Mandl 1978; Bruce et al. 1985), and eye–head gaze shifts in head-unrestrained conditions (Ferrier 1876; Tu and Keating 2000; Chen 2006; Knight and Fuchs 2007; Monteon et al. 2010). Like most gaze control areas, FEF neurons show responses that are time-locked to visual stimuli (visual response) and/or saccade onset (movement response; Bizzi 1968; Mohler et al. 1973; Bruce and Goldberg 1985). Furthermore, these responses are spatially selective, that is, plotting them in two-dimensional (2D) spatial coordinates often yields well-organized visual and/or movement response fields (RFs; Mohler et al. 1973; Bruce and Goldberg 1985). However, it is unknown exactly what spatial codes are embedded within these FEF responses. Specifically, does FEF visual/movement activity encode visual target locations or desired gaze end points? What frames of reference are used to represent such codes? Is the FEF further involved in dividing these signals into specific eye and head commands?
The question of target location versus gaze end point coding has been addressed in the FEF (and other oculomotor structures interconnected with the FEF) using tasks in which the saccade end point is spatially incongruent from the visual stimulus. For example, monkeys can be trained to make saccades opposite to the visual stimulus (antisaccades; Everling and Munoz 2000; Sato and Schall 2003), or in a direction rotated 90° from the target (Takeda and Funahashi 2004). Such studies suggest that FEF visual responses are generally tuned to the direction of the visual stimulus, and movement responses are tuned for saccade direction (Everling and Munoz 2000; Sato and Schall 2003; Takeda and Funahashi 2004). However, it is not certain whether these movement responses encode saccade metrics, or a spatially reversed/rotated representation of the target (Everling and Munoz 2000; Zhang and Barash 2000; Medendorp et al. 2005; Munoz and Everling 2004; Amemori and Sawaguchi 2006; Fernandez-Ruiz et al. 2007; Collins et al. 2008). Consistent with the second possibility, it has been suggested that the movement responses in the FEF may code for the saccade goal rather than the metrics of the movement (Dassonville et al. 1992). Other methods to spatially separate the target from the gaze end point involve saccadic adaptation (Frens and Van Opstal 1997; Edelman and Goldberg 2002), weakening eye muscles (Optican and Robinson 1980), or the natural variability in gaze end points relative to the target (Stanford and Sparks 1994; Platt and Glimcher 1998; DeSouza et al. 2011), but, to date, these techniques have not been applied to the FEF.
Frames of reference have been tested by recording from visual or movement RFs from several different eye positions to see which spatial frame (eye or head) yields the most spatially coherent RF, that is, with the least variability in activity for the same spatial coordinates (e.g., Jay and Sparks 1984; Avillac et al. 2005). Head-fixed FEF studies tend to support an eye-fixed coding scheme (Bruce and Goldberg 1985; Russo and Bruce 1994; Tehovnik et al. 2000). However, the spatial frames for RFs in the FEF have not been tested with the head unrestrained. This is not a trivial step because in head-unrestrained conditions more frames are discernible (eye, head, and space), torsion (rotation about the visual axis) is much more variable (Glenn and Vilis 1992; Crawford et al. 1999), and inclusion of head motion can alter the codes observed in head-unrestrained conditions (Paré and Guitton 1990; Cullen and Guitton 1997). The reference frames for gaze have also been studied by analyzing the dependence of stimulation-evoked eye movements on initial eye position. Head-restrained stimulation studies have favored eye-centered codes with minor eye position modulations (Bruce et al. 1985; Tehovnik et al. 2000). Some head-unrestrained stimulation studies also favored eye-centered movement coding for gaze (i.e., final gaze direction relative to initial eye orientation; Tu and Keating 2000; Knight and Fuchs 2007), but others have favored intermediate (eye–head-space) reference frames (Monteon et al. 2013).
Finally, the role of the FEF in coding gaze direction, as opposed to eye and/or head movement signals, also remains controversial. To date, only one study has investigated this question by recording single-unit activity in the FEF during head-unrestrained gaze shifts, using regressions between FEF movement activity along the peak “on–off” axis of each neuron's directional tuning and behavioral data obtained from 2D behavioral recordings (Knight 2012). This study confirmed the role of the FEF in the production of coordinated eye–head gaze shifts, but also suggested that the movement responses of individual FEF neurons possess separate codes for gaze, eye, and head motion. Head-unrestrained stimulation of the FEF produces different results depending on the details of stimulation site, stimulus parameters, initial eye/head orientation, and behavioral state. In brief, FEF region stimulation can produce naturally-coordinated gaze shifts (Monteon et al. 2010, 2013), saccades followed by head movements or head movement alone (Chen 2006), or different amounts of eye and head movements, depending on initial eye position (Tu and Keating 2000; Knight and Fuchs 2007), as often observed in normal behavior (Guitton and Volle 1987; Freedman and Sparks 1997a, 1997b).
In short, controversies and/or gaps in knowledge continue to exist with respect to nearly every question that has been asked about the role of FEF in spatial transformations for gaze. Furthermore, it is often difficult to cross-reference previous results because each experiment focused on a subset of these questions in a different experimental preparation. In particular, to date, there has not been a comprehensive attempt to compare all of these questions in the visual versus movement responses of FEF neurons.
In the current study, we addressed these issues by fitting spatial models corresponding to all of the options described above to FEF visual and movement responses recorded in head-unrestrained monkeys (a more detailed description of these models is provided in Fig. 4 and accompanying text). Rather than using 1D regressions, we made nonparametric fits to the visual and/or movement RFs of FEF neurons, and determined which spatial coordinates (i.e., corresponding to the possibilities discussed above) gave the most coherent representation. Importantly, these coordinates were derived from 3D behavioral recordings, where the explained variance originated from untrained variations in behavior (Keith et al. 2009; DeSouza et al. 2011). The results show that FEF visual and movement responses encode different physical parameters (i.e., target position vs. gaze position) often within the same neurons, but always in eye-centered coordinates. This suggests a role for FEF in eye-centered visual-to-motor transformations, with other spatial transformations implemented downstream.
Surgical Procedures and 3D Gaze, Eye, and Head Recordings
All protocols were in accordance with the Canadian Council on Animal Care guidelines on the use of laboratory animals and approved by the York University Animal Care Committee. The data were collected from two female Macaca mulatta monkeys. Animals were prepared for chronic electrophysiological recordings and 3D eye movement recordings. Each animal underwent surgeries described previously (Crawford et al. 1999; Klier et al. 2003). We implanted the recording chamber, which was centered in stereotaxic coordinates at 25 mm anterior for both monkeys, and 19 and 20 mm lateral for monkeys S and A, respectively. A 19-mm-diameter craniotomy, covered by the base of the chamber, allowed access to the right FEF. The recording chamber was attached over the trephination with dental acrylic. Two 5-mm-diameter scleral search coils were implanted in one eye of each animal.
During experiments, animals were seated within a primate chair modified to allow free motion of the head near the center of three mutually orthogonal magnetic fields (Crawford et al. 1999). This, in combination with the scleral coils, allowed for 3D recordings of eye (i.e., gaze) orientation (horizontal, vertical, and torsional components of eye orientation relative to space). During experiments, two orthogonal coils were also mounted on the skull to provide similar 3D recordings of head orientation in space. Other variables such as the eye orientation relative to the head, eye, and head velocities, and accelerations were calculated from these quantities (Crawford et al. 1999).
Basic Behavioral Paradigm
Visual stimuli were laser projected onto a flat screen, 80 cm away from the animal. To separately analyze visually evoked and movement-related responses in the FEF, monkeys were trained to perform a standard memory-guided gaze task, which imposes a temporal delay between target presentation and movement initiation. In this task, the animal fixated on an initial central target position for 500 ms, before a single visual stimulus was briefly flashed for 80–100 ms on the screen serving as the gaze target. After the disappearance of the gaze target, the animal maintained fixation on the initial target for 400–800 ms until it was extinguished, cueing the animal to make a gaze shift (with the head completely unrestrained) to the remembered location of the target (Fig. 1A). If the gaze shift started after the go-signal, and the final gaze position fell within the spatial acceptance window for at least 200 ms, a juice reward was given to the animal, via a tube fixed to the head. A relatively large acceptance window (∼5–10° in radius proportional to the eccentricity of the center of the target array) was set to allow for the variability of memory-guided gaze shifts (Gnadt et al. 1991; White et al. 1994), which in turn was used in our analysis (see below). Further details of initial and final target placements, and gaze/eye/head kinematics, are shown in Figure 1B,C, and described in the following sections.
We recorded extracellular activity from FEF neurons using tungsten microelectrodes (0.2–2.0 MΩ impedance, FHC Inc.). The neural activity was amplified, filtered, and stored for offline cluster separation applying principal component analysis with the Plexon MAP system. The recorded sites were confirmed to be within the low-threshold FEF (<50 µA) using microstimulation criteria defined by Bruce and Goldberg (1985) in head-restrained conditions. Every effort was made to sample evenly from the entire medio-dorsal extent of the FEF in both animals. Consistent with previous studies (Stanton et al. 1989), we found a few sites outside of the arcuate sulcus, but most of these were excluded from the analysis (Fig. 2). In most recording sessions, the search for neurons was conducted when the animal was freely scanning the environment in a lighted room with the head free to move. Once a neuron had clear and stable spiking activity, the experiment began. In the first step of the experiment, the neuron's visual and/or movement RF was characterized while monkeys made memory-guided gaze shifts from a fixed central fixation location to randomly presented targets within an array of targets (5–10° apart), covering ±40° visual angle in all directions. Once the spatial extent of visual and movement RFs was roughly characterized, in the second step, an array of gaze targets was set to cover within and just outside of the RF of the neuron. Gaze targets were typically positioned in 4 × 4 to 8 × 8 arrays (5–10° apart) depending on the size and shape of the RF. Initial fixation target positions were randomized within a square window with a width ranging from 10° to 40° proportional to the size of the RF (Fig. 1B). For most neurons with RFs that extended beyond 40°, the range of initial fixation targets was shifted from the center by up to 10° away from the RF to allow for larger retinal eccentricities.
Importantly, the variability in initial target positions helped to increase the variability in initial 3D gaze, eye, head distributions and displacements (Fig. 1C) for our analysis (see below).
Data Inclusion Criteria
We only analyzed neurons that were clearly isolated and task-modulated. This included cells with clear visually evoked and/or presaccadic movement activity (Fig. 3). Cells that only exhibited anticipatory, postsaccadic (activity starting after saccade onset), or delay activity were excluded. This stage of neuron inclusion was based on the qualitative examination of poststimulus time histogram plots of individual neuronal responses.
In addition, individual trials were excluded offline, based on three behavioral criteria. First, a spatial criterion retained all trials (irrespective of whether or not the final gaze position fell in the acceptance window during online monitoring of behavior), except those in which the final gaze position fell in the direction opposite to the gaze target or in which gaze error exceeded 2 SDs beyond the regression line of gaze error versus retinal error (gaze errors were larger for larger retinal errors). Second, a temporal criterion excluded trials in which the subject made an anticipatory gaze shift, either before or within 100 ms after the go-signal. Finally, trials in which gaze, eye, and head were not stable during the delay period were eliminated. Given that in head-unrestrained conditions the eye and the head can move despite stable gaze on the fixation target (vestibulo-ocular reflex, VOR), the few trials (<3%) in each session in which the head was clearly moving (velocity >10°/s) during the delay period were excluded. After applying all of these criteria, on average, 221 (median = 198; SD = 132; min. = 59; max. = 657) trials per neuron were used for analysis.
Sampling Windows for Visual and Movement Activity Analysis
The “visual epoch” was defined as a fixed temporal window of 80–180 ms after target onset, corresponding to the early stages of sensory processing (i.e., the visual transient; Fig. 3, left column). The “movement epoch” was defined as a 100-ms perisaccadic period, ranging from −50 to +50 ms relative to gaze onset (Fig. 3, right column). This fixed window was chosen because it contained the high-frequency perisaccadic burst ramping up to the peak of movement activity (Fig. 3) and therefore (1) provided a good signal-to-noise ratio for our analysis method, and (2) most likely represented the period in which FEF activity influenced gaze shifts (the saccadic component of our gaze shifts on average lasted 140 ms and it takes about 20–30 ms for FEF signals to reach eye muscles; Hanes and Schall 1996). However, the full movement burst of the neurons in our sample on average started from 98 ms before saccade and lasted 85 ms after the end of the saccade, well into the VOR/head movement period. Therefore, we also did our analysis using the full movement burst to fully test the possibility that the movement signal was coding for head movement. The full-burst window was selected from the time point at which the spike density profile started to ramp up (at the inflection point on the spike density plot) till the time point at which the activity subsided to its lowest level (Figs 3, right column, and 9A,F).
Neuron Classification and Calculation of Visuomovement Index
Since some of our trials involved eccentric positions outside of visual/movement RF, and since we did not know a priori which spatial model to use for measuring the RF hot-spot activity, we characterized our neurons based on trials that showed the top 10% of activity in the fixed sampling windows, which we called “Vis10%” for the visual response and “Mov10%” for the movement response (red traces in Fig. 3). This roughly corresponds to trials toward the peak of the RF when represented in the correct spatial model. A neuron was considered to have either a visual or movement response (or both) if the Vis10% or Mov10% activity exceeded the firing rate in the 100-ms pretarget baseline by at least 25 spikes/s.
To quantify the relative strength of visually evoked versus gaze movement-related activity for some analyses, we calculated a visuomovement index (VMI) for each neuron as the difference between Mov10% and Vis10% divided by their sum, after subtracting off their trial-matched baseline activity. In the rare case where the pretarget baseline activity exceeded either Vis10% or Mov10%, a value of 0 was assigned. Thus, VMI was bounded between −1 (pure vision) and +1 (pure movement).
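The index can be illustrated with a minimal sketch. The sign convention (movement minus vision) is inferred from the stated bounds of −1 for pure vision and +1 for pure movement. The function name, the flooring of each baseline-subtracted response at 0, and the handling of the case where both responses are 0 are assumptions made for illustration, not details given in the text.

```python
def visuomovement_index(vis10, mov10, vis_baseline, mov_baseline):
    """Return a VMI between -1 (pure vision) and +1 (pure movement).

    vis10, mov10: mean firing rates (spikes/s) of the top-10% trials.
    vis_baseline, mov_baseline: trial-matched pretarget baseline rates.
    """
    # Assumption: a response that falls below its baseline is set to 0.
    vis = max(vis10 - vis_baseline, 0.0)
    mov = max(mov10 - mov_baseline, 0.0)
    if vis + mov == 0.0:
        # Assumption: the index is undefined when neither response
        # exceeds baseline (such neurons would not be classified anyway).
        return float("nan")
    return (mov - vis) / (mov + vis)
```

For example, a neuron with equal baseline-subtracted visual and movement responses yields an index of 0 under these assumptions.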
Sampling Gaze, Eye, and Head Positions for Analysis
Eye and head orientations (relative to space) were recorded with a sampling rate of 1000 Hz, and other variables such as the eye orientation relative to the head and eye and head velocities were calculated from these quantities. For movement analysis, the onset of gaze (movement of the eye in space) was marked at the time when gaze velocity exceeded 50°/s, and gaze offset was marked as the time point when velocity declined below 30°/s. Head movement was marked from the onset of gaze until the time point at which head velocity declined below 15°/s. For trials in which the head velocity never exceeded 15°/s, the head position was sampled at the time of the gaze marks.
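The velocity criteria above can be sketched as follows. This is a simplified illustration rather than the marking routine actually used: it assumes velocity traces are lists of speeds in °/s with one sample per millisecond (1000 Hz), and the function names and the treatment of traces that never cross a threshold are hypothetical.

```python
def mark_gaze_shift(gaze_velocity, onset_thresh=50.0, offset_thresh=30.0):
    """Return (onset, offset) sample indices of the gaze shift, or None."""
    # Onset: first sample at which gaze velocity exceeds 50 deg/s.
    onset = next((i for i, v in enumerate(gaze_velocity) if v > onset_thresh), None)
    if onset is None:
        return None
    # Offset: first later sample at which velocity declines below 30 deg/s.
    offset = next((i for i in range(onset, len(gaze_velocity))
                   if gaze_velocity[i] < offset_thresh), None)
    if offset is None:
        return None
    return onset, offset


def mark_head_end(head_velocity, gaze_onset, gaze_offset, thresh=15.0):
    """Return the sample index at which head position is taken."""
    # If the head never exceeds 15 deg/s, sample at the gaze mark instead.
    if max(head_velocity[gaze_onset:]) <= thresh:
        return gaze_offset
    moving = False
    for i in range(gaze_onset, len(head_velocity)):
        if head_velocity[i] > thresh:
            moving = True
        elif moving and head_velocity[i] < thresh:
            return i  # head velocity has declined below threshold
    # Assumption: if the head is still moving at the end of the trace,
    # use the last available sample.
    return len(head_velocity) - 1
```

Because the sketch assumes a 1000-Hz sampling rate, the returned indices can be read directly as milliseconds from the start of the trace.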
Canonical Spatial Models Considered in This Study
Figure 4A,C graphically illustrates how we derived the 11 “canonical” models tested in this study. These models provide a formal means for testing between target versus gaze position coding and gaze versus eye versus head displacement/position, with each expressed in several possible frames of reference. Figure 4A shows the different spatial parameters involved in a memory-guided gaze shift. Most importantly, these include Target position (T) and final Gaze position (G). In our preparation, T and G could be expressed in three different reference frames, that is, relative to initial eye (e), head (h), or space/body (s) coordinates (Fig. 4B), resulting in six possible “Target” (Te, Th, and Ts) and “Gaze” (Ge, Gh, and Gs) models, as illustrated in Figure 4C. Other possible effector-specific “Displacement” or “Position” codes in our preparation include dG (final − initial gaze position in space coordinates), eye displacement (dE: final − initial eye orientation in head coordinates), dH (final − initial head orientation in space coordinates), final eye position in head coordinates (Eh), and final head position in space coordinates (Hs). The eye models were based on eye positions sampled at the end of the gaze shift (and thus did not include the VOR phase), whereas the head models included the entire head movement.
Note that some of these models are identical or linearly summate in a 1D analysis, but these mathematical relationships become more complex in 3D, head-unrestrained gaze shifts where one must account for both torsional variations in position (Fig. 1C) and the noncommutativity of rotations (Tweed and Vilis 1987; Crawford et al. 1999; Martinez-Trujillo et al. 2004; Keith et al. 2009). For example, in 3D, Te (i.e., target in eye coordinates) and Ge (i.e., final gaze position in eye coordinates) are computed by rotating space-fixed vectors by the inverse of 3D eye orientation in space, rather than by subtraction. Nevertheless, some of our models made similar predictions; for example, Ge and dG are nearly identical for gaze shifts up to 30° (Crawford and Guitton 1997; see Discussion), and all were interrelated in some way, so we obtained the largest dataset that we could for each neuron and employed the most powerful statistical approach that we could find to discriminate these models (Keith et al. 2009).
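The rotation-based transformation into eye coordinates, and the noncommutativity that makes simple subtraction inadequate in 3D, can be illustrated with unit quaternions. This is a minimal sketch under assumed conventions: the axis layout, the (w, x, y, z) quaternion ordering, and all function names are illustrative choices and are not taken from the cited methods papers.

```python
import math


def quat_from_axis_angle(axis, angle_deg):
    """Unit quaternion (w, x, y, z) for a rotation about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half) / norm
    return (math.cos(half), ax * s, ay * s, az * s)


def quat_mul(a, b):
    """Hamilton product a*b; in general a*b differs from b*a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q."""
    p = (0.0, v[0], v[1], v[2])
    _, x, y, z = quat_mul(quat_mul(q, p), quat_conj(q))
    return (x, y, z)


def to_eye_frame(eye_in_space, v_space):
    """Express a space-fixed vector in eye coordinates by applying
    the inverse of the 3D eye-in-space orientation."""
    return rotate(quat_conj(eye_in_space), v_space)
```

With the eye rotated 90° about the third axis, a target lying along the rotated line of sight maps onto the eye's forward axis, as expected for an eye-centered code. Swapping the order of two such rotations changes the product, which is why these quantities cannot be obtained by subtracting angles.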
Intermediate Spatial Models
It has been suggested that visual–motor transformations may occur across neurons and through stages that involve intermediate (or hybrid) frames of reference (Stricanne et al. 1996; Avillac et al. 2005; Mullette-Gillman et al. 2005; Snyder 2005). Therefore, in addition to the 11 canonical spatial models described above, we also considered the possibility for FEF neurons coding for intermediate spatial models. Figure 4D provides a visualization of intermediate frames of reference [for a mathematical description of how these models were derived, see Keith et al. (2009)]. It shows the intermediate models between the eye-centered and head-centered frames with 9 intermediary frames of reference between the two canonical models, Te and Th (Fig. 4D, gray-dotted axes between eye and head frames), and two (of the 10) additional steps on either side beyond the canonical models (Fig. 4D, yellow-dotted axes). These additional steps were included (1) to allow for the possibility that individual neurons might encode such abstract spatial codes outside the canonical range (Pouget and Snyder 2000; Blohm et al. 2009), and (2) to avoid misleading edge effects where the preferred spatial models might incorrectly cluster at the canonical models. Just as for Te and Th models in which the activity of the neuron is plotted on all trial-matched target positions relative to eye and head, respectively, in each intermediate model, the activity profile of the neuron is plotted on all trial-matched positions corresponding to that intermediate model (for 50–50 hybrid model, the position relative to the black axis in Fig. 4D is used for the presented trial). Similar intermediate spatial models were calculated for each pair of target models (Te–Th shown in Fig. 4D, Th–Ts, Te–Ts; Fig. 10A,E), gaze models (Ge–Gs, Gs–Gh, Gh–Ge; Fig. 10B,F), displacement models (dG–dE, dE–dH, dH–dG; Fig. 10C,G), and position models (Gs–Eh, Eh–Hs, Hs–Gs; Fig. 10D,H). 
Each pair is depicted as one of the sides of the triangular representations in Figure 10. It is important to note that, unlike the target- and gaze-related intermediate models, which describe intermediate frames of reference, the intermediate models between displacement and position models (Fig. 10C,D,G,H) are rather abstract, do not have a physically intuitive description, and have not been proposed before; nevertheless, we tested them here for the sake of completeness.
In addition to the intermediate model continua described above, we extended our analysis to a continuum between the eye-centered target and gaze models: Te and Ge. Models along this continuum represented intermediate spatial models (constructed in the same way as the intermediate models described above) between target and gaze models in eye-centered coordinates (see Results; Fig. 10). For instance, the RF model midway between Te and Ge was derived by plotting the trial-matched mid-points between target and gaze positions, as measured from our behavioral data and transformed into eye-centered coordinates.
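As an illustration of the Te–Ge continuum, the position used for each trial can be written as a weighted combination of that trial's target and gaze positions in eye coordinates. The text specifies only the midpoint case, so the linear interpolation for other steps is an assumption, and the function name and the weighting parameter `alpha` are hypothetical.

```python
def continuum_position(target_e, gaze_e, alpha):
    """Position along the Te-Ge continuum for a single trial.

    target_e, gaze_e: (horizontal, vertical) positions in eye coordinates.
    alpha: 0 gives the target model (Te), 1 gives the gaze model (Ge),
           0.5 gives the trial-matched midpoint.
    Assumption: intermediate steps are linear interpolations.
    """
    return tuple(t + alpha * (g - t) for t, g in zip(target_e, gaze_e))
```

A neuron's RF for a given step would then be plotted over these positions across all trials, with values of `alpha` below 0 or above 1 extending the continuum beyond the two canonical models.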
Experimental Basis for Distinguishing the Models
To test between the models described above, they must be spatially separable, and this must be reflected somehow in neural activity. In our experiment, spatial separation of the models was provided by natural (i.e., untrained) variations in the monkeys' gaze behavior. For example, the natural variability in accuracy of gaze shifts, especially in memory-guided movements (Fig. 1B), allowed us to distinguish between target coding and coding for final gaze position. The variable contributions of eye and head movement to the gaze shift (Fig. 1C, right panels) allowed us to distinguish between effector-specific parameters (both displacement and final positions), and the distribution of initial 3D gaze, eye, and head orientations allowed us to distinguish between different egocentric frames of reference (although we could not distinguish between the body and space in our body-fixed experimental preparation). The models described in the previous section were computed from target locations and 3D coil signals using mathematical conventions that have been described previously (Tweed and Vilis 1987; Crawford et al. 1999; Martinez-Trujillo et al. 2004; Keith et al. 2009). We then assumed (as in most previous studies) that this spatial variability would be reflected in different neural activity if one plots that activity against the correct spatial parameters (Jay and Sparks 1984; Avillac et al. 2005). The logic behind this approach, and the conventions we used to illustrate neural activity from individual trials, are schematically illustrated in Figure 4E. A target (red dot) at the hot spot of a hypothetical neuron's RF is shown in the left panel, surrounded by 9 hypothetical gaze end points (black/gray dots). Corresponding neural responses (firing rate represented by the size of the circle) are shown in the table to the right, plotted relative to target position (upper row) or relative to final gaze position (lower row).
If a neuron is coding for this target location (left column), it would give the same response for each trial and these responses would coherently align in target coordinates (Fig. 4E; upper-left table cell), but would spread apart in gaze coordinates (Fig. 4E; lower-left table cell). If the neuron coded for gaze location, it would produce a different response for each trial, resulting in different (i.e., “spatially incoherent”) responses when plotted in target coordinates (Fig. 4E; upper-right table cell) but a graded hill-like RF when plotted in gaze coordinates (Fig. 4E; lower-right table cell). If a variety of different target positions and gaze end points were illustrated, these would yield four different RFs, with coherent maps in the upper-left and lower-right cells and incoherent maps in the other cells. Similar schematics can be constructed for any of the models considered here, with the prediction that one of these would yield the most coherent RF for the corresponding spatial code. Next, we describe a formal method for testing this on real data.
Spatial Model Analysis for Single Neurons
Our method was an extension of the schematic shown in Figure 4E: We plotted visual and movement RFs from our neurons in the spatial coordinates of each of the canonical (and intermediate) models tested, positioning the neural responses according to the spatial coordinates of the corresponding behavioral data. For visual RF mapping, we used eye and head orientations taken at the time of visual stimulus presentation, whereas for movement RF mapping we used behavioral measurements taken at the start of the gaze shift. (Actual examples of such plots are shown in Figs 5, 7, and 9 of the Results section.) We then computed residuals between the data points and the model fit using Predicted Residual Sum of Squares (PRESS) statistics (described below), and compared the residuals to determine which model provided the best overall fit (i.e., the best RF representation). The detailed steps of this analysis follow.
Step 1: Nonparametric Fitting of the RFs
Since we did not know a priori the shape of the RF, we used nonparametric fitting to fit the data points. Since the size of the RF was not known and the spatial distribution of the sampled data points was different for different spatial models (e.g., smaller range for head models as opposed to target/gaze models), the nonparametric fits were obtained using Gaussian kernels with different sizes ranging from 2° to 15° bandwidths (14 different fits obtained for each model). This ensured that we were not biasing our fits in favor of a particular size and spatial distribution. Spatial models with a smaller spread of positions (e.g., head models) would be fitted better using smaller kernels in comparison with spatial models with a larger spread of positions. Furthermore, by virtue of employing a nonparametric fitting approach, the analysis was relatively insensitive to unusual biases and spatial distributions in either behavioral or neural data, or to differences in uniformity of spatial sampling of these data in different coordinate frames [see Keith et al. (2009) for further explanation], though in our dataset most RF representations had a relatively continuous spread of data points due to the variability in our behavioral paradigm.
Step 2: Calculating PRESS Residuals
Once the fits were made to activity profile distributed in different models, the quality of the fit in each model (at all kernel bandwidths) was quantified using PRESS statistics, which is a form of cross-validation (Keith et al. 2009; DeSouza et al. 2011). In short, the PRESS residual for each trial was obtained by removing that trial's data point from the data set, obtaining a fit using the remaining data points, and then taking the residual between the fit and the data point. The model (at the kernel bandwidth) that yielded the smallest mean PRESS residuals (which we referred to as the “best-fit model”) was identified as the best candidate for the neuron's spatial coding scheme. The assumption here was that if a neuron's activity is represented in a model based on the neuron's intrinsic code, the RF should be spatially more coherent than if represented in any other spatial model.
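Steps 1 and 2 can be sketched as a leave-one-out procedure. The sketch assumes a Gaussian-kernel weighted average (a Nadaraya–Watson estimator) as the nonparametric fit and the absolute difference as the residual; the exact estimator and residual definition are given in Keith et al. (2009) and may differ. Function names and the data layout (a dictionary mapping each model name to a list of trial-matched 2D positions) are hypothetical.

```python
import math


def kernel_fit(points, rates, query, bandwidth):
    """Gaussian-kernel weighted average of firing rates at a 2D position."""
    num = den = 0.0
    for (x, y), r in zip(points, rates):
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += w * r
        den += w
    return num / den if den > 0.0 else float("nan")


def press_residuals(points, rates, bandwidth):
    """Leave-one-out residual for every trial (lists, one entry per trial)."""
    residuals = []
    for i in range(len(points)):
        # Remove trial i, fit the remaining trials, predict trial i.
        other_points = points[:i] + points[i + 1:]
        other_rates = rates[:i] + rates[i + 1:]
        predicted = kernel_fit(other_points, other_rates, points[i], bandwidth)
        residuals.append(abs(rates[i] - predicted))
    return residuals


def best_fit_model(models, rates, bandwidths=range(2, 16)):
    """Return (model name, bandwidth, mean PRESS) with the smallest mean PRESS.

    models: dict of model name -> list of (x, y) positions, where the same
            trials (and firing rates) are positioned in each model's coordinates.
    """
    best = None
    for name, points in models.items():
        for bw in bandwidths:  # 2 to 15 degrees, 14 fits per model
            res = press_residuals(points, rates, bw)
            score = sum(res) / len(res)
            if math.isnan(score):
                continue
            if best is None or score < best[2]:
                best = (name, bw, score)
    return best
```

In this sketch, a model in which the firing rates form a spatially coherent hill yields smaller leave-one-out residuals than a model in which the same rates are scattered incoherently, which is the property the analysis relies on.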
Step 3: Comparison Between Different Spatial Models
Once the best-fit model was identified, its mean PRESS residuals were then statistically compared with the residuals for other spatial models fitted at the same kernel bandwidth using a two-tailed Brown–Forsythe test [see Keith et al. (2009)]. Spatial models that had significantly higher mean PRESS residuals compared with the best-fit model were excluded as candidate coding schemes for that neuron. Similar procedures were used to test for best-fits along the intermediate model continua (Fig. 4D) generated for individual neurons.
For population analysis, we did a statistical comparison of the mean PRESS residuals across the entire neuronal population for different spatial models. For each neuron, PRESS residuals were normalized such that one of the models (here we took Th) had a mean PRESS of 1, so this way the relative goodness of fit between spatial models was preserved across all neurons (DeSouza et al. 2011). Although for some neurons more trials were used for RF analysis, we assigned an equal weight for each neuron for our population analysis, as we did not want the population results to be skewed in favor of neurons with a higher number of trials. As for the single-neuron analysis, the spatial model with the smallest population mean PRESS residuals was the best candidate spatial model describing the population activity. A two-tailed Brown–Forsythe test was performed between the population mean PRESS residuals for this model and the population mean PRESS residuals from other models, and any model with significantly higher mean PRESS residuals (P<0.05) was excluded as a candidate coding scheme for the population activity. Population analysis for the intermediate frames was done in a similar fashion.
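The normalization and equal weighting can be sketched as follows. The function names and the numerical values are hypothetical; only the rule itself (scale each neuron so that Th has a mean PRESS of 1, then average across neurons with equal weight) is taken from the description above.

```python
def normalize_press(mean_press_by_model, reference="Th"):
    """Scale one neuron's mean PRESS values so the reference model equals 1."""
    ref = mean_press_by_model[reference]
    return {model: value / ref for model, value in mean_press_by_model.items()}


def population_mean_press(neurons, reference="Th"):
    """Average normalized mean PRESS across neurons, one equal vote per neuron.

    neurons : list of dicts, each mapping model name -> that neuron's mean PRESS
    """
    normalized = [normalize_press(neuron, reference) for neuron in neurons]
    models = normalized[0].keys()
    return {
        model: sum(n[model] for n in normalized) / len(normalized)
        for model in models
    }


# Hypothetical values: the second neuron has residuals that are ten times
# larger in absolute terms, but contributes with the same weight.
example_neurons = [
    {"Te": 2.0, "Th": 4.0},
    {"Te": 30.0, "Th": 40.0},
]
population = population_mean_press(example_neurons)
```

Dividing by the reference model preserves the ratio between models within each neuron, so a neuron with many trials or high firing rates does not dominate the population mean.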
We recorded from over 150 sites within the FEF of two rhesus macaques during head-unrestrained gaze shifts. Of these, 64 task-related neurons showed good isolation, and were confirmed to be in FEF using previously established head-restrained stimulation criteria (Bruce and Goldberg 1985). Of those, 57 met all of our criteria for analysis: 8 were classified as visual (V; Fig. 3A), 30 as visuomovement (VM; Fig. 3B), and 19 as movement (M; Fig. 3C) neurons. Figure 2 illustrates the anatomic extent of our included sites (filled circles), corresponding head-restrained saccade vectors evoked by head-fixed stimulation of these sites (arrows), other sites also identified as FEF through recording/stimulation (open circles), and the remaining sites that we explored (cross). Similar to previous studies, we found that our stimulation-confirmed FEF sites fell along an arc corresponding to the shape and stereotaxic coordinates of the arcuate sulcus. In the majority of stimulation sites, the evoked saccades resembled a fixed vector to the contralateral side relative to the point of fixation. Stimulation at the most lateral sites (“small-saccade FEF”) evoked saccades as small as 2°, whereas stimulation at the most medial sites (“large-saccade FEF”) evoked saccades as large as 20–25°, which would likely correspond to much larger gaze shifts in the head-unrestrained condition (Martinez-Trujillo et al. 2004; Monteon et al. 2010). Also, as shown previously, we found that neurons on the lateral end of the FEF typically had small, bound (closed) RFs and those on the medial end of the FEF typically had large, unbound (open) RFs. Of the 38 neurons with visual responses, 23 had closed RFs and 15 had open RFs. Movement RFs were generally broader than visual RFs even within single VM neurons. Of the 49 neurons with movement responses, 30 had open RFs and 19 had closed RFs (15/19 M cells had open RFs).
We separately analyzed visual and movement responses. For visual analysis, we fitted activity during the visual epoch (Fig. 3, blue window, on the left) and for movement analysis, we fitted activity during the movement epoch (Fig. 3, blue window, on the right) and full movement burst (Fig. 3, vertical lines, right column). Since we have variable initial gaze positions in our experimental paradigm, we also tested for gaze position-dependent modulation (i.e., gain field) effects so we could remove them before performing our residual analysis. However, in this study, we did not find significant gain field effects unlike the only other study that used this method in the SC (DeSouza et al. 2011). The following sections describe our results first for the visual, and then movement fits.
Analysis of Canonical Models
The RF analysis for an example V neuron is depicted in Figure 5. As described in Methods, the activity profile of the neuron (spike count in sampling window; Fig. 5A) was plotted in all 11 canonical representations and then fitted using different Gaussian kernels ranging from 2° to 15° bandwidths, and the quality of fit in each representation (at each kernel bandwidth) was quantified using PRESS residuals (Fig. 5B). For this neuron, the lowest PRESS residuals were obtained when the activity profile was distributed across target positions in eye-centered coordinates (i.e., Te), fitted with a Gaussian kernel of 4° bandwidth. Therefore, Te was the best-fit model for this neuron. Statistical testing (Brown–Forsythe test) between the PRESS residuals of Te and PRESS residuals of all remaining models at this kernel bandwidth (P-values shown in Fig. 5C) showed that all remaining models have significantly higher PRESS residuals compared with Te, leaving Te as the only candidate coding scheme and reference frame for this neuron.
It is also possible to visualize these trends intuitively. The RF plots of this V neuron are shown in three of the 11 representations: Ts, Te, and Ge (Fig. 5D–F). In Ts model, the activity profile of the neuron (firing rate in the sampling window for each trial is represented by size of circle) is distributed on a map determined by the angular direction of the targets as appeared on the screen (i.e., space-centered). The color field represents the nonparametric fits made to these data for the optimal kernel bandwidth. Note that the Ts model provides a rather poor description of variability in neuronal activity as indicated by a high degree of activity variability (circle size) for a given point on the map (i.e., low coherence), and also by the relatively large size of the residuals shown at the bottom of the panel (Fig. 5D, similar to Fig. 4E, top-right cell in the right panel). In contrast, when the RF was represented in its eye-centered counterpart (Te; best-fit model), like-sized circles clustered together (i.e., high coherence) and the residuals were much smaller (Fig. 5E). Note that Ts and Te are both spatial models based on target position and only differ in their frame of reference (eye vs. space). Putting the data in the “correct” frame of reference was not enough to obtain these results: for example, when the same data are mapped according to the Ge model (Fig. 5F), which is an eye-centered map based on gaze end points, the RF again becomes incoherent and the residuals are higher. Thus, the optimal RF map of a neuron is only obtained when the correct spatial code (e.g., target as opposed to gaze) and the correct frame of reference (e.g., eye as opposed to space frame) are used.
Figure 6A shows these results for all 8 of our V neurons, showing the percent of neurons with only a particular model as the sole candidate for the spatial code (black), neurons for which a particular model was the best model but at least one other model was not significantly ruled out (red), neurons for which a particular model was not preferred but was also not significantly excluded (yellow), and neurons for which a particular model was statistically eliminated (gray). As one can see, most V neurons showed a preference for Te, and in two (25%) of these neurons all models including eye-centered gaze models, dG and Ge, which are spatially similar to Te, were significantly eliminated. In another two of the 8 neurons, eye (in head) displacement (dE) or Ge was preferred, though Te remained as the candidate coding scheme. Therefore, in our visual population, there was a relatively strong preference for Te when compared with other models while the head-related models (dH and Hs) and Gs were eliminated for most neurons even at the individual-neuron level.
Figure 7 illustrates our analysis for an example VM neuron using the same conventions as in Figure 5, except this time only showing RF maps in the original spatial target (Ts) frame (Fig. 7D) and the model that provided the best fit (Fig. 7E). The representation that yielded the best fit (i.e., smallest PRESS residuals) was the Te model fitted with a Gaussian kernel of 3° bandwidth, but for this neuron the two eye-centered gaze models (Ge and dG) were not statistically ruled out (Fig. 7C). Te was the best-fit model for most (23/30) of our VM neurons, but for only one of these neurons were all other models eliminated. A few (7/30) neurons preferred other models that were spatially similar to Te (i.e., dE, Ge, dG, and Th), but the remaining models (dH, Hs, Eh, Gs, Gh, and Ts) were never preferred (Fig. 6B).
The results reported so far are for single neurons; however, it is important to know how neurons behave as a population. For visual population analysis, V and VM populations were separately analyzed (Fig. 8A,B). In both populations, Te was the best model describing population activity as the population mean PRESS residuals were the lowest for this model. However, more movement codes (Ge and dG) were significantly ruled out in the V compared with the VM population. Due to similarity in the overall trend between the two visual populations, we also combined them for the statistical analysis illustrated in Figure 8C. This analysis confirmed the preference for Te that was observed in many individual neurons. All other models were significantly excluded as candidate coding schemes for the visual population (P < 10−4, Fig. 8C), with the exception of the two eye-centered gaze models (Ge and dG, P = 0.075 and 0.051, respectively, Brown–Forsythe test). Thus, we have eliminated space-centered and head-centered models as well as eye or head movement-related models for the visual response.
Movement activity was quantified using both a neuron-specific window that included the full movement-related burst as well as a fixed temporal window (−50 to +50 ms relative to gaze saccade onset; rationale for this window described in Methods). Since (as we shall see) the full burst sometimes provided better separation between models, but otherwise both analyses yielded very similar results, the full burst was used as our default (i.e., in Figs 6 and 9–11).
Figure 9 shows the RF analysis for two example movement responses, one VM neuron (with a small and closed RF) and one M neuron (with large and open RF), using the same conventions used in Figure 5. For the VM neuron, the best-fit model was Ge fitted with a Gaussian kernel of 4° bandwidth. Once again, in the Ts plot, which shows the activity profile distributed over targets as appeared on the screen (i.e., space coordinates), there is a huge variability in neural activity for a given location for both neurons. But this time, unlike the visual response examples, the neuron's movement activity was best described by Ge: a model based on final gaze positions relative to the initial 3D eye orientation (i.e., eye coordinates; Fig. 9E). Statistical comparison between the best model (Ge) and all remaining models (Brown–Forsythe test) for the VM neuron (Fig. 9C) eliminated most models as candidate coding schemes (P < 10−5), with the exception of Te, dE, and dG. The dG RF plot (not shown) looked very similar to the Ge plot shown in Figure 9E, whereas the others diverged commensurate with their statistical ranking.
This example neuron was representative of most of the movement responses in our 30 VM neurons (Fig. 6C), which showed a distribution of preferences mainly among the eye-centered gaze (dG and Ge), and target (Te) models. Occasionally, other models were preferred, but this preference was never significantly greater than the gaze-related models. In some cases, Ge and dG were the only two candidate models, but these two models could not be separated from each other. The head-related models (dH and Hs), space-centered models (Gs and Ts), and head-centered target model (Th) were significantly eliminated in most neurons.
Similar trends were observed for M neurons, which often showed large open RFs. Figure 9F–J shows the RF analysis for an M neuron with such a field (incidentally, the behavioral data corresponding to this neuron are presented in Fig. 1B,C). Once again, the representation resulting in the lowest PRESS residuals and the most coherent RF map was the Ge representation (fitted with a Gaussian kernel of 5° bandwidth), though Te and dG (which both were very similar to Ge) remained as candidate coding schemes. In contrast, head-centered and space-centered models, as well as eye and head movement-related models, were eliminated as candidate coding schemes for this neuron (Fig. 9H). Across our 19 M neurons (Fig. 6D), dG, Ge, and Te were most preferred, whereas the head models (dH and Hs) were eliminated in most neurons (even with the prolonged burst accompanying the full head movement included in the analysis). Importantly, the gaze models (Ge and dG) were never excluded for any movement response even if these models did not yield the best fit.
When the two movement populations were pooled together (Fig. 8F), Ge and dG provided the best fits, whereas all other models (with the exception of Te and dE; P = 0.17 and 0.051, respectively, Brown–Forsythe test) were eliminated. Separate population analysis of VM and M neurons also provided very similar results (Fig. 8D,E). Similar to the visual population, all head- and space-centered models, as well as dH, and effector position models were significantly ruled out (P < 0.0001, Brown–Forsythe test). It is noteworthy that we obtained essentially the same results using the fixed window in our movement epoch (−50 to +50 ms relative to gaze onset) with head models based on head movement during the gaze saccade (Fig. 8D–F, gray open circles) and the full-burst window with head models based on full head movement (Fig. 8D–F, black circles).
Several other variations of the analysis were attempted. We categorized our neurons based on whether they had open or closed movement RFs, but did not find any notable difference between these subpopulations. We also repeated our analysis for each neuron only on the subset of trials in which the head contribution to gaze was at least 2° visual angle. This served to account for the possibility that some cells may exhibit different spatial codes depending on whether the head contributes to the gaze shift or not. In this analysis, once again Ge and dG were among the best models while head- and space-centered models, and head-related models (dH and Hs), were among the poorest models both at the single-neuron and population levels (results not presented).
Intermediate Spatial Models
So far, we have only tested between the 11 canonical spatial models described above. However, we also performed an intermediate model analysis to account for the possibility of spatial coding in intermediate frames of reference. Figure 10 depicts the distribution of the best-fit intermediate models (denoted by circles with diameter corresponding to the neuron count) for visual (Fig. 10, left column) and movement (Fig. 10, right column) activities across the tested intermediate models (see Methods for description of these models).
The results from this analysis revealed that there was a tight clustering of best-fit models (i.e., circles) around the Te model for visual activity, irrespective of neuron type (i.e., V vs. VM). Specifically, in 32/38 RFs, the intermediate models spatially closest to Te were the best-fit (Fig. 10A). The overall best-fit model for the visual population (i.e., the model giving rise to the lowest overall residuals) was located at an intermediate model near Te (Fig. 10A; purple square). None of the other canonical spatial models were contained within the 95% confidence interval (Fig. 10A–D, yellow highlights). However, there were several “outliers” from this range: some neurons showed their best-fit model closer to other canonical models, and some had best-fit model that was drawn away from head- and space-centered models even more than Te (so the best-fit model fell beyond Te away from Th and Ts; Fig. 10A). This is thought to arise when behavior is determined by the overall balance between members of the neuronal population, rather than individual neurons (Pouget and Snyder 2000; Blohm et al. 2009).
The movement responses did not show as tight clustering as the visual responses, showing a confidence interval spread across several of the intermediate frame continua that we constructed (Fig. 10E–G), and again with some individual outliers placed either in other continua or beyond these intermediate continua. But, in contrast to visual responses, the majority of movement responses had their best RF representation (i.e., best fit) among the gaze-related intermediate models and largely clustered around the Ge model (Fig. 10F). Some neurons, however, had their best-fit model around Te (Fig. 10E) or along the dG–dE continuum (Fig. 10G). Neurons with best-fit models near dG tended to be shifted toward dE, most likely because of the high degree of similarity between dE and dG in gaze shifts from a central range (Freedman and Sparks 1997a). VM and M neurons had a similar distribution of best-fits along these continua. In confirmation of the aforementioned results for canonical models, none of the neurons had their best-fit around dH or any effector position model (Fig. 10G,H). The overall best-fit for the movement population fell around Ge, and head- and space-centered canonical models, as well as dH and effector position models, were not contained within the confidence interval (i.e., these were significantly ruled out).
In summary, our intermediate model analysis showed that, despite variability in the position of the best-fit models within the population, they are not distributed haphazardly throughout this map (in Fig. 10) but are rather clustered around the eye-centered canonical models, namely Te or Ge/dG in agreement with the population analysis shown for the canonical spatial models (Figs 6 and 8). Importantly, there is an overall shift from Te clustering in the visual response (Fig. 10A) toward clustering about the eye-centered models derived from final measured gaze position (Ge and dG) in the movement response (Fig. 10F,G). These analyses suggest that (1) all neuronal populations in the FEF show a clear preference for an eye-centered code and (2) visual and movement responses are not only temporally locked with the stimulus and gaze shift (by definition), they also show different spatial codes; specifically, target versus gaze coding.
Target–Gaze Continuum Analysis
To summarize the results so far, most of the candidate spatial models (e.g., all models involving head control and all head- and space-centered models) have been eliminated. The dominant models at the population level—Te for the visual response (Fig. 8C) and Ge/dG for the movement response (Fig. 8F)—were all eye-centered and suggest a shift from target coding to gaze coding in the visual-to-movement responses. However, we have not yet demonstrated this distinction at a statistical level, or examined whether it also emerges at the level of individual neurons with both visual and movement responses. Also we wished to investigate whether neurons in the population exhibit a graded, as opposed to bimodal, preference for target versus gaze coding. To address these questions, we elaborated our intermediate model analysis to include a new continuum between (and beyond) target and gaze models (Fig. 11). We chose Te to represent target models because it was the clear “front runner” for the visual response analysis, and we chose Ge to represent the gaze models because it uses the same frame of reference as Te and yields the same results as the other front runner (i.e., dG). The resulting Te–Ge continuum was constructed based on 10 equally spaced intervals between Te and Ge (in eye coordinates), and 10 intervals extended on either end (see Methods). As an example of an intermediate model between Te and Ge, the RF plot based on the model denoted as “0” along this continuum would be obtained from activity profile distributed across the mid-points between target and final gaze positions.
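The construction of the continuum for a single trial can be sketched as follows. The indexing is an assumption chosen so that index 0 falls at the mid-point between target and gaze, as in the example above: with 10 steps between Te and Ge, Te sits at index −5 and Ge at index +5, and the 10 extra steps on either side extend the range to −15 and +15. The function name and the coordinates are hypothetical.

```python
def te_ge_continuum(target, gaze, steps_between=10, steps_beyond=10):
    """Positions of one trial along the Te-Ge continuum (eye coordinates).

    target, gaze : (x, y) target and final gaze positions relative to the eye
    Returns a dict mapping a step index to an (x, y) position. Assumed
    indexing: Te at -steps_between/2, Ge at +steps_between/2, mid-point at 0.
    """
    half = steps_between // 2
    positions = {}
    for k in range(-half - steps_beyond, half + steps_beyond + 1):
        # alpha is 0 at Te and 1 at Ge; values outside [0, 1] extrapolate
        # beyond either model along the same line.
        alpha = (k + half) / steps_between
        positions[k] = (
            target[0] + alpha * (gaze[0] - target[0]),
            target[1] + alpha * (gaze[1] - target[1]),
        )
    return positions


# Hypothetical trial: target at (10, 0) deg, gaze landed at (12, 2) deg.
trial = te_ge_continuum((10.0, 0.0), (12.0, 2.0))
```

Each index defines one spatial model: the neuron's activity is re-plotted at that index's position for every trial and fitted in the same way as the canonical models, and the index with the lowest residuals gives the neuron's best-fit location along the continuum.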
First, we plotted the VMI (see Methods for calculation) of each neuron as a function of best-fit location along the Te–Ge continuum described above (Fig. 11A, top panel). There was no significant correlation between VMI and spatial coding of FEF neurons within visual responses (R = 0.63; P = 0.08, linear regression) or movement responses (R = −0.15; P = 0.28, linear regression); however, when examined across the range, one can see a trend for responses from V neurons (red) to mainly fall in the lower-left corner (preference for a model close to Te) and responses from M neurons (black) to mainly fall in the upper-right corner (preference for a model close to Ge), with VM responses (pink and gray) perhaps falling in the intermediate region.
Figure 11A also provides frequency histograms showing the distribution of best-fits for visual responses (middle panel) and movement responses (lower panel) along the Te–Ge continuum. The distribution of best-fits along this continuum was compared for each cell population and subpopulation (Kruskal–Wallis test followed by Mann–Whitney U-test with Bonferroni correction). There was no significant difference between the two visual populations (P = 0.67) or between the two movement populations (P = 0.57). However, comparison across activity types revealed a significant difference between the best-fit distributions of visual and movement responses in the FEF (P = 0.000134, Mann–Whitney U-test).
This difference was also evident when paired comparisons were made between the visual and movement activity of single VM neurons. This was tested by performing a neuron-by-neuron comparison of visual versus movement best-fit locations along the Te–Ge continuum (Fig. 11B). Most individual VM neurons showed a shift along the Te–Ge continuum (toward Ge) between their visual and movement responses. This shift was statistically significant for the population (P = 0.0016, Wilcoxon test). Thus, this analysis provided a neuronal correlate for sensorimotor transformation not only between populations of neurons within the FEF, but also within individual VM cells.
This study is the first to directly test between the entire possible set of visuospatial and gaze movement representations within the FEF in the same dataset during naturally variable head-unrestrained gaze shifts. Our results eliminate a number of candidate models (at least within the task parameters that we used), and provide clear evidence for FEF being involved in a spatiotemporal transformation of sensory into movement representations, within an eye-centered coordinate frame. Specifically, we have shown that, in the temporal gap between its visual and movement responses, the FEF (in conjunction with its network connections) transforms the location of visual targets into gaze movement commands, both at the cell population level and at the single-neuron level.
Visual Versus Movement Spatial Coding in the FEF
Our results suggest that early visual activity (80–180 ms after target onset) in the FEF codes for the spatial location of visual stimuli in eye-centered coordinates (Te). This fits well with the documented literature on visually evoked responses in the FEF that suggests the early visual response in the FEF is involved in visual detection of stimuli regardless of their task relevance (Thompson and Schall 1999), and FEF serving as an attention priority map (Thompson and Bichot 2005). These factors were not directly tested in our paradigm, but it makes sense that these target-related computations would be done with target represented in an eye-centered reference frame (Te), rather than a movement code in some other frame. V and VM subpopulations show similar projections to their downstream structures including the SC and the pons (Segraves 1992; Sommer and Wurtz 2000) and overall showed similar results in our analysis. However, these subpopulations have different biophysical and morphological properties (Cohen et al. 2009), and so might be expected to show different codes. When they were separately analyzed (Fig. 8A,B), the V population showed a stronger preference for Te, perhaps suggesting a more direct or exclusive visual input.
Sato and Schall (2003) showed that, in an antisaccade task (where the gaze is opposite to the target location), the visual response of about one-third of visually responsive FEF cells (type II cells; Sato and Schall 2003) codes for the location of the saccade rather than the visual target. We found a minority of individual visually responsive neurons (22%) that showed a preference for gaze parameters, but this preference over target coding never reached significance; so we cannot claim that this explains Sato and Schall's (2003) results. The other explanation, which we prefer, is that all FEF visual cells encode Te, but in some (type II), this can undergo a cue-dependent transformation to encode a mentally reversed target representation (Zhang and Barash 2000; Medendorp et al. 2005; Amemori and Sawaguchi 2006; Fernandez-Ruiz et al. 2007; Collins et al. 2008).
In contrast to the clear-cut visual target code observed in FEF visual activity, the movement activity of both VM and M neurons showed a somewhat more distributed coding scheme (Fig. 10), but overall preferentially coded for final gaze position relative to initial eye orientation (Figs 8 and 10) with a significant shift away from a target code toward a gaze scheme (Fig. 11). It has been shown that FEF predominantly projects to subcortical structures in the brainstem (mainly SC and pons; Schiller et al. 1979; Stanton et al. 1988; Dias and Segraves 1999), but also sends projections to the early visual areas (such as area V4; Stanton et al. 1995; Moore and Armstrong 2003). We did not test where these neurons project to, so we cannot be certain that all FEF movement signals analyzed in this study represent the output of FEF to downstream brainstem structures that in turn drive the gaze shift. But given that the majority of movement responses in our sample preferred gaze coding, it is fair to assume that the subcortical projection also predominantly contains a gaze code.
Similar to previous studies in head-unrestrained conditions (Bizzi and Schiller 1970; DeSouza et al. 2011; Knight 2012), the movement responses in some of our neurons were relatively prolonged compared with head-restrained conditions. Although we cannot preclude the possibility that some of this late movement response contained efference copy signals from downstream structures (Bruce and Goldberg 1985; Sommer and Wurtz 2004), we took several precautions to minimize the inclusion of such signals and to mainly include signals that are more likely to contribute to the gaze movement. First, we only included cells with clear pre-saccadic activity, and eliminated neurons with activity starting after saccade onset. Secondly, we also did our analysis on a perisaccadic time window that only included responses up to 50 ms after saccade onset. Since the latency for FEF signals arriving at eye muscles is 20–30 ms (Hanes and Schall 1996), and saccades in our dataset were on average 140 ms long, it is likely that at least this early movement activity of these neurons directly contributed to the gaze shift. Importantly, we did not see a change in the preferred coding scheme when this earlier perisaccadic window was used instead of the full movement burst.
The preference for a gaze code in the movement response is consistent with previous antisaccade studies, which suggested that movement activity of FEF neurons mainly codes for the direction of the saccadic eye movement rather than the location of the visual stimulus (Everling and Munoz 2000). Our study strengthens this conclusion, because it does not have the interpretive limitations of the antisaccade task. First, although the antisaccade task is successful in introducing a spatial dissonance between the sensory and movement components of the task, it requires nonstandard spatial transformations that have been shown to engage quite different patterns of neural activity throughout the cortex (Sweeney et al. 1996; Grunewald et al. 1999; Matthews et al. 2002; Brown et al. 2007; Hawkins et al. 2013). Furthermore, it is possible that, in this task at some point within the visuomotor pathway, the representation of the target is reversed before generating a gaze command (Zhang and Barash 2000; Medendorp et al. 2005; Fernandez-Ruiz et al. 2007). Therefore, this technique by itself does not allow for the distinction between gaze and target coding. In this study, using our RF mapping method, we dissociated between the sensory and movement coding in a task that involved standard spatial transformations, and we found that the movement response preferentially codes for the final location of the gaze. There was no notable difference in spatial coding between the movement responses of VM and M populations. This, however, does not suggest that these neuron types necessarily have the same function in the gaze system (Ray et al. 2009).
It is often assumed that gaze inaccuracies can be attributed to noise arising somewhere within the visuomotor pathway (Churchland et al. 2006; Faisal et al. 2008). The errors observed in our memory-guided delay task could not arise solely from noise in downstream transformations, because this would not cause the preference for gaze coding over target coding that we observed in our FEF movement responses. This finding suggests that at least some of the neural noise contributing to our gaze errors arose from nonvisual activity occurring in the interval between the visual and movement burst. This could include noise arising from functions such as target selection and attention (Basso and Wurtz 1997; Platt and Glimcher 1999), motor functions such as the mechanisms that trigger saccades at the go-signal (Churchland et al. 2006), and the cumulative noise expected to arise in the recurrent connections required to maintain working memory in the stimulus–response interval (Compte et al. 2000; Chang et al. 2012; Wimmer et al. 2014). The latter part of this proposal is consistent with the notion of memory-based spatial transformations within the FEF (Gnadt et al. 1991).
Our results strongly suggest that the FEF movement activity is coding for the gaze movement vector rather than the eye and head components of gaze independently. Among the effector-related models, gaze-related models (Ge and dG) were clearly preferred, with all other models statistically eliminated at the population level, with the exception of the eye-in-head displacement model (dE), which was marginal. The latter is likely because dE becomes very similar to dG when head movements are not very large (Freedman and Sparks 1997a), which was often the case in the gaze shifts that we tested, especially for “near” RFs.
An early study by Bizzi and Schiller (1970) found neurons in the FEF that discharged exclusively during horizontal head movements, but that study did not analyze the spatial coding of these neurons. The only other study to address the question of gaze versus eye or head coding in the FEF during head-unrestrained gaze shifts (Knight 2012) suggested that about half (13/26) of the neurons in the dorsomedial FEF code for the head movement amplitude during the saccade. However, we did not find any neuron, including FEF neurons in the most medial portion of our recorded sites (Fig. 2), coding for head position or displacement, and such head-related models were always excluded at the population level. This discrepancy could be due to task differences. In our paradigm, gaze shifts were initiated from a central range of positions and made toward the full 2D range of the RF; we relied on endogenous variability to dissociate between eye and head contributions, accounted for noise related to initial 3D eye and head orientations, and employed a statistical analysis that made no assumptions about linearity. In contrast, Knight (2012) performed a 1D linear analysis based on a paradigm that dissociated between head and gaze by comparing similar-sized gaze shifts starting from different initial gaze positions (which correlates with different head contributions to gaze; Freedman and Sparks 1997a, 1997b; Knight 2012). Some of our neurons might also have coded for head movement in that paradigm; one cannot say without testing this. However, there is evidence that eccentric gaze positions are associated with head position signals (Monteon et al. 2012), and gaze position-dependent gain field modulations become more prominent at eccentric gaze positions (Andersen and Mountcastle 1983; Cassanello and Ferrera 2007; Knight 2012). Therefore, linear correlation might conflate such signals with head movement signals in a paradigm where initial gaze/head orientation correlates with head contribution to gaze. Finally, we recorded from approximately twice as many neurons and, on average, analyzed approximately 10 times as many trials per neuron.
Although position-dependent gain fields have been previously reported in the FEF in both head-restrained and head-unrestrained conditions (Cassanello and Ferrera 2007; Knight 2012), we did not detect significant gaze position-dependent effects in our dataset. This is most likely because the initial range of positions in our dataset was not optimized to detect gain fields. It is possible, however, that undetected gain field modulations account for a proportion of the noise in our RF fits, along with other nonspatial factors that we did not account for, such as trial-to-trial variations in attention and motivation (Basso and Wurtz 1997; Platt and Glimcher 1999).
Our data agree with most studies that quantified eye–head coordination in gaze shifts evoked during FEF stimulation. With a few exceptions (Chen 2006), the majority of FEF microstimulation studies support a gaze code (Guitton and Mandl 1978; Tu and Keating 2000; Elsley et al. 2007; Knight and Fuchs 2007; Monteon et al. 2010, 2013). Overall, these studies suggest that the default mechanism for decomposing gaze commands into separate eye and head commands resides in the brainstem/cerebellum (Segraves 1992; Paré and Guitton 1998; Quaia et al. 1999; Isa and Sasaki 2002; Sparks 2002; Klier et al. 2007; Gandhi and Katnani 2011), but that other cortical neurons appear to modulate this mechanism so that the head can contribute differently to gaze in different contexts (Constantin et al. 2004; Monteon et al. 2012).
Reference Frames: Eye-Centered Dominance in the FEF
To our knowledge, this is the first single-unit study to address the question of reference frame coding in completely head-unrestrained gaze shifts. Our results point to the dominance of eye-centered coding in the FEF. Our population analysis excluded all models that relied on a head-centered or space/body-centered frame of reference (whether for coding target, gaze, eye, or head motion). This was most clear-cut in the visual code, where every model but Te was statistically eliminated. The preferred movement codes (Ge and dG) are also eye-centered in the sense that they share a coordinate origin at the fovea. The difference between these models is that Ge (final gaze position relative to the eye) has coordinate axes fixed on the retina (the eye is a sphere with rotational geometry), whereas dG (gaze displacement; same as gaze position in fixation-centered coordinates) has coordinate axes fixed at the point of fixation in space (Crawford and Guitton 1997). Unfortunately, we never found a statistical preference for Ge versus dG for individual neurons or for our populations, so we cannot exclude the possibility that either or both are used in the FEF. It is likely that we could not discriminate between these models because (1) our method works best at discriminating very similar models in neurons with small, closed RFs; (2) the geometric differences between dG and Ge only become pronounced for very large gaze shifts (Klier et al. 2001); and (3) in this range, we only recorded large, open movement RFs. In theory, a Ge movement code would simplify transformations both from the Te visual code and into the retina-centered codes reported in the SC (Klier et al. 2001; DeSouza et al. 2011). In contrast, dG would require position-dependent transformations between these codes, at least for very large saccades (Crawford and Guitton 1997), but might be more appropriate to drive reticular formation burst neurons. However, testing between these possibilities would require a more elaborate experimental design, such as measuring the RFs from consistently deviated torsional eye positions (Daddaoua et al. 2014). In general, our eye-centered results (Te for the visual response and Ge/dG for the movement response) agree not only with unit-recording studies in the FEF in head-restrained animals (Bruce et al. 1985; Russo and Bruce 1994; Cassanello and Ferrera 2007), but also with those in most other visuomotor areas (Mays and Sparks 1980; Colby et al. 1995; Russo and Bruce 1996; Tehovnik et al. 2000; Avillac et al. 2005; Mullette-Gillman et al. 2005; DeSouza et al. 2011). This suggests that if any transformations into other head- or space-centered codes occur, they occur downstream from the FEF.
One way to examine this question is through microstimulation. Microstimulation of the FEF in head-unrestrained conditions yields a continuum of eye-centered to head-centered gaze output, depending on the site of stimulation (Monteon et al. 2013). This does not necessarily contradict our current results. Theoretical studies suggest that visual RFs reveal the frame of the sensory input to the neuron, whereas microstimulation reveals the frame of reference of the target neuron population that is activated by the output of that same area (Smith and Crawford 2005; Blohm et al. 2009). Since the FEF projects to both the SC and reticular formation (Segraves 1992; Freedman and Sparks 1997a; Paré and Guitton 1998; Isa and Sasaki 2002; Sparks 2002; Stuphorn 2007; Walton et al. 2007), and the latter may control eye and head motion using a combination of head- and body-centered frames (Klier et al. 2007), it is not implausible that FEF neurons with eye-centered activity might influence head-unrestrained gaze behavior in multiple frames (Martinez-Trujillo et al. 2004; Monteon et al. 2013).
Role of FEF in Spatial Transformations for Gaze Control
The schematic model in Figure 12 summarizes our findings and the conclusions discussed above. Visual input encoding target location in eye coordinates is sent to the FEF from parietal and temporal areas such as the LIP and extrastriate visual cortex (Schall et al. 1995). This would give rise to the eye-centered target code (Te) observed in our visual data. In our memory-guided task, this target location signal enters a recurrent visual working memory network [comprising structures such as the posterior parietal cortex (PPC), FEF, and dorsolateral PFC; Fuster and Alexander 1971; Ikkai and Curtis 2011; Funahashi 2013]. As noted above, noise in these recurrent connections and possibly other cognitive/motor functions likely causes the divergence between the visual and movement signals (Compte et al. 2000; Faisal et al. 2008; Shenoy et al. 2013). The finding that the FEF (and likely other cortical gaze control structures) uses simple eye-centered visual and movement codes suggests that this is advantageous for the other cognitive functions they control, and conversely, that the complexity of these structures is related to additional functions rather than reference frame transformations (Olson and Gettner 1995; Cohen and Andersen 2002; Hutton 2008; Purcell et al. 2012; Schall 2013).
Finally, our data suggest that, in the behaviors that we tested, the gaze-related output of the FEF is decomposed by default into separate signals for eye and head control downstream, each with its own reference frame transformations. This, however, does not preclude a role for frontal cortex in eye–head coordination during more complex context-dependent behaviors (Constantin et al. 2004; Monteon et al. 2012). Thus, this model accounts for the visuospatial transformations performed by the FEF, at least within a set of circumstances similar to those studied here.
Is this model of visuomotor transformation unique to the FEF? In other words, is this transformation happening only within the FEF? We think that this is unlikely as similar neural response types and spatial codes have been observed in related structures such as the SC (Schlag-Rey et al. 1992; Munoz and Wurtz 1995; Freedman and Sparks 1997a; Everling et al. 1999; DeSouza et al. 2011), PPC (Gottlieb and Goldberg 1999; Steenrod et al. 2013), PFC (Funahashi et al. 1991, 1993), and other cortical and subcortical areas (Schlag and Schlag-Rey 1987; Watanabe and Funahashi 2012; Funahashi 2013). Therefore, the transformation reported in this study might be occurring concurrently in a distributed network of interconnected structures, and not solely performed within the FEF (Paré and Wurtz 2001; Wurtz et al. 2001; Munoz and Schall 2004). However, this question can only be answered by similar testing in all of these structures.
This project was supported by a Canadian Institutes of Health Research grant. A. Sajad was supported by an Ontario Graduate Scholarship. J.D. Crawford was supported by the Canada Research Chair Program. Funding to pay the Open Access publication charges for this article was provided by the Canadian Institutes of Health Research.
Conflict of Interest: None declared.