Decisions based on uncertain information may benefit from an accumulation of information over time. We asked whether such an accumulation process may underlie decisions about the direction of motion in a random dot kinetogram. To address this question we developed a computational model of the decision process using ensembles of neurons whose spiking activity mimics neurons recorded in the extrastriate visual cortex (area MT or V5) and a sensorimotor association area of the parietal lobe (area LIP). The model instantiates the hypothesis that neurons in sensorimotor association areas compute the time integral of sensory signals from the visual cortex, construed as evidence for or against a proposition, and that the decision is made when the integrated evidence reaches a threshold. The model explains a variety of behavioral and physiological measurements obtained from monkeys.
A hallmark of higher brain function is a capacity to act and sense on different time scales. This capacity allows organisms to acquire sensory information from one or more sources, combine information across time, and alter their behavior some time later, perhaps contingent on other intervening input. This contrasts with simpler reflexive behaviors in which sensory information precipitates an immediate motor response. From this perspective, the capacity for higher brain function would seem to rest on the brain’s ability to accumulate, combine and preserve information over time.
The persistent spike activity that is commonly observed from neurons in the association cortex is thought to provide a neural substrate by which information can span such time gaps. It has been implicated in working memory (Fuster and Alexander, 1971; Fuster, 1973; Miyashita and Chang, 1988; Funahashi et al., 1989), motor planning (Evarts and Tanji, 1976; Bruce and Goldberg, 1985; Gnadt and Andersen, 1988) and decision making (Kim and Shadlen, 1999; Shadlen and Newsome, 2001; Hernandez et al., 2002). A reasonable hypothesis is that such persistent activity represents the outcome of a process of integration with respect to time — that is, the accumulation and storage of information. By analogy to neural integrators in the brainstem that convert eye velocity to position (Robinson, 1989; Fukushima et al., 1992), our hypothesis is that cortical neural integrators combine evanescent sensory data to generate an evolving conception about the state of the world, which can then be used to plan appropriate behaviors.
In this paper we explore the idea that the accumulation of sensory information underlies a simple perceptual decision. We review evidence suggesting that rhesus monkeys make decisions about the direction of random dot motion by integrating, with respect to time, the information they receive through direction-selective neurons in the visual cortex. We test this idea by developing a model of a neural circuit that we believe may perform such a computation. The model encompasses three stages of neural processing: (i) the representation of visual motion by ensembles of direction-selective neurons; (ii) the representation of a decision variable by ensembles of neurons that accumulate input from the first stage; and (iii) a comparison to threshold, which terminates the process and determines the model’s choice. The neurons that comprise the first stage are based on the known properties of neurons in area MT. The second and third stages of the model predict the response properties of neurons in LIP and behavioral measurements obtained from monkeys. Our results show that temporal integration can explain a fairly wide range of behavioral and physiological observations, suggesting that this simple computation may play an essential role in sensorimotor decisions.
The processes underlying sensorimotor decisions in humans and monkeys have been investigated using a motion discrimination task (Fig. 1) (Britten et al., 1992; Britten, 2003). The sensory stimulus in the task consists of a dynamic display of dots which appear and disappear at random locations within a 5–10° circular aperture. A fraction of these dots are displaced at some fixed offset to impart an overall sense of motion in one direction or its opposite (e.g. left versus right). The subject’s goal is to discriminate between these two alternatives. Rhesus monkeys are trained to indicate their choice of direction by making a saccadic eye movement to one of two targets, corresponding to the two motion directions. If the fraction of moving dots, termed the percent coherence, is sufficiently high, it is easy to identify the correct direction. By controlling the percent coherence, the task can be made arbitrarily difficult or easy.
In this study we refer to two versions of the random dots task. In the reaction time (RT) task (Fig. 1A), subjects are allowed to report their decision as soon as they are ready, once the random dots appear. In the fixed duration (FD) task (Fig. 1B), subjects view the random dot motion stimulus for a specified duration (usually 1 or 2 s) that is under the control of the experimenter. In the FD task, motion viewing is typically followed by a memory/delay period before the subject is allowed to respond.
The motion information in the random dots task is represented by direction selective neurons in the extrastriate visual cortex, especially area MT (also known as area V5). When random dot motion appears in the receptive field of an MT neuron, there is a large initial burst followed by a sustained response whose magnitude depends on the strength and direction of motion (Fig. 2A). Evidence from microstimulation, lesion, and neural recording experiments demonstrates that such responses underlie the monkey’s judgment in the random dots task (Newsome and Paré, 1988; Salzman et al., 1990; Britten et al., 1992, 1993, 1996; Bisley et al., 2001; Britten, 2003). In particular, the animal’s ability to discriminate weak motion appears to be limited by variability in the response of MT neurons (Parker and Newsome, 1998). The responses of MT neurons provide the evidence upon which the monkey bases its decision about direction, but they do not gather this evidence together to form a decision. Their activity fluctuates with each passing random dot, and the evidence they provide is equally evanescent. To reach a decision, other neurons must ‘read out’ the evidence from area MT.
Less is known about how sensory responses are read out and interpreted in order for the brain to reach a categorical decision. In the motion discrimination task described above, a decision about motion is linked to an eye movement response. Therefore, neurons that represent a plan to move the eyes to a particular place in space are potential sites for the representation of a decision about motion direction. Such neurons have been found in the parietal cortex, prefrontal cortex and superior colliculus (Wurtz and Albano, 1980; Gnadt and Andersen, 1988; Funahashi et al., 1989; Glimcher and Sparks, 1992). These neurons exhibit persistent activity when the monkey is instructed to plan an eye movement to a restricted part of the visual field, termed the response field (RF) (Fig. 2B). In the random dots task, these neurons have been shown to represent information about the monkey’s decision (Shadlen et al., 1996; Kim and Shadlen, 1999; Horwitz and Newsome, 2001; Shadlen and Newsome, 2001; Roitman and Shadlen, 2002). Here, the direction of motion effectively instructs an eye movement to one of a pair of choice targets (Fig. 1). When the monkey decides in favor of the direction associated with the choice target in the RF, the spike rate gradually increases (Fig. 2C). For the opposite decision, the spike rate declines during the period of motion viewing. Such activity represents the evolving plan of how to answer the question, ‘Which direction is the motion?’
In this study we use simulated neural responses to test the notion that temporal integration of sensory signals from area MT forms the basis for direction discrimination in the random dots task, and that neurons with persistent activity preceding eye movements represent this integration. For the latter, we focus on neurons in the lateral intraparietal area (LIP), which have been studied using both FD and RT versions of the task. We begin by simulating the well-described responses of MT neurons to random dot motion, and we model LIP responses as a simple time integral of the difference between the outputs of opposing pools of MT neurons. We simulate two ensembles of LIP neurons, one for each of the possible eye movement responses. The decision is made when the activity of one of the LIP ensembles exceeds a critical value, that is, when the evidence reaches a threshold. The model furnishes several novel interpretations of behavioral and physiological measurements obtained using the random dots task.
The goal of this modeling exercise is to investigate how well integration of sensory signals can account for behavioral and neural data obtained in the random-dot motion task. We address this question by simulating the responses from two brain regions: the middle temporal area (MT or V5) and the lateral intraparietal area (LIP) (Fig. 3). Each brain region consists of an ensemble of simulated cortical neurons whose firing rates are determined by the model but whose actual spike times are random.
The model is divided into three stages: representation of the evidence, accumulation of the evidence into a decision variable, and comparison of the decision variable to threshold (choice). In the first stage, we simulate the responses of two opposed pools of direction selective MT neurons to random dot motion of varying motion strength. For simplicity, all simulations assume that the motion is either to the right or to the left. Each neuron in the two pools produces a sequence of spikes with an expected rate proportional to the strength of motion, based on values from Britten et al. (1993). The averaged spike rate from the two pools furnishes the output of stage 1.
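Stage 1 can be sketched as follows. This is a minimal sketch under simplifying assumptions: each MT neuron's expected rate is taken to depend linearly on coherence with illustrative constants (BASE_RATE, PREF_GAIN and NULL_GAIN are hypothetical, not the values from Britten et al., 1993), neurons fire independently (the full model also includes weak correlations between neurons), and the initial burst and response latency are omitted. The function names are our own.

```python
import random

# Illustrative constants (assumed; not the fitted values of Britten et al., 1993).
BASE_RATE = 20.0   # spikes/s at 0% coherence
PREF_GAIN = 40.0   # change in rate per unit coherence, motion in preferred direction
NULL_GAIN = -10.0  # change in rate per unit coherence, motion in null direction


def mt_rate(coherence, toward_preferred):
    """Expected rate (spikes/s) of an MT neuron; coherence is a fraction in [0, 1]."""
    gain = PREF_GAIN if toward_preferred else NULL_GAIN
    return max(0.0, BASE_RATE + gain * coherence)


def pool_average_rate(rate, n_neurons, dt, rng):
    """Ensemble-average rate (spikes/s) in one time bin of width dt.

    Each neuron spikes independently with probability rate*dt, a Bernoulli
    approximation to Poisson firing that assumes rate*dt << 1.
    """
    p_spike = rate * dt
    n_spikes = sum(1 for _ in range(n_neurons) if rng.random() < p_spike)
    return n_spikes / (n_neurons * dt)
```

For 25.6% rightward motion, repeated calls to pool_average_rate(mt_rate(0.256, True), 100, 0.001, rng) give the bin-by-bin output of the rightward-preferring pool, and mt_rate(0.256, False) drives the leftward-preferring pool; averaged over many bins, each output recovers its expected rate.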
In the second stage we simulate the responses of two pools of LIP neurons. One of the LIP pools represents a plan to choose the right choice target (Fig. 3, LIP right choice), whereas the other represents the opposite plan (Fig. 3, LIP left choice). Unlike the MT neurons, the expected firing rates of these neurons are time-dependent, determined by the time integral of the difference in the output of the right and left MT pools. The expected LIP firing rate is calculated by integrating the difference in spike rate signals from MT starting from when the coherence-dependent MT response begins (δMT), then delaying the result by δLIP; we do not simulate the LIP response before the integration begins at time δMT + δLIP. Because of the temporal fluctuations in the responses of the MT pools (see equations A1 and A2, Supplementary Material), the expected LIP responses (equations A3 and A4, Supplementary Material) exhibit random deviations around the motion-dependent drift, similar to a biased diffusion process. As with the MT stage, the averaged spike rate from the two pools of LIP neurons furnishes the output of this stage of the model.
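The integration step of stage 2 can be sketched as a running sum. This is a minimal sketch, not the equations of the Supplementary Material: the baseline of 40 spikes/s follows from the statement that θ = 55 spikes/s lies 15 spikes/s above baseline, while the scale factor k = 5 s⁻¹ is an illustrative assumption. The delays are handled by starting the input at integration onset, and the stochastic spiking of the LIP neurons themselves is omitted.

```python
def integrate_lip(mt_right, mt_left, dt, baseline=40.0, k=5.0):
    """Expected rates (spikes/s) of the right- and left-choice LIP pools.

    mt_right, mt_left: ensemble-average MT rates (spikes/s), one value per
    time bin, starting at integration onset (delta_MT + delta_LIP).
    baseline (spikes/s) and k (1/s) are illustrative values.
    """
    integral = 0.0  # running integral of the MT difference (net spikes per neuron)
    lip_right, lip_left = [], []
    for rate_r, rate_l in zip(mt_right, mt_left):
        integral += (rate_r - rate_l) * dt
        # the two pools integrate the same difference with opposite signs;
        # rates are clipped at zero because firing rates cannot be negative
        lip_right.append(max(0.0, baseline + k * integral))
        lip_left.append(max(0.0, baseline - k * integral))
    return lip_right, lip_left
```

With a constant 10 spikes/s excess in the rightward MT pool, the right-choice rate rises linearly by k × 10 = 50 spikes/s per second, which is the ramp-like drift described in the text; MT fluctuations would add the diffusion-like deviations around it.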
For simulations of the reaction time (RT) task, the LIP responses are compared to a decision threshold. The two ensemble-average spike rates are smoothed using a first-order filter with time constant τLIP = 0.1 s to tame the moment-to-moment fluctuations. The smoothed LIP signals race against each other, each providing the weight of evidence for its preferred choice direction. The first to reach the decision threshold, θ, determines the target choice and the decision time. A random delay is then added to represent the time needed to initiate the saccade. Note that the reaction time on a trial is the sum of the decision time and the non-decision intervals [saccadic latency, δMT and δLIP, amounting to ∼300 ms of non-decision time on average (Luce, 1986)].
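The RT read-out can be sketched as a race between two smoothed signals. This sketch assumes an Euler discretization of the first-order filter and, for illustration only, a Gaussian non-decision time with mean 0.3 s and a hypothetical SD of 0.05 s; θ = 55 spikes/s and τLIP = 0.1 s are the values given in the text. Function names are our own.

```python
import random


def race_to_threshold(lip_right, lip_left, dt, theta=55.0, tau=0.1):
    """Smooth both LIP ensemble rates with a first-order low-pass filter and
    return (choice, decision_time); choice is None if neither reaches theta."""
    alpha = dt / tau  # Euler update factor of the filter
    s_right, s_left = lip_right[0], lip_left[0]
    for i, (r, l) in enumerate(zip(lip_right, lip_left)):
        s_right += alpha * (r - s_right)
        s_left += alpha * (l - s_left)
        if s_right >= theta or s_left >= theta:
            # if both cross in the same bin, the larger smoothed signal wins
            return ("right" if s_right >= s_left else "left"), (i + 1) * dt
    return None, len(lip_right) * dt


def reaction_time(decision_time, rng, mean_nd=0.3, sd_nd=0.05):
    """Decision time plus a random non-decision time (hypothetical Gaussian)."""
    return decision_time + max(0.0, rng.gauss(mean_nd, sd_nd))
```

Because the non-decision time in this sketch is drawn independently of the stimulus, any dependence of reaction time on coherence must come from the decision time, that is, from how quickly the smoothed signal reaches θ.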
For simulations of the fixed duration (FD) task, the decision is generated in one of two ways. In the first method, LIP signals are allowed to integrate for the full duration of the simulation. The decision in this case is determined by simply comparing the value of the smoothed LIP signals corresponding to the two possible choices: whichever one is higher at the end of the trial determines the choice. In the second method, a decision threshold (θ) is applied, and if one of the LIP signals crosses θ, the decision is determined as in the RT simulation. If neither LIP signal crosses θ by the end of the simulation, the decision is determined by whichever of the LIP signals is higher.
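The two FD decision rules can be sketched side by side. As above, this is a minimal sketch that assumes an Euler-discretized first-order filter with τLIP = 0.1 s; passing theta=None selects the first method and a numeric theta selects the second. The function names are hypothetical.

```python
def smooth(signal, dt, tau=0.1):
    """First-order low-pass filter (Euler discretization), started at signal[0]."""
    out, s = [], signal[0]
    for x in signal:
        s += (dt / tau) * (x - s)
        out.append(s)
    return out


def fd_choice(lip_right, lip_left, dt, theta=None, tau=0.1):
    """Choice in a fixed-duration trial.

    theta is None -> method 1: integrate for the whole trial, compare at the end.
    theta given   -> method 2: the first smoothed signal to reach theta decides;
                     if neither does, fall back to the end-of-trial comparison.
    """
    s_right, s_left = smooth(lip_right, dt, tau), smooth(lip_left, dt, tau)
    if theta is not None:
        for r, l in zip(s_right, s_left):
            if r >= theta or l >= theta:
                return "right" if r >= l else "left"
    return "right" if s_right[-1] >= s_left[-1] else "left"
```

The two methods can disagree on the same trial: if the right-choice signal crosses θ early and then declines below the left-choice signal, method 1 reports a leftward choice while method 2 has already committed to a rightward one.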
A complete description of the model parameters along with a justification of the values we used can be found in the Supplementary Material. Importantly, although there are many parameters, most are constrained by measurement. The threshold, θ, is the only parameter in the model that was fit to match behavioral accuracy (see Supplementary Material).
We developed a model consisting of ensembles of spiking neurons in order to simulate a process of decision formation in a two-alternative forced choice direction discrimination task. The model represents motion ‘evidence’ in the response of ensembles of simulated MT neurons, and it represents the accumulation of this evidence in the response of ensembles of simulated LIP neurons. The commitment to one or the other alternative is based on the read out of the ensemble spike rate signals in area LIP. Figure 3 shows examples of the smoothed ensemble-average firing rates for MT (blue) and LIP (red). The filters used to smooth the firing rates in this figure, as well as the relative height of θ, are the same as those used to produce the predictions in this study. Note that the smoothed LIP signals, which function to determine the time and direction of the decision, possess a sizeable degree of temporal noise resulting from both the variability of neural spiking and a weak tendency for pairs of neurons to emit spikes within ±10 ms of one another (Mazurek and Shadlen, 2002). This temporal noise is not a consequence of variability in the visual stimulus or in the input from MT, but rather it is inherent in the decision signal.
The results are presented in three parts. First, we compare the predictions of the model to behavioral and physiological data obtained in the reaction time (RT) direction discrimination task, where the subject responds as soon as they arrive at a decision. We then test whether the assumptions of the model hold in the fixed duration (FD) task, where subjects must view the motion stimulus for a specified duration. Finally, we examine the predictions of the model for the relationship between single unit responses and behavior.
Reaction-time Direction Discrimination
The combined measurement of RT and perceptual choice permits an examination of the neural responses that underlie decision formation in the time frame used by the animal to solve the task. The task furnishes two behavioral measures, choice and response time, which must be accounted for by a model of the decision process.
Behavior: Accuracy and Decision Time
The model in Figure 3 works by integrating the sensory evidence — the difference in ensemble spike rates from area MT representing the alternative directions — toward a threshold level (θ). The level of this threshold controls the model’s accuracy: a higher threshold requires more evidence to be accumulated before a response, leading to fewer errors than would be obtained with a lower threshold. Importantly, θ is the only parameter in the model that was fit to behavioral data. We adjusted θ to match the overall accuracy seen in the monkey. At this level, the model produces a psychometric function with similar shape to that measured experimentally (Fig. 4A).
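Why a single threshold sets accuracy can be seen in an idealized version of the model. If the two LIP pools are treated as exact mirror images of one accumulated difference (ignoring independent LIP spiking noise and the smoothing filter), the race reduces to a diffusion between symmetric bounds at ±A, and a standard result for a Wiener process gives the probability of reaching the correct bound. The drift, bound and noise values below are hypothetical, and drift is assumed proportional to coherence.

```python
import math


def p_correct(drift, bound, sigma=1.0):
    """P(reaching +bound before -bound) for a Wiener process that starts at 0
    with the given drift and diffusion coefficient sigma (standard result)."""
    return 1.0 / (1.0 + math.exp(-2.0 * drift * bound / sigma ** 2))


def psychometric(coherences, bound, drift_per_coh=1.0, sigma=1.0):
    """Predicted accuracy at each coherence, assuming drift proportional to coherence."""
    return [p_correct(drift_per_coh * c, bound, sigma) for c in coherences]
```

With drift 0.5, bounds of 0.5, 1 and 2 give accuracies of about 0.62, 0.73 and 0.88: raising the bound trades time for fewer errors. Holding the bound fixed and varying only the drift traces out a psychometric function, which mirrors the use of one θ for all motion strengths.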
To achieve this level of accuracy in the model, θ = 55 spikes/s, which is 15 spikes/s above the simulated baseline (see Fig. 5B). The particular values for spike rate are not critically important for explaining accuracy, since they reflect our choice of the baseline spike rate and of the scaling of the integral (k), both of which were chosen to mimic the response of neurons in LIP (see next section). More fundamentally, the excursion from baseline to threshold amounts to a net excess of ∼3 spikes per MT neuron from the rightward- or leftward-preferring pool. This characterization of the decision threshold holds regardless of the particular values for the baseline rate and k. Importantly, the same value for θ is applied for all motion strengths.
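The conversion between a threshold in spikes per second and a count of MT spikes can be made explicit with a simplified form of the LIP stage. In our own notation (r_0 for the baseline rate, r̄ for ensemble-average MT rates), and ignoring the smoothing filter and LIP spiking noise, the integral of the ensemble-average rate difference is the net excess number of spikes per MT neuron, ΔN:

$$
r_{\mathrm{LIP}}(t) = r_0 + k \int_{0}^{t} \left[ \bar r_{\mathrm{MT,R}}(t') - \bar r_{\mathrm{MT,L}}(t') \right] dt' = r_0 + k\,\Delta N(t)
$$

$$
\theta - r_0 = k\,\Delta N_\theta \quad\Rightarrow\quad k \approx \frac{15\ \text{spikes/s}}{3\ \text{spikes}} = 5\ \text{s}^{-1}
$$

The value k ≈ 5 s⁻¹ is only what the two rounded figures in the text imply under this simplification, not a parameter quoted from the Supplementary Material. The point of the second line is that changing r_0 or k rescales θ − r_0 without changing ΔN_θ, which is why the threshold is better characterized in MT spikes.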
With the decision threshold fixed to approximate the monkey’s overall accuracy, the model predicts the amount of time it takes to reach a decision. Reaction times for monkey and model are depicted in Figure 4B as a function of motion strength (the chronometric function). For correct choices (Fig. 4B, solid lines), the predicted mean RTs closely match the mean RTs observed in the monkey. Of course, the total RT includes delays besides the decision time (i.e. visual latencies δMT and δLIP and saccade preparation time) whose properties are not well understood, so the precise values of RT predicted by the model should be taken as approximations. More significant is the observation that RTs in the model decrease with motion strength in a similar manner to that seen in the data, despite the fact that the non-decision times in the model are independent of coherence. This suggests that the dependence of RT on motion strength arises primarily from different rates of evidence accumulation at different coherences.
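The same idealized diffusion (symmetric bounds at ±A, drift assumed proportional to coherence, hypothetical parameter values) shows how a coherence-independent non-decision time plus a coherence-dependent decision time yields a falling chronometric function. The mean first-passage time used below is a standard result for a Wiener process started midway between two absorbing bounds; the 0.3 s non-decision time echoes the ∼300 ms average mentioned in the Methods.

```python
import math


def mean_decision_time(drift, bound, sigma=1.0):
    """Mean time for a Wiener process started at 0 to reach +bound or -bound:
    (A / mu) * tanh(mu * A / sigma**2), with limit A**2 / sigma**2 as mu -> 0."""
    if drift == 0.0:
        return bound ** 2 / sigma ** 2
    return (bound / drift) * math.tanh(drift * bound / sigma ** 2)


def mean_rt(coherence, bound, drift_per_coh=1.0, sigma=1.0, non_decision=0.3):
    """Mean RT = mean decision time + a fixed, coherence-independent delay."""
    return mean_decision_time(drift_per_coh * coherence, bound, sigma) + non_decision
```

Only the drift changes with coherence here, so the decline of mean RT with motion strength comes entirely from faster accumulation, which parallels the interpretation in the text.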
While integration to threshold accounts for accuracy and RT on correct choices, it fails to account for the pattern of RT on error trials. As shown in Figure 4B (gray curves), the model predicts that error RTs will be faster than correct RTs at each coherence, which is inconsistent with the data (black curves). These fast errors appear to arise from early threshold crossings driven by instantaneous fluctuations in the model LIP activity. Such errors constitute only ∼14% of trials, however; we consider possible solutions to this shortcoming in the Discussion.
According to our hypothesis, neurons that represent the read-out of area MT compute the integral of the difference between opposing direction signals. Here we examine how well this idea predicts neural responses seen in area LIP. Figure 5 shows averaged responses from 54 neurons in area LIP recorded during the combined direction-discrimination RT task (Roitman and Shadlen, 2002). Recall that the task is arranged so that one of the choice targets is in the neuron’s RF. The responses increase gradually before saccades to the target in the RF, and they decrease gradually before saccades to the target outside the RF (Fig. 5A). The rate of increase or decrease initially depends on the motion strength: when responses are aligned to the onset of the motion (Fig. 5A, left panel), the spike rates diverge more rapidly for strong motion than for weak motion. This part of the graph emphasizes the epoch of decision formation. When we focus instead on the epoch immediately preceding the behavioral choice (Fig. 5A, right panel), the dependence of the responses on motion strength appears greatly reduced. Indeed, on trials when the monkey chose the target in the RF (solid curves), the responses appear to converge to a common point, independent of the motion coherence. In summary, the data show that (i) different strengths of motion result in different rates of accumulation; and (ii) the responses converge to a common point before decisions that motion is toward the RF but not before decisions in favor of the opposite direction. These properties are explained by the model.
The model produces a family of predicted LIP responses that displays many of the same properties as seen in neural recordings from LIP. Figure 5B shows the simulated responses of LIP neurons that would signal a rightward choice. In other words, the RF of these model neurons is aligned to the rightward choice target. When aligned to the onset of motion (Fig. 5B, left panel), the average spike rates of these model neurons trace out ramp-like linear trajectories. The slopes of the ramps are positive (increasing) for trials leading to rightward choices and negative (decreasing) for leftward choices, and the ramps are steeper for stronger motion than for weaker motion. When the responses are aligned to the time of the motor response (Fig. 5B, right panel), the increasing responses for rightward choices look nearly identical, whereas responses associated with leftward choices do not achieve a stereotyped level of activity but instead retain their dependency on motion strength.
These properties are explained by the underlying computations in the model. Responses aligned to the beginning of the trial reflect the integral of the difference between rightward- and leftward-preferring MT ensemble responses. Because the magnitude and sign of the MT difference signal vary with the strength and direction of motion, the average rate of rise or decay in the LIP response also depends on motion strength and direction. In contrast, during the time period immediately preceding the motor response, the responses of rightward-choice LIP neurons during rightward-choice trials appear similar for all motion strengths. This is because the responses are constrained to cross a common decision threshold ∼100 ms before saccade initiation regardless of whether the threshold was approached slowly or quickly. Indeed, the precise moment of this crossing is often determined by a brief upward fluctuation in the ensemble LIP signal. On the other hand, rightward-choice neurons are not constrained to cross a common threshold during leftward choices because decision time in that case is governed by the opposing population of LIP neurons, whose RFs align with the left-choice target. Thus, the average responses from rightward-choice neurons (dashed curves) retain a dependency on motion strength even at the end of the decision process culminating in a leftward choice. The model thus explains this puzzling asymmetry that is evident in the data (Fig. 5A).
One somewhat surprising observation is that the simulated response rates achieve peak values well above the decision threshold θ (Fig. 5B, right panel). This occurs because the LIP ensemble activity is smoothed in the formation of the decision signal, which is then compared to decision threshold. Such smoothing delays the growth of the decision signal relative to the spike rates of the neurons within LIP. As a result, the spike rates of single LIP neurons continue their trajectory beyond the threshold by the time the decision is made. The degree of overshoot depends on the value of τLIP, but the effect is seen with even modest smoothing. For τLIP =100 ms, the model achieves spike rates of 65–70 spikes/s at the end of the simulation, ∼100 ms before the saccade, similar to the spike rates seen in the data at this time.
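The overshoot can be illustrated with a noiseless ramp. A first-order filter tracking a linear ramp settles into a lag of τ, so by the time the smoothed signal reaches θ the raw rate has moved on by roughly slope × τ. This sketch assumes an illustrative baseline of 40 spikes/s, θ = 55 spikes/s and τ = 0.1 s, and ignores the noise-driven crossings that also occur in the model.

```python
def overshoot_at_crossing(slope, baseline=40.0, theta=55.0, tau=0.1, dt=1e-4):
    """Raw rate minus theta at the moment the smoothed copy of a noiseless
    ramp, baseline + slope * t (slope in spikes/s^2), first reaches theta."""
    if slope <= 0:
        raise ValueError("slope must be positive for the ramp to reach theta")
    t, smoothed = 0.0, baseline
    while smoothed < theta:
        t += dt
        raw = baseline + slope * t
        smoothed += (dt / tau) * (raw - smoothed)  # Euler step of the filter
    return baseline + slope * t - theta
```

For a slope of 50 spikes/s² the overshoot is close to 50 × 0.1 = 5 spikes/s; steeper ramps overshoot more, which is consistent with the larger effect at high motion strength.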
Fixed-duration Direction Discrimination
Up to this point we have mainly discussed the reaction time version of the random dot motion task, where subjects report their decision as soon as possible. Random dot motion discrimination has been studied more extensively under conditions in which the subject views the motion stimulus for a duration that is controlled by the experimenter. In these ‘fixed duration’ (FD) experiments, it is commonly assumed that the subject makes a decision using all the available motion information — that is, that the decision occurs after the motion is turned off. This idea receives experimental support, especially for short stimulus durations in the range of 100–800 ms (Gold and Shadlen, 2000; see also Britten et al., 1992; Burr and Santoro, 2001). This assumption leads to one obvious prediction: accuracy in 1 s FD experiments should be better than the accuracy in RT experiments because the animals typically view motion for less than a second in the RT task.
However, the experimental evidence contradicts this prediction. Figure 6A shows psychometric functions obtained from experiments in which monkeys alternated blocks of trials in the RT task with blocks of trials in a 1 s FD task. The fits are very similar; in fact, performance on the FD task is slightly worse (P < 0.05). This is remarkable considering that the monkey took an average of only 680 ms to initiate an eye movement response in the RT task. Subtracting out visual latencies and motor preparation time, this finding implies that the monkey performed about equally well in the RT task as it did in the FD task even though it had about half the viewing time. Clearly, this is at odds with the most straightforward version of the integration model. If the monkey has twice the viewing time, we would expect an improvement in signal-to-noise ratio of ∼√2, which would translate into a leftward shift of the psychometric function (PMF), that is, a reduction of the threshold coherence by about this same factor (Fig. 6B).
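The √2 figure follows from perfect integration of independent noise: the accumulated signal grows in proportion to viewing time T, while its standard deviation grows as √T. The sketch below assumes, for illustration, that drift is proportional to coherence, so the coherence needed to reach a fixed discriminability d′ scales as 1/√T.

```python
import math


def d_prime(drift, sigma, duration):
    """d' of perfectly integrated evidence: mean drift*T over SD sigma*sqrt(T)."""
    return (drift * duration) / (sigma * math.sqrt(duration))


def threshold_coherence(target_d, drift_per_coh, sigma, duration):
    """Coherence that yields d' = target_d, assuming drift proportional to coherence."""
    return target_d * sigma / (drift_per_coh * math.sqrt(duration))
```

Doubling the duration from 0.5 s to 1 s multiplies d′ by √2 and divides the threshold coherence by √2, the improvement that the FD data fail to show.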
The remarkable similarity between the psychometric functions on the RT and 1 s FD tasks raises the possibility of a common mechanism. Suppose that instead of using all available evidence during the 1 s motion viewing period, the brain accumulates evidence to a threshold, just as in the RT task, and thereafter stops incorporating motion evidence into the decision variable. This idea, first proposed by Link (Link and Heath, 1975; Link, 1992), leads to several predictions about behavior and neural activity.
First, the hypothesis that both RT and FD tasks use the race-to-threshold mechanism predicts that the performance in the FD task should be similar to that obtained in the RT task so long as the decision threshold (θ) is the same (Fig. 6B; compare RT prediction against the FD prediction for integration to threshold). In fact, accuracy should be slightly worse on the FD task because a decision must be rendered after 1 s of stimulus viewing, even if the accumulated evidence has not reached θ. In principle, there is no reason to set the threshold to the same level for both FD and RT tasks, but we suspect that this occurs because the monkeys performed RT and FD experiments in alternating blocks of trials on the same day. The critical point is that even when given a fixed amount of time to solve the task, the monkey might operate in the same fashion as in the RT task. We next consider the implications of this hypothesis for neural activity in LIP.
Figure 7A (adapted from Roitman and Shadlen, 2002) shows the responses of LIP neurons recorded in 1 s FD experiments; these are the same neurons whose responses in the RT task are plotted in Figure 5A. The responses during FD trials begin with trajectories that are similar to those recorded in RT trials, but as time elapses, the response begins to flatten, eventually reaching a plateau rate whose magnitude depends on the strength of the motion. The pattern is best seen in the larger data set of Shadlen and Newsome (2001), shown in Figure 7B, using viewing durations of 0.5, 1 and 2 s.
The model predictions are shown in Figure 7C. To obtain these predictions we used the same architecture as in the RT simulations, with one exception: when the evidence reaches the decision threshold (θ), the accumulation stops, and the spike rate is fixed at the level it has achieved. Under these assumptions, the responses of model neurons demonstrate the features described above (Fig. 7A,B). The fact that LIP responses plateau at different rates for different coherences may seem counterintuitive: why are the trajectories not fixed at the value of the threshold (θ)? First, the decision is derived from the smoothed output of LIP. The inherent delay between instantaneous measured activity in LIP and the detection of a threshold crossing allows the LIP response to continue to rise beyond the threshold before the decision is made. This effect is more pronounced the higher the motion strength. Secondly, the last phase of the decision is often dominated by transient noise impulses. These transients occur with different latencies from motion onset and are thus concealed within the response averages aligned to the beginning of the trial. That is why, at low motion strengths, the averaged responses remain below threshold despite the fact that on any single trial, this threshold was crossed. Together, the results in Figures 5 and 7 illustrate that the same computation — integration to threshold — can produce different patterns of single-unit responses depending on the demands of the task.
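The freezing rule and its effect on trial-averaged responses can be sketched with a simplified accumulator. Assumptions, all ours: the right- and left-choice pools are exact mirror images (baseline ± D) of one accumulated difference D, so there is no independent LIP noise; D receives Gaussian increments rather than spike-driven input; and the drift, noise level, baseline of 40 spikes/s and trial count are illustrative. As in the text, accumulation stops once either smoothed signal reaches θ.

```python
import random


def fd_trial_frozen(drift, noise_sd, duration, dt, rng,
                    baseline=40.0, theta=55.0, tau=0.1):
    """Right-choice LIP rate (baseline + D) over one fixed-duration trial.

    D is frozen once the smoothed right-choice signal (baseline + smoothed D)
    or its mirror-image left-choice signal (baseline - smoothed D) reaches theta.
    """
    bound = theta - baseline
    raw_d, smooth_d, frozen = 0.0, 0.0, False
    trace = []
    for _ in range(int(round(duration / dt))):
        if not frozen:
            raw_d += drift * dt + noise_sd * dt ** 0.5 * rng.gauss(0.0, 1.0)
            smooth_d += (dt / tau) * (raw_d - smooth_d)
            frozen = abs(smooth_d) >= bound
        trace.append(baseline + raw_d)
    return trace


def average_trace(drift, n_trials=200, duration=1.0, dt=0.005, noise_sd=20.0, seed=1):
    """Trial-averaged right-choice rate, aligned to motion onset."""
    rng = random.Random(seed)
    trials = [fd_trial_frozen(drift, noise_sd, duration, dt, rng) for _ in range(n_trials)]
    return [sum(column) / n_trials for column in zip(*trials)]
```

Averaged across trials, strong drift produces a high plateau, while weak drift produces a plateau well below θ, even though many individual trials froze above it. The weak-motion average is pulled down by trials that froze at the opposite bound and by trials that never reached either bound.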
Relationships between Single Neurons and Behavior on Single Trials
One of the main dividends of the style of model that we have developed is that it portrays the activity of neurons as sequences of spikes that resemble spike trains from neurons in areas MT and LIP, incorporating their associated variability. We can therefore use the model to formulate quantitative predictions about the relationship between neural activity and decisions on a trial-by-trial basis. A basic assumption is that neural signals are represented by the average spike rate from a large ensemble of neurons with similar response properties. Consequently, the impact of any one neuron on the animal’s behavior is relatively small. In this section, we examine this assumption quantitatively. First, we consider how trial-to-trial variability in the response of single MT neurons relates to the decisions on those trials and to the monkey’s overall performance. Then, we examine how the variability of single neurons in LIP relates to the monkey’s RT on individual experimental trials and to the monkey’s overall performance.
Single Neurons in Area MT
In their original studies, Newsome and colleagues made two quantitative comparisons between MT neurons and behavior. First, they compared the sensitivity of single neurons to the monkey’s behavioral sensitivity in a 2 s FD task (Newsome et al., 1989; Britten et al., 1992). To quantify the neural sensitivity, they compared the distributions of spike counts measured in response to preferred- and null-direction motion. They calculated the probability that a spike count drawn from the distribution of preferred direction responses would exceed a spike count drawn from the distribution of null direction responses. They thus predicted the level of performance that would be achieved if the monkey could compare responses from a pair of neurons tuned to opposite directions of motion, that is, the recorded neuron and one just like it but for the opposite direction preference. This analysis showed that single neurons are surprisingly sensitive to random dot motion: the predicted accuracy versus coherence function from one neuron, which they termed the neurometric function, was strikingly similar to the monkey’s psychometric function on average (Fig. 8A). This result implies that the monkey could achieve observed levels of accuracy using only a single pair of MT neurons with opposed direction preferences.
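The neurometric calculation described above reduces to an ROC area: the probability that a preferred-direction count exceeds a null-direction count, with ties counted as one half. This sketch computes it by direct pairwise comparison, which gives the same value as the area under the ROC curve; evaluating it at each coherence yields the points of a neurometric function. The function name and example counts are illustrative.

```python
def roc_area(pref_counts, null_counts):
    """P(a preferred-direction count exceeds a null-direction count), ties = 1/2."""
    wins = 0.0
    for p in pref_counts:
        for n in null_counts:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pref_counts) * len(null_counts))
```

Non-overlapping distributions give 1.0 (perfect discrimination by the neuron and its hypothetical antineuron), and identical distributions give 0.5 (chance).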
In the second comparison, the authors measured the correlation between the response of an MT neuron and the monkey’s choice on a single trial (Britten et al., 1996). The trial-by-trial correlation between MT responses and choices was quantified using a measure called the choice probability [CP; also called sender operating characteristic, or SOC, in earlier publications (Newsome et al., 1989; Celebrini and Newsome, 1994)]. To obtain the CP, the spike counts are compiled from each repetition of the same stimulus (i.e. same motion coherence and direction). The spike counts are then partitioned into two distributions, based on whether the animal chose the neuron’s preferred direction or the opposite (null) direction (Fig. 8C). The separation of these distributions is indexed by the CP, which is the probability that a random draw of values from the two distributions would show the expected ordinal relationship (i.e. larger for preferred-direction choices). A CP value of 0.5 would indicate that the distributions overlap perfectly, hence the neuron’s response is uncorrelated with the monkey’s decision, while a CP value of 1 would mean that every spike count associated with a preferred choice was larger than every spike count associated with a null choice. The studies showed a weak but significant relationship between the variable spike counts from the neurons and the monkey’s choices. Across the population of MT neurons recorded in the task, the average CP was 0.54–0.59, depending on monkey.
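The CP uses the same ROC statistic, but the two distributions are defined by the monkey's choice rather than by the stimulus. A minimal sketch for repeats of a single stimulus condition, with a hypothetical input format of (spike_count, chose_preferred) pairs; pooling across conditions, which the original studies also performed, is not shown.

```python
def choice_probability(trials):
    """trials: list of (spike_count, chose_preferred) for repeats of ONE stimulus."""
    pref = [count for count, chose in trials if chose]
    null = [count for count, chose in trials if not chose]
    if not pref or not null:
        raise ValueError("need at least one trial of each choice")
    wins = 0.0
    for p in pref:
        for n in null:
            # count pairs where the preferred-choice count is larger; ties = 1/2
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pref) * len(null))
```

For example, counts of 10 and 12 on preferred-choice trials and 8 and 11 on null-choice trials give 3 favorable pairs out of 4, a CP of 0.75.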
Together, these two results presented a paradox. The similarity between the psychometric function and the neurometric function based on a pair of neurons seemed to suggest that the brain does not combine signals from many MT neurons: more neurons would improve the monkey’s accuracy to an unrealistic level. On the other hand, if this were truly the case, the few neurons underlying the decision should be highly correlated with the monkey’s choice. A decision based on a comparison between two neurons would yield a CP of ∼0.85 (Newsome et al., 1989; Shadlen et al., 1996), whereas the average CP was closer to 0.5, suggesting a very weak relationship. Thus the data appear incompatible with either a population-coding model or a model where a few MT neurons determine the decision.
Two solutions to this paradox were proposed by Newsome and colleagues (Shadlen et al., 1996). First, one could assume that the brain uses many MT neurons whose preferred directions are not as well matched to the random dot motion as those studied by Newsome et al. (1989), in which the motion was precisely aligned with each neuron’s preferred direction. Pooling from such a mixture of optimal and sub-optimal neurons would yield a smaller improvement in accuracy. However, this resolution alone could not explain the CP. Even with arbitrarily large pools, and regardless of the mix of optimal and sub-optimal neurons, single neurons would be expected to retain a higher degree of correlation with the monkey’s decision than observed (the predicted CP was ∼0.65). This is because the neurons in the pools are thought to covary weakly in their responses: the shared component of their variability does not average away with pooling, so each neuron remains correlated with the pooled signal no matter how many neurons contribute. Second, to reduce the predicted CP, they assumed that some additional noise is added during read-out of the MT signal, which they referred to as pooling noise. By adjusting the mixture of neurons and pooling noise, the model could account for both the accuracy of the monkey and the trial-by-trial relationship between single neurons and decisions.
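A small simulation can check the pooling argument. This is a Gaussian sketch under assumptions that do not come from the text:

- Responses are standardized.
- Neurons within a pool share a pairwise correlation `corr`.
- The preferred and opposite pools are independent.
- The stimulus is 0% coherence.

With `pool_size=1`, the decision reduces to the two-neuron comparison discussed in the previous paragraph. For large pools, the shared variability keeps the recorded neuron correlated with the decision, and raising `pooling_noise_sd` lowers its CP. All parameter values and function names are hypothetical.

```python
import numpy as np


def roc_area_continuous(a, b):
    """P(a > b) for continuous samples without ties, from the Mann-Whitney rank-sum statistic."""
    values = np.concatenate([a, b])
    ranks = np.empty(values.size)
    ranks[np.argsort(values)] = np.arange(1, values.size + 1)
    return float((ranks[: a.size].sum() - a.size * (a.size + 1) / 2) / (a.size * b.size))


def pooled_cp(pool_size, corr, pooling_noise_sd, n_trials=20000, seed=0):
    """CP of one recorded neuron when the choice compares two pools at 0% coherence.

    Standardized Gaussian responses; neurons within a pool share pairwise correlation
    `corr`; the preferred and opposite pools are independent; Gaussian pooling noise
    with SD `pooling_noise_sd` is added to the pooled difference. All values hypothetical.
    """
    rng = np.random.default_rng(seed)
    shared_pref, shared_null = rng.standard_normal((2, n_trials))
    own_private = rng.standard_normal(n_trials)
    # Sum of the private noise of the other pool_size - 1 neurons in the recorded neuron's pool.
    others_private = np.sqrt(pool_size - 1) * rng.standard_normal(n_trials)
    recorded = np.sqrt(corr) * shared_pref + np.sqrt(1 - corr) * own_private
    pref_pool = (np.sqrt(corr) * shared_pref
                 + np.sqrt(1 - corr) * (own_private + others_private) / pool_size)
    null_pool = (np.sqrt(corr) * shared_null
                 + np.sqrt(1 - corr) * rng.standard_normal(n_trials) / np.sqrt(pool_size))
    decision = pref_pool - null_pool + pooling_noise_sd * rng.standard_normal(n_trials)
    chose_pref = decision > 0
    return roc_area_continuous(recorded[chose_pref], recorded[~chose_pref])


for n in (1, 10, 1000):
    print(n, round(pooled_cp(n, corr=0.2, pooling_noise_sd=0.0), 3))
print("with pooling noise:", round(pooled_cp(1000, corr=0.2, pooling_noise_sd=1.0), 3))
```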
The model in Figure 3 offers a simpler account of these results. When the model is allowed to integrate for a full 2 s, it yields the same problematic predictions as the Shadlen et al. (1996) pooling model: a level of sensitivity that is about two times better than single neurons and the monkey (Fig. 8B), and CP values that are higher than those observed in the monkey (Fig. 8E). Suppose, however, that instead of accumulating evidence for the full duration of the motion stimulus, the monkey reaches a decision when the evidence reaches a threshold, θ, as in the RT task (see Fixed-duration Direction Discrimination). According to this idea, MT spikes would only contribute to the decision during the portion of the trial before the threshold crossing. Importantly, the experimenter does not know when this time occurs and thus includes all spikes from the full 2 s recording in calculations of neural sensitivity and CP.
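A bounded random walk can illustrate how a threshold affects fixed-duration performance. In this sketch, the drift, bound and time step are illustrative assumptions. Trials that reach ±`bound` stop accumulating. Trials that never reach it are decided by the sign of the evidence at the end of the viewing period. Passing `np.inf` as the bound gives the perfect integrator, which uses the full 2 s.

```python
import numpy as np


def simulate_fd(drift, bound, duration=2.0, dt=0.005, n_trials=5000, seed=0):
    """Fixed-duration trials with a bounded accumulator (bound=np.inf: perfect integrator).

    drift: mean evidence per second toward the correct choice (hypothetical units).
    Returns (proportion correct, mean time at which the decision was made).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(duration / dt))
    x = np.zeros(n_trials)
    decided = np.zeros(n_trials, dtype=bool)
    choice = np.zeros(n_trials)                 # +1 = correct direction, -1 = opposite
    decision_time = np.full(n_trials, duration)
    for step in range(1, n_steps + 1):
        active = ~decided
        x[active] += drift * dt + np.sqrt(dt) * rng.standard_normal(active.sum())
        hit = active & (np.abs(x) >= bound)     # evidence after this point is ignored
        choice[hit] = np.sign(x[hit])
        decision_time[hit] = step * dt
        decided |= hit
    # Trials that never reached the bound are decided by the sign of the accumulated evidence.
    undecided = ~decided
    choice[undecided] = np.where(x[undecided] >= 0, 1.0, -1.0)
    return float(np.mean(choice > 0)), float(decision_time.mean())


print("bounded:  ", simulate_fd(drift=1.0, bound=0.7))
print("unbounded:", simulate_fd(drift=1.0, bound=np.inf))
```

With these assumed values, the bounded accumulator should commit well before the stimulus ends. It should also be less accurate than the perfect integrator, even though both see the same 2 s stimulus.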
This alternative decision model resolves the paradox posed by the sensitivity of single MT neurons and the weak trial-by-trial correlation between their variable responses and the monkeys’ decisions. First, the model no longer attains the 2-fold improvement over the measured sensitivity of single neurons because the decision process uses only a fraction of the 2 s viewing time. In fact, the predicted psychometric function is now approximately the same as the neurometric function calculated from just one typical MT neuron using 2 s of spike discharge (Fig. 8C). This correspondence reflects the fact that the decision takes ∼500 ms on average, only one-quarter of the total viewing time. Because sensitivity grows approximately as the square root of integration time, the model’s sensitivity is a factor of 2 lower than its theoretical maximum. Second, the predicted CP in this model is reduced to 0.57 (Fig. 8F), which is within the range of values calculated by Britten et al. (1996). Again, this occurs because many of the MT spikes used to compute the CP do not actually contribute to the decision, and thus they show no correlation with the choice on a given trial. Hence, the model advanced here can account for both observations without positing additional sources of noise between the sensory representation in MT and the decision. Moreover, as shown in Figure 7, it explains features of the LIP responses recorded in FD experiments.
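The factor of 2 follows from the square-root law, assuming that the momentary evidence samples are independent. Under that assumption, the mean of the integrated evidence grows in proportion to time t while its standard deviation grows as √t, so sensitivity (d′) scales as √t. With the ∼500 ms average decision time and the 2 s viewing period:

```latex
d'(t) \propto \frac{t}{\sqrt{t}} = \sqrt{t},
\qquad
\frac{d'(0.5\,\mathrm{s})}{d'(2\,\mathrm{s})} = \sqrt{\frac{0.5}{2}} = \frac{1}{2}.
```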
The model makes one prediction that is contradicted by the data. According to our idea, spikes recorded late in the trial are less likely to affect the monkey’s choices because they are often recorded after the monkey has made its decision. Assuming, as we have, that these spikes are independent of those occurring earlier in the trial, we would expect them to exhibit no correlation with the monkey’s choices. The data from Britten et al. (1996) appear to contradict this prediction (see their figure 11). We will return to this issue in the Discussion.
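Computing CP separately for the early and late portions of a fixed-duration trial makes the prediction concrete. The sketch below rests on several assumptions:

- The stimulus is 0% coherence.
- Each 100 ms bin is an independent Gaussian sample.
- The recorded neuron shares correlation `rho` with the preferred-direction ensemble in every bin.

All parameter values and function names are hypothetical. Most simulated decisions are reached within the first few bins, so spikes in the last 500 ms should show a CP close to 0.5.

```python
import numpy as np


def roc_area_continuous(a, b):
    """P(a > b) for continuous samples without ties, from the Mann-Whitney rank-sum statistic."""
    values = np.concatenate([a, b])
    ranks = np.empty(values.size)
    ranks[np.argsort(values)] = np.arange(1, values.size + 1)
    return float((ranks[: a.size].sum() - a.size * (a.size + 1) / 2) / (a.size * b.size))


def early_late_cp(bound=3.0, rho=0.4, n_bins=20, window=5, n_trials=10000, seed=0):
    """CP of a recorded 'preferred' neuron in the first and last `window` bins of a trial."""
    rng = np.random.default_rng(seed)
    pref_ens = rng.standard_normal((n_trials, n_bins))   # preferred-direction ensemble, per bin
    null_ens = rng.standard_normal((n_trials, n_bins))   # null-direction ensemble, per bin
    neuron = rho * pref_ens + np.sqrt(1 - rho**2) * rng.standard_normal((n_trials, n_bins))
    dv = np.cumsum(pref_ens - null_ens, axis=1)          # accumulated evidence
    crossed = np.abs(dv) >= bound
    # Bin of the first threshold crossing; undecided trials use the sign at the last bin.
    decision_bin = np.where(crossed.any(axis=1), crossed.argmax(axis=1), n_bins - 1)
    chose_pref = dv[np.arange(n_trials), decision_bin] > 0
    early = neuron[:, :window].sum(axis=1)
    late = neuron[:, -window:].sum(axis=1)
    return (roc_area_continuous(early[chose_pref], early[~chose_pref]),
            roc_area_continuous(late[chose_pref], late[~chose_pref]))


print(early_late_cp())
```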
Single Neurons in Area LIP
Our model instantiates the idea that LIP represents the accumulation of evidence for choosing the direction of motion associated with the choice target in the neuron’s response field. As with MT, we assume that ensembles of noisy, weakly correlated, spiking neurons supply the signals used for making decisions about the direction of motion. Therefore, the same intuitions should apply: responses from single LIP neurons should covary weakly with the ensemble signals to which they contribute and therefore with the monkey’s behavioral response.
The LIP neurons studied by Shadlen and Newsome (2001) and Roitman and Shadlen (2002) exhibit responses that can be interpreted as motor planning signals. Not surprisingly, in the 100 ms epoch just before the monkey makes an eye movement, the responses clearly reflect the direction of the impending eye movement. Indeed, for many neurons, the spikes in this epoch predict the monkey’s choice on a single trial with 100% reliability. A more interesting quantity is the spike rate in the epoch ending 100 ms before eye movement initiation in the RT task (Fig. 9A), about the time when the evidence reaches the threshold for commitment. Although the ensemble response in LIP determines the monkey’s choice, individual neurons predict the monkey’s choices with only modest reliability (mean CP = 0.73). Interestingly, the model yields a very similar CP value for single neurons in the same epoch (Fig. 9B).
The model also predicts an association between the responses of LIP neurons and the monkey’s reaction time on a given trial. We would expect the rate of rise of the LIP spike rate to be inversely correlated with RT: steeper changes in LIP should lead to shorter RT. The challenge is to estimate the rate of rise of the spike rate from the sequence of spikes on a single trial. Roitman and Shadlen (2002) did this by finding the ramp that best explained the data (spike train) in the maximum likelihood sense. They used only spikes occurring from 200 ms after motion onset to 100 ms before the saccade. An example of one of their fits is shown in Figure 10A. This analysis yields a spike rate slope value on each trial, which can then be compared to the reaction time on that trial.
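A maximum-likelihood ramp fit of this kind can be sketched in two steps. First, treat the spikes in the analysis window as an inhomogeneous Poisson process with rate a + b(t − t_start). Then search a grid of intercepts and slopes for the best fit. This is an illustrative assumption about the procedure, not Roitman and Shadlen’s code. The grid ranges, the thinning-based simulator and the function names are hypothetical.

```python
import numpy as np


def simulate_ramp_spikes(a, b, duration, rng):
    """Hypothetical test data: inhomogeneous Poisson spikes with rate a + b*t on
    [0, duration], generated by thinning a homogeneous process at the peak rate."""
    rate_max = max(a, a + b * duration)
    n = rng.poisson(rate_max * duration)
    candidates = np.sort(rng.uniform(0.0, duration, n))
    keep = rng.uniform(0.0, rate_max, n) < a + b * candidates
    return candidates[keep]


def fit_ramp(spike_times, t_start, t_end,
             intercepts=np.arange(0.5, 100.5, 0.5), slopes=np.arange(-100.0, 201.0, 1.0)):
    """Grid-search maximum-likelihood fit of rate(t) = a + b*(t - t_start).

    Log-likelihood of an inhomogeneous Poisson process:
        sum_i log rate(t_i) - integral of rate over [t_start, t_end].
    a is in spikes/s, b in spikes/s^2. Returns (a_hat, b_hat).
    """
    t = np.asarray(spike_times, dtype=float)
    t = t[(t >= t_start) & (t <= t_end)] - t_start
    duration = t_end - t_start
    best_ll, best_a, best_b = -np.inf, None, None
    for a in intercepts:
        # A linear rate is positive on the whole window if it is positive at both ends.
        b = slopes[a + slopes * duration > 0]
        rates = a + np.outer(b, t)                       # shape (n_slopes, n_spikes)
        ll = np.log(rates).sum(axis=1) - (a * duration + b * duration**2 / 2)
        i = int(np.argmax(ll))
        if ll[i] > best_ll:
            best_ll, best_a, best_b = ll[i], float(a), float(b[i])
    return best_a, best_b


rng = np.random.default_rng(3)
spikes = simulate_ramp_spikes(20.0, 40.0, 5.0, rng)
print(fit_ramp(spikes, 0.0, 5.0))
```

The example uses a long 5 s window only to make the estimate precise. Over the few hundred milliseconds available on a real trial, single-trial slope estimates are much noisier.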
Figure 10B shows a scatter plot of RTs and the best estimate of spike rate slopes from one neuron using 15 trials in which the monkey viewed an intermediate motion strength (6.4% coherence) in the direction that led to a correct choice (an eye movement to the choice target in the neuron’s RF). A weak but measurable relationship is apparent between the slope of the spike rate and the reaction time (r = –0.08), with larger slopes leading to shorter reaction times. We repeated this analysis for each neuron, using only those trials in which the monkey made a correct choice to the RF. The histogram of r values obtained in this fashion is shown in Figure 10C. Across the population of 36 LIP neurons, the average correlation was r = –0.13 (P < 0.05). Using data sets of comparable size to those obtained in LIP, the model yields a similar inverse correlation between RT and estimated ramp slope (mean r = –0.13; Fig. 10D). The relatively close match between the predicted and observed correlation levels suggests that the ramping responses seen in single LIP neurons reflect an accumulating decision signal being carried in the average rate of a local ensemble of similar neurons.
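A toy generative model can illustrate why single-neuron correlations come out weak. In this model, the RT on each trial is set by the time the local ensemble takes to climb a fixed distance to threshold. A single neuron’s slope estimate equals the ensemble slope plus independent noise, which stands in for private variability and single-trial fitting error. The slope mean and spread, threshold distance, non-decision time and noise level below are assumptions chosen for illustration, not fits to the LIP data.

```python
import numpy as np


def slope_rt_correlations(n_neurons=36, n_trials=15, slope_noise_sd=60.0, seed=0):
    """Per-neuron Pearson r between single-trial slope estimates and RT (toy model)."""
    rng = np.random.default_rng(seed)
    threshold_rise = 20.0   # spikes/s the ensemble must climb to reach threshold (assumed)
    non_decision = 0.3      # s, sensory and motor latencies (assumed)
    r_values = np.empty(n_neurons)
    for k in range(n_neurons):
        # Ensemble ramp slope (spikes/s^2) varies from trial to trial and sets the RT.
        ensemble_slope = np.clip(rng.normal(40.0, 8.0, n_trials), 5.0, None)
        rt = non_decision + threshold_rise / ensemble_slope
        # One neuron's estimate: ensemble slope plus independent noise.
        neuron_slope = ensemble_slope + rng.normal(0.0, slope_noise_sd, n_trials)
        r_values[k] = np.corrcoef(neuron_slope, rt)[0, 1]
    return r_values


print(round(float(slope_rt_correlations().mean()), 3))
```

Setting `slope_noise_sd=0.0` makes the correlation strongly negative. When the noise is much larger than the trial-to-trial spread of the ensemble slope, each neuron’s r becomes weak, even though the ensemble slope fully determines RT in this toy model.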
Integration with respect to time is thought to play a role in neural computations involved in navigation and motor control. In the brainstem oculomotor system, integration converts a velocity ‘pulse’ to a persistent ‘step’ in spike rate, which is used to hold the gaze in an eccentric position (Robinson, 1989). Similarly, neurons that represent head position are thought to incorporate the time integral of angular velocity of the head (Fukushima et al., 1992; Seung et al., 2000; Aksay et al., 2001; Goldman et al., 2002). Indeed, wherever one observes persistent neural activity in the brain, it seems reasonable to hypothesize that it represents the integration of a transient input. However, unlike the previous examples, persistent activity in higher brain centers can be elicited under a wide range of conditions, and is subject to modulation by a number of factors. How does integration contribute to higher brain function? The experiments described and simulated in this paper represent an attempt to begin to answer that question. For the problem of discriminating between two categorical interpretations of a stochastic sensory stimulus, it seems likely that integration allows the brain to accumulate evidence over time, thereby improving the fidelity of perception and accuracy of decision making.
Our model instantiates the idea that the decision in the random dots task is based on the time integral of motion evidence represented by neurons in area MT. The simulations described in this paper do not test the idea, but instead explore its implications. In that sense, the model exposes important features of the data, furnishes novel interpretations, and makes predictions about future experiments (see Supplementary Material). In what follows, we provide some intuition for the structure of the model, explain the rationale behind several assumptions, and explore its successes, limitations and alternatives.
Structure of the Model
A substantial body of experimental results indicates that neurons in area MT represent the critical sensory signals on which monkeys base their judgments of random dot motion (for reviews, see Parker and Newsome, 1998; Britten, 2003). This evidence fluctuates in time, reflecting both the variability in the dynamic stimulus and the noise in the neural response (Mazurek and Shadlen, 2002). It must be accumulated in order for the brain to reach an accurate decision. The model suggests that this integrated motion evidence is carried in the responses of sensorimotor association areas such as LIP. When the accumulated evidence reaches a threshold level, the decision process terminates. The higher the threshold, the longer it will take to reach a decision but the more accurate the decision will be, on average. The model thus takes us from the representation of motion to a commitment to a choice.
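The speed-accuracy tradeoff has closed-form expressions for a continuous diffusion that starts midway between symmetric thresholds at ±θ. With drift μ and noise variance σ², P(correct) = 1/(1 + exp(−2μθ/σ²)) and the mean decision time is (θ/μ)·tanh(μθ/σ²). The sketch below evaluates these expressions. Treating the accumulated MT evidence as a diffusion of this kind is an idealizing assumption, and the function name is hypothetical.

```python
import math


def ddm_accuracy_and_time(drift, bound, sigma=1.0):
    """Closed-form accuracy and mean decision time for a diffusion starting at 0
    with absorbing bounds at +bound (correct) and -bound (error); requires drift > 0."""
    k = drift * bound / sigma**2
    p_correct = 1.0 / (1.0 + math.exp(-2.0 * k))
    mean_time = (bound / drift) * math.tanh(k)
    return p_correct, mean_time


for bound in (0.5, 1.0, 2.0):
    p, t = ddm_accuracy_and_time(drift=1.0, bound=bound)
    print(f"bound={bound}: P(correct)={p:.3f}, mean decision time={t:.3f}")
```

Raising the threshold increases both the probability of a correct choice and the mean decision time.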
The model’s structure incorporates several assumptions about the nature of the neural signals in these computations. These are detailed in the Supplementary Material along with a description of the mathematical operations used to perform the simulations.
What Do We Learn from the Model?
Integration of MT signals to a decision threshold explains the effect of motion strength on the monkey’s behavior in the reaction time (RT) task (Fig. 4). It reconciles the accuracy of the monkey’s decisions with the amount of time required to make them. Stronger motion produces a rapid accumulation of evidence toward the correct decision threshold. Weaker motion produces a more random accumulation that meanders between the two decision thresholds, leading to a mixture of correct and incorrect responses. For 0% coherent motion, the accumulation is equally likely to ‘diffuse’ toward either threshold.
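Simulating the random walk directly across motion strengths shows the same behavior. In this sketch, the drift is assumed proportional to coherence, with `k` as a hypothetical scale factor. The noise has unit variance per second, and the positive threshold corresponds to the correct choice. At 0% coherence, which threshold is labeled ‘correct’ is arbitrary.

```python
import numpy as np


def simulate_rt_task(coherence, k=10.0, bound=1.0, dt=0.001, n_trials=2000,
                     max_time=10.0, seed=0):
    """Random walk to +/-bound with drift k*coherence (coherence as a fraction).

    Returns (proportion of trials ending at the positive bound, mean decision time in s).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    rt = np.full(n_trials, np.nan)
    correct = np.zeros(n_trials, dtype=bool)
    active = np.ones(n_trials, dtype=bool)
    step = 0
    while active.any() and step * dt < max_time:
        step += 1
        idx = np.flatnonzero(active)
        x[idx] += k * coherence * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        done = idx[np.abs(x[idx]) >= bound]
        rt[done] = step * dt
        correct[done] = x[done] > 0        # positive bound = correct direction
        active[done] = False
    return float(np.mean(correct[~np.isnan(rt)])), float(np.nanmean(rt))


for coh in (0.0, 0.064, 0.256, 0.512):
    print(coh, simulate_rt_task(coh))
```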
Accumulation to threshold also accounts for the psychometric function observed in the fixed duration (FD) task, where the experimenter controls the stimulus duration (Fig. 6). This idea resolves a puzzling observation: monkeys achieve a similar level of accuracy on RT and 1 s FD tasks (Fig. 6A), despite the longer viewing times provided in the FD experiment. We propose that this surprising behavior occurs because the monkey employs a threshold of evidence for committing to a decision in the FD task, just as in the RT task (Link and Heath, 1975; Meyer et al., 1988; Ratcliff, 1988; Link, 1992). The similarity in performance suggests that the monkeys used the same decision threshold (θ) in the two tasks.
This idea does not imply that monkeys cannot improve their performance with increasing viewing times. If the viewing duration is relatively short, the integrated evidence may fail to reach threshold. In that case the optimal solution is to respond based on the available evidence, whose reliability (signal-to-noise ratio) is expected to grow in proportion to √t. Behavioral evidence for such integration can be found in FD experiments (Britten et al., 1992; Gold and Shadlen, 2000; Gold and Shadlen, 2003).
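Without a threshold, the evidence integrated over a viewing duration t has mean μt and standard deviation σ√t. A decision based on its sign therefore has accuracy Φ(μ√t/σ), where Φ is the standard normal cumulative distribution function. Here is a minimal sketch, with `drift` and `sigma` as illustrative parameters:

```python
import math


def integrator_accuracy(drift, duration, sigma=1.0):
    """Proportion correct when the sign of the integrated evidence is read out at `duration`.

    The integral has mean drift*duration and SD sigma*sqrt(duration), so its
    signal-to-noise ratio, drift*sqrt(duration)/sigma, grows as sqrt(duration).
    """
    snr = drift * math.sqrt(duration) / sigma
    return 0.5 * (1.0 + math.erf(snr / math.sqrt(2.0)))


for t in (0.1, 0.25, 0.5, 1.0):
    print(t, round(integrator_accuracy(drift=1.0, duration=t), 3))
```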
Integration to threshold explains the time course of the responses from LIP neurons in both RT and FD tasks (Figures 5 and 7). This observation reconciles a remarkable discrepancy in the experimental data: the responses during motion viewing exhibit different time courses depending on whether the monkey is performing the RT or the FD task (Roitman and Shadlen, 2002). Our simulations demonstrate that integration to threshold can account for the time course of the responses in both of these situations, predicting both ramp-like increases and decreases in the RT task and saturating functions of time in the FD task. The model also reconciles some puzzling observations about the LIP activity: (i) the convergence of rising responses preceding eye movements to the RF in the RT task (Fig. 5A,B, solid curves, right panels); (ii) the absence of convergence of declining responses before eye movements away from the RF in the RT task (Fig. 5A,B, dashed curves, right panels); (iii) the absence of convergence of rising responses (i.e. the coherence-dependent saturation) in FD trials (Fig. 7); and (iv) the apparent failure of averaged responses in FD experiments to reach a common threshold.
Because the model explicitly represents the activity of single MT and LIP neurons, it lends insight into the relationship between behavioral measurements and single neuron recordings obtained in the laboratory. By mimicking the spike discharge of single MT neurons, the model explains the observed relationship between MT responses and the monkey’s decisions on a trial-to-trial basis (choice probability; Fig. 8D–F). By mimicking the spike discharge of single LIP neurons, the model explains the observed relationship between LIP responses and behavior in the RT task (Figures 9 and 10). The explanation of these single-trial comparisons, which are only available from combined psychophysics–neurophysiology experiments, is the main dividend of our modeling exercise. Finally, the idea that evidence is integrated to a threshold, even when given a fixed viewing duration, serves to reconcile the exquisite sensitivity of single MT neurons with the overall sensitivity of the monkey to random dot motion (Fig. 8A–C).
Model Limitations and Failures
A major limitation of our model is that it fails to specify how key computations are achieved by neurons. It offers no insight into the neural mechanisms that would underlie integration, or direction selectivity, or even addition and subtraction. Nor does the model attempt to explain the mechanism by which the LIP activity is compared to the decision threshold. The biophysical and circuit properties underlying operations such as integration are an active topic of investigation at the theoretical and experimental level (Camperi and Wang, 1997, 1998; Seung et al., 2000; Aksay et al., 2001; Koulakov et al., 2002). Our model circumvents these issues by performing simple mathematical calculations on spike rate signals, but obviously this leaves open the question of how the brain actually accomplishes the calculations.
Similarly, our model provides little insight into the control of the integration process. For example, we do not attempt to explain the early dip and recovery of neural activity that occurs just after onset of random dot motion. These ‘dips’ are a common feature of neurons with persistent activity (Sato et al., 2001), and we suspect that this phenomenon might play a role in initiating the integration, perhaps serving as a reset. We also assume that LIP stops receiving sensory input after the decision threshold is crossed, even when motion information remains present as in the FD task. This may be plausible in the context of the random dot motion task, but it is undoubtedly an oversimplification. Clearly, neural integration for high-level cognitive behavior requires complex controls and is susceptible to numerous influences, and our model makes no attempt to capture these various factors.
Finally, neither our data nor our model address the question of whether integration actually occurs in LIP or elsewhere, only to be relayed to LIP. Several brain areas show decision-related activity similar to what is observed in LIP, including the prefrontal cortex and the superior colliculus (Kim and Shadlen, 1999; Horwitz and Newsome, 2001). It is unknown whether any or all of these areas are responsible for integration of the evidence from MT, or to what degree their responses simply reflect this process.
In addition to its limitations, the model makes several predictions that are known to be incorrect. The most serious discrepancy between model and data is in the RT associated with error trials. The model predicts relatively fast error responses, whereas monkeys tend to take slightly more time to respond on error trials than on correct trials at each coherence level (Fig. 4). Diffusion or random walk models, which are related to ours, predict that error and correct trials share the same chronometric function (RT versus motion strength) (Luce, 1986). Our model produces fast errors because of variability in the LIP signal, resulting in early threshold crossings. How might these fast errors be remedied? Ratcliff and Rouder (1998) demonstrated that the inclusion of trial-to-trial variability in the effective stimulus strength produces slower errors in diffusion models sharing features with ours. This source of variability is not incorporated in our model: the expected MT response is identical across trials sharing the same motion strength, which is clearly an oversimplification. Remedying this might bring the predicted error RTs closer to those observed in the data. In addition, we should note that monkeys performing the discrimination task probably make errors for a wide variety of reasons, many of which are not captured by the integration model (e.g. lapses in attention, blinks). Hence, error RTs may be harder to explain than other aspects of the behavior.
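Adding trial-to-trial variability to the drift of a bounded random walk illustrates the remedy proposed by Ratcliff and Rouder (1998). The sketch below is a simplified stand-in for that idea, not their model. The drift is drawn once per trial from a normal distribution with SD `drift_sd`, and all parameter values are illustrative assumptions. With `drift_sd=0.0`, correct and error decision times should agree up to sampling noise. With variability, errors come disproportionately from weak-drift trials and are therefore slower.

```python
import numpy as np


def mean_rt_correct_vs_error(mean_drift=1.0, drift_sd=0.0, bound=1.0, dt=0.001,
                             n_trials=4000, max_time=10.0, seed=0):
    """Mean decision time on correct (+bound) and error (-bound) trials."""
    rng = np.random.default_rng(seed)
    drift = mean_drift + drift_sd * rng.standard_normal(n_trials)  # one drift per trial
    x = np.zeros(n_trials)
    rt = np.full(n_trials, np.nan)
    hit_upper = np.zeros(n_trials, dtype=bool)
    active = np.ones(n_trials, dtype=bool)
    step = 0
    while active.any() and step * dt < max_time:
        step += 1
        idx = np.flatnonzero(active)
        x[idx] += drift[idx] * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        done = idx[np.abs(x[idx]) >= bound]
        rt[done] = step * dt
        hit_upper[done] = x[done] > 0
        active[done] = False
    finished = ~np.isnan(rt)
    return (float(rt[finished & hit_upper].mean()),
            float(rt[finished & ~hit_upper].mean()))


print("no drift variability:  ", mean_rt_correct_vs_error(drift_sd=0.0))
print("with drift variability:", mean_rt_correct_vs_error(drift_sd=1.5))
```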
There is at least one other prediction of the model that is contradicted by data. In simulations using a fixed 2 s duration, the model posits that MT inputs are ignored after LIP crosses the decision threshold. This predicts that MT spikes occurring early in the viewing period should correlate with the monkey’s decision more strongly than spikes occurring later in the trial, which is contrary to the findings of Britten et al. (1996). One possible explanation is that the monkeys in the Britten et al. study began integrating MT spikes at a variety of latencies during the 2 s motion viewing period. This is consistent with our observations that the animals require much less than the 2 s to reach a decision. Some support for this idea can be found in Shadlen and Newsome’s study of LIP (Shadlen and Newsome, 2001). They found that exposure to viewing durations shorter than 2 s caused an acceleration in the buildup of LIP activity for trials of all durations. Direct measurements of MT neurons in monkeys trained to perform both FD and RT tasks may help to clarify these issues.
Model Extensions and Alternatives
The analyses presented in this paper show that time integration of sensory evidence to a threshold accounts for a wide variety of behavioral and physiological observations in monkeys trained to discriminate the direction of random dot motion. Several extensions to the basic idea deserve consideration. Variability in the parameters of integration, such as the starting time, the baseline spike rate, and the drift rate, may be helpful in remedying the failures and improving the generality of the model, especially when incorporating prior biases into the decision (Carpenter and Williams, 1995; Ratcliff and Rouder, 1998). Similarly, the decision threshold may be variable and is likely to change as a function of time. A dynamic decision threshold embodies the idea that commitment to one or another behavioral option may need to occur by some time point or with some degree of urgency (Reddi and Carpenter, 2000). In general, the decision threshold is likely to incorporate knowledge of the accuracy that has been achieved, the passage of time, and the rate at which reward is attained (Gold and Shadlen, 2002). These factors will need to be incorporated into a more complete computational model.
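A time-varying threshold can be sketched by letting the bound shrink as the trial proceeds. The exponential collapse used here, `collapsing_bound` with a hypothetical 1 s time constant, is only one of many possible urgency schedules. It is an assumption, not a form proposed in the text.

```python
import numpy as np


def constant_bound(t):
    return 1.0


def collapsing_bound(t, theta0=1.0, tau=1.0):
    """Hypothetical urgency signal: the bound decays exponentially with time constant tau (s)."""
    return theta0 * np.exp(-t / tau)


def decision_times(bound_fn, drift=0.0, dt=0.001, n_trials=2000, max_time=10.0, seed=0):
    """Times at which a unit-variance random walk first reaches +/-bound_fn(t)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_trials)
    t_hit = np.full(n_trials, max_time)    # trials still undecided at max_time keep max_time
    active = np.ones(n_trials, dtype=bool)
    step = 0
    while active.any() and step * dt < max_time:
        step += 1
        idx = np.flatnonzero(active)
        x[idx] += drift * dt + np.sqrt(dt) * rng.standard_normal(idx.size)
        done = idx[np.abs(x[idx]) >= bound_fn(step * dt)]
        t_hit[done] = step * dt
        active[done] = False
    return t_hit


for name, fn in (("constant", constant_bound), ("collapsing", collapsing_bound)):
    t = decision_times(fn)
    print(name, round(float(t.mean()), 3), round(float(t.max()), 3))
```

At 0% coherence, the collapsing bound caps how long any trial can remain undecided. The cost is accuracy on trials that would have benefited from more evidence.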
The main alternative to temporal integration is a decision process that is not based on an accumulation of information but instead on extreme values in the sensory data, considered as a passing stream. This type of process is often termed ‘probability summation’ (Watson, 1979). Like integration, probability summation predicts that accuracy and decision time will depend on motion strength. However, probability summation would not predict the existence of neurons like those in LIP that represent a decision variable that evolves gradually. Rather, it predicts that at any moment, the brain is either committed to a decision or it has yet to detect the decisive sensory data. The pattern of neural responses seen in LIP fails to support this alternative, but individual neurons could in principle undergo the kind of state change predicted by this model on individual trials. The idea is that the average of many such switches from an intermediate spike rate to a high spike rate (or a low spike rate for the opposite choice) at different times could give rise to the ramp-like trajectories seen in Figure 5. The evidence at present favors integration over probability summation (see Gold and Shadlen, 2000; Roitman and Shadlen, 2002), but further experiments will be necessary to clarify the matter.
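A hypothetical neuron can illustrate the averaging argument. On each trial it switches abruptly from an intermediate rate to a high rate at a random time. The switch-time distribution (exponential with a 0.4 s mean) and the two rates are assumptions for illustration. Each single trial is a step, yet the trial average rises gradually. This is why averaged responses alone cannot rule out this alternative.

```python
import numpy as np


def average_of_switches(n_trials=500, r_low=30.0, r_high=60.0, duration=1.0, dt=0.01, seed=0):
    """Single-trial step responses and their trial average (hypothetical switching neuron)."""
    rng = np.random.default_rng(seed)
    t = np.arange(0.0, duration, dt)
    switch = rng.exponential(0.4, n_trials)          # switch time on each trial (s)
    rates = np.where(t[None, :] >= switch[:, None], r_high, r_low)
    return t, rates, rates.mean(axis=0)


t, rates, avg = average_of_switches()
print(np.unique(rates[0]))        # a single trial takes at most two values
print(np.round(avg[::20], 1))     # the average climbs gradually
```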
We have examined the proposal that neural integration of time-varying sensory signals underlies the formation of decisions about the direction of random dot motion. The model explains a variety of behavioral and physiological observations, and its deficiencies are relatively minor. The model provides no insight into how integration comes about, but it does help to answer two fundamental questions: what is integrated and to what end? The answer to the first question is sensory data construed as evidence for versus against a proposition. The answer to the second question is that the evidence is integrated to a threshold level, and the crossing of the threshold signals a commitment to a proposition or behavioral response. For the random dots task, the integral is represented by neurons in the parietal cortex and other visual-oculomotor association areas. A common property of these neurons is their ability to discharge in a persistent fashion and thus carry their representation of space or motor intention forward through time. Persistent activity is observed in many cortical areas, where it has been associated with memory traces (Fuster and Alexander, 1971; Fuster, 1973; Miyashita and Chang, 1988; Funahashi et al., 1989) and motor planning (Evarts and Tanji, 1976; Tanji and Hoshi, 2001). It is tempting to view such activity as a hallmark of integration, furnishing the brain with the capacity to evaluate propositions about the state of the world in order to plan appropriate behavior.
Supplementary material can be found at: http://www.cercor.oupjournals.org
Supported by HHMI, NIH grants RR00166 and EY11378. M.E.M. was supported by an NIH training grant (GM07108), a Poncin grant, and an ARCS fellowship. We thank Carlos Brody, Alex Huk, Joshua Gold, and John Palmer for helpful comments and discussion.