Striatal Dopamine Signals and Reward Learning

Abstract

We are constantly bombarded by sensory information and constantly making decisions about how to act. To adapt behavior optimally, we must judge which sequences of sensory inputs and actions lead to successful outcomes in specific circumstances. Neuronal circuits of the basal ganglia have been strongly implicated in action selection, as well as in the learning and execution of goal-directed behaviors, with accumulating evidence supporting the hypothesis that midbrain dopamine neurons encode a reward signal useful for learning. Here, we review evidence suggesting that midbrain dopaminergic neurons signal reward prediction error, driving the synaptic plasticity in the striatum that underlies learning. We focus on phasic increases in action potential firing of midbrain dopamine neurons in response to unexpected rewards. These dopamine neurons prominently innervate the dorsal and ventral striatum. In the striatum, the released dopamine binds to dopamine receptors, where it regulates the plasticity of glutamatergic synapses. The increase of striatal dopamine accompanying an unexpected reward activates dopamine type 1 receptors (D1Rs), initiating a signaling cascade that promotes long-term potentiation of recently active glutamatergic input onto striatonigral neurons. Sensorimotor-evoked glutamatergic input that is active immediately before reward delivery will thus be strengthened onto neurons in the striatum expressing D1Rs. In turn, these neurons cause disinhibition of brainstem motor centers and of the motor thalamus, thus promoting motor output to reinforce rewarded stimulus-action outcomes. Although many details of this hypothesis need further investigation, altogether, it seems likely that dopamine signals in the striatum underlie important aspects of goal-directed reward-based learning.


Reward-based Reinforcement Learning
The brain constantly receives sensory input while governing motor output (Figure 1). Sensory input to the brain provides both external and internal information. Information about internal states, such as thirst, provides motivation for goal-directed behavior. External cues may help animals learn when, where, and what to do in order to obtain a reward, such as how to obtain water if thirsty. If a specific action is rewarded in a given sensory context, then it might be important for an animal to learn and reinforce such a stimulus-action-reward coupling. For such learning to occur, neuronal circuits in the brain must change so that the relevant sensory neurons signal to the correct motor neurons in order to execute the appropriate goal-directed sensory-to-motor transformations. Such reward-based sensorimotor learning is not a trivial process for neuronal networks because: (i) reward is necessarily delayed relative to action initiation and sensory processing; (ii) animals constantly receive multimodal sensory information and are constantly in motion; and (iii) primary rewards are typically sparse in natural conditions. Thus, assuming that animals are trying to maximize their future-obtained reward, the brain should learn and reinforce the sequence of sensorimotor events yielding the highest reward probability.
Experimentally, in trial-based reward learning, subjects are typically presented with sensory information and need to perform a specific action in order to obtain a reward. The rewarded stimulus, the sensory context, and the required action might not be known to the subject in the first place and need to be discovered through trial-and-error learning. Thus, there must be neuronal signals in the brain that encode the value of each trial outcome (rewarding, aversive, or neutral). If the outcome is unexpectedly positive/negative, then brain circuits should be modified to reinforce/reduce the link between these successful/unsuccessful sequences of sensory events and motor commands.
Interesting insights into the types of signals that could drive such bidirectional modulation come from the conversation between the fields of neuroscience and reinforcement learning. In the latter, rewards are used to learn associations between actions and outcomes and to use that information to maximize total future rewards. While many reinforcement learning methods are successful at learning such associations, temporal difference (TD) learning is of particular interest for neuroscience1,2 due to its capacity to assign credit to sequences of events leading to reward and its sensitivity toward the temporal relationships between stimulus and outcomes.
In early models of reward-based learning, the primary focus was on establishing associations between conditioned stimuli and actions leading to reward. These models, initially formalized by Bush and Mosteller,3,4 proposed that classical associations strengthen through the iterative computation of an error term that captures the difference between the reward an animal received on the current trial and what was experienced following previous presentations of the stimulus:

A_{i+1} = A_i + α (r_i − A_i)

where A_i represents the strength of the association between the conditioned stimulus and the action at trial i; r_i represents the obtained reward at trial i; and α ∈ [0, 1] is the learning rate.
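As a minimal illustration (with our own variable names; not code from the original studies), this trial-by-trial error-driven update can be written as:

```python
def bush_mosteller(rewards, alpha=0.3):
    """Trial-by-trial update of associative strength A:
    A_{i+1} = A_i + alpha * (r_i - A_i)."""
    A = 0.0
    history = []
    for r in rewards:
        A += alpha * (r - A)   # error term: obtained minus expected reward
        history.append(A)
    return history

# With a constant reward of 1 on every trial, A converges toward 1,
# behaving as a weighted average of past rewards.
strengths = bush_mosteller([1.0] * 20)
```

Because the rule only averages past rewards of the conditioned stimulus itself, nothing in it can transfer value between cues, which makes the limitation discussed next concrete.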
This formulation was famously employed in the Rescorla-Wagner model of Pavlovian conditioning,5 but was limited in scope, as it boils down to computing a weighted average of past rewards. As such, it is unable to account for a wide array of phenomena commonly observed in psychology, where various events occurring within a trial convey distinct information about reward availability. In particular, it fails to learn second-order conditioning, that is, learning that if a primary cue predicts reward and the occurrence of a secondary cue predicts the primary cue, then the secondary cue becomes predictive of reward and the appearance of the primary cue does not convey new information. A fundamental shift in the understanding of reward-based learning occurred with the introduction of TD learning by Sutton and Barto to address these problems.1,2 Temporal difference learning departs from Rescorla and Wagner's approach in two ways. First, it breaks down the trial structure into a series of N discrete time steps (referred to as states), s_i, 1 ≤ i ≤ N, enabling learning not only at the end of a trial after reward delivery, but at each moment. Second, instead of focusing on learning the value of past events, TD learning formulates the problem as predicting the value of future ones. At each time point t, where one of the states s is visited, it makes a prediction of the expected future reward, referred to as the value of that state. After learning occurs, these predictions will converge to the expected sum of the currently expected reward and the discounted reward of all future time points:

V_t = E[r_t + γ r_{t+1} + γ² r_{t+2} + …]

where V_t is the value at t; r_t is the obtained reward at t; γ ∈ [0, 1] is the discount rate, ensuring that the sum is finite by discounting rewards coming far in the future relative to nearby ones; and E is the mathematical expectation.

Figure 1. Learning sensory-to-motor transformations from rewards. Animal behavior is determined by incoming sensory information, innate neuronal circuits, short- and long-term memories, and internal states. In part, actions are tuned to maximize reward. Animals can learn to obtain rewards by responding with appropriate goal-directed motor output to relevant reward-predicting sensory input in specific contexts through trial-and-error reward-based learning. Reward signals (blue) are thought to drive synaptic plasticity in neuronal circuits, such that relevant sensory signals in sensory neurons (S, orange) drive appropriate motor output controlled by motor neurons (M, green) in order to receive reward.
The core concept in learning these values (V_t) lies in the discrepancy between the obtained and the predicted reward. This difference is termed the reward prediction error (RPE) and serves as the teaching signal used to adjust V_t:

δ_t = r_t + γ V_{t+1} − V_t

where V_t is the value of the state visited at time t and r_t is the obtained reward at that time. When an unexpected reward is obtained, it creates a discrepancy between the reward currently expected at that state and the reward actually obtained. This positive difference leads to what is commonly referred to as a positive RPE. If a punishment is delivered or the obtained reward is lower than expected, then a negative RPE is generated. Both positive and negative RPEs are used in the adjustment process of V_t according to the following update rule, with α indicating the learning rate:

V_t ← V_t + α δ_t

To learn that the presentation of a cue predicts a reward, the system adjusts the value of the cue so that an RPE is generated at the cue presentation. This is achieved by treating cue presentation and reward delivery as two distinct time steps and, critically, by estimating at the time of stimulus presentation not only the immediate reward (which never comes at that time), but also the reward expected at the later step of delivery. In other words, this process allows the generation of RPEs not only when an unexpected reward is obtained but also when unexpected information signaling a reward is received, effectively propagating the RPE of the rewarded state backwards in time toward the predictive cue.8,9 It is crucial to note that while rewards can be delivered frequently in laboratory settings, typically in the form of food or water to hungry or thirsty animals, in nature, physical rewards can be very sparse, which makes learning difficult for animals as well as for artificial agents. Recently, there have therefore been efforts to expand the definition of reward and introduce other concepts that could serve as additional learning factors, such as curiosity,10 surprise,11 or novelty.12
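The TD machinery described above can be simulated in a few lines. The sketch below uses our own toy setup (five states per trial, a cue at the first state and reward at the last) and applies the update rule V_t ← V_t + α δ_t; after learning, value has propagated backwards from the reward to the cue, while the RPE at reward time vanishes:

```python
import numpy as np

def td_learning(n_states=5, alpha=0.1, gamma=0.9, n_trials=500):
    """Tabular TD(0) on a simple trial: the cue is state 0 and
    reward (r = 1) is delivered at the last state."""
    V = np.zeros(n_states + 1)                # state values (+ terminal state)
    for _ in range(n_trials):
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0
            delta = r + gamma * V[s + 1] - V[s]   # TD reward prediction error
            V[s] += alpha * delta                 # V_t <- V_t + alpha * delta_t
    return V

V = td_learning()
rpe_at_reward = 1.0 + 0.9 * V[5] - V[4]  # shrinks toward zero with learning
```

After convergence, V[0] approaches γ⁴ ≈ 0.66: the cue state has acquired (discounted) value, so an unpredicted cue onset now generates the RPE, mirroring the backward propagation described in the text.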

Midbrain Dopamine Neurons Signal Reward Prediction Errors
The most prominently described reward signal in the mammalian brain comes from midbrain dopaminergic neurons located in the substantia nigra pars compacta (SNc) and the adjacently lying ventral tegmental area (VTA) (Figure 2A). These midbrain dopaminergic neurons strongly project to the dorsal and ventral striatum, with SNc dopamine neurons projecting to the dorsal striatum and VTA dopamine neurons mainly projecting to the ventral striatum, also termed the nucleus accumbens.14,15 As first described in monkeys by Schultz et al.,6 AP firing rates of some SNc/VTA neurons rapidly and transiently increase in response to unexpected rewards (Figure 2B) and, more generally, were found to signal RPE,6 which had been identified as an important learning signal associated with octopamine in the honey bee.16,17

Figure 2. (A) In order to study their activity, extracellular electrophysiological recordings can be targeted to the midbrain dopaminergic neurons (dark blue) located in the substantia nigra pars compacta and the VTA, which respectively prominently innervate the dorsal striatum and ventral striatum, also known as the nucleus accumbens (light blue). (B) The work of Wolfram Schultz and colleagues revealed that delivery of an unexpected reward transiently increases AP firing in putative midbrain dopamine neurons of monkeys.6 (C) Optotagging can be used to record AP firing of genetically identified classes of neurons. For example, the light-gated ion channel ChR2 can be expressed specifically in dopaminergic neurons of the midbrain through mouse genetics and viral transfection. Blue light flashes specifically drive AP firing in ChR2-expressing neurons, which can be recorded by an optrode, a device consisting of an optical fiber coupled to an extracellular recording electrode. Blue light delivery can evoke precisely timed AP firing in subsets of neurons expressing ChR2. The work of Naoshige Uchida and colleagues studied opto-tagged dopaminergic neurons and found that such genetically defined dopaminergic neurons transiently increased AP firing in response to rewards in mice,8 as shown in panel B and similar to the previous work in monkeys.6 (D) Substantial evidence supports the hypothesis that dopamine neurons do not only respond to unexpected rewards; more precisely, they encode RPEs. Animals can learn that specific sensory stimuli reliably predict future rewards. After learning, the reward-predicting sensory stimulus evokes a rapid transient increase in dopamine neuron AP firing, but there is no dopamine signal upon reward delivery because it is now entirely expected (top). However, if reward is omitted, then there is a drop in dopamine firing rates because the outcome was worse than expected (negative RPE) (middle). On the other hand, if the reward-predicting sensory stimulus is omitted, then reward delivery is unexpected and is again accompanied by increased dopamine neuron firing (bottom).

There are different types of neurons in the SNc/VTA that can be distinguished by distinct molecular, structural, and functional features. An important advance supporting the dopamine reward coding hypothesis came from the work of Naoshige Uchida's laboratory through electrophysiological recordings of optogenetically identified dopaminergic neurons.8 In this method, the light-gated cation channel channelrhodopsin-2 (ChR2) is specifically expressed in dopaminergic neurons by injecting Cre-dependent adeno-associated virus (AAV) into the SNc/VTA of dopamine reuptake transporter (DAT)-Cre mice (Figure 2C). These mice express Cre-recombinase in cells that express the plasma membrane DAT, a key signature of dopaminergic neurons. Blue light flashes delivered to the midbrain through fiber optics can then evoke activity by directly depolarizing the ChR2-expressing dopamine neurons. Optogenetic stimulation evoked precisely timed AP firing in opto-tagged SNc/VTA neurons in DAT-ChR2 mice, thus genetically defining them as dopaminergic. These optogenetically defined dopaminergic neurons had AP firing patterns consistent with transiently increased activity in response to unexpected rewards and RPE (Figure 2B).8

As discussed in the previous section, signals representing unexpected reward and RPE are useful for reinforcement learning, and dopamine neurons could therefore causally serve to deliver such learning signals. To test this hypothesis, it is essential to directly manipulate the activity of dopamine neurons. Current experimental data largely support the notion that artificially induced transient increases in dopamine indeed act to positively reinforce behavior. By expressing ChR2 in dopamine neurons, dopamine concentrations can be increased through blue light stimulation, similar to the opto-tagging experiments described above.
Optogenetic stimulation of dopamine neurons was found to induce place preference, such that a mouse would spend more time in the region of the test chamber in which the dopamine neurons were driven to fire at high frequency.18 Mice were also found to reinforce nose-poking when this triggered optogenetic stimulation of dopaminergic neurons.19 In an operant training paradigm, mice would also learn to press a lever in order to self-stimulate dopamine neurons optogenetically, with some mice persevering with dopamine self-stimulation even when this was paired with footshock.20 In head-restrained mice carrying out a visual discrimination task in order to receive reward, stimulation of dopamine neurons appeared to enhance learning speed and prolong session duration.21 Optogenetic stimulation of dopamine neurons was also found to evoke orofacial movements that share similarities with those time-locked to reward-predicting cues in animals trained in stimulus-reward association tasks.22 Finally, specific spontaneous movements occur more frequently in a behavioral session if they were previously paired with optogenetic stimulation of dopaminergic axons in the dorsolateral striatum (DLS).23 A large body of evidence therefore supports the notion that increases in dopamine are rewarding and act as positive reinforcers. Conversely, reduced firing of dopamine neurons, induced either by stimulating GABAergic inputs to the dopamine neurons24 or by direct optogenetic inhibition of dopamine neurons,25 can cause aversion, acting as a negative reinforcer.
Interestingly, dopamine not only signals unexpected primary rewards, such as water for a thirsty animal; through learning, dopamine signals also develop in response to cues that predict future reward (Figure 2D).6 If a sensory cue is repeatedly presented to a subject in a manner that anticipates reward delivery, then dopamine neurons will shift their responses to the earliest time point predictive of upcoming rewards. This computation is useful for optimally learning rewarded sensorimotor sequences.6,7,26,27 This shift in their response profile upon learning strongly reflects what one would expect from an RPE signal, according to the TD learning model. As mentioned in the previous section, in TD learning, the RPE is obtained from the difference between the observed and predicted values of the current state. When we consider this computation performed moment-by-moment in a time-continuous setting, the result is that the TD-RPE approximates the derivative of the value function. As unexpected rewards are presented, the sudden increase of value associated with that state generates a positive RPE, which propagates to the predictive cue over learning. At that stage, dopamine neurons will respond to the increase in the rate of change of the value function associated with the presentation of the cue, which is only resolved at the time of reward presentation.
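The claim that the TD-RPE approximates the derivative of the value function is easy to verify numerically. In this toy check (our own construction, not data from the cited studies), we take a learned value function that ramps up from cue to reward and compute δ_t = r_t + γV_{t+1} − V_t; with no intervening reward and γ = 1, δ reduces exactly to the discrete-time derivative of V:

```python
import numpy as np

gamma = 1.0                     # no discounting, for clarity
V = np.linspace(0.0, 1.0, 11)   # learned value ramp: cue at t=0, reward at t=10
r = np.zeros(10)                # no reward during the ramp itself

# TD-RPE at each intermediate step: delta_t = r_t + gamma * V[t+1] - V[t]
delta = r + gamma * V[1:] - V[:-1]
# With r = 0 and gamma = 1, delta equals np.diff(V), the discrete derivative.
```

A sudden jump in V, such as the one caused by an unpredicted cue, therefore produces a phasic positive δ, matching the dopamine transient at cue onset described in the text.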
Indeed, as a reward becomes expected, the response of dopamine neurons to reward scales as a function of the difference between the obtained and the expected reward; that is, dopamine neurons increase their response to reward when the reward is bigger than expected and decrease it when the reward is smaller than expected, even going below baseline firing rates in the case of reward omission.6,8,9,28,29 If the sensory cue predicts upcoming reward with complete certainty, then the reward delivery itself becomes entirely expected, and thus the dopamine signal at the reward delivery time decreases.7,26 This is well in accordance with the RPE signaling hypothesis and implies that, with experience, the subject builds an internal model that associates an expected value (reward probability and size) with a given sensorimotor sequence in a specific context. Whereas in monkey studies dopamine signals for fully predicted reward completely disappear, this has typically not been observed in mouse experiments, perhaps because mice have noisier time estimation abilities or may have been less extensively trained and thus remain more uncertain about reward expectations.8 Indeed, even in monkeys, if the sensory cue is only partially predictive, then the partially predicted reward evokes a dopamine signal, although smaller compared to that evoked by delivery of the same reward at an unexpected time, and the sensory cue evokes a reduced dopamine signal compared to fully reward-predicting cues.7,26

What drives RPE signals in midbrain dopamine neurons? One major input that dopamine neurons receive comes from nearby inhibitory GABAergic neurons. Dopaminergic neurons form approximately 60% of the neuronal population in the VTA, and the rest are mostly GABAergic inhibitory neurons, with a smaller fraction of glutamatergic neurons.30 GABAergic neurons in the VTA are more represented in the rostral and medial parts. Electrophysiological recordings from opto-tagged GABAergic neurons located in the vicinity of the VTA revealed that these inhibitory neurons appeared to encode expectation about rewards, but were not strongly affected by reward delivery or omission.8 Whereas dopamine neurons in the VTA represent the difference between observed and predicted reward, the GABAergic neurons predominantly represent only the predicted reward. Optogenetic manipulations of the VTA GABAergic neurons directly showed that these neurons provide inhibition to the VTA dopamine neurons in a subtractive manner, such that the predicted reward is subtracted from the actual reward in order to compute the RPE.31

In addition to input from local inhibitory neurons, dopamine neurons receive a multitude of inputs from other brain regions. These can be identified in a brain-wide manner by using monosynaptically restricted mapping of presynaptic neurons labeled by a modified rabies virus.32,34,35 Ventral tegmental area dopamine neurons have been shown to receive glutamatergic inputs from the cortex,36 brainstem,37 midbrain,38 basal forebrain,39 and dorsal raphe nucleus,37 and GABAergic inputs from the lateral hypothalamus,40 brainstem,38 dorsal raphe nucleus,35 and ventral pallidum.41 Interesting differences were also observed comparing the inputs to SNc and VTA dopamine neurons.33,34 Whereas SNc dopamine neurons receive stronger input from the dorsal striatum, globus pallidus, subthalamic nucleus, substantia nigra pars reticulata (SNr), and sensorimotor neocortex, VTA neurons receive stronger input from the ventral striatum, ventral pallidum, lateral habenula (LHb), prefrontal and orbital cortex, hypothalamus, and dorsal raphe. These differences in their inputs likely contribute to their differential activity patterns, as discussed later.
The neuronal circuitry and function of LHb inputs have been studied in some detail. Glutamatergic LHb neurons were found to be excited by reward omission, while reward-related stimuli inhibit their activity.42 In doing so, the LHb governs the activity of the downstream dopaminergic reward system, mainly through a disynaptic pathway relayed via midbrain GABAergic neurons of the rostromedial tegmental nucleus (RMTg), also known as the tail of the VTA.43 The RMTg neurons show phasic activation in response to aversive stimuli such as footshocks, shock-predictive cues, food deprivation, or reward omission, whereas they are inhibited after rewards or reward prediction.44 As such, inhibition of LHb activity during reward stimuli would reduce the excitation of GABAergic RMTg neurons, leading to disinhibition of dopaminergic VTA neurons. This neuronal circuitry may play a key role in reward learning, since habenula lesions were found to prevent midbrain dopamine neurons from encoding reward omission in a reward-conditioning task.45 In support of these observations, it was recently reported that reward-predictive cues drive LHb inhibition mediated by fast GABAergic neurotransmission, which increases as reward-anticipatory behavior emerges.46 Although there is growing evidence that the LHb participates in reward-based learning through an LHb-RMTg-VTA circuit,46,47 the upstream synaptic inputs onto LHb neurons active during these behaviors currently remain less clear.
As an alternative to extracellular electrophysiological measurement of the somatic AP firing of dopamine neurons, it is also possible to image the activity of dopamine axons48 or to directly image dopamine release,49,50 revealing interesting spatiotemporal dynamics of dopaminergic signaling in the striatum. These methods have largely replaced previous efforts to measure dopamine through microdialysis or voltammetry. Genetically encoded fluorescent calcium indicators, such as GCaMPs,51,52 can be expressed in midbrain dopamine neurons by combining mouse genetics and viral vectors; the activity of individual axons can then be imaged using two-photon microscopy together with invasive cranial windows, or bulk signals can be measured using fiber photometry.48 Recently, it has also become possible to image dopamine release more directly through the development of genetically encoded fluorescent proteins sensitive to dopamine, such as dLight49 and GRAB-DA.50 Both of these dopamine sensors were engineered by coupling a native dopamine receptor to a circularly permuted GFP, producing fluorescence upon dopamine binding to the receptor.
On the whole, the imaging data are consistent with a transient increase in dopamine in the ventral striatum (nucleus accumbens) following an unexpected reward and, more generally, with RPE signaling,48,49,53 in good agreement with the electrophysiological measurements. However, other parts of the striatum seem to receive different dopamine signals. Indeed, movement, choice, and motor-related signals may dominate dopamine signaling in the DLS, with a smaller contribution of RPE signals.48,53-56 In the striatum, there appears to be a gradient between the ventral and the dorsal parts, with dopamine more prominently signaling movement in the dorsal striatum and reward in the ventral striatum,48,53,55 although important reward signals have also been reported in the dorsal striatum.57 Indeed, optogenetic stimulation of SNc dopamine neurons, which primarily innervate the dorsal striatum, enhanced movement initiation, whereas optogenetic inhibition of these neurons reduced the probability of movement initiation.55 Wave-like propagation of dopamine signals has also been reported through imaging across the dorsal striatum, with the directionality changing depending upon task variables.58,61,62 Other studies suggest that dopamine neurons in the SNc/VTA may also increase activity in response to novel sensory stimuli, and the increase in dopamine release following a novel stimulus may play an important role in the learning of the association between that stimulus and the reward delivery when the stimulus predicts the reward.63 More recently, the tail of the striatum (i.e., the most caudal part) has been identified as another region receiving distinct dopaminergic signals. The tail striatum receives dopamine innervation from the most lateral part of the substantia nigra. Rather than encoding reward or movement, the tail striatum dopamine signals seem to function as reinforcers for the avoidance of threatening stimuli.64 High-intensity unexpected sound stimuli, but not rewards, drove dopamine increases in the tail striatum, unlike in the ventral striatum. Optogenetic stimulation of dopamine fibers in the tail striatum drove aversion, whereas optogenetic stimulation of dopamine fibers in the ventral striatum drove positive reinforcement.64 It is therefore clear that there are diverse dopamine signals in different parts of the striatum.
Finally, it is important to remember that although dopamine is considered a key signal for reward learning, it likely functions in a cooperative manner with several other neuromodulatory systems. For example, in the primary visual cortex of rats, acetylcholine has been shown to be necessary for learning the expected time of a reward predicted by a visual stimulus during reinforcement learning.65,66 This finding was supported by in vitro brain slice experiments in which the activation of metabotropic acetylcholine receptors prolonged the duration of spiking in layer 5 pyramidal neurons evoked by electrical stimulation, extending the time window for synaptic plasticity to occur.65 Thus, as described in more detail in the next section, metabotropic signaling by some neuromodulators seems to share common features of promoting synaptic plasticity and learning.

Dopaminergic Modulation of Synaptic Plasticity in the Striatum
The activity of dopamine neurons, at least in part, appears to serve as a signal that encodes RPE. In order to understand what impact these dopamine signals might have upon the brain, it is obviously important to consider where dopamine is released. The most prominent target of the axons of the midbrain dopaminergic neurons is the striatum, and it is presumably by releasing dopamine in the striatum that the midbrain dopaminergic neurons carry out an important part of their function. Two classes of striatal projection neurons make up the vast majority of neurons in the striatum, and these two classes of inhibitory GABAergic neurons can be distinguished through anatomical and molecular features, including the expression of different dopamine receptors (Figure 3A).67 Striatonigral medium spiny neurons (MSNs) projecting from the DLS to the SNr express dopamine type 1 receptors (D1Rs) and define the so-called direct pathway. Striatopallidal MSNs projecting from the DLS to the external segment of the globus pallidus (GPe) express dopamine type 2 receptors (D2Rs) and form the basis of the so-called indirect pathway.
The striatum receives strong glutamatergic input from the cortex and thalamus. Glutamatergic input to MSNs can change strength through the induction of long-term synaptic plasticity (Figure 3B). The pairing of high-frequency presynaptic firing with high-frequency postsynaptic firing in the presence of elevated dopamine can induce long-term potentiation (LTP) at glutamatergic inputs onto D1R-expressing MSNs, but not D2R-expressing MSNs.68,69 Following LTP induction, the enhanced efficacy of glutamatergic input increases the amplitude of excitatory postsynaptic potentials (EPSPs) in D1R-expressing MSNs, an effect that can last for many hours. Such long-term synaptic plasticity likely contributes to learning.
The molecular signaling pathways engaged by LTP induction in D1R-expressing MSNs have been investigated in detail by Haruo Kasai and others70,71 (Figure 3C). At baseline, that is, before LTP induction, glutamatergic inputs from the cortex or thalamus would largely activate AMPA receptors located on the spines of the postsynaptic MSNs, giving rise to the electrical signals underlying the EPSP. The voltage-dependent Mg2+ block of the NMDA receptors would prevent the calcium-permeable NMDA receptors from conducting current, and at baseline there would therefore be little accompanying postsynaptic calcium signaling. During LTP induction, high-frequency presynaptic firing is paired with high-frequency postsynaptic firing. Postsynaptic depolarization releases the voltage-dependent Mg2+ block of the NMDA receptors, allowing calcium to enter the dendritic spines of MSNs. Calcium rises are typically considered the first step in the biochemical cascade underlying the postsynaptic forms of LTP. Elevated spine Ca2+ concentrations activate CaMKII, which in turn induces phosphorylation of multiple downstream effectors, culminating in spine enlargement and the concomitant insertion of additional AMPA receptors into the postsynaptic membrane, thus giving rise to enhanced EPSPs. However, the activation of CaMKII is countered by a high rate of PP1 (protein phosphatase 1) activity in MSNs. For D1R-expressing MSNs, increased dopamine concentration activates the D1Rs, which are coupled through the GTP-binding protein Gs to stimulate Ca2+/calmodulin-dependent adenylyl cyclase 1, in turn increasing intracellular cAMP levels, activating protein kinase A (PKA),72 inducing phosphorylation of cAMP-regulated phosphoprotein 32 kDa (DARPP-32), and thereby turning PP1 off. A key impact of dopamine in D1R-expressing MSNs therefore seems to lie in helping the activation of CaMKII by turning off its inactivation by PP1. Dopamine acting via D1Rs therefore enhances the activation of CaMKII,
leading to the induction of LTP at the activated synapses. Interestingly, Haruo Kasai and colleagues70 found that the dopamine signals can arrive up to 1 s after the pairing of presynaptic and postsynaptic activity and can still retrogradely enhance LTP by inactivating PP1 to enhance CaMKII activity. This observation is important because rewards are typically delivered after the correct stimulus-response sensorimotor neuronal activity. In order for the dopamine reward signal to contribute to learning through the synaptic plasticity of sensorimotor circuits, it must therefore interact with traces of recent neuronal activity.73,74 This is often referred to as the "credit assignment problem" of identifying which synapses should be changed in order to learn, and it has led to the hypothesis of preferential synaptic plasticity of recently active synapses highlighted by an "eligibility trace," sharing some similarity with the synaptic tagging hypothesis.75 The 1-s window of retrograde enhancement of LTP of recently activated synapses demonstrated in vitro in brain slices could help bridge the time between sensorimotor processing and reward feedback during the learning of simple stimulus-response-reward associations. Altogether, it seems plausible that a delayed dopamine reward signal might trigger plasticity at recently activated synapses, which might have been involved in the sensorimotor activity that gave rise to the reward, thus contributing to reinforcement learning. Similar observations have been made for other neuromodulatory signals, including the effects of norepinephrine and serotonin on the plasticity of cortical glutamatergic synapses76 and dopamine affecting synaptic plasticity in the hippocampus,77 with experiments showing that neuromodulatory agonists can change the effect of synaptic plasticity induction protocols carried out seconds before the application of the agonists.
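The interaction between a delayed dopamine signal and an eligibility trace can be caricatured as a three-factor learning rule. The sketch below is purely illustrative (our own parameter choices, with a 1-s trace time constant loosely inspired by the in vitro window described above): coincident pre/post activity tags the synapse with a decaying trace e(t), and a later dopamine pulse converts whatever trace survives into a weight change Δw = η·e·DA:

```python
import numpy as np

def three_factor_update(pairing_time, dopamine_time, w=0.5,
                        eta=0.2, tau=1.0, dt=0.01, t_end=4.0):
    """Toy three-factor rule: pre/post pairing sets an eligibility
    trace e = 1, which decays with time constant tau; a delayed
    dopamine pulse updates the weight by eta * e at that moment."""
    e = 0.0
    for t in np.arange(0.0, t_end, dt):
        e *= np.exp(-dt / tau)               # trace decays between events
        if abs(t - pairing_time) < dt / 2:
            e = 1.0                          # pairing tags the synapse
        if abs(t - dopamine_time) < dt / 2:
            w += eta * e                     # dopamine reads the surviving trace
    return w

# Dopamine arriving 1 s after pairing still potentiates the synapse,
# but less than dopamine arriving almost immediately would.
w_immediate = three_factor_update(0.5, dopamine_time=0.6)
w_delayed = three_factor_update(0.5, dopamine_time=1.5)
```

The trace thus solves the toy version of the credit assignment problem: only synapses tagged shortly before the reward signal are strengthened, and the strengthening fades as the dopamine delay grows.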
Interestingly, transient decreases in dopamine signals have been reported, especially after reward omissions and more generally in response to negative RPEs. Unexpectedly bad outcomes are also important learning signals, and it is therefore also interesting to consider the effects of decreases in striatal dopamine concentration. Dopamine receptor subtypes differ in their affinity for dopamine, with D2Rs having an affinity approximately two orders of magnitude higher than D1Rs. It is thus possible that decreases in dopamine might not be sensed by D1Rs, because they may be little activated under basal conditions, and further decreases in dopamine may fall outside the relevant dose-response range of receptor modulation. On the other hand, D2Rs may normally be highly occupied by dopamine even under basal conditions because of their higher affinity. A reduction in dopamine concentration might then lead to a decreased activation of D2Rs. Haruo Kasai and colleagues investigated how such dopamine decreases might affect D2R-related signaling and learning in mice.71 D2R activation stimulates Gi/o subtypes of G proteins, which inhibit cAMP production and suppress PKA. Whereas increases in dopamine do not appear to drive reductions in PKA activity, decreases in dopamine do evoke increases in PKA activity in D2R-expressing MSNs.72 Such dopamine dips appear to be important for mice to carry out a task in which they learn to discriminate between reward-predicting and distractor auditory tones.71 The absence of reward following the presentation of distractor tones resulted in a reduction in dopamine in the ventral striatum during discrimination learning. The dopamine dip enhanced LTP of glutamatergic inputs onto D2R-expressing MSNs via increased PKA activity, provided there was concomitant NMDAR-mediated activation of CaMKII and co-activation of adenosine A2A receptors.71 Reward omission causing transient reductions in striatal dopamine (perhaps, at least in part, mediated via LHb neurons) might therefore contribute to synaptic plasticity and learning through D2R-expressing MSNs.
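The complementary sign logic described above can be summarized in a small sketch. This is a qualitative illustration of the reasoning, not a biophysical model: Gs-coupled D1Rs let dopamine bursts open the PKA-dependent LTP gate in D1R-expressing MSNs, while Gi/o-coupled D2Rs let dopamine dips open it in D2R-expressing MSNs.

```python
def ltp_gate_open(cell_type, dopamine_change):
    """Return True if the PKA-dependent LTP gate opens for this cell type,
    given the sign of the change in striatal dopamine concentration."""
    if cell_type == "D1":
        # Gs-coupled D1Rs: more dopamine -> more cAMP/PKA -> PP1 off -> LTP
        return dopamine_change > 0
    if cell_type == "D2":
        # Gi/o-coupled D2Rs: a dopamine dip relieves cAMP/PKA suppression
        return dopamine_change < 0
    return False

burst_gates_d1 = ltp_gate_open("D1", +1.0)  # unexpected reward
dip_gates_d2 = ltp_gate_open("D2", -1.0)    # reward omission
```

Under this simplification, positive and negative RPEs teach complementary striatal populations, which is the division of labor suggested by the experiments discussed above.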
Dopamine receptors in MSNs regulate various ionic conductances, including voltage-gated Na+, K+, and Ca2+ channels,78,79 with a recent study suggesting that D1R activation increases the excitability of striatonigral MSNs largely through voltage- and Ca2+-dependent K+ channels.80 Dopamine receptors are also prominently expressed on presynaptic terminals and on other cell classes in the striatum, including D2Rs on cholinergic interneurons81 and astrocytic glial cells.82 Finally, dopamine receptors are also found in other brain regions, including the frontal cortex, which also receives dopaminergic innervation from VTA neurons. The overall functional role of dopamine signals is therefore likely to be complex.

Dopamine Signals May Contribute to Reward-Based Learning
The hypothesis that dopamine signals might contribute to reward learning through modulating the synaptic plasticity of specific neuronal circuits remains to be further tested in detail, but some experiments support the notion that striatal MSNs expressing D1Rs can contribute to driving goal-directed motor output and show enhanced fast sensory responses across learning, consistent with the dopamine hypothesis.84,85 In the whisker detection task, head-restrained thirsty mice learn to lick a spout in response to a brief single deflection of the C2 whisker in order to receive a water reward (Figure 4A). Initially, mice are naïve to the reward-predicting rule, and they lick with equal probability independently of whisker deflection. Through trial-and-error learning across daily training sessions, mice receive reward, presumably at first by chance, by licking in the 1-s reward window that follows the 1-ms magnetic impulse applied to a metal particle attached to the C2 whisker serving as the tactile stimulus. After several training sessions, mice learn to lick reliably in response to whisker deflection, on each trial gathering a small droplet of water and accumulating rewards across the correct trials until sated. The hit rate (probability of licking in response to a whisker deflection) therefore increases across learning. Concomitantly, mice also learn to withhold licking at other times, presumably to reduce unrewarded effort. Thus, the false alarm rate (probability of licking in the absence of a whisker stimulus) decreases across learning.
Membrane potential (Vm) recordings during the whisker detection task84,85 were obtained from neurons located in the region of the DLS known to receive direct glutamatergic input from the primary whisker somatosensory cortex (Figure 4B).84,86,87 Neurons were post hoc anatomically identified and colocalized with genetic markers to identify D1R- and D2R-expressing MSNs. D1R-expressing MSNs in the DLS strongly innervate the SNr, whereas D2R-expressing MSNs in the DLS strongly innervate the GPe (Figure 4C).84 Averaged across hit trials for different recordings in different mice, both D1R- and D2R-expressing MSNs in expert mice showed an overall increased depolarization in response to whisker deflection compared to naïve mice (Figure 4B).85 In part, this is likely driven by enhanced glutamatergic input to the striatum from the cortex and thalamus during movements, including licking. The co-activation of D1R- and D2R-expressing MSNs is in good overall agreement with other recent studies.88,89 Although more subtle, learning also appears to enhance a fast sensory response, specifically in D1R-expressing MSNs, occurring during a ∼20-50 ms period immediately after the whisker stimulus. One interesting hypothesis is that dopamine reward plasticity mediated via D1R signaling could contribute to potentiating glutamatergic whisker sensory input from the cortex or thalamus, thus giving rise to the observed fast sensory response in D1R-expressing MSNs. Similarly, frequency-specific potentiation of corticostriatal synaptic transmission linked to reward-predicting tones has been reported as rats learned an auditory discrimination task.90 Optogenetic stimulation experiments were carried out to test for possible causal contributions of activity in D1R- and D2R-expressing neurons during execution of the whisker detection task (Figure 4D).
A Cre-dependent virus was injected into the DLS of genetically engineered mice expressing Cre-recombinase in either D1R- or D2R-expressing MSNs.84 The mice were subsequently trained in the whisker detection task, and, upon reaching high performance, trials with brief (50 ms) optogenetic stimuli were delivered through an optical fiber inserted into the DLS. The optogenetic stimulus trials were randomly interleaved with whisker stimulus trials and no-stimulus catch trials. Stimulation of D1R-expressing neurons evoked licking, but stimulation of D2R-expressing neurons did not. Brief activity in D1R-expressing neurons therefore seems to be sufficient for task execution, with the optogenetic stimulus readily substituting for the whisker stimulus. The fast sensory-evoked depolarization of D1R-expressing neurons found in expert mice (Figure 4B) could therefore causally contribute to the learning and execution of the whisker detection task.93,94 In order to define hypotheses for further experimental testing, it might be useful to consider how different neuronal pathways might contribute to the transformation of a sensory input into a goal-directed motor output learned through dopamine reward signals (Figure 4E). Sensory information is signaled to the thalamus, which in turn innervates the cortex and striatum. Motor control is regulated by the neocortex, SNr, and other brain regions, which strongly innervate neuronal circuits in the brainstem and spinal cord, where motor neurons are located. Dopamine reward signals might serve to enhance sensory-evoked glutamatergic synaptic input to D1R-expressing MSNs through LTP. Enhanced D1R-expressing neuronal activity will release the inhibitory neurotransmitter GABA onto spontaneously active neurons in the SNr, thus reducing their firing rate. The SNr neurons are also inhibitory, and thus suppression of their firing has a disinhibitory effect upon downstream targets, such as the thalamus and brainstem motor nuclei.93,95,96 The net effect is increased motor drive, for example, an enhanced probability of initiating a lick in the whisker detection task. The underlying mechanisms of reward learning might thus include a dopamine-dependent strengthening of feedforward synaptic neuronal circuits connecting a reward-predicting sensory stimulus with the execution of a motor command associated with reward delivery.
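The disinhibition chain described above can be sketched with a toy firing-rate calculation. All rates and weights here are illustrative assumptions of our own, chosen only to make the signs of the pathway explicit: striatonigral GABA reduces tonic SNr firing, and lower SNr firing releases brainstem and thalamic motor targets from inhibition.

```python
def motor_drive(d1_msn_rate, snr_tonic=30.0, w_inhib=0.5):
    """Toy rate model of the direct pathway: D1-MSN activity inhibits
    tonically active SNr neurons, whose GABAergic output in turn
    suppresses downstream motor targets. Units are arbitrary."""
    snr_rate = max(0.0, snr_tonic - w_inhib * d1_msn_rate)   # MSN GABA onto SNr
    motor_target = max(0.0, 100.0 - snr_rate)                # SNr GABA onto motor nuclei
    return motor_target

baseline = motor_drive(0.0)    # SNr fully active: motor output held down
rewarded = motor_drive(40.0)   # potentiated D1-MSN input suppresses SNr
```

In this sketch, stronger sensory-evoked D1-MSN firing after LTP yields a higher motor drive than baseline, the sign relationship required for reinforced licking.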
Many open questions remain before one could claim to have an understanding of the neuronal circuitry underlying the learning and execution of any specific goal-directed behavior. Even for the relatively simple whisker detection task discussed above, in which thirsty mice learn to lick a spout in order to obtain a water reward, many aspects remain unexplored. Many key causal tests of the specific hypothesis that dopamine-gated LTP of whisker sensory inputs to striatal D1R-expressing MSNs might underlie learning still need to be carried out. For example, direct manipulation of the dopamine signals has not yet been carried out during execution or learning of the whisker detection task, and neither have pharmacological manipulations targeting D1Rs or D2Rs. Furthermore, ideally, the same neurons and synaptic inputs would be studied longitudinally across learning to investigate in further detail the underlying mechanisms and sites of synaptic plasticity, as well as the patterns of neuronal activity that induce the plasticity.
From a higher-level perspective, we also need to consider how water becomes rewarding when mice are thirsty, how this motivates mice to lick, and how receiving a water reward (or a sensory cue predicting upcoming water reward, such as the whisker stimulus in the whisker detection task) generates a dopamine signal. Some aspects of how thirst is represented in the brain are beginning to be understood, but it remains difficult to assemble an integrative view of how this impacts behavior. Blood osmolality is first sensed by neurons in the subfornical organ and the organum vasculosum of the lamina terminalis, structures that lack the blood-brain barrier. Interestingly, optical stimulation of a genetically defined subset of neurons in the subfornical organ (expressing CaMKII, nitric oxide synthase, and ETV-1) immediately triggers drinking behavior, and these neurons are also activated by thirst.97 Such optogenetic stimulation of the subfornical organ has been shown to be negatively reinforcing, possibly generating an aversive state that motivates mice to find and consume water. Neurons in the subfornical organ in turn project to various other brain regions, including hypothalamic regions such as the median preoptic nucleus, the supraoptic nucleus, and the paraventricular hypothalamic nucleus, and, indeed, optogenetic stimulation of neurons in the median preoptic nucleus also drives water-seeking behavior.98,99 Perhaps via hypothalamic neurons, the thirst state can change cortical sensory processing,100 and optogenetic manipulation of thirst neurons has been shown to give rise to highly distributed changes in neuronal activity patterns and sensorimotor processing during goal-directed behavior motivated by thirst,101 but the causal mechanisms linking changes in diverse classes of neurons in different brain regions remain to be determined. Interestingly, neurons in the lateral hypothalamus appear to signal thirst and fluid balance states to dopamine neurons in the VTA, contributing importantly to the learning of which foods and fluids are rehydrating.102 In conclusion, remarkable progress has been made linking striatal dopamine signals to reward learning, but much remains to be learned.

Figure 2. Optogenetically identified dopamine neurons in the midbrain transiently increase firing in response to unexpected rewards. (A) In order to study their activity, extracellular electrophysiological recordings can be targeted to the midbrain dopaminergic neurons (dark blue) located in the substantia nigra pars compacta and the VTA, which respectively prominently innervate the dorsal striatum and the ventral striatum, also known as the nucleus accumbens (light blue). (B) The work of Wolfram Schultz and colleagues revealed that delivery of an unexpected reward transiently increases AP firing in putative midbrain dopamine neurons of monkeys.6 (C) Optotagging can be used to record AP firing of genetically identified classes of neurons. For example, the light-gated ion channel ChR2 can be expressed specifically in dopaminergic neurons of the midbrain through mouse genetics and viral transfection. Blue light flashes specifically drive AP firing in ChR2-expressing neurons, which can be recorded by an optrode, a device consisting of an optical fiber coupled to an extracellular recording electrode. Blue light delivery can evoke precisely timed AP firing in subsets of neurons expressing ChR2. Naoshige Uchida and colleagues studied opto-tagged dopaminergic neurons and found that such genetically defined dopaminergic neurons transiently increased AP firing in response to rewards in mice,8 as shown in panel B and similar to the previous work in monkeys.6 (D) Substantial evidence supports the hypothesis that dopamine neurons do not only respond to unexpected rewards; more precisely, they encode RPEs. Animals can learn that specific sensory stimuli reliably predict future rewards. After learning, the reward-predicting sensory stimulus evokes a rapid transient increase in dopamine neuron AP firing, but there is no dopamine signal upon reward delivery because it is now entirely expected (top). However, if reward is omitted, then there is a drop in dopamine firing rate because the outcome was worse than expected (negative RPE) (middle). On the other hand, if the reward-predicting sensory stimulus is omitted, then reward delivery is unexpected and is again accompanied by increased dopamine neuron firing (bottom).
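The three cases in the figure follow directly from the definition of a reward prediction error. As a minimal sketch for a one-step cue-reward task (illustrative values, discounting omitted):

```python
def rpe(reward, predicted):
    """Reward prediction error: actual reward minus predicted reward."""
    return reward - predicted

V_cue = 1.0  # learned value of the reward-predicting stimulus

expected = rpe(reward=1.0, predicted=V_cue)  # predicted reward delivered
omission = rpe(reward=0.0, predicted=V_cue)  # predicted reward omitted
surprise = rpe(reward=1.0, predicted=0.0)    # unpredicted reward delivered
```

A fully predicted reward yields zero error (no phasic dopamine response), omission yields a negative error (the dopamine dip), and an unpredicted reward yields a positive error (the dopamine burst).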

Figure 3. Dopamine modulates synaptic plasticity in the striatum. (A) The midbrain dopaminergic neurons prominently innervate the striatum, which is dominated by two types of GABAergic MSNs expressing different dopamine receptors and projecting to different downstream brain areas. Striatonigral MSNs express D1Rs (green) and project to the SNr. Striatopallidal MSNs express dopamine type 2 receptors (D2Rs, red) and project to the external segment of the globus pallidus. These MSNs also receive glutamatergic input from the cortex and thalamus, and it is thought that a major role of dopamine is to control the plasticity of these glutamatergic inputs to the MSNs. (B) The amplitude of excitatory postsynaptic potentials (EPSPs) onto D1R-expressing MSNs can be increased through long-term potentiation (LTP) induced by pairing presynaptic glutamate release and postsynaptic depolarization together with an increase in dopamine. (C) The mechanisms underlying LTP of glutamatergic synapses on the spines of D1R-expressing MSNs have been studied in detail in brain slice experiments by Haruo Kasai and colleagues.70 Three time points are schematically indicated: before, during, and after LTP induction. The upper part of the schematic drawings shows a glutamatergic synaptic bouton filled with synaptic vesicles (gray). The lower part shows a dendritic spine (green) of a D1R-expressing MSN with AMPA (red) and NMDA (blue) subtypes of ionotropic glutamate receptors in the postsynaptic density. In the baseline period (left), AP firing of the glutamatergic afferent causes the release of glutamate, evoking a small EPSP in the postsynaptic MSN through the opening of AMPA receptors. NMDA receptors are blocked at resting membrane potential by Mg2+. During LTP induction (middle), presynaptic glutamate release is paired with postsynaptic depolarization to open NMDA receptors as well as AMPA receptors. NMDA receptor activation allows Ca2+ entry into the spine to activate Ca2+/calmodulin-dependent protein kinase II (CaMKII), an essential trigger for many forms of LTP. High activity of protein phosphatase 1 (PP1) would normally inactivate CaMKII under baseline conditions, preventing LTP induction, but PP1 is inhibited by elevated cAMP signaling driven by dopamine-activated D1Rs. Thus, dopamine can gate the induction of LTP, resulting in an increased number of AMPA receptors in the postsynaptic density of D1R-expressing MSNs (right).

Figure 4. Striatal MSNs expressing D1Rs can drive goal-directed motor output and show enhanced fast sensory responses across learning. (A) Head-restrained thirsty mice can learn to lick a spout for a water reward in response to a whisker deflection (orange), which serves as a sensory cue predicting reward availability for 1 s, with licking as the necessary goal-directed motor output to trigger reward delivery in hit trials. (B) Whole-cell membrane potential (Vm) recordings averaged across hit trials for post hoc identified D1R-expressing and D2R-expressing MSNs in the DLS of expert (blue) or naïve (green) mice performing the whisker detection task. Whisker deflection evoked a larger depolarization in expert mice compared to naïve mice for both D1R-expressing and D2R-expressing MSNs. However, a fast (20-50 ms after whisker stimulus) sensory response appeared to increase specifically in D1R-expressing MSNs across learning.85 (C) Sagittal sections through mouse brains counterstained with 4′,6-diamidino-2-phenylindole (DAPI, green).84 An AAV was injected into the DLS in order to express fluorescent proteins to allow imaging of the cell bodies in the striatum and their axonal projections (magenta). Distinct classes of MSNs were defined by using two transgenic mouse lines in which Cre-recombinase was specifically expressed in either D1R- or D2R-expressing MSNs and injecting the DLS with a Cre-dependent AAV. D1R-expressing MSNs strongly innervate the SNr, whereas D2R-expressing MSNs strongly innervate the GPe. (D) Channelrhodopsin-2 was expressed in either D1R- or D2R-expressing MSNs in the DLS of different mice, which were subsequently trained in the whisker detection task.84 Once the mice were experts, whisker (orange) and catch (black) trials were randomly interleaved with trials containing a brief blue light pulse (blue) delivered to the DLS. Optogenetic stimulation of D1R-expressing MSNs evoked licking, but optogenetic stimulation of D2R-expressing MSNs did not. Apparently, brief activation of D1R-expressing MSNs is sufficient to substitute for the whisker stimulation in this behavior. (E) A schematic circuit diagram that could account for some aspects of the learning and execution of goal-directed motor output in response to a sensory stimulus, as exemplified above by the transformation of a whisker deflection into goal-directed licking in the whisker detection task. Sensory input drives thalamic and cortical neurons, which in turn signal to the striatum. If the sensory input is paired with reward, then the sensory-evoked glutamatergic input from the thalamus and cortex will be accompanied by a dopaminergic reward signal, strengthening the excitation of D1R-expressing MSNs through LTP during reward-based learning. Enhanced sensory-evoked activity of D1R-expressing MSNs will inhibit neurons in the SNr, in turn disinhibiting the thalamus and brainstem motor nuclei, thus contributing to movement initiation, such as licking for reward, causing further reinforcement of the sensorimotor transformation. Panel B is modified from ref. 85, published under a Creative Commons License. Panels C and D are modified from ref. 84, published under a Creative Commons License.