Cell Type-Specific Membrane Potential Changes in Dorsolateral Striatum Accompanying Reward-Based Sensorimotor Learning

Abstract The striatum integrates sensorimotor and motivational signals, likely playing a key role in reward-based learning of goal-directed behavior. However, cell type-specific mechanisms underlying reinforcement learning remain to be precisely determined. Here, we investigated changes in membrane potential dynamics of dorsolateral striatal neurons comparing naïve mice and expert mice trained to lick a reward spout in response to whisker deflection. We recorded from three distinct cell types: (i) direct pathway striatonigral neurons, which express type 1 dopamine receptors; (ii) indirect pathway striatopallidal neurons, which express type 2 dopamine receptors; and (iii) tonically active, putative cholinergic, striatal neurons. Task learning was accompanied by cell type-specific changes in the membrane potential dynamics evoked by the whisker deflection and licking in successfully-performed trials. Both striatonigral and striatopallidal types of striatal projection neurons showed enhanced task-related depolarization across learning. Striatonigral neurons showed a prominent increase in a short latency sensory-evoked depolarization in expert compared to naïve mice. In contrast, the putative cholinergic striatal neurons developed a hyperpolarizing response across learning, driving a pause in their firing. Our results reveal cell type-specific changes in striatal membrane potential dynamics across the learning of a simple goal-directed sensorimotor transformation, helpful for furthering the understanding of the various potential roles of different basal ganglia circuits.


Introduction
The changes in neural circuits underlying reward-based sensorimotor learning remain incompletely understood. The dorsolateral striatum (DLS) is thought to be critically involved, as it receives sensorimotor inputs from thalamus and cortex, and sends its outputs to downstream basal ganglia nuclei important for motor control. [1][2][3][4][5] The DLS is also a major target of dopaminergic innervation, which might serve important functions for reward-based learning, [6][7][8][9] including through the differential regulation of synaptic plasticity in specific types of striatal neurons. 2,3,[10][11][12][13][14][15] Although striatal synaptic plasticity could therefore underlie important aspects of sensorimotor learning, 12,[16][17][18] this hypothesis has not yet been tested by measuring cell type-specific changes in membrane potential (V m ) dynamics across reward-based learning.
In a previous study, we found that V m dynamics of striatal projection neurons (SPNs) in whisker-related DLS were strongly modulated during performance of a task in which mice were trained to lick a water-reward spout in response to a whisker deflection. 19 Here, in a new set of recordings, we compare V m responses from naïve and expert mice, before and after task learning. We find increased task-related depolarizing responses in anatomically and genetically-identified types of SPNs across learning. We also recorded tonically active, putative cholinergic, interneurons (TANs), which form a small distinct population of cells with large somata in the striatal microcircuitry. [20][21][22][23] TANs developed a hyperpolarizing response and a pause in action potential firing with task learning, in agreement with previous studies. [24][25][26] Our data are consistent with the hypothesis that prominent cell-type-specific changes in striatal activity might accompany reward-based sensorimotor learning.

Mice
All experiments were carried out with male and female mice in accordance with the Swiss Federal Veterinary Office (authorization VD1628.6). Mice were 4-7 weeks old at the time of implantation and 6-12 weeks old at the time of recordings. Adora2a-Cre mice (MMRRC: 036158-UCD) and Drd1a-Cre mice (MMRRC: 030778-UCD) were crossed with Lox-Stop-Lox-tdTomato mice (JAX: 007909). Drd1a-tdTomato mice (JAX: 016204) were crossed with Drd2-Green Fluorescent Protein (GFP) mice (MGI: 3843608). The mice were implanted with a light-weight metal head-post and a recording chamber under ketamine/xylazine anesthesia. The position of the DLS was stereotaxically marked at the surface of the skull (0 mm anterior and 2.8-3.0 mm lateral of bregma). After the surgery, the animals were returned to their home cage for 5-7 days of recovery. The mice were housed in groups of 2-4 mice with a reverse light/dark cycle (light 7 pm to 7 am), at a temperature of 22 ± 2 °C with food available ad libitum.

Behavior
One week after implantation, all whiskers were trimmed except the C2 whisker on either side. After 1-2 days of adaptation to head-restraint, mice were water restricted to 1 ml of water/day and trained in a whisker detection task, as previously described. 19 Training started with 2 days of free-licking during which mice were habituated to trigger the reward delivery by licking the water spout, but no whisker stimulus was delivered. Following the two free-licking sessions, mice were trained in the sensory detection task. At the beginning of each training session, a small (2 mg) metal particle was attached to the right C2 whisker allowing the whisker to be vertically deflected by a 1 ms current pulse passed through an electromagnetic coil placed immediately beneath the head of the mouse (Figure 1A and B). Ambient white noise (80 dB) was played at all times to mask any potential auditory cues arising from whisker stimulation. Mice were trained to associate the C2 whisker deflection with the availability of water at the reward spout. If the mouse licked the spout within the reward window (1 s), it was considered a hit trial, and the mouse received a drop of water. If not, it was considered a miss trial and no reward was delivered. Whisker stimuli were delivered without any preceding cues at random time intervals, with the intertrial interval ranging between 4 and 12 seconds. To discourage spontaneous licking, a 4-s no-lick period was imposed during which no lick should occur in order to start a trial. Trials with whisker stimuli were randomly interleaved with catch trials in which no stimulus was given. If licks occurred during the response window of a catch trial, it was considered a false alarm; if not, it was considered a correct rejection. After a few training days (8-10 days), mice were able to achieve a stable level of performance, with a high hit rate and a low false alarm rate, and were then considered as expert.
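The trial outcomes defined above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code (their analysis was written in MATLAB): the function names `classify_trial` and `rates`, and the representation of a trial as a (stimulus, first-lick latency) pair, are assumptions made for the example. Only the 1 s response window comes from the text.

```python
from typing import Optional

RESPONSE_WINDOW_S = 1.0  # response/reward window after trial onset (from the text)


def classify_trial(stimulus: bool, first_lick_latency_s: Optional[float]) -> str:
    """Label one trial.

    stimulus: True for a whisker-stimulus trial, False for a catch trial.
    first_lick_latency_s: time of the first lick relative to trial onset,
        or None if the mouse did not lick (hypothetical input format).
    """
    licked = (
        first_lick_latency_s is not None
        and 0.0 <= first_lick_latency_s <= RESPONSE_WINDOW_S
    )
    if stimulus:
        return "hit" if licked else "miss"
    return "false_alarm" if licked else "correct_rejection"


def rates(trials):
    """Return (hit_rate, false_alarm_rate) for a list of (stimulus, latency) pairs."""
    labels = [classify_trial(stim, latency) for stim, latency in trials]
    n_stim = labels.count("hit") + labels.count("miss")
    n_catch = labels.count("false_alarm") + labels.count("correct_rejection")
    hit_rate = labels.count("hit") / n_stim if n_stim else float("nan")
    fa_rate = labels.count("false_alarm") / n_catch if n_catch else float("nan")
    return hit_rate, fa_rate
```

A lick that falls outside the 1 s window is treated here as no response, so a stimulus trial with a late lick counts as a miss.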
Electrophysiological recordings were performed either during the first training session of the sensory detection task (naïve, i.e., the first time the mice were exposed to the whisker stimulus) or once the mice had reached good performance (expert). Naïve and expert mice were non-overlapping in order to facilitate the anatomical identification of recorded neurons.
Figure 1. (A) Depiction of the whisker-based sensory detection task: mice learned to associate a brief (1 ms) downward deflection of their right C2 whisker with the availability of a water reward. (B) Whole-cell (Vm) recordings were performed in the DLS of head-restrained mice during the first training session (day 1) of this task (naïve, green) or in mice that had been trained for 7 or more days (expert, blue). (C) Trials were classified as hit if the mouse licked within the 1 s response window that followed whisker stimulus (grey area), as miss if the mouse did not lick, as false alarm if it licked when no whisker stimulus was presented (catch trials) and as correct rejection if it did not lick on catch trials. Stimulus and catch trials were randomly interleaved and separated by a randomized 4-12 s inter-trial interval. In addition, the mouse was required to not lick in the 4 s before a trial was initiated to prevent compulsive licking. (D) The probability of licking in the response window of naïve (n = 26 mice, green) and expert (n = 20 mice, blue) mice during the Vm recordings in trials with a whisker stimulus (hit rate, left) or catch trials without a whisker stimulus (false alarm rate, right) (Wilcoxon-Mann-Whitney test). Open circles indicate individual cells, closed circles with error bars indicate mean ± standard error of mean (SEM). (E) The discriminability (d') of trials with and without whisker stimuli compared between naïve and expert mice (Wilcoxon-Mann-Whitney test). Open circles indicate individual cells, closed circles with error bars indicate mean ± SEM. (F) Response time of naïve and expert mice (Wilcoxon-Mann-Whitney test). Open circles indicate individual cells, closed circles with error bars indicate mean ± SEM.
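The discriminability index d' mentioned for Figure 1E is conventionally computed from signal detection theory as d' = z(hit rate) - z(false alarm rate), where z is the inverse of the standard normal cumulative distribution. The text does not state how d' was computed or how rates of exactly 0 or 1 were handled, so the sketch below is an assumption: it uses the standard formula with a log-linear correction (adding 0.5 to each count), and the function name `d_prime` is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Signal-detection d' from trial counts.

    Assumption: a log-linear correction (add 0.5 to each count) keeps the
    rates strictly between 0 and 1 so that the z-transform stays finite.
    The source does not say which correction, if any, was applied.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

With equal hit and false alarm rates the function returns 0 (no discrimination), and it grows as the hit rate rises above the false alarm rate.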

Electrophysiology
Whole-cell patch-clamp recording electrodes (6-8 MΩ) were filled with an intracellular solution containing (in mM) 135 potassium gluconate, 4 potassium chloride, 10 HEPES, 10 sodium phosphocreatine, 4 MgATP, and 0.3 Na3GTP (adjusted to pH 7.3 with KOH), to which 3-5 mg/ml biocytin was added. V m was recorded using a Multiclamp 700B amplifier without injection of holding current and was not corrected for liquid junction potentials. On the day of recording, a small (less than 0.5 mm diameter) craniotomy was made under isoflurane anesthesia over the DLS. Mice were allowed to recover from anesthesia for 2-4 hours. Then, whole-cell patch-clamp recordings were obtained as previously described. 19,27,28 At the start of each recording, a series of increasing current steps was injected into each neuron. We proceeded with the recording if the neuron displayed both a stable resting V m and overshooting action potentials. At the end of the recording session, mice were transcardially perfused with phosphate-buffered saline and paraformaldehyde (4%) solutions and the brain was removed for anatomy. Using a vibratome, 100 μm-thick coronal brain sections were cut, and stained with streptavidin coupled to Alexa 647 (1:2000, Invitrogen) to reveal biocytin filling of postsynaptic neurons. Confocal imaging was used to evaluate co-localization of the biocytin-labelled soma of the recorded neuron with the fluorescent protein indicating the genetically-defined cell class. Low magnification fluorescence imaging was used to image the neuron in the context of the entire brain slice. These images were then loaded into Fiji software and the coordinates of labelled cells calculated using built-in software tools. The anterior-posterior coordinate was estimated by matching the anatomical markers in the bright field image of the slice with a mouse brain atlas. 29

Quantification and Statistical Analysis
All data analysis was performed in MATLAB using custom-written algorithms. To assess the whisker stimulus-triggered response, V m changes were evaluated relative to a baseline V m averaged over the 100 ms before the whisker stimulus. To obtain the lick-triggered average, V m traces were aligned to the time of the first tongue-spout contact in a bout of licking with a baseline period of 500-200 ms before the first lick time. All values are presented as mean ± SEM. Non-parametric statistical tests were used to assess significant differences. The Wilcoxon-Mann-Whitney 2-sample rank test was used for unpaired samples (naïve vs expert). The Wilcoxon signed rank test was used for paired samples (hit vs miss trials, with a minimum number of 2 trials of each type for inclusion of neurons in this analysis). Bonferroni correction was applied for comparison between the 3 cell types.
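For readers who want to see what the unpaired test does with small groups, the following Python sketch computes an exact two-sided Wilcoxon-Mann-Whitney P value by enumerating every relabeling of the pooled data, followed by a Bonferroni adjustment for 3 comparisons. It is not the authors' MATLAB code; the function names are hypothetical, the enumeration is only practical for small samples, and leaving the Bonferroni-adjusted value uncapped (so it can exceed 1) is an assumption of this sketch.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for x: count of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)


def exact_mann_whitney_p(x, y):
    """Exact two-sided P value by enumerating all splits of the pooled data.

    Feasible only for small groups (5 vs 5 gives 252 splits).
    """
    pooled = list(x) + list(y)
    n_x, n_total = len(x), len(pooled)
    center = n_x * len(y) / 2.0  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n_total), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n_total) if i not in chosen]
        total += 1
        # Count splits at least as far from the null expectation as the data
        if abs(u_statistic(group_x, group_y) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


def bonferroni(p, n_comparisons=3):
    """Multiply P by the number of comparisons (assumption: not capped at 1)."""
    return p * n_comparisons
```

With 5 versus 5 observations and no ties, the two smallest attainable two-sided P values are 2/252 ≈ 0.008 (complete separation of the groups) and 4/252 ≈ 0.016 (a single overlapping pair).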

Intrinsic Properties of Striatonigral, Striatopallidal, and Putative Cholinergic Striatal Neurons
We targeted whole-cell recordings 30 to regions of the DLS known to receive input from primary whisker-related somatosensory cortex, 19,27,[31][32][33][34] and compared V m across naïve and expert mice during task performance. The DLS is composed of different types of neurons, and, in this study, we differentiated between dopamine receptor type 1-expressing direct pathway striatonigral neurons (dSPNs), dopamine receptor type 2-expressing indirect pathway striatopallidal neurons (iSPNs) and TANs (putative cholinergic interneurons). The patch recording pipette contained biocytin, allowing for fluorescent post hoc staining and co-localization with tdTomato or GFP in mice engineered to specifically express these proteins in dSPNs and iSPNs 35,36 (Figure 2A). For SPNs, we only included anatomically identified neurons which were co-labelled with fluorescent proteins to positively characterize striatonigral and striatopallidal projection neurons (Supplementary Table 1). TANs had aspiny dendrites compared with SPNs (Figure 2B), and were readily identified during recording because of their distinct electrophysiological properties. 23,27,37 The somata of the recorded neurons across naïve and expert mice were located in a similar region of the DLS (Figure 2C and Supplementary Figure 1), previously revealed to be innervated by whisker-sensory cortex. 19 Relative to SPNs, TANs had a more depolarized baseline V m (baseline V m dSPN −74.2 ± 1.5 mV, n = 29 cells; baseline V m iSPN −71.0 ± 2.2 mV, n = 20 cells; baseline V m TAN −45.4 ± 2.1 mV, n = 10 cells; Wilcoxon-Mann-Whitney test with Bonferroni correction dSPN vs iSPN P = 1.9, dSPN vs TAN P = 1.0 × 10⁻⁵, and iSPN vs TAN P = 1.7 × 10⁻⁴) (Figure 2D). Baseline V m did not differ comparing naïve and expert mice for dSPNs, iSPNs, or TANs (Supplementary Figure 2).
TANs were more easily excited by injection of depolarizing current compared to SPNs ( Figure 2F and G), as reported previously. 38 Rheobase (minimal depolarizing current needed to evoke action potential firing) did not differ across learning for SPNs (Supplementary Figure 2). We did not compare rheobase across learning for TANs, since these neurons were spontaneously active in both naïve and expert mice.

Cell Type-Specific V m Dynamics Across Task Learning
Analysing hit trials of the whisker detection task, we found that both dSPNs and iSPNs had an enhanced whisker stimulus-evoked depolarization in expert mice compared to naïve mice (Figure 3A-D). The slope of the early sensory-evoked depolarization was significantly larger in expert mice compared to naïve mice for dSPNs (expert dSPNs slope 0.19 ± 0.05 V·s⁻¹, n = 12 cells; naïve dSPNs slope 0.08 ± 0.02 V·s⁻¹, n = 17 cells; Wilcoxon-Mann-Whitney naïve vs expert, P = 0.014) (Figure 3B, far left). The amplitude of the early depolarization quantified 20-50 ms after the whisker stimulus was also significantly larger in expert mice compared to naïve mice for dSPNs ( V m early, expert dSPNs V m 3.2 ± 0.7 mV, n = 12 cells; naïve dSPNs V m 1.4 ± 0.3 mV, n = 17 cells; Wilcoxon-Mann-Whitney naïve vs expert, P = 0.0084) (Figure 3B, center left). The negative response slopes and amplitudes in some recordings might result from inhibitory synaptic input or spontaneous noisy V m fluctuations. Consistent with a previous study (Sippy et al., 2015), the early response in dSPNs of expert mice was significantly enhanced in hit trials compared to miss trials (Supplementary Figure 3) and the slope of the early response of expert mice was significantly faster in dSPNs than iSPNs (Wilcoxon-Mann-Whitney dSPN vs iSPN, P = 0.018).
After this initial sensory response there was a significantly enhanced depolarization in hit trials for expert mice compared to naïve mice in both dSPNs and iSPNs ( V m mid, quantified 50-250 ms after the whisker stimulus: expert dSPNs V m 4.4 ± 0.5 mV, n = 12 cells; naïve dSPNs V m 2.0 ± 0.3 mV, n = 17 cells; Wilcoxon-Mann-Whitney naïve vs expert, P = 7.1 × 10⁻⁴; expert iSPNs V m 4.5 ± 0.6 mV, n = 13 cells; naïve iSPNs V m 1.8 ± 0.4 mV, n = 7 cells; Wilcoxon-Mann-Whitney naïve vs expert, P = 0.0089) (Figure 3B and D, center right). The shorter reaction time after learning likely contributes to this enhanced secondary excitation of SPNs in expert mice, since the first lick in a bout of spontaneous licking is accompanied by a similar depolarization of dSPNs and iSPNs in both naïve and expert mice (Supplementary Figure 4). At longer post-stimulus times, there were no significant differences in the depolarization of dSPNs or iSPNs comparing naïve and expert mice (Figure 3B and D, far right). During this time period in hit trials, both groups of mice were licking to receive water, and the sustained depolarization might at least in part reflect this motor activity (Supplementary Figure 4).
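The early (20-50 ms) and mid (50-250 ms) amplitudes reported above are V m changes relative to the mean V m in the 100 ms before the whisker stimulus. A minimal Python sketch of that quantification is given below, assuming a single trial-averaged trace stored as a list of values in mV. The function names, the sampling-rate parameter, and the trace layout are hypothetical; the authors' own analysis was done in MATLAB.

```python
from math import sqrt
from statistics import mean, stdev


def window_mean(trace_mv, sample_rate_hz, stim_index, start_ms, end_ms):
    """Mean Vm between start_ms and end_ms relative to the stimulus sample.

    Negative times refer to samples before the stimulus.
    """
    i0 = stim_index + round(start_ms * sample_rate_hz / 1000.0)
    i1 = stim_index + round(end_ms * sample_rate_hz / 1000.0)
    return mean(trace_mv[i0:i1])


def evoked_response(trace_mv, sample_rate_hz, stim_index, start_ms, end_ms,
                    baseline_ms=100.0):
    """Baseline-subtracted Vm change in a post-stimulus window (in mV)."""
    baseline = window_mean(trace_mv, sample_rate_hz, stim_index, -baseline_ms, 0.0)
    response = window_mean(trace_mv, sample_rate_hz, stim_index, start_ms, end_ms)
    return response - baseline


def mean_sem(values):
    """Mean and standard error of the mean across cells."""
    return mean(values), stdev(values) / sqrt(len(values))


# Synthetic example trace sampled at 1 kHz (illustrative values, not real data):
# 100 ms baseline at -70 mV, then a step to -67 mV at 20 ms and -66 mV at 50 ms.
trace = [-70.0] * 120 + [-67.0] * 30 + [-66.0] * 200
early = evoked_response(trace, 1000.0, 100, 20.0, 50.0)
mid = evoked_response(trace, 1000.0, 100, 50.0, 250.0)
```

In this synthetic trace `early` and `mid` come out as depolarizations of 3 mV and 4 mV. In a group comparison, one such value per cell would be passed to `mean_sem` and to the unpaired rank test.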
The V m dynamics of TANs were very different from the SPNs (Figure 3E-F). The grand average of the hit trial V m responses in TANs revealed a pronounced, significantly larger hyperpolarization in expert mice compared to naïve mice (quantified across 100-400 ms after the whisker stimulation: expert TANs V m −5.8 ± 2.3 mV, n = 5 cells; naïve TANs V m 2.1 ± 1.5 mV, n = 5 cells; Wilcoxon-Mann-Whitney naïve versus expert, P = 0.016). The hyperpolarization of the V m was accompanied by a significant decrease in firing rate following whisker stimulation for expert mice compared to naïve mice (quantified across 100-400 ms after the whisker stimulation: expert change in firing rate −5.8 ± 1.8 Hz, n = 5 cells; naïve change in firing rate 4.4 ± 1.8 Hz, n = 5 cells; Wilcoxon-Mann-Whitney naïve versus expert P = 0.03). The hyperpolarization observed in TANs in expert mice appeared to occur after a delay, and the early response amplitude (quantified 20-50 ms after the whisker stimulus) did not differ significantly comparing naïve and expert mice (Figure 3F).
Figure 3 (caption, panels G and H). (G) Schematic summary of the learning-related changes in dSPNs. The early sensory response evoked by whisker stimulation increases in dSPNs across learning. The later component of the response likely relates to motor and premotor inputs to dSPNs, which occur earlier in expert mice, since they lick with shorter latency. These two changes could account for the overall change in Vm dynamics of dSPNs across learning shown in panels A and B. (H) Whisker deflection will drive neurons in cortex and thalamus to release glutamate onto neurons in the DLS. During task learning and execution, the mouse receives a reward in hit trials upon licking after the whisker stimulus, which likely causes a transient increase in dopamine concentration. Increased dopamine could contribute to promoting long-term potentiation of synaptic input onto the D1R-expressing dSPNs. Enhanced sensory-evoked glutamatergic responses in dSPNs from presynaptic thalamic or cortical neurons could increase the probability of licking through inhibition of neurons in substantia nigra pars reticulata, which contains tonically active inhibitory neurons. This might result in disinhibition of motor thalamus and brainstem motor nuclei, thus contributing to driving licking as a motor response to whisker deflection after reward-based learning. See also Supplementary Figures 3 and 4.

Discussion
Here, using the whole-cell recording technique, we examined V m dynamics of 3 cell types in the striatum before and after learning a simple goal-directed sensorimotor transformation, finding 2 important changes: (i) SPNs in expert mice showed an enhanced depolarization compared to naïve mice; and (ii) TANs developed a hyperpolarizing response in expert mice.

Enhanced Depolarization of SPNs in Expert Mice
We found that whisker deflection evoked depolarizing responses which were transiently larger in expert compared to naïve mice for both dSPNs and iSPNs in hit trials (Figure 3). 39 Increased synchronous excitatory synaptic input across sensorimotor learning could drive the increased depolarization in expert mice. The reduced reaction time for licking in expert mice is likely to contribute (Figure 3G), since licking was associated with depolarization of SPNs (Supplementary Figure 4). Licking, planning to lick, and other movements typically correlate with increased action potential firing of some cortical and thalamic neurons, [40][41][42][43][44] part of which are likely corticostriatal and thalamostriatal neurons, thus potentially directly driving depolarization of SPNs through increased release of glutamate (quantified in mid and late periods 50-250 ms and 500-1000 ms after whisker stimulation, respectively).
The very earliest depolarization (quantified 20-50 ms after whisker stimulation) occurs before movement initiation, and therefore represents the processing of sensory input and the decision to initiate licking. In this early time window after whisker deflection, we found a significant enhancement of the fast sensory-evoked depolarization of dSPNs across learning (Figure 3A, B, and G). Mechanistically, an increased excitation of dSPNs across learning could contribute to the previously reported larger early depolarization in dSPNs compared to iSPNs in expert mice carrying out a similar whisker detection task. 19 The baseline V m and excitability did not change across learning (Supplementary Figure 2), suggesting that this change could result from increased glutamatergic synaptic input onto dSPNs. Dopamine activation of D1Rs has been shown to enhance long-term synaptic potentiation of glutamatergic synapses. 13,15 Reward-related increases in DLS dopamine levels during hit trials [6][7][8]45 might thus contribute to enhancing fast whisker deflection-evoked glutamatergic input onto D1R-expressing dSPNs during learning (Figure 3G). Increased responses in dSPNs across learning could contribute to task execution by enhancing the inhibition of postsynaptic neurons in substantia nigra pars reticulata (Figure 3H), consistent with the result that brief optogenetic excitation of dSPNs is sufficient to evoke licking in trained mice. 19 Our data therefore support a potential role for dopamine acting on D1R-expressing dSPNs contributing to reward-based learning, but further measurements and manipulations considering the many neuronal circuits and neuromodulatory systems of the basal ganglia are necessary to obtain a more complete understanding.

Enhanced Hyperpolarization of TANs Across Learning
Cholinergic interneurons (TANs), despite being a small minority of local neurons in the DLS, are thought to play important roles in controlling the activity of SPNs 46 and behavior. 47,48 TANs have been shown to exhibit a pause in their firing in response to stimuli that trigger a learned and rewarded motor output. [24][25][26] Here, we find a similar activity pattern following the learning of a whisker-dependent goal-directed sensorimotor transformation. Underlying this pause in firing is a hyperpolarization of the V m of TANs (Figure 3). Synaptically, the hyperpolarization might result from local inhibitory circuits within the striatum. [49][50][51][52][53][54] Long-range inhibitory input to TANs might also contribute; for example, midbrain dopaminergic and GABAergic neurons in substantia nigra and ventral tegmental area have been reported to innervate striatal TANs, causing hyperpolarization. 50,[55][56][57][58] Future experiments should therefore investigate the possible contributions of diverse synaptic circuits comprising different presynaptic neurons innervating TANs.
The pause in TAN firing presumably leads to a transient reduction in acetylcholine, which might have many diverse effects upon striatal neurons and synaptic transmission in the striatum, 59,60 including through nicotinic receptors, 61,62 presynaptic muscarinic receptors affecting neurotransmitter release, [63][64][65][66][67] and postsynaptic muscarinic receptors affecting various ionic conductances. [68][69][70] Future pharmacological and optogenetic experiments are needed to explore the functional impact of the learning-associated pause in firing of cholinergic striatal interneurons.

Future Perspectives
Our data begin to characterize cell type-specific changes in DLS accompanying reward-based learning of a simple goal-directed sensory-to-motor transformation. In future experiments, it will be important to examine changes in synaptic transmission, ideally through longitudinal recordings of specific pathways from various cortical and subcortical brain regions onto the distinct cell types of the whisker-related DLS, and to assess contributions from diverse neuromodulatory signals, in order to gain a more mechanistic understanding of how DLS might contribute to the reward-based learning of this whisker-dependent detection task.

Data and Software Availability
The complete dataset and MATLAB analysis code are freely available at the Open Access CERN database Zenodo with doi: 10.5281/zenodo.5497566 and hyperlink: https://doi.org/10.5281/zenodo.5497566.

Supplementary Material
Supplementary material is available at the APS Function online.

Funding
This work was funded by grants from the Swiss National Science Foundation (31003A 182010) to C.C.H.P., the European Research