Activity is better than connectivity neurofeedback training in Huntington’s disease

Neurofeedback training (NFT) could support cognitive symptom management in neurodegenerative diseases such as Huntington’s disease (HD) by targeting brain regions whose function is disrupted by the disease. Identifying the most appropriate target for NFT is not straightforward. The aim of our study was to test whether HD patients can learn to regulate their brain activity using NFT and to compare two different NFT targets: activity NFT, which targets activity in the Supplementary Motor Area (SMA), and connectivity NFT, which targets the correlation between the SMA and left striatum signals. To evaluate each approach we measured learning by testing for an increase in NFT target levels across training visits, and near transfer by examining upregulation of the target levels in the absence of feedback after training. The activity NFT treatment group was the only group that showed both successful learning and near transfer, suggesting that it is a more promising approach in HD than connectivity NFT.


Introduction
Neurofeedback training (NFT) is a non-invasive intervention used to train participants in a closed-loop design to regulate their own brain activity 1. The underlying principle is that by regulating different aspects of their brain activity, e.g. regional activation or inter-regional connectivity, participants would implicitly regulate associated cognitive function. Because NFT can be delivered non-invasively, it can be used in clinical populations either preventatively or as an adjunct treatment to other potential disease-modifying therapies. However, there are several challenges in designing NFT trials, including the choice of an appropriate NFT target for the specified clinical population.
Huntington's disease (HD) is a genetic neurodegenerative condition characterised by progressive motor, psychiatric and cognitive impairment, as well as early striatal atrophy, cortical and corticostriatal connectivity loss [2][3][4][5]. Striatal activity and cortico-striatal connectivity would therefore be obvious targets for NFT. However, because of the atrophy present in the striatum in HD 5,6, as well as the increase in iron deposits 7, BOLD fMRI signal from the striatum may be difficult to measure reliably in real-time in HD patients. Therefore signal from cortical regions that connect to the striatum, and that can be measured reliably, might be more appropriate. Previous studies have shown that NFT-induced changes are not just localised to the target region, but extend to a wider network of regions [8][9][10], suggesting that a proxy region might also be appropriate.
In a recent proof-of-concept study we used the supplementary motor area (SMA) as a target for real-time fMRI NFT in HD patients 11. We selected BOLD fMRI signal from the SMA because it can be reliably measured in real-time 12,13, and its function and connectivity to the striatum are disrupted by HD 14. We showed that HD patients can be trained to increase the level of SMA activity and that improvement in cognitive and psychomotor behaviour after training related to increases in activity of the left putamen and SMA-left putamen connectivity during training. This suggested that SMA-striatum connectivity could be a more appropriate NFT target than SMA activity in HD.
The aim of the current study was therefore to compare the two NFT approaches in order to determine which one is better, and to collect further evidence on the feasibility of the method in HD.
For this purpose we used BOLD fMRI signal change from the SMA as the target for activity NFT and correlation between signal from the SMA and left striatum during upregulation as the target for connectivity NFT 15,16. Connectivity NFT feedback was presented intermittently at the end of the upregulation block, whereas during activity NFT feedback was provided continuously during upregulation, similar to our earlier work. Because of these differences, we did not directly compare the two groups, but rather examined changes within each group separately, and also compared each group to a matched control group that received non-contingent, sham neurofeedback.
Participants that were randomized to the control group were yoked to a participant in the treatment group, and received feedback based on the activity of their yoked participant from the treatment group, rather than their own. This ensured blinding of all participants and controlled for experimental exposure and motivation. Comparisons between the treatment and control groups would enable us to estimate effect sizes (ES) that would then be used to inform future larger trials testing for efficacy of NFT in HD.

Results
To identify which approach is better we examined learning and transfer success in the two different NFT types. We tested two main effects: 1) training effect 17 , defined as a linear increase in the target NFT measures from the baseline to the last training session, and 2) transfer effect 17 , defined as an increase in the follow-up (after training) sessions compared to baseline in both imaging and behavioural measures. Because the two NFT approaches use a different feedback measure, contrast estimates in the case of activity NFT and correlation coefficients in the case of connectivity NFT, we could not compare the two types of NFT directly. Instead we performed within group analyses and compared each treatment group with its matched control group and reported effect sizes for each comparison.

Baseline Levels of the NFT Target Measure
The baseline session included a motor imagery task used to calculate the levels of the NFT target measure prior to NFT (see Methods section for details on experimental design). For the activity NFT group this was the BOLD fMRI signal from the SMA during motor imagery minus the baseline SMA activity, whereas for the connectivity NFT group it was the time-series correlation between the SMA and the left striatum during upregulation (SMA-striatum connectivity). We first examined whether there were any differences between the groups at baseline. A one-way ANOVA with group as the main effect (treatment vs control) showed that for the activity NFT group the main effect of group was not significant (F(1, 14) = 0.174, p = 0.683, estimate(SE) = -0.089(0.214)). However, the intercept was significant (t(14) = 5.916, p < 0.001, estimate(SE) = 0.894(0.151)), suggesting that participants were able to reliably increase SMA activity prior to NFT. For the connectivity NFT group neither the main effect of group (F(1, 14) = 1.171, p = 0.298, estimate(SE) = 0.054(0.050)) nor the intercept (t(14) = 1.793, p = 0.095, estimate(SE) = 0.064(0.035)) was significant. This suggested that SMA-striatum connectivity was not reliably engaged prior to NFT.

Learning Effects: Linear increase across sessions
To examine learning across the two different types of NFT we first tested for a linear increase in the target NFT measure within each group using linear mixed models. The dependent variables were NFT target ROI contrast estimates in the case of the activity NFT group and correlation coefficients in the case of the connectivity NFT group. Figure 1A and 1B show the location of the selected ROIs.
Session was included as a repeated fixed effects factor and modelled as a continuous variable with values increasing linearly from the baseline to the last training session (i.e. 1 to 5).
Figure 1 (partial caption): The group mean per session is shown with thick continuous lines. Shown in red and green hues are the treatment groups, whereas shown in black and blue hues are the control groups for the activity (C) and connectivity (D) NFT groups respectively. The scaling of the plots is different between plots C and D and therefore they are not visually comparable. (E) and (F) Dot plots show the NFT target levels across visits for activity (E) and connectivity (F) NFT respectively. Squares represent the treatment groups, whereas circles represent the control groups (red and green for the treatment groups; black and blue for the control groups). The small squares and circles show the individual data points, whereas the larger squares and circles show the adjusted mean group effects. Error bars are 95% CI.
To test for differences between treatment and control groups within activity NFT or connectivity NFT we ran the same model as above adding group as a fixed effect factor. The group by session interaction in the model shows if there were any differences in the learning slopes between the treatment and control groups. For activity NFT there was a significant main effect of session (F(1,  Figure 1F).

Near Transfer: Upregulation without feedback
To measure whether participants were able to increase the NFT target levels volitionally after training and without feedback (known as near transfer), we examined change from baseline in the follow-up sessions. There were three follow-up sessions, within 2 weeks of training, between 4-6 weeks and between 8-10 weeks (see Methods for more details). At the follow-up sessions participants performed the same task as at baseline, but were instructed to use the mental strategy that they believed was the most effective in upregulating the NFT target during the training sessions.
To examine within-group changes we used linear regression, with the change in the target NFT measure from the baseline to each of the follow-up visits as the dependent variable, adjusting for the baseline level of the target NFT measure. Adjusting for baseline levels is used to increase sensitivity in pre-post treatment comparisons 18. In these analyses the intercept was the effect of interest and represented adjusted change at follow-up from baseline.
To test between-group differences we used an ANCOVA to compare change from baseline to the first follow-up in the treatment and control groups, controlling for baseline NFT target levels.
Because we used the change scores in the ANCOVA, the group main effect was the effect of interest and equivalent to a group by session interaction looking at the difference between groups in the change from baseline. There were no significant main effects of group for either the activity NFT

Far Transfer: Cognitive and psychomotor performance
To assess the effects that NFT had on the participants' performance in tasks unrelated to the training (far transfer), we compared performance before and after training in a number of cognitive and psychomotor tasks selected a priori because they have been previously shown to be sensitive to HD progression 5,11. The task scores were then standardized and summed to create a behavioural composite score (see Methods section for details). The change in the individual scores is shown in Supplementary Figure 1 for completeness. Behavioural performance was measured twice prior to the start of NFT and the second session was used as baseline to account for practice effects. It was also measured twice after the end of NFT, within two weeks and between 8-10 weeks after the last training session.
Although the study was not powered to detect changes in behaviour, estimating the effect size would be useful when planning future research studies that would be powered on behavioural change, therefore it was important to include behavioural measurements.
None of the groups showed significant increase in the composite score in either of the follow-up sessions compared to baseline (all p > 0.19; see Supplementary Table 3). Overall, in the activity NFT To compare change in performance across groups we used ANCOVAs with fixed effects of group (treatment vs control) and adjusting for baseline performance. As previously we used change from baseline as our dependent measures, therefore the main effect of group was equivalent to a group by session interaction. For activity NFT there were no significant group differences for either the first

Power Calculations
Based on the results presented so far only the activity NFT treatment group showed a significant increase in NFT target levels during training and successful upregulation of NFT target levels in the absence of feedback (near transfer). Although the connectivity NFT treatment group showed upregulation of the NFT levels in the absence of feedback, there was no significant linear increase in the NFT levels across the training visits. In addition, although none of the groups showed a statistically significant within-group change during the first follow-up visit in the behavioural composite score compared to baseline or a significant group by session interaction, only the activity NFT group had a positive estimate of change in both within- and between-group comparisons. Our results therefore suggest that activity NFT is a more plausible NFT type for HD and more appropriate as a target in future trials.
A future RCT using NFT designed to show efficacy would need to be powered on the basis of far transfer effects on a behavioural endpoint. To calculate the effect size (Cohen's d) for a power calculation we ran a two-sample t-test assuming equal variances to compare baseline to first follow-up change in the behavioural composite score between the activity NFT treatment and control groups (t(14) = 1.42, p = 0.178, effect size = 0.71). To detect an effect size of 0.71 in this context with a type I error rate of 5% and 80% statistical power, we would need 33 subjects per group (assuming no loss to follow-up).
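The arithmetic behind these figures can be sketched as follows. This is an illustrative reconstruction, not necessarily the procedure used in the study: it assumes the 16 participants implied by t(14) were split equally (8 per group), converts the t statistic into Cohen's d, and approximates the per-group sample size for a two-sided test with a normal approximation plus a small-sample correction term. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d_from_t(t_value, n1, n2):
    """Convert an independent-samples t statistic into Cohen's d."""
    return t_value * math.sqrt(1.0 / n1 + 1.0 / n2)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test.

    Uses the normal approximation 2 * (z_alpha + z_beta)^2 / d^2 plus a
    small-sample correction of z_alpha^2 / 4. The correction is an
    assumption of this sketch; dedicated power software based on the
    noncentral t distribution may be used instead.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n = 2.0 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4.0
    return math.ceil(n)


# t(14) = 1.42 as reported; 8 participants per group is an assumption.
d = cohens_d_from_t(1.42, 8, 8)
n = n_per_group(d)
print(f"d = {d:.2f}, n per group = {n}")
```

Under these assumptions the sketch gives d = 0.71 and 33 participants per group, in line with the values reported above.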

Discussion
The aim of the present study was to compare two different NFT methods, activity and connectivity NFT, and establish which one is preferable to use in HD. We replicated our previous findings that HD patients can learn to increase their target NFT levels during training using SMA activity as NFT target and further showed that activity NFT is more promising than connectivity NFT. The activity NFT treatment group was the only group that showed successful learning and near transfer, as well as more promise in terms of behavioural change after NFT. Although the connectivity NFT treatment group showed successful near transfer, it did not show successful learning or far transfer; if anything the results from the behavioural performance were in the opposite direction, suggesting that it could be an unfavourable approach. The results from our study, combined with the fact that SMA activity NFT is much simpler to set up and administer, led us to conclude that SMA activity NFT is preferable to SMA-striatum connectivity NFT in our patient population.
A fundamental difference between the two NFT approaches is the type of signal provided as feedback during training. In the case of activity NFT we presented participants in the treatment group with percent change in SMA activity during upregulation vs baseline. In the case of connectivity NFT we presented participants in the treatment group with the correlation coefficients between signals recorded from the SMA and from the left striatum during upregulation. In the first case, percent signal change can be computed and presented as feedback to participants continuously in near real-time, which means that there is greater perceived contingency between a participant's mental actions and the feedback they receive. In the second case, correlations are computed over a number of time-points, in our case 30s, and the feedback is presented intermittently at the end of the upregulation block, which results in lower contingency. In our case the two elements, frequency of feedback presentation and NFT type, are intertwined and it is therefore not possible to identify whether the differences observed are driven by the lack of contingency or the type of NFT target.
A previous study comparing continuous vs intermittent feedback using percent signal change in the amygdala in healthy young adults showed that participants were able to learn to increase the target NFT levels using both approaches, although intermittent feedback was more effective than continuous in that study 17 . These results suggest that the differences observed in our study could be driven by the type of feedback presented and not the frequency of presentation. However, both activity 17,19 and connectivity 15,16,20 NFT have been used successfully in other studies, suggesting that both methods could be appropriate. A possible explanation for our results could be that the quality of the signal recorded from the striatum in real-time was not reliable. Atrophy in the striatum is one of the earliest measurable signs of HD pathology in the brain and is accompanied by a decrease in striatal BOLD fMRI activation 21,22 . This would have an effect on the local signal-to-noise ratio (SNR) and our ability to measure striatal activation reliably. For the purposes of this study we defined the striatal ROI functionally and used a large enough region to try to mitigate this issue by averaging across functionally relevant voxels. Even so, the SNR for the SMA will be higher, therefore the activity NFT group would have received more accurate real-time feedback than the connectivity NFT group, which could explain the differences in learning between the two groups.
Because of the differences between the two NFT types, it was not possible to directly compare the two different treatment groups. Instead, we compared each treatment group to a matched sham control group. We chose a sham control group in order to control for potential placebo effects as a result of recruiting participants to an interventional study 23 . By choosing the "yoked" approach we ensured that the feedback control participants received was biologically plausible and matched to that of the treatment group. We chose not to use the approach of using a different ROI for the control group, because of potential problems with the spread of training effects across other brain regions. We do not yet understand the mechanism underlying NFT in HD and how widespread any effects could be, therefore we were not certain which other regions in the brain would be appropriate to be a control target 24 . Subsequent analyses examining the relationship between the signals of the yoked pairs (see supplementary methods on sham neurofeedback) show that our approach worked. The correlation between the signal from the control participants' brain and the signal from the brain of the yoked participants was overall quite low, suggesting that the feedback the control group received was not contingent on their actual brain activity.
A limitation of the present study was that it was only single-blind. This means that although participants were not aware of their group allocation, the researchers conducting the MRI and behavioural assessment were not blinded. Because this was a small feasibility study, double-blinding was not possible at this stage. To minimize any researcher bias, patients were not provided any input during the NFT sessions, but relied on the feedback they received during NFT. Statistical analyses were pre-defined and all behavioural assessments were objective measures of computer- or paper-based tasks. In addition, the main aim of this study was not to test the efficacy of the approach, but rather to compare two different NFT methods. Neither method was deemed preferable at the start of the study, therefore researcher bias was minimal, if any. Double-blinding would, however, be necessary for any future RCTs that would test efficacy of the method.
Another limitation of the study was the small sample size. Because this is the first sham-controlled neurofeedback study in HD, we based our sample size calculations on a previous proof of concept NFT study on healthy young adults, which showed large effect size with group size of 11 per group 25 (see Methods section on participants). However, the reported effect sizes in healthy controls may not be appropriate for a study in clinical populations. Hence an additional aim of this study was to calculate effect sizes appropriate for our population and experimental design. The observed effect size for far transfer (behavioural change) at the first follow-up was moderate and sample size calculations suggest that a future RCT would need at least 33 participants per group in order to have 80% statistical power. This is a feasible number of participants to recruit in an RCT and further highlights that NFT using SMA activity may be a promising non-invasive intervention for HD.
To conclude, the aim of our study was to compare two different NFT approaches in HD, SMA activity and SMA-left striatum connectivity NFT. Although cortico-striatal connectivity is biologically more relevant in HD, the results from our study suggest that SMA activity NFT is a more plausible approach than connectivity NFT. SMA activity NFT is simpler to administer and the moderate effect sizes calculated in our study suggest that this approach may hold promise as a non-invasive preventative or adjunctive intervention in HD. A future larger RCT is required in order to collect more robust evidence on the efficacy of the approach for the treatment of cognitive and psychomotor symptoms in HD, for which there are currently no available treatments.

Participants
Thirty-four adults who carried an HTT gene CAG expansion greater than 40 were recruited to the study. One participant withdrew from the study after three visits because he could not tolerate the MRI scanning environment; his data were not used in any of the analyses. Another participant was excluded from the study because a large number of trials had to be excluded due to motion-related artifacts (see section on offline data analysis below for more details). These issues were identified during data pre-processing and another participant was recruited as a substitute. The remaining thirty-two participants who completed the training and testing protocol were included in the analyses (23 females, mean (SD) age = 49.7 (11.1)). There were no statistically significant differences between the treatment and control groups for the two types of NFT for any of the demographic measures (using a non-parametric Mann-Whitney test all p > 0.2; see Table 1 for detailed participant information). All participants provided written informed consent according to the Declaration of Helsinki and the study was approved by the Queen Square Research Ethics Committee (05Q051274).
All procedures, including recruitment, consent and testing, were carried out in accordance with the relevant good clinical practice guidelines and regulations. Information regarding sample size and power calculations prior to the start of the study is provided in the supplementary methods. As part of the study, participants completed 1 screening, 1 baseline, 4 neurofeedback training and 3 follow-up sessions. A diagram of the study design is shown in Figure 3 (see also Supplementary Table 4).

Baseline & Follow-up Sessions
There was one baseline session and three follow-up sessions. The first follow-up was within 2 weeks from the last training visit, the second between 4-6 weeks and the third between 8 and 10 weeks (Supplementary Table 4). The baseline and follow-up sessions included: 1) repetition of the cognitive and psychomotor testing (only on the first and third follow-up), 2) structural MRI measurements using multi-parameter maps (MPMs) 28 and 3) two fMRI runs assessing the participant's ability to upregulate their motor control network.
At the baseline session participants were instructed to use motor imagery during the active blocks in order to increase activity in regions of their brain involved in movement and motor control. The aim was to measure the NFT targets' baseline activity/connectivity levels prior to training. At the follow-up sessions participants were instructed to use the mental strategy that they believed worked best to increase the level of the NFT targets during the training. These "near transfer" runs measure whether participants were able to regulate the NFT target in the absence of feedback and therefore assess learning. It is important to note that in this study we did not explicitly ask participants to practice upregulation at home between the end of the training and the follow-up sessions, in order to measure how long any effects of training can be sustained without any additional practice.
The fMRI runs consisted of 5 upregulation blocks (30s each), 6 rest blocks (30s each) and 5 response blocks (18s each). Similar to our previous study 11 we used a simple attention task during the rest blocks, whereby participants monitored changes in the luminance of a white bar. If the white bar flickered to grey, they would wait until a question mark appeared inside the white bar (response blocks) and then make a response by clenching their left fist once. The bar flickered in a maximum of 3 out of the 6 rest blocks per run and the timing of the flicker within the block was random. The design of the tasks is shown in Figure 4B.

Neurofeedback Training Sessions
All NFT sessions started with a functional localiser run to identify the target ROIs. Participants were instructed to clench their left fist during the active blocks (10 blocks lasting 20.4s each) and rest during the rest blocks (11 blocks lasting 20.4s each). The design of the run is shown in Figure 4A.
Using Turbo-BrainVoyager (TBV; Brain Innovation, The Netherlands) the fMRI run was analysed in real-time and the resulting statistical map was used to define the target ROIs for the subsequent NFT runs. For the activity NFT sessions, the SMA was selected as the target ROI. For the connectivity NFT sessions, the SMA and the left striatum (including putamen, globus pallidus and caudate) were selected as the target ROIs. The ROIs were drawn using TBV. It was not always possible to acquire a structural MRI volume during the baseline visit, because of fatigue. To avoid having to add an extra visit and burden the patients, we added the structural scan at the start of their first neurofeedback training session, and chose to define our ROIs using a functional localiser and anatomical landmarks, rather than creating an anatomical mask of the region. For the SMA the statistical map was thresholded at t-value = 3 and a rectangle was drawn around the SMA cluster for the active vs rest contrast. The location of the striatum was identified visually on the first EPI scan of the localiser run using landmarks and the EPI contrast. Due to high iron concentration, the putamen and globus pallidus appear darker on an EPI scan and are therefore easy to identify. A rectangle was drawn around the striatum including the putamen, globus pallidus, caudate and ventral striatum. Because of the rectangular shape, the striatal ROI also included surrounding white matter. However, the ROI was centred around the striatum and most of the recorded signal originated from the gray matter of the striatum. To enhance SNR we further applied 6mm FWHM smoothing during online data acquisition and also corrected in real-time for head motion and physiological noise (see below for more details).
Similar to our previous study 11 and comparable to other studies 12,29-31, the ROIs were re-drawn at each session, ensuring that only voxels with high activation were selected. The ROIs from the first visit were used as a reference when drawing the ROIs for the subsequent visits to ensure that the position was similar, although the exact voxels selected might be different. A heat map showing the overlap of the ROIs across all participants is shown in Figure 1A and 1B.
For the NFT sessions the EPI volumes were exported using Ice and Gadgetron 32 . In-house scripts created using Gadgetron and MATLAB (Mathworks) were used to reconstruct the 3D EPI data using SENSE 33 such that they could be read in near real-time by TBV to produce the target ROI time-series.
There was a small delay at the start of each run to enable MATLAB to start, but after about 15s both the MRI scanner and the Gadgetron pipeline were fully in sync with approximately 1s latency. To allow both systems to synchronize we introduced a delay of 18 volumes at the start of each run. During this time participants viewed a white cross on a black background followed by a count-down (from 10 to 1) until the NFT paradigm started. In-house MATLAB scripts were used to process the ROI time-series and record participants' responses, breathing and heart rate. For the NFT runs, the ROI signal was regressed against head motion traces and physiological noise from respiration 34 and cardiac rhythm using RETROICOR 35. No statistical analyses were performed on these data, but they are presented here for completeness. All analyses mentioned in the main document were performed on the data after extensive quality control and pre-processing (see section below on offline pre-processing).
Participants completed 4 NFT sessions across multiple visits and each session included 4 NFT runs. Two participants completed 3 runs instead of 4 during one of the NFT sessions, because of fatigue. The activity NFT runs consisted of 6 rest blocks (30s), 5 response blocks (18s) and 5 upregulation blocks (30s). The rest and response blocks were identical to those of the transfer runs described above. During the upregulation blocks feedback was presented continuously in the form of a red bar. In the treatment group the height of the red bar represented the percent signal change at a given point during the upregulation blocks vs the mean activation during the preceding rest block. In the sham control group the height of the red bar was calculated using data from a yoked participant in the treatment group. A black line was set at 3/5 of the total bar height and acted as an additional reminder to the participants that they needed to increase the height of the red bar. Once the upregulation blocks started, there was an average delay of 2s until the red bar appeared on the screen and then it was updated roughly every 1.2s. The initial delay allowed for the ROI time-series to be processed and the GLM model estimation to be initiated. The design of the task is shown in Figure 4C.
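The continuous feedback signal described above can be illustrated with a minimal sketch. It assumes a plain percent signal change relative to the mean of the preceding rest block; the mapping to bar height (`bar_height`, with a hypothetical ceiling `max_psc`) is an illustration only, since the actual scaling was adjusted by shaping as described in our previous study 11.

```python
from statistics import mean


def percent_signal_change(current_value, rest_values):
    """Percent change of the current ROI signal vs the mean of the
    preceding rest block."""
    rest_mean = mean(rest_values)
    return 100.0 * (current_value - rest_mean) / rest_mean


def bar_height(psc, max_psc=2.0):
    """Map percent signal change to a bar height between 0 and 1.

    max_psc is a hypothetical ceiling used only for this example; it
    is not a value taken from the study.
    """
    return min(max(psc / max_psc, 0.0), 1.0)


# Example: ROI signal of 101 against a rest-block mean of 100.
psc = percent_signal_change(101.0, [99.0, 100.0, 101.0])
height = bar_height(psc)
```

In this toy example the bar would be drawn at half of its maximum height, just below the black reference line at 3/5.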
The connectivity NFT runs consisted of 5 rest blocks (45s), 5 upregulation blocks (30s) and 5 feedback blocks (3s). Feedback was presented intermittently at the end of the upregulation blocks in the form of a red bar. Similar to the activity NFT runs a black line was set at 3/5 of the total bar height as an additional reminder to the participants to upregulate. In the treatment group the height of the red bar was calculated using the Pearson's correlation coefficient between the SMA and left striatum ROI time-series during the upregulation blocks only 15. In the sham control group the feedback was calculated using data from a yoked participant in the treatment group. The design of the task is shown in Figure 4D.
Similar to our previous study, we used shaping in order to facilitate learning and motivation 11,36,37 , whereby the difficulty in increasing the height of the feedback bar was adjusted using the participants' performance in the preceding block. The details of the shaping approach are described fully in our previous study 11 .
Prior to scanning, all participants were instructed to refrain from making any overt movements and only use mental strategies, such as motor imagery, in order to increase the levels of the NFT target (represented by the height of the red bar). Compliance was monitored using pneumatic tubes similar to our previous study 11 . More details are provided in the supplementary methods.

Scanning Parameters
All scanning was performed on a Siemens TIM Trio 3T scanner using a standard 32-channel head coil.
For the fMRI tasks we used a whole-brain multi-shot 3D echo-planar imaging (EPI) sequence 38

Offline fMRI Analysis
Statistical Parametric Mapping SPM12 (Wellcome Trust Centre for Neuroimaging, London) was used for offline pre-processing of the fMRI data. The first 3 volumes were removed from all fMRI time series apart from the NFT runs, where we removed the first 18 volumes. The images were then corrected for head-motion with rigid-body realignment using a 2-step approach and smoothed using an isotropic 8mm FWHM Gaussian smoothing kernel. Six motion parameters were generated for each fMRI run. In participants who used the PMCS the motion parameters represented residual and not actual head movement, i.e. they reflected motion that could not be fully corrected by the PMCS and remained in the time-series. For this reason, we did not use the motion parameters in order to identify and exclude bad scans, e.g. by examining scan-to-scan motion. Instead we used the DVARS 41 approach, the root mean square of the signal difference between consecutive scans, with the tool developed by Afyouni and Nichols 42 , which provides a more standardized version of DVARS and computes p-values for a null hypothesis of homogeneity. Volumes were identified as bad if the change in DVARS was greater than 20% and were added in separate regressors to the first-level statistical models. Blocks for which more than half of the volumes were de-weighted were excluded from the condition of interest regressor (e.g. upregulation) and modelled as separate regressors.
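For readers unfamiliar with DVARS, the basic (unstandardized) quantity can be sketched as below. This shows only the root mean square of the voxel-wise signal difference between consecutive scans; it does not reproduce the standardization or p-value computation of the Afyouni and Nichols tool, nor the 20% criterion used here. The input layout (one list of voxel intensities per scan) is an assumption of the example.

```python
import math


def dvars(volumes):
    """Unstandardized DVARS for each pair of consecutive scans.

    volumes: list of scans, each a list of voxel intensities of equal
    length. Returns one value per scan transition (len(volumes) - 1).
    """
    values = []
    for previous, current in zip(volumes, volumes[1:]):
        squared_diffs = [(c - p) ** 2 for p, c in zip(previous, current)]
        values.append(math.sqrt(sum(squared_diffs) / len(squared_diffs)))
    return values


# Toy example: a jump between scans 1 and 2, no change between 2 and 3.
toy_dvars = dvars([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
```

A large value at a given transition indicates an abrupt signal change across the image, which is what makes the measure useful for spotting motion-contaminated volumes.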
Data were also inspected for the presence of overt hand movements during upregulation or rest blocks. If a response was detected, the block was excluded from the condition of interest regressor and added to a separate task regressor. Runs with fewer than 2 blocks remaining in the upregulation condition were excluded from the analyses completely. Supplementary Table 5 shows the number of sessions that were included in the analyses across the groups for the 32 participants. In the case of the participant who was excluded from the analyses and replaced, all the runs were contaminated by large head movements across the time-series and had to be rejected.
First-level, within-subject models included the condition of interest regressors. We used 2 regressors modelling the upregulation and response blocks for the baseline, NFT and transfer runs, and 1 regressor modelling the fist clenching blocks for the localiser runs. The baseline condition was modelled implicitly. In addition, models included 6 head motion parameter regressors produced by SPM and extracted from the PMCS (when used) with their temporal derivatives and the quadratic expansions of the parameters and their derivatives 43,44 , spike regressors 45 , as well as 13 physiological noise regressors modelling the heart rate using RETROICOR and respiration 34,35,46 . Temporal autocorrelation was modelled using SPM's first-level autoregressive process (AR(1)) and a high-pass filter with 128s cutoff.
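The motion-regressor expansion described above (6 parameters, their temporal derivatives, and the squares of both, giving 24 columns) can be sketched as follows. This is a minimal Python illustration with a hypothetical function name; computing the derivative as a backward difference with a zero first row is an assumption, not a detail stated in the text.

```python
import numpy as np

def expand_motion(params):
    """Build a 24-column motion regressor set from 6 realignment parameters.

    params: array of shape (n_volumes, 6).
    Columns: parameters, temporal derivatives, squared parameters,
    squared derivatives.
    """
    params = np.asarray(params, dtype=float)
    # ASSUMPTION: backward difference, with the first volume's derivative set to 0
    derivs = np.vstack([np.zeros((1, params.shape[1])),
                        np.diff(params, axis=0)])
    return np.hstack([params, derivs, params ** 2, derivs ** 2])
```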
For the activity NFT group, contrast values for upregulation vs baseline were extracted for the target ROI for each session and the highest 10% of t-values 47 were used to calculate the average ROI value.
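One way to read this ROI summary is sketched below in Python. The function name is hypothetical, and the sketch assumes that the average is taken over the contrast values of the voxels whose t-values fall in the top 10% of the ROI; the text does not specify this level of detail.

```python
import numpy as np

def top_fraction_mean(con_values, t_values, fraction=0.10):
    """Mean contrast value over the ROI voxels with the highest t-values.

    con_values, t_values: 1-D arrays with one entry per ROI voxel.
    ASSUMPTION: voxels are selected by t-value and contrast values averaged.
    """
    con_values = np.asarray(con_values, dtype=float)
    t_values = np.asarray(t_values, dtype=float)
    n_keep = max(1, int(round(fraction * t_values.size)))
    top_idx = np.argsort(t_values)[-n_keep:]  # indices of the largest t-values
    return con_values[top_idx].mean()
```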
For the connectivity NFT group, the time-series for the target ROIs (SMA and striatum) were extracted using a 6mm sphere centred on the peak for upregulation vs baseline across all runs. The Pearson's correlation coefficient of the time-series between the two ROIs within the upregulation periods was then calculated and transformed into Fisher z-scores for analyses.
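The connectivity measure can be illustrated with a short Python sketch. The function and argument names are hypothetical, and the boolean mask marking upregulation volumes is an assumed input format.

```python
import numpy as np

def fisher_z_connectivity(ts_sma, ts_striatum, upreg_mask):
    """Fisher z-transformed Pearson correlation within upregulation volumes.

    ts_sma, ts_striatum: 1-D ROI time-series of equal length.
    upreg_mask: boolean array, True for volumes inside upregulation blocks
    (hypothetical input format).
    """
    mask = np.asarray(upreg_mask, dtype=bool)
    a = np.asarray(ts_sma, dtype=float)[mask]
    b = np.asarray(ts_striatum, dtype=float)[mask]
    r = np.corrcoef(a, b)[0, 1]  # Pearson's correlation coefficient
    return np.arctanh(r)         # Fisher z = 0.5 * ln((1 + r) / (1 - r))
```

The transform makes the sampling distribution of the correlation approximately normal, which suits the linear models used for the group comparisons.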
For the between-group comparisons of the training effect, the ROI estimates and transformed correlation coefficients were used as outcomes in linear repeated-measure models with group (treatment vs control) and session as fixed effects. Session was modelled as a repeated factor within subjects. Intersession covariance was modelled using heterogeneous compound symmetry (CSH), as this gave a reasonable approximation of the observed within-subject covariance while using minimal degrees of freedom. For the transfer effects, we tested change from baseline for each follow-up session separately and ran 1-way ANCOVAs with group as a fixed effect, adjusting for baseline levels of the NFT target to increase model sensitivity 18 . We used SAS 9.4 to estimate the repeated-measure linear models and ANCOVAs. For the repeated-measure models using CSH, effect size was calculated using the effect t-statistic as the numerator and the square root of the sample size per group as the denominator. For the ANCOVAs, effect size was calculated using the contrast estimate as the numerator and the square root of the mean square error as the denominator.
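Written out, the two effect size definitions above are as follows. The symbols are introduced here for illustration only: t is the t-statistic of the effect, n the sample size per group, \hat{\beta} the ANCOVA contrast estimate and MSE the model's mean square error.

```latex
d_{\text{repeated}} = \frac{t}{\sqrt{n}},
\qquad
d_{\text{ANCOVA}} = \frac{\hat{\beta}}{\sqrt{\mathrm{MSE}}}
```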

Cognitive and Psychomotor Assessments
To assess change in cognitive and psychomotor function following neurofeedback training we calculated a composite score using the same measures and procedure as in our previous study 11 .
This was specified a priori using a set of independently validated measures sensitive to HD progression 5,6,48,49 . In summary, the cognitive measurements included were: number correct for Stroop Word Reading only, number correct for Symbol Digit Modalities Test (SDMT), annulus length for Indirect Circle Tracing (log transformed) and number correct for negative Emotion Recognition.
The Q-Motor measurements included were: inter-tap interval (ITI) and standard deviation of inter-onset interval (log transformed; log SD IOI) during speeded tapping with the left (non-dominant) index finger, and standard deviation of mid-tap interval deviation from target rhythm (log transformed; log SD dMTI) for paced tapping with the left index finger at 1.8Hz. The composite score at the baseline session correlated highly with the normalized CAG Age Product score 50 (CAP score; Spearman's r = -0.7, p < 0.001) and the MOCA (Spearman's r = 0.6, p < 0.001) after controlling for age (results were also significant without controlling for age, both p < 0.001). It was therefore a sensitive measure of the participants' disease stage and overall cognitive and psychomotor capacity.