Shared and Disorder-Specific Neurocomputational Mechanisms of Decision-Making in Autism Spectrum Disorder and Obsessive-Compulsive Disorder

Abstract
Autism spectrum disorder (ASD) and obsessive-compulsive disorder (OCD) often share phenotypes of repetitive behaviors, possibly underpinned by abnormal decision-making. To compare the neural correlates of decision-making between these disorders, brain activation during gambling was compared between boys with ASD (N = 24), boys with OCD (N = 20) and typically developing controls (N = 20), and computational modeling was used to compare performance. Patients were unimpaired on the number of risky decisions, but modeling showed that both patient groups had lower choice consistency and relied less on reinforcement learning than controls. ASD individuals had disorder-specific choice perseverance abnormalities compared to OCD individuals. Neurofunctionally, ASD and OCD boys shared dorsolateral/inferior frontal underactivation compared to controls during decision-making. During outcome anticipation, patients shared underactivation compared to controls in lateral inferior/orbitofrontal cortex and ventral striatum. During reward receipt, ASD boys had disorder-specific enhanced activation in inferior frontal/insular regions relative to OCD boys and controls. Results showed that ASD and OCD individuals shared decision-making strategies that differed from those of controls but achieved comparable performance. Patients showed shared abnormalities in lateral-(orbito)fronto-striatal reward circuitry, but ASD boys had disorder-specific lateral inferior frontal/insular overactivation, suggesting that shared and disorder-specific mechanisms underpin decision-making in these disorders. Findings provide evidence for shared neurobiological substrates that could serve as possible future biomarkers.


PVL-DecayRI and PVL-Delta models
In the prospect valence learning (PVL) models, outcome evaluation is assessed according to the prospect utility function. The utility u(t) on trial t of each outcome x(t) is expressed as:

$$u(t) = \begin{cases} x(t)^{\alpha} & \text{if } x(t) \geq 0 \\ -\lambda\,|x(t)|^{\alpha} & \text{if } x(t) < 0 \end{cases}$$

where α (0 < α < 2) determines the shape of the utility function, and the loss-aversion parameter λ (0 < λ < 10) determines sensitivity to losses versus gains. Higher α implies greater sensitivity to feedback, and a value of λ < 1 indicates higher sensitivity to gains than losses (whereas λ > 1 indicates the opposite).
The PVL models are identical except that they use different learning rules. The parameter A determines how much past expectancy is discounted; in the decayRI learning rule, expectancies of all decks are discounted on each trial, and the expectancy of the chosen deck is updated by the current outcome utility:

$$E_j(t+1) = A \cdot E_j(t) + \delta_j(t) \cdot u(t)$$

where δ_j(t) equals 1 if deck j is chosen on trial t and 0 otherwise. In the delta rule, only the expectancy of the selected deck is updated while expectancies of the other decks are unchanged:

$$E_j(t+1) = E_j(t) + A \cdot \delta_j(t) \cdot \left(u(t) - E_j(t)\right)$$

Thus, learning rate A determines how much weight is placed on past experiences vs. the most recent experience of the chosen deck. A high learning rate indicates that the recent outcome has a large influence on expectancy of the chosen deck (i.e. 'forgetting' is more rapid) while a low learning rate indicates the opposite. Next, a softmax function (Luce, 1959) is used to calculate the probability of choosing deck j, with sensitivity θ(t) determining the degree of exploitation vs. exploration:

$$\Pr[D(t+1) = j] = \frac{e^{\theta(t) \cdot E_j(t+1)}}{\sum_{k} e^{\theta(t) \cdot E_k(t+1)}}$$

Sensitivity is set by c, a choice consistency (sensitivity) parameter:

$$\theta(t) = 3^{c} - 1$$
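The evaluation, learning and choice steps described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the fitting code used in the study: the function names are hypothetical, and the mapping from c to sensitivity (3^c − 1) follows the form commonly used for PVL models and is assumed here.

```python
import math

def utility(x, alpha, lam):
    """Prospect utility: x**alpha for gains, -lam * |x|**alpha for losses."""
    return x ** alpha if x >= 0 else -lam * abs(x) ** alpha

def update_decay(E, chosen, u, A):
    """DecayRI rule: discount every deck by A, then add u to the chosen deck."""
    return [A * e + (u if j == chosen else 0.0) for j, e in enumerate(E)]

def update_delta(E, chosen, u, A):
    """Delta rule: move only the chosen deck's expectancy toward u by rate A."""
    return [e + A * (u - e) if j == chosen else e for j, e in enumerate(E)]

def softmax_probs(E, c):
    """Choice probabilities; sensitivity theta = 3**c - 1 is an assumed mapping."""
    theta = 3 ** c - 1
    m = max(theta * e for e in E)            # subtract the max for stability
    w = [math.exp(theta * e - m) for e in E]
    total = sum(w)
    return [wi / total for wi in w]

# One illustrative trial: a loss of 1 on deck index 1 (all values hypothetical).
E = [0.0, 0.0, 0.0, 0.0]
u = utility(-1.0, alpha=0.5, lam=2.0)
E = update_delta(E, chosen=1, u=u, A=0.3)
probs = softmax_probs(E, c=1.0)
```

With c = 0 the sensitivity is zero and all decks are equally likely (pure exploration); larger c concentrates probability on the deck with the highest expectancy.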

Value-Plus-Perseverance model
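Evidence suggests that, during reward-based learning and decision-making, participants frequently use a win-stay-lose-switch (WSLS) strategy, that is, a perseverative strategy in which the choice on the current trial depends only on the outcome of the last choice (Worthy et al., 2013). Based on a model comparison between the PVL-DecayRI and WSLS models showing that each model was the best fit for only half of the subjects investigated, a hybrid VPP model was developed (Worthy et al., 2013) combining the PVL-Delta model and the perseverance heuristic. This model assumes that individuals track expectancies (Ej(t)) and perseverance strengths (Pj(t)); expectancies are computed using the learning rule of the PVL-Delta model, and three additional perseverance parameters are included: a decay rate k and the impacts of gains (ε_p) and losses (ε_n) on the perseverance strength of the chosen deck:

$$P_j(t+1) = \begin{cases} k \cdot P_j(t) + \varepsilon_p & \text{if } x(t) \geq 0 \\ k \cdot P_j(t) + \varepsilon_n & \text{if } x(t) < 0 \end{cases}$$

while the perseverance strengths of the unchosen decks simply decay by k. The overall value Vj(t+1) of each deck is a weighted sum of its expectancy and perseverance strength:

$$V_j(t+1) = \omega \cdot E_j(t+1) + (1 - \omega) \cdot P_j(t+1)$$

where ω is the reinforcement learning (RL) weight (0 < ω < 1); a low ω indicates that the subject relies less on RL and more on perseverance. Choice probability was again computed using the softmax function, but with Vj(t+1):

$$\Pr[D(t+1) = j] = \frac{e^{\theta(t) \cdot V_j(t+1)}}{\sum_{k} e^{\theta(t) \cdot V_k(t+1)}}$$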
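A single VPP update can be sketched in Python as follows. This is an illustration under assumptions rather than the study's implementation: the function name `vpp_step` is hypothetical, and the perseverance update (decay rate `k`, gain impact `eps_pos`, loss impact `eps_neg`) follows the commonly reported form of the model in Worthy et al. (2013).

```python
def vpp_step(E, P, chosen, x, u, A, k, eps_pos, eps_neg, omega):
    """One trial of a value-plus-perseverance update (illustrative sketch).

    E, P  : lists of expectancies and perseverance strengths, one per deck
    chosen: index of the deck selected on this trial
    x, u  : raw outcome and its utility
    """
    # Delta rule: only the chosen deck's expectancy moves toward u.
    E_new = [e + A * (u - e) if j == chosen else e for j, e in enumerate(E)]
    # Perseverance: every deck decays by k; the chosen deck also receives
    # eps_pos after a gain or eps_neg after a loss (assumed form).
    bonus = eps_pos if x >= 0 else eps_neg
    P_new = [k * p + (bonus if j == chosen else 0.0) for j, p in enumerate(P)]
    # Hybrid value: omega weights RL expectancy against perseverance.
    V = [omega * e + (1 - omega) * p for e, p in zip(E_new, P_new)]
    return E_new, P_new, V

# Hypothetical values: a gain on deck 0 with equal RL and perseverance weight.
E, P, V = vpp_step([0.0, 0.0], [0.0, 0.0], chosen=0, x=1.0, u=1.0,
                   A=0.5, k=0.5, eps_pos=1.0, eps_neg=-1.0, omega=0.5)
```

Setting `omega` close to 1 recovers a pure PVL-Delta learner, whereas `omega` close to 0 makes choices depend almost entirely on perseverance.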

Hierarchical Bayesian Analysis
HBA is a more suitable approach for parameter estimation than, e.g., maximum likelihood estimation (MLE) because it accounts for individual differences through the use of posterior distributions and Markov chain Monte Carlo (MCMC) sampling algorithms (Ahn et al., 2014). Parameter estimates obtained through traditional methods such as MLE are generally estimated at the individual level as point estimates that maximize the likelihood of the data for each individual subject (Myung, 2003). However, these MLE estimates can be noisy, particularly when the amount of data per subject is insufficient. To address this, group-level analysis that estimates a single set of parameters for an entire group may provide more reliable estimates, but it consequently ignores fine-grained individual differences (Ahn et al., 2016).
Bayesian statistics rely on the use of prior distributions over model parameters, which are updated to posterior distributions given the trial-by-trial data using Bayes' rule. In HBA, hyper-parameters are derived in addition to parameters introduced at the individual level (Gelman et al., 2014). These hyper-parameters are set with group-level means and standard deviations. With individual-level parameters Θ, hyper-parameters Φ and data D, the resulting joint posterior distribution P(Θ, Φ | D) is defined as:

$$P(\Theta, \Phi \mid D) = \frac{P(D \mid \Theta, \Phi)\, P(\Theta, \Phi)}{P(D)} \propto P(D \mid \Theta)\, P(\Theta \mid \Phi)\, P(\Phi)$$

This hierarchical structure of HBA leads to a "shrinkage effect", i.e. individual estimates are pulled closer to the group mean because they inform the group's estimate, which in turn informs the estimates of each individual (Gelman et al., 2014, Ahn et al., 2016).
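The shrinkage effect can be made concrete with a deliberately simplified Python sketch. It assumes a normal model in which the group mean `mu`, the group standard deviation `tau` and each subject's measurement noise `sigma` are known, so the posterior mean has a closed form; the actual analysis estimates all of these jointly by MCMC, and the function name and numbers below are hypothetical.

```python
def shrunken_estimate(y, sigma, mu, tau):
    """Posterior mean of one subject's parameter in a normal-normal model.

    y     : the subject's own (noisy) estimate
    sigma : standard deviation of the noise in y
    mu    : group-level mean (assumed known in this sketch)
    tau   : group-level standard deviation (assumed known in this sketch)
    """
    # Weight on the subject's own data: precision of y relative to the total.
    w = (1 / sigma ** 2) / (1 / sigma ** 2 + 1 / tau ** 2)
    return w * y + (1 - w) * mu

# A noisy subject (sigma = 3) is pulled much closer to the group mean
# than a precisely measured one (sigma = 0.5).
noisy = shrunken_estimate(y=2.0, sigma=3.0, mu=0.0, tau=1.0)
precise = shrunken_estimate(y=2.0, sigma=0.5, mu=0.0, tau=1.0)
```

The weight `w` shows why subjects with little or noisy data borrow more strength from the group, which is the behaviour the hierarchical model produces automatically.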

Individual-level analysis
Data were first processed to minimize motion-related artefacts (Bullmore et al., 1999a). A 3D volume consisting of the average intensity at each voxel over the entire experiment was calculated and used as a template. The 3D image volume at each time point was then realigned to this template by computing the combination of rotations (around the x, y and z axes) and translations (in x, y and z dimensions) that maximised the correlation between the image intensities of the volume in question and the template (rigid-body registration). Following realignment, data were smoothed using a Gaussian filter (full-width at half-maximum (FWHM) 7.2 mm) to improve the signal-to-noise ratio of the images (Bullmore et al., 1999a). Following motion correction, global detrending and spin-excitation history correction, time series analysis for each subject was conducted based on a previously published wavelet-based resampling method for fMRI data (Bullmore et al., 1999b, Bullmore et al., 2001). At the individual subject level, a standard general linear modelling approach was used to obtain estimates of the response size (beta) to each of the task conditions (choice, anticipation and outcome phases) against an implicit baseline. We first convolved the main experimental conditions with 2 Poisson model functions (peaking at 4 and 8 s). We then calculated the weighted sum of these 2 convolutions that gave the best fit (least-squares) to the time series at each voxel. A goodness-of-fit statistic (SSQ ratio) was then computed at each voxel, consisting of the ratio of the sum of squares of deviations from the mean intensity value due to the model (fitted time series) divided by the sum of squares due to the residuals (original time series minus model time series).
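The SSQ ratio for a single voxel can be sketched in plain Python as below. This is a toy illustration, not the analysis software used in the study: it assumes one sample per second, uses Poisson probability mass functions with means 4 and 8 as stand-ins for the two model functions, and all function names are hypothetical.

```python
import math

def poisson_kernel(lam, length=20):
    """Poisson-shaped response function sampled at 1 s steps (assumed form)."""
    return [math.exp(-lam) * lam ** t / math.factorial(t) for t in range(length)]

def convolve(stim, kernel):
    """Causal convolution of a stimulus time course with a response kernel."""
    return [sum(stim[t - k] * kernel[k] for k in range(len(kernel)) if t - k >= 0)
            for t in range(len(stim))]

def demean(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ssq_ratio(y, stim):
    """Model sum of squares divided by residual sum of squares for one voxel."""
    x1 = demean(convolve(stim, poisson_kernel(4)))
    x2 = demean(convolve(stim, poisson_kernel(8)))
    yc = demean(y)
    # Least-squares weights from the 2x2 normal equations.
    a, b, d = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    p, q = dot(x1, yc), dot(x2, yc)
    det = a * d - b * b
    w1 = (p * d - q * b) / det
    w2 = (q * a - p * b) / det
    fitted = [w1 * u + w2 * v for u, v in zip(x1, x2)]
    model_ss = dot(fitted, fitted)
    resid_ss = sum((yi - fi) ** 2 for yi, fi in zip(yc, fitted))
    return model_ss / resid_ss

# Hypothetical block design: 10 s on, 10 s off, for 120 s.
stim = [1.0 if (t // 10) % 2 == 0 else 0.0 for t in range(120)]
```

Using two response functions with different peak latencies lets the weighted sum accommodate voxel-to-voxel variation in the timing of the haemodynamic response.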
The appropriate null distribution for assessing significance of any given SSQ ratio was established using a wavelet-based data resampling method (Bullmore et al., 2001) and applying the model-fitting process to the resampled data. This process was repeated 20 times at each voxel, and the data were combined over all voxels, resulting in 20 null parametric maps of SSQ ratios for each subject. These maps were then combined to give the overall null distribution of SSQ ratios. The same permutation strategy was applied at each voxel to preserve the spatial correlation structure in the data. Individual SSQ ratio maps were then transformed into standard space, first by rigid-body transformation of the fMRI data into a high-resolution inversion recovery image of the same subject, and then by affine transformation onto a Talairach template (Talairach and Tournoux, 1988).
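How a pooled null distribution yields a significance level for an observed SSQ ratio can be sketched as follows. The wavelet resampling itself is not implemented here; the sketch assumes the null maps have already been produced by refitting the model to resampled data, and the function names and values are hypothetical.

```python
def pool_null_maps(null_maps):
    """Flatten per-resample maps (lists of SSQ ratios) into one null sample."""
    return [v for null_map in null_maps for v in null_map]

def empirical_p(observed, null_values):
    """Proportion of null SSQ ratios at least as large as the observed one."""
    return sum(1 for v in null_values if v >= observed) / len(null_values)

# Two hypothetical null maps of two voxels each, pooled over voxels.
null_values = pool_null_maps([[0.1, 0.2], [0.3, 0.4]])
p_value = empirical_p(0.35, null_values)
```

Pooling over voxels is what makes a small number of resamples per voxel sufficient: with many voxels, even 20 resamples give a large null sample.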

Group-level analysis
For the group-level analysis, fewer than 1 false-positive activated 3D cluster was expected at p<0.05 (voxel-level) and p<0.01 (cluster-level). A group-level activation map was produced for each group and each experimental condition (choice, anticipation, outcome) by calculating the median observed SSQ ratios at each voxel in standard space across all subjects and testing them against the null distribution of median SSQ ratios computed from the identically transformed wavelet-resampled data (Brammer et al., 1997, Bullmore et al., 2001). The voxel-level threshold was first set to 0.05, and tests were conducted to identify voxels that might plausibly be activated, followed by a test at a cluster-level threshold of p<0.01 to remove the false-positive clusters produced by the voxel-level test (Bullmore et al., 1999b, Bullmore et al., 2001). Next, a cluster-level threshold was computed for the resulting 3D voxel clusters. The necessary combination of voxel and cluster-level thresholds was not assumed from theory but rather was determined by direct permutation for each dataset, giving excellent type-II error control (Bullmore et al., 1999b). Cluster mass rather than a cluster extent threshold was used to minimize discrimination against possible small, strongly responding foci of activation (Bullmore et al., 1999b).
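The difference between cluster mass and cluster extent can be shown with a small Python sketch. It is a one-dimensional simplification (real clusters are 3D), it assumes cluster mass is the sum of the statistic over suprathreshold voxels in a cluster, and the function name and values are hypothetical.

```python
def cluster_masses(stat, voxel_thresh):
    """Return (extent, mass) for each run of contiguous suprathreshold voxels.

    stat        : list of voxel statistics along one dimension
    voxel_thresh: voxel-level threshold; values above it are candidates
    """
    clusters, extent, mass = [], 0, 0.0
    for v in stat:
        if v > voxel_thresh:
            extent += 1
            mass += v
        elif extent > 0:
            clusters.append((extent, mass))   # close the current cluster
            extent, mass = 0, 0.0
    if extent > 0:
        clusters.append((extent, mass))       # cluster touching the end
    return clusters

# A wide weak cluster (extent 2) and a small strong focus (extent 1).
clusters = cluster_masses([0.0, 3.0, 4.0, 0.0, 0.0, 9.0, 0.0], voxel_thresh=1.0)
# Keep clusters whose mass reaches a hypothetical permutation-derived cutoff.
survivors = [c for c in clusters if c[1] >= 8.0]
```

In this toy example an extent threshold of 2 voxels would discard the single strong voxel, whereas the mass criterion retains it and drops the weaker, wider cluster.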

Task performance
Supplementary Figure S1. Overall advantageous preference ratio
Supplementary Figure S2. Advantageous preference ratio split by task-block

Within-group fMRI results
Supplementary Figure S1. Horizontal sections showing within-group brain activation for each task condition (choice, anticipation, outcome) for (A) typically developing control boys, (B) boys with ASD and (C) boys with OCD. Talairach z-coordinates are shown for slice distance (in mm) from the intercommissural line. The right side of the image corresponds with the right side of the brain.