Estimating the effect of physical activity on cognitive function within the UK Biobank cohort

Abstract
Background: Physical activity (PA) has been associated with benefits for cognitive function (CF), but previous estimates of the strength of this relationship may have been biased due to limitations in statistical modelling practices that are common among observational studies. We aimed to address this by using a rigorously constructed conceptual causal model to guide an empirical analysis estimating the effect of PA on CF in the UK Biobank cohort of middle-aged and older adults.
Methods: This study analysed a subsample of 334 227 adults from the UK Biobank prospective cohort study. PA was measured subjectively by self-report and by device using accelerometry, and CF was measured using objective cognitive tests. Composite CF measures were derived to represent general and domain-specific performance. Effect coefficients were estimated using regression models, adjusting for a wide range of confounders specified by the assumed causal model, including genetic risk factors, and relevant health, sociodemographic and behavioural variables from across the lifespan.
Results: Results indicated very small effect sizes (standardized mean difference estimates all <0.01) of inconsistent direction, for both cross-sectional and longitudinal analyses.
Conclusions: The expected protective effect of PA on CF was not observed. This may reflect selection bias within UK Biobank, or the relatively young age of the sample at follow-up.


Introduction
The term cognitive function (CF) describes the set of mental abilities that enable the acquisition and use of knowledge and skills throughout life. It is recognized that CF is multidimensional and consists of abilities within subdomains such as memory, speed of processing, verbal ability and reasoning, and that abilities within these domains tend to be positively correlated. 1 Brain imaging studies show associations between various characteristics of the brain, such as grey and white matter volumes, and CF. 2
Physical activity (PA) is defined as 'any bodily movement produced by skeletal muscles that requires energy expenditure'. 3 PA thus includes everyday activities as well as purposeful exercise. Researchers typically conceptualize PA along a continuum, and tools such as the International Physical Activity Questionnaire (IPAQ) categorize activities according to their intensity as 'light', 'moderate' and 'vigorous'. 4 The World Health Organization (WHO) recommends that adults should undertake 150-300 min of moderate intensity, or 75-150 min of vigorous physical activity, per week. 5 [7][8][9] However, due to the observational nature of such studies, researchers have been cautious about making causal claims from their findings. Furthermore, the statistical adjustment strategies used in such studies often neglect to consider the assumptions behind, and implications of, the selection of variables included for adjustment, and thus may introduce more bias into effect estimates. 10
To address these potential shortcomings in the existing literature, we conducted a systematic review 11 using a protocol 12 for synthesizing observational evidence to produce a directed acyclic graph (DAG). This involved integrating relevant covariates identified by the review into a model by mapping the assumed structure of inter-relationships between these variables, using a set of causal criteria to guide decisions. This enabled us to identify confounders and mediators of the PA-CF relationship and select appropriate adjustment strategies. This process indicated a large number of mediators and confounders. Intermediate mechanisms by which PA may affect CF include facilitating neurogenesis, synaptogenesis and angiogenesis 13 and modifying grey matter volume 14 and white matter integrity. 15 Identified confounders broadly fell into the following categories: pre-birth factors (e.g. genetic risk); early life factors (e.g. childhood PA, education, traumatic events); adult sociodemographic factors (e.g. socioeconomic status, exposure to pollution); adult health behaviours (e.g. diet, alcohol consumption and smoking status); adult health outcomes (e.g. cardiovascular disease, neurological disease, mental health disorders); and medication (e.g. psychotropic or antihypertensive medication).
Previous observational studies of the PA-CF relationship may be biased because none has followed a comprehensive method to select covariates. Our study addressed this by using the DAG produced by our review 11 to inform the specification of models to estimate the total effect of PA on CF (i.e. including any effect transmitted indirectly via intermediate variables). This study matched variables from the DAG with data available within the UK Biobank dataset, in order to address the following research aims:
i. To estimate the magnitude of the relationship between PA and CF in a cross-sectional analysis of baseline UK Biobank data;
ii. To estimate the magnitude of the relationship between PA at baseline and CF at follow-up in a longitudinal analysis.

Methods
lifestyle questionnaires and an interview with a trained staff member, as well as undergoing physical and biological measurements and a brief computerized cognitive assessment. Subsequently, a subset of the total cohort completed accelerometry (device-measured PA) and neuroimaging visits (including repeat cognitive assessments). Invitations to participate in accelerometry (2013-15) 18 were sent to a random sample of participants with e-mail addresses (excluding those closest to the main UK Biobank centre, due to concerns about burden on those participants). Invitations to the neuroimaging assessments (2014 onwards) were based on proximity to UK Biobank MRI scanning centres in England. Because our study made use of genetic score data, the analysis sample was restricted to those with White British genetic ancestry (as determined centrally by UK Biobank based on a combination of self-report and genetic data) in order to reduce confounding induced by groups differing systematically both by genetic ancestry and according to phenotypic measures of interest. 19 This type of restriction has been performed in UK Biobank studies previously. 20 Similarly, the sample was restricted to unrelated people; this was done by randomly keeping one member of each related set (third degree or closer). Along with further standard exclusions based on genotyping quality control, this left a sample of 334 227 which was used for baseline analysis in our study (see Figure 1).

Measures
The variables within the conceptual DAG model 11 were matched to the data available within UK Biobank (see Supplementary Methods, available as Supplementary data at IJE online for detail).

Exposure: baseline physical activity
Self-report data: at the assessment centre visits, participants completed a modified form of the IPAQ short form, 21 reporting the frequency and duration of walking and moderate and vigorous activity undertaken in a typical week. Data were processed in accordance with IPAQ scoring guidance, such that each category of activity was assigned the following weighting in metabolic equivalent of task (MET) units: walking, 3.3 METs; moderate activities, 4 METs; and vigorous activities, 8 METs. The total amount of moderate-to-vigorous PA was estimated as the sum of moderate and vigorous PA, expressed in MET-hours per week, and participants were classified as active if they met the IPAQ recommendation of at least 10 MET-hours per week of moderate-to-vigorous PA, as has been done in previous UK Biobank studies. 22 Total PA was calculated by summing the weighted time spent across all three categories and expressed in MET-hours per week. Therefore, participants received a total PA value if they had data for at least one of the three levels of activity.
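The scoring described above can be sketched in a few lines of Python. This is an illustration only and not the study's Stata code: the function names and the input format (days per week and minutes per day for each category) are hypothetical, no IPAQ data-cleaning or truncation rules are applied, and the handling of partially missing moderate or vigorous data is an assumption made for the example.

```python
# Illustrative sketch only: names and input format are hypothetical.
MET_WEIGHTS = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}
ACTIVE_THRESHOLD = 10.0  # MET-hours per week of moderate-to-vigorous PA


def weekly_met_hours(category, days_per_week, minutes_per_day):
    """Weekly MET-hours for one category, or None if either input is missing."""
    if days_per_week is None or minutes_per_day is None:
        return None
    return MET_WEIGHTS[category] * days_per_week * minutes_per_day / 60.0


def sum_available(values):
    """Sum the non-missing values; None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def summarise_pa(walking, moderate, vigorous):
    """Each argument is a (days_per_week, minutes_per_day) pair; None marks missing."""
    w = weekly_met_hours("walking", *walking)
    m = weekly_met_hours("moderate", *moderate)
    v = weekly_met_hours("vigorous", *vigorous)
    # Assumption: mvPA is treated as missing only when both components are missing.
    mvpa = sum_available([m, v])
    # Total PA is available if at least one of the three categories has data.
    total = sum_available([w, m, v])
    active = None if mvpa is None else mvpa >= ACTIVE_THRESHOLD
    return {"mvpa": mvpa, "total": total, "active": active}


example = summarise_pa(walking=(5, 30), moderate=(3, 40), vigorous=(1, 30))
walk_only = summarise_pa(walking=(5, 30), moderate=(None, None), vigorous=(None, None))
```

In this sketch a participant who reports only walking receives a total PA value but no active/inactive classification, which mirrors the distinction between total PA and PA classification drawn in the text.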

Figure 1 Flowchart showing participants included in the study
Accelerometer data: a subsample participated in accelerometer-measured PA data collection for a 1-week period. The Axivity AX3 wrist-worn triaxial device recorded accelerations over 5-s periods (measured in milligravity, mg, units), resulting in 120 960 data points per participant. 18 Mean daily accelerations were derived for use in analysis. The levels of missing data and overall participant compliance in collecting these data are similar to other studies. 18

Outcome: cognitive function

Follow-up cognitive function tests
The follow-up data in our study pertain to 10 tests administered at the imaging visit, which included the five administered at baseline as well as the following five: 'Trail Making Test' (Part A reflects processing speed, Part B executive function); 'Digit Symbol Substitution Test' (processing speed); 'Tower Rearranging Test' (executive function); 'Paired Associate Learning' (verbal declarative memory); 'Matrices' (non-verbal reasoning). For further detail, see Supplementary Methods. One additional test ('Picture Vocabulary') was administered at the imaging visits, but the data from this test have not yet been released by UK Biobank and so this is not described further here. For longitudinal models using self-reported PA, sample sizes ranged from 21 225 (Trails-B Time) to 30 330 (Prospective Memory). After adjusting the models for the specified covariates, sample sizes ranged from 4805 (Paired Associate Learning) to 6840 (Prospective Memory). Longitudinal models using accelerometer-measured PA ranged from 9362 (Trails-B errors) to 14 392 (Prospective Memory) and, after adjusting for specified covariates, from 2742 (Trails-B Time) to 3935 (Reaction Time).

Individual and composite cognitive function scores
For our study, the raw scores for all tests except Prospective Memory (as it is a binary variable) were converted into z-scores for ease of interpretation, standardized within 5-year age bands at each assessment time point. Therefore the mean score is approximately zero, and the standard deviation is approximately one. For each z-score, higher scores represent better performance.
Composite measures of global CF at baseline and at the imaging visit were derived by taking the mean of the four baseline visit z-scores and the 10 imaging visit z-scores (trails B-A not included), for participants with at least two non-missing z-score values.As the imaging visit data included multiple tests that measure the same domain of CF, domain-level composite scores were also derived using the mean of z-scores for participants who had at least two non-missing values for tests within that domain: processing speed (Digit Symbol Substitution and Reaction Time); reasoning (Reasoning and Matrices); executive function (Tower Rearranging, Trails-A time and Trails-B time); and memory (Pairs Matching, Numeric Memory and Paired Associate Learning).
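The standardization and composite rules described above can be sketched as follows. This is a hedged illustration rather than the study's code: the function names are hypothetical, the age-band boundaries (multiples of five) are an assumption, and the sign reversal for tests where a lower raw score is better (for example, completion times) is our inference from the statement that higher z-scores represent better performance.

```python
from statistics import mean, stdev


def age_band(age, width=5):
    """Lower bound of the age band; boundaries at multiples of `width` are assumed."""
    return (age // width) * width


def zscores_within_bands(scores, ages, width=5, higher_is_better=True):
    """Standardize raw scores within age bands; None marks a missing score."""
    by_band = {}
    for score, age in zip(scores, ages):
        if score is not None:
            by_band.setdefault(age_band(age, width), []).append(score)
    # A band needs at least two scores for a sample standard deviation.
    stats = {b: (mean(v), stdev(v)) for b, v in by_band.items() if len(v) > 1}
    result = []
    for score, age in zip(scores, ages):
        band = age_band(age, width)
        if score is None or band not in stats or stats[band][1] == 0:
            result.append(None)
            continue
        band_mean, band_sd = stats[band]
        z = (score - band_mean) / band_sd
        # Flip the sign so that higher z always means better performance.
        result.append(z if higher_is_better else -z)
    return result


def composite(z_values, min_non_missing=2):
    """Mean of the available z-scores, or None with fewer than `min_non_missing`."""
    present = [z for z in z_values if z is not None]
    return mean(present) if len(present) >= min_non_missing else None


z = zscores_within_bands([1.0, 2.0, 3.0, 5.0], [40, 41, 42, 47])
z_time = zscores_within_bands([1.0, 2.0, 3.0], [40, 41, 42], higher_is_better=False)
```

Here `composite([1.0, None, 0.0])` returns the mean of the two available values, whereas a participant with a single non-missing z-score receives no composite, in line with the two-test minimum stated above.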

Covariates
The covariates were identified by the graphical causal model reported in the systematic review 11 and matched to available UK Biobank data. These are described in the Supplementary Methods and listed in full in Table 1.

Unmatched variables
There were several variables within the conceptual model which could not be matched to data within UK Biobank. These were childhood PA, childhood intelligence quotient (IQ), earlier adulthood PA and cognitive activity. Of these, childhood PA and earlier adulthood PA were specified in all minimum covariate adjustment sets determined by structural analysis of the DAG. Thus, the model estimated in this study represents the nearest approximation of the full conceptual model, as is recommended in recent guidance. 24

Statistical analysis
All analyses were performed in Stata version 16. Data were summarized using descriptive statistics and are reported for the whole sample and stratified by PA classification: active, inactive or missing. Normally distributed continuous variables are summarized as means and standard deviations, and skewed variables as medians with interquartile ranges. Ordinal and binary variables are reported as frequencies and percentages within each category. These summary statistics are presented for the baseline characteristics of the total sample and the subsample who attended the imaging visit (Table 1a). Data pertaining to the cognitive outcomes at follow-up are presented in Table 1b. Differences between the PA groups for each measure were not formally tested, as the decision about entering covariates into the regression models was based a priori on the DAG rather than on the existence of statistical differences. The relationship between PA and CF was then estimated in two sets of regression models fitted with the 'regress, vce(robust)' command in Stata; the entire analysis code file is available at [https://osf.io/tngqh/].
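For readers unfamiliar with the 'regress, vce(robust)' specification, the following minimal sketch shows an unadjusted linear regression with a heteroskedasticity-robust standard error for the slope. It is not the study's code (which is in Stata and available at the link above): the function name is hypothetical, only a single predictor is handled, and the small-sample factor n/(n - 2) reflects our understanding that Stata's robust option applies an n/(n - k) correction (often labelled HC1).

```python
import math


def ols_robust(x, y):
    """Simple linear regression of y on x with an HC1-type robust SE for the slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich estimator: squared residuals weight each observation's leverage term.
    meat = sum(((xi - x_bar) ** 2) * (e ** 2) for xi, e in zip(x, residuals))
    variance = (n / (n - 2)) * meat / sxx ** 2  # k = 2 parameters (intercept, slope)
    return slope, intercept, math.sqrt(variance)


# Toy data standing in for PA (MET-hours per week) and a cognitive z-score.
slope_a, intercept_a, se_a = ols_robust([0, 1, 2, 3], [1, 3, 5, 7])
slope_b, intercept_b, se_b = ols_robust([0, 1, 2], [0, 1, 0])
```

The adjusted models in the study include many covariates and would require a matrix-based implementation; the single-predictor case is shown only to make the robust variance calculation explicit.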
The first set of regression models (Table 2) used CF data that were measured cross-sectionally with the PA measure. Cognitive scores at baseline were entered as the dependent variable and total self-reported PA in MET-hours per week as a continuous independent variable. Models were initially run without adjustment, and then adjusted according to the nearest approximation of the minimum sufficient adjustment set (listed in the Table 2 footnote).
The second set of regression models (Table 3) used the CF variables pertaining to the imaging visit, making the analysis longitudinal by design. The included covariates were as above with the addition of follow-up duration, and both self-reported PA and the covariate values were again taken from baseline data. This set of models was also repeated using accelerometer-measured PA (which was acquired after baseline CF measurement and thus not used in cross-sectional models).
Diagnostic checks were performed for all models to ensure the assumptions for regression were met. P-values were two-tailed and false discovery rate (FDR) correction was used within groups of models that tested the same hypotheses, to control the false discovery rate at 0.05. All analyses were conducted on a complete-case basis and missing values were not imputed.
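The FDR correction can be illustrated with the Benjamini-Hochberg step-up adjustment. The table footnotes name the Simes-Benjamini-Hochberg method as implemented in the Stata qqvalue package; the sketch below is a plain Python rendering of the standard procedure with a hypothetical function name and made-up p-values, not a reproduction of that package.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values for one group of models testing the same hypothesis.
q = bh_adjust([0.01, 0.04, 0.03, 0.20])
significant = [value <= 0.05 for value in q]
```

With these illustrative inputs only the smallest p-value remains below 0.05 after adjustment, which shows how the correction is applied within a group of related models rather than to each model in isolation.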

Sample characteristics
Table 1a shows descriptive statistics for the variables specified in the conceptual model at baseline for both the entire sample and the subsample who returned for imaging. The subsample who returned for imaging was on average younger, more active, less deprived and generally healthier at baseline than the overall sample. It is also apparent that, within the baseline data, those who were missing PA status were less educated, more deprived and generally less healthy than the overall sample, suggesting that missingness on the moderate-vigorous PA measures (which determined the PA groups) was not random. High missingness on some cognitive tests reflects that some tests were introduced at different stages within the baseline recruitment window. Table 1b shows the cognitive outcomes for the imaging sample at follow-up. Generally, the descriptive statistics suggested very small reductions in CF, of similar magnitude across the PA groups. The mean duration between baseline and follow-up was 8.94 years (SD 1.76).

Effect of physical activity on cognitive function at baseline
Table 2 shows the cross-sectional regression results estimating the effect of PA, expressed in MET-hours per week, on CF, expressed in z-score units (with the exception of Prospective Memory, which is expressed as an odds ratio reflecting the odds of a correct response). The unadjusted models indicated a trivially small effect of PA on CF. For each measure of CF, the direction of the effect was negative (harmful), except for Reaction Time, for which it was positive (protective). When models were adjusted for covariates, the effect sizes were similarly tiny and the confidence intervals were wider.

Effect of physical activity on cognitive function at follow-up
Table 3 displays the longitudinal regression results estimating the effect of self-reported PA, expressed in MET-hours per week, on CF expressed in z-score units (with the exception of Prospective Memory, which is expressed as an odds ratio reflecting the odds of a correct response). Results for the same models repeated using accelerometry averages (expressed in milligravity units), as a continuous measure of device-measured PA, are displayed in the lower half of the same table.
Across the unadjusted self-reported PA models, the estimated effect of PA on CF was trivially small. The direction of the effect was negative (harmful) for all CF measures. After adjusting for covariates, effects remained trivially small in magnitude, and in the negative direction.
For device-measured PA, there were trivially small effects in unadjusted models for Reaction Time, Digit Symbol Substitution, Trails A time and the Processing Speed composite, all in the positive (protective) direction. After adjusting for the specified covariates, effect size estimates remained trivially small, with wider confidence intervals.

Sensitivity analysis
Because the adjusted models contained large numbers of covariates, results were potentially sensitive to bias arising from missing data. To examine this possibility, a sensitivity analysis was performed by repeating the unadjusted analyses, restricted to those participants who had full covariate data. The results are presented in Supplementary Tables S1 and S2 (available as Supplementary data at IJE online).
There was very little difference in effect estimates, indicating that the unadjusted relationship between PA and CF was very similar among people with and without missing covariate data. Therefore, it is unlikely that observed results in adjusted models are being driven by missing data bias in the analytic sample.
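The logic of this sensitivity analysis can be sketched as a comparison of the unadjusted slope in all participants with exposure and outcome data against the same slope in the complete-covariate subset. The record layout, variable names and toy values below are hypothetical and serve only to illustrate the comparison; they are not study data.

```python
def unadjusted_slope(rows, exposure, outcome):
    """OLS slope of outcome on exposure, with no covariate adjustment."""
    n = len(rows)
    x_bar = sum(r[exposure] for r in rows) / n
    y_bar = sum(r[outcome] for r in rows) / n
    sxx = sum((r[exposure] - x_bar) ** 2 for r in rows)
    sxy = sum((r[exposure] - x_bar) * (r[outcome] - y_bar) for r in rows)
    return sxy / sxx


def sensitivity_slopes(records, exposure, outcome, covariates):
    """Return (slope in all available rows, slope in complete-covariate rows)."""
    available = [r for r in records
                 if r[exposure] is not None and r[outcome] is not None]
    # Complete cases: every covariate used in the adjusted model is non-missing.
    complete = [r for r in available
                if all(r[c] is not None for c in covariates)]
    return (unadjusted_slope(available, exposure, outcome),
            unadjusted_slope(complete, exposure, outcome))


# Toy records; None marks a missing covariate value.
toy = [
    {"pa": 0, "cf": 0, "bmi": 25.0},
    {"pa": 1, "cf": 4, "bmi": None},
    {"pa": 2, "cf": 2, "bmi": 30.0},
    {"pa": 3, "cf": 3, "bmi": 22.0},
]
slope_all, slope_complete = sensitivity_slopes(toy, "pa", "cf", ["bmi"])
```

In the toy data the two slopes differ noticeably because the excluded record is an outlier; in the study the corresponding estimates were reported to be very similar, which is the basis for the conclusion above.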

Discussion
In this study using a large cohort of middle-aged to early old-age adults of White British ancestry to estimate the causal effect of PA on CF, virtually no relationship was observed between these variables. Due to very large sample sizes, the effects were estimated with high precision (narrow confidence intervals); however, they were of trivially small magnitude, and became smaller after adjustment for covariates. This pattern of results was unexpected as it does not align with most of the recent literature that was synthesized and reviewed to construct the conceptual model informing this analysis. 11 However, a minority of the synthesized studies also reported no association between PA and CF [25][26][27][28] and, in common with two of these studies, the UK Biobank sample was younger at baseline than most of the other cohorts in which protective effects have been found. Taken together, our findings may suggest that observed associations in older baseline samples reflect reverse causation, whereby some participants had preclinical cognitive decline at baseline. In other words, physically slowing down may reflect a prodromal symptom of cognitive decline, rather than a cause of it. [30][31] Studies with younger samples at baseline, such as ours, reduce the risk that preclinical disease processes have begun. Another possible consequence of our sample's relatively young age is that benefits of PA for CF have not yet been realized, and may yet be observable later in life when a greater degree of cognitive decline would be expected. 32 Other potential explanations for our finding are considered below.
The UK Biobank sample is known not to be representative of the general population, with participants being more wealthy and educated and less likely to engage in unhealthy behaviours and experience negative health outcomes. 33 When both exposure and outcome are related to participation in studies, this can lead to collider bias, 34 and it is plausible that both higher levels of PA and better cognitive health influence participation and retention within UK Biobank. Indeed, the sample who returned for follow-up were more active, less deprived and healthier than the total sample. [36] Furthermore, a recent analysis using UK Biobank indicated that analyses using socio-behavioural variables such as PA are particularly susceptible to participation bias. 37 Taken together, our findings underline the importance of subjecting studies using UK Biobank to rigorous methodological scrutiny.
Finally, it is worth considering our results in the context of global public health guidance, which emphasizes that PA is just one of a broader suite of modifiable risk factors for cognitive decline. 38 Our results would support the notion that focusing on PA alone is unlikely to substantially reduce risk of decline.

Strengths and limitations
The use of a DAG to inform the examination of the PA-CF relationship represents a novel contribution to the literature and, by following a protocol, was done to a standard of rigour and transparency that is not common within existing literature. 24 However, the complexities of the rigorous model (dozens of nodes and hundreds of paths) posed practical limitations, such as being unable to interrogate its structural implications comprehensively using the available software, meaning that the plausible alternative variations of the specified model were not explored. Nor were the implied independencies of the model tested against the measured data, which is another way of assessing model fit. 39
The use of UK Biobank data represents a trade-off in terms of strengths and limitations. The range and detail of measures available within this resource allowed a close approximation of the complex conceptual model to be estimated statistically. However, the internal and external validity of the findings are limited due to the selection bias within the sample, and the genetic ancestry restriction means results cannot be generalized beyond populations of White British ancestry.
Finally, total PA was selected as the exposure variable, as this had the fewest missing data relative to other categories of PA. However, it is possible that the inclusion of 'light PA' within this has diluted the specific influence of moderate to vigorous PA, which has been observed to be associated with CF in a recent study using UK Biobank. 40

Conclusions
Due to limitations of both internal and external validity as discussed above, our results should be interpreted with caution. However, in the context of existing literature, the finding of no meaningful association between PA and CF aligns with other studies that had younger baseline samples, and may lend weight to the reverse causation hypothesis. Alternatively, the virtually null findings may reflect the suppression of true effects due to collider bias induced by the factors influencing participation and retention within the UK Biobank sample. Future research using UK Biobank can explore whether the hypothesized protective effect of PA on CF does emerge as the cohort matures and, if so, whether this effect is mediated by the hypothesized pathways via brain health.
Biobank. We are grateful to Dr Joey Ward and Dr Carlos Celis Morales, University of Glasgow, for their contributions to data coding.

b Global CF = mean of z scores on four tests (assuming at least two non-missing values).
c Medical diagnoses based on linked health records with positive diagnosis indicating diagnosis on or before baseline assessment date.
International Journal of Epidemiology, 2023, Vol. 52, No. 5

b All expressed as z score units (standardized mean difference), except Prospective Memory which is expressed as an odds ratio.
c Probability adjusted using the Simes-Benjamini-Hochberg method implemented in the Stata qqvalue package.
d Global CF = mean of z scores on four tests (assuming at least two non-missing values).

a Adjusted for: alcohol binge, alcohol frequency, antihypertensive medication, apoe-e4 allele count, body mass index, cardiovascular disease diagnosis, dementia genetic risk score, diabetes diagnosis, distance to major road, friend and family visits, gender, HDL cholesterol, head injury diagnosis, household income, kidney disease diagnosis, kJ of energy, LDL cholesterol, living alone status, manual work, mood disorder diagnosis, musculoskeletal diagnosis, neurological disorder diagnosis, neuroticism score, psychosis diagnosis, psychotropic medication, salt added to food, smoking status, Townsend deprivation score, trauma status, waist circumference, worrier status. Also adjusted for technical covariates used with genetic risk scores.
b All expressed as z score units (standardized mean difference), except Prospective Memory which is expressed as an odds ratio.
c Probability adjusted using the Simes-Benjamini-Hochberg method implemented in the Stata qqvalue package.
d Processing speed composite = mean of Digit Symbol Substitution and Reaction Time (assuming non-missing on both measures).
e Executive function composite = mean of Tower Rearranging, Trails A and Trails B completion time (assuming non-missing on two measures).
f Reasoning composite = mean of Reasoning test and Matrix Pattern Completion (assuming non-missing on both measures).
g Memory composite = mean of Pairs Matching, Numerical Memory and Paired Associate Learning (assuming non-missing on two measures).
h Global CF = mean of z scores on 10 tests (assuming at least two non-missing values).

Table 1a
Baseline characteristics of total sample and imaging subsample

Table 1b
Cognitive outcomes for imaging subsample at follow-up

a Individuals were classified as active if they met ≥10 MET-hours of moderate to vigorous PA per week. However, they also reported levels of light PA (walking) which did not contribute to this classification. Total PA includes light PA as well as mvPA. Therefore, there is a subset of individuals who are non-missing on light PA, but missing on both moderate and vigorous PA. These individuals will have a value for total PA but be counted as missing PA classification.
b Global CF = mean of z scores on 10 tests (assuming at least two non-missing values).
c Processing speed composite = mean of Digit Symbol Substitution and Reaction Time (assuming non-missing on both measures).
d Executive function composite = mean of Tower Rearranging, Trails A and Trails B completion time (assuming non-missing on two measures).
e Reasoning composite = mean of Reasoning test and Matrix Pattern Completion (assuming non-missing on both measures).
f Memory composite = mean of Pairs Matching, Numeric Memory and Paired Associate Learning (assuming non-missing on two measures).

Table 2
Cross-sectional regression models for baseline cognitive function
a Adjusted for: alcohol binge, alcohol frequency, antihypertensive medication, apoe-e4 allele count, body mass index, cardiovascular disease diagnosis, dementia genetic risk score, diabetes diagnosis, distance to major road, friend and family visits, gender, HDL cholesterol, head injury diagnosis, household income, kidney disease diagnosis, kJ of energy, LDL cholesterol, living alone status, manual work, mood disorder diagnosis, musculoskeletal diagnosis, neurological disorder diagnosis, neuroticism score, psychosis diagnosis, psychotropic medication, salt added to food, smoking status, Townsend deprivation score, trauma status, waist circumference, worrier status. Also adjusted for technical covariates used with genetic risk scores.