Validation of the Oxford WebQ Online 24-hour Dietary Questionnaire Using Biomarkers.

Oxford WebQ is an online dietary questionnaire covering 24 hours, appropriate for repeated administration in large-scale prospective studies including UK Biobank and the Million Women Study. We compared performance of the Oxford WebQ and a traditional interviewer-administered multi-pass 24-hour recall against biomarkers for protein, potassium and total sugar intake, and total energy expenditure estimated by accelerometry. 160 participants were recruited between 2014 and 2016 in London, UK, and measured at 3 non-consecutive time-points. The measurement error model simultaneously compared all 3 methods. Attenuation factors for protein, potassium, sugars and total energy intake estimated by the mean of 2 Oxford WebQs were 0.37, 0.42, 0.45, and 0.31 respectively, with performance improving incrementally for the mean of more measures. Correlation between the mean of 2 Oxford WebQs and estimated true intakes, reflecting attenuation when intake is categorised or ranked, was 0.47, 0.39, 0.40, and 0.38 respectively, also improving with repeated administration. These were similar to the more administratively burdensome interviewer-based recall. Using objective biomarkers as the standard, Oxford WebQ performs well across key nutrients in comparison with more administratively burdensome interviewer-based 24-hour recalls. Attenuation improves when the average is taken over repeated administration, reducing measurement error bias in assessment of diet-disease associations.

Dietary intakes estimated from self-reported dietary assessments are prone to measurement error, introducing potentially substantial bias and loss of statistical power (1,2). It is therefore important to calibrate self-reported intakes against objective biomarkers, where measurement errors can be assumed to be independent (3,4). Most cohort studies have used food frequency questionnaires (FFQs) designed to assess diet over the long term, but short-term recalls may have less bias from measurement error (5-7), other than for episodically consumed foods. Repeated application of short-term recalls may offer longer-term coverage, but the process is administratively burdensome. Online dietary assessment offers repeated administration with reduced administrative costs (8), but to facilitate this, the assessment instrument must be convenient for the participant to use (9,10).
The Oxford WebQ is an online dietary questionnaire covering the previous day's intake (11), developed to provide an easy-to-complete dietary assessment appropriate for repeated use in large-scale prospective studies. It is currently being used in the UK Biobank (12-14) and the Million Women Study (15,16).
The Oxford WebQ has previously been shown to provide results similar to those derived from an interviewer-administered self-report 24-hour dietary recall, but it is quicker to complete (17,18). However, the comparison tool was itself a self-report instrument, providing an inadequate basis for validation, because self-report tools are prone to correlated person-specific biases (19-21). These biases may differ according to personal characteristics such as age, sex, or body mass index (BMI; weight (kg)/height (m)²).
We therefore aimed to provide the first validation of the Oxford WebQ tool against established recovery and predictive nutritional biomarkers and a reference measure of energy expenditure free from these person-specific biases. In doing so, we present the degree to which diet-disease relationships assessed using the Oxford WebQ are attenuated and the extent to which statistical power to detect these comparisons is reduced even in large-scale studies such as the UK Biobank study and the Million Women Study.

METHODS

Recruitment
Participants were enrolled in a United Kingdom study designed to validate both the Oxford WebQ dietary assessment tool and the myfood24 dietary assessment tool (22) against nutritional biomarkers, comparing these with a standard interviewer-based multiple-pass 24-hour dietary recall, hereafter called the multiple-pass recall (MPR) (23). Eligibility criteria were aimed at recruiting participants who were broadly representative of the adult general population. Participants were eligible for the study if they were between 18 and 65 years of age and were maintaining a stable weight, confirmed by no substantive weight loss or weight gain over the course of the study (defined as >5% weight change from the first clinic appointment). Further criteria included regular access to high-speed Internet service, use of a telephone, and ability to speak and read English so the participant could complete the online questionnaires and 24-hour recalls. Participants had to be willing to visit the Clinical Research Facility at Hammersmith Hospital, London, United Kingdom (Imperial College Healthcare NHS Trust), to provide blood and urine samples.
Participants were identified between 2014 and 2016 through a multidisciplinary network of primary-care professionals and practices, the North West London Primary Care Research Network, and persons known to the Clinical Research Facility who had previously expressed an interest in participating in research projects. Participants were also identified from a list of local addresses provided by the post office. Participants were not a subsample of the UK Biobank subjects but an independent sample designed to be of a similar age and sex distribution as the UK Biobank subjects. Upon completion of the study, participants were provided with modest financial reimbursement for their time. The recruitment target was 200 participants with complete information collected (see Web Appendix 1, available at https://academic.oup.com/aje).

Overview of study design
Each participant provided 3 sets of urine samples and accelerometry data for reference measures (recovery biomarkers, predictive biomarkers, and total energy expenditure (TEE)) and completed 3 MPRs and 3 Oxford WebQ online dietary questionnaires, all spread over a 5-week period. This data collection was achieved in 3 separate cycles, carried out 2 weeks apart (Figure 1). At the start of each cycle, participants provided urine and accelerometry for the set of reference measures, followed by a dietary assessment 1-3 days later and another dietary assessment 2-4 days after that. The order of the dietary assessments within each cycle was allocated by simple randomization, to reduce order effects. Each of the assessments is described in detail below.

Biomarkers
Participants provided 24-hour urine samples, discarding the first morning void and then collecting every subsequent urine specimen for the remaining 24 hours, ending with the last specimen the following morning. Urine specimens were then returned to the clinic on the day that collection ended. Urine volumes were recorded, and urine was then divided into 50-mL aliquots before being stored at −20°C and transported to the Molecular Epidemiology Unit at the University of Leeds (Leeds, United Kingdom). The Kjeldahl method (24) was used to measure the total urinary nitrogen content of the samples. Participants took three 80-mg 4-aminobenzoic acid (PABA) tablets with meals during the course of the 24-hour urine sampling period for confirmation of sample completeness (25). The concentration of PABA in the urine was measured using high-performance liquid chromatography. We considered 93% PABA recovery to indicate complete urine collection over the 24-hour period, but 85%-110% was permissible, consistent with previous research (25).
Figure 1. Each 24-hour dietary assessment (the Oxford WebQ online tool and the interviewer-based multiple-pass 24-hour recall, in random order) and selected reference measurements (recovery biomarkers, predictive biomarkers, and total energy expenditure) were completed on 3 different occasions separated by approximately 2 weeks. On each occasion, the reference measurement was followed 1-3 days later by the first dietary assessment, which was followed approximately 2-4 days later by the second dietary assessment.

Protein intake was estimated on the basis of the assumption that 81% of nitrogen is excreted within 24 hours (26). Potassium intakes were estimated from the amount excreted in the urine, as measured by the Clinical Biochemistry Department at the Leeds Teaching Hospitals NHS Trust using an ADVIA 2400 Clinical Chemistry System (Siemens AG, Munich, Germany) with ion-selective electrode detection. We assumed that 80% of potassium intake is excreted in the urine (27).
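The two excretion assumptions can be illustrated with a minimal Python sketch. The function names are hypothetical, and the nitrogen-to-protein factor of 6.25 is the conventional value, assumed here because the text does not state which factor was applied.

```python
# Conventional conversion: protein is roughly 16% nitrogen, so 1 g N ~ 6.25 g protein.
# ASSUMPTION: this factor is not stated in the text above.
NITROGEN_TO_PROTEIN = 6.25


def protein_intake_g(urinary_nitrogen_g, fraction_excreted=0.81):
    """Estimate daily protein intake (g) from 24-hour urinary nitrogen (g)."""
    return urinary_nitrogen_g * NITROGEN_TO_PROTEIN / fraction_excreted


def potassium_intake(urinary_potassium, fraction_excreted=0.80):
    """Estimate daily potassium intake, in the same units as the urinary amount."""
    return urinary_potassium / fraction_excreted


print(protein_intake_g(12.96))   # 24-hour urinary nitrogen of 12.96 g
print(potassium_intake(2.8))     # 24-hour urinary potassium of 2.8 g
```

Dividing by the assumed excreted fraction scales the urinary amount up to the intake that would have produced it.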
Urinary concentrations of fructose and sucrose were measured using a Sucrose/D-Glucose/D-Fructose assay (Boehringer Mannheim/R-Biopharm AG, Darmstadt, Germany) scaled down to a microplate format. Daily excretion of urinary sucrose and fructose was then estimated on the basis of total urine volume collected over 24 hours. The predicted intake of total sugars for each individual, allowing for age and sex, was then estimated using a calibration equation derived from previous feeding studies comprising 30 days' intervention in a metabolic suite under controlled conditions (28,29).

Total energy expenditure
Resting energy expenditure was measured using open-loop indirect calorimetry (Gas Exchange Monitor; GEM Nutrition, Daresbury, United Kingdom), assessed at the research facility when participants came for their clinic visit. The calorimeter was calibrated, and volunteers lay in a semirecumbent position. Following stabilization of measurements, oxygen consumption (volume of oxygen (VO₂)) and carbon dioxide production (volume of carbon dioxide (VCO₂)) were recorded every minute for 15 minutes. The mean values of the last 10 sets of measurements were used to estimate resting energy expenditure (30). Activity energy expenditure was also estimated, using 3-plane accelerometry, by means of a SenseWear armband miniaccelerometer (BodyMedia Inc., Pittsburgh, Pennsylvania). This was worn for 24 hours on the left upper arm on one of the days before each clinic visit. The thermic effect of food was assumed to be 10% of TEE (31). TEE was estimated by summing resting energy expenditure, activity energy expenditure, and the thermic effect of food, with estimated TEE indicating total energy intake, provided that the participant remained in energy balance. This method has previously demonstrated close agreement with energy expenditure estimated using doubly labeled water (32). TEE estimates for participants with more than a 5% weight change over the course of the study were excluded. Within-person variability was taken into account in statistical analysis for all repeated measures.
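One reading of the summation above can be sketched as follows. If the thermic effect of food is 10% of TEE and TEE is the sum of the three components, then TEE = (resting + activity) / 0.9. The function names and example values are hypothetical, and the study's exact computation may differ.

```python
def total_energy_expenditure(resting, activity, tef_fraction=0.10):
    """TEE when the thermic effect of food is a fixed fraction of TEE itself.

    TEE = resting + activity + tef_fraction * TEE
    =>  TEE = (resting + activity) / (1 - tef_fraction)
    """
    return (resting + activity) / (1.0 - tef_fraction)


def weight_stable(first_kg, last_kg, max_change=0.05):
    """True if weight changed by no more than max_change (5%) of the first weight."""
    return abs(last_kg - first_kg) / first_kg <= max_change


tee = total_energy_expenditure(resting=6.3, activity=2.7)  # MJ/day, illustrative
print(tee, weight_stable(70.0, 73.0), weight_stable(70.0, 74.0))
```

The second helper mirrors the exclusion rule for participants whose weight changed by more than 5% over the study.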

Oxford WebQ online dietary questionnaire
The development of the Oxford WebQ online dietary questionnaire has been fully described elsewhere (11,17,18). Briefly, the tool was designed as a Web-based dietary questionnaire that was easy to use by both participants and researchers in large-scale observational studies, through extensive piloting and iterative improvement. The Oxford WebQ presents participants with 21 broad food groups, with options then expanding to offer over 200 commonly consumed foods and drinks. The participants are prompted to select the amount consumed over the previous 24 hours, mostly from predefined categories offered to them. To facilitate large-scale automatic coding of nutrient information, use of free-text boxes is minimized. Upon completion of the tool, the participants are presented with a summary page of all the food and drink items they reported consuming, together with the amounts reported, and are asked to make any necessary amendments. Completed questionnaires are coded automatically through multiplication of amounts consumed by the nutrient contents specified in standard United Kingdom food composition tables (33), producing a profile of the intake of 21 separate nutrients, without any additional intervention required by nutritionists.

Interviewer-administered 24-hour recall
To facilitate comparison of the Oxford WebQ with an equivalent interviewer-administered tool, participants also completed an MPR, which was conducted over the telephone by a trained researcher using a prompt sheet based on the 5-step multiple-pass method (34). Participants were asked to provide details on cooking methods, brand names, and portion sizes. Nutrient intake was estimated using Dietplan6.7 software (Forestfield Software, Horsham, United Kingdom), based on the same food composition tables as the Oxford WebQ (33). Trained researchers matched the food and drink items recorded to the food composition tables and applied the portion sizes using a standard operating protocol, which is described fully elsewhere (35).

Statistical analysis
Urine samples with 2 or more voids missed during the 24-hour period were excluded. Apart from this, the main analyses included all participants (36). The robustness of urinary biomarker results to completeness of the urine samples was assessed by conducting a sensitivity analysis including only participants who had complete PABA recovery (85%-110%) or whose PABA recovery was 50%-85% with their urinary nitrogen and potassium rescaled to the 93% PABA recovery expected for complete recovery, consistent with previous research (37).
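The sensitivity-analysis rule can be sketched in Python as below. The thresholds come from the text, but the simple proportional rescaling is an assumption made for illustration; the cited method (37) may use a different adjustment, for example one based on regression. The function names are hypothetical.

```python
def classify_paba(recovery_pct):
    """Classify a 24-hour urine sample by its PABA recovery (%)."""
    if 85.0 <= recovery_pct <= 110.0:
        return "complete"   # used as measured
    if 50.0 <= recovery_pct < 85.0:
        return "rescale"    # kept, after adjustment to 93% recovery
    return "exclude"        # below 50% or above 110%


def rescale_to_complete(analyte, recovery_pct, target_pct=93.0):
    """ASSUMED proportional adjustment of urinary nitrogen or potassium."""
    return analyte * target_pct / recovery_pct


for pct in (93.0, 62.0, 40.0, 120.0):
    print(pct, classify_paba(pct))
print(rescale_to_complete(10.0, 62.0))
```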
We present results for both nutrients and nutrient densities. The densities are defined as the ratio of nutrient intake (g) to energy intake (MJ) measured by the same dietary assessment tool to represent energy-adjusted quantities derived from the tool. All nutrient intake and nutrient density data were log-transformed prior to statistical analysis to better approximate normal distributions. All statistical analyses were performed in Stata, version 14.2 (StataCorp LLC, College Station, Texas) (38).
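As a small illustration of the density definition and the log transformation (the analyses themselves were run in Stata; this Python sketch uses hypothetical function names and made-up intakes):

```python
import math


def nutrient_density(nutrient_g, energy_mj):
    """Nutrient density in g/MJ, both values from the same dietary tool."""
    return nutrient_g / energy_mj


def log_nutrient_density(nutrient_g, energy_mj):
    """Natural log of the density, as analysed on the log scale."""
    return math.log(nutrient_density(nutrient_g, energy_mj))


# Illustrative day: 80 g protein and 8 MJ energy reported by one tool.
print(nutrient_density(80.0, 8.0))
print(log_nutrient_density(80.0, 8.0))
```

On the log scale the density is simply the difference between log nutrient intake and log energy intake, which is why density can be read as an energy-adjusted quantity.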

Measurement error models
A measurement error structure similar to that used by the Observing Protein and Energy Nutrition (OPEN) Study and the European Prospective Investigation Into Cancer and Nutrition (EPIC)-Norfolk (20, 39) was assumed, including linear associations between the longer-term true intake and both the biomarkers and self-reported intakes. We assumed person-specific systematic biases for both self-report tools, which were assumed to be correlated. We also assumed a systematic bias related to level of intake.
Our measurement error model follows that proposed by Kipnis et al. (19,39). For Oxford WebQ estimate $Q_{ij}$, interviewer-based MPR $F_{ij}$, and biomarker $M_{ij}$ on person $i$ at occasion $j$,

$$Q_{ij} = \mu_{Qj} + \beta_{Q0} + \beta_{Q1} T_i + r_i + \varepsilon_{ij},$$
$$F_{ij} = \mu_{Fj} + \beta_{F0} + \beta_{F1} T_i + s_i + u_{ij},$$
$$M_{ij} = T_i + v_{ij},$$

where $T_i$ is the true intake for individual $i$; $\mu_{Qj}$ and $\mu_{Fj}$ represent possible drift over time between measures; $\beta_{Q0}$, $\beta_{Q1}$, $\beta_{F0}$, and $\beta_{F1}$ are biases, where $\beta_{Q0}$ and $\beta_{F0}$ are additive components associated with each tool and $\beta_{Q1}$ and $\beta_{F1}$ are multiplicative components; and $r_i$ and $s_i$ model the person-specific biases for each tool. We allow these person-specific biases to be correlated with $\rho(r, s) \neq 0$, because the same mechanisms may be influencing both $r_i$ and $s_i$. We assume independent within-person errors $\varepsilon_{ij}$ and $u_{ij}$ that follow normal distributions with mean 0 and variances $\sigma_\varepsilon^2$ and $\sigma_u^2$, respectively. We assume that there is no person-specific bias associated with biomarker $M_{ij}$ and that within-person error $v_{ij}$ follows a normal distribution with mean 0 and variance $\sigma_v^2$ and is independent of the true intake and other error components. For analyses assessing estimated intake from the Oxford WebQ based on the average of $k$ serial measurements, variance $\sigma_\varepsilon^2$ is replaced by $\sigma_\varepsilon^2/k$. We assume that correlation between biomarkers and dietary assessment measures does not vary by proximity in time because, for each cycle, the biomarker measure is completed before the dietary assessment day, and the gap between the final dietary assessment of a previous cycle and the biomarker collection for the next is short. Subsequent exploration of the observed correlation structure was consistent with this assumption (data not shown).
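A small simulation may help show why this structure matters. The sketch below assumes the linear form described above (each self-report equals an additive bias, plus a multiplicative bias times true intake, plus a person-specific bias, plus within-person error) and uses made-up parameter values that are not estimates from this study. It compares agreement between the two self-report tools with and without correlated person-specific biases, and recovers the attenuation slope from the biomarker.

```python
import math
import random


def simulate(n=5000, rho_rs=0.6, seed=1):
    """One occasion per person on the log scale; all parameters are illustrative."""
    rng = random.Random(seed)
    t_vals, q_vals, f_vals, m_vals = [], [], [], []
    for _ in range(n):
        t = rng.gauss(0.0, 0.3)   # true intake T_i
        r = rng.gauss(0.0, 0.2)   # WebQ person-specific bias r_i
        # s_i has the same SD as r_i and correlation rho_rs with it
        s = rho_rs * r + math.sqrt(1.0 - rho_rs ** 2) * rng.gauss(0.0, 0.2)
        t_vals.append(t)
        q_vals.append(0.1 + 0.8 * t + r + rng.gauss(0.0, 0.3))  # Q_ij
        f_vals.append(0.1 + 0.8 * t + s + rng.gauss(0.0, 0.3))  # F_ij
        m_vals.append(t + rng.gauss(0.0, 0.2))                  # M_ij
    return t_vals, q_vals, f_vals, m_vals


def cov(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def corr(x, y):
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


t, q, f, m = simulate(rho_rs=0.6)
_, q0, f0, _ = simulate(rho_rs=0.0)
print("corr(Q, F) with correlated biases: ", round(corr(q, f), 2))
print("corr(Q, F) with independent biases:", round(corr(q0, f0), 2))
print("slope of T on Q (unobservable):    ", round(cov(t, q) / cov(q, q), 2))
print("slope of M on Q (observable):      ", round(cov(m, q) / cov(q, q), 2))
```

In this sketch the shared bias raises agreement between the two self-report tools without bringing either closer to the truth, whereas the biomarker, whose error is independent of the self-report errors, yields approximately the same slope as the unobservable true intake.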
We assume that associations between urinary sucrose/fructose excretion and total sugar intake were similar to those in previously published feeding studies (28,29), allowing us to apply the calibration equation derived from those studies, in which $M^*_{ij}$ is the calibrated biomarker value, $M_{ij}$ is the observed biomarker value, $S_i$ is 0 for men and 1 for women, and $A_i$ is log-transformed age. $M^*_{ij}$ was then used in place of $M_{ij}$ in the measurement error model defined above. Participants in the feeding study from which this calibration equation was derived were healthy adults aged 23-66 years (28), similar to the OPEN study population of healthy adults (ages 40-69 years) (29) and the UK Biobank population (ages 40-69 years) (12).

Model-fitting
The measurement error models were fitted as structural equation models using maximum likelihood estimation, assuming that any missing data points were missing at random. Results are presented as attenuation factors indicating the extent to which estimated diet-disease associations are diluted using the Oxford WebQ. Attenuation factors closer to 1 indicate less bias in diet-disease estimates. The correlation between the Oxford WebQ and the latent variable in the structural equation model estimating true longer-term intake is also presented to indicate the amount of power lost in prospective studies using the Oxford WebQ. This correlation also represents the attenuation of log relative risks between equal-sized categories of intake estimated by the Oxford WebQ (5,6,20,40). The Oxford WebQ is designed for repeated administration (17,18), and in the UK Biobank, participants were invited to complete it on up to 5 separate occasions over a 16-month period, with the majority of responders completing it twice (41). We therefore present the predicted attenuation factors for the mean of several repeat administrations, derived from our estimated measurement model parameters. This takes the same approach as that used by Schatzkin et al. (40). We focus on the mean of 2 administrations to reflect current use in the UK Biobank.
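The effect of averaging repeat administrations can be sketched with closed-form expressions. Under the linear model, with true-intake variance, person-specific bias variance, and within-person error variance as inputs, the attenuation factor is the slope of true intake on the mean of k self-reports, and the correlation with truth follows from the same variance. The function names and parameter values below are illustrative assumptions, not the study's estimates.

```python
import math


def attenuation(beta1, var_t, var_r, var_e, k=1):
    """Slope of true intake on the mean of k self-reports."""
    var_mean_q = beta1 ** 2 * var_t + var_r + var_e / k
    return beta1 * var_t / var_mean_q


def correlation_with_truth(beta1, var_t, var_r, var_e, k=1):
    """Correlation between the mean of k self-reports and true intake."""
    var_mean_q = beta1 ** 2 * var_t + var_r + var_e / k
    return beta1 * math.sqrt(var_t / var_mean_q)


# Illustrative parameter values only (hypothetical, log scale).
params = dict(beta1=0.8, var_t=0.09, var_r=0.04, var_e=0.09)
for k in range(1, 6):
    lam = attenuation(k=k, **params)
    rho = correlation_with_truth(k=k, **params)
    print(k, round(lam, 3), round(rho, 3))
```

Only the within-person error variance shrinks with k. The person-specific bias variance remains, so both quantities improve with repeats but level off rather than approaching 1.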
The mean differences between the Oxford WebQ and recovery biomarkers (for protein, sodium, and potassium), the predictive biomarkers (sugars), and total energy intake (accelerometry) are presented. For each participant, this was based on the mean intake over the repeated cycles of the Oxford WebQ measures minus the mean over the repeated cycles of the biomarker and energy expenditure measures, back-transformed and expressed as a percentage. This is equivalent to the mean difference estimated by the Bland-Altman method of assessing agreement (42).
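One way to read this calculation is sketched below: for each participant, average the log-transformed repeats of each measure, take the difference, average the differences across participants, and back-transform to a percentage. The function name and data are hypothetical, and the study's implementation in Stata may differ in detail.

```python
import math
from statistics import mean


def mean_percent_difference(webq, reference):
    """Back-transformed mean log difference, as a percentage.

    webq, reference: one list of repeated raw measurements per participant,
    in the same participant order.
    """
    diffs = []
    for q_reps, m_reps in zip(webq, reference):
        log_q = mean([math.log(x) for x in q_reps])
        log_m = mean([math.log(x) for x in m_reps])
        diffs.append(log_q - log_m)
    return (math.exp(mean(diffs)) - 1.0) * 100.0


# Two hypothetical participants whose self-reports run 10% above the reference.
webq = [[110.0, 110.0], [55.0, 55.0, 55.0]]
reference = [[100.0, 100.0], [50.0, 50.0]]
print(mean_percent_difference(webq, reference))
```

A positive value indicates that the self-report tool overestimates intake relative to the reference measure, on average.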

Subgroup analyses
We repeated the analyses with stratification on sex, age (<40 vs. ≥40 years), and BMI (<25 vs. ≥25) to quantify the robustness of results to different participant characteristics and to explore the possible impacts of differences in person-specific biases.

Ethics
The validation study was conducted according to the guidelines laid down in the Declaration of Helsinki. Full written informed consent was obtained from all participants included. The procedures of the validation study and associated documentation were reviewed and approved by the West London NHS Research Ethics Committee.

RESULTS
In total, 225 potential participants were invited to undergo screening for eligibility. Of these, 7 persons (3%) were ineligible, 30 (13%) did not consent, and 27 (12%) subsequently withdrew consent. The remaining 161 persons (72%) completed Oxford WebQs and MPRs and provided samples for biomarkers on at least 1 occasion. After exclusion of missed voids, data were available for analysis from 160 participants, 152 (95%) of whom completed the Oxford WebQ after visit 1, 146 (91%) of whom completed it after visit 2, and 147 (92%) of whom completed it after visit 3; 130 participants (81%) completed all 3 WebQs. Of these, 434 WebQs (98%) were completed on weekdays. The median amount of time needed to complete the Oxford WebQ was 10 minutes (interquartile range, 10-15 minutes).
Demographic characteristics of the participants at recruitment are shown in Table 1. Participants appeared metabolically stable over the course of the study, with weights changing by more than 5% of weight at booking for only 6 (4%) participants. The energy expenditure readings of those participants were excluded from the analysis.
Estimated geometric mean intakes of protein, potassium, and total sugar and their associated nutrient densities are shown in Table 2 for the Oxford WebQ, the MPR, biomarkers, and reference tools relating to the first clinic visit. Estimated intakes from the Oxford WebQ were broadly similar to those from the MPR for all nutrients. Compared with biomarker measures, the Oxford WebQ overestimated protein and potassium intakes and underestimated total sugar intake, with estimated total energy intake less than the estimated TEE.
Attenuation factors and correlations between the self-report tools and estimated true longer-term intake for a single application of the Oxford WebQ are shown in Table 3. For nutrient densities, attenuation factors were slightly higher and correlations were slightly lower than for unadjusted nutrient intakes. The full list of parameters estimated from the measurement models is shown in Web Table 1. Table 3 also shows the mean percentage difference between the self-report tools and the biomarker measures. Mean percentage differences for the Oxford WebQ were similar to those for the MPR.
Using the mean of a series of 2, 3, 4, or 5 repeat administrations of the Oxford WebQ would substantially improve measurement properties (Table 4), with an associated reduction in bias. With 2 repeats of the tool, the most likely use within the UK Biobank as it currently stands, the attenuation factors and the correlation with true intake would improve markedly. With more repeats of the tool, the attenuation and the correlation with true intake would improve further.
When urinary biomarker concentrations were adjusted for completeness of urine samples and samples with PABA recovery less than 50% or more than 110% were excluded, attenuation factors and correlations were essentially unchanged (see Web Appendix 2).
There was some variation between subgroups defined by age group, sex, and BMI (Web Tables 2-4). Attenuation factors for protein, potassium, and sugar intake were higher in men than in women. Attenuation was worse in older people (age ≥40 years) than in younger people (age <40 years) for protein but similar between age groups for total sugars and better in older people for potassium and total energy intake. Participants with BMI ≥25 had attenuation broadly similar to that of people with BMI <25, but with generally greater disparities for correlation with the truth.

DISCUSSION
Our findings show that the Oxford WebQ dietary assessment tool being used in the UK Biobank and a sample of the Million Women Study has good measurement error properties, improving further when the mean of several measures is taken.
The Oxford WebQ tends to overestimate potassium intake and underestimate total sugar intake, but these figures were similar for the interviewer-administered MPR. Additionally, the Oxford WebQ is of broadly equivalent validity to the MPR in terms of attenuation of diet-disease associations. This held across 3 nutrients that could be measured by recovery biomarkers and other objective reference tools. For total sugars, the Oxford WebQ performed better than the interviewer-administered MPR. However, the Oxford WebQ is substantially quicker and cheaper to implement (17,18).
The Oxford WebQ compares well to a recently validated online 24-hour recall used in the United Kingdom (23). The validity of the Oxford WebQ is also broadly similar to that of 24-hour recalls that have been validated in the United States (5,6), though our finding that protein and potassium are overreported in the United Kingdom contrasts with the underreporting of protein and unbiased reporting of potassium in the United States, on average. This may reflect the shorter length of assessment in our study compared with most US studies or different cultural perceptions of foods with high concentrations of those nutrients.
FFQs generally estimate diet over a longer time scale than the 24-hour period covered by the Oxford WebQ. In common with 24-hour recalls, the Oxford WebQ cannot estimate past diet, while FFQs may be used for this purpose. However, repeated measures of the Web-based Oxford WebQ tool throughout follow-up, covering different seasonal intakes and reflecting dietary changes as the cohort ages, can provide an estimate of long-term diet in a more prospective manner. The correlations we found between the mean of 2-5 Oxford WebQ estimates and the truth were also better than those previously reported for FFQs (5,6,29), though no better for nutrient densities. The improvement in measurement properties upon repeat administration reflects how the Oxford WebQ is currently being used in the UK Biobank (18,41). It is possible that a tool optimized for mobile phones could be used in a more prospective manner, further improving performance.

We have focused on use of the Oxford WebQ to estimate true longer-term diet and the extent to which measurement error in this estimated exposure could lead to attenuated estimates of the association with disease outcomes. This application of the tool for estimation of longer-term diet is the most relevant to large-scale cohort studies with long follow-up. Dietary exposures are often categorized to simplify presentation and because it is harder to estimate absolute intake precisely than to simply rank intakes from low to high. We therefore present the correlation between the Oxford WebQ intake and estimated true longer-term intake, which reflects the attenuation in diet-disease estimates based on ranked exposures. The Oxford WebQ generally performed slightly better according to this criterion.
The Oxford WebQ performed well in comparison with other tools assessed using the same statistical methodology (21,23).
In assessing the validity of the Oxford WebQ, we used objective biomarkers that are free from person-specific biases shared by self-report tools. Our validation was therefore more robust than validation using another self-report tool that may agree well partly because it shares this same bias. However, 45% of urine samples had PABA recovery below 85%, which could have led to underestimation of the agreement between self-reported diet and urinary biomarkers. Biomarker data were collected at clinic visits prior to completion of the dietary recalls, but participants were not informed of results, which minimized potential recall bias. Had biomarker collection coincided with dietary recall measures, there may have been greater agreement between them.
We did not use doubly labeled water to estimate TEE, which is a potential weakness in our study. In addition to estimated energy intake, this could also have affected nutrient density estimates and could partly explain why our results differed from those of previous studies which generally found that using densities improved measurements. However, use of activity monitor equipment provided an equally objective measure, which we used instead. It is a potential weakness that activity monitors were only worn for 1 day during each cycle, but within-person variation was still estimable because of the repeated cycles.
Unfortunately, not all nutrients have adequate reference tools such as recovery biomarkers (43). This is another potential weakness of our study that is shared by other validation studies of dietary assessment tools. It is therefore possible that the Oxford WebQ performs better or worse for other nutrients than those we were able to validate it against, particularly those derived from episodically consumed foods, for which 24-hour recalls are not well-suited. Where this is particularly important, combination with other dietary assessment tools is recommended (9,44-47).
Our measurement error models also only considered one error-prone variable at a time. In the presence of additional error-prone covariates, the error structure becomes more complex and the direction of bias may change. This commonly occurs when a nutrient and total energy intake are included in the same model. We therefore present estimates for nutrient densities as well.
Internet applications such as the Oxford WebQ are potentially more accessible to some groups, such as the younger or better educated. To address this concern, our validation study included both men and women, with a spread of ages and a range of educational backgrounds. Additionally, we repeated our analyses by age, sex, and BMI. Results were broadly comparable between men and women. Results suggested that the online format was not a deterrent to the quality of reporting in older participants. Participants with higher BMI had similar attenuation factors, but correlation with the truth was worse for total sugar and total energy intake, suggesting greater person-specific bias in reporting certain food types in this group. This provides some support for taking BMI into account in measurement error models, as others have proposed (48,49). While the Oxford WebQ was specifically developed for the UK Biobank and the Million Women Study, the wide age range used in our validation and the exploration within demographic subgroups provide a basis for its use in other large-scale prospective studies.
Our results indicate that repeat applications of the Oxford WebQ in large-scale projects such as the UK Biobank and the Million Women Study should provide high-quality dietary information, at least for intakes of total energy, protein, sugars, and potassium. The Oxford WebQ provides results broadly similar to those obtained using the more researcher-intensive and expensive-to-administer 24-hour recall delivered and coded by a trained researcher. This should facilitate additional dietary assessments repeated over time to measure long-term diet with greater precision, providing a platform for better estimates of the relationships between diet and disease.