Measuring Quality of Life in Carers of People With Dementia: Development and Psychometric Evaluation of Scales measuring the Impact of DEmentia on CARers (SIDECAR)

Abstract Background and Objectives A 2008 European consensus on research outcome measures in dementia care concluded that measurement of carer quality of life (QoL) was limited. Three systematic reviews (2012, 2017, and 2018) of dementia carer outcome measures found existing instruments wanting. In 2017, recommendations were published for developing reliable measurement tools of carers' needs for research and clinical application. The aim of this study was to develop a new instrument to measure the QoL of dementia carers (family/friends). Methods Items were generated directly from carers following an inductive needs-led approach. Carers (n = 566) from 22 English and Welsh locations then completed the items and comparator measures at three time points. Rasch, factor, and psychometric (reliability, validity, responsiveness, and minimally important differences [MIDs]) analyses were undertaken. Results Following factor analysis, the pool of 70 items was refined to three independent scales: primary SIDECAR-D (direct impact of caring upon carer QoL, 18 items), secondary SIDECAR-I (indirect impact, 10 items), and SIDECAR-S (support and information, 11 items). All three scales satisfy Rasch model assumptions. SIDECAR-D, I, S psychometrics: reliability (internal ≥ .70; test-retest ≥ .85); convergent validity (as hypothesized); responsiveness (effect sizes: D: moderate; I and S: small); MIDs (D = 9/100, I = 10/100, S = 11/100). Discussion and Implications SIDECAR scales demonstrate robust measurement properties, meeting COSMIN quality standards for study design and psychometrics. SIDECAR provides a theoretically based, needs-led QoL profile specifically for dementia carers. SIDECAR is free for use in public health, social care, and voluntary sector services, and not-for-profit organizations.

In 2017, the Alzheimer's Association estimated that carers of people with dementia in the United States provided 18.4 billion hours of unpaid care annually, equating to a cost of $232.1 billion (Alzheimer's Association, 2018). We define a "carer" as someone who provides unpaid support to a friend or family member with dementia who could not manage without their assistance. These many hours of unpaid care protect society from a huge financial burden, but possibly at considerable personal cost to carers. Although some carers report positive outcomes, many report negative impacts on their quality of life (QoL) (Alzheimer's Association, 2018). Reasons for diminished carer QoL are complex (Sörensen & Conwell, 2011), and it has been recognized that the physical and mental health needs of carers should be tracked in order to limit preventable carer burden (Khachaturian et al., 2017). Evaluating the effectiveness of help, in terms of carer benefit and value for money, is a responsibility of service providers and clinical trialists. One method either to identify carers at risk of reduced QoL or to measure intervention effectiveness is to assess carer QoL. Many approaches have been taken to assess dementia carer outcomes, with a multitude of existing, and overlapping, generic, carer-specific, and dementia carer-specific questionnaires used in clinical/social care practice, clinical trials, service evaluation, and economic evaluation (Moniz-Cook et al., 2008; Mosquera et al., 2016). In their systematic review of measures of the impact of caring for an elderly relative, Mosquera and coworkers (2016) conceptualize QoL and caregiver burden as the two major constructs that represent the highest levels of integration of domains of impact, as opposed to scales that integrate a narrower range, such as questionnaires on the impact of caregiving on physical health or psychosocial functioning. They suggest that QoL is a more generic construct while caregiver burden is more specific.
QoL is essentially a positive construct which brings together a range of dimensions that, when fulfilled, constitute "the good life" (Lawton, 1983, p. 349), in contrast to burden which reflects the impact of stress and strain. This connects well with the current emphasis on positive psychology and well-being (Seligman, 2018) and recognition of the need to research the positive as well as the negative aspects of caregiving (Quinn et al., 2019).
Three systematic reviews of QoL questionnaires for use with dementia carers have been completed, but these reviews reached no consensus about which questionnaire, if any, delivers the measurement standards that are required across both descriptive and scoring/valuation systems (Dow et al., 2018; Jones, Edwards, & Hounsome, 2012; Page et al., 2017). The World Health Organization defines QoL as multidimensional, including physical, psychological, and social domains as a minimum (Kuyken et al., 1995). However, carers may be physically and psychologically well and have no functional limitations but be severely restricted in their everyday lives directly or indirectly by their caring responsibilities. It is therefore challenging to identify a QoL questionnaire that is relevant to dementia carers, meets measurement standards, and is "fit for purpose" for varied objectives.
One approach to measuring QoL originates from a "needs-based" theoretical premise, where: "Life gains its quality from the ability and capacity of the individual to satisfy his or her needs," with QoL high when needs are fulfilled and low when few needs are fulfilled (Hunt & McKenna, 1992, p. 307). This model is therefore conceptually unidimensional, meaning all items (questions) reflect a single underlying latent trait. Questionnaire content is derived "bottom-up" from the "client group" only, rather than from a professionally driven agenda (Doward, McKenna, & Meads, 2004). The needs-led approach focuses on fundamental human needs (e.g., need for affection, for freedom) (Maslow, 1943) rather than more external or service-related needs (e.g., for information, for services) (McCabe, You, & Tatangelo, 2016). This approach may be particularly relevant to carers of people with dementia. A model based on the needs-led premise delineates the direct relationship between needs fulfilment and QoL in people with dementia (Scholzel-Dorenbos, Meeuwsen, & Olde Rikkert, 2010). This approach has been recommended in developing QoL questionnaires for carers of people with dementia (Bangerter, Griffin, Zarit, & Havyer, 2019).
The subject of this paper is the development and psychometric evaluation of the new QoL questionnaire known as SIDECAR (Scales measuring the Impact of DEmentia on CARers). A health economic valuation of SIDECAR will be published separately.

Item Generation and Response Format
Item generation, reported in detail elsewhere (Pini et al., 2018), comprised interviews, to capture the impact of caring, with 42 carers of a relative with dementia (Alzheimer's disease, vascular dementia, other forms of dementia) living in the community. Where possible, carers' exact phrases provided the wording for the initial 99 items generated. These were subject to checks regarding ambiguity, content, and face validity. Twenty-two cognitive interviews with carers pretested the items and assessed response formats. Final review and an administration rehearsal with two carers resulted in an item pool of 70 dichotomous (agree/disagree) items, some positively and others negatively phrased.

Study Design
Participants were invited to complete a questionnaire pack at three time points: Time 1 (T1), following consent; Time 2 (T2), 2-4 weeks later; and Time 3 (T3), for a subsample (due to time constraints), 6 months after Time 1.

Participants
Participants were English-literate primary carers, at least 16 years of age, supporting a partner or family member with a diagnosis of dementia living in the community. Twenty-two clinical network teams in England and Wales recruited carers via health and social care services (e.g., memory clinics); third-sector organizations (e.g., charities); the National Institute for Health Research Join Dementia Research (JDR: https://www.joindementiaresearch.nihr.ac.uk/) database (Juaristi & Dening, 2016); and from among carers involved in the "IDEAL" study (Improving the Experience of Dementia and Enhancing Active Life) at the time of their third interview (Clare et al., 2014).

SIDECAR item pool (T1, T2, and T3)
This comprises 70 short statement items generated from the qualitative interviews, such as "I have had to put my own life on hold," with response options "agree"/"disagree". The time frame relates to "today."

Short Warwick-Edinburgh Mental Well-being Scale (T1 and T3)
The Short Warwick-Edinburgh Mental Well-being Scale (SWEMWBS) is the shortened seven-item scale derived from the Warwick-Edinburgh Mental Well-being Scale (WEMWBS) (Stewart-Brown et al., 2009). The positively worded items cover feeling and functioning aspects of mental well-being over the preceding 2 weeks, for example: "I've been feeling useful." Each item has five response categories ("none of the time" through to "all of the time") which, when summed, create a score ranging from 7 to 35, with higher scores denoting higher well-being.

Sociodemographic details
At T1, carers provided sociodemographic details (e.g., age, sex, relationship to the cared for person) and information about the person cared for (e.g., age, dementia diagnosis). At T2 and T3, carers reported whether their caring situation had changed (better or worse) or remained the same since the last time point. At T3, carers rated any change in their overall QoL in the last 6 months using a 5-point response option ("Much worse," "Worse," "About the same," "Better," and "Much better").

Sample Size
A sample size of 400 was targeted, based on Rasch analysis requirements to provide stable item calibrations (Linacre, 1994), and avoid Type I errors (Hagell & Westergren, 2016).

Scale Development
Participants with more than 90% of item pool responses missing were excluded (n = 4). Across all other participants (n = 566), the mean number of missing responses across the 70-item pool was 1 (SD 4.28; median 0 [interquartile range 0-0]), with 80% of participants having complete data. For factor analysis and Rasch analysis, all available data at T1 were used, without imputation.

Exploratory factor analysis
Earlier work indicated that the item pool covered several themes, suggesting the complete item set may not lend itself to measuring a single overarching construct. Therefore, preliminary exploratory factor analysis (EFA) was undertaken on the item pool using MPlus 7.4 (Muthén & Muthén, 1998). A tetrachoric correlation matrix and GEOMIN rotation were used to account for the ordinal nature of the data, with indicators of model fit provided by the root mean square error of approximation (RMSEA), comparative fit index (CFI), and Tucker-Lewis index (TLI). EFA identified item sets/factors (of different constructs) that were taken forward for further refinement through Rasch analysis.

Rasch analysis
The Rasch model is a unidimensional measurement model that satisfies the assumptions of fundamental measurement (Luce & Tukey, 1964; Newby, Conner, Grant, & Bunderson, 2009), meaning it provides a measurement template against which scales can be tested. Essentially, Rasch Measurement Theory (RMT) provides a way to assess multi-item latent scales to ensure that it is valid to add the items together to form an overall total score. The application of RMT provides a unified confirmatory framework within which several aspects of internal construct validity can be assessed, highlighting measurement anomalies within an item set.
Rasch analysis was completed with RUMM2030 software (Andrich, Sheridan, & Luo, 2010). All items were assessed for: individual fit to the Rasch model, relative to the item set, to test whether each item was contributing to the same underlying construct (nonsignificant at Bonferroni-adjusted chi-squared p-value; standardized [z-score] fit residuals within ±2.5); local dependency, to determine whether the response to any item has a direct impact on the response to any other item (Q3 criterion cut point = .2 above the average residual correlation; Christensen, Makransky, & Horton, 2017); item bias, in the form of uniform and non-uniform differential item functioning (DIF) by age, gender, and carer relationship (spouse/other) (nonsignificant at Bonferroni-adjusted analysis of variance p-value); and scale targeting (relative distribution of item and person locations) (Hagquist, Bruce, & Gustavsson, 2009). Additionally, a series of t tests was used to assess the unidimensionality assumption (Smith, 2002): unidimensionality is supported when person estimates derived from independent subsets of items do not differ significantly, that is, when the lower-bound 95% confidence interval (CI) of the percentage of significantly different t tests is <5%.
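The local dependency criterion can be illustrated in code. The sketch below is hypothetical Python, not the RUMM2030 implementation; the residual correlation matrix is assumed to have been produced by the Rasch software:

```python
def flag_local_dependency(resid_corr, delta=0.2):
    """Flag item pairs whose residual correlation exceeds the average
    off-diagonal residual correlation by more than `delta` (the Q3 cut point)."""
    k = len(resid_corr)
    off_diag = [resid_corr[i][j] for i in range(k) for j in range(k) if i != j]
    avg = sum(off_diag) / len(off_diag)
    # Return locally dependent pairs (upper triangle only)
    return [(i, j) for i in range(k) for j in range(i + 1, k)
            if resid_corr[i][j] > avg + delta]
```

Flagged pairs would then be candidates for item removal or subtesting during scale refinement.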
When the assumptions of the Rasch model are satisfied, the sufficiency of the raw score allows for a linear, interval-level transformation of scores (Tennant & Conaghan, 2007). For all individuals, raw SIDECAR scale scores correspond with an interval-level logit value which was extracted from the Rasch analysis software. The linear logit values were subsequently converted into 0-100 scale values in order to aid interpretability.
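The logit-to-0-100 conversion is a straightforward linear rescaling. A minimal sketch (illustrative Python; `logit_min` and `logit_max` are assumed to be the extreme scale locations taken from the Rasch calibration):

```python
def logits_to_0_100(logit, logit_min, logit_max):
    """Linearly rescale a Rasch person estimate (in logits) onto 0-100."""
    return 100 * (logit - logit_min) / (logit_max - logit_min)
```

For example, the midpoint of a symmetric logit range maps to a scale value of 50.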

Psychometric Evaluation
Basic descriptive statistics were run for each scale, including floor and ceiling effects.
Internal consistency and convergent validity were assessed using T1 data, which provides the largest available sample of independent responses.

Internal consistency reliability (T1 data)
This was assessed using Cronbach's alpha, in addition to a Person Separation Index (PSI) derived from the Rasch analysis. The PSI should be interpreted in a similar way to Cronbach's alpha, but it uses the Rasch-derived linear scores rather than raw scores, and it also takes into account the relative targeting of the scale. A minimum alpha value of .7 was set (Nunnally & Bernstein, 1994). Cronbach's alpha values could be calculated only for cases with complete data.
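For reference, Cronbach's alpha on complete-case raw scores follows the standard formula alpha = k/(k-1) x (1 - sum of item variances / total-score variance). An illustrative Python sketch (not the study's analysis code):

```python
def cronbach_alpha(responses):
    """Cronbach's alpha for complete-case data.
    `responses`: list of per-person item-response lists (one column per item)."""
    n = len(responses)          # number of respondents
    k = len(responses[0])       # number of items
    # Sample variance of each item column
    item_vars = []
    for j in range(k):
        col = [row[j] for row in responses]
        m = sum(col) / n
        item_vars.append(sum((x - m) ** 2 for x in col) / (n - 1))
    # Sample variance of the total scores
    totals = [sum(row) for row in responses]
    mt = sum(totals) / n
    total_var = sum((t - mt) ** 2 for t in totals) / (n - 1)
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

Dichotomous items can be coded 0/1 before applying the formula.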

Test-retest reliability (T1 and T2 data)
Responses from participants recruited from March 2017 to study close, who returned the T2 survey within 6 weeks of the original survey and with "no change" in their caring situation, were included.

Convergent validity (T1 data)
Based on clinical experience, a negative correlation was hypothesized between SIDECAR scales and well-being (SWEMWBS) and, to a lesser extent, with health valuation (EuroQol Group Visual Analogue Scale [EQ-5D VAS]). Spearman's rank correlation assessed the strength and direction of these associations. COSMIN recommends the following guidance for interpretation of the correlation coefficients: correlations with instruments measuring a similar construct should be ≥ .50; correlations with instruments measuring related but dissimilar constructs should be lower, that is, .30-.50; and correlations with instruments measuring unrelated constructs should be < .30. Correlations defined previously should differ by a minimum of .10 (Mokkink et al., 2018).
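Spearman's rank correlation replaces raw scores with tie-averaged ranks before computing a Pearson correlation. An illustrative pure-Python sketch (the study will have used a standard statistics package):

```python
def rank(xs):
    """Assign ranks 1..n, averaging ranks over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1      # average of tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

A perfectly monotone decreasing relationship, as hypothesized between SIDECAR scores and well-being, would yield rho = -1.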

Responsiveness (T1 and T3 data)
Responsiveness represents an instrument's ability to detect changes over time, and the minimally important difference (MID) provides meaningful interpretation from the carer perspective (Revicki et al., 2006). All responsiveness indicators were based on the converted 0-100 SIDECAR scores of those who responded at T3 (n = 173). A number of anchor-based measures of responsiveness were calculated based on a self-reported worsening in QoL status between T1 and T3, pooling the groups stating their QoL was "worse" and "much worse" (n = 72; 41.6%). The responsiveness indicators reported are the effect size (ES), standardized response mean (SRM), responsiveness statistic (RS) (Revicki et al., 2006), and repeated measures effect size (RMES) (Morris & DeShon, 2002).
The MID was calculated relative to the group reporting no change in their QoL (n = 93; 53.8%) (Revicki et al., 2006). None of the responsiveness indicators are provided for the group reporting an improvement in self-reported QoL, due to insufficient numbers (n = 7; 4%).
In addition to the anchor-based indicators, smallest detectable difference (SDD) is a distribution-based indicator of responsiveness that was calculated based on the complete T1 sample.
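The core indicators follow standard formulas (Revicki et al., 2006): the ES divides mean change by the SD of baseline scores, the SRM divides it by the SD of the change scores, and the SDD = 1.96 x sqrt(2) x SEM, where SEM = SD x sqrt(1 - reliability). An illustrative sketch (not the study code):

```python
import math

def effect_size(baseline, followup):
    """ES: mean change divided by the SD of baseline scores."""
    n = len(baseline)
    changes = [f - b for b, f in zip(baseline, followup)]
    mean_change = sum(changes) / n
    mb = sum(baseline) / n
    sd_base = math.sqrt(sum((x - mb) ** 2 for x in baseline) / (n - 1))
    return mean_change / sd_base

def standardized_response_mean(baseline, followup):
    """SRM: mean change divided by the SD of the change scores."""
    n = len(baseline)
    changes = [f - b for b, f in zip(baseline, followup)]
    mc = sum(changes) / n
    sd_change = math.sqrt(sum((c - mc) ** 2 for c in changes) / (n - 1))
    return mc / sd_change

def smallest_detectable_difference(sd_baseline, reliability):
    """SDD = 1.96 * sqrt(2) * SEM, with SEM = SD * sqrt(1 - reliability)."""
    sem = sd_baseline * math.sqrt(1 - reliability)
    return 1.96 * math.sqrt(2) * sem
```

In the anchor-based approach, these statistics are computed within the groups defined by the self-reported QoL change anchor.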

Results
The Sample

Missing values
Four participants were excluded due to void responses. Excluding these participants (n = 566), all items had ≤3% missing values excepting one: "Receiving help is more hassle than it's worth" (6.7% missing).

Rasch analyses and factor analysis
Initial Rasch analysis of the 70-item set revealed extensive misfit and severe breach of the unidimensionality assumption, with a series of t tests reporting significantly different person estimates in 29% (lower CI = 27%) of cases (Table 2), suggesting a multidimensional item set. EFA identified four potential factors (RMSEA = .021, CFI = .966, TLI = 0.962) (Figure 1 and Supplementary Figure S1).
Rasch analysis of the four factors revealed varying levels of overall fit and a number of individual misfit anomalies (Table 2). Within each factor, scale refinement was conducted iteratively, with item misfit anomalies identified and dealt with in order of their magnitude. Items displaying more than one aspect of misfit were selected as prime candidates for removal. This was undertaken in turn for all four factors, starting with the first factor.
This process resulted in one primary scale of 18 items (SIDECAR-D), measuring the direct impact of caring on the carer and representing the a priori concept that "life gains its quality from the ability and capacity of the individual to satisfy their needs." Two secondary scales were derived which reflect aspects of carer QoL that are more dependent on external circumstances (Figure 1; Figure 2). One of these reflects the status/circumstance of the person being cared for and how that affects the carer, and has therefore been labeled as measuring aspects of "indirect impact of caring" (SIDECAR-I; 10 items). The third scale, "support and information" (SIDECAR-S; 11 items), largely concerns more practical external support and feels distinctly different to the other two scales, demonstrated by the positive wording. No resolution was possible for the fourth factor, which was also more conceptually ambiguous.
Within each factor, when the final refined item set had been configured, each of the removed items was individually added back into the final item set, in order to test whether the original source of misfit (and reason for removal) remained. This was the case for all removed items, thus precluding any reintroductions. Please see Supplementary Figure S1 for items that were not retained in the final SIDECAR scales.
SIDECAR-D satisfies all assumptions of the Rasch model, at both the individual-item level and the scale level, indicating a unidimensional, psychometrically robust scale. SIDECARs I and S mostly satisfy the Rasch model assumptions, but some potential borderline issues remain. However, these supplementary scales contain useful information and are satisfactorily robust (see Table 2).

Psychometric Evaluation
Descriptive statistics are provided in Table 3. Score distribution was good across all scales, with no evidence of significant floor or ceiling effects. All scales demonstrated acceptable internal consistency.

Test-retest reliability
Responses from 100 carers met the inclusion criteria. Two items had "fair" kappa values (.38, .40); the rest were moderate or good (excepting one, for which kappa could not be calculated because all participants "agreed" at T1). SIDECAR scales demonstrated very good overall test-retest reliability.
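Item-level test-retest agreement uses Cohen's kappa for dichotomous responses, which corrects observed agreement for chance agreement. A minimal illustrative sketch (not the study code), returning None when a time point shows no variation, as for the item on which all participants agreed at T1:

```python
def cohens_kappa(t1, t2):
    """Cohen's kappa for two dichotomous (0/1) ratings of the same carers."""
    n = len(t1)
    p_obs = sum(a == b for a, b in zip(t1, t2)) / n      # observed agreement
    p1, p2 = sum(t1) / n, sum(t2) / n                    # marginal "1" rates
    if p1 == 0 or p1 == 1 or p2 == 0 or p2 == 1:
        return None   # no variation at a time point: kappa not meaningful
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)                # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```

Kappa of 1 indicates perfect agreement; 0 indicates agreement no better than chance.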

Convergent validity
Convergent validity was supported: all scales correlated negatively, and more highly, with SWEMWBS than with EQ-5D VAS, as hypothesized. All differences between the correlation coefficients of the SIDECAR scales with the two measures were greater than .10.

Responsiveness
Responsiveness indicators are presented in Table 4. SIDECAR-D demonstrated a moderate responsiveness ES, with the supplementary SIDECARs I and S demonstrating a small ES. Using the higher 95% CI for the MIDs to indicate a worsening in QoL (a higher score), the MID values are 8.71, 9.73, and 10.96 for SIDECAR-D, I, and S, respectively (on a 0-100 linear scale). These values represent the score shift that is meaningful from the carer's perspective. The items included in the three SIDECAR scales are indicated in Figure 1, and the final scales and scoring algorithms are available via www.licensing.leeds.ac.uk.

Discussion and Implications
We have described the development and psychometric evaluation of SIDECAR-D, I, and S, a questionnaire designed to evaluate the QoL of carers of people with dementia for use in clinical/social care practice, research, and service evaluation. The research has met recognized international criteria set by COSMIN in terms of not only the quality of the study, but also the psychometric properties reported (Mokkink et al., 2010). In line with other "needs-led" QoL questionnaires (Doward et al., 2004), a higher score indicates poorer QoL, reflecting the increasing impact of caring for someone with dementia. In this respect, the questionnaire has considerable overlap with scales of carer burden: if high, burden would be expected to impact negatively on QoL. A comparison of the items of SIDECAR with those in the 22-item Zarit Burden Interview (ZBI) (Zarit, Orr, & Zarit, 1987, pp. 83-85) indicates that six items from the two scales are very similar (e.g., "I often feel I want to escape my caring responsibilities" [SIDECAR] cf. "Do you wish you could leave the care of your relative to someone else?" [ZBI]). The remaining ZBI items tap the subjective impact of a range of issues, mostly with direct reference to the person who is cared for (e.g., "Do you feel you should be doing more for your relative?"), whereas the items in SIDECAR-D, in particular, refer back to the carer (e.g., "I feel guilty if I do something for myself"). This difference reflects that described by Mosquera and coworkers (2016), of burden being more specific and QoL being more generic.
Although the primary focus was on QoL derived from fundamental universal human needs (Hunt & McKenna, 1992), our study has resulted in three SIDECAR scales reflecting differing needs-led QoL domains. SIDECAR-D arises directly from universal human needs, whereas SIDECAR-I reflects a more indirect impact of caring on QoL, and SIDECAR-S has a more external focus on support and information needs. SIDECAR scales may be used independently, or alongside each other to provide a profile of QoL across these domains.
It has been recognized that the social impact of the continuing increase in dementia prevalence will be ongoing (Khachaturian et al., 2017). The well-being of family carers is paramount to prevent further escalation of the issue, and therefore relevant measurement tools are necessary to monitor carer QoL. The universality of the needs-based model may provide the basis for generalized measurement, enabling international comparisons.
Needs-based QoL scales have been created for a variety of specific patient groups, for example, with Crohn's disease (Wilburn, McKenna, Twiss, Kemp, & Campbell, 2015) and ankylosing spondylitis (Doward et al., 2003). A more generic needs-led questionnaire, the CASP-19, is used widely in studies of early old age (Hyde, Higgs, Wiggins, & Blane, 2015; Hyde, Wiggins, Higgs, & Blane, 2003). Recently, a specific questionnaire for spouses of people with Alzheimer's disease was robustly developed and evaluated. Although it was initially intended for all family carers of those with Alzheimer's disease, the psychometric evaluation did not support wider family application; it is thus restricted to spouse/partner carers of those with Alzheimer's disease (Hagell, Rouse, & McKenna, 2018).
The rigorous conceptual and psychometric development of the SIDECAR scales demonstrates that they are all robust, with wide application potential. The item reduction process ensured that, within each scale, all items relate to the same construct (unidimensionality) and are statistically independent, thus validating a total scale score. Also, items are free from item bias (DIF) by age, gender, and carer relationship, meaning the scales operate equivalently across different types of informal carer (e.g., partner or child; male or female). All scales are appropriately targeted, meaning they cover the measurement range of carer QoL we wish to capture, with minimal floor or ceiling effects. The fourth potential factor, which did not stand up to the rigorous standards demanded, contained items broadly associated with the emotional interaction between the person with dementia and their carer, but there was conceptual ambiguity within the item set, along with associated psychometric issues.
All SIDECAR scales performed well in psychometric tests, with SIDECAR-D demonstrating the strongest properties overall. SIDECAR scales exhibited "good" to "very good" internal and test-retest reliability. Confirmation of content and face validity was undertaken in the item generation phase of SIDECAR development (Oyebode et al., 2018). Establishing convergent validity requires relevant measures to be available for comparative purposes. Our hypothesis that "SIDECAR scales would be more closely correlated with well-being than health-related QoL" was substantiated.
The three reviews of QoL questionnaires for use with dementia carers highlighted the absence of responsiveness testing (Dow et al., 2018; Jones et al., 2012; Page et al., 2017), which can be reported in various ways (Mokkink et al., 2010; Revicki et al., 2006). One exception reported in the Dow and coworkers' (2018) review was the Caregiver Quality of Life Instrument (Mohide, Torrance, Streiner, Pringle, & Gilbert, 1988), but this was tested with only nine carers. Responsiveness evaluation was anchored on carers' self-reported change in QoL over the preceding 6 months, as no objective external "gold standard" was available. We demonstrated that all SIDECAR scales detected changes in QoL over time, although the ESs for SIDECAR I and S were small. Using the same self-reported QoL anchor, MIDs were established for the 0-100 Rasch-converted linear scores. However, no single MID value is applicable across all populations and applications (Revicki et al., 2006), so this estimation should be repeated in future studies.

Limitations
Although the initial item generation was based on a diverse sample, and the sociodemographic characteristics of the sample were broadly representative of informal carers of people with dementia in the United Kingdom (Alzheimer's Research UK, 2018), there was under-representation of carers from minority ethnic groups. There are persisting barriers to the use of mainstream dementia services by minority ethnic communities in the United Kingdom (Parveen, Peltier, & Oyebode, 2017). Assessment tools must be culturally sensitive to reduce any chance of measurement bias and to maximize inclusivity.
A point of debate during the item generation phase was the mix of positive and negative item phrasing. It was impossible to change the valence of some items and maintain the integrity of meaning conveyed by the carers, so item phrasing remained bidirectional. Although this was accounted for in the scoring of the items, the psychometric refinement resulted in one scale (SIDECAR-S) having all positively worded items, as opposed to the two other, negatively worded, scales. Although it is not known whether this item set was identified purely due to the scoring direction, the content of the final SIDECAR-S item set suggests the items belong together conceptually.
Additionally, no gold-standard measure of carer QoL is currently available, so all responsiveness measures were based on carer self-assessment. Although self-assessment is an important and meaningful anchor, this should also be triangulated against other measures of change (Revicki et al., 2006).

Further Studies
Prospective testing of the questionnaire is planned with new samples in different clinical and voluntary sector settings. The IDEAL study (Clare et al., 2014; Silarova et al., 2018) will enable additional testing in a research context, allowing for the investigation of testable hypotheses about relationships between carer QoL and IDEAL study variables. Additional aims include working towards the adoption of SIDECAR into the NHS Digital Indicator Governance Board library, to enable the impact of dementia on carers to be monitored at a national level, and utilizing SIDECAR within intervention trials, to gauge interventional impact on carers and to extend responsiveness testing.

Conclusion
The SIDECAR scales were derived directly from carers and satisfy rigorous psychometric criteria. The primary scale (SIDECAR-D) is firmly grounded in the fulfilment of universal human needs. SIDECAR scales may be used independently, or alongside each other to provide a profile of QoL. The results indicate SIDECAR may be useful in individual carer assessment, or at group level in research and service evaluation. The raw score of each SIDECAR scale is valid as an ordinal unidimensional score, but the satisfaction of Rasch model assumptions also means that a 0-100 interval-level equivalent transformation is available for complete data. SIDECAR-D has been subjected to a valuation analysis, which will be reported separately. SIDECAR is free for use in public health, social care, and voluntary sector services, and not-for-profit organizations. To use SIDECAR, please register with the University of Leeds Fast Licensing Platform (www.licensing.leeds.ac.uk). The interview and questionnaire data are available via the University of Leeds data repository for academic purposes, subject to request (https://doi.org/10.5518/433).

Supplementary Material
Supplementary data are available at The Gerontologist online.
Supplementary Figure S1. EFA Rotated Factor Loadings from 70-item set, reported for the 31 items that were not retained in final SIDECAR scales.