Abstract

This study compared the factor structure of the translated Wechsler Memory Scale-III (WMS-III), which is the latest available version in Norway, with the original U.S. version. A sample of 122 healthy, elderly Norwegians (mean age: 74; standard deviation = 8.8) completed the WMS-III. The factor structure of the translated WMS-III was tested, using Confirmatory Factor Analysis, with comparison of model fit based on five a priori hypothesized models. Several model fit indices pointed to a three-factor model (working memory, visual memory, and auditory memory) providing the best fit to the data. Our study supports updated findings of the original WMS-III in nonclinical samples and suggests that the translated version is structurally equal to the original. The study supports the cross-cultural validity of the WMS-III. However, based on the present data, one might expect scores on the Family Pictures subtest to fall below scores on other WMS-III subtests in elderly Norwegians.

Introduction

Since the first version was published in 1945, the Wechsler Memory Scale (WMS) has been a key instrument in the assessment of memory. Although the first edition of the scale gave only a global measure of memory function, later editions have distinguished between verbal/auditory memory and visual memory, as well as immediate and delayed memory and attention/working memory. The construction of WMS-III composites and indices was based on theoretical models and on the results of research on previous versions of the test. Subsequent confirmatory factor analytical studies have yielded somewhat conflicting results regarding the number of underlying factors. The WAIS-III/WMS-III Technical Manual (Tulsky, Zhu, & Ledbetter, 1997) initially stated that a three-factor model (working memory, visual memory, and auditory memory) was the best fit for data in the 16- to 29-year age group, whereas a five-factor model (working memory, auditory immediate memory, visual immediate memory, auditory delayed memory, and visual delayed memory) was the best fit for data from the 65- to 89-year age groups. The overall conclusion was that the analyses of WMS-III standardization data support a five-factor model, separating modality-specific dimensions, in addition to immediate and delayed memory and working memory. Reanalyzing the correlation matrices presented in the manual, Millis, Malina, Bowers, and Ricker (1999) did not find evidence for immediate and delayed subtests being specified on separate factors. Price, Tulsky, Millis, and Weiss (2002) verified that a three-factor model composed of an auditory (immediate and delayed), a visual (immediate and delayed), and a working memory factor accurately represents the factor structure of the WMS-III, whereas Wilde and colleagues (2003) argued for a two-factor model (working memory and general memory).

Despite the inability of factor analyses to provide statistical support for the WMS-III immediate versus delayed memory composites in nonclinical groups and epilepsy samples (Bell, Hermann, & Seidelberg, 2004), the updated Technical Manual (Tulsky, Zhu, & Ledbetter, 2002) recommends continued use of this distinction in clinical practice.

The newest version of the WMS available in Norway is the WMS-III, published in 2008 (Wechsler, 2008), which is to be used with U.S. norms. To justify the use of U.S. norms, the test publisher conducted a joint validation study of the Norwegian and Swedish versions of the subtests Logical Memory I and II, Faces I and II, Verbal Paired Associates I and II, Family Pictures I and II, Word Lists I and II, and Mental Control. The remaining subtests were assumed not to be influenced by translation or cultural differences. The study sample (hereafter S/N sample) consisted of 180 nonclinical Swedes and Norwegians aged 20–74 years. With age-appropriate U.S. norms applied, the sample obtained scaled scores somewhat higher than the U.S. means on all subtests except Family Pictures I and II and Faces I and II. Regression analysis demonstrated the expected effects of age, gender, and education on performance, but for some subtests, not specified in the manual, the effect of age was evident in the 70–74 years age group only. The elevated mean scores were attributed to the above-average education level of the sample, 33% of whom had a university degree. Culturally and ethnically biased stimulus material was discussed as a possible reason for scores being lower on Family Pictures and Faces than on the other subtests (mean scaled scores 9.7 and 9.1, respectively). A factor analysis was conducted as part of the same study. However, because it did not include all primary subtests, this factor analysis does not lend itself to comparison with prior factor analyses of the WMS-III. The Norwegian test manual concluded that the test could be used in Norway and Sweden with U.S. norms. A comprehensive literature search yielded no independent studies from the Nordic countries examining the factor structure of the WMS-III. Scandinavian clinicians are therefore left with limited information on the psychometric properties of the WMS-III.

The objective of the present study was to compare the factor structure of the Norwegian version with that of the original WMS-III, as structural equivalence is a prerequisite for using the U.S. rules of interpretation in Norway. As the WMS-IV is not to be translated into Norwegian, the WMS-III will be used for many years to come, and data on the psychometric characteristics of this version are therefore needed. The present study provides data on an elderly sample. Although evaluation of memory is often requested in the elderly, nonclinical data are quite limited for this age span, and the Scandinavian test sample did not include subjects above 74 years of age. The study was approved by the Regional Ethical Committee in Norway.

Materials and Methods

Sample

One hundred and twenty-two individuals aged 55–89 years (M = 74, SD = 8.8) participated in this study. They had all completed questionnaires and medical testing in the Nord-Troendelag Health Study (HUNT3, 2005–2008) and consented to participate in a comprehensive study of cognitive function. Participants had 10.7 years of education (SD = 3.2) on average. Inclusion criteria were as follows:

  • age 55–89 years

  • normal hearing and vision

  • absence of mental or medical disease affecting cognition (based on self-assessment)

  • a general medical checkup at the time of HUNT3

  • adequate physical functioning to meet testing requirements.

Participants were recruited from the county of Nord-Troendelag, previously documented to be fairly representative of the Norwegian population (Holmen et al., 2003). The age, sex, and education of the participants closely match figures from Statistics Norway (ssb@ssb.no) for individuals born in the early 1920s to the mid-1950s.

Procedure

Potential participants were extracted from the database at the HUNT research center. They were telephoned consecutively until at least 20 individuals in each of six age bands (55–64, 65–69, 70–74, 75–79, 80–84, and 85–89 years) gave consent to participate. Before testing, each participant was interviewed about current diseases that would make them unsuitable for inclusion in the study. These interviews revealed that one individual had recently been diagnosed with dementia and another had suffered a cerebral infarction. These two individuals were excluded.

Measurement

Subjects completed all of the primary subtests, and the optional subtest Visual Reproduction. Scoring was done independently by two neuropsychologists and standardized scores were computed using the U.S. scoring program published by the Psychological Corporation.

Statistical Analysis

The factor structure was tested using Confirmatory Factor Analysis with Mplus 6.12. We followed the procedure devised by Price and colleagues (2002), comparing model fit across a set of five a priori hypothesized models:

  • A one-factor model assuming that the covariance between subscales could be accounted for by a single general memory factor (General Memory Model).

  • A two-factor model assuming the presence of a working memory factor and a general memory factor (Working Memory Model). Letter-Number Sequencing and Spatial Span loaded on the working memory factor, and all other subtests loaded on the general memory factor.

  • A three-factor model assuming one working memory factor, one immediate memory factor, and one delayed memory factor (Retention Model). Letter-Number Sequencing and Spatial Span loaded on the working memory factor; Logical Memory I, Faces I, Verbal Paired Associates I, and Family Pictures I loaded on the immediate memory factor; and Logical Memory II, Faces II, Verbal Paired Associates II, and Family Pictures II loaded on the delayed memory factor.

  • A three-factor model assuming one working memory factor, one visual memory factor, and one auditory memory factor (Modality Model). Letter-Number Sequencing and Spatial Span loaded on the working memory factor; Family Pictures I and II and Faces I and II loaded on the visual memory factor; and Logical Memory I and II and Verbal Paired Associates I and II loaded on the auditory memory factor.

  • A five-factor model assuming one working memory factor, one auditory immediate memory factor, one auditory delayed memory factor, one visual immediate memory factor, and one visual delayed memory factor (Retention and Modality Model). Letter-Number Sequencing and Spatial Span loaded on the working memory factor; Logical Memory I and Verbal Paired Associates I on the auditory immediate factor; Logical Memory II and Verbal Paired Associates II on the auditory delayed factor; Faces I and Family Pictures I on the visual immediate factor; and Faces II and Family Pictures II on the visual delayed factor.

All models were estimated with the Maximum Likelihood estimator as well as the Yuan-Bentler Robust Maximum Likelihood estimator, and all were tested strictly, with no modifications permitted. We report several model fit indices: Akaike's Information Criterion (AIC), the Bayes Information Criterion (BIC), the Root Mean Square Error of Approximation (RMSEA), and the Comparative Fit Index (CFI). Based on previous recommendations, an RMSEA <0.05 and a CFI >0.95 were taken to indicate a good fit between the theorized model and the data. Because the current models were non-nested, relative comparisons were made by comparing the AIC and BIC across models. For non-nested models, the AIC and BIC have the advantage that they can be compared directly, without reference to a difference in degrees of freedom. Both indices carry a penalty for model complexity, but the penalty tends to be higher for the BIC.
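The difference between the two complexity penalties can be made concrete with a minimal sketch. It assumes the standard textbook definitions (AIC = −2 log L + 2k, BIC = −2 log L + k ln N), which the text does not spell out; the function names and the deviance value are hypothetical and serve only as illustration.

```python
import math


def aic(minus2_loglik: float, n_params: int) -> float:
    """Akaike's Information Criterion: -2 log L + 2k."""
    return minus2_loglik + 2 * n_params


def bic(minus2_loglik: float, n_params: int, n_obs: int) -> float:
    """Bayes Information Criterion: -2 log L + k ln(N)."""
    return minus2_loglik + n_params * math.log(n_obs)


N = 122  # sample size reported in the Sample section

# Cost of one additional free parameter under each criterion.
aic_penalty_per_param = 2.0
bic_penalty_per_param = math.log(N)  # about 4.80 for N = 122

# Hypothetical deviance (-2 log L), chosen only for illustration.
deviance = 5260.0
for k in (34, 37, 44):
    print(k, round(aic(deviance, k), 2), round(bic(deviance, k, N), 2))
```

Under these definitions the BIC penalty per parameter exceeds the AIC penalty whenever N is larger than about 7, so with 122 participants each extra parameter costs roughly 2.4 times as much under the BIC.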

Results

Table 1 shows the Pearson r correlations and descriptive statistics for each study variable. Correlations between immediate and delayed measures within subtests were high, ranging from 0.93 for Logical Memory I and II to 0.66 for Faces I and II. Across subtests, correlations were moderate to weak. The mean scaled scores ranged from 7.98 (Family Pictures I) to 11.80 (Faces II).

Table 1.

Correlations, means, and SDs between Wechsler Memory Scale-III subtests

Subtest                     1     2     3     4     5     6     7     8     9     M      SD
1. Logical Memory (I)       1.00                                                  10.61  3.31
2. Faces (I)                0.38  1.00                                            10.77  2.99
3. Verbal Paired A. (I)     0.46  0.32  1.00                                      9.47   2.91
4. Family Pictures (I)      0.41  0.28  0.14  1.00                                7.98   2.66
5. Letter Number Seq.       0.26  0.25  0.19  0.13  1.00                          9.42   2.34
6. Spatial Span             0.10  0.22  0.23  0.07  0.29  1.00                    10.45  2.91
7. Logical Memory (II)      0.93  0.36  0.42  0.45  0.20  0.13  1.00              11.57  3.32
8. Faces (II)               0.44  0.66  0.32  0.33  0.21  0.23  0.41  1.00        11.80  3.12
9. Verbal Paired A. (II)    0.41  0.29  0.84  0.13  0.23  0.25  0.36  0.31  1.00  9.95   2.96
10. Family Pictures (II)    0.49  0.35  0.18  0.91  0.16  0.11  0.53  0.38  0.17  8.45   2.61
11. Visual Reproduction I                                                         9.7    3.4
12. Visual Reproduction II                                                        11.7   3.1

Note: Mean scaled scores for Visual Reproduction I and II are reported but not included in the analyses.

I = immediate memory; II = delayed memory; Verbal Paired A. = Verbal Paired Associates; Letter Number Seq. = Letter Number Sequencing; SD = standard deviation.

Table 2 shows the model summary for the five specified models. Models 1–4 converged with proper solutions. However, in line with the findings of Millis and colleagues (1999) and Price and colleagues (2002), Model 5 converged with an inadmissible solution for some of the model parameters. As in Millis and colleagues (1999), the estimated psi covariance matrix (in Mplus notation), which provides information about the correlations between latent factors, was not positive definite. Model misspecification is a likely cause of a nonpositive definite psi matrix. Model diagnostics pointed to the auditory delayed memory factor. In the current model, the indicators thought to reflect auditory delayed memory, Verbal Paired Associates II and Logical Memory II, correlated only weakly (r = .36), whereas the correlation between Verbal Paired Associates I and II was .84. A similar situation was observed for the visual delayed memory factor. The five-factor model thus has difficulty reconciling relatively weak within-factor correlations with high cross-factor correlations. Given the weak loading of Verbal Paired Associates II on the auditory delayed factor, even a latent correlation of 1 will not reproduce the correlation of .84 between the observed indicators, and the psi covariance matrix is thus constrained by the boundary solution. In line with this, the estimated psi covariance matrix indicated an inadmissible correlation above 1 between the auditory delayed factor and the auditory immediate factor. Interpretation of the results for Model 5 thus requires particular caution.

The CFI indicated an acceptable fit to the data for all models, with values exceeding 0.99, but provided little information for discriminating among them. As judged by the RMSEA, all models were below the 0.05 criterion suggested by Browne and Cudeck (1992). Overall, as judged by the AIC, CFI, and RMSEA, the three-factor Modality Model provided the best fit to the data. According to the BIC, which incorporates a stronger penalty for model complexity, the one-factor model was the best model, followed by the two-factor model and the three-factor Modality Model. Using the cutoffs of Kass and Raftery (1995) for differences between BIC values, there was no strong evidence in favor of any single model.
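As a rough illustration of the BIC comparison, the sketch below ranks the models by the BIC values in Table 2 and grades the lead of the best model over the runner-up. The grade boundaries (0–2, 2–6, 6–10, above 10) are the commonly cited ones from Kass and Raftery (1995) for twice the log Bayes factor. Treating a raw BIC difference as that quantity is an approximation and an assumption of this sketch; the function name is hypothetical.

```python
# BIC values as reported in Table 2.
bic = {
    "one-factor general memory": 5430.25,
    "two-factor working memory": 5433.17,
    "three-factor retention": 5438.43,
    "three-factor modality": 5436.89,
    "five-factor modality and retention": 5467.14,
}


def kass_raftery_grade(delta_bic: float) -> str:
    """Grade a non-negative BIC difference, read as roughly 2 ln(Bayes factor)."""
    if delta_bic < 2:
        return "not worth more than a bare mention"
    if delta_bic < 6:
        return "positive"
    if delta_bic < 10:
        return "strong"
    return "very strong"


# Rank models from lowest (preferred) to highest BIC.
ranked = sorted(bic, key=bic.get)
best, runner_up = ranked[0], ranked[1]
lead = bic[runner_up] - bic[best]
print(best, "vs", runner_up, round(lead, 2), kass_raftery_grade(lead))
```

On these figures the lowest-BIC model leads the next one by just under 3 points, which falls in the "positive" rather than the "strong" band.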

Table 2.

Summary for tested models with multiple fit indices

Model                                ML     MLR    p    Parameters  BIC      AIC      CFI    RMSEA
One-factor general memory            35.56  35.87  .26  34          5430.25  5334.91  0.994  0.035
Two-factor working memory            33.68  33.67  .29  35          5433.17  5335.03  0.995  0.032
Three-factor retention               29.32  30.58  .40  37          5438.43  5334.68  0.998  0.020
Three-factor modality                27.79  27.90  .48  37          5436.89  5333.14  1.00   0.000
Five-factor modality and retentionᵃ  24.41  27.33  .27  44          5467.14  5343.76  0.996  0.036

Notes: MLR = Maximum Likelihood Robust estimator; ML = Maximum Likelihood estimator; BIC = Bayes Information Criterion; AIC = Akaike Information Criterion; CFI = Comparative Fit Index; RMSEA = Root Mean Square Error of Approximation.

ᵃInadmissible solution.
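The indices in Table 2 are linked by simple arithmetic, which the following sketch uses as a consistency check. It makes three assumptions that the text does not state. First, the AIC and BIC follow the standard definitions, so that BIC = AIC − 2k + k ln N. Second, the RMSEA point estimate is sqrt(max(χ² − df, 0) / (df · N)), computed from the ML column; some programs divide by N − 1 instead. Third, the degrees of freedom equal 65 minus the number of free parameters, where 65 counts the variances, covariances, and means of the 10 analyzed subtests. All function and variable names are hypothetical.

```python
import math

N = 122         # participants
N_MOMENTS = 65  # assumption: 10 subtests -> 10*11/2 (co)variances + 10 means

# (model, ML chi-square, free parameters, AIC, reported BIC, reported RMSEA),
# all taken from Table 2.
rows = [
    ("one-factor", 35.56, 34, 5334.91, 5430.25, 0.035),
    ("two-factor", 33.68, 35, 5335.03, 5433.17, 0.032),
    ("retention", 29.32, 37, 5334.68, 5438.43, 0.020),
    ("modality", 27.79, 37, 5333.14, 5436.89, 0.000),
    ("five-factor", 24.41, 44, 5343.76, 5467.14, 0.036),
]


def bic_from_aic(aic: float, k: int, n: int) -> float:
    """Swap the AIC penalty (2k) for the BIC penalty (k ln n)."""
    return aic - 2 * k + k * math.log(n)


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate sqrt(max(chi2 - df, 0) / (df * n))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * n))


for name, chi2, k, aic, bic_rep, rmsea_rep in rows:
    df = N_MOMENTS - k  # assumed degrees of freedom
    print(f"{name:12s} df={df:2d} "
          f"BIC={bic_from_aic(aic, k, N):.2f} (reported {bic_rep}) "
          f"RMSEA={rmsea(chi2, df, N):.3f} (reported {rmsea_rep})")
```

Under these assumptions the recomputed values agree with the reported ones to rounding. The RMSEA of 0.000 for the Modality Model arises because its chi-square falls just below the assumed degrees of freedom, so the estimate is truncated at zero.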

The standardized factor loadings of the three-factor modality model are shown in Table 3. Each subtest loaded well on its hypothesized latent factor, although for two of the subtests (Verbal Paired Associates II and Spatial Span) the loading was <0.50.

Table 3.

Standardized factor loadings for three-factor modality model of Wechsler Memory Scale-III

Subtest                   Auditory memory  Visual memory  Working memory
Logical Memory (I)        0.859
Verbal Paired Ass. (I)    0.522
Logical Memory (II)       0.831
Verbal Paired Ass. (II)   0.475
Faces (I)                                  0.561
Family Pictures (I)                        0.517
Faces (II)                                 0.609
Family Pictures (II)                       0.624
Letter Number Sequencing                                  0.624
Spatial Span                                              0.461

Note: I = immediate memory; II = delayed memory; Ass. = associates.
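One way to read the loadings in Table 3 is through their squares. The sketch below assumes a completely standardized solution in which each subtest loads on a single factor. The squared loading then gives the share of that subtest's variance attributable to its factor. The variable names are hypothetical.

```python
# Standardized loadings from Table 3 (each subtest loads on one factor).
loadings = {
    "Logical Memory (I)": 0.859,
    "Verbal Paired Ass. (I)": 0.522,
    "Logical Memory (II)": 0.831,
    "Verbal Paired Ass. (II)": 0.475,
    "Faces (I)": 0.561,
    "Family Pictures (I)": 0.517,
    "Faces (II)": 0.609,
    "Family Pictures (II)": 0.624,
    "Letter Number Sequencing": 0.624,
    "Spatial Span": 0.461,
}

for subtest, lam in loadings.items():
    explained = lam ** 2  # share of variance attributable to the factor
    print(f"{subtest:26s} loading={lam:.3f} "
          f"explained={explained:.2f} residual={1 - explained:.2f}")

# Subtests whose loading falls below the 0.50 mark mentioned in the text.
below_half = [s for s, lam in loadings.items() if lam < 0.50]
print(below_half)
```

Read this way, a loading of 0.50 corresponds to a quarter of the subtest's variance, so the two subtests below that mark share less than 25% of their variance with their factor.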

Discussion

The proposed three-factor model including working memory, visual memory, and auditory memory fits the Norwegian data very well. The original five-factor model, which also distinguishes between immediate and delayed memory, provided a good absolute fit but a less adequate parsimony-adjusted fit. In addition, the five-factor model displayed computational problems similar to those reported in several previous studies. Based on the model fit indices, the notion of a one-dimensional memory construct seems too simple, and the working memory versus general memory distinction carries less information than the modality-specific distinction. Our data replicate the factor structure reported by Price and colleagues (2002) and suggest that the latent structure of the test has been maintained in the Norwegian version.

However, the level of performance on some subtests raises some concern. Our sample performed approximately three quarters of an SD below the U.S. mean on Family Pictures I, and almost as much above that mean on Faces II. Two explanations suggest themselves:

  1. Norwegians perform differently than those in the U.S. sample, and/or

  2. Our sample is not representative of the Norwegian population.

Norwegians performing differently than the U.S. standardization sample may indeed be the case for Family Pictures. The S/N test sample demonstrated a similar pattern, performing substantially below their own mean, and somewhat below the U.S. mean, despite their above-average level of education. The publisher concluded that this finding was probably due to a cultural bias in the artwork of the subtest. Presumably, such a cultural bias could be more salient in elderly subjects, such as those in our sample, because younger subjects are more extensively exposed to U.S. influence via modern media. Performance above the U.S. mean on Faces, however, can hardly be explained in terms of cultural or ethnic bias, because such a bias would be expected to result in below-mean performance on the subtest.

That our sample performed differently from both the U.S. sample and the S/N test sample leaves us with the possibility that it may not be entirely representative of the Norwegian population. Several factors may have contributed to this. We explicitly wanted to study healthy and cognitively intact elderly individuals. Potential participants were thus extracted on rigorous criteria from a very large database developed through a longitudinal public health surveillance program (HUNT). All our subjects had participated in this program over a period of 20 years. Several studies have demonstrated that responders are less prone than nonresponders to have problems in health and cognition (Holmen et al., 1990); consequently, our sample may be somewhat healthier than most elderly Norwegians.

The most important weakness of our study is the relatively small number of participants compared with the U.S. sample. Even though the sampling procedure was designed to obtain an unbiased sample, selection bias cannot be ruled out.

A major strength of the present study is that it was conducted on cognitively intact elderly individuals, a population seldom studied. Inclusion and exclusion criteria were confirmed by interviewing each participant. We think that this procedure resulted in a sample free of diseases affecting cognition in any significant way. The mean age of our sample was 74 years, whereas the S/N test sample did not include individuals above 74 years of age. In addition, our sample closely matches the figures given by Statistics Norway (ssb@ssb.no), regarding age, sex, and education in individuals born from the early 1920s to the mid-1950s.

Even though the three-factor model was statistically superior to the five-factor model in our sample, we would argue, like Price and colleagues (2002) and the Technical Manual (2002), that the distinction between immediate and delayed memory may still be clinically useful. When evaluating memory in individual patients with severe memory impairment and neurological deterioration, measures of immediate and delayed memory may signal different forms of disease and indicate different prognoses and lines of treatment and care. The empirical validity of the immediate/delayed distinction can only be established by studies of clinical samples that include patients with known memory problems. To our knowledge, no such studies of Scandinavian samples have been published.

The fourth edition of the WMS was published in the United States in 2009, but it has not been translated into any Scandinavian language. The WMS-III will therefore remain in use in Scandinavia for many years to come, and data on the psychometric characteristics of the test are needed. In spite of its limitations, the present study helps to meet this need. Family Pictures and Faces were dropped from the WMS-IV because these subtests have well-documented limitations. Users of the WMS-III may consider including Visual Reproduction as a substitute, as this optional subtest may avoid the shortcomings of Family Pictures.

Conclusion

Our factor analytic data suggest that the Norwegian version of the WMS-III measures multiple aspects of attention and memory in much the same manner as the original. However, the U.S. norms should be applied with some caution in elderly Norwegian subjects, keeping in mind that a scaled score of 5 on Family Pictures may actually be within 1 SD from the Norwegian mean.

Funding

This work was supported by the North-Troendelag Health Trust (grant number 347).

Conflict of Interest

None declared.

References

Bell, B. D., Hermann, B. P., & Seidelberg, M. (2004). Significant discrepancies between immediate and delayed WMS-III Indices are rare in temporal lobe epilepsy patients. The Clinical Neuropsychologist, 18, 303–311.

Browne, M. W., & Cudeck, R. (1992). Alternative ways of assessing model fit. Sociological Methods and Research, 21, 230–258.

Holmen, J., Midthjell, K., Forsen, L., Skjerve, K., Gorseth, M., & Oseland, A. (1990). Participation and comparison of attendants and non-attendants. Tidsskrift Norske Lægeforening, 110, 1973–1977.

Holmen, J., Midthjell, K., Krüger, Ø., Langhammer, A., Holmen, T. L., Bratberg, G. H., et al. (2003). The Nord-Trøndelag Health Study 1995–97 (HUNT 2): Objectives, contents, methods and participation. Norsk Epidemiologi, 13, 19–32.

HUNT3 Survey.

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.

Millis, S. R., Malina, A. C., Bowers, D. A., & Ricker, J. H. (1999). Confirmatory factor analysis of the Wechsler Memory Scale-III. Journal of Clinical and Experimental Neuropsychology, 21, 87–93.

Price, L., Tulsky, D., Millis, S., & Weiss, L. (2002). Redefining the factor structure of the Wechsler Memory Scale-III: Confirmatory factor analysis with cross-validation. Journal of Clinical and Experimental Neuropsychology, 24, 574–585.

Tulsky, D., Zhu, J., & Ledbetter, M. (1997). WAIS-III/WMS-III Technical Manual. San Antonio, TX: Psychological Corporation.

Tulsky, D., Zhu, J., & Ledbetter, M. (2002). WAIS-III/WMS-III Technical Manual (Updated). San Antonio, TX: Psychological Corporation.

Wechsler, D. (2008). WMS-III. Norsk versjon. Manual. Stockholm, Sweden: Harcourt; Pearson.

Wilde, N. J., Strauss, E., Chelune, G. J., Hermann, B. P., Hunter, M., Loring, D. W., et al. (2003). Confirmatory factor analysis of the WMS-III in patients with temporal lobe epilepsy. Psychological Assessment, 15, 56–63.