Elouise Botes, Lindie van der Westhuizen, Jean-Marc Dewaele, Peter MacIntyre, Samuel Greiff, Validating the Short-form Foreign Language Classroom Anxiety Scale, Applied Linguistics, Volume 43, Issue 5, October 2022, Pages 1006–1033, https://doi.org/10.1093/applin/amac018
Abstract
Foreign language classroom anxiety (FLCA) is a popular construct in applied linguistics research, traditionally measured with the 33-item Foreign Language Classroom Anxiety Scale (FLCAS). However, recent studies have started using the eight-item Short-Form FLCAS (S-FLCAS). There is therefore a need to validate the S-FLCAS to ensure the validity and reliability of the scale, which this study addressed in five sequential steps. A sample of n = 370 foreign language learners was used in the validation efforts, which included exploratory and confirmatory factor analyses, the establishment of convergent and discriminant validity, and invariance testing. The S-FLCAS was found to have a unidimensional structure, with all eight items loading on a single latent variable. Evidence was provided of the internal consistency and the convergent and discriminant validity of the S-FLCAS. In addition, the measure was found to be fully invariant across age, gender, educational level, and L1 groups. We can therefore recommend, with considerable confidence, the future use of the S-FLCAS in peer-reviewed research.
Anxiety has been found to negatively impact learning across numerous contexts, including the learning of mathematics (Hembree 1990), science (Mallow 2006), and foreign languages (FLs) (Dewaele and MacIntyre 2014). In the case of FL learning, a domain-specific form of anxiety has been defined, namely, Foreign Language Classroom Anxiety (FLCA). In their now seminal article, Horwitz et al. (1986) summarized FLCA as ‘a distinct complex of self-perceptions, beliefs, feelings, and behaviours related to classroom language learning arising from the uniqueness of the language learning process’ (128). The article introducing the construct of FLCA also presented an accompanying 33-item measure, appropriately titled the Foreign Language Classroom Anxiety Scale (FLCAS). The validity and reliability of the measure have been extensively investigated over the last three decades (see Horwitz 1986; Park 2014), and it has been widely accepted as a valid measure of the construct of FL learning anxiety.
Recent developments in the field of language learning (MacIntyre and Gregersen 2012) have led to even greater research interest in the complex role of emotions, including FLCA and its measurement. Foreign Language Classroom Anxiety has been studied alongside other emotions, such as Foreign Language Enjoyment (FLE; see Dewaele 2019; Botes et al. 2020a) and Foreign Language Boredom (FLB; see Li et al. 2021). The article that introduced FLE to the research lexicon measured FLCA with an eight-item shortened version of the FLCAS (see Dewaele and MacIntyre 2014), which we refer to as the Short-Form Foreign Language Classroom Anxiety Scale (S-FLCAS) in the present article.1 The S-FLCAS was first developed in an appendix to MacIntyre’s (1992) doctoral dissertation. However, the eight-item measure was rarely used in place of the 33-item FLCAS in peer-reviewed articles until Dewaele and MacIntyre (2014), whose article drew increased research attention to the S-FLCAS. Indeed, numerous recent publications have favoured the eight-item S-FLCAS over the original 33-item measure (e.g. Dewaele et al. 2019; Botes et al. 2020a; Moskowitz and Dewaele 2020).
Although short-form measures such as the S-FLCAS have advantages, the lack of a full validation study of the scale poses a significant research risk. Short-form measures have the crucial advantage of reducing administration time, thus allowing researchers to include a larger number of measures in an assessment battery (Heene et al. 2014). This advantage is substantial given the growing complexity of hypotheses and research questions in the social sciences, which requires researchers to measure an ever-increasing number of variables as efficiently as possible in a single study (Ziegler et al. 2014). It is therefore unsurprising that the S-FLCAS has become increasingly popular since its 2014 revival, as its eight items can easily be incorporated into an assessment battery. However, unlike the 33-item FLCAS, the S-FLCAS has not been independently validated. Researchers therefore risk using a measure that has not been consistently shown to be valid and reliable, or to be usable across diverse contexts with learners of different genders, ages, educational levels, and target languages.
The aim of this study is therefore to fill this gap by providing an independent validation of the S-FLCAS to determine the validity and reliability of the measure. Validation efforts will include invariance testing to ensure that items do not function differently across age, gender, educational level, and L1 groups. In addition, an overview will be provided of the literature on the use, development, and current evidence of the validity and reliability of the FLCAS and S-FLCAS.
FOREIGN LANGUAGE CLASSROOM ANXIETY
Foreign language classroom anxiety is seen as a unique situation-specific anxiety that occurs specifically in the language learning classroom but is trait-like in its consistency and long-term stability (MacIntyre 2017). Horwitz (2017) attributes the cause of FLCA to the ego-threatening nature of FL use: ‘Language Anxiety emanates from the discomfort some language learners have when they must interact in the language but are unable to present themselves authentically when doing so’ (42). She compares this discomfort to ‘our feelings when we have a bad haircut or must wear unfashionable or unflattering clothing’ (41). Repeated occurrences of the emotional discomfort and worry about authentic self-presentation associated with using the FL coalesce over time into a pattern that affects how learners perceive themselves; learners come to describe themselves as anxious in FL situations (MacIntyre and Gardner 1988). The introduction of FLCA as a variable sparked research interest in individual differences in language learning as a whole, and specifically in the role anxiety plays in gaining proficiency in a target language.
Anxiety in language learning has been a popular research topic for decades, with the first review of the matter conducted by Scovel (1978). Early research into anxiety in FL learning examined the relationship between anxiety and proficiency in the target language, with contradictory results (see MacIntyre 2017). Scovel (1978) described the findings at the time regarding the role of anxiety in language learning as ‘mixed and confusing’ (132). These results are often attributed to the varying definitions and measures used to capture anxiety (Scovel 1978; MacIntyre 2017) and to the lack of a clear conceptualization of how and why exactly anxiety should affect language learning.
The introduction of the concept of FLCA and its accompanying scale (Horwitz et al. 1986) directly addressed the concerns raised by Scovel (1978). The FLCA construct and the FLCAS provided the field with a clear definition and measurement of anxiety in the context of foreign language learning, resulting in over 30 years of relatively clear and stable findings (Horwitz 2001, 2017; MacIntyre 2017).
The value of the FLCA construct and the FLCAS can be seen directly in the exponential rise in studies examining anxiety in the context of language learning shortly after the scale’s publication (see Botes et al. 2020b for a review). Studies predominantly focused on the relationship between FLCA and proficiency, with FLCA negatively linked to academic achievement (Khodadady and Khajavy 2013; Jee 2016; Botes et al. 2020b), willingness to communicate (Liu and Jackson 2008; Dewaele 2019), and self-perceived proficiency in the target language (Szyszka 2011; Liu 2013). Indeed, two recent meta-analyses have found direct and indirect negative associations between FLCA and language learning proficiency (Teimouri et al. 2019; Zhang 2019). Beyond proficiency, studies of the association between language anxiety (conceptualized as FLCA) and FL learner well-being have linked FLCA to lower self-esteem or self-confidence (Crookall and Oxford 1991; Onwuegbuzie et al. 1999), lower intrinsic motivation (Liu and Zhang 2013), and maladaptive perfectionism (Dordinejad and Nasab 2013). FLCA is therefore not only negatively associated with gaining proficiency in the target language but may also affect the well-being of FL learners.
In summary, the introduction of FLCA was a landmark in the field of applied linguistics, with the validity and reliability of its measurement addressing a central research concern. The clear definition and stable measurement provided by the FLCAS contributed to an exponential increase in research on anxiety and demonstrated the negative impact that anxiety can have on the learning process, on learners’ demonstrated proficiency, and on the overall well-being of the language learner.
FOREIGN LANGUAGE CLASSROOM ANXIETY SCALE
The success of the FLCAS in conceptualizing language anxiety may be attributed to the strong design of its items, which were derived in part from experience with anxious FL learners who described testing, communicating, and worry about being negatively evaluated by others in the context of FL learning (Horwitz et al. 1986). Items were based on the theoretical building blocks of existing scales of test anxiety, communication apprehension, and fear of negative evaluation in the context of FL learning. However, Horwitz (1986, 2017) has specified that although FLCA is conceptually related to these three constructs, it is a unique variable tied to language and therefore cannot be reduced to the sum of test anxiety, communication apprehension, and fear of negative evaluation in the context of FL learning. Instead, FLCA took its theoretical inspiration from these three constructs. Items therefore refer to feelings of anxiety, nervousness, and unease in the FL classroom, such as ‘In language class, I get so nervous I forget things I know’ and ‘I often feel like not going to language class’ (Horwitz et al. 1986: 129).
The psychometric attributes of the FLCAS have been investigated and reinvestigated numerous times since the scale was introduced. Horwitz (1986) published a validation study of the scale with evidence of its test–retest reliability (rtt = 0.83) as well as its convergent and discriminant validity. To establish validity, Horwitz (1986) correlated the FLCAS with trait anxiety (r = 0.29, p < 0.01), test anxiety (r = 0.53, p < 0.01), fear of negative evaluation (r = 0.36, p < 0.01), and communication apprehension (r = 0.28, p > 0.05). FLCA was found to be related to, but independent from, trait anxiety, test anxiety, and fear of negative evaluation. Further evidence of the validity of the FLCAS includes high internal consistency (α > 0.90; Aida 1994; Elkhafaifi 2005; Gocer 2014) and response validity (Tóth 2008). Foreign language classroom anxiety was therefore established as a unique construct: a form of anxiety conceptualized in a domain-specific way and worth investigating for its effects on learners, its role in language use, and its relationship to language learning.
However, there has been some concern about the construct validity of the FLCAS. The original publication of the measure did not delve into the underlying factor structure, nor was the factor structure broached in Horwitz’s (1986) validation study. It is therefore not surprising that examinations of the factor structure of the FLCAS became a frequently studied and debated topic. Different factor structures have been found across the studies summarized in Table 1. Even though Horwitz (2001, 2017) has repeatedly stated that the conceptual building blocks of FLCA do not necessarily translate into the underlying factors, Table 1 demonstrates that numerous authors have labelled their factors in accordance with these conceptual building blocks. The majority of studies examining the factor structure underlying FLCA (see Table 1) disregarded a number of items that did not load on any of the selected factors. Indeed, Cheng, Horwitz and Schallert (1999) disregarded 13 items, and Park (2014) disregarded 10 items that did not load on any selected factors. The variability found in the underlying construct of FLCA may be attributed to the varying contexts in which the sample data were collected, such as different target languages or the proficiency levels of the sample groups (Park and French 2013). In addition, different statistical analyses of the sample data sets may have contributed to different factor structures, as estimation and rotation methods can impact the results of exploratory factor analyses (Field 2013). In effect, despite contributing positively to the internal consistency of the FLCAS measure, the discarded items become conceptual orphans with uncertain relationships to the underlying factors. In the interests of parsimony, the possibility of discarding some of the items in the FLCAS also ought to be considered.
| Publication | L1 | Target language | Proposed factor structure | Methods |
|---|---|---|---|---|
| Aida (1994) | English | Japanese |  | Principal component analysis with varimax rotation |
| Cheng et al. (1999) | Chinese | English |  | Principal component analysis with varimax rotation |
| Liu and Jackson (2008) | Chinese | English |  | Factor analysis with varimax rotation |
| Park (2014) | Korean | English |  | Maximum likelihood exploratory factor analysis with direct oblimin rotation |
| Tóth (2008) | Hungarian | English |  | Principal component analysis with direct oblimin rotation |
Confirming the construct validity of the FLCAS is further complicated by the lack of a clear theoretical foundation regarding the number of factors underlying FLCA. Horwitz (2017) argued that reducing FLCA to a composite of test anxiety, communication apprehension, and fear of negative evaluation is an unacceptable simplification of the construct. Instead, Horwitz (2017) advocated for more practically oriented research aimed at assisting and alleviating FLCA in FL learners, stating ‘my point is that we don’t need to thoroughly identify the components of Language Anxiety or understand the interactions among them to help anxious learners’ (38). We might take Horwitz’s argument to suggest that a one-factor solution is optimal for conceptualizing FLCA as a construct for measurement purposes. The issue of dimensionality is important for understanding what one is getting when a specific test is employed, and accurate interpretation requires accurate conceptualization and measurement (Flake and Fried 2020). Indeed, experimental or intervention-based research cannot be shown to be effective if the targeted variables are not measured in a valid and reliable manner appropriate to the research question, which includes the construct validity of the scales (Flake and Fried 2020).
The literature on the FLCAS has therefore established a measure with clear response reliability and internal consistency, and with a consistent pattern of validity correlates (Botes et al. 2020b). However, problematic items and an indistinct factor structure pose validity issues for the future use of the scale. The lack of clear construct validity is especially problematic if researchers aim to use multivariate statistical techniques, such as structural equation modelling, in which hypothesis tests depend on clearly specified factor structures (Barrett 2007).
One solution to the measurement issues of the FLCAS may therefore be the use of a unidimensional scale, the S-FLCAS, if clear evidence of its validity and reliability can be obtained. Therefore, in this study, our aim was to validate the S-FLCAS to confirm that it offers a valid and reliable measurement of FLCA for use in future cross-sectional and intervention-based research.
SHORT-FORM FOREIGN LANGUAGE CLASSROOM ANXIETY SCALE
The aim of developing the S-FLCAS was to create a short-form FLCA scale that more closely resembles the 10-item French Class Anxiety Scale (Gardner 1985) and the 10-item French Use Anxiety Scale (MacIntyre and Gardner 1988) that helped inspire Horwitz et al. (1986) to create the FLCAS (MacIntyre 1992). Items to be deleted were selected through internal consistency checks using Cronbach’s alpha: items were removed one at a time, each time choosing the item whose removal least reduced the internal consistency of the scale, until removing the next item would have led to a substantial reduction in alpha (MacIntyre 1992). The number of items was thus whittled down from 33 to 8, with item analyses repeated after every removal, resulting in the 8-item S-FLCAS, which showed minimal loss of internal consistency.
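The item-reduction procedure can be illustrated as backward elimination on Cronbach’s alpha. The sketch below is a minimal illustration under stated assumptions, not a reproduction of MacIntyre’s (1992) analysis: the cut-off `max_drop` standing in for a ‘substantial reduction’ in alpha, the floor `min_items`, and the toy response data are all hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of respondent scores per item."""
    k = len(items)
    item_variance_sum = sum(variance(scores) for scores in items)
    totals = [sum(person) for person in zip(*items)]  # each respondent's total score
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def backward_eliminate(items, max_drop=0.01, min_items=3):
    """Repeatedly drop the item whose removal leaves the highest alpha.

    Stops when the best available deletion would lower alpha by more than
    `max_drop` (a hypothetical cut-off) or when only `min_items` remain.
    Returns the indices of the retained items and their alpha.
    """
    kept = list(range(len(items)))
    current = cronbach_alpha(items)
    while len(kept) > min_items:
        # Alpha of the scale after deleting each remaining item in turn.
        candidates = [
            (cronbach_alpha([items[i] for i in kept if i != drop]), drop)
            for drop in kept
        ]
        best_alpha, drop = max(candidates)
        if current - best_alpha > max_drop:
            break  # the next deletion would be a substantial loss
        kept.remove(drop)
        current = best_alpha
    return kept, current


# Toy data (hypothetical): items 0-2 move together, item 3 runs against them.
responses = [
    [1, 2, 3, 4, 5, 2, 3, 4],
    [1, 2, 3, 4, 5, 3, 3, 4],
    [2, 2, 3, 4, 5, 2, 3, 5],
    [5, 1, 4, 2, 3, 5, 1, 2],
]
kept, alpha = backward_eliminate(responses)
print(kept, round(alpha, 3))  # prints [0, 1, 2] 0.973
```

Here the fourth item is deleted first because its removal raises alpha, and the loop then stops at the `min_items` floor.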
MacIntyre (1992) investigated the validity of the S-FLCAS by examining the internal consistency, predictive validity, and convergent validity of the short form. The internal consistencies of the full 33-item FLCAS and the 8-item S-FLCAS were nearly identical (α = .94 and α = .93, respectively; MacIntyre 1992), so dropping 25 items had a negligible effect on Cronbach’s alpha. In addition, the correlation between the FLCAS and the S-FLCAS provided evidence of convergent validity (r = 0.98, p < 0.01). The FLCAS and S-FLCAS also showed similar correlational patterns with third variables, such as grades in language courses (r = −0.38 and r = −0.33, both p < 0.01), achievement test scores (r = −0.48 and r = −0.44, both p < 0.01), and self-ratings of proficiency (r = −0.61 and r = −0.57, both p < 0.01), for the FLCAS and the S-FLCAS, respectively (MacIntyre 1992). In each case, the correlation was slightly weaker for the short form, as would be expected with fewer items, a smaller range of possible scores, and less variability available for analysis. However, these reductions were small and did not change the substantive interpretation of the correlations. The preliminary validation therefore yielded promising evidence of the validity and reliability of the S-FLCAS, but the analysis was incomplete.
The S-FLCAS has been used regularly since its inclusion in the Dewaele and MacIntyre (2014) study (see Shirvan and Taherian 2018; Bensalem 2021; Fathi and Mohammaddokht 2021; Kitoaka 2021; Su 2022). Recent studies featuring the S-FLCAS have examined relationships between FLCA and FLE (Uzun 2017), FLCA and need for cognition (Rezazadeh and Zarrinabadi 2020), FLCA and language proficiency (Dewaele and Alfawzan 2018), FLCA and emotional intelligence (Resnik and Dewaele 2020), and FLCA and teacher-related variables (Hung 2020), while other studies have explored gender differences in FLCA (Dewaele et al. 2016; Alenezi 2020). However, even though the S-FLCAS has been in popular use for some time, most studies used the measure to provide a single FLCA score for cross-sectional research using correlations and regressions, leaving its factor structure, validity, and reliability unexplored (see Dewaele et al. 2019; Rezazadeh and Zarrinabadi 2020).
As such, several validity and reliability questions remain for the eight-item S-FLCAS, specifically regarding its construct, convergent, and discriminant validity. Given the contention regarding the factor structure of the 33-item FLCAS (see Table 1), a factor analytic study of the S-FLCAS is needed. As far as we are aware, no study has conducted a factor analysis of only the eight items of the S-FLCAS. The S-FLCAS was included in a previously published factor analysis alongside the 21 items of the FLE Scale to establish the independence of FLCA and FLE (see Dewaele and MacIntyre 2016). The eight anxiety items loaded on a single factor alongside two enjoyment factors, indicating that a single factor might underlie the S-FLCAS (Dewaele and MacIntyre 2016). However, more complete exploratory and confirmatory analyses are needed to fully determine the construct validity of the S-FLCAS. The relationship of the S-FLCAS to variables associated with the FLCA construct, such as communication apprehension, test anxiety, and fear of negative evaluation, should also be examined to determine convergent validity. Moreover, as the original validation attempt occurred three decades ago and was unpublished (MacIntyre 1992), this is an opportune time to re-examine the validity of the S-FLCAS, especially given its recent rise in popularity.
Thus, MacIntyre (1992) developed the S-FLCAS using sound psychometric methods and investigated the preliminary validity of the measure, though the analysis stopped short of a complete validation of the short form. Although the measure was found to possess predictive validity, there was no attempt to address the factor structure underlying the S-FLCAS, and evidence of its convergent and discriminant validity in a new sample is needed to avoid capitalizing on chance correlations in the original sample. In addition, considerable time has passed since the initial validation efforts in 1992. Given the increasing interest in emotion in second language acquisition (SLA), use of the S-FLCAS might be expected to increase, especially if a full examination of its reliability and validity supports its use.
METHOD
Participants
The sample consisted of n = 370 international adult foreign language learners. The mean age of the sample was 27.56 years (SD = 10.25), with 168 male and 202 female participants. The largest group of participants were US nationals (n = 101), followed by British (n = 70) and German nationals (n = 24). Participants reported knowing an average of 3.91 languages (SD = 1.79). Participants were learning a total of 34 different languages, most commonly Dutch (n = 50), followed by French (n = 40) and German (n = 36). More information on the descriptive statistics and distributions of age, nationality, educational level, and L1 in the sample can be found in the Supplementary Data, Tables S1–S4.
| Fit index | Model 1 | Model 2 |
|---|---|---|
| χ² (df) | 56.324 (20), p < 0.001 | 40.414 (19), p < 0.001 |
| CFI | 0.948 | 0.969 |
| TLI | 0.927 | 0.955 |
| RMSEA | 0.099 | 0.078 |
| SRMR | 0.048 | 0.040 |
The data were collected in 2019 via the online platform SoSci Survey, utilizing snowball sampling. All ethical requirements for data collection and storage were met as stipulated by the University of [redacted].
Instruments
The following instruments were used in the validation of the S-FLCAS.
S-FLCAS
The S-FLCAS consists of the eight items identified by MacIntyre (1992) and used by Dewaele and MacIntyre (2014). The scale measures the broad construct of anxiety specific to foreign language learning, with items such as ‘It embarrasses me to volunteer answers in my FL class’. Two of the eight items are reverse coded (‘I don’t worry about making mistakes in FL class’ and ‘I feel confident when I speak in FL class’). Items are rated on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). In this sample, Cronbach’s alpha was α = 0.892 and McDonald’s omega was ω = 0.89. The full scale is included in the Supplementary Data.
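Because two items are reverse coded, their responses have to be flipped before scores are combined. A minimal scoring sketch follows. It assumes that a respondent’s FLCA score is the mean of the eight keyed items (the text does not state whether sums or means were used) and uses hypothetical item positions in `REVERSED_ITEMS`; the real positions depend on the item order given in the Supplementary Data.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 5-point Likert scale
N_ITEMS = 8
# Hypothetical 0-based positions of the two reverse-coded items; replace with
# the positions used in the administered version of the scale.
REVERSED_ITEMS = {2, 5}


def reverse_code(score):
    """Map 1<->5 and 2<->4, leaving 3 unchanged on a 1-5 scale."""
    return SCALE_MIN + SCALE_MAX - score


def sflcas_mean_score(responses):
    """Mean anxiety score for one respondent (higher = more anxious)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("responses must lie on the 1-5 scale")
    keyed = [
        reverse_code(r) if i in REVERSED_ITEMS else r
        for i, r in enumerate(responses)
    ]
    return sum(keyed) / N_ITEMS


# A respondent who strongly agrees with every item: the two reverse-coded
# items count as 1, so the mean is (6 * 5 + 2 * 1) / 8 = 4.0.
print(sflcas_mean_score([5] * 8))
```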
Penn State Worry Questionnaire—Abbreviated
This is an abbreviated eight-item measure that was designed to provide a general indication of worry and anxiety in adults, with items such as ‘My worries overwhelm me’ (Hopko et al. 2003). Items are measured on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). The Cronbach’s alpha for the Penn State Worry Questionnaire—abbreviated (PSWQ-A) was α = 0.94 and McDonald’s omega was ω = 0.94 in this sample.
Brief fear of negative evaluation scale
This eight-item measure assesses general social anxiety and fear of being negatively judged by others, with items such as ‘I am afraid that people find fault with me’ (Rodebaugh et al. 2004). Items are measured on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). The sample Cronbach’s alpha was α = 0.96 and McDonald’s omega was ω = 0.96.
Short-form foreign language enjoyment scale
This is a nine-item broad measure of positive emotion experienced in the FL classroom (Botes et al. 2022a). This short-form version of the original 21-item scale developed by Dewaele and MacIntyre (2014) contains items such as ‘I enjoy FL learning’. Items are measured on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). The sample Cronbach’s alpha was α = 0.89 and McDonald’s omega was ω = 0.88.
Data analysis
The validation of the S-FLCAS occurred in five sequential steps, detailed in Figure 1. The specific analyses and procedures involved in each step are described below.

Step 1: Splitting the data
In accordance with best-practice guidelines (see Marsh et al. 2005; Hagtvet and Sipos 2016), SPSS 25 was used to randomly split the data set into two halves. The two resulting samples were then compared via t-tests to ensure that no statistically significant differences existed between them. The examination of the factor structure in Step 2 used the first sample, and the confirmation of the measurement model in Step 3 used the second sample.
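The split-and-compare step can be sketched in Python as follows. This is a hypothetical stand-in for the SPSS procedure: the seed, the use of Welch’s unequal-variance t-test (the text does not say which t-test variant was used), and the use of record IDs 0 to 369 are assumptions. The sketch returns only the t statistic and its degrees of freedom; a p-value needs the t distribution, available for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import random
from statistics import mean, variance


def random_split(records, seed=2019):
    """Shuffle a copy of the records and cut it into two halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Split 370 hypothetical record IDs into two samples of 185.
first, second = random_split(range(370))
print(len(first), len(second))
```

In use, `welch_t` would be applied to each study variable (for example, S-FLCAS scores) in the two samples.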
Step 2: Investigate the factor structure
The factor structure underlying the eight-item S-FLCAS was examined via maximum likelihood estimation with oblique (promax) rotation in JASP (JASP Team 2020). Promax rotation was used because, should a multidimensional factor structure emerge from the data, the factors underlying FLCA would theoretically be expected to correlate (Tóth 2008). The first sample was used in this analysis. The number of factors was determined via the eigenvalue-greater-than-one (Kaiser) criterion and the scree plot.
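To make the eigenvalue-greater-than-one rule concrete, the sketch below computes the eigenvalues of a correlation matrix with the classical cyclic Jacobi method and counts those above one. Jacobi is used only so the example needs no third-party libraries; in practice `numpy.linalg.eigvalsh` would do the same job. The equicorrelated matrix is hypothetical: when all eight items correlate 0.5, the largest eigenvalue is 1 + 7 × 0.5 = 4.5 and the remaining seven equal 0.5, so the rule retains a single factor. The scree plot, the second criterion used here, is not reproduced.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_factor_count(correlations):
    """Number of eigenvalues of the correlation matrix greater than one."""
    return sum(1 for value in jacobi_eigenvalues(correlations) if value > 1.0)


# Hypothetical 8-item correlation matrix in which every pair correlates 0.5.
r = 0.5
equicorrelated = [[1.0 if i == j else r for j in range(8)] for i in range(8)]
print([round(v, 3) for v in jacobi_eigenvalues(equicorrelated)])  # 4.5, then seven 0.5s
print(kaiser_factor_count(equicorrelated))  # 1
```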
Exploratory factor analysis with maximum likelihood estimation was chosen as the factor extraction method because it has been found to be less biased in uncovering the underlying factor structure and empirical fit than the rival principal axis factoring and generalized least squares methods (de Winter and Dodou 2012). Furthermore, promax rotation was chosen because it is theoretically assumed that, should a multidimensional factor structure underlie FLCA, the factors in such a model will most likely be correlated (Field 2013). Factor loadings were categorized as low (<0.4), acceptable (0.4–0.6), or high (>0.6; Stevens 1992; Tabachnick and Fidell 2007; Kline 2014).
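The loading bands above translate directly into a small helper. Treating negative loadings by their absolute size is an assumption of this sketch, as is the boundary handling: 0.4 and 0.6 are both counted as acceptable, consistent with the 0.4–0.6 range. The example loadings are hypothetical.

```python
def classify_loading(loading):
    """Label a factor loading as low (<0.4), acceptable (0.4-0.6) or high (>0.6)."""
    size = abs(loading)  # sign is ignored; only the magnitude is classified
    if size < 0.4:
        return "low"
    if size <= 0.6:
        return "acceptable"
    return "high"


# Hypothetical loadings for illustration.
for value in (0.35, 0.52, 0.78, -0.65):
    print(value, classify_loading(value))
```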
Step 3: Confirming the factor structure
The factor structure identified in Step 2 of the analysis was confirmed via confirmatory factor analysis (CFA) in R using the lavaan package (Rosseel 2012). The second sample generated in Step 1 was used for this analysis. Maximum likelihood estimation with robust standard errors was used to estimate all CFA models (including the invariance models). There were no missing values. The fit of the measurement model was evaluated via the following fit indices: the root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMR), the comparative fit index (CFI), and the Tucker–Lewis index (TLI). A model was deemed to have a close fit when CFI and TLI > 0.95, RMSEA < 0.08, and SRMR < 0.08 (Hu and Bentler 1999). Factor loadings and cross-loadings were further investigated to determine the fit of the model.
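The close-fit rule can be written as a check over the four indices, and the RMSEA point estimate can be recomputed from a model’s chi-square with the standard formula RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))). The sketch below applies both to the Model 1 and Model 2 values in the fit-index table, assuming N = 185 (the CFA half-sample). Whether the reported χ² values are uncorrected or robust statistics is not stated, so the recomputation is only a consistency check; some software uses N rather than N − 1, which gives the same three-decimal values here.

```python
import math

# Cut-offs for close fit used in the text (Hu and Bentler 1999).
CUTOFFS = {"cfi": 0.95, "tli": 0.95, "rmsea": 0.08, "srmr": 0.08}


def is_close_fit(cfi, tli, rmsea, srmr):
    """True when CFI and TLI exceed, and RMSEA and SRMR fall below, their cut-offs."""
    return (
        cfi > CUTOFFS["cfi"]
        and tli > CUTOFFS["tli"]
        and rmsea < CUTOFFS["rmsea"]
        and srmr < CUTOFFS["srmr"]
    )


def rmsea_from_chi2(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square (N - 1 form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Values from the fit-index table; n = 185 is the CFA half-sample.
models = {
    "Model 1": dict(chi2=56.324, df=20, cfi=0.948, tli=0.927, srmr=0.048),
    "Model 2": dict(chi2=40.414, df=19, cfi=0.969, tli=0.955, srmr=0.040),
}
for name, m in models.items():
    rmsea = rmsea_from_chi2(m["chi2"], m["df"], n=185)
    print(name, round(rmsea, 3), is_close_fit(m["cfi"], m["tli"], rmsea, m["srmr"]))
# Model 1 0.099 False
# Model 2 0.078 True
```

Model 1 misses the CFI, TLI, and RMSEA cut-offs, while Model 2 meets all four; Model 2 has one fewer degree of freedom, meaning one additional parameter was freely estimated.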
It should be noted that the size of the sample utilized in this analysis can be considered somewhat small (n = 185). It would therefore be appropriate to examine statistical power before conducting the confirmatory factor analysis. As such, an extensive power analysis was conducted alongside the confirmatory factor analysis in Step 3.
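The text does not describe how power was computed. One common option for CFA, sketched here under that assumption, is RMSEA-based power for the test of close fit: the null hypothesis RMSEA = 0.05 is tested against the alternative RMSEA = 0.08, with noncentrality λ = (N − 1) × df × RMSEA². Both noncentral chi-square distributions are simulated so the code needs only the standard library. The RMSEA values, α = 0.05, the number of draws, and the seed are illustrative choices, and the printed values are Monte Carlo estimates, not the study’s results.

```python
import math
import random


def noncentral_chi2_draw(rng, df, noncentrality):
    """One draw of (Z1 + sqrt(lambda))^2 + Z2^2 + ... + Zdf^2."""
    total = (rng.gauss(0.0, 1.0) + math.sqrt(noncentrality)) ** 2
    for _ in range(df - 1):
        total += rng.gauss(0.0, 1.0) ** 2
    return total


def rmsea_power(n, df, rmsea_null=0.05, rmsea_alt=0.08, alpha=0.05,
                draws=5000, seed=1):
    """Monte Carlo power of the RMSEA test of close fit."""
    rng = random.Random(seed)
    lam_null = (n - 1) * df * rmsea_null ** 2
    lam_alt = (n - 1) * df * rmsea_alt ** 2
    null_draws = sorted(noncentral_chi2_draw(rng, df, lam_null) for _ in range(draws))
    critical = null_draws[int((1 - alpha) * draws)]  # upper-alpha quantile under H0
    rejections = sum(
        noncentral_chi2_draw(rng, df, lam_alt) > critical for _ in range(draws)
    )
    return rejections / draws


# One-factor model for eight items: 36 variances/covariances - 16 parameters = 20 df.
for n in (185, 370):
    print(n, round(rmsea_power(n, df=20), 2))
```

For this model, doubling N from 185 to 370 raises the estimated power substantially; increasing `draws` narrows the Monte Carlo error.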
Step 4: Recombining the data set
The two halves of the data set were recombined into a single data set of n = 370. The data set was recombined to provide sufficient statistical power to conduct invariance analyses. The full data set was utilized in the reliability, validity, and invariance analyses conducted in Step 5.
Step 5: Validating the S-FLCAS
Reliability was examined via internal consistency coefficients as measured by Cronbach’s alpha and McDonald’s omega.
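For a one-factor model such as the S-FLCAS, McDonald’s omega can be computed from the standardized loadings as ω = (Σλ)² / ((Σλ)² + Σ(1 − λ²)). The sketch assumes standardized items and uncorrelated residuals (if a model included correlated residuals, their covariances would also enter the denominator), and the loadings in the example are hypothetical rather than the study’s estimates.

```python
def mcdonald_omega(loadings):
    """Omega total for a one-factor model with standardized loadings.

    Each item's residual variance is taken as 1 - loading**2, which assumes
    standardized items and uncorrelated residuals.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + unique)


# Hypothetical loadings: eight items, each loading 0.7 on the single factor.
print(round(mcdonald_omega([0.7] * 8), 3))  # 0.885
```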
Both the convergent and discriminant validity of the S-FLCAS were examined by contrasting the S-FLCAS with the PSWQ-A and the Brief Fear of Negative Evaluation Scale (BFNES). The PSWQ-A provided a general measure of anxiety, and the BFNES provided a general indication of social anxiety. FLCA is theoretically assumed to be related to, yet conceptually different from, general anxiety and social anxiety (Horwitz et al. 1986). Low-to-moderate correlations (0.15 ≤ r ≤ 0.35) between the S-FLCAS and both the PSWQ-A and the BFNES would therefore be expected and would provide evidence of convergent validity. Furthermore, the S-FLCAS should emerge as a distinct factor in an exploratory factor analysis alongside the PSWQ-A and the BFNES, providing evidence of discriminant validity. Finally, because a moderate negative association between FLCA and FLE has been established in the literature (r = −0.30; see Botes et al. 2022b), the Short-form Foreign Language Enjoyment Scale (S-FLES) was also used to establish discriminant validity.
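The a priori convergent-validity band can be expressed as a simple check on the Pearson correlation between two sets of scale scores. In this sketch the band limits come from the text, while `in_convergent_band` is a hypothetical helper name and the example scores are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def in_convergent_band(r, low=0.15, high=0.35):
    """True if r falls in the low-to-moderate band expected for convergent validity."""
    return low <= r <= high


# Made-up scores for five respondents on two scales.
anxiety = [1, 2, 3, 4, 5]
worry = [2, 1, 4, 3, 5]
r = pearson_r(anxiety, worry)
print(round(r, 2), in_convergent_band(r))  # 0.8 False: stronger than expected
```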
Furthermore, invariance testing was conducted on the S-FLCAS to demonstrate its generalizability across different subgroups of the population, namely, gender, age group, educational level, and L1 group. Invariance testing was conducted via JASP (JASP Team 2020). A measure is said to be invariant when members of different groups (e.g. males and females) who have the same standing on the construct of interest receive the same observed score on the measure (Meredith 1993). Thus, if measurement invariance is established, the construct of interest can be assumed to be measured consistently across groups, and the properties of the scale are not affected by group membership. Using multi-group CFA, we tested measurement invariance with a series of increasingly restrictive models across each of these groupings (Meredith 1993; Millsap 2011). Specifically, we tested configural, metric, and scalar invariance, which in turn examine whether the same pattern of freely estimated parameters, equal factor loadings, and equal item intercepts hold across groups (Meredith 1993). The guidelines proposed by Cheung and Rensvold (2002) and Chen (2007) were used to evaluate invariance. Accordingly, when the less restrictive model is compared with the more restrictive model, invariance is supported when ΔCFI ≥ −0.010 (i.e. CFI decreases by no more than 0.010), ΔRMSEA ≤ 0.015, and ΔSRMR ≤ 0.030 (for metric invariance) or ΔSRMR ≤ 0.015 (for scalar invariance). The change in CFI was used as the main criterion, as RMSEA and SRMR tend to over-reject invariant models when the sample size is small (Chen 2007).
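The decision rule above can be written as a small function that compares the fit of two nested models. This is a sketch with illustrative names; the deltas are rounded to three decimals to match the reporting precision of the fit indices and to avoid floating-point artefacts at the cut-offs.

```python
from dataclasses import dataclass

@dataclass
class Fit:
    cfi: float
    rmsea: float
    srmr: float

def invariance_checks(less_restrictive, more_restrictive, step):
    """Apply the Cheung-Rensvold / Chen cut-offs; step is 'metric' or 'scalar'."""
    d_cfi = round(more_restrictive.cfi - less_restrictive.cfi, 3)
    d_rmsea = round(more_restrictive.rmsea - less_restrictive.rmsea, 3)
    d_srmr = round(more_restrictive.srmr - less_restrictive.srmr, 3)
    srmr_cutoff = 0.030 if step == "metric" else 0.015
    return {
        "cfi": d_cfi >= -0.010,        # main criterion: CFI may drop by at most 0.010
        "rmsea": d_rmsea <= 0.015,
        "srmr": d_srmr <= srmr_cutoff,
    }

# Educational-level models as reported in Table 7
configural = Fit(0.973, 0.074, 0.034)
metric = Fit(0.970, 0.072, 0.056)
scalar = Fit(0.965, 0.072, 0.054)
print(invariance_checks(configural, metric, "metric"))
print(invariance_checks(metric, scalar, "scalar"))
```

For example, the educational-level metric model (ΔSRMR = 0.022) passes the metric cut-off of 0.030, although the same change would exceed the stricter scalar cut-off of 0.015.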
The sample was overwhelmingly composed of college-educated young adults with English as their L1 (see Tables S1–S4 in the Supplementary Data). We limited the number of subgroups for age, education, and L1 to two because additional categories would have resulted in severely unbalanced groups (due to small subgroup sizes), which may have affected the results (Yoon and Lai 2018). This regrouping therefore aimed to create groups that were as balanced as possible and large enough to make invariance testing viable (e.g. Meade and Bauer 2007; Finch et al. 2018). Participants were accordingly grouped into two age groups, namely, young adults (18–25 years; n = 208) and adults (26+ years; n = 162). Two education groups were also formed, namely, those with a secondary school education (n = 117) and those with a post-secondary school education (n = 253). Furthermore, because of the large number of English L1 participants, two L1 groups were formed, namely, English L1 (n = 212) and non-English L1 (n = 158). Lastly, gender was examined via the self-identified categories of female (n = 202) and male (n = 168).
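The subsampling idea attributed to Yoon and Lai (2018) can be sketched as repeatedly drawing, from the larger group, random subsamples the size of the smaller group, re-running the invariance comparison on each balanced pair, and summarizing the results. The sketch below covers only the resampling and aggregation: `statistic` is a hypothetical callable standing in for the multi-group CFA, which is not implemented here, and using the median as the summary is an assumption.

```python
import random
from statistics import median

def balanced_subsamples(larger_ids, smaller_size, n_reps, seed=0):
    """Draw n_reps subsamples (without replacement) of the larger group,
    each the size of the smaller group."""
    rng = random.Random(seed)
    pool = list(larger_ids)
    return [rng.sample(pool, smaller_size) for _ in range(n_reps)]

def subsampled_statistic(larger_ids, smaller_ids, statistic, n_reps=100, seed=0):
    """Median of statistic(subsample, smaller_ids) across balanced subsamples.

    `statistic` is hypothetical: e.g. a function that fits the configural and
    metric models to the two groups and returns the change in CFI.
    """
    draws = balanced_subsamples(larger_ids, len(smaller_ids), n_reps, seed)
    return median(statistic(sub, smaller_ids) for sub in draws)

# Education groups: post-secondary (n = 253) vs secondary (n = 117); IDs are placeholders
post_secondary = list(range(253))
secondary = list(range(1000, 1117))
subs = balanced_subsamples(post_secondary, len(secondary), n_reps=5)
print([len(s) for s in subs])
```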
RESULTS
Step 1: Splitting the data set
The data set was randomly split into two separate samples of n = 185 each. Independent-samples t-tests compared Samples 1 and 2 on age, level of multilingualism, FLCA, and the PSWQ-A to confirm that no statistically significant differences were present (see Table 2). No statistically significant differences between the two samples were found. The descriptive statistics for each sample are also reported in Table 2.
| Variable | n | Mean | SD | t-value | p-value |
|---|---|---|---|---|---|
| Age | | | | −0.41 | 0.682 |
| Sample 1 | 185 | 27.34 | 10.11 | | |
| Sample 2 | 185 | 27.78 | 10.40 | | |
| Multilingualism | | | | −0.32 | 0.750 |
| Sample 1 | 185 | 3.94 | 1.69 | | |
| Sample 2 | 185 | 4.01 | 1.89 | | |
| FLCA | | | | −1.42 | 0.157 |
| Sample 1 | 185 | 2.82 | 0.92 | | |
| Sample 2 | 185 | 2.96 | 0.93 | | |
| PSWQ-A | | | | 0.154 | 0.877 |
| Sample 1 | 185 | 3.22 | 1.08 | | |
| Sample 2 | 185 | 3.20 | 1.01 | | |
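Step 1 can be sketched as a seeded random split followed by independent-samples t-tests. Because only summary statistics are available here, the sketch computes the pooled-variance t statistic from means, SDs, and group sizes; the seed and function names are assumptions, not the authors' procedure.

```python
import math
import random

def random_split(ids, seed=0):
    """Shuffle participant IDs and cut them into two halves of (near-)equal size."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def pooled_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Independent-samples (pooled-variance) t statistic and its df from summary statistics."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

sample_1, sample_2 = random_split(range(370))
t_age, df_age = pooled_t_from_summary(27.34, 10.11, 185, 27.78, 10.40, 185)
print(len(sample_1), len(sample_2), round(t_age, 2), df_age)  # → 185 185 -0.41 368
```

The Age row reproduces the reported t = −0.41. Recomputing the other rows from the rounded means and SDs in Table 2 gives values that differ slightly from the reported t-values, presumably because the tests were run on the unrounded data.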
Step 2: Investigate the factor structure
The eigenvalue criterion (eigenvalue > 1) from the exploratory factor analysis (EFA) of Sample 1 indicated that FLCA has a unidimensional factor structure: the first factor had an eigenvalue of 4.670, and all subsequent factors had eigenvalues < 0.850 (see Table 3). The scree plot further showed that a single factor underlies the S-FLCAS (see Figure 2), with a clear inflection point after the first factor and no further inflection points. We can therefore state with some confidence that FLCA, as captured by the S-FLCAS, is a unidimensional construct.
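The eigenvalue > 1 (Kaiser) rule counts how many eigenvalues of the item correlation matrix exceed 1. The text does not say how the eigenvalues were computed; the dependency-free sketch below uses cyclic Jacobi rotations on a symmetric matrix, and the equicorrelation matrix used to exercise it is purely illustrative.

```python
import math

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def kaiser_count(eigenvalues):
    """Number of factors retained under the eigenvalue > 1 rule."""
    return sum(ev > 1.0 for ev in eigenvalues)

def equicorrelation(k, r):
    """k x k correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]

eigs = jacobi_eigenvalues(equicorrelation(8, 0.505))
print([round(ev, 3) for ev in eigs], kaiser_count(eigs))
```

Because the eigenvalues of a k × k correlation matrix sum to k, the reported first eigenvalue of 4.670 corresponds to about 58% of the total standardized variance of the eight items (4.670 / 8 ≈ 0.584).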

| Item | Factor 1 |
|---|---|
| 1. Even if I am well prepared for FL class, I feel anxious about it | 0.775ᵃ |
| 2. I always feel that the other students speak the FL better than I do | 0.611ᵃ |
| 3. I can feel my heart pounding when I’m going to be called on in FL class | 0.779ᵃ |
| 4. I don’t worry about making mistakes in FL classᵇ | 0.476ᶜ |
| 5. I feel confident when I speak in FL classᵇ | 0.754ᵃ |
| 6. I get nervous and confused when I am speaking in my FL class | 0.737ᵃ |
| 7. I start to panic when I have to speak without preparation in FL class | 0.840ᵃ |
| 8. It embarrasses me to volunteer answers in my FL class | 0.781ᵃ |
ᵃ High loading (>0.6).
ᵇ Reverse-scored items.
ᶜ Acceptable loading (0.4–0.6).
The factor loadings of the individual items were also acceptable (>0.60; Kline 2005), with the exception of Item 4 (‘I do not worry about making mistakes in FL class’). This item had a somewhat low, although still acceptable, factor loading (Stevens 1992), which could be attributed to the use of a negation (‘not’) to create an item that indicates a lack of anxiety and is therefore reverse-scored (Conrad et al. 2004). On the whole, however, the EFA in Step 2 of the data analysis yielded a clear unidimensional factor solution for FLCA.
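Before analysis, the reverse-scored Items 4 and 5 are recoded so that higher scores indicate more anxiety, and the loading bands used in Table 3 can then be applied programmatically. The sketch assumes a 1–5 response format, which is not stated in this section (adjust `scale_min` and `scale_max` otherwise); the helper names are hypothetical, and the loadings are those reported in Table 3.

```python
REVERSE_SCORED = {4, 5}   # S-FLCAS items indicating a lack of anxiety

def reverse_score(x, scale_min=1, scale_max=5):
    """Mirror a response within the (assumed) response range."""
    return scale_min + scale_max - x

def score_items(responses, reverse=REVERSE_SCORED, scale_min=1, scale_max=5):
    """responses: dict item_number -> raw response; returns recoded responses."""
    return {item: reverse_score(x, scale_min, scale_max) if item in reverse else x
            for item, x in responses.items()}

def loading_band(loading):
    """Bands used in the table notes: high (>0.6), acceptable (0.4-0.6), else low."""
    if loading > 0.6:
        return "high"
    if loading >= 0.4:
        return "acceptable"
    return "low"

TABLE_3 = {1: 0.775, 2: 0.611, 3: 0.779, 4: 0.476,
           5: 0.754, 6: 0.737, 7: 0.840, 8: 0.781}

print([item for item, v in TABLE_3.items() if loading_band(v) != "high"])  # → [4]
```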
Step 3: Confirming the factor structure
As the factor structure of the S-FLCAS was found to be unidimensional in Step 2 of the data analysis, Step 3 proceeded with a confirmatory factor analysis of the eight-item, one-factor S-FLCAS (see Figure 3).

Overall, the fit statistics indicated an acceptable fit, χ²(20) = 56.324, p < 0.001. The CFI and TLI were both above the cut-off of 0.90 (CFI = 0.948; TLI = 0.927; Kline 2005). The SRMR also indicated good fit, as it was below the cut-off of 0.05 (SRMR = 0.048; Kenny 2020). However, the RMSEA (0.099) was well above the desired cut-off of 0.08 (Kenny 2020), indicating a potential problem with fit.
The factor loadings and modification indices provided some insight into the RMSEA result. Item 4 had a considerably higher standard error (SE = 0.102) than the other items. Items 4 and 5 are reverse-scored, and Item 4 in particular uses the negation ‘not’ to create a negatively worded statement (‘I don’t worry about making mistakes in FL class’). Reverse-scored items are well known to cause measurement difficulties: they may elicit atypical responses (Carlson et al. 2011), which in turn affect a model’s fit statistics (Conrad et al. 2004) and may lead to Type II errors in model rejection (Woods 2006). The modification indices in the confirmatory factor analysis suggested correlating the residuals of the two reverse-scored items, Items 4 and 5 (MI = 15.921).
Therefore, given the known measurement difficulties caused by reverse-scored items and the modification index results, a second measurement model was tested with a correlation added between the residuals of Items 4 and 5 (see Figure 4).

The second model showed an improved, close fit (see Table 4). In particular, the CFI and TLI both increased to indicate a very close fit (>0.95; Kline 2005), and the SRMR decreased further to indicate a close fit (<0.05; Kenny 2020). The largest improvement, however, was in the RMSEA, which now indicated an adequate fit (<0.08; Kenny 2020).
Overall, the unidimensional model of the S-FLCAS provided a good fit to the data, with one caveat: if the individual items are used in SEM, we recommend including a residual correlation between the reverse-scored Items 4 and 5 in the S-FLCAS measurement model. Accordingly, all validity and invariance analyses conducted in Step 5 of the data analysis included this residual correlation between Items 4 and 5.
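The recommended specification can be written in lavaan-style model syntax: one factor measured by all eight items plus a residual covariance between Items 4 and 5. The sketch below only builds that string; the variable names i1–i8 are hypothetical, and fitting it (e.g. with `cfa()` in lavaan, or with a Python package such as semopy that, as far as we know, accepts similar syntax) is not shown.

```python
def s_flcas_model(item_names=None, residual_pairs=((4, 5),), factor="FLCA"):
    """Build lavaan-style syntax for the one-factor S-FLCAS model.

    item_names: observed variable names for Items 1-8 (hypothetical defaults i1..i8).
    residual_pairs: 1-based item numbers whose residuals are allowed to correlate.
    """
    if item_names is None:
        item_names = [f"i{k}" for k in range(1, 9)]
    lines = [f"{factor} =~ " + " + ".join(item_names)]   # measurement model
    for a, b in residual_pairs:
        lines.append(f"{item_names[a - 1]} ~~ {item_names[b - 1]}")  # residual covariance
    return "\n".join(lines)

print(s_flcas_model())
```

This model has one fewer degree of freedom than the plain one-factor model (19 rather than 20), because the residual covariance is an additional free parameter.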
Step 4: Recombining the data set
It should be noted that the two samples (n = 185; n = 185) utilized separately in Steps 2 and 3 of the data analyses were again recombined into a single data set (n = 370). This combined data set was used in the reliability, validity, and invariance testing in Step 5 of the data analysis.
Step 5: Validating the S-FLCAS
Reliability and internal consistency
The inter-item correlations for the eight-item S-FLCAS (see Table S5 in the Supplementary Data) fell within the acceptable range (mean inter-item correlation = 0.505; range: 0.321–0.672; Ferketich 1991; Clark and Watson 1995), indicating that the S-FLCAS items fit together conceptually while each contributes unique variance to the scale. The internal consistency of the scale was acceptable, as measured by both Cronbach’s alpha (α = 0.891) and McDonald’s omega (ω = 0.893).
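The mean inter-item correlation is directly linked to internal consistency: standardized alpha equals k·r̄ / (1 + (k − 1)·r̄). As a worked check not reported in the study, plugging in k = 8 and r̄ = 0.505 gives about 0.891, closely matching the reported α. Raw and standardized alpha coincide only when item variances are equal, so close agreement is expected but not guaranteed; the function names are illustrative.

```python
def mean_inter_item_r(corr):
    """Mean of the off-diagonal (upper-triangle) entries of a correlation matrix."""
    k = len(corr)
    off_diagonal = [corr[i][j] for i in range(k) for j in range(i + 1, k)]
    return sum(off_diagonal) / len(off_diagonal)

def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from k items and their mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)

print(round(standardized_alpha(8, 0.505), 3))  # → 0.891
```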
**p < 0.001.
Validity
In order to establish validity, we compared the total score from the S-FLCAS with the scores from the PSWQ-A, the BFNES, and the S-FLES (see Table 5).
The S-FLCAS was found to be moderately positively correlated with general anxiety as measured by the PSWQ-A (r = 0.322, p < 0.001). Language anxiety is theoretically expected to be associated with, yet a distinct construct from, general anxiety (Horwitz et al. 1986). As such, the finding of a moderate positive correlation provided evidence of both convergent and discriminant validity.
Contrary to our expectations, no statistically significant correlation was found between the S-FLCAS and the BFNES (r = 0.008; p = 0.881); there was therefore no discernible relationship between language anxiety and fear of negative evaluation in this data set. This result was unexpected because fear of negative evaluation has been theorized to be one of the building blocks of FLCA (Horwitz et al. 1986). Indeed, previous research using the 33-item FLCAS has found moderate positive correlations between FLCA and fear of negative evaluation (Tzoannopoulou 2016), and Šafranj and Zivlak (2019) found that fear of negative evaluation positively predicted FLCA (β = 0.13, p < 0.05). The non-significant correlation between the S-FLCAS and the BFNES reported here suggests that this component of the original FLCAS is not well reflected in the short form of the scale.
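Whether a reported correlation is statistically significant can be checked from r and n alone via t = r√(n − 2) / √(1 − r²). The sketch below uses a standard normal approximation to the t distribution, which is adequate at n = 370 (df = 368); it is an illustration with a hypothetical function name, not the authors' procedure.

```python
import math
from statistics import NormalDist

def correlation_test(r, n):
    """t statistic for H0: rho = 0 and a two-sided p-value
    (normal approximation to the t distribution with n - 2 df)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

print(correlation_test(0.008, 370))
print(correlation_test(0.322, 370))
```

For r = 0.008 and n = 370 it gives p ≈ 0.88, close to the reported p = 0.881 (the small difference reflects rounding of r), whereas r = 0.322 gives p far below 0.001.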
An EFA with ML estimation and oblique (promax) rotation of the eight-item S-FLCAS, the eight-item PSWQ-A, and the eight-item BFNES was conducted to further investigate the discriminant validity of the S-FLCAS (see Table 6). The items of each scale loaded onto unique, separate factors. Thus, the EFA clearly indicated that language anxiety as measured by the S-FLCAS, general anxiety as measured by the PSWQ-A, and fear of negative evaluation as measured by the BFNES were three distinct constructs. These results further substantiated the discriminant validity of the S-FLCAS.
| Item | Factor 1 | Factor 2 | Factor 3 |
|---|---|---|---|
| FLCA | | | |
| Item 1 | 0.704ᵃ | | |
| Item 2 | 0.580ᵇ | | |
| Item 3 | 0.826ᵃ | | |
| Item 4* | 0.536ᵇ | | |
| Item 5* | 0.738ᵃ | | |
| Item 6 | 0.733ᵃ | | |
| Item 7 | 0.808ᵃ | | |
| Item 8 | 0.765ᵃ | | |
| PSWQ-A | | | |
| Item 1 | 0.830ᵃ | | |
| Item 2 | 0.849ᵃ | | |
| Item 3 | 0.838ᵃ | | |
| Item 4 | 0.768ᵃ | | |
| Item 5 | 0.848ᵃ | | |
| Item 6 | 0.759ᵃ | | |
| Item 7 | 0.768ᵃ | | |
| Item 8 | 0.906ᵃ | | |
| BFNES | | | |
| Item 1 | 0.873ᵃ | | |
| Item 2 | 0.885ᵃ | | |
| Item 3 | 0.881ᵃ | | |
| Item 4 | 0.892ᵃ | | |
| Item 5 | 0.848ᵃ | | |
| Item 6 | 0.850ᵃ | | |
| Item 7 | 0.897ᵃ | | |
| Item 8 | 0.816ᵃ | | |
Items marked * are reverse-scored.
ᵃ High loading (>0.6).
ᵇ Acceptable loading (0.4–0.6).
Lastly, the moderate negative correlation found between the S-FLCAS and the S-FLES (r = −0.264, p < 0.001) was as expected. A recent review of the literature found an overall moderate negative correlation between FLA and FLE (r = −0.30, k = 25, N = 13,421; Botes et al. 2022b). The S-FLCAS therefore followed the trend in the literature and provided further confirmation that FLA and FLE are two distinct constructs.
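To illustrate how closely the S-FLCAS and S-FLES correlation tracks the literature, a one-sample Fisher z test can compare it with the review's pooled estimate. This illustration is not an analysis reported in the study, the function name is hypothetical, and the test treats r = −0.30 as a fixed value, ignoring the uncertainty in the pooled estimate.

```python
import math
from statistics import NormalDist

def compare_r_to_reference(r, n, rho0):
    """Two-sided Fisher z test of H0: rho = rho0 for a single sample correlation."""
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

print(compare_r_to_reference(-0.264, 370, -0.30))
```

With r = −0.264, n = 370, and a reference value of −0.30, the sketch gives z ≈ 0.75 and p ≈ 0.45, i.e. no indication that the observed correlation departs from the pooled estimate.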
Invariance testing
We tested invariance by means of multi-group CFA models across gender, age, educational level, and L1, respectively. First, the configural invariance models provided a good fit to the data (Table 7), suggesting that the overall factor structure was equivalent across all the gender, age, educational level, and L1 groups that we tested. Next, we tested for metric invariance by comparing the configural invariance models with the metric invariance models. The ΔCFI, ΔRMSEA, and ΔSRMR values fell within the recommended cut-offs (i.e. ΔCFI ≥ −0.010, ΔRMSEA ≤ 0.015, ΔSRMR ≤ 0.030), supporting invariant factor loadings across gender, age, educational level, and L1 groups (see Table 7). Lastly, we tested for scalar invariance by comparing the less restrictive metric models with the more restrictive scalar models. Because the ΔCFI, ΔRMSEA, and ΔSRMR values did not exceed the recommended cut-offs, the item intercepts could be assumed to be invariant across gender, age, educational level, and L1 groups.³
| Invariance model | χ² | df | p-value | CFI | RMSEA | SRMR | ΔCFI | ΔRMSEA | ΔSRMR |
|---|---|---|---|---|---|---|---|---|---|
| Invariance across gender | | | | | | | | | |
| Configural | 74.997 | 38 | 0.001 | 0.974 | 0.073 | 0.038 | | | |
| Metric | 80.560 | 45 | 0.001 | 0.975 | 0.065 | 0.048 | 0.001 | −0.008 | 0.010 |
| Scalar | 86.882 | 52 | 0.002 | 0.975 | 0.060 | 0.047 | 0.000 | −0.005 | −0.001 |
| Invariance across age | | | | | | | | | |
| Configural | 63.605 | 38 | 0.006 | 0.982 | 0.060 | 0.031 | | | |
| Metric | 71.682 | 45 | 0.007 | 0.981 | 0.057 | 0.049 | −0.001 | −0.003 | 0.018 |
| Scalar | 78.503 | 52 | 0.010 | 0.981 | 0.052 | 0.047 | 0.000 | −0.005 | −0.002 |
| Invariance across educational level | | | | | | | | | |
| Configural | 76.585 | 38 | 0.001 | 0.973 | 0.074 | 0.034 | | | |
| Metric | 88.499 | 45 | 0.001 | 0.970 | 0.072 | 0.056 | −0.003 | −0.002 | 0.022 |
| Scalar | 101.810 | 52 | 0.001 | 0.965 | 0.072 | 0.054 | −0.005 | 0.000 | −0.002 |
| Invariance across L1 | | | | | | | | | |
| Configural | 67.552 | 38 | 0.002 | 0.979 | 0.065 | 0.033 | | | |
| Metric | 71.781 | 45 | 0.007 | 0.981 | 0.057 | 0.042 | 0.002 | −0.008 | 0.009 |
| Scalar | 79.155 | 52 | 0.009 | 0.981 | 0.053 | 0.041 | 0.000 | −0.004 | −0.001 |
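The degrees of freedom in Table 7 follow from the model specification. Under an identification choice assumed here (factor variance fixed to 1 and factor mean fixed to 0 in the reference group, then freed in the other group once loadings or intercepts are equated), each group's one-factor model with the Items 4 and 5 residual covariance has 19 df, and each equality step adds p − 1 = 7 df. The sketch reproduces that arithmetic with an illustrative function name; marker-variable identification gives the same totals.

```python
def multigroup_one_factor_df(p, groups=2, residual_covs=1):
    """df of configural, metric and scalar models for a one-factor CFA with a mean structure."""
    moments = p * (p + 1) // 2 + p          # variances/covariances + means per group
    free = p + p + p + residual_covs        # loadings, residual variances, intercepts, residual covs
    configural = groups * (moments - free)
    metric = configural + (groups - 1) * (p - 1)   # p equal loadings, minus freed factor variance
    scalar = metric + (groups - 1) * (p - 1)       # p equal intercepts, minus freed factor mean
    return configural, metric, scalar

print(multigroup_one_factor_df(8))  # → (38, 45, 52)
```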
Overall, our invariance results suggest that FL classroom anxiety was measured similarly by the S-FLCAS across the different groups. The scale properties (i.e. factor structure, factor loadings, and item intercepts) were fully invariant across the specified groups, suggesting that participants from different genders, ages, educational levels, and L1s understood the S-FLCAS items in comparable ways. Our results attest to the generalizability of the S-FLCAS across different subgroups of a population and indicate that latent mean comparisons on the S-FLCAS across gender, age, educational level, and L1 would be permissible, meaningful, and valid.
DISCUSSION
Following the five sequential validation steps in this study, the S-FLCAS was found to be a valid and reliable measure.
The exploratory factor analysis uncovered a unidimensional structure underlying FLCA, with all eight items loading on a single latent variable. The confirmatory factor analysis further confirmed this structure, although it raised a minor measurement concern regarding the reverse-scored items (Items 4 and 5). The residuals of these two items were therefore allowed to correlate, as suggested by the modification indices, and the second confirmatory factor analysis indicated a close fit (RMSEA = 0.078; CFI = 0.969). The confirmation of a unidimensional FLCA construct as measured by the S-FLCAS is not unexpected, as Dewaele and MacIntyre’s (2016) results already indicated this possibility.
The unidimensional structure has the advantage of simplicity, in that the scale’s total score can easily be used in linear regression and correlational studies without compromising the underlying construct. In addition, when more advanced statistical techniques such as structural equation modelling are used with individual item scores, the specification of the proposed measurement model is straightforward, with the recommended addition of a residual correlation between the two reverse-scored items (Items 4 and 5; see Figure 4). The inclusion of two reverse-scored items might reduce response set bias (Borgers et al. 2004), but it does pose some measurement limitations, as reverse scoring has been found to adversely affect model fit (Conrad et al. 2004) and item responses (Carlson et al. 2011). As such, we recommend including the residual correlation between the reverse-scored Items 4 and 5 whenever the S-FLCAS measurement model is used in the future. In addition, future research could fruitfully examine the value of including the two negatively worded items and explore creating a version of the S-FLCAS with only positively worded items. Nevertheless, the clear unidimensional solution underlying the S-FLCAS can be considered a boon.
The acceptance of a unidimensional S-FLCAS does, however, create some measurement contention. The S-FLCAS is meant to capture the construct of FLCA in the same manner as the full 33-item FLCAS, yet, as far as we are aware, no unidimensional solution of the FLCAS has ever been proposed. Nevertheless, given that no factor structure has been consistently and repeatedly identified as underlying the FLCAS (see Table 1), and that the available solutions discard a substantial number of items, we suggest that the S-FLCAS and the FLCAS both measure FLCA, especially because MacIntyre (1992) found that the total scores from the two scales were very strongly correlated (r = 0.98, p < 0.01). Indeed, the numerous factor analyses of the full 33-item FLCAS may have hinted at a unidimensional structure, as previous factor analytic studies often found a first factor that explained the majority of the variance in the data. For example, Aida (1994) and Tóth (2008) both found multidimensional solutions, but in both cases the first factor explained the considerable majority of the variance. The variance accounted for by the first factor is likely largely responsible for the pattern of validity correlations observed over 30+ years of using the FLCAS in research. In addition, should a multidimensional structure of the full 33-item FLCAS be the preferred solution, the possibility of a hierarchical structure with a global FLCA factor cannot be discounted; as far as we are aware, such a hierarchical solution has not been tested or considered in previous studies. In sum, the S-FLCAS has a clear unidimensional structure, which we argue does not prevent it from being considered a valid short form of the FLCAS.
Beyond the factor structure, additional validation results were promising. The S-FLCAS was found to have acceptable internal consistency (α = 0.891; ω = 0.893). The statistically significant positive correlation between the S-FLCAS and the PSWQ-A (r = 0.322, p < 0.001) indicated both convergent and discriminant validity: it confirmed the theoretically expected relationship between FLCA and trait anxiety (Horwitz et al. 1986), while its moderate size indicated that the two constructs, although related, are distinct. Furthermore, the statistically significant negative correlation between the S-FLCAS and the S-FLES (r = −0.264, p < 0.001) provides further evidence of convergent and discriminant validity, as FLCA and FLE have consistently been found to be moderately negatively correlated (see Botes et al. 2022b). The only fly in the proverbial ointment of the validation attempt is the non-significant result between FLCA and fear of negative evaluation, as measured by the S-FLCAS and the BFNES, respectively. A relationship between FLCA and fear of negative evaluation has been established in the literature (Tzoannopoulou 2016). The null result we found might be explained by the type of items that were removed when the 33-item FLCAS was originally reduced to the 8-item S-FLCAS. Given the substantial reduction, many of the items referring to the social evaluation of the FL classroom were cut. The 33-item FLCAS included four items that mention social comparison with other students in the class (e.g. ‘I keep thinking that the other students are better at languages than I am’). In addition, five of the original 33 items referred to the teacher (e.g. ‘It frightens me when I don't understand what the teacher is saying in the foreign language’). Reducing a scale by approximately 75 per cent, from 33 to 8 items, inevitably sacrifices some detail in the measurement.
However, given the results of MacIntyre (1992), the recently published studies using the S-FLCAS, and the results of the present study, the short form seems to capture individual differences in FL classroom anxiety efficiently.
Invariance testing yielded strong, overwhelmingly positive results supporting the use of the S-FLCAS. An equivalent factor structure, factor loadings, and item intercepts were found across age, gender, educational level, and L1 groups. Thus, FLCA was measured similarly across groups, and using the S-FLCAS to compare age, gender, educational level, and L1 groups would be a valid endeavour. The importance of these results should not be understated: they provide the first statistical evidence of the fairness of the S-FLCAS, as the items did not function differently across groups (Kline 2013). This is an encouraging finding, as previous research examining FLCA has spanned numerous cultural, educational, and language settings (see Botes et al. 2020b). In addition, full invariance across English and non-English L1 learners is an especially advantageous finding for the future use of the S-FLCAS in research. Researchers may therefore administer the S-FLCAS in English to non-English L1 FL learners with confidence, provided that the learners possess at least intermediate proficiency in English.
Notwithstanding the overall promising results on the validity and reliability of the S-FLCAS, the study and measure are not without limitations. First, the sample size (n = 370) placed some constraints on the data analysis, as larger samples are often recommended for both structural equation modelling and invariance testing (Kenny 2020). The sample size especially affected the groupings in the invariance testing, which were limited to two groups per category because of statistical power constraints. That said, the results of our simulation study using the Yoon and Lai (2018) subsampling method indicate that the unbalanced group sizes did not affect our conclusions concerning the invariance of the S-FLCAS across age, gender, educational level, and L1 groups. Secondly, the sample consisted mostly of highly educated young adults and L1 English speakers, and these skewed demographics further limited the groupings possible in the invariance analyses. Lastly, the S-FLCAS itself is limited to a given context, namely, FL anxiety experienced in the FL classroom by adult or adolescent FL learners. The S-FLCAS might not be suitable for non-traditional or self-taught FL learners who did not learn in a classroom, and it is not recommended for use with young children.
We hope that the benefits of the shortened S-FLCAS, such as the reduced time and resources needed for testing, will lead to a further expansion of research on FLCA. In particular, we hope to see the nomological network of FLCA broadened to include more positive psychology and well-being variables, reflecting the more holistic positive psychology approach to modern FL teaching (Dewaele, Chen et al. 2019). In addition, the shortened scale may be especially beneficial for longitudinal or experience sampling studies, which would be a much-needed boon in a field dominated by cross-sectional research. We also expect future research using the S-FLCAS to address some of the questions raised in this study. The 33-item FLCAS has three decades of psychometric and anecdotal evidence supporting its use, and considerable future research is needed to establish the S-FLCAS to the same extent. Nevertheless, the ease of use and interpretation of the measure, the validity and reliability confirmed in this study, and the general benefits of short-form measures (Heene et al. 2014) strongly recommend the S-FLCAS for future use in peer-reviewed research.
CONCLUSION
The study aimed to validate the S-FLCAS. Although the scale was introduced in 1992, its use has risen sharply only recently, following developments in the field. This rise created a need to ensure the validity and reliability of the S-FLCAS. The validation efforts in this study uncovered a unidimensional factor structure, with all eight items loading on a single latent variable. Evidence of the internal consistency as well as the convergent and discriminant validity of the S-FLCAS was found. In addition, invariance testing confirmed that the scale properties of the S-FLCAS are fully invariant across age, gender, educational level, and L1 groups. On the whole, the psychometric evidence behind the S-FLCAS is overwhelmingly positive, and we recommend its future use in applied linguistics research.
SUPPLEMENTARY DATA
Supplementary material is available at Applied Linguistics online.
Footnotes
In memory of Elaine K. Horwitz (1950–2022)
It should be noted that the eight-item scale was not given a specific name to differentiate it from the original 33-item FLCAS in either MacIntyre (1992) or Dewaele and MacIntyre (2014). For the sake of clarity, we decided to call the eight-item measure the Short-Form Foreign Language Classroom Anxiety Scale in this study.
Cronbach’s alpha of the S-FLCAS was high (α = .89), indicating that the items are closely related and internally consistent (Field 2013). However, Cronbach’s alpha cannot establish the dimensionality of a measure, so further analyses were needed to determine the factor structure and, by extension, the validity of the measure.
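For reference, Cronbach’s alpha for k items is α = k/(k − 1) × (1 − Σσᵢ²/σ_X²), where σᵢ² is the variance of item i and σ_X² is the variance of the summed scale score. The Python sketch below computes it from a respondents × items score matrix; the function name is our own, and the toy data are illustrative rather than S-FLCAS responses.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    Each inner list holds one respondent's responses to the same k items.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across respondents (one column of the matrix).
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each respondent's total (summed) score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Two items that rise and fall together give alpha = 1.0; two uncorrelated
# items give alpha = 0.0.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Because alpha depends only on item variances and the total-score variance, a high value is compatible with more than one underlying factor, which is why the footnote notes that factor analyses were still required.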
To account for the unbalanced group sizes in our invariance testing, we conducted a simulation study using a subsampling approach developed by Yoon and Lai (2018). The method and results of these analyses are reported in Table S6 of the Supplementary Data. In sum, the results of the simulation study confirmed our conclusion that the S-FLCAS could be considered scalar invariant across gender, age, educational level, and L1 groups.
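A minimal sketch of the general subsampling idea, under stated assumptions: from the larger group, repeatedly draw random subsamples (without replacement) equal in size to the smaller group, test invariance on each balanced pair, and summarize across replications. Model fitting is represented by a hypothetical callback, `holds_invariance`, which in practice would wrap an SEM package; the summary used here (the share of replications in which invariance holds) is an illustrative choice and may differ from the statistics reported in Table S6.

```python
import random
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")  # one respondent's data; its concrete type is left open


def subsample_pass_rate(
    group_a: Sequence[Row],
    group_b: Sequence[Row],
    holds_invariance: Callable[[Sequence[Row], Sequence[Row]], bool],
    n_reps: int = 100,
    seed: int = 0,
) -> float:
    """Share of balanced subsample replications in which invariance holds.

    `holds_invariance` is a hypothetical stand-in for fitting the nested
    multi-group models to two equally sized groups.
    """
    rng = random.Random(seed)  # seeded so the draws are reproducible
    small, large = sorted((group_a, group_b), key=len)
    passes = 0
    for _ in range(n_reps):
        # Match the smaller group's size so each comparison is balanced.
        subsample = rng.sample(list(large), len(small))
        if holds_invariance(small, subsample):
            passes += 1
    return passes / n_reps
```

Subsampling the larger group, rather than weighting or discarding cases once, lets every respondent in the larger group contribute across replications while keeping each individual comparison balanced.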
Funding
Supported by the Luxembourg National Research Fund (FNR) (PRIDE/15/10921377).
REFERENCES
Elouise Botes is a postdoctoral researcher in educational psychology at the University of Vienna, Austria. Her research examines positive and negative emotions in foreign language learning. Her research interests include emotions, individual differences, personality, and psychometric validation. She holds a PhD in psychology from the University of Luxembourg and a master’s in organizational psychology from the University of Stellenbosch, South Africa. Address for correspondence: University of Vienna, Universitätsstraße 7, 1010 Wien, Austria. <elouise.botes@univie.ac.at>
Lindie van der Westhuizen is a Research and Development Specialist at the University of Luxembourg. She holds a Master of Commerce in Industrial Psychology from the University of Stellenbosch, South Africa. Her research areas include psychological assessment, psychometrics, personality psychology, and academic motivation. Lindie is the former Editorial Assistant of the European Journal of Psychological Assessment and previously worked as a consultant in applied research and talent management across various industries. Address for correspondence: University of Luxembourg, 2 Av. de l'Universite, 4365 Esch-sur-Alzette, Luxembourg. <lindie.vanderwesthuizen@uni.lu>
Jean-Marc Dewaele is Professor of Applied Linguistics and Multilingualism at Birkbeck, University of London. He has published widely on individual differences in classroom emotions. He is a former president of the International Association of Multilingualism and of the European Second Language Association, and he is General Editor of the Journal of Multilingual and Multicultural Development. He won the Robert Gardner Award for Excellence in Second Language and Bilingualism Research (2016) from the International Association of Language and Social Psychology. Address for correspondence: Birkbeck University of London, Malet Street, Bloomsbury, London WC1E 7HX, UK. <j.dewaele@bbk.ac.uk>
Peter D. MacIntyre is professor of psychology at Cape Breton University (Canada). His research focusses on the psychology of language and communication, including anxiety, motivation, well-being, and willingness to communicate. He has written books or edited anthologies on Positive Psychology, Motivational Dynamics, Nonverbal Communication, Teaching Innovations, and Language Learner Individual Differences. He is president of the International Association for the Psychology of Language Learning (2018–2022) and organizer of the fourth Psychology of Language Learning conference. Address for correspondence: Cape Breton University, 1250 Grand Lake Road, Sydney, NS B1P 6L2, Canada. <peter_macintyre@cbu.ca>
Samuel Greiff is Full Professor of Educational Assessment and Psychology at the University of Luxembourg. He holds a PhD in cognitive and experimental psychology from the University of Heidelberg, Germany. Prof Greiff has been awarded several research grants by diverse funding organizations, was a fellow in the Luxembourg research programme of excellence, and has published articles in national and international scientific journals and books. He serves (or has served) as editor of several journals, for instance as editor-in-chief of the European Journal of Psychological Assessment and as associate editor of Intelligence and the Journal of Educational Psychology. His work mainly focuses on educational and psychological assessment, cognitive and non-cognitive skills, and education in the 21st century. Address for correspondence: University of Luxembourg, 2 Av. de l'Universite, 4365 Esch-sur-Alzette, Luxembourg. <samuel.greiff@uni.lu>