Abstract

This study reports a meta-analysis that synthesizes the empirical research on the role of language aptitude in second language grammar acquisition. A total of 33 study reports were identified, including 17 predictive studies that investigated the correlations between aptitude and ultimate L2 attainment and 16 interactional studies that examined the interface between aptitude and the effectiveness of instructional treatments. These studies generated 309 effect sizes and involved 3,106 L2 learners. It was found that aptitude showed an overall moderate association with L2 grammar learning, r = .31, 95% CI = .25–.36. Subsequent moderator analysis demonstrated that high school students were more likely to draw on aptitude than university students and that aptitude was more strongly correlated with explicit treatments than with implicit treatments. The role of aptitude was more evident for younger learners than older learners in predictive studies, whereas the opposite was true in interactional studies. The results suggest that language aptitude as measured via traditional aptitude tests is a set of cognitive abilities that are more implicated in the initial stages of L2 development and in conscious learning conditions. The findings are valuable for resolving some long-standing controversies surrounding language aptitude.

INTRODUCTION

Language aptitude has been found to be one of the most important individual difference variables in second language acquisition (SLA) (e.g. Cochran et al. 2010 ). 1 Since the inception of aptitude research in the 1950s, marked by the publication of Carroll and Sapon’s (2002) Modern Language Aptitude Test (MLAT), 2 there has been an abundance of empirical research in various instructional settings and with learners of different age groups. Related strands of research include validation of aptitude tests, the associations between aptitude (usually with other variables such as strategies or intelligence) and aspects of L2 proficiency, the impact of aptitude on the effects of instructional treatment, the role of aptitude in different instructional settings, and the relationship between aptitude and age. Despite the prolific empirical research accumulated over the past 50 years, exactly how aptitude relates to SLA remains unclear because of the differences and disparities between the findings of the primary research.

There has also been theoretical evaluation of the role of aptitude. A main controversy has centered on whether the role of aptitude is only manifest in conscious learning. Krashen (1981) argued that the bulk of SLA relies on implicit processing of available linguistic input and that ‘what is considered second or foreign language aptitude may be directly related to conscious learning’ ( Krashen 1981 : 158). While Krashen’s dismissal of conscious learning has been challenged, there has been no answer to his doubt about the relevance of aptitude to conditions amenable to implicit learning, such as that in naturalistic settings in the absence of formal instruction. In light of the importance of aptitude, the copious empirical research, and the many unanswered questions, the time is ripe for a meta-analysis that is able to provide an estimate of the association between aptitude and SLA and account for the between-study variation pertaining to the association.

DEFINITIONS AND CHARACTERISTICS OF LANGUAGE APTITUDE

There have been two representative definitions of language aptitude that reflect different approaches to this cognitive construct in relation to theory, research, and pedagogy. According to Carroll and Sapon (2002) , language aptitude refers to a set of cognitive abilities that are ‘predictive of how well, relative to other individuals, an individual can learn a foreign language in a given amount of time and under given conditions’ ( Carroll and Sapon 2002 : 23). Central in this view is the predictive power of aptitude and its link with ultimate L2 attainment, irrespective of instruction type and learning context. Underlying such a perspective is Carroll’s (1963) preference for an eclectic approach to language instruction: there is no need to tailor instruction to accommodate individual differences. The pedagogical value of Carroll’s conceptualization of aptitude and the instruments used to measure it lies in its (primary) prognostic function of foretelling a learner’s chances of success in meeting a criterion and its (secondary) diagnostic function of detecting learning disabilities. This constitutes a product-oriented, static view of language aptitude.

A different stance is held by Robinson (2005) , who considered language aptitude as ‘cognitive abilities information processing draws on during L2 learning and performance in various contexts and at different stages ’ [emphases added] ( Robinson 2005 : 46). Robinson’s definition reflects a process-oriented, dynamic view of language aptitude and is derived from Cronbach and Snow’s (1977) argument that aptitude is sensitive to environmental factors and is either activated or inhibited as a function of the properties of different learning conditions. Thus, no instruction/treatment is effective for all learners, and maximal effects are achievable only when there is a fit between a learner’s cognitive profile and the characteristics of the instructional context. Instruction, therefore, should be adjusted or modified to cater to learners’ variation in aptitude.

Despite the different views with regard to how aptitude relates to the outcome and process of L2 learning as represented in the above two definitions, both were based on the postulation that language aptitude consists of a set of cognitive abilities that are (i) relatively immutable, (ii) distinct from intelligence and other individual difference variables (e.g. motivation), and (iii) not a learning achievement. These characteristics of aptitude, however, only apply to cognitively mature learners, not child learners whose performance on aptitude measures such as the MLAT-elementary (MLAT-E) may improve as a function of age. It is worth noting that while SLA researchers tend to agree with the ‘aptitude as cognitive abilities’ axiom, the content and characteristics of aptitude are conceived differently in educational psychology. Snow (1991), for example, contended that aptitude can be any measurable characteristic ‘propaedeutic’ (‘required as preparation for a learning condition’) (Snow 1991: 205) to a learning goal, including affective (feelings and emotions), conative (motivation), and cognitive (reasoning and memory) variables. A similar view of aptitude to that of Snow was evident in how aptitude was operationalized in the Pimsleur Language Aptitude Battery (PLAB), which included ‘attitude toward the target language’ as a component of language aptitude. In this meta-analysis, aptitude is conceived as a cognitive variable that is distinct from affective and conative variables.

COMPONENTS AND MEASURES

To date, there has been little theorization about what pieces the aptitude puzzle should include that are essential to SLA, and the ‘standard’ components have, by default, been those measured by the MLAT, the most influential aptitude measure dominating current aptitude research. The MLAT includes five subtests that were claimed to measure three aptitude components: phonetic coding, language analytic ability, and memory, which were believed to be essential to the learning of pronunciation, grammar, and vocabulary, respectively. The MLAT was developed based on Carroll’s observations about how languages were learned/taught during that time (1950s), and some of the subtests consist of learning tasks that were identical with or similar to those that happened in foreign language classes; for example, the Paired Associates subtest required a test-taker to memorize a list of words in an unknown language and their English translations and recall them afterward. Other measures of aptitude (e.g. PLAB, DLAB, VORD, LLAMA, and CANAL-F) have been created for various purposes, but they were either modeled on or validated with reference to the MLAT and did not demonstrate higher predictive validity.

Nevertheless, criticism of the MLAT has never ceased, mainly on the grounds that it was developed based on audiolingual teaching characterized by mechanical drills and rote learning and was couched in a Behaviorist view of learning. The extent to which it is sensitive to learning in more communicative approaches is arguably questionable. One proposed change is to incorporate an element of working memory, which is required for the online storage and processing of linguistic stimuli. However, the addition of working memory as an aptitude measure has only been considered theoretically, and little empirical research has been undertaken to validate working memory as an aptitude component or to map the relationship between working memory and other aptitude components (but see *Roehr and Gánem-Gutiérrez 2009). For instance, the question of whether the phonological loop of working memory, which is responsible for storing and rehearsing verbal input, is distinct from or isomorphic with the memory and phonetic coding components in traditional aptitude tests remains unresolved.

Another weakness of the MLAT and similar tests impeding aptitude research is the fuzzy underlying constructs. This criticism is justifiable on two accounts. First, the test was validated only empirically (based on the data from over 5,000 foreign language learners) but not theoretically in that it was not guided by any second language learning theory. Secondly, the five subtests do not correspond with the three hypothesized aptitude components, which makes it difficult to meaningfully interpret the related findings. For example, based on Carroll’s speculations (1981) , the Number Learning subtest of the MLAT measures three cognitive abilities, namely memory, auditory alertness, and inductive language learning ability/analytic ability.

TWO BROAD LINES OF APTITUDE RESEARCH

As discussed above, language aptitude was defined in two subtly different ways: as a variable that is predictive of ultimate L2 attainment and one that interacts with contextual factors in affecting L2 outcomes. The two definitions are reflected in two parallel tracks of empirical research: predictive and interactional. A side-by-side comparison of the two types of research can be seen in Table 1 . Within predictive research, a substantial number of studies investigated the correlation between high school and university students’ aptitude scores and end-of-semester course grades or scores on proficiency tests. Some of the studies included other individual difference variables together with aptitude, and in general, aptitude was found to be the best predictor ( Cochran et al. 2010 ) and distinct from other variables (*Gardner and Lambert 1965). A subset of predictive studies utilized the concept of aptitude to test Bley-Vroman’s (1990) Fundamental Difference Hypothesis which stated that children draw on domain-specific, implicit language acquisition mechanisms, whereas adults resort to domain-general problem-solving cognitive abilities. These studies typically investigated naturalistic learners and sought to demonstrate that language aptitude was only predictive of the L2 achievement of late starters, not of those learners who started to learn a foreign/second language in childhood.

Table 1:

Predictive and interactional aptitude research

| | Predictive | Interactional |
|---|---|---|
| Purpose | To investigate how aptitude relates to ultimate L2 outcomes | To examine how aptitude mediates the effects of instructional treatments |
| Overall design | Correlating aptitude scores with end-of-semester grades or scores of proficiency tests | Correlating aptitude scores with posttest scores or gain scores after treatment |
| Control of previous knowledge | No, participants take one-shot tests | Yes, through pretesting |
| Linguistic focus | No focus on specific structures | Treatments and tests focus on specific linguistic targets |
| Participants | A whole cohort of students | Learners are divided into several groups and receive different instructional treatments |
| Pedagogical implications | Selection of elite learners; waiving language requirements | Learners with different aptitude profiles may benefit from different types of instruction; instruction should be adapted to cater to differences in learners’ aptitude |

Interactional studies are experimental, and a prominent feature of these studies is the investigation of the comparative effects of different treatment types [e.g. explicit vs. implicit in *Sheen (2007) or inductive vs. deductive in *Hwu and Sun (2012)] and how the effects were related to one or more aptitude components or global aptitude scores. The instructional treatments are characterized by consistent manipulation of variables and use of focused tasks/tests that target one or several particular linguistic structures (which contrasts with predictive studies where learners were not tested on their knowledge or use of particular structures). Interactional studies were carried out either in the classroom where instruction was delivered to learners in groups (*Sheen 2007) or in a laboratory setting where learners received instruction on an individual basis (*Li 2011) or via the computer (*Robinson 1997).

OBJECTIVES OF THIS META-ANALYSIS

The current meta-analysis sought to explore the relation of language aptitude to L2 grammar (morphosyntax) acquisition by synthesizing related empirical research. The criterion variable was restricted to grammar learning because (i) grammar was the most frequently studied among the three aspects of L2 competence (the other two being pronunciation and vocabulary), (ii) the relation of aptitude to grammar learning has been a central theme in theoretical discussions on the role of aptitude in SLA, and (iii) space constraint makes it impossible to include all measures of L2 competence.

By using rigorous statistical procedures to aggregate the results (effect sizes) from primary studies, this meta-analysis seeks to provide a numeric estimate of the role of the construct and resolve the disparities in the findings of the aptitude studies. Aside from effect size aggregation, this meta-analysis also aims to examine the impact of several potential moderating variables on the role of language aptitude in L2 grammar learning. These variables include the setting (high school vs. university vs. naturalistic), age, explicitness of instruction, and research context (class vs. lab). The decision to examine these methodological features as moderators was based on the observation that they constituted defining characteristics of the included studies and that in previous meta-analyses in applied linguistics they have been found to mediate SLA processes.

This meta-analysis sought to answer the following research questions:

  1. What is the role of language aptitude in ultimate L2 grammar acquisition? What factors mediate its role?

  2. What is the relationship between language aptitude and the effects of different instructional treatments? What factors moderate this relationship?

METHOD

Identifying primary research

The related primary studies were located through five commonly used search strategies (Li et al. 2012): examining the reference sections of primary studies, computer search, checking bookshelves in the library, seeking advice from authorities, and emailing authors to retrieve unpublished studies. The databases searched include LLBA, MLA International Bibliography, ERIC, PsycINFO, ProQuest Social Science Journals, Social Sciences Citation Index, PsycARTICLES, and ProQuest Dissertations and Theses. During the searches, various combinations of the following key words were used: aptitude, language aptitude, cognitive aptitudes, MLAT, language analytic ability, phonetic coding, rote memory, second language acquisition/learning, and foreign language learning. A total of 67 studies were identified that investigated the relationship between language aptitude and some aspects of second language acquisition. In all, 33 of the studies included grammar learning as a criterion variable and these were selected for this meta-analysis.

Selection criteria

Given that the MLAT is the most influential aptitude test that marked the beginning of aptitude research, only studies conducted since its publication 3 were included in this meta-analysis. Studies published after May 2013 (my cutoff point for data collection) were not included. With regard to aptitude measures, this meta-analysis included all studies that used traditional aptitude measures, including test batteries with the same or similar components to the MLAT (such as PLAB, VORD, DLAB, and LLAMA), versions of the MLAT translated into other languages, and tests that only measure one aptitude component. The aforementioned criteria relate to the predictor variable; the following criteria concern the criterion variable, namely measures of grammar learning. To be eligible for inclusion in this meta-analysis, a study had to contain a measure of L2 grammar knowledge or use, including both focused (targeting one or several specific grammar structures; typical of interactional studies) and unfocused (tapping general grammar knowledge without a focus; typical of predictive studies) grammar tests. With respect to research design, only studies reporting correlation coefficients (r) were included; results based on factor analysis, multiple regression analysis, or structural equation modeling analysis were not meta-analyzable and were therefore excluded. Also, studies investigating special learners (e.g. those with learning disabilities) were not included. To minimize publication bias, that is, the tendency for studies with statistically significant results to be more likely to be published or submitted for publication, this meta-analysis included both published studies and unpublished Ph.D. dissertations.

Coding

An overarching principle adhered to in this study is that data coding (and analysis) must be both empirically feasible and theoretically meaningful. One example of such a principle is the decision to classify each identified study as either predictive or interactional, depending on whether it investigated the relation of aptitude to the end product of grammar learning or to the effectiveness of instructional treatments. The decision was based on the previously discussed different roles of aptitude (Table 1) and on the observed characteristics of the two distinct paradigms of research. Including all studies in a single moderator analysis would have been statistically desirable because of the resultant larger sample size, but it would have been theoretically unsound to do so.

The protocol for coding aptitude measures appears in Table 2 and is further elaborated below:

  1. Complete test batteries with several subtests (MLAT, PLAB, etc.) retained their names, and versions of the MLAT and PLAB in other languages were labeled Quasi-MLAT and Quasi-PLAB, respectively. The MLAT Short consists of the last three subtests and was coded as MLAT because it was said to have the same predictive and construct validity as the full-length MLAT ( Carroll and Sapon 2002 ). By the same token, the combination of parts B, E, and F of the LLAMA was coded as LLAMA.

  2. Tests that were combinations of several components (e.g. MLAT 4 + 5 in Robinson 1997) were recorded intact.

  3. With respect to the coding of aptitude components, phonetic coding ability was represented by MLAT 2, PLAB 5 and 6, and LLAMA D and E; language analytic ability by MLAT 4, PLAB 4, LLAMA F, and the Language Analysis Test by Ottó (2002: cited in *Sheen (2007)); rote memory by MLAT 5 and LLAMA B. The coded aptitude components also consisted of the subtests of the MLAT and the PLAB in other languages. Subtests that do not clearly measure a single cognitive ability were not further coded as aptitude components, such as MLAT 3, which Carroll said may measure both phonetic coding ability and English vocabulary.

Table 2:

Coding protocol for aptitude measures

| Codes in meta-analysis | Measures in primary studies |
|---|---|
| Original labels | MLAT, EMLAT, PLAB, LLAMA |
| Quasi-MLAT, Quasi-PLAB | MLAT and PLAB in other languages |
| Phonetic coding ability | MLAT 2, PLAB 5 & 6, LLAMA-D & LLAMA-E, corresponding subtests of MLAT and PLAB in other languages |
| Language analytic ability | MLAT 4, PLAB 4, LLAMA-F, Language Analysis Test Ottó (2002), corresponding subtests of MLAT and PLAB in other languages |
| Rote memory | MLAT 5, LLAMA-B, corresponding subtests of MLAT and PLAB in other languages |
| Original labels | MLAT 1, MLAT 3, MLAT 4 + 5 |

Note: MLAT: Modern Language Aptitude Test; EMLAT: MLAT for elementary students, also known as MLAT-E; MLAT 1: Number Learning; MLAT 2: Phonetic Script; MLAT 3: Spelling Clues; MLAT 4: Words in Sentences; MLAT 5: Paired Associates; PLAB: The Pimsleur Language Aptitude Battery; PLAB 4: Language Analysis; PLAB 5: Sound Discrimination; PLAB 6: Sound-Symbol Association; LLAMA B: Vocabulary Learning; LLAMA D: Phonetic Memory; LLAMA E: Sound-Symbol Correspondence; LLAMA F: Grammatical Inferencing.
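
The coding protocol in Table 2 can be illustrated as a simple lookup. The following Python sketch is only an illustration under stated assumptions: the function name `code_measure`, the `translated` flag, and the alias strings `"MLAT Short"` and `"LLAMA B+E+F"` are hypothetical labels introduced here, not the coding sheets actually used in this study.

```python
# Illustrative lookup for the coding protocol in Table 2.
# All label strings and names below are hypothetical conveniences.
COMPONENT_BY_SUBTEST = {
    "MLAT 2": "Phonetic coding ability",
    "PLAB 5": "Phonetic coding ability",
    "PLAB 6": "Phonetic coding ability",
    "LLAMA D": "Phonetic coding ability",
    "LLAMA E": "Phonetic coding ability",
    "MLAT 4": "Language analytic ability",
    "PLAB 4": "Language analytic ability",
    "LLAMA F": "Language analytic ability",
    "Language Analysis Test": "Language analytic ability",
    "MLAT 5": "Rote memory",
    "LLAMA B": "Rote memory",
}

# Shortened batteries that were coded under the name of the full battery.
ALIASES = {"MLAT Short": "MLAT", "LLAMA B+E+F": "LLAMA"}


def code_measure(label: str, translated: bool = False) -> str:
    """Return the meta-analytic code for an aptitude measure.

    `translated` marks a version of the test in another language.
    """
    label = ALIASES.get(label, label)
    # Subtests tapping a single ability are coded as aptitude components,
    # including the corresponding subtests of translated versions.
    if label in COMPONENT_BY_SUBTEST:
        return COMPONENT_BY_SUBTEST[label]
    # Translated full batteries are labeled Quasi-MLAT / Quasi-PLAB.
    if translated and label in ("MLAT", "PLAB"):
        return f"Quasi-{label}"
    # Everything else (e.g. MLAT 1, MLAT 3, MLAT 4 + 5) keeps its label.
    return label
```

Under this sketch, `code_measure("MLAT", translated=True)` yields `"Quasi-MLAT"`, whereas `code_measure("MLAT 3")` is returned unchanged because that subtest does not clearly measure a single cognitive ability.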


Learners’ age was entered as reported by primary researchers. In the event that information about the average age of a sample was unavailable, other information was used to infer the age of the participants, such as taking the median of a range. In the data set, there was a group of age-related studies, whose primary purpose was to verify the hypothesis that aptitude was only implicated in adult SLA. These studies provided two types of information about participants’ age: one related to the actual age of the learners when they participated in the study, which can be called ‘age of testing’, and the other to a range (e.g. ≥13) during which learners started to learn the L2, which can be called ‘age of onset’. The ‘age’ variable coded in this meta-analysis was age of testing, not age of onset.

In interactional studies, an instructional treatment was coded as explicit or implicit depending on whether it contained elements that drew learners’ attention to linguistic forms. Explicit treatments refer to any instruction that contained metalinguistic information (e.g. *Carpenter 2008) or information about the unacceptability of an L2 utterance (e.g. *Yilmaz 2013); these also include instruction intended to facilitate learners’ awareness of the target structure such as the rule-search condition in Robinson (*1997), where learners were required to find the grammar rule exemplified in the provided linguistic data, and the computerized recasts in Trofimovich et al. (*2007), which were provided following the formula: learner production + recast + form-focusing device (learner asked whether he/she noticed the difference between the recast and his/her own production). An instructional treatment was coded as implicit if there was a lack of a form-focusing device and learning was intended to be derived from exposure to linguistic exemplars only. These include purely comprehension-based instruction such as Lee (*2008) and Carpenter (*2008) and interaction-based instruction incorporating implicit corrective feedback (e.g. recasts) such as Li (*2011).

A related variable that overlaps with the explicit–implicit distinction is research context: whether a study was conducted in the laboratory or the classroom. Both explicit and implicit instructional treatments can be implemented in either a laboratory or classroom context. However, in general, laboratory treatments tend to be more explicit than classroom treatments because in the laboratory, instruction is individualized, the learner is less subject to distraction, and the intention of the instruction is more easily perceived.

Analysis

The basic unit of analysis is the Fisher’s z -score transformed from the correlation coefficients ( r ) extracted from each included study. Fisher’s z was then transformed back to r in reporting the results. Fisher’s z (and its variance) was used instead of r because it has better statistical properties such as normal distribution and stable variance. Fisher’s z was calculated via the following formula:  
$$z = 0.5 \times \ln\left(\frac{1 + r}{1 - r}\right)$$
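
For readers who wish to reproduce the transformation, a minimal Python sketch follows. The function names are hypothetical, and the sampling variance 1/(n − 3) is the standard large-sample result for Fisher’s z rather than a value stated in this article.

```python
import math


def fisher_z(r: float) -> float:
    """Transform a correlation r (with -1 < r < 1) into Fisher's z."""
    return 0.5 * math.log((1 + r) / (1 - r))


def fisher_z_variance(n: int) -> float:
    """Approximate sampling variance of Fisher's z for n learners (n > 3).

    Assumption: the conventional 1 / (n - 3) approximation.
    """
    return 1.0 / (n - 3)


def z_to_r(z: float) -> float:
    """Back-transform Fisher's z into r for reporting (inverse of fisher_z)."""
    return math.tanh(z)
```

Because `z_to_r` is the exact inverse of `fisher_z`, a value such as r = .31 survives the round trip unchanged; the transformation matters only for the intermediate averaging and variance estimation.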
As with most meta-analyses, this study followed the typical procedure of conducting an overall analysis for the construct under investigation, which might be referred to as effect size aggregation, followed by moderator analysis that aims to identify factors that account for consistent variation of the effect sizes within the sampled population. A random-effects model was used for effect size aggregation and a mixed-effects model for moderator analysis. Only categories with more than three effect sizes were subjected to effect size aggregation and moderator analysis; a regression analysis was not pursued if there were fewer than five cases for the predictor variable. In effect size aggregation, a within-group Q value (Q_w) was calculated for each group of effect sizes as a measure of homogeneity. For moderator analysis, between-group Q tests (Q_b) were utilized for categorical variables and meta-regression analyses were performed for age, a continuous variable.
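
The aggregation step can be sketched as follows. This is an illustration under explicit assumptions, not the procedure or software used in this study: it uses the DerSimonian-Laird estimator of between-study variance (the article does not name an estimator), treats every effect size as independent, and takes the variance of Fisher’s z as 1/(n − 3). The function name and the dictionary keys are hypothetical.

```python
import math


def random_effects_pool(rs, ns):
    """Pool correlations under a random-effects model (DerSimonian-Laird).

    rs: correlation coefficients; ns: matching sample sizes (each > 3).
    Returns the pooled r, its 95% CI, the homogeneity statistic Q, its
    degrees of freedom, and the between-study variance tau2.
    """
    zs = [math.atanh(r) for r in rs]      # Fisher's z for each effect size
    vs = [1.0 / (n - 3) for n in ns]      # assumed within-study variances
    w = [1.0 / v for v in vs]             # fixed-effect weights
    sw = sum(w)
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sw

    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    df = len(zs) - 1

    # DerSimonian-Laird estimate of tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(wi * zi for wi, zi in zip(w_re, zs)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lo, hi = z_re - 1.96 * se, z_re + 1.96 * se

    # Back-transform the mean and CI bounds to r for reporting.
    return {
        "r": math.tanh(z_re),
        "ci": (math.tanh(lo), math.tanh(hi)),
        "Q": q,
        "df": df,
        "tau2": tau2,
    }
```

A call such as `random_effects_pool([0.25, 0.40, 0.31], [60, 45, 120])`, with made-up inputs, returns the pooled correlation and its confidence interval on the r scale. The sketch covers aggregation and the homogeneity statistic only; the between-group tests and meta-regression are not shown.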

RESULTS

Study characteristics

A total of 33 study reports were identified that examined the role of language aptitude in second language grammar acquisition ( Table 3 ). (Due to limited space, information regarding the methodological and substantive aspects of the studies was included as Supplementary Data available at Applied Linguistics online). Among them, 17 were predictive studies and 16 were interactional studies; 23 were journal articles or book chapters and 10 were Ph.D. dissertations. To explore publication bias, that is, whether there were studies that had not been retrieved and would have affected the results, a fail-safe N was calculated, z = 12.12, p < .00, N = 1,228. This suggests that 1,228 studies are needed to nullify the results based on the current data set, and therefore it is safe to state that the construct under investigation is well represented by the included studies.
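
The logic of a fail-safe N can be sketched as below. The article does not state which formula was used, so the sketch assumes Rosenthal’s approach with a one-tailed critical value of 1.645, and it converts each correlation to a standard normal deviate via Fisher’s z. The function name and the inputs are hypothetical, and the sketch is not intended to reproduce the values reported above.

```python
import math


def failsafe_n(rs, ns, z_crit=1.645):
    """Rosenthal-style fail-safe N (an assumed method, see lead-in).

    rs: correlation coefficients; ns: matching sample sizes (each > 3).
    Returns (combined_z, n_fs): the Stouffer combined z and the number of
    unretrieved null-result studies needed to make it nonsignificant.
    """
    # Per-study deviate: Fisher's z divided by its standard error.
    z_scores = [math.atanh(r) * math.sqrt(n - 3) for r, n in zip(rs, ns)]
    k = len(z_scores)
    sum_z = sum(z_scores)
    combined_z = sum_z / math.sqrt(k)
    # N_fs = (sum of z)^2 / z_crit^2 - k, floored at zero.
    n_fs = max(0.0, sum_z ** 2 / z_crit ** 2 - k)
    return combined_z, n_fs
```

The larger the fail-safe N relative to the number of retrieved studies, the less plausible it is that unretrieved null results would overturn the overall finding.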

Table 3:

Methodological features of included studies

Methodological features Predictive Interactional Total 
Number of studies 17 16 33 
Publication type 
    Journal 18 23 
    Dissertation 10 
Number of effect sizes 71 238 309 
Number of participants 1,892 1,214 3,106 
Age of participants 
    Mean 28.4 20.4 24.4 
    Range 14–60 10–30.4  
    Median 24.0 20.9  
    Standard deviation 12.8 4.3  
Aptitude measures 
 Holistic aptitude 
        MLAT 
        Quasi-MLAT 
        EMLAT 
        LLAMA 
 Aptitude components 
        Language analytic ability 12 12 24 
        Phonetic coding 
        Rote memory 12 
 Others 
        Spelling clues 
        Number learning 
        Verbal aptitude 
        MLAT 4 + MLAT 5 
Instructional setting 
    Elementary 
    High school 
    University 11 16 
    Naturalistic 
    Immersion 
    Not reported 
Treatment type 
    Explicit N/A 16 16 
    Implicit N/A 12 12 
Research context 
    Laboratory N/A 11 11 
    Classroom N/A 

The included studies contributed 309 effect sizes and involved 3,106 second language learners. As can be seen from Figure 1, there has been a rapid increase in the amount of empirical research since 2007, in both the predictive and interactional domains, especially the latter. One distinctive trend is that early aptitude research was mainly predictive, which is evidenced by the fact that before 1995 there was only one interactional study [*Hauptman 1971, which investigated how aptitude related to the effects of structural (inductive) and situational (deductive) approaches in an audiolingual setting]. The recent growth of predictive research was largely attributable to the burgeoning of age-related studies (five out of the eight study reports that have appeared since 2007 relate to age effects in SLA).

Figure 1:

A chronicle of aptitude research

As Table 3 shows, the mean age of the participants of the studies is 24.4 years, but the participants in the predictive studies were on average eight years older than those in interactional studies, with a range of 16–60 years. Further examination of the data showed that by and large the gap was attributable to the subset of age-related studies that typically involved older learners, with an average age of 36.8 years. Whereas over one-third of the predictive studies were age-related studies that investigated (older) naturalistic learners, the majority of the interactional studies were conducted with university students, and there were none with naturalistic learners.

With regard to aptitude measures, full-length tests generating overall aptitude scores were administered in 17 primary studies, 14 of which used the MLAT, Quasi-MLAT(s), or the EMLAT, and 3 used the LLAMA. Among the three aptitude components, language analytic ability was the most frequently studied, followed in sequence by rote memory and phonetic coding; this is not surprising given that language analytic ability was postulated to be critical for grammar learning. Spelling Clues (MLAT 3), Number Learning (MLAT 1), and Verbal Aptitude (*DeKeyser et al. 2010) also appeared in the primary research, but since they did not fit in with the three components, they were placed in a miscellaneous category.

Among the 16 interactional studies in the data set, all included explicit treatments and 11 included implicit treatments. Nine of the 16 studies examined the comparative effects of explicit and implicit treatments, and how the effects were constrained by aptitude. Eleven studies were conducted in the laboratory and five in classroom settings. In 9 of the 11 laboratory studies, instructional treatments were provided via computer, and only 2 involved face-to-face tutorials.

Meta-analytic results

Effect size aggregation

The first set of analyses examined the overall associations between all aptitude measures (hybrid measures including complete aptitude tests and subtests) and second language grammar acquisition reported in all the included studies (i.e. both predictive and interactional studies); analyses were also performed to investigate the predictive power of overall aptitude scores (based on complete aptitude tests including all subtests) in comparison with different aptitude components and of the different aptitude tests in the data set. The following results were obtained (Table 4). First, an overall medium effect size (based on Cohen’s benchmarks) was found for all included studies, r = .31, p < .01, for predictive studies, r = .30, p < .01, and for interactional studies, r = .32, p < .01. The relatively narrow confidence intervals of the three mean effect sizes suggest that the results are robust. Secondly, overall aptitude scores demonstrated similar predictive power to language analytic ability, and both showed larger effect sizes than phonetic coding and rote memory. Post hoc pairwise contrasts showed that overall aptitude scores had significantly higher correlations than rote memory, Qb = 6.21, p = .01, and so did language analytic ability, Qb = 7.23, p = .01. Spelling clues showed a moderate association with grammar learning, r = .29, p < .01, although it is not clear what this subtest of the MLAT exactly measures. Thirdly, the MLAT showed higher predictive validity, r = .40, p < .01, than its equivalent versions in other languages (Quasi-MLAT), r = .21, p < .01, and LLAMA, r = .34, p < .01. The difference between the mean effect size for the MLAT and Quasi-MLATs was found to be significant, Qb = 7.22, p = .01.

Table 4:

Overall results

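| | k | r | p | 95% CI lower | 95% CI upper | Homogeneity Qw | Homogeneity p |
|---|---|---|---|---|---|---|---|
| Overall | | | | | | | |
| All studies | 33 | .31 | .00 | .25 | .36 | 50.71 | .02 |
| Predictive | 22 | .30 | .00 | .24 | .36 | 33.15 | .04 |
| Interactional | 16 | .32 | .00 | .23 | .41 | 17.25 | .30 |
| Aptitude and aptitude components | | | | | | | |
| Overall aptitude | 15 | .34 | .00 | .26 | .41 | 16.25 | .28 |
| Language analytic ability | 24 | .35 | .00 | .27 | .43 | 57.08 | .00 |
| Phonetic coding ability | | .20 | .03 | .02 | .38 | 15.72 | .00 |
| Rote memory | 12 | .19 | .00 | .12 | .28 | 17.17 | .10 |
| Spelling clues | | .29 | .00 | .20 | .37 | 7.48 | .38 |
| Different aptitude measures | | | | | | | |
| MLAT | | .40 | .00 | .31 | .50 | 5.68 | .46 |
| Quasi-MLAT | | .21 | .00 | .09 | .33 | 1.55 | .67 |
| LLAMA | | .34 | .00 | .14 | .52 | 1.35 | .51 |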

CI: confidence interval.
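The post hoc pairwise contrasts reported in this section compare two pooled effect sizes with a between-group Q test. A minimal sketch of such a contrast is given below, assuming the comparison is made on Fisher z values and that standard errors can be backed out of 95% confidence intervals; the helper names and the subgroup values are hypothetical, and a full mixed-effects analysis would also estimate between-study variance within each group.

```python
import math
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Back out the standard error of Fisher z from a CI reported for r."""
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.atanh(upper) - math.atanh(lower)) / (2 * crit)


def q_between_two(r1, se1, r2, se2):
    """Between-group Q (df = 1) for two pooled correlations."""
    diff = math.atanh(r1) - math.atanh(r2)
    q_b = diff ** 2 / (se1 ** 2 + se2 ** 2)
    # Upper tail of a chi-square with 1 df, via the normal distribution
    p = math.erfc(math.sqrt(q_b / 2))
    return q_b, p


# Hypothetical subgroup summaries, for illustration only
se_a = se_from_ci(0.35, 0.54)
se_b = se_from_ci(0.08, 0.31)
q_b, p = q_between_two(0.45, se_a, 0.20, se_b)
print(f"Q_b = {q_b:.2f}, p = {p:.3f}")
```

With only two groups, Q_b reduces to the squared difference between the two pooled values divided by the sum of their squared standard errors, which is why a single degree of freedom applies.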

Moderator analysis

A moderator analysis was performed separately for predictive and interactional studies because of their distinct theoretical bases and research designs. For predictive studies, the two examined moderators were instructional setting and age. Under instructional setting, effect sizes were aggregated for ‘high school’, ‘university’, and ‘naturalistic’. As Table 5 shows, high school students consistently showed larger effect sizes than university students. A post hoc Q test showed that the difference between the effect sizes for high school and university students was significant for hybrid aptitude measures (including complete aptitude tests and subtests), Qb = 9.24, p = .00; the same held true for language analytic ability, Qb = 8.54, p = .00, and rote memory, Qb = 6.20, p = .01. Furthermore, the mean effect size associated with naturalistic learning was also significant, which seemed to suggest that aptitude was drawn on in untutored contexts as well as in language classes.

Table 5:

Moderator analysis for predictive studies—instructional setting

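| | k | r | 95% CI lower | 95% CI upper | p | Qb | p (Qb) |
|---|---|---|---|---|---|---|---|
| Hybrid aptitude measures a | | | | | | 9.24 | .00 |
| High school | | .40 | .31 | .51 | .00 | | |
| University | | .21 | .13 | .28 | .00 | | |
| Naturalistic | | .27 | .12 | .41 | .00 | | |
| Language analytic ability | | | | | | 8.54 | .00 |
| High school | | .50 | .37 | .61 | .00 | | |
| University | | .26 | .16 | .35 | .00 | | |
| Rote memory | | | | | | 6.2 | .01 |
| High school | | .32 | .18 | .45 | .00 | | |
| University | | .12 | .03 | .20 | .00 | | |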

CI: confidence interval.

a All aptitude measures including complete aptitude tests and subtests.

A meta-regression analysis was conducted on the effect of age on aptitude-achievement associations among predictive studies. A decision was made to exclude naturalistic learners and only include instructed learners given the heterogeneous learning experiences and wide age range of the former. Mixed-effects meta-regression analyses using the unrestricted maximum likelihood method showed that the regression coefficients for the predictor ‘age’ were all in the negative direction for hybrid aptitude measures, overall aptitude scores, language analytic ability, and rote memory. The coefficients for overall aptitude scores and rote memory were statistically significant. These results suggest that in instructed settings, as far as ultimate outcome in grammar learning is concerned, the older a learner is, the less likely he/she is to draw on language aptitude. The results are shown in Table 6 and visually displayed in Figures 2–5.

Table 6:

Moderator analysis for predictive studies—age

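| | Q | B | 95% CI lower | 95% CI upper | p |
|---|---|---|---|---|---|
| Hybrid aptitude measures a | 1.57 | −.01 | −.03 | .01 | .21 |
| Overall aptitude scores | 5.19 | −.03 | −.06 | −.00 | .02 |
| Language analytic ability | .90 | −.01 | −.04 | .01 | .34 |
| Rote memory | 8.57 | −.04 | −.07 | −.01 | .00 |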

CI: confidence interval.

a All aptitude measures including complete aptitude tests and subtests.
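To make the meta-regression on age more concrete, the sketch below fits a weighted least-squares line of Fisher z effect sizes on mean age. It is a simplification and not the authors' analysis: the residual between-study variance `tau2` is supplied by the caller rather than estimated by maximum likelihood, the function name `meta_regression` is hypothetical, and the ages, correlations, and sample sizes are invented for illustration.

```python
import math


def meta_regression(xs, zs, vs, tau2=0.0):
    """Weighted least-squares meta-regression with one predictor.

    xs: moderator values (e.g. mean age per study)
    zs: Fisher z effect sizes; vs: their sampling variances
    tau2: residual between-study variance, supplied by the caller here
          (an assumption; a full mixed-effects fit would estimate it).
    """
    w = [1.0 / (v + tau2) for v in vs]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    z_bar = sum(wi * zi for wi, zi in zip(w, zs)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    sxz = sum(wi * (xi - x_bar) * (zi - z_bar)
              for wi, xi, zi in zip(w, xs, zs))

    slope = sxz / sxx                      # regression coefficient B
    intercept = z_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    q_model = (slope / se_slope) ** 2      # chi-square with 1 df
    p = math.erfc(math.sqrt(q_model / 2))
    return {"B": slope, "intercept": intercept, "se": se_slope,
            "Q_model": q_model, "p": p}


# Hypothetical study-level data, for illustration only
ages = [14, 16, 19, 21, 24, 30]
rs = [0.50, 0.45, 0.35, 0.30, 0.25, 0.15]
ns = [50, 60, 80, 70, 90, 40]
zs = [math.atanh(r) for r in rs]
vs = [1.0 / (n - 3) for n in ns]
fit = meta_regression(ages, zs, vs, tau2=0.01)
print(fit)
```

A negative B in such a fit corresponds to the pattern described for the predictive studies: the aptitude-outcome correlation shrinks as mean age increases.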

Figure 2:

Age effects in predictive studies: all aptitude measures. Note: The horizontal axis represents age and the vertical axis effect sizes.

Figure 3:

Age effects in predictive studies: overall aptitude scores. Note: The horizontal axis represents age and the vertical axis effect sizes.

Figure 4:

Age effects in predictive studies: language analytic ability. Note: The horizontal axis represents age and the vertical axis effect sizes.

Figure 5:

Age effects in predictive studies: rote memory. Note: The horizontal axis represents age and the vertical axis effect sizes.

The moderator analyses for interactional studies investigated the impact of three potential mediating variables: explicitness/implicitness of instruction, research context (lab vs. class), and age. As can be seen from Table 7, when instructional treatments were divided into explicit and implicit, the former showed significantly higher correlations with aptitude than the latter for hybrid aptitude measures, Qb = 7.66, p = .01, and for language analytic ability, Qb = 5.53, p = .02; the difference for overall aptitude scores was near significant, Qb = 3.57, p = .06. What stood out is the result for rote memory: explicit (r = .28) and implicit (r = .22) instruction seemed to draw on this component in similar ways. Assuming that language aptitude, especially language analytic ability, is important for the learning of metalinguistic knowledge, a follow-up analysis was conducted for explicit treatments with and without metalinguistic information. However, no difference was found between them for hybrid aptitude measures or language analytic ability.

Table 7:

Moderator analysis for interactional studies—explicitness of instruction

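| | r | k | 95% CI lower | 95% CI upper | p | Qb | p (Qb) |
|---|---|---|---|---|---|---|---|
| Explicit vs. implicit | | | | | | | |
| Hybrid aptitude measures a | | | | | | 7.66 | .01 |
| Explicit | .44 | 15 | .34 | .52 | .00 | | |
| Implicit | .21 | 11 | .08 | .34 | .00 | | |
| Overall aptitude scores | | | | | | 3.57 | .06 |
| Explicit | .48 | | .35 | .59 | .00 | | |
| Implicit | .26 | | .06 | .44 | .01 | | |
| Language analytic ability | | | | | | 5.53 | .02 |
| Explicit | .40 | 11 | .28 | .51 | .00 | | |
| Implicit | .17 | | .01 | .32 | .04 | | |
| Rote memory | | | | | | .18 | .68 |
| Explicit | .28 | | .09 | .46 | .01 | | |
| Implicit | .22 | | −.03 | .44 | .08 | | |
| Explicit: with vs. without metalinguistic information | | | | | | | |
| Hybrid aptitude measures a | | | | | | .00 | .99 |
| With metalinguistic information | .42 | 12 | .30 | .53 | .00 | | |
| Without metalinguistic information | .42 | | .28 | .55 | .00 | | |
| Language analytic ability | | | | | | .32 | .57 |
| With metalinguistic information | .38 | | .23 | .50 | .00 | | |
| Without metalinguistic information | .44 | | .27 | .57 | .00 | | |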

CI: confidence interval.

a All aptitude measures including complete aptitude tests and subtests.

With respect to the context in which an instructional treatment was implemented, classroom studies showed smaller mean effect sizes than laboratory studies for both hybrid aptitude measures (r = .28, classroom; r = .38, laboratory) and language analytic ability (r = .23, classroom; r = .33, laboratory) (Table 8), although the differences did not reach the significance level on the Q tests. Furthermore, given the overlap between research context (lab vs. class) and explicitness of instruction (explicitness vs. implicitness), it was necessary to examine whether there was an interaction between the two variables, that is, whether explicit and implicit instruction show different relationships with aptitude in class and laboratory contexts. The results showed that such an interaction indeed existed: the gap between explicit and implicit instruction was substantially larger in classroom studies than in laboratory studies. In fact, the mean effect size for implicit instruction was nearly zero in classroom settings: r = .00, p = .99 for hybrid aptitude measures, and r = .04, p = .61 for language analytic ability. In stark contrast, explicit instruction showed medium to large correlations in classroom studies: r = .46, p < .01 for hybrid aptitude measures, and r = .39, p < .01 for language analytic ability. Post hoc Q tests confirmed that the differences between the two types of instruction in relation to aptitude were statistically significant. In laboratory contexts, however, none of the differences relating to the two instruction types were significant, although the effect sizes for explicit instruction remained larger than those for implicit instruction. Furthermore, unlike the classroom studies where the effectiveness of implicit instruction seemed to have no connection with aptitude, the correlations for implicit instruction were all significant in the laboratory.

Table 8:

Moderator analysis for interactional studies—research context × explicitness interaction

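| | r | k | 95% CI lower | 95% CI upper | p | Qb | p (Qb) |
|---|---|---|---|---|---|---|---|
| Class vs. lab | | | | | | | |
| Hybrid a | | | | | | .86 | .35 |
| Class | .28 | | .08 | .46 | .01 | | |
| Lab | .38 | 11 | .28 | .47 | .00 | | |
| Language analytic ability | | | | | | 1.26 | .26 |
| Class | .23 | | .07 | .37 | .00 | | |
| Lab | .33 | | .22 | .44 | .00 | | |
| Class | | | | | | | |
| Hybrid | | | | | | 8.3 | .00 |
| Explicit | .46 | | .23 | .46 | .00 | | |
| Implicit | .00 | 3 b | −.20 | .21 | .99 | | |
| Language analytic ability | | | | | | 7.03 | .01 |
| Explicit | .39 | | .17 | .57 | .00 | | |
| Implicit | .04 | | −.10 | .17 | .61 | | |
| Lab | | | | | | | |
| Hybrid | | | | | | 1.8 | .18 |
| Explicit | .42 | 11 | .31 | .53 | .00 | | |
| Implicit | .31 | | .19 | .43 | .00 | | |
| Overall | | | | | | 2.25 | .13 |
| Explicit | .45 | | .29 | .59 | .00 | | |
| Implicit | .26 | | .07 | .44 | .01 | | |
| Language analytic ability | | | | | | .29 | .59 |
| Explicit | .38 | | .24 | .50 | .00 | | |
| Implicit | .31 | | .11 | .49 | .00 | | |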

CI: confidence interval.

a All aptitude measures including complete aptitude tests and subtests.

b Analysis was not performed for variables with less than three effect sizes.

As with predictive studies, the effect of age in interactional studies was examined through meta-regression analysis. It was found that, overall, age was a positive predictor for the aptitude-instruction correlations; the result was not significant for hybrid aptitude measures, Qmodel = .54, B = .01, p = .46, but it was for language analytic ability, Qmodel = 5.3, B = .03, p < .05. The results indicated that older learners were more likely than younger learners to take advantage of their language aptitude in benefiting from short-term, intensive instruction. Figures 6 and 7 show the scatter plots and trend lines of how the effect sizes varied as a function of age.

Figure 6:

Age effects in interactional studies: hybrid aptitude measures. All aptitude measures include complete aptitude tests and subtests. Note: The horizontal axis represents age and the vertical axis effect sizes.

Figure 7:

Age effects in interactional studies: language analytic ability. Note: The horizontal axis represents age and the vertical axis effect sizes.

DISCUSSION

This meta-analysis sought to tackle the conundrums surrounding the role of language aptitude in second language grammar acquisition by synthesizing the results of all related empirical studies. It was found that (i) aptitude showed an overall medium effect size in both predictive and interactional research, (ii) overall aptitude scores and language analytic ability were more predictive of grammar learning than phonetic coding ability and rote memory, (iii) high school students were more likely to draw on their language aptitude than students in university language programs, (iv) the effectiveness of explicit instruction was more related to aptitude than that of implicit instruction (and the gap was larger in classroom settings than in laboratory settings), and (v) while younger learners were more reliant on aptitude than older learners in predictive studies, the opposite was true in interactional studies. Interpretations for these findings will now be proposed by drawing on both the empirical evidence and related theoretical claims.

To begin with, the overall medium effect size found in this study, r = .31, is smaller than the estimated range, r = .4–.6, which was first reported by Carroll (1981) and subsequently cited or followed by other reviewers. The finding suggests that the commonly accepted estimate is somewhat inaccurate and that the role of language aptitude may have been overstated. Given that SLA is subject to a multitude of learner-internal and learner-external factors, the amount of variance (around 10 per cent) accounted for by aptitude as one individual difference variable should by no means be taken as small. However, the smaller-than-expected effect size is arguably also attributable to the joint influence of two methodological factors. One relates to the heterogeneity of aptitude measures used in the primary studies, a concern that is borne out by the fact that the predictive validity (r = .21) of the so-called quasi-MLATs (tests modeled on the MLAT) was significantly lower than that of the MLAT (r = .40), which served as the basis for Carroll’s estimation. A second likely cause for the disparity is the difference in the learner populations. In Carroll’s validation studies, learners with any previous language learning experience were excluded (Carroll and Sapon 2002), a practice which was not followed in all of the subsequent aptitude research. Homogeneity of learner population seems to be a prerequisite for the effect of aptitude to become manifest because one assumption behind the concept of aptitude is that, other things being equal, learners with higher aptitude learn more and faster.
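The ‘around 10 per cent’ figure follows from squaring the correlation, since r squared gives the proportion of variance two variables share. The short Python sketch below applies the same arithmetic to the pooled estimate and to the two ends of Carroll’s range; the function name `variance_explained` is a hypothetical label.

```python
def variance_explained(r):
    """Proportion of variance shared by two variables correlated at r."""
    return r ** 2


# Pooled estimate from this study and the bounds of Carroll's range
for label, r in [("pooled estimate", 0.31),
                 ("Carroll's lower bound", 0.40),
                 ("Carroll's upper bound", 0.60)]:
    print(f"{label}: r = {r:.2f}, r^2 = {variance_explained(r):.1%}")
```

On this arithmetic, the earlier estimate of r = .4–.6 would correspond to roughly 16 to 36 per cent of variance, against roughly 10 per cent for the pooled value reported here.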

The stronger association for overall aptitude scores and language analytic ability than for phonetic coding ability and rote memory is not surprising. This is because overall aptitude has greater predictive power than discrete components, and language analytic ability is integral to grammar learning, the criterion variable of this meta-analysis. The differential predictive validities of overall aptitude scores and different aptitude components suggest a need for aptitude researchers to clarify or at least be transparent about the construct under investigation. While Carroll (1981) preferred to discuss aptitude as a holistic construct, other researchers referred to a variety of cognitive abilities when addressing aptitude. For instance, Krashen (1981) treated aptitude as only consisting of language analytic ability, dismissed phonetic coding ability as irrelevant, and did not even mention rote memory. De Graaff (*1997) used a test composed of a Dutch version of MLATs 4 and 5 together with a test that measured the ability to infer word meaning from context; the construct behind this hybrid measure was referred to as aptitude.

One striking finding for the predictive studies is that the role of aptitude varied across different learning settings: it had a larger effect in high school language classes than in university language classes. This may be partly because language aptitude measured through traditional aptitude tests such as the MLAT was more implicated in initial than later stages of SLA. Compared with university students, the high school language learners had less exposure to the target language, and therefore the effect of aptitude was more evident. Carroll (1990) acknowledged that his notion of aptitude concerns the rate of learning a language ‘from scratch’ (Carroll 1990: 24) and that there was a need for research on abilities relevant to more advanced stages of learning. Robinson (2005) also hypothesized that traditional aptitude measures were only predictive of beginning SLA, not high attainment levels. An alternative interpretation suggested by a reviewer is that, compared with high school students, students in university language programs are a more ‘select’ population with less variation in aptitude (and probably higher mean aptitude scores), leading to the lower correlations between aptitude and success for this group. However, these explanations constitute theoretical postulates, hypotheses that are subject to empirical verification in primary research.
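The reviewer’s ‘select population’ interpretation can be illustrated with the textbook formula for direct range restriction, which shows how a correlation shrinks when the spread of the predictor is reduced. The sketch below is illustrative only: the formula assumes linearity and homoscedasticity, the function name `restricted_r` is hypothetical, and the starting correlation and restriction ratios are invented rather than estimated from the included studies.

```python
import math


def restricted_r(r_full, u):
    """Correlation expected after direct range restriction on the predictor.

    r_full: correlation in the unrestricted population
    u: ratio of restricted to unrestricted predictor SD (0 < u <= 1)
    Assumes linearity and homoscedasticity (standard textbook formula).
    """
    return (r_full * u) / math.sqrt(1 - r_full ** 2 + (r_full * u) ** 2)


# Hypothetical values, for illustration only
for u in (1.0, 0.8, 0.6, 0.4):
    print(f"u = {u:.1f}: r = {restricted_r(0.40, u):.2f}")
```

The smaller the ratio u, the lower the observed correlation, even though the underlying relationship between aptitude and outcome is unchanged.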

With regard to the mediating effect of setting, a significant effect size was also obtained for the age-related studies that were believed to examine ‘naturalistic acquisition’ through exposure to the target language in ‘non-tutored contexts’ (*DeKeyser 2000: 518). However, the finding cannot be taken as proof of a significant link between aptitude and naturalistic learning because of uncertainty regarding the nature of the learners’ language learning experience. While it is true that the age-related studies involved learners whose length of residence in the host country was at least eight years (*DeKeyser et al. 2010), it is not clear whether or how much formal language instruction they had received, which may have contributed to the significant results. In light of the difficulty involved in identifying completely naturalistic learners, the question of whether naturalistic L2 learning is related to language aptitude will probably remain open for the foreseeable future.

Turning to the interactional studies, the finding that language aptitude was more likely to be drawn upon in explicit than implicit instruction sheds light on the long-standing controversy over whether traditional aptitude measures tap abilities that are fundamental only to conscious learning (Krashen 1981), which most likely results from explicit instruction. The finding also supports Dörnyei and Skehan’s (2003) claim that aptitude ‘presupposes a requirement that there is a focus on form’ (Dörnyei and Skehan 2003: 600). Carroll’s (1981) comments regarding the scope of his conceptualization of aptitude are also suggestive; he pointed out that aptitude only applies to ‘deliberate’ learning in instructed settings and it is not certain how it interfaces with learning in milieus without formal instruction (Carroll 1981: 83). The sensitivity of traditional aptitude measures to conscious learning also has its origin in the instructional context where aptitude measures were validated and in the cognitive processes involved in the tasks included in the measures. It is well known that the MLAT, the most authoritative aptitude test, was developed in the traditional language classes that emphasized effortful, conscious learning.

A related finding that is seemingly at odds with the conclusion that aptitude is mainly involved in conscious learning is that the effect size associated with implicit instruction was also significant. However, subsequent analyses showed that the effect sizes for implicit instruction differed between laboratory and classroom contexts. While there was essentially no correlation between aptitude and implicit classroom treatments, in laboratory settings the correlations for implicit treatments were all significant and not significantly different from those for explicit treatments. This is likely because (i) linguistic forms are more salient in the laboratory than in the classroom, and (ii) the implicit instruction per se was likely not as implicit as it was intended to be. In Robinson (*2002), for example, the learners in the implicit condition were asked to memorize some language material, which resembles a working memory task that requires conscious processing.

It is also worth pointing out that while implicit instruction is more likely to lead to implicit learning, it may also contribute to explicit learning when the learner engages in conscious processing of available linguistic data; this is especially true of adult learners who are educated and have some formal language learning experience. The learners involved in the instructional treatments in the interactional studies were mostly students enrolled in language classes and had at least some amount of form-focused instruction. Therefore, the possibility of the learners drawing on explicit processes cannot be ruled out even when the instruction was implicit and was purported to not direct the learners’ attention to linguistic forms.

The opposing effects of age in predictive and interactional studies are difficult to account for. As to the negative effect of age in predictive research, a reasonable speculation seems to be that there is an overlap between ‘age’ and ‘instructional setting’, that is, younger learners happened to be high school learners and older learners were coincidentally university students. As previously explained, aptitude was more sensitive to the preliminary stages of L2 learning by high school learners than the higher attainment levels of university students. With respect to the positive effect of age in mediating the relationships between language analytic ability and L2 grammar learning in interactional studies, one caveat is that 10 out of the 12 studies were conducted with learners between 20 and 30 years old, a relatively homogeneous sample that was not ideal for the moderator analysis. It would seem that among adult learners, the effect of aptitude was more evident for older learners when receiving short-term, intensive instruction under experimental conditions where distracting variables were strictly controlled. Perhaps older learners who reached more advanced stages of L2 learning had higher motivation than younger learners, which may have led motivation to explain less of the variance, thus increasing the amount of variance explained by aptitude. In other words, the role of aptitude became more prominent when motivation became less important in differentiating learner outcomes. Of course, how aptitude interfaces with motivation and other individual difference variables in affecting learning outcome is an empirical question awaiting further research.
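The speculation about motivation can be made concrete with a simple variance decomposition. The sketch below is purely illustrative and is not an analysis performed in this study: it assumes a linear model in which the learning outcome is a weighted sum of aptitude, motivation, and unexplained noise, that the three are mutually independent, and that all weights and standard deviations (the parameters of the hypothetical function `aptitude_outcome_r`) are invented for the example.

```python
import math


def aptitude_outcome_r(b_apt: float, b_mot: float,
                       sd_apt: float, sd_mot: float,
                       sd_noise: float) -> float:
    """Correlation between aptitude and outcome under a linear model.

    Assumed (hypothetical) model:
        outcome = b_apt * aptitude + b_mot * motivation + noise
    with aptitude, motivation, and noise mutually independent.
    """
    signal = b_apt * sd_apt
    total_var = signal**2 + (b_mot * sd_mot) ** 2 + sd_noise**2
    if total_var <= 0.0:
        raise ValueError("outcome must have positive variance")
    return signal / math.sqrt(total_var)


if __name__ == "__main__":
    # Hypothetical groups: identical weights, but learners in the second
    # group are uniformly highly motivated, so motivation varies little.
    mixed = aptitude_outcome_r(1.0, 1.0, 1.0, 1.0, 1.0)
    uniform = aptitude_outcome_r(1.0, 1.0, 1.0, 0.2, 1.0)
    print(f"motivation varies widely: r = {mixed:.3f}")
    print(f"motivation varies little: r = {uniform:.3f}")
```

When motivation varies little across learners, it contributes less to the variance in outcomes, and the same aptitude effect accounts for a larger share of that variance. The sketch shows only that the speculation is arithmetically coherent under these assumptions, not that it holds for the learners in the primary studies.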

CONCLUSION

This meta-analysis showed that the importance of aptitude has been somewhat exaggerated, that it is predictive of initial L2 grammatical competence and less so of later stages of learning, and that it is a conscious construct that affects learning outcome in explicit conditions. The meta-analysis also showed that there are issues in need of immediate attention from aptitude researchers. First, the construct of aptitude needs to be theorized, clarified, and unified. There has been confusion about what aptitude consists of, and understanding of the concept has been based on what is tested by the MLAT or similar tests. The MLAT was validated with a large number of L2 learners and therefore achieved strong predictive validity, but it lacks construct validity. A second cause of concern relates to the criterion variable. In the primary studies, L2 grammar attainment was measured using a variety of tests such as grammaticality judgment tests, error correction, reading comprehension, and written production, to name only a few. Ellis and colleagues (2009) conducted a series of studies demonstrating that variation in test format leads to change in the type of knowledge tapped by the test, which in turn affects the results relating to the construct under investigation. For example, learners’ judgments about grammatical items likely reflect their implicit knowledge about the target language, whereas responses to ungrammatical items represent learners’ explicit knowledge. The strength or presence/absence of association between aptitude and L2 learning may to a large extent be a function of how the learning outcome is measured.

NOTES

1 References with an asterisk were also included in the meta-analysis and are listed in the Supplementary Data available at Applied Linguistics online.
2 See the 2002 edition of the manual for the test.
3 The MLAT was published in 1959.

REFERENCES

Bley-Vroman, R. 1990. The logical problem of foreign language learning. Linguistic Analysis 20: 3-49.
Carroll, J. 1963. A model of school learning. Teachers College Record 64: 723-33.
Carroll, J. 1981. Twenty-five years of research on foreign language aptitude. In K. Diller (ed.): Individual Differences and Universals in Language Learning Aptitude. Newbury House, pp. 83-118.
Carroll, J. 1990. Cognitive abilities in foreign language aptitude: Then and now. In T. Parry and C. Stansfield (eds): Language Aptitude Reconsidered. Prentice Hall, pp. 11-29.
Carroll, J. and S. Sapon. 2002. Manual for the MLAT. Second Language Testing, Inc.
Cochran, J., R. McCallum, and S. Bell. 2010. Three A's: How do attributions, attitudes, and aptitude contribute to foreign language learning? Foreign Language Annals 43: 566-82.
Cronbach, L. and R. Snow. 1977. Aptitudes and Instructional Methods. Irvington Publishers, Inc.
Dörnyei, Z. and P. Skehan. 2003. Individual differences in second language learning. In C. Doughty and M. Long (eds): Handbook of Second Language Acquisition. Blackwell, pp. 589-630.
Ellis, R., S. Loewen, C. Elder, R. Erlam, J. Philp, and H. Reinders. 2009. Implicit and Explicit Knowledge in Second Language Learning, Testing and Teaching. Multilingual Matters.
Krashen, S. 1981. Aptitude and attitude in relation to second language acquisition and learning. In K. C. Diller (ed.): Individual Differences and Universals in Language Learning Aptitude. Newbury House, pp. 155-75.
Li, S., N. Shintani, and R. Ellis. 2012. Doing meta-analysis in SLA: Practices, choices, and standards. Contemporary Foreign Language Studies 384: 1-17.
Robinson, P. 2005. Aptitude and second language acquisition. Annual Review of Applied Linguistics 25: 46-73.
Snow, R. E. 1991. Aptitude-treatment interaction as a framework for research on individual differences in psychotherapy. Journal of Consulting and Clinical Psychology 59(2): 205-16.

Supplementary data