Number transcoding (e.g., writing 29 when hearing “twenty-nine”) is one of the most basic numerical abilities required in daily life and is paramount for mathematics achievement. The aim of this study is to investigate psychometric properties of an Arabic number-writing task and its capacity to identify children with mathematics difficulties. We assessed 786 children (55% girls) from first to fourth grades, who were classified as children with mathematics difficulties (*n* = 103) or controls (*n* = 683). Although error rates were low, the task presented adequate internal consistency (0.91). Analyses revealed effective diagnostic accuracy in the first and second school grades (specificity of 0.67 and 0.76, and sensitivity of 0.70 and 0.88, respectively). Moreover, items tapping the understanding of place-value syntax were the most sensitive to mathematics achievement. Overall, we propose that number transcoding is a useful tool for the assessment of mathematics abilities in early elementary school.

## Introduction

Daily activities require the communication of numerical information, such as registering a telephone number or making mental calculations. Besides that, being able to manipulate numbers is one of the first steps in mathematical learning, which begins to be formally trained in kindergarten. Learning the Arabic notation is one of the main challenges faced by young children in the first years of school, especially because of its place–value syntax (Geary, 2000). A useful tool for investigating children's knowledge of numerical syntax is the number transcoding task. This task requires the conversion of numerical symbols between verbal and Arabic numerical notations (Deloche & Seron, 1987).

The verbal number system is composed of a lexicon that designates some numbers (e.g., *five*, *eleven*), the bases by which they are multiplied (e.g., “ty” in *seventy*; *hundred*), as well as a syntax that organizes these lexical units to represent any possible quantity. In turn, the Arabic number system possesses a lexicon of only 10 elements. Its basic syntactic principle is place–value, according to which the actual value of a digit is given by its position in the number.

The ADAPT model (A Developmental, Asemantic, and Procedural Transcoding) by Barrouillet, Camos, Perruchet, and Seron (2004) accounts for this conversion from the verbal–oral to the Arabic form by means of representing information in phonological short-term memory, and by lexical retrieval and rule application, which are driven by condition–action rules. When the lexical units in the verbal input match an Arabic form stored in long-term memory (LTM; e.g., *one* → 1, *fifteen* → 15), the output is directly retrieved. Otherwise, specific rules are triggered and operate recursively on the verbal string present in the input in order to build the correct output in the Arabic notation. The conditions that trigger a given rule can be either the class of the lexical primitives (unit, decade, hundreds, for example) or the presence of empty slots. There are eight different procedures triggered by the rules, such as “finding the positional value of the lexical primitive” (how many slots the frame must have) and “filling empty slot with 0,” among others. These rules are devoted to (i) retrieving information from LTM (called P1 rules, responsible for retrieving “3” from its verbal form), (ii) managing the size of digit chains (P2 and P3 rules; in “2003,” these rules create a frame of four slots), and (iii) filling these slots (if there are any empty slots, P4 rules will fill them with 0s).
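As a rough illustration of the frame-and-fill idea described above, the following Python sketch transcodes English number words of up to four digits. It is not the ADAPT rule set: the English lexicon, the function name `transcode`, and the simplified control flow are assumptions made for illustration only.

```python
# Toy frame-and-fill transcoder (illustrative only, NOT the ADAPT rule set).
# The English lexicon and the rule order below are assumptions of this sketch.

UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16,
         "seventeen": 17, "eighteen": 18, "nineteen": 19}
DECADES = {"twenty": 2, "thirty": 3, "forty": 4, "fifty": 5,
           "sixty": 6, "seventy": 7, "eighty": 8, "ninety": 9}
MULTIPLIERS = {"hundred": 2, "thousand": 3}  # slot index opened by each word


def transcode(verbal: str) -> str:
    """Convert number words (up to four digits) into an Arabic digit string."""
    tokens = [t for t in verbal.lower().replace("-", " ").split() if t != "and"]
    slots = {}      # slot index (0 = units) -> digit
    size = 0        # number of slots in the frame
    pending = None  # unit digit retrieved but not yet placed in a slot

    for token in tokens:
        if token in UNITS:                 # direct lexical retrieval
            pending = UNITS[token]
        elif token in TEENS:               # retrieved as a whole two-digit form
            slots[1], slots[0] = divmod(TEENS[token], 10)
            size = max(size, 2)
        elif token in DECADES:
            slots[1] = DECADES[token]
            size = max(size, 2)
        elif token in MULTIPLIERS:         # opens a larger frame
            position = MULTIPLIERS[token]
            slots[position] = pending if pending is not None else 1
            pending = None
            size = max(size, position + 1)
        else:
            raise ValueError(f"unknown number word: {token!r}")

    if pending is not None:                # a trailing unit goes in the last slot
        slots[0] = pending
        size = max(size, 1)

    # Any slot left empty is filled with 0.
    return "".join(str(slots.get(i, 0)) for i in range(size - 1, -1, -1))


print(transcode("two thousand three"))     # frame of four slots, two filled with 0
```

In this sketch, “two thousand three” opens a four-slot frame, places 2 and 3, and fills the two remaining slots with 0, which mirrors the “2003” example in the text.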

Concerning the development of number transcoding in children, evidence suggests that the numerical lexicon (Wynn, 1992) and the basic principles of numerical syntax (Barrouillet, Thevenot, & Fayol, 2010) are already acquired before elementary school. During the first school years, the development of number transcoding skills is highly influenced by numerical length (quantity of digits) and syntactic complexity (quantity of transcoding rules). By the beginning of the second grade, children already master the writing and reading of two-digit numbers but show major difficulties in the transcoding of three- and four-digit numbers (Camos, 2008; Moura et al., 2013; Power & Dal Martello, 1990, 1997; Seron, Deloche, & Noël, 1992). Most of these difficulties are due to the place–value syntax of these larger numbers. In third and fourth graders, difficulties in number transcoding are scarce and concentrated in three- and four-digit numbers with a more complex syntactic structure, such as the ones containing internal zeros (Moura et al., 2013; Sullivan, Macaruso, & Sokol, 1996). Therefore, numerical transcoding abilities for numbers up to four digits appear to be fully achieved in typically developing children after 3 years of formal education (Noël & Turconi, 1999).

Only a few studies have investigated the association between number transcoding and arithmetic achievement in school children. Examining first graders, Geary, Hoard, and Hamson (1999) and Geary, Hamson, and Hoard (2000) found a significant association between reading and writing of small numbers and formal mathematics achievement. Using a longitudinal approach, Moeller, Pixner, Zuber, Kaufmann, and Nuerk (2011) showed that, compared with working memory capacity and non-symbolic representations of numbers, the knowledge of place–value syntax at the end of first grade is the best predictor of mathematics achievement 2 years later. Furthermore, syntactic errors in an Arabic number-writing task and the decade-unit compatibility effect in a two-digit number comparison task (Nuerk, Weger, & Willmes, 2001) have proved to be particularly important for characterizing and predicting mathematics achievement in children (Moeller et al., 2011).

Difficulties in number transcoding have also been observed in children with developmental dyscalculia or mathematics learning difficulties. Studies suggest that writing and reading Arabic numbers pose relevant obstacles to younger children with mathematics learning difficulties aged ∼7 years (Geary et al., 1999, 2000). In turn, in older children (8 and 9 years old) these difficulties in number transcoding seem to be already overcome (Landerl, Bevan, & Butterworth, 2004) or restricted to planning times (van Loosbroek, Dirkx, Hulstijn, & Janssen, 2009). This issue was investigated in deeper detail by Moura and colleagues (2013), using more complex transcoding tasks containing numbers with up to four digits and with increasing syntactic complexity. Results revealed significant differences between children with mathematics difficulties and typical achievers, from the first to the fourth grades, in both Arabic number reading and writing, but with effect sizes decreasing with grade. Importantly, in middle elementary grades, children with mathematics difficulties showed higher error rates in numbers with higher syntactic complexity. Moreover, an analysis of the erroneous responses suggested that, in early elementary school, children with mathematics difficulties struggle both with the place–value syntax of Arabic numbers and with the acquisition of a numerical lexicon. In middle elementary school, the difficulties observed in children with mathematics difficulties were specific to the syntactic composition of Arabic numbers. The authors thus argued that, after the first school grades, children with mathematics difficulties are able to compensate for at least part of their number transcoding deficits.

In summary, the literature on number processing and mathematics difficulties indicates that transcoding tasks are sensitive to mathematics difficulties and have good predictive validity for them (Moeller et al., 2011; Moura et al., 2013). Moreover, their cognitive underpinnings have been well characterized by current information processing models (Barrouillet et al., 2004; Camos, 2008; Cipolotti & Butterworth, 1995). Nevertheless, the diagnostic properties of number transcoding remain largely unexplored. In view of the above, one may consider the usefulness of number transcoding tasks in the screening of mathematics difficulties.

To our knowledge, there is no standardized task for assessing number transcoding abilities in school children. Even though number transcoding tasks are widely used in the investigation of numerical abilities in children (Geary et al., 1999, 2000; Landerl et al., 2004; Moura et al., 2013) and adults suffering from neurological impairments (Deloche & Seron, 1982a, 1982b; Seron & Deloche, 1983, 1984), there are no reports on the reliability, validity, and item properties of such tasks. In general, studies using number transcoding tasks are conducted in the context of pure experimental neuropsychology, in which psychometric properties are presumed and never explicitly investigated.

The aim of this study is to determine reference values and psychometric properties of a verbal to Arabic transcoding task in Brazilian school-aged children. In the present study, we assessed number transcoding by means of a number dictation task, in which numbers are orally presented and the child should write them in their Arabic form. The task was previously designed in the context of a wider investigation of mathematical abilities in children (Haase et al., 2014; Lopes-Silva, Moura, Júlio-Costa, Haase, & Wood, 2014; Moura et al., 2013). We report normative parameters such as mean, range values, and percentiles for first to fourth grades obtained from a large sample of school children. In addition, the diagnostic accuracy of the number-writing task in the detection of children at risk for mathematical difficulties, as well as the influence of place–value syntax on children's achievement, was investigated.

## Method

### Participants

The sample consisted of children attending first to fourth grades in both public and private schools in the Brazilian cities of Belo Horizonte and Mariana. Data collection took place in 10 schools in Belo Horizonte (7 public) and 2 schools in Mariana (1 public). In Brazil, public schools are mostly attended by children of lower to middle socioeconomic status. All study procedures were approved by the local university ethics committee.

In total, 985 children (85% from public schools) were assessed using the following three tasks: the arithmetic and single-word spelling subtests of the Brazilian School Achievement Test (Teste do Desempenho Escolar, TDE; Stein, 1994), and the Arabic number-writing task. Testing was conducted in classrooms of 10–20 pupils. Children with mathematics difficulties were those with performance below the 25th percentile in the arithmetic subtest and performance above the 25th percentile in the spelling subtest. Children with performance above the 25th percentile in both TDE subtests were classified as controls.

## Instruments

### Number transcoding task

#### Arabic number-writing task

Children were instructed to write down the Arabic numerals that corresponded to the dictated numbers (e.g., “one hundred and fifty” → “150”). The task was composed of 28 items with one- to four-digit numbers. The use of three- and four-digit numbers was intended to avoid numbers with strong lexical entries. The three- and four-digit numbers were grouped into three categories according to their complexity level (low, moderate, and high complexity numbers), which were defined based exclusively on the number of algorithmic transcoding rules necessary to transcode each individual item. This criterion was based upon the ADAPT model, which relates item complexity to the number of algorithmic rules necessary to transcode a number (Barrouillet et al., 2004): the more transcoding steps must be performed, the more difficult an individual item is. The administration of the Arabic number-writing task lasted for about 5 min in individual assessments, while in collective assessments this duration increased to ∼10–15 min. There were no interruption criteria and no time limits, and one point was assigned to each correctly written number.

### General School Achievement

#### School achievement test

The TDE (Stein, 1994) is the most widely used standardized test of school achievement in Brazil. The TDE comprises three subtests tapping basic educational skills: single-word reading (which was not used in the present study), single-word spelling to dictation, and basic arithmetic operations. The word spelling subtest consists of 34 dictated words of increasing complexity. The examiner dictated a word, then a sentence containing this word, and finally repeated the word once more. One point was assigned to each correctly written word. The arithmetic subtest is composed of three simple oral word problems that require written responses (e.g., “John had nine stickers. He lost three. How many stickers does he have now?”) and 45 basic arithmetic calculations of increasing complexity that are presented and answered in writing (e.g., “1 + 1 = ?” and “(−4) × (−8) = ?”). One point was assigned to each correct calculation. Reliability coefficients (Cronbach's *α*) are ∼0.8 or higher. Children are instructed to work as much as they can, without time limits.

### Procedures

Testing was carried out in group sessions in the children's own schools, in a separate silent classroom. The authors administered the tasks along with psychology undergraduate students. Sessions took ∼1 h, starting with Arabic number writing, followed by the single-word spelling and arithmetic subtests of the TDE. In general, children understood the task instructions and could follow them adequately.

### Statistical analyses

Descriptive, reliability, internal consistency, and item analyses were carried out using R (R Development Core Team, 2011). Internal consistency was calculated using the Kuder-Richardson Formula 20 (KR-20), as items were coded as dichotomous variables. Receiver operating characteristic (ROC) analyses, mixed-ANOVA models, and Pearson's correlation analyses were carried out using SPSS version 20.0.
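For readers unfamiliar with KR-20, the coefficient can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function name `kr20`, the toy response matrix, and the use of the population variance of total scores (some implementations use the sample variance instead) are all assumptions.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    `responses` is a list of rows, one row per child and one column per item.
    Assumption of this sketch: population variance of the total scores.
    """
    n_children = len(responses)
    n_items = len(responses[0])

    # Sum of p * q over items, where p is the proportion of correct answers.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_children
        pq_sum += p * (1 - p)

    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)  # zero variance would make KR-20 undefined

    return (n_items / (n_items - 1)) * (1 - pq_sum / total_variance)


# Hypothetical toy data: 4 children, 3 items (1 = correct, 0 = error).
toy = [[1, 1, 1],
       [1, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
print(kr20(toy))  # 0.75
```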

The ANOVAs included error rates in the Arabic number-writing task in the three levels of syntactic complexity (low, moderate, or high) as within-subjects factor, and mathematical ability (control or children with mathematics difficulties) as between-subjects factor. Whenever the assumption of sphericity was violated, the Greenhouse–Geisser correction was applied to the estimation of statistics. Finally, to approximate a normal distribution, error rates were arcsine transformed.

To interpret the ROC analyses, we considered the criteria established by Swets (1988), according to which *AUC* scores >0.7 indicate acceptable (moderate) levels of diagnostic accuracy. Moreover, interpretations were made considering the lower limits of a 95% confidence interval in order to ensure the reliability of our estimates. For the cutoff scores, we sought for the best balance between specificity and sensitivity values.
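The logic of the ROC analysis can be sketched as follows. The code is an illustration under stated assumptions rather than the SPSS procedure used in the study: low scores are treated as the "positive" signal, a child is flagged when the score is at or below the cutoff, and the "best balance" between sensitivity and specificity is operationalized with Youden's J, which the source does not name. All function names and the toy scores are hypothetical.

```python
def auc_low_is_positive(md_scores, control_scores):
    """AUC as the probability that a random MD child scores lower than
    a random control child (ties count as one half)."""
    wins = 0.0
    for m in md_scores:
        for c in control_scores:
            if m < c:
                wins += 1.0
            elif m == c:
                wins += 0.5
    return wins / (len(md_scores) * len(control_scores))


def sens_spec(md_scores, control_scores, cutoff):
    """Sensitivity and specificity when scores <= cutoff are flagged as at risk."""
    sensitivity = sum(s <= cutoff for s in md_scores) / len(md_scores)
    specificity = sum(s > cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity


def best_cutoff(md_scores, control_scores):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1 (an assumption)."""
    candidates = sorted(set(md_scores) | set(control_scores))
    return max(candidates,
               key=lambda c: sum(sens_spec(md_scores, control_scores, c)) - 1)


# Hypothetical toy scores, not data from the study.
md = [12, 14, 15, 20]
controls = [16, 18, 22, 25]
print(auc_low_is_positive(md, controls))  # 0.875
print(best_cutoff(md, controls))          # 15
```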

## Results

Children with insufficient performance (<25th percentile) in the spelling subtest only (*n* = 189) were not included in this study. Outliers (defined as 1.5 times the interquartile range below the first quartile, in each grade), and children with missing values (*n* = 10) in the Arabic number-writing task were also excluded from the analysis.
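The outlier rule above can be sketched as a lower fence computed within each grade. The quartile convention is an assumption (the source does not say which one was used; this sketch uses Python's "inclusive" method), and the function names and toy scores are hypothetical.

```python
from statistics import quantiles


def lower_fence(scores):
    """First quartile minus 1.5 times the interquartile range."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q1 - 1.5 * (q3 - q1)


def drop_low_outliers(scores):
    """Keep only scores at or above the lower fence (apply separately per grade)."""
    fence = lower_fence(scores)
    return [s for s in scores if s >= fence]


# Hypothetical scores for one grade, not data from the study.
grade_scores = [5, 24, 25, 26, 27, 27, 28, 28, 28]
print(lower_fence(grade_scores))        # 20.5
print(drop_low_outliers(grade_scores))  # the score of 5 is removed
```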

The final sample was then composed of 786 children (55% girls), with a mean age of 9 years 5 months (*SD* = 1 year 1 month), ranging from 6 to 12 years. There were 55 children in the first grade, 249 in the second grade, 225 in the third, and 257 in the fourth grade (see Table 1 for sample sizes displayed separately for school grade and group). The control group (*n* = 683) and children with mathematics difficulties (*n* = 103) did not differ regarding age (*t*[784] = −0.59, *p* = .55).

Measure | First grade | Second grade | Third grade | Fourth grade |
---|---|---|---|---|
Mean number transcoding score, control children | 16.80 (3.22) | 24.63 (3.11) | 27.60 (0.67) | 27.85 (0.36) |
Mean number transcoding score, children with MD | 12.80 (3.33) | 15.92 (3.12) | 26.50 (1.66) | 27.34 (1.15) |
Reliability | 0.79 | 0.88 | 0.46 | 0.33 |
Sample size, overall | 55 | 249 | 225 | 257 |
Sample size, control | 45 | 224 | 189 | 225 |
Sample size, MD | 10 | 25 | 36 | 32 |

Cumulative percentiles by total score (G1 to G4 = first to fourth grade):

Total score | G1 Overall | G1 Control | G1 MD | G2 Overall | G2 Control | G2 MD | G3 Overall | G3 Control | G3 MD | G4 Overall | G4 Control | G4 MD |
---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | 2 | | 10 | | | | | | | | | |
9 | 4 | | 20 | | | | | | | | | |
10 | 5 | | 30 | | | | | | | | | |
11 | 7 | | 40 | | | | | | | | | |
12 | 15 | 9 | 50 | 1 | | | | | | | | |
13 | 24 | 18 | 70 | 3 | | 12 | | | | | | |
14 | 40 | 33 | 90 | 4 | | 32 | | | | | | |
15 | 53 | 44 | 100 | 5 | | 40 | | | | | | |
16 | 55 | 47 | | 6 | | 52 | | | | | | |
17 | 64 | 58 | | 7 | | 60 | | | | | | |
18 | 67 | 62 | | 8 | | 72 | | | | | | |
19 | 76 | 71 | | 12 | 5 | 76 | | | | | | |
20 | 87 | 84 | | 21 | 14 | 88 | | | | | | |
21 | 95 | 94 | | 31 | 23 | 100 | | | | | | |
22 | 100 | 100 | | 37 | 30 | | | | | | | |
23 | | | | 45 | 39 | | 2 | | 11 | | | |
24 | | | | 51 | 46 | | 2 | | 14 | | | |
25 | | | | 58 | 53 | | 4 | | 25 | 2 | | 16 |
26 | | | | 62 | 58 | | 15 | 11 | 36 | 3 | | 22 |
27 | | | | 73 | 70 | | 35 | 29 | 64 | 17 | 15 | 28 |
28 | | | | 100 | 100 | | 100 | 100 | 100 | 100 | 100 | 100 |

*Note*: Numbers in brackets represent *SDs*.

MD = mathematics difficulties.

In total, 55% of all children completed the transcoding task flawlessly. When analyzing groups separately, 57% of control children and 36% of children with mathematics difficulties did not commit any errors. The rate of correct items increased along with grade (Table 1, Fig. 1). As the percentile distributions in Table 1 suggest, the task showed a ceiling effect in the third and fourth grades.

### Reliability and Internal Consistency of the Arabic Number-Writing Task

A high internal consistency was revealed by the KR-20 formula (*r* = .91), when examining the whole sample. When assessing school grades separately, high KR-20 indexes were observed in the first and second grades but not in higher grades (Table 1). The high internal consistency was further confirmed by a split-half analysis of the whole sample (*r* = .94).

### Item Analysis

Error rates per item were calculated by dividing the total number of incorrect answers by the overall number of responses. Error rates per item category were calculated by dividing the number of incorrect responses in those items belonging to a certain category by the number of items and the total number of responses to those items. Table 2 depicts error rates for each item, and Fig. 1 depicts error rates separately by grade and children's group. For individual items, error rates varied from 0% to 28%, being particularly small for one- and two-digit numbers. Among one- and two-digit numbers, the most difficult item presented a modest error rate of 3%. Two-digit numbers posed noticeable difficulties only for first graders with mathematics difficulties. Among three- and four-digit numbers, higher error rates were observed in control children attending the first grade and in children with mathematics difficulties attending both first and second grades. Third graders with mathematics difficulties still showed some difficulties in transcoding the more syntactically complex numbers. In fourth grade, both groups showed comparable and almost flawless performance.

Item | Number | Complexity | Rules (ADAPT) | Error rate | Item-total correlation |
---|---|---|---|---|---|
1 | 4 | Null | 2 | 0.000 | — |
2 | 7 | Null | 2 | 0.000 | — |
3 | 1 | Null | 2 | 0.003 | 0.076 |
4 | 11 | Null | 2 | 0.003 | 0.101 |
5 | 40 | Null | 2 | 0.010 | 0.240 |
6 | 16 | Null | 3 | 0.003 | 0.051 |
7 | 30 | Null | 2 | 0.005 | 0.253 |
8 | 73 | Null | 3 | 0.034 | 0.367 |
9 | 13 | Null | 2 | 0.006 | 0.047 |
10 | 68 | Null | 3 | 0.031 | 0.332 |
11 | 80 | Null | 2 | 0.005 | 0.266 |
12 | 25 | Null | 3 | 0.000 | — |
13 | 200 | Low | 3 | 0.033 | 0.543 |
14 | 109 | Moderate | 4 | 0.046 | 0.619 |
15 | 150 | Low | 3 | 0.059 | 0.717 |
16 | 101 | Moderate | 4 | 0.045 | 0.630 |
17 | 700 | Low | 3 | 0.057 | 0.590 |
18 | 643 | High | 5 | 0.093 | 0.755 |
19 | 8,000 | Low | 3 | 0.107 | 0.632 |
20 | 190 | Low | 3 | 0.080 | 0.714 |
21 | 1,002 | Moderate | 4 | 0.182 | 0.665 |
22 | 951 | High | 5 | 0.111 | 0.747 |
23 | 1,015 | Moderate | 4 | 0.207 | 0.804 |
24 | 2,609 | High | 7 | 0.271 | 0.806 |
25 | 1,300 | Moderate | 4 | 0.221 | 0.851 |
26 | 3,791 | High | 7 | 0.276 | 0.788 |
27 | 1,060 | Moderate | 4 | 0.261 | 0.780 |
28 | 4,701 | High | 7 | 0.266 | 0.810 |

For the analyses of the item discriminability, item-total correlations were calculated (Table 2). One- and two-digit numbers showed low discriminability indexes (i.e., <0.40), which are in line with the very low error rates presented by these items. In turn, three- and four-digit numbers showed higher discriminability, varying from 0.54 to 0.85, thus suggesting that numbers with higher syntactical complexity are more discriminative for testing purposes.
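The two item statistics reported in Table 2 can be sketched as follows. This is an illustration, not the authors' R code: the function names and toy data are hypothetical, and the item-total correlation here is the uncorrected Pearson correlation between the item score and the total score (the source does not state whether the item was removed from the total). An item with no errors has zero variance, so its correlation is undefined and returned as `None`, which corresponds to the blank entries in Table 2.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None when either variable has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_statistics(responses):
    """Return (error rate, item-total correlation) for each item.

    `responses` holds one row per child with 0/1 scores per item.
    Assumption of this sketch: the total score includes the item itself.
    """
    totals = [sum(row) for row in responses]
    stats = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        error_rate = 1 - sum(item) / len(item)
        stats.append((error_rate, pearson(item, totals)))
    return stats


# Hypothetical toy data: 4 children, 3 items.
toy = [[1, 1, 1],
       [1, 1, 0],
       [1, 0, 0],
       [1, 0, 0]]
for error_rate, r in item_statistics(toy):
    print(error_rate, r)
```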

### Task Accuracy

The accuracy of the Arabic number-writing task in discriminating children with mathematics difficulties was estimated with ROC analysis. Accuracy was moderate in the first grade and high in the second grade (*AUC* > 0.9; Table 3). However, in the next two grades the task was less accurate, reaching only low accuracy (*AUC* < 0.7) in the fourth grade.

Grade | AUC | Std. error | *p* | 95% CI lower | 95% CI upper | Cutoff | Spec.^{a} | Sens.^{b} |
---|---|---|---|---|---|---|---|---|
1 | 0.791 | 0.077 | .004 | 0.641 | 0.941 | 14 | 0.667 | 0.700 |
2 | 0.967 | 0.014 | <.001 | 0.940 | 0.994 | 20 | 0.762 | 0.880 |
3 | 0.706 | 0.053 | <.001 | 0.603 | 0.809 | 27 | 0.709 | 0.639 |
4 | 0.582 | 0.060 | .135 | 0.464 | 0.699 | 27 | 0.849 | 0.281 |
Global | 0.655 | 0.031 | <.001 | 0.593 | 0.716 | 27 | 0.575 | 0.650 |

*Notes*: ^{a}Specificity. ^{b}Sensitivity.

### Influence of Syntactic Complexity on Transcoding Performance

The correlation between the number of errors and the number of transcoding rules was high (*r*[784] = .83; *p* < .001). Interestingly, this correlation remained significant even after removing the effect of the quantity of digits (*r*[783] = .59; *p* < .001).
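One standard way to remove the effect of a third variable is a first-order partial correlation. The source does not state which procedure was used, so the formula below is an assumption offered for orientation. Here $x$ denotes the number of errors, $y$ the number of transcoding rules, and $z$ the quantity of digits:

$$
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
$$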

To investigate in deeper detail the influence of syntactic complexity on number transcoding, we ran a series of repeated measures ANOVAs separately for each school grade, with syntactic complexity as within-subjects factor and mathematical ability as between-subjects factor.

In the first grade, a main effect of syntactic complexity, *F*(2, 11) = 32.91, *p* < .01, *MSE* = 2.71, $\eta_p^2 = 0.38$, reflected an increase in error rates as a function of the number of syntactic rules. Contrasts showed significant differences between all three levels of syntactic complexity (low vs. moderate: *F*(1, 53) = 8.37, *p* < .01, *MSE* = 1.53, $\eta_p^2 = 0.14$; moderate vs. high: *F*(1, 53) = 30.06, *p* < .001, *MSE* = 3.85, $\eta_p^2 = 0.36$). A main effect of group, *F*(1, 53) = 7.63, *p* < .001, *MSE* = 0.62, $\eta_p^2 = 0.13$, revealed higher error rates for children with mathematics difficulties. Fig. 2 shows the effects of syntactic complexity for each group in the four school grades. Moreover, an interaction between syntactic complexity and mathematical ability was observed, *F*(1, 53) = 3.48, *p* = .04, *MSE* = 0.29, $\eta_p^2 = 0.06$. *Post hoc* tests revealed differences between all levels of item complexity in control children. In turn, children with mathematics difficulties showed comparable performance in items with low and moderate complexity, and better performance in both than in items with high complexity.

In the second grade, main effects of mathematical ability, *F*(1, 25) = 153.36, *p* < .005, *MSE* = 12.85, $\eta_p^2 = 0.38$, and syntactic complexity, *F*(2, 49) = 83.44, *p* < .001, *MSE* = 6.51, $\eta_p^2 = 0.25$, were significant. The interaction between mathematical ability and syntactic complexity was also significant, *F*(2, 49) = 3.79, *p* < .001, *MSE* = 0.30, $\eta_p^2 = 0.02$. *Post hoc* comparisons revealed significant differences between all levels of item complexity in both groups of children.

In the third grade, the main effect of syntactic complexity, *F*(2, 446) = 17.63, *p* = .001, *MSE* = 0.45, $\eta_p^2 = 0.07$, was significant. The main effect of mathematical ability was also significant, *F*(1, 23) = 35.67, *p* < .001, *MSE* = 0.46, $\eta_p^2 = 0.14$, as was the interaction between complexity and mathematical ability, *F*(2, 446) = 3.95, *p* < .001, *MSE* = 0.10, $\eta_p^2 = 0.02$. *Post hoc* comparisons revealed, in both groups of children, significantly better performance in items with low complexity than in items with moderate or high complexity.

In the fourth grade, the main effect of mathematical ability was significant, though with a smaller effect size than in earlier grades, *F*(1, 225) = 15.14, *p* < .001, *MSE* = 0.06, $\eta_p^2 = 0.06$. Additionally, the effect of syntactic complexity was significant, *F*(2, 510) = 9.61, *p* < .001, *MSE* = 0.14, $\eta_p^2 = 0.04$. Numbers with moderate and high complexity did not differ in their error rates, *F*(1, 255) = 0.60, *p* = .439, *MSE* = 0.02, $\eta_p^2 = 0.002$. The interaction between group and syntactic complexity was not significant, *F*(2, 510) = 2.30, *p* = .11, *MSE* = 0.03, $\eta_p^2 = 0.01$.
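The partial eta squared values ($\eta_p^2$) reported above can be recovered from each *F* ratio and its degrees of freedom. The identity below is the standard one; the worked example uses the fourth-grade effect of syntactic complexity, *F*(2, 510) = 9.61, as reported above:

$$
\eta_p^2 = \frac{F \cdot df_1}{F \cdot df_1 + df_2}, \qquad \frac{9.61 \times 2}{9.61 \times 2 + 510} \approx 0.04
$$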

In summary, our data showed that the effects of syntactic complexity and mathematical ability on number transcoding are consistent until the fourth grade, but with decreasing effect sizes. The magnitude of the effect of mathematical ability varied from 0.14 in first grade to 0.06 in fourth grade, suggesting that in later grades, children with mathematics difficulties tend to reach the performance level of control children. Likewise, effect sizes of syntactic complexity decreased across grades, ranging from 0.38 to 0.04, so that children's difficulties in transcoding more complex numbers tend to decline with schooling. Interestingly, the fourth grade was the only grade in which the interaction between mathematical ability and syntactic complexity was not significant. The absence of this interaction can be attributed to the ceiling effect in fourth grade, where all children exhibited low error rates irrespective of syntactic complexity.

## Discussion

The purpose of this study was to examine the psychometric properties of a number transcoding task and its usefulness in screening mathematics learning difficulties in children in the early school years. The Arabic number-writing task is a simple and powerful instrument for assessing children's basic number transcoding skills in early elementary school. Furthermore, the task discriminates children depending on their mathematics learning difficulties with a high degree of sensitivity and specificity. The high reliability estimates of the transcoding task are promising regarding diagnostics and evaluation of cognitive interventions in mathematics difficulties. Closer item analysis revealed a strong impact of the number of rules necessary to transcode a number correctly. Moreover, results indicate that the number of rules per item can explain most of the group differences observed between children with and without mathematics difficulties. In the following, these results will be discussed in deeper detail.

### General Test Properties

High internal consistency coefficients were observed in the transcoding task in first and second grade children, which translates into high precision in the characterization of individual performance (Huber, 1973). More specifically, the reliability coefficients observed in the present study in first and second grades can be considered invariant according to the criteria established by Willmes (1985) and can be used confidently to estimate confidence intervals for individual performance in the clinical context. Although one can consider the Arabic number-writing task economical in its present format, particularly because of its flexibility regarding group testing and its short duration, one may wish to reduce test length, particularly because of the relatively large number of very easy one- and two-digit items (see further discussion on this merit below). Test reduction seems to us to be practically feasible, since the determinants of item difficulty are well known and the pool of suitable items in the numeric interval between three- and four-digit numbers is large enough. In this context, the number of rules necessary to transcode individual items is particularly important as a criterion for the establishment of different groups of items. Accordingly, the number of transcoding rules is a good criterion to distinguish the level of competence typical of children with and without mathematics difficulties. This can be illustrated by the interactions between item complexity and mathematical ability observed in first to third grades, which reflect that performance is differently affected by syntactic complexity in the two groups. An adaptive version of the task could be constructed in which the number of rules necessary to transcode an item varies on an even more fine-grained scale than that employed in the present study.

### Characterization of Typical and Atypical Development of Transcoding Abilities

Overall, one- and two-digit numbers presented very low error rates in all school grades, regardless of children's mathematics abilities. These results are in line with the literature, which indicates that even kindergartners at risk for mathematics difficulties do not have trouble transcoding small numbers (Landerl et al., 2004; van Loosbroek et al., 2009), but instead can retrieve the Arabic forms directly from their LTM. Moreover, although two-digit numbers showed very low error rates in all school grades, they were responsible for transcoding errors in children with mathematics difficulties in the first grade only, not in higher grades.

In contrast, three- and four-digit numbers accounted for a large proportion of score variability, with high error rates being observed in the first grade and a steady decrease in higher grades. Interestingly, second grade children without mathematics difficulties showed notable difficulties in transcoding three- and four-digit numbers. In the second grade, children receive the formal instruction necessary for mastering the syntax of these numbers. Moreover, children with mathematics difficulties seem to need 1 year longer than control children to master the same knowledge. When analyzing the interactions between individual achievement and syntactic complexity in ANOVA models, one observes that control children attending the third and fourth grades made hardly any errors on the low complexity items. In turn, children with mathematical difficulties continue to make errors when they reach the third grade, although their error rates also decrease steadily over time. These findings corroborate previous evidence showing that syntactic complexity has a strong impact on error rates in transcoding tasks (Camos, 2008; Moura et al., 2013; Zuber, Pixner, Moeller, & Nuerk, 2009). A delay in the acquisition of more complex transcoding rules has already been observed in children with mathematics difficulties (Moura et al., 2013) and in typically developing children with lower working memory capacity (Camos, 2008). The present results confirmed this delay in the acquisition of transcoding rules in children with mathematics difficulties. To what extent these errors are also attributable to reduced working memory capacity remains to be investigated in future studies.

The persistence of the effect of syntactic complexity in all grades constitutes strong evidence for the prominent role of rules in elucidating transcoding abilities even in third and fourth grades. Children with mathematics difficulties struggle to learn the more complex transcoding rules, as can be inferred from wrong frame errors (Moura et al., 2013). Wrong frame errors reflect the absence of knowledge of the magnitude intrinsic to each position in the digit sequence, that is, of place–value. Several studies have related knowledge of place–value syntax to achievement in more complex numerical abilities, such as arithmetic (Mazzocco, Murphy, Brown, Rinne, & Herold, 2013; Moeller, Pixner, Kaufmann, & Nuerk, 2009; Moeller et al., 2011). Together, these pieces of evidence indicate that more abstract levels of numerical representation, such as place–value knowledge, can be assessed by means of performance in the transcoding task, and they reinforce its utility for predicting the arithmetic abilities of individual children.

Together, the effect of syntactic complexity and the high correlation between error rates and the number of transcoding rules suggest that working memory is an important variable associated with number writing. In transcoding research, working memory capacity has been associated with storing the verbal string, searching the LTM for lexical entries, parsing the previously non-acquired strings, and applying the procedural rules (Barrouillet et al., 2004; Lochy & Censabela, 2005). In previous research, we have also found this association between the number of rules and working memory, which suggests an implicit link between syntactic complexity and working memory skills (Moura et al., 2013). Camos (2008) directly addressed this issue by investigating children with different levels of verbal working memory abilities. This author found a robust association between the number of transcoding rules and number-writing performance, as suggested by the ADAPT model. Moreover, Lopes-Silva et al. (2014) showed that the influence of verbal working memory on number transcoding is mediated by phonemic awareness. According to the ADAPT model, phonological encoding is the first step in the number transcoding process. Further evidence suggests that visuospatial working memory capacity may be associated with syntactic transcoding errors related to the unit-decade inversion rule present in languages such as German, Dutch, and Czech (Pixner et al., 2011; Zuber et al., 2009). This indicates that the Arabic number-writing task is theoretically grounded in a cognitive model with high content and construct validity.

### Task Discriminability

For the first time, the diagnostic accuracy of the Arabic number-writing task was assessed by means of ROC analyses. According to established criteria (Swets, 1988), moderate and high accuracy estimates were observed in the first and second grades, respectively, while in third and fourth grades the accuracy was clearly insufficient. Difficulties with number transcoding might remain traceable in higher school grades, but the Arabic number-writing task in its present format is too easy to discriminate mathematics difficulties there. It is possible that an adaptation of the task including more complex five- and six-digit numbers would support higher group discriminability. However, it is also possible that the cognitive profile of mathematics difficulties, as measured by the Arabic number-writing task, is not stable over time, so that difficulties experienced in early phases are eventually overcome and new difficulties then appear (Geary et al., 2000; Gersten, Jordan, & Flojo, 2005). If this is the case, good discriminability of number transcoding tasks regarding mathematics difficulties may be limited to the first two school grades.
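
As a rough illustration of the quantities behind an ROC analysis, the sketch below computes sensitivity and specificity at a given error-count cutoff, and the area under the ROC curve as the proportion of (difficulty, control) pairs in which the child with difficulties made more errors. It is a minimal sketch under stated assumptions, not the analysis pipeline used in this study: the function names, the rule "flag a child when errors ≥ cutoff", and all data values are hypothetical.

```python
def sensitivity_specificity(error_counts, has_difficulty, cutoff):
    """Flag a child as at risk when errors >= cutoff; return (sens, spec)."""
    tp = fn = tn = fp = 0
    for errors, positive in zip(error_counts, has_difficulty):
        flagged = errors >= cutoff
        if positive and flagged:
            tp += 1  # child with difficulties, correctly flagged
        elif positive:
            fn += 1  # child with difficulties, missed
        elif flagged:
            fp += 1  # control child, wrongly flagged
        else:
            tn += 1  # control child, correctly not flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(error_counts, has_difficulty):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [e for e, d in zip(error_counts, has_difficulty) if d]
    neg = [e for e, d in zip(error_counts, has_difficulty) if not d]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical error counts for eight children (not data from this study).
errors = [0, 1, 2, 6, 4, 9, 3, 8]
difficulty = [False, False, False, False, True, True, False, True]

for cutoff in (4, 7):
    sens, spec = sensitivity_specificity(errors, difficulty, cutoff)
    print(f"cutoff {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"AUC = {auc(errors, difficulty):.3f}")
```

Raising the cutoff trades sensitivity for specificity; the ROC curve traces this trade-off across all cutoffs, and the area under it summarizes discriminability independently of any single cutoff.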

### Future Perspectives

Further research is needed to support the use of the Arabic number-writing task in clinical–epidemiological and research contexts. The validity of the Arabic number-writing task as a screening instrument should be established by means of longitudinal studies investigating its power to predict mathematics learning difficulties. However, the Arabic number-writing task can also be useful in individual clinical assessment and in theoretical research. A further development in the individual assessment context is the design of an automatic algorithm for item generation, which would allow the construction of more individualized and adaptive versions of the Arabic number-writing task (e.g., Arendasy, Sommer, & Mayr, 2012). The ADAPT model provides a very valuable basis for generating items at all difficulty levels. The estimates of item difficulty obtained from large-sample studies such as the present one establish the basis for such further developments. Since transcoding tasks combine diagnostic sensitivity and specificity regarding mathematics achievement with a solid theoretical account of the cognitive mechanisms driving individual performance, automatic item generation may prove very valuable in the construction of adaptive and flexible instruments best suited not only to characterizing individual performance but also to evaluating the impact of interventions designed to remediate the negative impact of mathematics difficulties on cognition and performance.

### Practical Implications

The good psychometric properties of the Arabic number-writing task, together with its simple administration and consistent theoretical grounding, make it a useful tool for assessing the basic numerical skills of young children in both clinical and research contexts. The task may provide a quick and inexpensive way of screening first and second graders at risk of mathematical difficulties, both collectively, at school, and individually, in clinical settings. The benefits of the early identification of children with possible major difficulties in mathematics are immeasurable. It enables early intervention efforts, thus minimizing future consequences of low numeracy, such as low incomes and fewer job opportunities (Bynner & Parsons, 1997).

## Funding

This work was supported by CAPES/DAAD Probral Program, Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, 307006/2008-5, 401232/2009-3), and Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG, APQ-02755-SHA, and APQ-03289-10). VGH is supported by a CNPq fellowship (308157/2011-7). GW is supported by an FWF research project (No. P22577).

## Conflict of Interest

None declared.

## References

*R Foundation for Statistical Computing*, Vienna, Austria. Retrieved November 25, 2014, from http://www.R-project.org/

Report: ED406585. 53pp. Jan 1997