Randomized trial of iReadMore word reading training and brain stimulation in central alexia

Woodhead et al. present results from a randomized trial of a novel reading therapy app (‘iReadMore’) coupled with anodal transcranial direct current stimulation (tDCS) in patients with post-stroke central alexia. Use of iReadMore improves reading accuracy for trained words, while concurrent tDCS facilitates training and improves generalization to untrained stimuli.


Sample Size Calculation
A previous study using a prototype of iReadMore in a group of stroke patients with chronic pure alexia (Woodhead et al., 2013) resulted in an improvement in word reading reaction times of 149.0ms (sd = 214.5ms). This effect size was the change in trained word reading reaction times before (T2) minus after (T3) training. The sample size required to detect a comparable improvement was calculated using an online calculator from https://www.dssresearch.com/KnowledgeCenter/toolkitcalculators/samplesizecalculators.aspx. Using alpha = 5% and beta = 10%, a required sample size of n = 18 was indicated.
The expected effect size of A-tDCS to the left IFG was based on a study by Baker and colleagues (2010), which compared A-tDCS and sham stimulation during anomia therapy in 10 patients with chronic aphasia. On average, they observed a 14.4% improvement in picture naming accuracy following A-tDCS, compared to only 6% following sham; the benefit of A-tDCS over sham was therefore 8.4% (s.d. = 10.2). The sample size required to detect a comparable improvement was calculated using the same calculator, again with alpha = 5% and beta = 10%, which indicated that a sample size of n = 13 would be required.
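For reference, a one-sided normal-approximation formula for a paired mean difference reproduces both of the reported sample sizes; the Python sketch below is illustrative only and is not necessarily the method used by the cited online calculator.

from math import ceil
from scipy.stats import norm

def paired_sample_size(delta, sd, alpha=0.05, power=0.90):
    # One-sided normal-approximation sample size for detecting a mean
    # paired difference `delta` with standard deviation `sd`.
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(power)
    return ceil(((z_alpha + z_beta) * sd / delta) ** 2)

# Trained-word RT improvement from Woodhead et al. (2013): 149.0 ms (s.d. 214.5 ms)
print(paired_sample_size(149.0, 214.5))  # -> 18
# A-tDCS benefit over sham from Baker et al. (2010): 8.4% (s.d. 10.2)
print(paired_sample_size(8.4, 10.2))     # -> 13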

Training and Testing Word Lists
This word selection process is depicted below in Supplementary Figure 1s.
Baseline Testing (T1/T2): All participants were tested with all 590 words at baseline (split over T1/T2 sessions). The A, B and C lists comprised 180 words each and were matched for psycholinguistic variables. A list of 50 'Core' words with very high written frequency was also included.
Training Word Lists: After T2, a subset of words (150 word triplets) was selected for use in the training. One list was used for Block1, one for Block2, and one was left untrained. The selected items from the A, B and C lists were different for each participant but were matched within-subject for psycholinguistic variables and baseline performance (word reading accuracy and RT). All 50 Core words were trained in both Block1 and Block2.
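The exact selection algorithm is not specified; the following Python sketch illustrates one possible way of choosing subject-specific matched triplets, given that the A, B and C triplets were already matched for psycholinguistic variables (all column names, the spread heuristic and the assignment of lists to blocks are assumptions, not the authors' procedure).

import random
import pandas as pd

def select_training_triplets(triplets: pd.DataFrame, n_select: int = 150, seed: int = 0) -> dict:
    # triplets: one row per pre-matched A/B/C word triplet, with columns
    # word_A, word_B, word_C (the words), acc_A/acc_B/acc_C (baseline accuracy)
    # and rt_A/rt_B/rt_C (baseline reaction time, ms). All names are hypothetical.
    df = triplets.copy()
    acc = df[["acc_A", "acc_B", "acc_C"]]
    rt = df[["rt_A", "rt_B", "rt_C"]]
    # Within-triplet spread of baseline performance; smaller = better matched.
    df["acc_spread"] = acc.max(axis=1) - acc.min(axis=1)
    df["rt_spread"] = rt.max(axis=1) - rt.min(axis=1)
    chosen = df.sort_values(["acc_spread", "rt_spread"]).head(n_select)
    # Assign the three matched lists to the three conditions (assignment
    # scheme is an assumption; the study may have counterbalanced this).
    roles = ["Block1", "Block2", "untrained"]
    random.Random(seed).shuffle(roles)
    return {role: chosen[f"word_{lst}"].tolist()
            for role, lst in zip(roles, ["A", "B", "C"])}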
Testing Word Lists: A further subset of words (90 word triplets) was selected for use in all subsequent word reading tests (T3-T6). The selected items from the A, B and C lists were different for each participant, but were matched within-subject for psycholinguistic variables and baseline performance. A subset of 30 Core words was also selected for testing at T3-T6.
In summary, 590 words (180 each from the A, B and C lists, plus 50 Core words) were tested at baseline (across the T1 and T2 sessions); from these, subject-specific word lists were selected for training (150 words from each of the A, B and C lists, plus all 50 Core words); and from these, subject-specific word lists were selected for testing (90 words from each of the A, B and C lists, plus a set list of 30 Core words).

iReadMore Training

Training Words and Pictures
For each possible target word, the full training corpus contained a picture representing the word and a set of up to 9 paired written words to be used as easy, medium or hard distractor items.
'Easy' distractor words shared only the first letter in common with the target word. 'Medium' distractor words shared at least 2 letters in common. 'Hard' distractor words (for words > 3 letters only) shared more than 2 letters in common.
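The text does not state exactly how shared letters were counted; the Python sketch below assumes letters are compared position by position and is illustrative only.

def distractor_difficulty(target: str, distractor: str) -> str:
    # Classify a distractor relative to its target word. 'Letters in common'
    # is assumed to mean letters matching at the same position.
    shared = sum(1 for t, d in zip(target.lower(), distractor.lower()) if t == d)
    if len(target) > 3 and shared > 2:
        return "hard"      # more than 2 letters in common (words of > 3 letters only)
    if shared >= 2:
        return "medium"    # at least 2 letters in common
    if shared == 1 and target[0].lower() == distractor[0].lower():
        return "easy"      # only the first letter in common
    return "unmatched"     # does not meet any of the three criteria

print(distractor_difficulty("house", "horse"))  # -> hard (4 letters in common)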
The pictures were colour photos or drawings representing the target word. The representations of low imageability target words were abstract (see examples below). Even if these representations were not immediately understood, they were learned through repeated exposure, allowing pictorial priming of written word recognition.

Examples of low imageability pictures: 'that', 'and', and 'any'.
A customised list of the 200 words to be trained in each block (150 words selected from list A, B or C, plus all 50 Core words) was created for each subject. The iReadMore software started each block with this list in a randomized order.

Exposure Phase
In the first exposure phase, the first 10 items from the list were selected. The order of the word list was then adjusted in response to the participant's performance in the challenge phase (see Difficulty Adaptation: Item-Specific Parameters below), and the 10 words at the top of the list were selected for each subsequent exposure phase.
Each exposure phase comprised 10 trials, each initiated at the participant's own pace. In each trial, the picture representing the target word was presented, followed by simultaneous presentation of the written and spoken word-forms. The written word duration initially matched the patient's baseline word reading speed, then adapted according to performance in the subsequent challenge phase (see Difficulty Adaptation: Global Parameters below). Recordings from a female or a male speaker were randomly selected for each trial.

Challenge Phase
Challenge phases comprised up to 30 trials (3 repetitions of the 10 target words presented in the preceding exposure phase), but ended when the criterion score was reached. The criterion score adapted according to performance (see Difficulty Adaptation: Global Parameters).
In each trial, a spoken word from the preceding exposure phase was presented together with a written word. The written word duration initially matched the patient's baseline word reading speed, then adapted according to performance (see Difficulty Adaptation: Global Parameters). In half the trials the written and spoken stimuli were the same word, and in half they were different. The selection of the distractor item in 'different' trials adapted according to performance on a word-by-word basis (see Difficulty Adaptation: Item-Specific Parameters). Participants made a same/different response via button press and received immediate feedback. Two points were awarded for a fast correct response; one for a slow correct response; and minus one for an incorrect response. The criterion reaction times for fast and slow responses adapted according to performance (see Difficulty Adaptation: Global Parameters). If the participant reached the criterion score within 30 trials, they passed that level and task difficulty increased in the next exposure phase.
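As an illustration of the scoring and stopping rule described above, here is a minimal Python sketch (the names and the 0-point case for very slow correct responses are assumptions; the criterion values come from the global difficulty parameters described below).

def score_trial(correct: bool, rt_ms: float, fast_ms: float, slow_ms: float) -> int:
    # Points for one same/different trial. fast_ms / slow_ms are the current
    # criterion reaction times. Scoring of a correct response slower than the
    # slow criterion is not specified in the text; 0 points is an assumption.
    if not correct:
        return -1
    if rt_ms <= fast_ms:
        return 2   # fast correct response
    if rt_ms <= slow_ms:
        return 1   # slow correct response
    return 0

def run_challenge_phase(outcomes, criterion_score: int, fast_ms: float, slow_ms: float) -> bool:
    # outcomes: up to 30 (correct, rt_ms) pairs. The phase ends (and the level
    # is passed) as soon as the running score reaches the criterion.
    total = 0
    for correct, rt_ms in outcomes:
        total += score_trial(correct, rt_ms, fast_ms, slow_ms)
        if total >= criterion_score:
            return True
    return False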

Difficulty Adaptation: Global Parameters
Task difficulty was reflected in three global adaptive parameters: 1) written word duration in the exposure and challenge phases; 2) criterion score in the challenge phase; and 3) criterion reaction times for fast/slow correct responses in the challenge phase. All three parameters changed simultaneously when the difficulty level changed. The difficulty level began at 1, then increased incrementally each time the participant passed a challenge phase. If the participant failed three successive challenge phases, the level decreased by one. The formulae for generating the task parameters at each difficulty level ('LEVEL') are shown in the table below. The word duration was initially set to the participant's average word reading RT at baseline ('baseline_RT'). Each parameter had a maximum or minimum boundary that could not be exceeded; once this boundary was reached, the parameter remained constant, but could revert to an easier setting if the difficulty level subsequently reduced.

Parameter (y)                   Function                                           Min / max allowed
Word duration (ms)              y = baseline_RT - 2 * LEVEL * baseline_RT / 100    Min = 100
Criterion score                 y = 20 + 0.5 * LEVEL                               Max = 56
Fast response criterion (ms)    y = 4000 - 30 * LEVEL                              Min = 2000
Slow response criterion (ms)    y = 10000 - 90 * LEVEL                             Min = 5000
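For clarity, a short Python sketch of these formulae with the stated boundaries applied (the function and key names are illustrative, not taken from the iReadMore software):

def global_parameters(level: int, baseline_rt: float) -> dict:
    # Global task parameters at a given difficulty level ('LEVEL' in the table).
    # baseline_rt: the participant's average baseline word reading RT in ms.
    return {
        "word_duration_ms": max(100, baseline_rt - 2 * level * baseline_rt / 100),
        "criterion_score": min(56, 20 + 0.5 * level),
        "fast_response_ms": max(2000, 4000 - 30 * level),
        "slow_response_ms": max(5000, 10000 - 90 * level),
    }

# Example: level 10 with a 1500 ms baseline RT
# -> word duration 1200 ms, criterion score 25, fast 3700 ms, slow 9100 ms
print(global_parameters(10, 1500.0))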

Difficulty Adaptation: Item-Specific Parameters
Task difficulty was also reflected in two item-specific adaptive parameters: 1) the distractor difficulty level in the same/different task (easy, medium or hard), and 2) the position of the target word in the word list, which affected how soon the word would appear again in a subsequent exposure phase.
The distractor words used in 'different' trials started at the easy difficulty level for every target word. In each challenge phase, a target word could be presented up to three times, and 0-3 of those trials could be 'different' trials. The distractor difficulty level (easy / medium / hard) used in subsequent challenge phases was then adapted according to the following rules:

1) If a word appeared in only one 'different' trial, the distractor difficulty level moved forwards (+1) if the response was correct, or backwards (-1) if it was incorrect.
2) If a word appeared in more than one 'different' trial, the outcomes of those trials were summed (+1 for each correct trial, -1 for each incorrect trial). If the summed value was positive, the difficulty level moved up; if it was negative, it moved down; and if it was zero, it stayed the same.

The position of the target word in the word list changed according to performance on all trials ('same' and 'different') for that word in the challenge phase. Each target word could be presented up to three times in each challenge phase. If any one of those trials was responded to incorrectly, the position of the target word in the word list did not change, so that the word would definitely appear in the next exposure phase. If all trials were responded to correctly, the change in word position was calculated by taking the average position change score for those trials, determined according to a set of position change rules. Words with a low average position change score were presented again soon, whereas words with a high score did not reappear for a long time. The number of times each target word was presented therefore depended on performance (see Participant Performance).
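A minimal Python sketch of the distractor-difficulty update rule above (names are illustrative; clamping at the easy/hard boundaries is an assumption, and the position change rules are not reproduced here):

DIFFICULTY_LEVELS = ["easy", "medium", "hard"]

def update_distractor_difficulty(current: str, different_trial_outcomes: list) -> str:
    # different_trial_outcomes: one bool per 'different' trial for this word in
    # the last challenge phase (True = correct response).
    if not different_trial_outcomes:
        return current
    summed = sum(1 if ok else -1 for ok in different_trial_outcomes)
    idx = DIFFICULTY_LEVELS.index(current)
    if summed > 0:
        idx = min(idx + 1, len(DIFFICULTY_LEVELS) - 1)
    elif summed < 0:
        idx = max(idx - 1, 0)
    return DIFFICULTY_LEVELS[idx]

# Two correct and one incorrect 'different' trial: net +1, so 'easy' -> 'medium'
print(update_distractor_difficulty("easy", [True, True, False]))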

Participant Performance
Accuracy in the challenge phase was generally high, with participants answering 90.6% of trials correctly on average (s.d. = 8.4). Performance ranged from 65.0% to 97.4%. Data from all subjects can be seen in Supplementary Table 3s.
Due to the design of the item-specific difficulty adaptation, the number of times each word was presented during training correlated closely with accuracy for that word. On average over all participants and both blocks there were 76.6 presentations of each word per block, but this could vary widely depending on performance.

Central Alexia Subtypes
Participants were classified into central alexia sub-types (Surface dyslexia, Phonological dyslexia and Deep dyslexia) using the classification criteria described by Whitworth et al. (2014):
- Surface dyslexia (Marshall & Newcombe, 1973) was defined according to the presence of a regularity effect (better reading of regular compared to irregular words) and relatively preserved pseudoword reading. Word reading errors in surface dyslexia include regularisation errors (SEW→"Sue") and visual errors (SUBTLE→"Sublet").
- Phonological dyslexia was defined according to the presence of a lexicality effect (better word reading than pseudoword reading) and an imageability effect (better reading of high than low imageability words). Word reading errors include visual and/or semantic errors.
- Deep dyslexia was defined according to the presence of lexicality and imageability effects. Word reading errors include purely semantic errors as well as visual and/or semantic errors.
Word reading errors on the full corpus of words tested at baseline were coded as phonological (including purely phonological errors, SEW→'sue', and visual and/or phonological errors, DOOR→'doom'); semantic (including purely semantic errors, APE→'monkey', and visual and/or semantic errors, CLING→'clasp'); or 'other' (including morphological errors, LOVELY→'loving', and unrelated errors).
Regularity, imageability and length effects on word reading ability were identified using binary logistic regression on each participant's baseline word reading accuracy data. Only regression models that were significant were used for classification purposes: for some participants (P10, P11, P12 and P14) accuracy was very high and there was insufficient variance in the dependent variable to produce a significant model overall. In these cases, a linear regression analysis on word reading RT data was attempted, but none were significant.
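A minimal sketch of this per-participant analysis, using statsmodels for illustration (the original analysis software is not specified, and the column names are hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

def psycholinguistic_effects(items: pd.DataFrame, alpha: float = 0.05):
    # items: one row per baseline word, with columns accuracy (0/1), regularity
    # (0/1), imageability (rating) and length (number of letters).
    model = smf.logit("accuracy ~ regularity + imageability + length", data=items).fit(disp=False)
    if model.llr_pvalue < alpha:   # only use models that are significant overall
        return model.params, model.pvalues
    return None                    # insufficient variance / non-significant model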
Finally, the lexicality effect was determined by comparing overall percentage accuracy on the baseline word and pseudoword reading tests.
*Regression analysis was used to identify the subtype patterns. The accuracy scores for these participants were too high to perform the analysis, and regression analysis of reaction time data was uninformative.