Sheena Reilly, Eileen Cini, Lisa Gold, Sharon Goldfeld, James Law, Penny Levickis, Fiona Mensah, Angela Morgan, Jan M Nicholson, Ha N D Le, Angela Pezic, Bruce Tomblin, Melissa Wake, Louise Wardrop, Data resource profile: The Child LAnguage REpository (CLARE), International Journal of Epidemiology, Volume 47, Issue 3, June 2018, Pages 688–688j, https://doi.org/10.1093/ije/dyy034
Data Resource Basics
Oral language is a characteristic that defines the human species. How this ability develops underpins the health, productivity and social well-being of individuals.1 Whereas most children acquire speech and language skills with relative ease, many do not, placing a sizeable burden on our health, education, social and economic systems.2,3 Despite this, research in the field has been chronically underfunded and fragmented, resulting in evidence gaps, limited research capacity and uncoordinated, poorly informed and often contradictory advice for policy makers and practitioners.4,5 Although language promotion and early intervention are clearly warranted, efforts to understand how and when best to target interventions have been hampered by a lack of appropriate longitudinal data. Only a few international population cohort studies have collected the detailed language measures (i.e. language phenotyping) required to describe accurately the trajectories and outcomes of children's language.5,6 Studies measuring language in depth have been limited by small, non-representative samples, often drawn from clinical populations and/or commencing at preschool or school age,5 and have thus missed the critical early years when the foundations for language are established. In addition, little is known about the genetic and/or neural underpinnings, that is, the neurobiology, of developmental language disorder (DLD).7
In line with major global initiatives,8,9 researchers in Australia, the United Kingdom (UK) and the United States of America (USA) formed a consortium (The Consortium) to advance the understanding of DLD. In 2011, The Consortium received funding from the Australian National Health and Medical Research Council (Centre of Research Excellence in Child Language, #1023493) to establish a repository of child language data (Child LAnguage REpository: CLARE), bringing together existing cohorts from the USA, UK and Australia that employed some common language measures assessed at multiple time points in childhood. The long-term objectives of CLARE were to: (i) characterize typical and disrupted pathways of language development; (ii) examine environmental and biological factors predicting variation in language pathways; (iii) understand how language pathways are related to social, psychological and educational development; (iv) evaluate the direct and indirect costs of DLD to families, to the health care, education and welfare systems, and to society; and (v) identify potential for preventive and therapeutic intervention.
CLARE comprises cohort data from four distinct population groups: Group 1: language-focused studies in children with typical hearing; Group 2: language-focused studies in children with impaired hearing; Group 3: publicly available datasets; and Group 4: child development cohorts with language measures enrichment. Each study included in CLARE is independently managed and has its own scientific agenda and tailored research methodology, but all share an interest in child speech and language measures. Most of the cohorts have published project-specific profiles that contain more detail,10–13 and most include members of The Consortium among their investigators.
Data Collected
Dataset production
CLARE contains data from 17 cohort studies with sample sizes ranging from approximately 80 to 19 000. Each study has its own protocol and institutional ethical approvals, fulfilling the requirements of patient confidentiality and information governance for data linkage and for the use of biological material in future studies. Data integrity (i.e. version control and cleaning) is managed and maintained by a data manager. Approval to establish CLARE was granted by the Human Research Ethics Committee at the Royal Children's Hospital (#32261). Master documents set out principles and governance protocols, data access and sharing arrangements, and process issues (see Figure 1); these are based on the guidelines of the International Childhood Cancer Cohort Consortium (I4C).8 A Policy, Practice and Implementation Committee was established to guide the policy and clinical relevance of our work (for details see 'Data Resource Use' below).
Contents
Studies of interest were identified and data dictionaries were developed and enhanced to ensure consistency across studies. All studies in Groups 1 and 2 include standardized face-to-face assessments of child language in addition to parent-reported measures of language. Each cohort contained a broad range of child, family and environmental factors (e.g. socioeconomic status, maternal education) hypothesized to influence language development, including measures of child behaviour, psychosocial well-being and cognition.
The broad aim, sample size, year and age at recruitment and measurement domains of each study are provided in Table 1. For example, by 2017 the Early Language in Victoria Study (ELVS)14 had collected 11 waves of data spanning 8 months to 13 years of age, including parent and teacher reports, child-completed questionnaires (as appropriate) and face-to-face child assessment.
Table 1. Overview of the studies that comprise CLARE, grouped by study type

| Study | Sizeᵃ | Year recruited | Age at recruitment (years) | A | B | C | D | E | F | G | H | I | Funded until | Age in 2016 (years) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Group 1. Language-focused studies (typical hearing) | | | | | | | | | | | | | | |
| ELVS – Early Language In Victoria Study10,14 | 1910 | 2003 | 0–1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 2017 | 13 |
| L4L – Language for Learning15 | 1400 | 2007 | 1–2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 2011 | 10 |
| MM – Memory Maestros16 | 1700 | 2012 | 6–7 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 2015 | 11–12 |
| Longitudinal Study of Children with Specific Language Impairment (USA)17 | 604 | 1995 | 5–6 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | ✓ | 2016 | 26–27 |
| Group 2. Language-focused studies (hearing impaired) | | | | | | | | | | | | | | |
| CHIVOS – Children with Hearing Impairment Outcomes18 | 80 | 1999 | 7–8ᵃ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | 2011 | 23–24 |
| SCOUT – Statewide Comparison of Outcomes of Hearing Loss19 | 120 | 2009 | 5–6ᵃ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | 2008 | 14 |
| VicCHILD – Victorian Childhood Hearing Impairment Longitudinal Databank20,21 | 525–570 | 2011 | 0–18ᵇ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 2016 | 0–23 |
| OCHL – Outcomes of Children with Hearing Loss22,23 | 421 | 2008 | 0–6ᶜ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | ✓ | 2011 | 8–14 |
| Group 3. Publicly available datasets | | | | | | | | | | | | | | |
| LSAC – The Longitudinal Study of Australian Children24, infant cohort | 5107 | 2004 | 0–1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | 2018 | 14–15 |
| LSAC, child cohort | 4983 | 2004 | 4–5 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | 2018 | 18–19 |
| MCS – Millennium Cohort Study (UK)25 | 19 000 | 2001 | 0–1 | ✓ | ✓ | ✓ | ✓ | ✓ | x | x | x | x | 2016 | 15 |
| LSAC Child Health Checkpoint26 | 1875 | 2014 | 11–12 | ✓ | ✓ | ✓ | x | ✓ | ✓ | ✓ | ✓ | x | 2020 | 12–13 |
| Group 4. Longitudinal child development studies | | | | | | | | | | | | | | |
| BIS – Barwon Infant Study27 | 1074 | 2000 | 4 | ✓ | x | x | ✓ | x | ✓ | ✓ | ✓ | ✓ | 2017 | 6 |
| VIHCS – The Victorian Intergenerational Health Cohort Study28 | 1026 | 2006–14 | 8 | ✓ | x | ✓ | x | ✓ | x | x | ✓ | x | 2018 | 8 |
| CPOL – Classroom Promotion of Oral Language29 | 1368 | 2013 | 5 | x | ✓ | ✓ | x | x | x | x | ✓ | x | 2017 | 7–8 |
| CAP – Children's Attention Project30 | 600 | 2011 | 6–8 | x | ✓ | ✓ | ✓ | ✓ | x | x | ✓ | x | 2015 | 10–12 |
| EHLS – Early Home Learning Study31 | 2228 | 2008 | 6 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | x | 2019 | 7 |
| MHS – Maternal Health Study32 | 1500 | 2003–05 | Pre-birth | ✓ | x | ✓ | x | x | x | x | x | x | 2017 | 10 |
Measurement domains: A = vocabulary; B = literacy; C = expressive and receptive language; D = matrices/block design; E = social/behavioural outcomes; F = quality of life; G = Medicare health service use/cost; H = NAPLAN test scores for literacy/numeracy; I = biological material for DNA extraction; ✓ = collected; x = not collected. RCT, randomized controlled trial; ADHD, attention-deficit hyperactivity disorder.
ᵃ Extensive prospectively collected prior information also available.
ᵇ VicCHILD Databank has ongoing recruitment, with a sample size of 525 and an anticipated sample size of 570 by 2017; combination of prospectively and retrospectively recruited samples.
ᶜ OCHL has ongoing recruitment, with an anticipated sample size of 421 by 2015.
Additional language and literacy measures
Many longitudinal population cohorts either do not include measures of child language or measure only one very brief component of language (often vocabulary). This is because a full-scale language assessment takes a minimum of 40–60 min to administer, and many tests require specialist administration: prohibitive requirements in large generic studies that typically have only around 90 min to collect data across all dimensions of health, well-being, development and social domains. CLARE enabled The Consortium to explore whether children likely to have DLD could be identified using a short-form language measure suitable for inclusion in population cohorts. We overcame logistic and time constraints by delivering existing, validated measures on an iPad: an adaptive vocabulary test [NIH Toolbox Picture Vocabulary Test (TPVT)]33 and core language tasks [the CELF-4 Recalling Sentences subtest (CELF RS)34 and the Children's Test of Nonword Repetition (CN Rep)].35 This meant that language could be assessed by non-specialist staff in around 8 min. Receiver operating characteristic (ROC) analyses against full-scale language assessment were very encouraging, and the short language measure is now embedded in eight population studies of child development (Table 2) comprising almost 5000 children aged 5 to 12 years.
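To make the validation step concrete, the sketch below shows one standard way a short-form screen can be compared with a full-scale reference assessment: an ROC curve traced over candidate cut-offs, the area under it (AUC) computed via the Mann–Whitney statistic, and sensitivity and specificity at a single cut-off. The scores, the cut-off and the convention that lower scores indicate risk are hypothetical assumptions for illustration; they are not CLARE data or the Consortium's published analysis.

```python
"""Sketch: comparing a short-form language screen with a full-scale reference test.

Hypothetical assumptions: lower short-form scores indicate greater risk, and
children are already split by their reference-test result into 'pos' (DLD on
the full-scale assessment) and 'neg' (no DLD). All data are invented.
"""
from typing import List, Tuple


def auc(pos: List[float], neg: List[float]) -> float:
    """AUC via the Mann-Whitney statistic: probability that a randomly chosen
    positive child scores lower than a randomly chosen negative child (ties = 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(pos: List[float], neg: List[float], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when 'screen positive' means score <= cutoff."""
    sensitivity = sum(s <= cutoff for s in pos) / len(pos)
    specificity = sum(s > cutoff for s in neg) / len(neg)
    return sensitivity, specificity


def roc_points(pos: List[float], neg: List[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs, sweeping the cut-off upwards."""
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(pos) | set(neg)):
        sensitivity, specificity = sens_spec(pos, neg, cutoff)
        points.append((1 - specificity, sensitivity))
    return points


if __name__ == "__main__":
    pos = [70, 75, 82, 88]             # hypothetical scores, DLD on reference test
    neg = [85, 90, 95, 100, 104, 110]  # hypothetical scores, no DLD
    print(round(auc(pos, neg), 3))     # 0.958
    print(sens_spec(pos, neg, 85))
```

In practice the cut-off would be chosen from the ROC curve to balance missed cases against false alarms; this sketch leaves that trade-off to the reader.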
Table 2. Data harmonization of language and literacy measures for new and existing studies

| Measure | ELVS | MM | CheckPoint | VicCHILD | BIS | CAP | L4L | VIHCS |
|---|---|---|---|---|---|---|---|---|
| CELF Full | ✓ (4, 5, 7, 11) | ✓ (5–7, 10–12) | ✓ (4, 5, 9) | |||||
| CELF RSᵃ | ✓ (8–9) | ✓ (11) | ✓ (8, 9) | |||||
| CELF Screener | ✓ (7, 10) | |||||||
| CN Rep | ✓ (11) | ✓ (8–9) | ✓ (11) | ✓ (5–7) | ✓ (4) | ✓ (8, 9) | ||
| NAPLAN | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| TPVT | ✓ (11) | ✓ (8–9) | ✓ (11) | ✓ (5–7, 10–12) | ✓ (4) | ✓ (9) | ✓ (8, 9) | |
| NPVT | ✓ (5, 7) | ✓ (5, 7, 9) | ✓ (4) | |||||
| SPAT | ✓ (4, 5) | |||||||
| WASI Vocab | ✓ (5) | |||||||
| WRAT | ✓ (7) | ✓ (8–9) | ✓ (7, 10) | |||||
| Year collection was completed | 2017 | 2014 | 2015 | 2016 | 2016 | 2015 | 2016 | 2016 |
NAPLAN (National Assessment Program – Literacy and Numeracy) is the national annual assessment that all students take in Years 3, 5, 7 and 9. NAPLAN is made up of tests of reading, writing, written language conventions (spelling, grammar and punctuation) and numeracy.
Numbers in brackets indicate the ages (in years) at which particular tests were collected.
ᵃ CELF-4 Recalling Sentences subtest.
The Consortium actively enriched existing cohorts by adding measures of child language and collecting neurobiological data, to facilitate cross-cohort comparative analyses and to enable pooling of harmonized data across studies (Figure 1). A key variables document was created detailing all of the measures collected in each study; for the first time, this made it possible to see at a glance which measures were harmonized across studies.
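A key variables document of this kind is essentially a study-by-measure matrix. As a minimal sketch, assuming a simple mapping from each study to the set of measures it collected (the study names and entries below are hypothetical, not a transcription of Table 2), the following shows how such a record can be queried for measures shared across studies and for the studies that could be pooled on a given set of measures.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical excerpt of a "key variables" record: study -> measures collected.
# Study names and entries are invented for illustration.
KEY_VARIABLES: Dict[str, Set[str]] = {
    "Study A": {"TPVT", "CELF RS", "CN Rep", "NAPLAN"},
    "Study B": {"TPVT", "CN Rep", "WRAT"},
    "Study C": {"TPVT", "CELF RS", "NAPLAN"},
}


def shared_measures(kv: Dict[str, Set[str]], min_studies: int) -> List[str]:
    """Measures collected in at least `min_studies` studies (candidates for pooling)."""
    counts = Counter(m for measures in kv.values() for m in measures)
    return sorted(m for m, c in counts.items() if c >= min_studies)


def poolable_studies(kv: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Studies that collected every measure in `required`."""
    return sorted(study for study, measures in kv.items() if required <= measures)


if __name__ == "__main__":
    print(shared_measures(KEY_VARIABLES, 3))                    # ['TPVT']
    print(poolable_studies(KEY_VARIABLES, {"TPVT", "CN Rep"}))  # ['Study A', 'Study B']
```

Representing each study as a set keeps the overlap queries to simple set operations; a real key variables document would also need to record ages and test versions before measures could be treated as interchangeable.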
Data linkage
For the Australian studies, data linkage is planned or in progress with national datasets, including: (i) Medicare and the Pharmaceutical Benefits Scheme, which provide Australian citizens and permanent residents with universal health care insurance, giving access to a range of medical services, lower-cost prescriptions and free care as a public patient in a public hospital; and (ii) the Australian National Assessment Program – Literacy and Numeracy (NAPLAN) [http://www.nap.edu.au/naplan/naplan.html], an annual assessment for students in Years 3, 5, 7 and 9 made up of four domains: reading, writing, language conventions (spelling, grammar and punctuation) and numeracy. Additional datasets are expected to be added to CLARE, subject to funding.
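At its simplest, linkage attaches administrative records to cohort participants through a shared identifier. The sketch below assumes a hypothetical linkage key already present in both datasets (the actual linkage procedures used for these national datasets are not described here); it sums linked benefit payments per child, keeps cohort members with no claims at zero cost, and compares mean costs by language group. All records are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical data: cohort language status and an administrative claims extract,
# both keyed by an invented linkage key.
COHORT: Dict[str, str] = {"K001": "DLD", "K002": "typical", "K003": "typical", "K004": "DLD"}
CLAIMS: List[Tuple[str, float]] = [
    ("K001", 120.0), ("K001", 80.0), ("K002", 40.0), ("K004", 60.0),
    ("K999", 500.0),  # no matching cohort member
]


def link_costs(cohort: Dict[str, str], claims: List[Tuple[str, float]]) -> Dict[str, float]:
    """Total linked cost per cohort member; members without claims get 0.0."""
    totals: Dict[str, float] = defaultdict(float)
    for key, amount in claims:
        if key in cohort:  # administrative records outside the cohort are ignored
            totals[key] += amount
    return {key: totals.get(key, 0.0) for key in cohort}


def mean_cost_by_group(cohort: Dict[str, str], costs: Dict[str, float]) -> Dict[str, float]:
    """Mean linked cost for each language group."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for key, group in cohort.items():
        sums[group] += costs[key]
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


if __name__ == "__main__":
    costs = link_costs(COHORT, CLAIMS)
    print(mean_cost_by_group(COHORT, costs))  # {'DLD': 130.0, 'typical': 20.0}
```

Keeping unmatched cohort members at zero cost (rather than dropping them) matters: children who used no services are part of the comparison, and omitting them would inflate mean costs.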
Biological and imaging data
A long-term goal of CLARE is to assemble large cohorts with harmonized, community-based child language measures (i.e. detailed language phenotypes), enriched by neurobiological data, that provide a clear and consistent foundation for future examination of biological influences on children's language pathways. CLARE currently contains a DNA biorepository of more than 5900 well-phenotyped participants (see Table 3), in whom next-generation sequencing, either whole-exome or whole-genome, can be conducted. Samples of this size are needed to provide adequate statistical power to detect genuine associations. CLARE also aims to be ready to join other global initiatives that pool language and DNA data on a large scale to answer questions about the biology of language.
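Why sample size matters can be illustrated with the standard Fisher z approximation for the number of participants needed to detect a correlation r with a two-sided significance level α and power 1 − β: n ≈ ((z(1 − α/2) + z(1 − β)) / atanh(r))² + 3, where z(p) is the standard normal quantile. The effect size r = 0.1 and the significance levels below are illustrative assumptions (5 × 10⁻⁸ is a common genome-wide convention); genetic studies would normally use dedicated power tools, so this is a sketch of the principle rather than a power analysis for CLARE.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size to detect correlation r with a two-sided test,
    using the Fisher z transformation; rounded up to a whole participant."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(abs(r))) ** 2 + 3)


if __name__ == "__main__":
    print(n_for_correlation(0.1))              # 783 at alpha = 0.05
    print(n_for_correlation(0.1, alpha=5e-8))  # several thousand at a genome-wide threshold
```

Moving from a conventional 0.05 threshold to a genome-wide one multiplies the required sample roughly fivefold for the same small effect, which is why pooling across cohorts is attractive.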
Table 3. Data biorepository of DNA language samples for new and existing studies

| | ELVS | MM | CheckPoint | VicCHILD | BIS | CAP | L4L | VIHCS |
|---|---|---|---|---|---|---|---|---|
| DNA language samples collected (n) | 575 | 780 | 2988 | 110 | 500 | 490 | 373 | 150ᵃ |
| DNA type | Saliva | Saliva | Saliva; multiple others | Saliva/buccalᵇ | Saliva; multiple others | Saliva | Saliva | Buccal |
ᵃ Anticipated to reach 500 by mid-2020 (50 per 6-month period).
ᵇ Children under 5 years gave buccal swabs; children over 5 years gave saliva (spit pots).
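As a quick arithmetic check, the sample counts in Table 3 can be summed and compared with the figure of more than 5900 participants quoted above. The dictionary simply transcribes the table; VIHCS is taken at its current count of 150, not the anticipated 500.

```python
# Transcribed from Table 3: DNA language samples collected per study.
dna_samples = {
    "ELVS": 575, "MM": 780, "CheckPoint": 2988, "VicCHILD": 110,
    "BIS": 500, "CAP": 490, "L4L": 373, "VIHCS": 150,
}

total = sum(dna_samples.values())
largest = max(dna_samples, key=dna_samples.get)
share = dna_samples[largest] / total

print(total)           # 5966
print(largest)         # CheckPoint
print(f"{share:.1%}")  # 50.1%
```

The total of 5966 is consistent with the text, and the Child Health Checkpoint contributes roughly half of all samples.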
Additionally, we adopted a selective approach to magnetic resonance imaging (MRI) of the brain, assembling an affordable subset of experimentally defined cases, selected for specific phenotypic traits, and controls. A strength of our neuroimaging data is that we can confirm, from the longitudinal language data, that our controls have had typical language development since infancy. A challenge in building the MRI repository is that data acquisition methods and scanners differ and can be highly variable. New methods develop so rapidly in this field that agreed-upon protocols are often superseded within short periods.
Data Resource Use
Informing policy and practice
In 2013 the Australian Senate referred the prevalence of different types of speech, language and communication disorders, and of speech pathology services in Australia, to the Community Affairs References Committee for inquiry and report, and submissions were called for. Data from CLARE were used to prepare a submission that focused particularly on language trajectories, late talking and the current demand for speech pathology services across the country, with the aim of helping the Committee understand the problem and frame an appropriate response.36 A number of The Consortium's recommendations, made both in the submission37 and at its appearance before the Senate Committee (11 June 2014), were included in the final report.38
A Policy, Practice and Implementation Committee of researchers, policy makers and practitioners from health and education came together to produce a series of evidence-informed Research Snapshots: succinct, easy-to-read publications aimed at stimulating informed debate about specific topics. The Snapshot series presents the research under four headings: (i) Why is the issue important? (ii) What does the research tell us? (iii) What are the implications of the research? and (iv) Considerations for policy and programmes. Practice and Policy Briefs have also been produced, and the full complement of resources is available online: [https://www.mcri.edu.au/research/centres/centre-research-excellence-child-language].
Debate about specific language impairment (SLI)
Using CLARE data, investigators systematically tested whether commonly used exclusionary criteria for DLD identified a distinct group of children with SLI in three CLARE studies: the ELVS (Australia),14 the Longitudinal Study of Children with Specific Language Impairment (USA)17 and the Millennium Cohort Study (UK).25 These criteria included the controversial discrepancy between verbal and non-verbal performance, as well as other exclusion criteria such as social disadvantage (see Table 1). In a paper entitled 'Specific Language Impairment: a convenient label for whom?',39 it was reported that none of these factors was a useful criterion for identifying DLD in children. This has important clinical and policy implications, because a conceptual distinction between SLI and DLD was not supported. Language impairment is often purported to be associated with environmental factors such as disadvantage, so children from disadvantaged backgrounds have traditionally been excluded from studies of SLI, on the assumption that a 'true' impairment is specific to language. However, the definition of SLI is based on arbitrary and untested cut-points, and the exclusionary criteria are not well defined. SLI was considered to be intrinsic to the child (biological, psycholinguistic or genetic) rather than environmental. We identified a sharp social gradient in language outcomes amongst 5-year-old children in each of the three studies,39 but no evidence of a distinct level of social disadvantage that conferred a clear risk of DLD or SLI. Our paper recommended discontinuing use of the term SLI and the exclusionary criteria, and was the International Journal of Language and Communication Disorders' most downloaded paper in 2014.
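To show how exclusionary criteria operate mechanically, and why arbitrary cut-points matter, the sketch below first labels children as having low language and then applies SLI-style exclusions (a non-verbal score cut-off and, optionally, social disadvantage). The thresholds (−1.25 SD for language, a non-verbal standard score of 85) and the four children are illustrative assumptions, not the criteria or data used in the cited paper; the point is only that shifting a cut-point changes who is labelled SLI while the underlying language difficulty is unchanged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Child:
    child_id: str
    language_z: float    # composite language score in SD units (hypothetical)
    nonverbal_iq: float  # non-verbal standard score, mean 100, SD 15 (hypothetical)
    disadvantaged: bool  # social disadvantage flag (hypothetical)


def classify(children: List[Child], lang_cut: float = -1.25, nv_cut: float = 85.0,
             exclude_disadvantage: bool = True) -> Tuple[List[str], List[str]]:
    """Return (ids with low language, ids that also pass SLI-style exclusions).
    Cut-points are illustrative defaults, not validated criteria."""
    low = [c for c in children if c.language_z < lang_cut]
    sli = [c for c in low
           if c.nonverbal_iq >= nv_cut
           and not (exclude_disadvantage and c.disadvantaged)]
    return [c.child_id for c in low], [c.child_id for c in sli]


CHILDREN = [
    Child("A", -1.8, 92.0, False),
    Child("B", -1.5, 84.0, False),
    Child("C", -1.3, 100.0, True),
    Child("D", -0.5, 110.0, False),
]

if __name__ == "__main__":
    print(classify(CHILDREN))                                 # (['A', 'B', 'C'], ['A'])
    print(classify(CHILDREN, nv_cut=80.0)[1])                 # ['A', 'B']
    print(classify(CHILDREN, exclude_disadvantage=False)[1])  # ['A', 'C']
```

All three children with low language have the same kind of difficulty, yet whether B or C counts as SLI depends entirely on where the non-verbal cut-off is set and whether disadvantage is treated as an exclusion.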
Measuring the cost of language impairment
Estimates of the short- and long-term costs of DLD rely largely on data from adults with communication impairment,40 and on retrospective modelling showing that every dollar invested in speech and language therapy yielded a 6-fold increase in lifetime earnings.41 In CLARE, using a combination of parent and teacher reports of service use, research records and linked Medicare records, we have demonstrated that DLD in children aged 4–9 years (analysis currently being extended up to 13 years) was associated with higher service use and higher costs to both families and government compared with typical language.3 Current analyses examine the increased risk of psychological and emotional distress to individuals and their families or carers, and the impact of DLD on child- and/or parent-reported health-related quality of life (HRQOL), using validated generic utility-based measures, including the Health Utilities Index Mark 3 (HUI3), the Child Health Utility 9D (CHU9D) and the EQ-5D, as well as non-utility-based instruments such as the Pediatric Quality of Life Inventory (PedsQL).
Clinical review and recommendations for treatment
CLARE investigators have also provided evidence, and reviews of the evidence, demonstrating that treatment can be effective for improving expressive, but not receptive, language problems, that it is effective for speech disorders after 3 years of age, and that it is effective for major distressing stuttering that has lasted more than 12 months.42 As a result, investigators were able to provide specific recommendations for general practitioners and referral pathways for help-seeking parents.42 Subsequent invitations included an article in the Oxford Handbook of Linguistics and a contribution to Recent Advances in Paediatrics,43 an annual publication circulated throughout North America, Europe and Asia to an audience of general paediatricians. A comprehensive list of publications is maintained on the Centre of Research Excellence: Child Language website [https://www.mcri.edu.au/research/centres/centre-research-excellence-child-language].
Strengths and Weaknesses
Strengths
The main strength of CLARE lies in the assembly of anonymized data from large population-based language-focused studies to address a broad range of research questions that could not be answered by any individual study alone. CLARE has also facilitated complex analyses not previously possible, including the ability to rapidly replicate findings in one or more cohorts,44,45 to compare studies internationally and to link data to publicly available datasets to address questions about health care use and costs. A unique aspect of CLARE is the inclusion of data collected directly from the child, rather than reported by the parent or teacher.
Part of CLARE's future focus will be the neurobiological underpinnings of DLD. The collection of DNA from participants in individual community cohorts may never have been considered worthwhile; CLARE, however, offers new opportunities to explore the biological contribution to language pathways.
In the longer term, CLARE also intends to increase the use and re-use of existing data by giving researchers opportunities to propose and explore language-oriented research, provided this is consistent with the intentions and consents of the individual studies. This has the added benefit of limiting data wastage. For example, we have recently pooled individual participant data across hearing and deaf cohort studies to seek evidence for or against a threshold of hearing, or inflection point, beyond which language begins to fall behind.46
Unanticipated benefits include opportunities to harmonize and share database and questionnaire design, data collection protocols, measurement and cohort maintenance strategies. This has resulted in better data as well as substantial reductions in planning effort, data collection time and cost. Greater awareness of, and access to, similar datasets is also making it easier to undertake parallel studies, which are particularly useful where the primary study lacks sufficient data to examine rarer subgroups or conditions.
Weaknesses
Because the CLARE repository brings together data from various cohort studies, it shares many of their challenges. These include obtaining participant consent for personal data to be shared across studies and linked to government datasets. Such consent has been obtained from most individuals; however, participants enrolled in some of the early studies cannot be included in CLARE because ethics approval requires explicit re-consent. In addition, there has been some data loss from participants who withdrew or were lost to follow-up.
Further challenges specific to the CLARE repository include changes over time in personnel and study protocols; standardization of study data specifications and of historical data collection procedures and instruments; institutional requirements surrounding access and security; and data from national cohorts that cannot be physically located within the CLARE database.
Data Resource Access
The CLARE Steering Committee will consider formal requests for access to the data and biosamples and/or for the establishment of collaborative projects. Requests will be assessed for feasibility and potential overlap with ongoing work; contact [cre.cl@mcri.edu.au]. Alternatively, researchers may establish contact and/or collaboration with a principal investigator who has considerable expertise concerning the data, its accessibility, regulatory requirements and associated limitations. Further details of CLARE can be obtained from the website [https://www.mcri.edu.au/research/centres/centre-research-excellence-child-language], including information about the researchers on the project and their contact details, as well as information about the cohorts within CLARE. The website also includes references to peer-reviewed publications and brief Research Snapshots, together with current news and information.
In addition to approval from the CLARE Steering Committee, it is the responsibility of the individual researcher to apply for data licenses by contacting the chief investigators from individual studies with project proposals and requests for data access. Approval by an ethics review board is also required for studies involving CLARE data, and only anonymized data can be provided on the harmonized measures across studies.
Key Features
CLARE contains data of unprecedented scale and richness on the language measures required for accurate description and evaluation of children's language phenotypes.
Established in 2011, it contains harmonized, community-based data from: (i) language-focused cohorts of children with typical hearing; (ii) language-focused cohorts of children with impaired hearing; (iii) publicly available datasets containing a measure of child language; and (iv) child development cohorts (with enriched language measures).
Data from 17 cohort studies are included, describing factors hypothesized to affect language development (child behaviour, psychosocial well-being, cognition and a broad range of child, family and environmental factors) and repeated measures of language including direct assessment and/or parent report.
Cohort sizes range from 80 to 19 000 participants, and participant ages range from birth to 19 years.
CLARE supports complex analyses not previously possible, including the ability to rapidly compare large cohorts and to link data across cohorts to address questions about health care utilization and cost. Through the enrichment of DNA samples, it also presents a unique opportunity to explore the neurobiological underpinnings of language disorders.
Contact the CLARE Steering Committee [cre.cl@mcri.edu.au] to enquire about data access and/or collaboration. Further details about CLARE can be found via the study website: [https://www.mcri.edu.au/research/centres/centre-research-excellence-child-language].
Funding
This work was supported by various National Health and Medical Research Council (NHMRC) grants including: Centre of Research Excellence Grant (1023493); project grants (607407, 1041947, 237106, 436958, 1005317, 491228); fellowship schemes (607315 for A.M., 1041892 for S.R., 1046518 for M.W., 1037449 and 1111160 for F.M., 1082922 for S.G.). J.M.N. is supported by the Australian Communities Foundation funding for the Transition to Contemporary Parenthood Program (Coronella sub-fund). Research at the Murdoch Children’s Research Institute is supported by the Victorian Government’s Operational Infrastructure Support Program.
Acknowledgements
The authors thank the participants, their families and teachers who contributed valuable data to the individual studies in CLARE. We acknowledge the generosity of the investigators involved in the individual studies and the contributions made by CLARE members including postdoctoral fellows, PhD students and administrative staff. Thanks to Dr Michelle Krahe for her invaluable comments and contributions in preparing this manuscript.
Conflict of interest: None declared.
