Genome-wide association study for morphological, phenological, quality, and yield traits in einkorn (Triticum monococcum L. subsp. monococcum)

Abstract Einkorn (Triticum monococcum L. subsp. monococcum, 2n = 2× = 14, AmAm) is a diploid wheat whose cultivation was widespread in the Mediterranean and European area till the Bronze Age, before it was replaced by the more productive durum and bread wheats. Although scarcely cultivated nowadays, it has gained renewed interest due to its relevant nutritional properties and as source of genetic diversity for crop breeding. However, the molecular basis of many traits of interest in einkorn remain still unknown. A panel of 160 einkorn landraces, from different parts of the distribution area, was characterized for several phenotypic traits related to morphology, phenology, quality, and yield for 4 years in two locations. An approach based on co-linearity with the A genome of bread wheat, supported also by that with Triticum urartu genome, was exploited to perform association mapping, even without an einkorn anchored genome. The association mapping approach uncovered numerous marker-trait associations; for 37 of these, a physical position was inferred by homology with the bread wheat genome. Moreover, numerous associated regions were also assigned to the available T. monococcum contigs. Among the intervals detected in this work, three overlapped with regions previously described as involved in the same trait, while four other regions were localized in proximity of loci previously described and presumably refer to the same gene/QTL. The remaining associated regions identified in this work could represent a novel and useful starting point for breeding approaches to improve the investigated traits in this neglected species.


Introduction
Plant breeding has significantly improved the production of many crops but has often exerted a bottleneck effect on the genetic diversity of several characters (Bed} o and Lá ng 2015; Louwaars 2018). Modern germplasm, such as wheat, is highly unbalanced compared to the ancestral one found in landraces due to recent selection and spread. The characterization of these genetic resources is thus strategic to better exploit them efficiently into pre-breeding programs (Balfourier et al. 2019). The exploitation of ancient landraces or wild species requires in-depth phenotypic and genetic characterization to detect superior phenotypes and to identify their underlying genes and/or linked genetic markers Zhou et al. 2017;Garcia et al. 2019;Rahimi et al. 2019;Kumar et al. 2020;Sansaloni et al. 2020). Germplasm banks currently host and preserve extensive collections of cultivars, landraces, and wild relatives of most crops, which are an excellent source of genetic variability for allele mining of important traits (Leung et al. 2015;Sehgal et al 2015;Brozynska et al. 2016;Zhang et al. 2016;Engels and Thormann 2020). The study of this largely untapped biodiversity would provide both the opportunity to recover new allelic variation, urgently required by breeding programs, and to work on the improvement of key traits directly in ancient materials.
Einkorn (Triticum monococcum L. ssp. monococcum) is a hulled diploid wheat (2n ¼ 2Â ¼ 14) which was domesticated during the Neolithic Age in the Karacada g mountain range (Turkey) from its wild ancestor T. monococcum L. ssp. boeoticum (Heun et al. 1997(Heun et al. , 2008Haldorsen et al. 2011). Einkorn quickly spread by different routes of migration throughout Europe (Brandolini et al. 2016), where it was widely used as human food for millennia before being gradually replaced, from the Bronze Age, by the polyploid wheats, characterized by higher production and free-threshing grains. Nowadays einkorn is cultivated in marginal regions and only recently, breeding efforts were started in several European countries (Hidalgo and Brandolini 2019).
Increased awareness on the nutritional properties of foods and recent trends toward low-impact and sustainable agriculture have led to a renewed interest for this ancient wheat. Einkorn composition and nutritive values are different, often better, from those of bread and durum wheat. For example, einkorn kernels are rich in proteins, lipids (mostly unsaturated fatty acids), fructan and trace elements, including Zn and Fe. Additionally, T. monococcum grains have good concentration of several antioxidant compounds (carotenoids, tocols, conjugated polyphenols, alkylresorcinols, and phytosterols) that, along with low betaamylase and lipoxygenase activities (which limit antioxidants degradation during food processing), confer to the flour excellent nutritional benefits, superior to those of all the other wheats (Hidalgo and Brandolini 2019). Although not suitable for coeliacs (Zanini et al. 2015a,b), einkorn elicits weaker toxic reactions than other Triticum species (Mølberg et al. 2005;Vincentini et al. 2007;Di Stasio et al. 2020) and thus might be tolerated by patients with some types of food sensitivities (Lombardo et al. 2015).
The T. monococcum genome shows a high gene conservation and co-linearity with durum and bread wheat, even though the polyploid species inherited their A genome from Triticum urartu (Dubcovsky et al. 1996;Bai et al. 2004;Brandolini et al. 2006). Some einkorn genetic maps are available in literature (Dubcovsky et al. 1996;Kojima et al. 1998;Taenzler et al. 2002;Singh et al. 2007;Jing et al. 2009;Yu et al. 2017) and a high-resolution map based on Diversity Arrays Technology and sequencing (DArTseq) markers was recently developed (Marino et al. 2018). Nevertheless, only few data exist about the molecular basis of traits of interest in einkorn. The soft glumes (Sog, on 7L) trait, the loci coding for seed storage proteins, and Quantitative Trait Loci (QTL) for SDS sedimentation volume and protein content have been located by Taenzler et al. (2002). Alvarez et al. (2016) mapped the earliness-related locus Eps-A m 1 in einkorn genome (on 1L) and identified EARLY FLOWERING 3 (ELF3) as the underlying candidate gene, while Chen et al. (2018) studied and mapped two stem rust resistance genes (SrTm5 on 7L and Sr60 on 5S). Very recently, Yu et al. (2019) analyzed grain sizerelated traits and identified the involved loci, by linkage and homology mapping and transcriptome profiling.
The new approach to genetic resources based on second generation sequencing techniques (genotype by sequencing, DArTseq) has paved the way to a sudden improvement of knowledge about the genomic regions involved in the control of traits of interest by genome-wide association studies (GWASs). However, the full exploitation of these genetic tools in einkorn has long been limited by the poor characterization of the available germplasm. Nonetheless, Jing et al. (2007) using association mapping characterized the phenotypic variation and identified several loci involved in the control of traits related to stress adaptation, phenology, and morphology of this species.
In this study, a panel of 160 T. monococcum ssp. monococcum landraces was subjected to association mapping analysis for complex agronomic traits related to phenology, morphology, yield, and quality. These accessions were genetically characterized with a DArTseq approach, which yielded two types of markers: (i) silicoDArT and (ii) SNP markers. Numerous markertrait associations (MTAs), related to the traits analyzed, were identified; three of these overlapped with previously described loci, while four were located in proximity. The remaining associations represent mainly novel loci involved in these traits.

Plant material, phenotyping, and statistical analysis
The panel used in this study consisted of 160 T. monococcum ssp. monococcum landraces, belonging to a core collection maintained at CREA in Sant'Angelo Lodigiano (Lodi, Italy). The set included material coming from different sites across Europe and neighboring areas (Morocco, Turkey, Near East and Caucasus). The accessions tested, selected mainly on passport data and previous research (Brandolini et al. 2016;Volante et al. 2020), represent a well-balanced sampling of current T. monococcum distribution and diversity (Heun et al. 1997;Empilli et al., 2000). Several accessions for each geographic location were included, depending on availability, as well as selections with good bread making quality. This latter was assessed by micro tests (SDS sedimentation test; Preston et al., 1982), Brabender farinograph and bread baking according to method 10-10.3 (AACC International 2012a). For further description and details, see Supplementary Table S1, Brandolini et al. (2008) and Volante et al. (2020). The field trials were performed in two locations (Sant'Angelo Lodigiano: 45 14 0 21 00 N 9 24 0 22 00 E, and Lodi: 45 18 0 12 00 N 9 30 0 45 00 E) for 4 years (from 2010-2011 to 2013-2014). About 50 plants per accession were planted in one single 2 m row for each environment. Standard cultural practices were applied, including nitrogen fertilization (40 kg/ha) at tillering and chemical weed control (Ariane II: Fluroxipir þ Clopiralid þ MCPA). The spikes were manually harvested at full maturity and seeds stored at 5 C. Whole meal flour was prepared from 10 g de-hulled kernels using a Cyclotec 1093 laboratory mill (FOSS Tecator, Denmark) and stored at À20 C until analysis.
An analysis of variance (ANOVA) was conducted throughout the set of phenotypic data, using genotypes and years as factors; the locations were used as repetitions. Broad sense heritability (H 2 ) was calculated according to Nyquist (1991): where r 2 G is the genetic variance, r 2 GE is the genotype Â environment interaction variance, r 2 e is the residual variance, E is the number of environments, and r the number of replicates. All analyses were performed using the STATGRAPHICS V R Centurion XV (Statgraphics Technologies, Inc., The Plains, VA, USA) statistical program.

Genotypic data and in silico mapping
The genotypic data for the 160 accessions consisted of a set of 33,260 DArTseq markers, described in Brandolini et al. (2016), including both SNP and SilicoDArTs. DNA extraction from 7-dayold seedlings (one plant per accession) was performed with a standard CTAB protocol as described by Stein et al. (2001). DNA concentration was quantified with a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA). The DNA was analyzed with the DArT-seq technology (Kilian et al., 2012) by the company Diversity Arrays Technology (Canberra, Australia). Twelve genotypes (i.e., the very same DNA) were fingerprinted twice (unknown to the lab in Australia). The original panel was filtered for callrate and minor allele frequency (MAF) using the PLINK software (Purcell et al. 2007). Markers with a callrate lower than 90% and with MAF lower than 5% were discarded.
An einkorn physical map was produced in silico using the most exhaustive and comprehensive genome available for wheats so far, that of Triticum aestivum. A physical position of DArTseq markers on bread wheat genome was determined by aligning the corresponding nucleotide sequences (https://www.diversityar rays.com/technology-and-resources/sequences/), supplied by Diversity Array Technologies, to the A genome of the RefSeq v1.0 reference genome (IWGSC 2018). The einkorn genetic map of Marino et al. (2018) was used to orient the physical map developed in this work. The physical positions of 184 markers in common between these two maps were exploited as a scaffold (Supplementary Table S2).
A link to the T. monococcum contigs and to T. urartu pseudomolecules was also acquired blasting the same marker reads against TGAC WGS monococcum v1 (available at https://wheat-urgi.ver sailles.inra.fr) and against GCA_003073215.1 assembly (Ling et al., 2018), respectively. BLASTN searches were performed using the tool available at the URGI (https://wheat-urgi.versailles.inra.fr/ Seq-Repository/BLAST) for T. aestivum and monococcum genome; while for T. urartu BLAST was conducted locally on downloaded genome. All the analyses were done using BLASTN with these cutoffs: a minimum coverage of 99% (only one gap is admitted) and an identity over or equal to 97% (no more than two mismatches).

Analysis of population structure
The analysis of genetic stratification in the einkorn panel was performed with two different methods: a principal component analysis (PCA) by the Tassel v5.2.0 software (Bradbury et al. 2007) and a phylogenetic clustering obtained utilizing the MEGA7 software (Kumar et al. 2016) with the neighbor-joining method and default parameters.
A Bayesian model-based analysis was achieved using the program Structure v2.3.4 (Pritchard et al. 2000). The parameters used were the following: presence of admixture, allele frequencies correlated, a burn-in period of 10,000 iterations, followed by 20,000 Monte Carlo Markov Chain (MCMC) replications, K levels from 1 to 10 and 5 runs per K-value. The free software Structure Harvester (Earl and von Holdt 2012) was used to determine the best number of clusters (K) according to the Evanno method. Once defined the most probable K-value, a final single run was performed using the same parameters listed above, but with a higher number of burn-in and MCMC iterations (100,000 and 200,000, respectively). Accessions with a 0.7 minimum membership (i.e., the probability of one individual to belong to a subgroup identified by Structure) were assigned to a subpopulation, while the remaining were considered as admixed. The number and composition of the clusters identified were integrated into the results of the neighbor-joining analysis and the PCA for cross-validation.

Analysis of linkage disequilibrium decay
The computation of pairwise linkage disequilibrium (LD; r 2 ) was conducted as in Volante et al. (2017). Briefly, the whole set of markers was analyzed with the R package "LDcorSV v1.3.1" (Mangin et al. 2012), using as covariates the Structure membership matrix and the kinship matrix calculated by Tassel. The values were binned in 10 kb windows as in Biscarini et al. (2016).
For each of the nine linkage groups (i.e., all the chromosomes including the split 2 and 4), the median value of each 10-Kb interval was calculated (this was used instead of the mean to account for a non-normal distribution of LD values at short and longdistance classes). The resulting values were plotted against physical distance and fitted to a second-degree LOESS curve using an R script (Cleveland 1979;Marroni et al. 2011). A critical value of 0.2 was set as r 2 between unlinked loci. The physical distance corresponding to a LOESS curve value of 0.2 was assumed as LD decay in the einkorn panel.

Association analysis
For the genome-wide association analysis, to combine data from multiple environments, the Best Linear Unbiased Predictors (BLUPs) of each trait for each accession were calculated using the lmer function of the lme4 R package (Bates et al., 2015). The model fitted to the data considered genotype, year, and genotype: year interaction as random factors. The ranef function was used to calculated BLUPs, which were then used for the association analysis. To identify the best covariates for each trait, we used the method described by Courtois et al. (2013), i.e., a preliminary GWAS was performed with Tassel, testing a Mixed Linear Model (MLM) corrected with (1) the kinship matrix and (2) the kinship matrix and 5 components of the PCA analysis. In all cases, the Tassel default options ("optimal compression" and P3D mode) were used. The best model was identified for each trait by calculating the Bayesian Information Criterion (BIC) as in Schwarz (1978); the BIC was computed as -2 ln(L) þ kln(n) where L is the maximized value of the likelihood function for the estimated model (provided by the GWAS analysis), k is the number of estimated parameters and n is the size of the sample. The model with the lowest BIC value was selected. The final GWAS was then performed with the most appropriate model for each trait, with Tassel using the "no compression" and "no P3D" mode to increase the power of detection (Courtois et al. 2013). A P-value of the association to the phenotypic traits was computed for each marker; the significance threshold to declare a marker as associated was set to 0.05 after correction for multiple testing, using the false discovery rate method (Benjamini and Hochberg 1995). Manhattan plots and Q-Q plots of each trait were drawn using the R package "qqman" (Turner 2014).

Statistical analysis of phenotypic results
The ANOVA conducted using year and genotype as factors indicated that both had significant effects on the phenotypic variation (Supplementary File S1), highlighting the strong effect of changing yearly climatic conditions and the broad variation available in the einkorn pool tested. This last aspect is evident in the data from Table 1, reporting the average, minimum and maximum values of the traits analyzed, expressed as average of four growing seasons (2011-2012, 2012-2013, 2013-2014, and 2014-2015) and two locations (SAL and LO), as well as their broadsense heritabilities. These latter had remarkably high values, ranging between 0.896 (GLC) and 0.981 (HDA).

Using wheat genomes to define marker location
After filtering for missing data and MAF that eliminated 61.7% of the data, 12,734 out of 33,260 einkorn DArTs were retained for subsequent analyses (Supplementary File S2).
An einkorn physical map was developed in silico assigning to 2037 markers a physical position on the bread wheat reference genome (IWGSC RefSeq v1.0). The genetic map recently published (Marino et al. 2018) was employed as a scaffold to correctly build and orient the einkorn physical map. Two portions, one for two and one for four chromosomes, showed a marker order conserved but inverted respect to bread wheat genome. Therefore, these chromosomes were split in two portions each ("up" and "down"): the distal portion of chromosome 2 was inverted respect to the physical map of T. aestivum ( Sequences of DArT markers were also aligned against the recently released version of T. urartu genome (Ling et al., 2018): 1673 markers, corresponding to 13% of the whole dataset, were linked to a physical position using the same BLAST criteria adopted with bread wheat genome, with a number of markers/chromosome ranging between 107 (Chromosome 4) and 342 (Chromosome 2; Supplementary Table S4_a).

Population structure
The results of the Principal Component, Structure and Neighbor Joining analyses are shown in Figure 1 and Supplementary  Table S1. The most probable Structure model assigned 118 accessions (corresponding to 73.3% of the panel) to 3 clusters (Supplementary Table S1): K1, containing accessions of mixed origin from the Maghreb to the Balcans (red cluster), K2 which mainly included continental accessions (green cluster), and K3 composed mainly of landraces from Turkey, Greece, and Italy (blue cluster). The first three PCs accounted for 27.6% of the variability in the panel ( Figure 1A). The clustering coming from PCA was also consistent with that resulted from the neighbor joining tree ( Figure 1B); by crossing the results from the three analyses it was evident that the two groups separated by the PC3 in the K1 cluster (red cluster, Figure 1A), corresponded to the accessions  from Maghreb and Spain, and those from East-European respectively ( Figure 1A). This clustering was consistent with that previously proposed by Brandolini et al. (2016), which in this panel of landraces observed two groups splitting from the main cluster, one of Iberia-Maghreb and one of "prealpine" accessions (mainly from Austria, Germany, and Switzerland).

Analysis of genome-wide linkage disequilibrium decay
The mean chromosome-wise decay of linkage disequilibrium, calculated as r 2 , was approximately 176 Kbp (Table 2). A heterogeneous distribution was observed over the different chromosomes, ranging from 25 kbp (chr. 6) to 395 Kbp (chr. 5). The average decay plot across all the chromosomes is shown in Figure 2.

Association analyses
In order to account for the interaction genotype Â environment on the variability of the traits, association analyses were conducted on BLUP predictors calculated for each accession. The Manhattan and QQplot for each analysis are shown in Figure 3 and Supplementary File S3. A total number of 64 markers with a reliable position on the T. aestivum genome ("mapped") were associated with the analyzed traits, and defined 37 regions ( Figure 3, Table 3), which were identified as trait acronym.chr number/region number. Furthermore, 274 associated but "unmapped" markers were identified (Supplementary Table S3).
For morphology-related traits (AWN, GLC, GLH, GRL, GRW, PLH, and SPS), the analysis detected 25 associated regions with a defined position on the bread wheat genome. The percentage of variance explained by the peak markers (r 2 ) ranged between 8.23% and 17.28%. GLH was the trait with the highest number of MTAs (eight mapped regions).
The GWAS on the phenology-related HDA resulted in three associated mapped regions (variance explained ranging between 8.45% and 14.26%), while for quality traits (CAR, SDS, and PRO) the analysis yielded in total nine associated regions (two for CAR, four for SDS, and three for PRO). The variance explained ranged between 7.8% and 15.7%.

Discussion
The renewed interest for cultivated einkorn, for millennia cropped in isolated areas of Middle East, Caucasia, and Europe, is centered on the nutritional qualities of its grain (Hidalgo and Brandolini 2019), the high resistance to pests and diseases (Zaharieva and Monneveux 2014;Serfling et al. 2016;Chen et al. 2018), and its adaptation to harsh, low-input environments. Low yields, glumes persistence after harvest (hulledness) and a limited knowledge of its diversity explain why this crop is not properly exploited (Zaharieva and Monneveux 2014).
Nevertheless, einkorn landraces were selected and grown in isolated environments and were not subjected to the "bottleneck" effects of intensive selection, thus even today display a broad range of variation for many traits. The panel of landraces used in this work was recently characterized from the phenotypic point of view (Volante et al. 2020); a deeper investigation of the genetic bases underlying such diversity may unravel valuable genes,  Figure 2 Genome-wide LD decay in einkorn accessions. The red curve represents the second grade LOESS that approximates the r 2 values; the horizontal black line shows the threshold r 2 value for unlinked markers.
potentially useful for wheat improvement. Linkage and association mapping rely necessarily on the availability of a great number of molecular markers with known genetic or physical position; a high-resolution einkorn genetic map was recently published (Marino et al. 2018), but only few markers (849)    A genome to bread wheat, we aligned the sequences of the DArT markers from a panel previously developed and described (Brandolini et al. 2016) on the annotated genome of bread wheat (IWGSC, 2018), being this latter the most exhaustive and comprehensive genome available for wheats so far. In such a way, a physical position could be approximated for 2037 out of the 12,734 markers used for the GWAS analysis (corresponding to 16% of the panel). We decided to use very stringent parameters for the BLAST analysis, despite knowing that we would lose most of the alignments, but due to the genetic distance between the two species (Hyun et al. 2020), we wanted to be as rigorous as possible and to give very robust physical localizations. In the exploitation of T. urartu genome the percentage of physically mapped markers was lower (13%: 1673) but the marker order along chromosomes of einkorn physical map was conserved with that of T. aestivum genome (Supplementary Table S4_a). These observations support our choice to use the bread wheat genome as the most appropriate tool to give a physical order to einkorn markers. The association analysis, corrected for the population structure, was therefore undertaken using the markers thus physically mapped, together with those with unknown position.
The pairwise LD (r 2 ) observed in the panel decayed to values lower than 0.2 (generally assumed as a r 2 value between unlinked loci) at 176.11 Kbp (as an average value across all the chromosomes). Such an r 2 is notably small, compared to what is generally observed in wheat species such as bread wheat (8 Mbp as an average among A, B, and D genome; , and durum wheat (from 1.6 Mb to 4.5 Mb; Maccaferri et al. 2019), while for rice similar values have been observed (150 Kb to 2 Mb for the japonica group; Courtois et al. 2013;Kumar et al. 2015). Strongly selected crops typically show high levels of LD, due to increased correlation among alleles at specific loci (Schlö tterer 2003); conversely, a short decay can be expected in a panel including landraces, which usually have a long history of isolated cultivation and were not or were poorly subjected to breeding. Accordingly, Hao et al. (2011) detected a lower LD in bread wheat landraces compared to modern cultivars; therefore, the results observed are not surprising because einkorn has a completely different history of domestication from "more successful" crops. However, it is remarkable that a significant percentage of the accessions used in this work (26.7%) were not assigned to a specific cluster by the STRUCTURE analysis, suggesting that a certain percentage of admixture was present.
The marker set used in this work (composed of 12,734 DArTseq) allowed to identify a high number of loci associated to the traits analyzed. We investigated whether the significantly associated regions (the interval defined by the peak marker 6 176 kbp, according to the extent of the LD decay) found in this study were located at the same or close physical position of previously identified genes or QTL with a known function.
A survey of the literature suggests that the regions identified on chromosomes 3 (608.8 Mb) PRO.03/01 and 5 (665.6 Mb) SDS.05/ 01 are in proximity of (although not overlapping) those identified by Yang et al. (2020) in very recent multi-locus GWAS (at 649.7 Mb for grain protein content-AX-95629522; and at 697.7 Mb for SDSsedimentation volume-AX-94927055) ( Table 4).
The two regions identified for AWN on chromosome 5 (522.1 and 551.4 Mb) are definitely not included into the interval of B1 gene recently cloned (from 681.5 to 706.8 Mb;DeWitt et al., 2020).
Since the remaining regions detected in this study were not reported in previous GWAS works, we hypothesize that most of the MTAs observed in this study, and shown in Table 3, define new and undescribed regions controlling the einkorn traits analyzed. Moreover, several markers, including both mapped and unmapped, resulted associated to multiple traits (Supplementary Table S4_d), suggesting possible pleiotropic effects of the associated QTL. It would be interesting to investigate deeper these overlapping, especially HDA-SPS and PLH-SPL, trait-association already described in bread wheat (Chai et al. 2018;Chen et al. 2020) Overall, these observations deserve further investigations and could be exploited both in the breeding of this crop and to transfer valuable alleles into more widespread species such as bread wheat.

Conclusion
An approach based on co-linearity with bread wheat was adopted to achieve a set of einkorn molecular markers with an inferred physical position; these, together with the phenotypic information from a 4 years field trial experiment, allowed to undertake an association mapping survey on a species poorly characterized from the genetic point of view. Indeed, this is the first report of a GWAS approach on einkorn using a high number of molecular markers and an extensive panel of landraces. The analyses yielded several einkorn genome regions putatively involved in the regulation of phenological, morphological, quality, and production-related traits, which deserve a deeper characterization from the functional point of view. For seven associated regions, we also found confirmation of our results in the literature on bread wheat, despite the remarkable genetic distance between the two species, and this further validates the robustness of the method. Starting from this information, a set of specific markers such as Cleaved Amplified Polymorphic Sequences (CAPSs), or markers amenable for automation and highthroughput approaches (such as Kompetitive Allele Specific PCR: KASPar markers) may be developed and used as valuable tools for the marker-assisted breeding of einkorn wheat. Unfortunately, no consideration can be done for many other markers, despite significant association P-values, because they were not mapped on the bread wheat genome, and only few could be positioned either on T. urartu genome (18) or einkorn contigs (48) (Supplementary Table S4_c). Nevertheless, it should be kept in mind that we adopted very stringent mapping criteria, which allowed a very accurate positioning of the markers but, on the other hand, led to define the most part of these as "unmapped". Despite this, we found several markers influencing multiple traits (e.g., heading date with number of spikelets per spike, and spike length with plant height), and these coregulation mechanisms found support in literature further validating our analysis. Moreover, when the mapping criteria were loosened, further associations could be mapped and found consistency with literature (data not presented).
The alignments performed to the three species let us to identify 3290 einkorn contigs linked to 3655 markers, out of them 1166 and 1412 were physically mapped on T. urartu and T. aestivum genomes, respectively. These results revealed a possible future strategy for ordering and orienting the einkorn contigs currently available but not yet anchored: the results of the BLAST alignments (maybe loosening selection criteria) performed for the whole marker dataset (also including those not passing the callrate and minor allele frequency thresholds adopted for the association mapping) coupled with the available genetic information could be exploited for this purpose.
This would boost the molecular study of still unexplored traits and unlock the efficient use of einkorn as a valuable repertoire of genetic alleles for the breeding of modern crops.

Authors contributions
A.V. performed the analysis of population structure and GWAS, wrote the first manuscript draft and managed the revision process. D.B. performed the blast of the DArT markers against the wheat genomes (T. aestivum, T. urartu, and T. monococcum) and contributed to manuscript writing. A.V. and D.B. investigated the novelty of the MTAs identified. R.M. contributed to the statistical analyses, to writing and critically revising the manuscript. A.B. conceived and supervised the work, managed in-field trials and phenotyping and revised the manuscript. All the authors read and approved the final manuscript. A.V. and D.B. contributed equally to the paper.
Code availability (software application or custom code) The original codes cited in the reference, when specified, were used.

Data availability
Supplemental files available at figshare: https://doi.org/10.25387/ g3.14975118. Table S1 includes the list of the accessions tested, their origin and the cluster assigned by the Structure analysis. Table S2 shows the physical position on bread wheat genome (IWGSC RefSeq V1.0) assigned to DArT markers of Triticum monococcum. Table S3 shows the results of the Genome-Wide Association Mapping analysis for markers unmapped on bread wheat genome. Table S4 shows the physical position of DArT markers on T. urartu genome (a; GCA_003073225.1) and T. monococcum contigs (b; TGAC WGS monococcum v1). In the third sheet (c), the physical position of the peak marker-trait associations (both mapped on bread wheat genome and unmapped) is reported. In the fourth sheet (d), markers associated to multiple traits are listed together with their physical position on the three considered genomes.
File S1 reports the Analysis Of Variance (ANOVA) of the phenotypic data. File S2 contains the genotypic data (DArTseq) of the einkorn panel used for the analysis of population structure, analysis of linkage disequilibrium decay and association mapping. Only the markers passing the callrate (90%) and Minor Allele Frequency (5%) thresholds are shown. File S3 shows the Manhattan-and QQ plot resulting from the association analyses for all the considered traits.