Insertion of a mMoshan transposable element in PpLMI1, is associated with the absence or globose phenotype of extrafloral nectaries in peach [Prunus persica (L.) Batsch].

Most commercial peach [Prunus persica (L.) Batsch] cultivars have leaves with extrafloral nectaries (EFNs). Breeders have selected this character over time, as they observed that the eglandular phenotype resulted in high susceptibility to peach powdery mildew, a major disease of peach trees. EFNs are controlled by a Mendelian locus (E), mapped on chromosome 7. However, the genetic factor underlying E was unknown. In order to address this point, we developed a mapping population of 833 individuals derived from the selfing of "Malo Konare", a Bulgarian peach cultivar, heterozygous for the trait. This progeny was used to investigate the E-locus region, along with additional resources including peach genomic resequencing data, and 271 individuals from various origins used for validation. High-resolution mapping delimited a 40.6 kbp interval including the E-locus and four genes. Moreover, three double-recombinants allowed identifying Prupe.7G121100, a LMI1-like homeodomain leucine zipper (HD-Zip) transcription factor, as a likely candidate for the trait. By comparing peach genomic resequencing data from individuals with contrasted phenotypes, a MITE-like transposable element of the hAT superfamily (mMoshan) was identified in the third exon of Prupe.7G121100. It was associated with the absence or globose phenotype of EFNs. The insertion of the transposon was positively correlated with enhanced expression of Prupe.7G121100. Furthermore, a PCR marker designed from the sequence-variants, allowed to properly assign the phenotypes of all the individuals studied. These findings provide valuable information on the genetic control of a trait poorly known so far although selected for a long time in peach.


Introduction
Most peach [P. persica (L.) Batsch] cultivars have extrafloral nectaries (EFNs), or leaf glands, on the leaf blades, petioles, stipules, or margins [1,2]. EFNs have been observed on a large diversity of species spanning over 93 families and 332 genera [3,4]. They are sugarproducing glands which provide plants with indirect defense against herbivores and fungi by attracting beneficial predatory arthropods, predominantly ants, and fungivorous mites [5,6], which play the role of bodyguards and keep leaves free from microscopic herbivores and detrimental fungi [7]. Thus, EFNs can contribute to the reduction of the pathogen load by increasing mite abundance in domatia, and therefore enhance plant-mite mutualisms. In this way, EFNs may increase the ability of EFN-bearing peach cultivars to be protected from detrimental organisms by naturallyoccurring biological control agents [5,6].
From an extensive study of the main varieties of the peach, Gregory [1] observed that gland shapes were, for the most part, well defined and preserved, and that they could serve to separate the majority of varieties into groups. Indeed, gland shape was usually homogenous on typical shoots, although some cultivars could exhibit mixed glands. This author identified four main types of leaves, those with reniform (kidney-shape) glands, those with globose glands, eglandular (glandless) leaves and those having indistinctive glands. He reported that glands varied in number over the leaves of a same tree and were smaller on the leaf margin than on the petiole. Gregory [1] also observed that reniform glands were associated with single crenate leaf margins, whereas leaf margins were doubly and deeply serrated in eglandular individuals.
In the past, fruit breeding programs had occasionally produced peach cultivars with glandless leaves; nevertheless, the effects on either detrimental fungi or herbivorous pests had not been determined [2,8]. Mathews et al. [5] however, comparing glandular and eglandular peach trees derived from the selfing of the cultivar "Lovell", observed that those trees with EFNs harbored significantly fewer herbivores than trees without EFNs. The latter also experienced lower growth and fruit production. Earlier, empirical observations showed that the absence of EFNs in peach cultivars resulted in high susceptibility to peach powdery mildew (PPM), one of the major diseases of the peach [9,10]. Additionally, in wild grape, Weber et al. [7] demonstrated that adding foliar sugar to plant leaves increased the number of mutualistic mites inhabiting leaf domatia and showed that this was negatively correlated with the level and extent of grape powdery mildew infection, a fungal disease similar to PPM. PPM is caused by Podosphaera pannosa var. persicae [11], a member of the Ascomycete fungi, which can be responsible for serious damages in peach orchards. Indeed, the disease may induce necrosis and malformation resulting in unmarketable fruits, premature drop and shoot stunting [12]. For this reason, eglandular peach seedlings were systematically discarded during the selection process of most of the breeding programs.
The Mendelian inheritance of the leaf gland phenotype was first described by Connors [13]. The trait has an incomplete dominance, the absence being recessive and the globose shape of the nectaries representing the heterozygous phenotype. Further studies allowed to map the trait on a single locus (E) on chromosome 7 of the peach [14,15], but without identifying any factor responsible for the trait and its variations. Furthermore, this same region was found associated with a minor Quantitative Trait Loci (QTL) for resistance to PPM in P. ferganensis [14,16]. These various studies contributed to provide evidence that EFNs might play a role in lowering PPM incidence and could be of most interest for limiting the populations of some classes of detrimental herbivores and fungi in the trees. Therefore, further investigations deserved to be conducted to identify the factor underlying the E locus. For this reason, in-depth study of the E locus was carried out, in the frame of our breeding program for resistance to pests and diseases in peach.
The main objectives of the current work were to develop a high-resolution map of the E locus, then investigate the underlying genomic region and identify the genetic factor involved in the variation of the leaf gland phenotype as well as its possible link to susceptibility to PPM, apart from the indirect defense to fungi provided by EFNs. Then, accessorily, develop PCR marker(s) to facilitate early selection of glandular seedlings. With this aim, an initial mapping population of 212 individuals, referred to as AF5392, then extended to 833, was developed from the selfing of "Malo Konare" (clone S5392), a peach cultivar with globose (heterozygous) leaf glands, from Bulgarian origin [17]. "Malo Konare" was selected as it was heterozygous for the trait and part of our breeding program for resistance. Peach genomic resequencing data from five contrasted cultivars and two individuals from the AF5392 (the eglandular "S10215" and the reniform "S10216") were used for in-depth investigation of the E-locus region. Additional resources were used to support our findings.
They consisted in individuals from a breeding population segregating for the trait, referred to as BC2 [18], and a collection of 149 accessions from various origins, including two accessions of wild species close to peach, Prunus davidiana (Carr.) and Prunus kansuensis (Koehne), one accession of Prunus ferganensis (Kost. & Rjab.), two double haploids ("S7324" and "S7327") and a possible triploid ("S7314"). The outcomes of this study will provide valuable information on a trait little studied in peach so far and more widely in Prunus species. Furthermore, they would benefit our breeding program aimed at developing multi-resistant elite peach cultivars.

Phenotypic evaluation
Seven hundred and seventy-nine progenies out of the initial 833 of the population AF5392 were observed over two years, among which 197 from the initial population. Two hundred and two (26%) were eglandular, 382 (49%) globose and 195 (25%) reniform. This distribution was in agreement with the (1:2:1) segregation ratio expected for a Mendelian trait in this type of population (χ 2 = 0.41). Regarding the BC2, 62 individuals had globose leaf glands and 60 had reniform leaf glands. As regards the collection, 112 cultivars and the two wild peach relatives were scored reniform, 30 globose, four eglandular and one indeterminate (Table S1). For Prunus kansuensis (S1429), a noteworthy phenotypic difference was observed: EFNs were embedded in the margin of the lower part of the leaf-blade instead of the upper ridge of the petiole in the other accessions. Regarding PER2.3 N#1 (S7314), a possible triploid scored indeterminate, no regular leaf gland was noticeable but a number of small spikes picks on the petiole, close to the leaf blade. Finally, with respect to leaf margins, a close association was observed between deeply serrated leaves and the eglandular phenotype in the population AF5392. Eglandular individuals had sharp doubly well-defined leaf serrations contrary to those with globose or reniform glands, which had leaves with rounded, shorter crenellations. Crenellations were generally slightly more pronounced in globose individuals (Fig. 1). Regarding the collection, the same association between leaf serration and the absence of EFNs was observed for the four eglandular accessions.

Genetic map of linkage group 7 and high-resolution mapping of the E locus
The map of linkage group 7 (G7) derived from the 212 initial individuals covered a total genetic distance of 80.9 cM (Fig. 2) spanning a physical distance of 19 892 186 bp (88.85% of chromosome 7).
The map was composed of 18 SNPs among which six, including ASPP900, collocated with the E locus, at 49.6 cM, spanning a physical distance of 107.8 kbp. The physical distance between the SNPs on either side of the E-locus region (SNP_IGA_776067 and SNP_IGA_777469) was 665.7 kbp for a genetic distance of 1.6 cM. No    significant deviation of marker segregation was observed (P < 0.05). In addition to the above individuals, 567 individuals from the AF5392 were genotyped with SNPs included in the E-locus region as well as in the two flanking loci. Forty-eight recombinants were observed in the above interval, among which twelve between SNP_IGA_776067 and SNP_IGA_776214 (70.5 kbp) and three double-recombinants between ASPP899 and ASPP901 (Table 1), the latter delimiting an interval of 10.7 kbp including ASPP900 and the E locus.

In silico analysis
Based on the above results, investigations were firstly carried out in the region of 10.7 kbp then extended to the 70.5-kbp genomic region between SNP_IGA_776067 and SNP_IGA_776214 (Positions Pp07:14414202 to Pp07:14484 739 respectively), for gene and variant discovery. Twelve predicted genes ( Table 2) retrieved from the Genome Database for Rosaceae (www.rosaceae.org/species/pru nus_persica/genome_v2.0.a1) were identified, among which three genes (Prupe.7G121000, Prupe.7G121100 and Prupe.7G121200) were located in the ASPP899-ASPP901 interval.
Reads from the eglandular "S10215", the reniform "S10216", "Summergrand" and "Pamirskij 5", as well as the globose "Zephyr" and "Malo Konare" were aligned onto Peach v2.0.a1 derived from the reniform peach cv. Lovell (Plov2-2 N) and compared. A total of two hundred and seventy-seven variants between "S10215" and "S10216"and heterozygous in "Malo Konare", were identified among which six SNPs and an indel in the ASPP899-ASPP901 region (Table S2). However, no relationship was observed between any of the variants and the trait, except for the indel, which clearly differentiated eglandular, reniform and globose accessions. For the other 276 variants, the eglandular 'S10215 had the same haplotype as the reniform "Summergrand" and "Lovell" h h S5392 globose, S10215 eglandular, and S10216 reniform haplotypes, are before the twelve recombinant individuals; a homozygous reniform, b homozygous eglandular, h heterozygous; breakpoints are highlighted in grey color.
(Plov2-2 N), as well as the globose "Zephyr". In contrast, the reniform "S10216" was highly similar to 'Pamirskij 5 (reniform), except for an 11-kbp region upstream of the indel, for which 'Pamirskij 5 had the same haplotype as the above four other accessions (Table S2). The indel was located in Prupe.7G121100, a gene annotated as putative homeobox-leucine zipper protein ATHB-51 (Table 2). According to the information provided in databases, Prupe.7G121100, has a DNA-binding transcription factor activity and is involved in bract formation and leaf morphogenesis. Based on these findings, 25 primers (Table S3) were developed from consensus regions between "S10215" and "S10216" in order to sequence the interval encompassing Prupe.7G121100, as well as the 100-bp gap which remained in the peach genome sequence reference [44] (Peach v2.0.a1) immediately upstream of the CG (position Pp07:14436205.0.14436304).
The sequencing of the gap region resulted in sequences 9-fold longer than expected (905 bp and 903 bp for S10215 and S10216 respectively), therefore impacting coordinates downstream (Fig. S1). Sequence comparison allowed identifying a 590-bp insertion in the last coding DNA sequence (CDS) of Prupe.7G121100, in the eglandular "S10215", as well as two additional polymorphisms due to differences in the number of CT repeats in two SSRs present in the gap region (Fig. S1). The 590-bp insertion was located between positions Pp07:14437331 and Pp07:14437332, disrupting the initial reading frame (Fig. S1). BLASTN search against NCBI database allowed finding a high similar hit (98% of identity) with an insertion fragment of 588 bp, upstream of the start codon of a chalcone isomerase (CHI) gene of peach (Sequence ID: KF990613.1). This insertion was identified as a MITE-like Moshan (mMoshan) transposable element of the hAT superfamily. BLASTN search with the sequence inserted in Prupe.7G121100, against Peach v2.0.a1, returned 91 additional highly-similar hits (> 95% identity) spanning all the chromosomes, all starting from the third 5' nucleotide of the inserted element. The most similar (100% of identity from the third 5' nucleotide) was located on chromosome 5 (Pp05:16628569-16 629 156), 460 bp upstream of the start codon of Prupe.5G208500, a homolog of AGL8 (agamous-like 8) transcription factor. Nevertheless, a fine analysis of the insertion sequence highlighted some differences with the other transposable elements. The 92 above transposons had terminal inverted repeats (TIRs) composed of 13 or 14 complementary nucleotides and 8-nucleotide target site duplication (TSD). Regarding the insertion, the TIR in 3 was composed of 13 nucleotides identical to that of 90 of the 92 transposons. However, only 10 nucleotides of the 5' TIR were complementary with those of the 3' TIR ( Fig. S1). In addition, no direct repeat sequence was observed at the target insertion site and therefore no TSD. A likely hypothesis is a deletion in the original sequence (GACGAGCCTAGGGGTGGGCAC) where "GACGAGCC" was the TSD, the deleted motif "CGAGCCTAGG" and the original 5' TIR starting with the motif "TAGGG".

Analysis with FGENESH
The analysis with FGENESH was performed for both variants, using the genomic sequence of Prupe.7G121100 supplemented by the sequence of the gap region, and various dicot plant species as models, among which P. persica, A. thaliana, M. domestica and Lycopersicon esculentum.
One single prediction was obtained with the reniform sequence (Fig. S2). The addition of the gap-region sequence would lead to a primary transcript composed of three CDS instead of two, the initial start codon being replaced by another one 431 bp upstream. Queries of GDR_RefTransV1 and NCBI database using the sequence of the resulting transcript, validated the prediction, three transcripts (P.persica_gdr_reftransV1_0044698, P. dulcis LOC117635596 and P. avium LOC 110754228) having sequences highly similar with the predicted transcript (100%, 99% and 98% identity respectively). For eglandular accessions, four different predictions were obtained, all of these including an additional CDS in the 3 region. Differences between predictions were linked to the proportion of the transposon included in the third CDS and the position of the fourth.

Expression analysis of Prupe.7G121100
Relative expression levels of Prupe.7G121100 in leaves were assessed in three eglandular, two globose and three reniform cultivars, as well as two wild species, P. davidiana P1908 and Prunus kansuensis S1429 (Fig. 3). Contrary to our initial expectations, Prupe.7G121100 had significantly higher expression levels (p < 0.001) in eglandular accessions, than in both reniform and globose individuals, either before or after normalization with PpTEF2 and PpRPL13. Normalized differential expressions ranged between 0.049 ± 0.0052 (mean ± SE) and 0.1454 ± 0.03675 for reniform individuals, 0.1423 ± 0.01 and 0.1778 ± 0.0093 for the globose ones, and between 0.6846 ± 0.0394 and 1.1818 ± 0.0627 for eglandular accessions (Fig. 3), thus showing a negative correlation between the expression level of Prupe 7G121100 (p < 0.001) and the presence of EFNs. No significant difference was observed between the two samples of Prunus kansuensis S1429 (p < 0.01). In comparison, differential expression values before normalization, were comprised between 1 ± 0.10 (mean ± SE) and 2.96 ± 0.75 for reniform individuals, 2.89 ± 0.19 and 3.62 ± 0.20 for the globose ones, and between 13.93 ± 0.77 and 23.79 ± 0.99 for eglandular accessions.

3' RACE PCR and comparison of the alleles of the transcript
Nested PCRs based on sense primers associated with the AUAP antisense primer gave single amplicons for the reniform accessions only, whereas those carried out on eglandular accessions produced mixtures of amplicons of different sizes (smears). In contrast, those carried out using the antisense primers developed from each of the four predictions, (Table S6) gave the expected results, with amplicons present or absent according to the prediction considered. Sequences derived from the amplicons confirm the insertion of the 179 first nucleotides of the transposon after position 134 of the third exon of the initial transcript, as well as the presence of a fourth exon in the eglandular accessions ( Fig. S1 and . Relative expression of Prupe.7G121100 in ten accessions contrasting for EFNs. Expressions were normalized with the constitutive genes PpTEF2 and PpRPL13. Eglandular, globose and reniform individuals are denoted by the letter E, G or R, before the accession number, respectively. The three eglandular individuals are framed red. The expression of P. kansuensis is represented by two samples: R-S1429M (leaf margin) and R-S1429P (upper petiole region). Fig. 4). This confirms that prediction #3, which includes P. persica in the model, is the only valid prediction (Fig. S2). Regarding the reniform accessions, the transcript was as expected. The size of the eglandular transcript was 99 nucleotides longer than that of the reniform one (777 and 678 nucleotides respectively) resulting in a larger predicted protein (258 and 225 amino acids respectively). Moreover, major changes were observed: a 33-amino acid open reading frame at the 3 end of the reniform transcript was replaced by another comprising 65 amino acids in the eglandular transcript (Fig. S2).

Genotyping with the ASPP900 marker
One thousand and fifty individuals in total were genotyped with the ASPP900 marker, among which two hundred and seventy-one individuals were used for validation, including 149 accessions (Table S1). For all of them except "S7314", genotypes were consistent with phenotypes and globose individuals could also be clearly differentiated from reniform ones. "S7314" was considered as a possible triploid derived from the eglandular "Prosser 2.1 N"; however, it was genotyped as globose (heterozygous) and phenotyped as undifferentiated. These discrepancies do not question the efficiency of ASPP900, but are rather due to its peculiar genotype. In addition, this raises doubts on the single parental origin of this accession.

Discussion
The aim of our study was to identify the genomic factor responsible for the presence/absence of EFNs in peach, a Mendelian trait previously mapped on chromosome 7 [14,15], but little studied so far. The fine-mapping approach allowed delimiting the trait to an interval of 10.7 kbp between positions Pp07:14428469 and Pp07:14439211. Micheletti et al. [19], using the ISPC 9 K SNP peach array [20] and a collection comprising 750 reniform and 190 globose accessions, identified a single SNP, SNP_IGA_776161 (position Pp07:14469094) as associated to the leaf-gland type. This association was not fully congruent with our observations as "S10215" (eglandular), "Zephyr" (globose), "Summergrand", "Rubira" and the peach genome reference derived from "PLov2-2 N" (reniform) had the same allele combination (C/C) for this SNP, whereas "S10216" and "Pamirskij 5", both reniform were T/T. Nevertheless, taking into account the limited number of SNPs on the array corresponding to the trait region, as well as possible misclassification of some individuals of the collection, the results of the two studies were convergent.
Coupling the results of the fine-mapping approach with the comparison of the genomic sequences of accessions contrasting for the trait, allowed identification of a single candidate gene, Prupe.7G121100, among the three genes included in the above interval, and more broadly, among the 12 genes comprised in the longer 70.5-kbp genomic region encompassing the latter. Indeed, Prupe.7G121100 has two variants: the regular one, associated with the presence of EFNs and homozygous in reniform individuals, and a second one including a 590-bp insertion homozygous in eglandular individuals. This insertion was identified as a MITE-like transposable element of the hAT superfamily, termed as mMoshan [21].
Miniature inverted-repeat transposable elements (MITEs) are non-autonomous class II transposable elements which are considered a major driving force for generating allelic diversity in plant genomes [22]. MITES account for 3.89% of the peach genome [23] with 0.16% for the 491 Moshan elements identified. Moshan elements are specific to Rosaceae and the mMoshan class is predominant with 432 elements [21]. Interestingly, two of these mMoshan elements generated no obvious target site duplication, as the element inserted in Prupe.7G121100, suggesting that these three elements were atypical. Wang et al. [21] identified 29 mMoshan which were inserted in genes, among which 14 in exons. The 29 genes were distributed over all the chromosomes but none in chromosome 7. The mMoshan in Prupe.7G121100 was not detected, probably because Peach genome v2.0.a1 was derived from the reniform double haploid "Lovell" Plov2-2 N and therefore does not include the inserted element. This author observed that genes including mMoshan elements showed relatively lower expression levels compared with genes lacking these elements and this was consistent with previous studies on MITES [24]. However, this was not the case in our study since Prupe.7G121100 demonstrated enhanced expression in eglandular individuals compared to that in globose and reniform ones, which were quite similar. mMoshan elements contain several cis-regulatory elements such as MYB and WRKY binding sites in the first third of the sequence, which could be involved in upregulation of the transcription [21]. However, when we take into account the minor differences in gene expression observed between reniform and globose individuals, and the similarity of the phenotypes of their leaf margins, as well as the incomplete dominance of the trait, this seems not correlated. It would be therefore interesting to investigate possible functional differences between the two alleles This point needs a dedicated approach.
Prupe.7G121100 was annotated as putative homeoboxleucine zipper protein ATHB-51, a member of the class I (HD-Zip I) superfamily of transcription factors. (HD-Zip) proteins are specific to plants. They include the combination of a DNA-binding homeodomain (HD) and an adjacent Leucine zipper (Zip) motif, which mediates protein-dimer formation [25]. Saddic et al. [26] identified ATHB-51 as a meristem identity regulator and named it LATE MERISTEM IDENTITY (LMI1) based on its regulation functions; accordingly, we will further refer to Prupe.7G121100 as PpLMI1. These authors showed that LMI1 was a direct target of LEAFY (LFY), a central meristem identity regulator in Arabidopsis thaliana as well as a direct upstream activator of a second meristem identity regulator, the MADS-box transcription factor CAULIFLOWER (CAL). LMI1 and LFY act together to induce CAL expression, the interaction between these three genes corresponding to a feed-forward loop transcriptional network motif [27]. LMI1 thus belongs to the complex of genes including others transcription factors, such as APETALA1 (AP1), involved in the meristem identity switch leading to flower formation [26,28]. Interestingly, the mMoshan transposable element identified on chromosome 5, with the highest percentage of identity with that inserted in PpLMI1, was in the promoter region of an homolog of AGL8 (agamous-like 8) transcription factor, a MADS-box negatively regulated by APETALA1, suggesting a possible involvement of APETALA1 in the regulation of PpLMI1. However, LMI1 has also additional LFY-independent roles in leaf morphogenesis and bract formation [25]. For instance, LMI1 regulates leaf growth and organ proportions such as stipules size in A. thaliana [29]. This is done through an endoreduplication-dependent trade-off and the activation of the mitosis blocker WEE1, which limits tissue size and cell proliferation [30]. Moreover, modifications of GhLMI1-D1b, one of its homologues, were found to be responsible for the major leaf shapes in Upland cotton (Gossypium.hirsutum) [31]. This designates LMI1-LIKE genes (along with the KNOXI genes) as evolutionary hotspots that are involved in the modification of leaf shape in angiosperms [32]. In this way, Chang et al. [33] demonstrated that LMI1-like and KNOX1 genes coordinately control leaf development in dicotyledons and that different expression patterns of these two genes lead to the formation of different leaf marginal structures. The same way, loss of function of CrLMI1, a likely ortholog of LMI1, was reported to decrease leaf serration in Capsella rubella [34]. This is in agreement with the results of our study, in which increased leaf margin serration was found strictly associated with the absence of EFNs, concurrently with the higher expression of PpLMI1. This relationship between leaf serration and absence of EFNs was already reported in previous studies [13]. These findings thus contribute to confirm the possible involvement of PpLMI1 in leaf margin structures in peach and accordingly in the phenotype of EFNs. Likewise, in cucumber (Cucumis sativus), mict, a class I HD-Zip factor which sequence had 52% of identity with LMI1, regulates multicellular trichome development [35].
Regarding the EFNs, however, molecular genetic understanding of their formation is still underdeveloped. No study to date is available in tree species and only a few ones have been published in annual plants. For instance, Hu et al. [36] identified GaNEC1, a gene encoding a PB1 domain-containing protein, as positive regulator of nectary formation in cotton, which silencing led to a reduced size of foliar nectaries. However, EFNs were located in the leaf midribs and their conformation was different than peach EFNs. Phenotypic diversity generally results from differences in the genetic organization, regulation and/or expression of underlying developmental programs [4]. In the case of EFNs, such underlying programs have been poorly investigated. The gene CRABS CLAW (CRC), a YABBY transcription factor [37,38], seems to be an earlyfunctioning regulator of the development of both floral and extraf loral nectaries in core eudicots [39,40]. But while CRC, along with several upstream MADS box floral homeotic genes and other unknown regulatory genes, may determine the location of f loral nectaries [39], different transcriptional control networks may be involved in the development of EFNs [40]. This suggests that the program needed for EFN development may be closely associated with that of the EFN-bearing organ [4], the leaf, in the case of peach trees. As a result, the functional characteristics of LMI1, its involvement in leaf morphogenesis as well as in the meristem identity switch leading to flower formation, suggest that this transcription factor might have a pivotal role in the regulation of different characteristics of the leaf. This makes PpLMI1 a most likely candidate for the presence/absence of EFNs in peach.
Therefore, a plausible hypothesis is that functional modification of PpLMI1 associated with the insertion of the HD-Zip I element might trigger endoreduplication. This would result in changes in the cell-wall composition of the lamina as well as in leaf margins, through a developmental program involving target-genes of PpLMI1, leading notably to serrated leaves and the absence of EFN. In addition, changes in cell-wall composition of the leaf blade surface could thus make easier the development of fungi, such as Podosphaera pannosa, on the leaves. Modification of the cell walls at regions undergoing pathogen attack is a common response to infection; the inability to do so, or weakened cell walls, might explain, at least in part, the susceptibility to pathogens [41]. As a result, these changes, along with the absence of the positive effects associated with the presence of domitia-inhabiting mutualist mites, and fungivore mites attracted by EFN nectar, might be responsible for enhanced susceptibility to PPM in eglandular individuals, as compared to those with EFNs. Further studies need however to be undertaken in order to assess our hypotheses.

Conclusion
In this study, we were interested in identifying the genetic factor responsible for the presence/absence of EFNs in peach. In our knowledge, this is the first time that a molecular genetic approach has been undertaken to clarify the genetic basis of this Mendelian trait, in peach and, more broadly in Rosaceae perennial crops. Based on our results, PpMLI1 appears the most likely candidate gene for this character. A comprehensive study of the genomic region including PpMLI1 study did not bring to light another alternative candidate. In addition, its characteristics, regulation functions as a meristem identity regulator as well as its role in leaf morphogenesis, make it highly plausible its involvement in the control of the presence/absence of EFNs as well as its association, at some extent, with the variation of the susceptibility to PPM, in link with cell-wall changes. However, this has to be further validated functionally. Virus induced gene silencing (VIGS) method could be considered as a relevant approach, as genetic transformation in peach is currently an obstacle. In addition, a broader study including the expression of PpMLI1 in the meristem as well as that of genes interacting with PpMLI1 or target genes such as WEE1, may be undertaken to elucidate the molecular interactions underlying this interesting trait. This study is thus a first step. Nevertheless, in the short term and from a breeder perspective, ASPP900 marker already allows differentiating the different phenotypes at the seedling level, and could then be used in peach breeding programs.

Plant material
The initial mapping population consisted of 212 individuals derived from the self-pollination of "Malo Konare" (clone S5392). "Malo Konare" is a canning peach cultivar developed in 1984 at the Fruit-growing Institute in Plovdiv (Bulgaria). It originated from the cross "Stoika" × "New Jersey Cling 97", has globose leaf glands and shows strong resistance to powdery mildew. "Stoika" was derived from "House Kling" and "Ferganskyi Zheltyi" (1973), a clone of Prunus ferganensis. The population was further extended to 833 individuals for the fine mapping of the leaf-gland region and identifying recombinants. This population is referred to as AF5392. In addition, 271 individuals were used to validate phenotype/genotype association in different genetic backgrounds: at first, 149 accessions with contrasting leaf gland phenotypes (Table S1), including 143 peach cultivars from various origins, two accessions of wild species close to peach, Prunus davidiana (Carr.) and Prunus kansuensis (Koehne), one accession of Prunus ferganensis (Kost. & Rjab.), two double haploids and a possible triploid; then a sample of 122 individuals from a complex breeding population, referred to as BC2 [18]. The latter was derived from two successive crosses (F 1 and backcross) including Prunus davidiana clone P1908 and peach cv. "Summergrand", followed by a final cross derived from a mixture of pollen of the back-cross population and "Zephyr" as maternal parent. These 271 individuals were planted in triplicate and grown in three different places: greenhouse and tunnels for the cultivars, orchards and tunnels for the BC2. All the individuals were conserved at the Prunus Biological Resource Center of INRAE in Montfavet, except the two double haploids and the possible triploid that were conserved at the Prunus-Juglans Biological Resource Center, Domaine des Jarres, 33 210 Toulenne.

DNA isolation
Samples of young leaves from each of the individuals were collected in the spring. Genomic DNA was subsequently isolated using the Qiagen DNeasy 96 Kit (https:// www.qiagen.com) according to the manufacturer's instructions. DNA of each sample was at first assessed for quality using a NanoDrop™ ND-1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA USA) and then quantified using Quant-iT™ Picogreen ® reagent (Invitrogen Ltd.2, Paisley UK). Stock solutions of genomic DNAs were then diluted to a final concentration of 40 ng/μl.

Leaf gland phenotyping
Leaf glands were observed over two years, on five to ten leaves from different parts of each of the trees (progenies and cultivars). Individuals were classified under the three phenotypes encountered: reniform, globose and eglandular (no leaf gland observed). Those trees that were planted in triplicate were scored individually. Leaf margins were examined concurrently to EFNs, as an association between the eglandular phenotype and deep leaf serration was previously reported.

Marker development and genotyping
"Malo Konare" has been genotyped earlier in the frame of the European project Fruitbreedomics [19], using the IPSC peach 9 K SNP array v1 [20]. Based on the available SNP dataset, a first set of heterozygous SNPs was selected to develop the genetic map of linkage group 7 of "Malo Konare" and insure a sufficient coverage of the group. Genotyping was done using the PCR-based KASP™ (Kompetitive Allele Specific PCR) method from LGC Biosearch technologies (https://www.biosearchtech.com/). Primertriplets (two competitive allele-specific forward primers and one common reverse primer for each marker) were developed from the 60-bp genomic sequence available on either side of the SNPs (https://www.rosaceae.org/ species/rosaceae_family_genera/IRSC_SNP_array), using Primer3 [43] under the following primer-picking conditions: optimal size of the amplicons 75 bp (min 62 bp, max 85 bp), Tm 65 • C (min 55 • C, max 72 • C), primer size 25 bp (min 20 bp, max 32 bp), max self-complementarity 7, max 3 self-complementarity 3, left primer end 61 bp. Primers triplets were compared with Peach v2.0.a1 [44], using the Basic Local Alignment Search Tool (BLAST ® ) at the Genome Database for Rosaceae (GDR: https://www.ro saceae.org/blast/). Those aligning to single positions were selected for genotyping the starting mapping population (Table S4). In a second step, additional SNP markers focused on the interval encompassing the E locus were developed in order to identify recombinant individuals. This was done using Next-Generation Sequencing (NGS) data derived from "Malo Konare". BAM files were aligned onto Peach v2.0.a1 and visualized with IGV [42]. The region containing E was examined for SNP/indel discovery and reads including heterozygous SNP/indels compatible with the KASP™ method were retrieved. Primertriplets were then developed as above (Table S4). Mix preparation and PCR reactions were performed using the KASP™ genotyping chemistry and conditions.

Genetic map of linkage group 7
In a first stage, linkage group 7 (G7) of "Malo Konare" was constructed using the mapping dataset derived from the SNP-set selected from Micheletti et al. [19]. Genotypic data were coded as F 2 -progeny type according to the JoinMap coding system. The leaf gland trait was similarly coded as a co-dominant Mendelian trait. Linkage analyses were performed using JoinMap 4.1 [45]. The recombination fraction value was set at 0.4 and grouping was performed using the independence logarithm of odds (LOD) calculation function and a minimum LOD score threshold of 3. Recombination frequencies were translated into genetic distances using the Kosambi mapping function [46]. Linkage group 7 was established using regression mapping procedure with three rounds per sample. In a second stage, SNP markers developed for the high-resolution mapping in the interval including the Elocus region were added to the genotypic data file and mapped similarly.

High-resolution mapping of the E locus
The extended population was genotyped using the SNP markers f lanking the E locus in the genetic map. Recombinant individuals in the interval were identified and genotyped with newly developed markers. Individuals identified as recombinants in the new interval were genotyped again with a new marker-set. This process was repeated iteratively until no further recombinant was observed.

In silico analysis of the region encompassing the E locus
The genomic region delimited by the SNP-pairs, which allowed identifying the most informative recombinants, was analyzed for variants. This was done by aligning the sorted.bam files of "Malo Konare", "S10215" and "S10216" onto Peach v2.0.a1 (Ppersica_298_v2.0.fa version), under IGV [42], and by comparing them. Differences observed were then compared with sorted.bam files of "Zephyr", "Summergrand", "Pamirskij 5" and "Rubira" in order to check consistency of differences regarding leaf gland phenotype, in different genetic backgrounds. The genomic region defined above was examined for the presence of predicted genes, using JBrowse on the Genome Database for Rosaceae (https:// www.rosaceae.org/jbrowse/). Positions of the observed differences were compared with those of the genes and their sub-features; then, genomic sequences of the candidate genes (CGs), associated transcripts, predicted protein sequences, homologies and gene functions were downloaded (https://www.rosaceae.org/bio_data/571). NGS reads corresponding to the position of the selected CG-variants were retrieved for "S10215" and "S10216" using IGV [42], imported into CLC Main Workbench version 12 (QIAGEN, Aarhus, Denmark), assembled de novo and compared using MUSCLE [47]. In addition, as a 100-bp gap remained in Peach genome v2.0.a1 in the region immediately upstream of the most likely CG, 25 primers were developed (Table S3) and used for Sanger sequencing of the gap region and the target CG (Genewiz, South Plainfield, NJ, USA). Assembled sequences were then compared and differences between "S10215" and "S10216" identified. Sequences of the selected CG and the gap region, were finally analyzed for comparison and possible changes in the coding sequences, as well as changes in the resulting protein, using FGENESH geneprediction program [48] with different dicot plant species as model (http://www.softberry.com)).

Gene expression analysis
Eight cultivars with contrasted phenotypes and both wild species (Prunus davidiana P1908 and Prunus kansuensis S1429) were selected for expression analysis. Foliar samples were collected from the part of the leaves including leaf glands, or from the region including the base of the leaf blade and the upper part of the petiole for the eglandular individuals. Regarding Prunus kansuensis, two samples were collected in order to make comparisons: one from the margin of the leaf blade where reniform glands were embedded, the other from the base, close to the petiole. Samples were immediately frozen in liquid nitrogen. Total RNA was isolated using the Macherey-Nagel ® NucleoSpin ® RNA Plant kit (Thermo Fischer Scientific, Waltham, MA USA) following manufacturer's instructions. RNA concentration and quality were assessed using a NanoDrop™ ND-1000 spectrophotometer (Thermo Fisher Scientific) and the Agilent 2100 Bioanalyzer System (Agilent, Santa Clara, CA USA). For reverse transcription analysis, primer pairs composed of primers on either side of the second intron of Prupe.7G121100 (Fig. S3) were designed using Primer3 [43] and GenScript ® Tool (Table S5). One microgram of total RNA per sample was then subjected to cDNA synthesis using the AffinityScript RT kit (Agilent) according to the manufacturer's instructions. A SYBR green real-time PCR assay was thereby carried out in a final volume of 15 μl of a reaction mixture containing 7 μl of 2x Brilliant III SYBR ® Green qPCR Master mix (Agilent), 0.5 μM of each primer and 100 ng of cDNA template. Reaction mixtures without cDNA were used as negative controls. Amplification reactions were run in a 96 well plate on a Stratagene Mx3005P (Agilent) under the following conditions: 95 • C for 30 s, followed by 40 cycles of denaturation at 95 • C for 10 s, annealing at 60 • C for 30 s and extension at 72 • C for 15 s. Reactions were performed using four biological and three technical replicates for each sample. Amplification values were then normalized using two genes as constitutive controls, as recommended by Bustin et al. [48]: PpTEF2 (translation elongation factor 2) and PpRPL13 (60S ribosomal protein L13), both having previously been tested and selected for their stability. Two-way analysis of variance (ANOVA) was used to assess the independent effect of the presence of EFNs and that of the insertion on the expression of Prupe.7G121100.
Tests were performed using a script of "RqPCRAnalysis" R-package [49] customized to generate box-plots with R studio [50]. Significance threshold was set to p < 0.01.

3' race PCR
Transcripts of reniform and eglandular accessions were amplified using the Invitrogen 3' Rapid Amplification of cDNA Ends (3' RACE) system (Thermo Fisher Scientific). The 3' RACE procedure was carried out as recommended by the supplier. The first strand cDNA was synthesized using 1 μg of total RNA of each of the individuals and the adapter primer (AP) targeting the poly(A) region of the mRNA. The synthesis reaction was followed by the amplification of the target cDNA in a final volume of 50 μl containing 2 μl (1/10) of the above reaction, 1x reaction buffer, 0.2 mM each dNTP, 1.5 mM MgCl2, 0.2 μM each of the following primers, the antisense abridged universal amplification primer (AUAP) provided in the kit and a custom sense primer developed in the second exon of the gene (Table S6), and 2.5 U of GoTaq ® Hot Start Polymerase (Promega). Amplification reactions were run on an Eppendorf Mastercycler epgradient (Eppendorf AG, Hamburg, Germany) under the following conditions: 94 • C (2 min) followed by 35 cycles at 94 • C (45 sec), 57 • C (45 sec), 72 • C (1.5 min) and a final extension at 72 • C (5 min). PCR products were visualized in a 1.5% agarose gel stained with ethidium bromide. Nested amplification reactions were then carried out as above using 1/5 of the second reaction, the antisense AUAP primer and additional custom sense primers developed downstream of the first sense primer. In addition, as numerous stretches of poly(A) were included in the sequence of the transposon, which interfered in the hybridization of the adapter primer (AP) to the poly(A) region of the mRNA, antisense primers were developed based on each of the predictions derived from the analysis with FGENESH (Table S6). PCRs were carried out as above except for the annealing temperature which was lowered to 55 • C.
Resulting amplicons were then sent for sequencing (Genewiz). Finally, upstream regions were amplified and sequenced to obtain the complete transcripts.

Development of the diagnostic marker ASPP900
One primer-triplet based on the PCR-based KASP™ method (https://www.biosearchtech.com/) was devel-oped in order to differentiate each of the three phenotypes encountered (Table S4). It was composed of one forward primer specific of the glandular phenotype (20 nucleotides in CDS 3 of Prupe.7G121100 starting 15 nucleotides before the insertion position), one forward primer specific of the eglandular phenotype (18 nucleotides astride the 9 last nucleotides of the transposon and the first 9 nucleotides of CDS3 after the insertion), and one common reverse primer (20 nucleotides in CDS3, starting 75 nucleotides downstream of the insertion point). Positions of the primers on the sequence are shown in Fig. S1.