Dissecting the major genetic components underlying cotton lint development

Abstract Numerous genetic loci and several functionally characterized genes have been linked to determination of lint percentage (lint%), one of the most important cotton yield components, but we still know little about the major genetic components underlying lint%. Here, we first linked the genetic loci containing MYB25-like_At and HD1_At to the fiberless seed trait of ‘SL1-7-1’ and found that MYB25-like_At and HD1_At were very lowly expressed in ‘SL1-7-1’ ovules during fiber initiation. We then dissected the genetic components involved in determination of lint% using segregating populations derived from crosses of fuzzless mutants and intermediate segregants with different lint%, which not only confirmed the HD1_At locus but identified the HD1_Dt locus as being the major genetic components contributing to fiber initiation and lint%. The segregating populations also allowed us to evaluate the relative contributions of MYB25-like_At, MYB25-like_Dt, HD1_At, and HD1_Dt to lint%. Haplotype analysis of an Upland cotton (Gossypium hirsutum) population with 723 accessions (including 81 fuzzless seed accessions) showed that lint% of the accessions with the LP allele (higher lint%) at MYB25-like_At, MYB25-like_Dt, or HD1_At was significantly higher than that with the lp allele (lower lint%). The lint% of the Upland cotton accessions with 3 or 4 LP alleles at MYB25-like and HD1 was significantly higher than that with 2 LP alleles. The results prompted us to propose a strategy for breeding high-yielding cotton varieties, i.e. pyramiding the LP alleles of MYB25-like and HD1 with new lint% LP alleles without negative impact on seed size and fiber quality.


Introduction
Cotton seeds produce long (lint) and short (fuzz) types of fibers.Both are single-celled and arise from the epidermal cells of the seed coat.The identity of lint fibers is determined at least 1 day before flowering and lint fiber initials start their outgrowth on the day of flowering (Qin et al. 2022).After ∼25 days of elongation, lint fibers approach their maximum length, which can be as long as ∼35 mm in Upland cotton (Gossypium hirsutum L.), and become mature at ∼55 days-post-anthesis (dpa) (Kim and Triplett 2001).Fuzz fibers initiate at ∼3 dpa and do not elongate to the same extent as the lint fibers and usually have a length of <5 mm (Joshi et al. 1967;Stewart 1975).Upland cotton fuzzless mutants and most Pima, Egyptian and Sea Island cotton (Gossypium barbadense L.) accessions produce seeds without fuzz fibers (Fang et al. 2018).While fiberless (lintless and fuzzless) cotton mutants have been well documented, no mutant bearing only fuzz fibers has ever been reported, thanks to the expression profile of the 2 MYB25-like homeologs during fiber initiation (Zhu et al. 2018).
Producing a high yield and high quality of lint fibers is the main goal of global cotton production despite the fuzz fibers (used for making specific paper and cellulose products) and cotton seeds (used for producing cooking oil and as a cattle feed supplement) also being valuable natural resources.Lint percentage (lint%) is one of the most important fiber yield components and is commonly used as a proxy for lint yield during breeding because of its high heritability.Increasing lint% is thus one of the major strategies adopted by cotton breeders for improving lint yield of cotton varieties.As a result, newly released cotton varieties usually have a higher lint% than obsolete or superseded ones (Ma et al. 2019;Conaty and Constable 2020).Approximately a quarter to a third of the cotton seed epidermal cells differentiate into fiber initials and eventually become lint fibers (Lang 1938;Stewart 1975).Turning more seed epidermal cells into lint fiber initials and/or repressing the development of fuzz fibers are therefore anticipated to increase the yield of lint fibers.To that end, it demands a deeper understanding the genetic and molecular basis underlying differentiation and determination of lint and fuzz fibers.Fiber mutants, including fuzzless and fiberless mutants, are indispensable genetic resource for achieving that goal.
Several genetic loci, including dominant N 1 and N 5 and recessive n 2 , n 3 , and n 4 t , contributing to fuzzless cotton seeds have been documented (Bechere et al. 2012;Fang et al. 2018;Zhu, Stiller, et al. 2021).And fl1 (fibreless1) was proposed by Turley and Kloth (2008) to be the locus contributing to the fiberless phenotype of 'SL1-7-1'.The fuzzless phenotype of the N 1 mutant is caused by loss-of-function of MYB25-like_At (At representing the A-subgenome of the tetraploid cotton) thanks to short interfering RNAs generated from the 3′ portion of the gene due to the presence of natural antisense transcripts (Wan et al. 2016).The gene underlying the n 2 mutation is likely to be the Dt-subgenome MYB25-like_Dt based on genetic mapping of the recessive fuzzless trait of G. barbadense and comparison of the expression profiles of MYB25-like_At and MYB25-like_Dt during fiber initiation in cotton accessions with different fiber phenotypes (Zhu et al. 2018).The recessive n 3 locus was inferred based on observations of its involvement in the full expression of the fuzzless seed phenotype in the homozygous n 2 genetic background (Turley and Kloth 2002) and has recently been proposed to be an allele of N 1 (Chen et al. 2020).Seeds of the n 4 t and N 5 mutants are fuzzless but with a tuft of short fibers attached to the micropyle end (socalled tufted trait) (Bechere et al. 2012;Zhu, Stiller, et al. 2021).
The n 4 t locus has been mapped to an ∼411-kb genetic interval on Chr-D04 but the responsible gene and mutation are yet to be identified (Naoumkina et al. 2021).The candidate mutation underpinning the dominant fuzzless-tufted seed phenotype of the N 5 mutant has been located to an ∼250-kb genomic region on Chr-D13.The interval contains a couple of genes expressing significantly differently between the near-isogenic lines showing the mutant and wild-type seed phenotypes, thereby they were proposed to be the candidate genes (Zhu, Stiller, et al. 2021).Four natural fiberless mutants have been reported, including 'L40' (Musaev and Abzalov 1972), fuzzless-lintless 'MCU-5' (Nadarajan and Rangasamy 1988), 'Xu142fl' (Zhang and Pan 1991), and 'SL1-7-1' (Turley and Kloth 2008).A fifth fiberless mutant ('MD17') was found among the progeny from the cross between the dominant (N 1 ) and the recessive (n 2 ) fuzzless mutants (Turley 2002).Of these fiberless mutants, 'Xu142fl' is the one that has been used in many genetic and molecular studies to explore the mechanism underpinning fiber initiation.Apart from n 2 and n 3 , 'Xu142fl' was proposed to contain a recessive lintless mutation, li 3 (Zhang and Pan 1991).The identity of Li 3 is still controversial because 2 MYB-MIXTA-like (MML) transcription factors (MYB25-like_Dt and GhMML4_Dt) located next to each other on Chr-D12 have been proposed to be the corresponding gene.Wu et al. (2018) proposed GhMML4_Dt being Li 3 based on genetic mapping using segregating populations derived from cross between the n 2 fuzzless mutant and 'Xu142fl' and the presence of a single nucleotide polymorphism (SNP) that induces a stop codon (TAA) in the 3rd exon of GhMML4_Dt in 'Xu142fl'.Using the same genetic materials, a later study mapped the li 3 mutation to the same region as that reported by Wu et al. (2018) but found that the stop codon is also present in GhMML4_Dt of G. hirsutum accessions with normal fibers, and instead, a retrotransposon insertion was found in the 2nd exon of MYB25-like_Dt in 'Xu142fl' so MYB25-like_Dt was proposed to be Li 3 (Chen et al. 2020).'SL1-7-1' was proposed to have 3 mutations, a dominant (N 1 ) and a recessive (n 3 ) mutation related to fuzz development and a recessive (fl 1 ) mutation related to lint development (Turley and Kloth 2008).Given that n 3 was proposed to be allelic to N 1 (Chen et al. 2020), further investigation is required for the mutations underlying the fiberless phenotype of 'SL1-7-1'.
The fuzzless alleles have variable degrees of negative impact on the development of lint fibers, and consequently on lint%.Cotton accessions containing homozygous N 1 have a lint% ranging from 0.7 to 23.6%, lower than homozygous n 2 accessions (24.4%) and normal fuzzy seeded accessions (37.7%) (Turley et al. 2007).The higher the number of fuzzless alleles, the lower the lint% (Turley and Kloth 2008) implies an additive effect of the fuzzless alleles on lint development.Compared to N 1 and n 2 , the fuzzless-tufted alleles n 4 t and N 5 have a weaker negative impact on lint development and there is no significant penalty on lint% and lint yield in the lines containing either mutation.These alleles thus have the potential to be used in breeding fuzzless elite Upland cotton varieties to reduce energy consumption during ginning (Bechere and Auld 2014;Zhu, Stiller, et al. 2021).
Apart from the fuzzless mutations mentioned above, many other genetic loci have been identified to be associated with lint % based on mapping of quantitative trait loci (QTL) and genomewide association studies (GWAS) in G. hirsutum and G. barbadense (Fang et al. 2017;Ma et al. 2018Ma et al. , 2019;;Yu et al. 2021;Chen et al. 2022;Zhao et al. 2022;Li et al. 2023).For instance, a GWAS with 258 Upland cotton accessions identified 71 genetic loci associated with yield traits, including 9 lint% related loci (Fang et al. 2017); and in a GWAS with 336 G. barbadense accessions, a Chr-A05 locus was found to be strongly associated with lint% and downregulating the potential candidate gene (Gbar_A05G014160 or GbLP1) by virus-induced gene silencing decreases lint% (Zhao et al. 2022).While some of the candidate genes identified by GWAS to be associated with lint% could affect the trait by regulating fiber initiation since they have a higher expression level in 0-5 dpa ovules (Ma et al. 2018;Zhao et al. 2022), some might affect lint % through their involvement in fiber elongation and/or development of the lint fiber secondary cell wall (Ma et al. 2019;Yu et al. 2021;Zhao et al. 2022).Furthermore, most genetic loci associated with lint% have pleiotropic negative effects on other agronomic traits (Fang et al. 2017;Li et al. 2023).For instance, 2 high lint% loci on Chr-D03 negatively contribute to fiber fineness or boll number per plant (Li et al. 2023).
GhHD1 is a homeodomain-leucine zipper (HD-Zip) transcription factor (TF) expressed predominantly in epidermal tissues during the fiber initiation stage (Walford et al. 2012;Qin et al. 2022).Silencing GhHD1 delayed the timing of fiber initiation and overexpressing GhHD1 increased the number of fiber initials on the seed surface (Walford et al. 2012).The results of a recent single-cell transcriptome study suggest that GhHD1 works in concert with MYB25-like and other TFs in regulating fiber differentiation, initiation, and rapid elongation during the fiber initiation and early developmental stage (Qin et al. 2022).A retrotransposon insertion in the 9th exon of HD1_At in G. barbadense results in loss-of-function of the gene and the glabrous stem phenotype (Ding et al. 2015;Niu et al. 2019), although the link between this mutation and the low lint% of G. barbadense is yet to be established.But in Gossypium arboreum, a cultivated diploid cotton species, the fiberless seeds and glabrous stem observed in the SMA-4 mutant (Beasley and Egli 1977) has been suggested to be linked to an alternative splicing mutation in GaHD1 (Ding et al. 2020).
To determine the major genetic components regulating lint fiber initiation and lint% as a guide to breeding higher yielding cotton varieties, in this study, we mapped the genetic loci responsible for the fiberless seed trait of 'SL1-7-1' and dissected, in a stepwise procedure, the genetic loci associated with fiberless seeds and lint% using segregating populations derived from backcrosses or cross between sibling segregants with extremely different lint%.We found that the genetic regions containing MYB25-like_At and HD1_At are associated with the fiberless trait of 'SL1-7-1' and that the genetic region harboring HD1_Dt is also associated with lint%.The relative contributions of the MYB25-like_At, MYB25-like_Dt, HD1_At, and HD1_Dt loci to lint% was further evaluated in different biparental segregating populations and a diversity panel including both G. hirsutum and G. barbadense cotton accessions.Finally, we propose genetic solutions for lint% improvement.

Plant materials
Two sets of cotton materials were used, 1 for biparental crosses to analyze lint% segregation (done in Australia), another (a diversity panel) for association and haplotype analyses (done in China).
The diversity panel included 723 G. hirsutum accessions (including 642 fuzzy seeded accessions and 81 fuzzless seed ones) and 326 G. barbadense accessions.These cotton accessions were selected, based on the rationale of maximizing genetic diversity, from the cotton germplasm collection at the National Midterm Genetic Bank of Cotton at the Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, China.

Growth conditions
In Canberra, Australia, cotton plants were grown in glasshouses at 28 ± 2°C with natural lighting.Each plant was grown in a pot with a diameter of 23 cm.For each segregating population, the parental lines were grown alongside the segregants.
In China, the G. hirsutum accessions were grown in the field of the Experimental Station of the Institute of Cotton Research of the Chinese Academy of Agricultural Sciences in 2017 and 2018.The G. barbadense accessions were grown in the field in Sanya, Hainan in 2014 and in Aksu, Xinjiang in 2015 and 2016.All experiments were conducted with a randomized design with 3 replicates.Each replicate consisted of 3 8-m rows with a row space of 80 cm.Cotton field management was carried out by following the local requirements for commercial cotton production.

Phenotyping
For the segregating populations generated by biparental crosses, the phenotypes of interest were fiberless seeds and lint%.Seed cotton harvested from the segregants of each segregating population was first observed for seed fiber phenotype to separate them into 2 groups-fiberless and fiber-bearing.Fiber-bearing seed cottons were hand-ginned and lint% was calculated using the formula: Lint% = weight of harvested lint weight of unginned seed cotton × 100.
For the diversity panel used in association and haplotype analyses, based on "The Standardised Data Collection and Description Protocol for Evaluation of Cotton Germplasm" of China, 30 open bolls per plot were collected from the middle fruiting branches of cotton plants and ginned by a roller ginning machine after drying to calculate lint% using the aforementioned formula.BLUPs (best linear unbiased predictors) of lint% were estimated using LME4 of the R package and used in comparison.

DNA sample preparation, bulk segregant analysis, and genotyping
For the plants used in segregation analysis in Australia, DNA extraction and KASP (Kompetitive Allele-Specific PCR) genotyping were performed as described previously (Zhu et al. 2016).For each bulk segregant analysis (BSA; Zhu et al. 2017), 2 DNA pools were prepared using the segregants of the corresponding segregating population with the lowest (or fiberless) and highest lint% (Supplementary File 1).DNA of the segregants was individually extracted and an equal amount of DNA from each segregant was taken and pooled for whole-genome shot-gun sequencing through Azenta (Indianapolis, USA).Sequencing was done using the Illumina HiSeq2500 in the format of 150-bp paired-end reads with a depth of ∼25× coverage.Read mapping, single nucleotide polymorphism (SNP) call and SNP frequency distribution plotting were done by following the approaches reported previously (Zhu et al. 2018).The in-house genome of 'Sicala V-2', generated by mapping 'Sicala V-2' reads to the 'TM-1' reference genome (Wang et al. 2019), was used as the reference for BSA of the SS F 2 population, and the in-house genome of 'T586', generated by mapping 'T586' reads to the same 'TM-1' reference genome, was used as the reference for BSA of ST F 2 , TPT BC 1 F 2 , and TPP SBC 1 F 2 (Fig. 1).The genomic region(s) showed the most divergent SNP frequency in the 2 pools and with a frequency > 0.8 in 1 pool and ∼0.5 in another pool were considered as the candidate region(s) associated with lint%.The SNP frequency distribution plots of SS F 2 and ST F 2 were generated using a 1-Mbp sliding window and the plots of TPT BC 1 F 2 and TPP SBC 1 F 2 were generated using a 500-kb sliding window.KASP was done on the ViiA7 Real-Time PCR system (Life Technologies, California, USA) using the KASP master mix sourced from LGC Group (Middlesex, UK).All individuals of each segregating population were used in KASP genotyping.The KASP markers used to narrow down the genomic region containing the causative locus associated with lint% were designed for the region defined by BSA.The sequences of these KASP markers together with those used in defining the genotype of the MYB25-like_At, MYB25-like_Dt, HD1_At, and HD1_Dt loci in different segregating populations are shown in Supplementary Table 1.
For the diversity panel grown in China, CTAB was used in isolating DNA from cotton seedlings germinated under growth cabinet conditions (18 hours light/6 hours dark, T m 26°C, and relative humidity 65%).Genotyping of the diversity panel was done based on whole-genome resequencing data (∼10× coverage) generated by Illumina sequencing as previously described (He et al. 2021;Wang et al. 2022).The fuzzy seeded G. hirsutum and the G. barbadense accessions have been reported previously (He et al. 2021;Wang et al. 2022) and the fuzzless G. hirsutum accessions were used for the first time in this study.After quality control, the resequencing data of each accession were aligned to the 'TM-1' reference genome (Yang et al. 2019) with SNP calling using the Unified Genotyper module of the Genome Analysis Toolkit (V3.1, Mckenna et al. 2010) based on the default parameters.The SNPs that met the following criteria were kept for further use: minor allele frequency ≥ 0.05, coverage depth ≥ 3 and ≤ 50, quality score ≥ 20, and max-missing rate ≤ 20%.

Haplotype analysis of MYB25-like and HD1
Our focus was on the 4 loci containing MYB25-like_At, MYB25-like_Dt, HD1_At, or HD1_Dt, so for each locus, the SNPs Major genes for cotton lint% | 3 that were retained based on the criteria aforementioned and are within each of the annotated gene and its 2-kb flanking region were used in haplotype analysis.The SNPs were used to construct the NJ phylogenetic tree of the cotton accessions by using Archaeopteryx in TASSEL 5.2.51 (Bradbury et al. 2007) and then the accessions were separated into 3 groups based on their haplotypes, i.e.LP (associated with higher lint%), lp (associated with lower lint%), and heterozygous.The statistical significance for the association of the LP and lp haplotypes with lint% was carried out by the Mann-Whitney test in Graphpad Prism 8 (version 8.4.3; https://www.graphpad.com/).The haplotype heatmaps were ordered according to the local genotype clustering by Fig Tree (version 1.4.4;http://tree.bio.ed.ac.uk/software/figtree).

Analysis of the effect of allelic/haplotype combination on lint percentage
To investigate the effect of different combinations of haplotypes of the 4 genes (MYB25-like_At, MYB25-like_Dt, HD1_At, and HD1_Dt) on lint% in G. hirsutum, we separated the 723 G. hirsutum accessions into fuzzless and fuzzy seeded 2 groups and then the fuzzy seeded accessions were further grouped based on the number of LP haplotypes at the 4 loci and compared their mean lint% values (BLUPs).

RNA sample preparation and gene expression analysis
Whole ovules were collected from 'Pima S-7', 'TM-1', 'T586', and 'SL1-7-1' at −1, 0, 1, 3, 5, and 7 dpa and used in RNA extraction.Total RNA was isolated using the Maxwell RSC Plant RNA Kit (Promega, Madison, USA) by following the manufacture's instruction.Quantitative real-time PCR (qRT-PCR) was carried out as previously described (Zhu et al. 2018) on the ViiA7 Real-Time PCR System (Life Technologies, California, USA) using the FastStart Universal SYBR Green Master Mix (ROX) (Roche, Basel, Switzerland).The expression levels were determined based on 3 biological replicates each with 3 technical replicates.The cotton ubiquitin gene (GenBank accession no.EU604080) was used as the reference gene for calculation of the relative expression level of individual genes in each sample based on the formula of −2 ΔCt .Because primers specific to MYB25-like_At could not be confidently designed, the total expression level of MYB25-like_At and MYB25-like_Dt and MYB25-like_Dt alone was measured.Primers used in qRT-PCR are provided in Supplementary Table 1.All primer pairs had a similar PCR efficiency (89.9-99.2%)determined by LinRegPCR (http://www.hartfaalcentrum.nl/index.php?main= files&sub=LinRegPCR).
None of the SS F 2 segregants was completely fiberless and the lint% of the 79 fuzzless segregants ranged from 1.09 to 40.93%.

HD1_At of 'Pima S-7' contributes to low lint percentage
Our previous investigation indicated that, in addition to dysfunctional MYB25-like_At and MYB25-like_Dt, other yet to be identified dysfunctional genetic component(s) also contribute to the fiberless phenotype observed in the progeny derived from 'T586' × 'Pima S-7' (TP; Zhu et al. 2018).
Major genes for cotton lint% | 5 (11.16-17.93%)were used in preparing the fiberless and high lint% DNA pool, respectively, for BSA.After mapping the reads to the inhouse 'T586' genome sequence, we found that the region containing HD1_At (Chr-A06) was the best candidate region contributing to low lint% (Fig. 2c; Supplementary Fig. 6).The genomic interval with HD1_At was narrowed down to ∼6.0 Mbp based on association between lint% of the 118 progeny and their genotypes in the interval determined by KASP markers (Supplementary Fig. 7).When the 118 progeny were separated into 3 groups, i.e.HD1_At PS7/PS7 , HD1_At PS7/T586 , and HD1_At T586/T586 , based on their HD1_At genotypes determined using the KASP marker AM294 within HD1_At (Supplementary Fig. 5), the average lint% of HD1_At PS7/PS7 plants (0.24%) was significantly lower than that of other 2 groups and 26 of the 29 HD1_At PS7/PS7 progeny were fiberless (Table 2), indicating that the HD1_At locus of 'Pima S-7' contributes to defective fiber initiation and low lint%.

Association of HD1_Dt with lint percentage
TP F 2 -90 (with homozygous MYB25-like_At T586/T586 that is dysfunctional according to Wan et al. (2016)) was also backcrossed to 'Pima S-7' (with dysfunctional MYB25-like_Dt and HD1_At according to our previous results (Zhu et al. 2018) and the results presented above) to generate the TPP BC 1 F 2 population with 126 segregants.

HD1_At has a larger effect on lint percentage than HD1_Dt
We first evaluated the impact of HD1_At of 'T586' and 'Pima S-7' on lint development in 2 F 3 populations derived from TP F 2 -22 and TP F 2 -55 and segregating for HD1_At.While the lint% of the TP F 2 -22 progeny was much lower than that of the TP F 2 -55 progeny (Table 4), thanks to dysfunctional MYB25-like_At of TP F 2 -22 (Supplementary Table 2), the average lint% of the 3 types of segregants with different HD1_At allele was HD1_At T586/T586 > HD1_At T586/PS7 > HD1_At PS7/PS7 in both populations (Table 4), indicating the negative effect of HD1_At PS7/PS7 on lint% being stronger than that of HD1_At T586/T586 in different genetic background (Supplementary Table 2).
We then assessed the relative effect of HD1_At and HD1_Dt on lint development using 2 F 3 populations derived from TP F 2 -65 and TP F 2 -88 and segregating for both loci.TP F 2 -65 and TP F 2 -88 had the same genotype at MYB25-like_Dt (MYB25-like_Dt T586/T586 ) but differing at MYB25-like_At (Supplementary Table 2).Because of the dysfunctional MYB25-like_At in TP F 2 -65, the overall lint% of the TP F 2 -65 population was much lower than that of the TP F 2 -88 population.As in the F 3 populations from TP F 2 -22 and TP F 2 -55 mentioned above, the negative impact of the 'Pima S-7' HD1_At allele on lint development was stronger than that of the 'T586' HD1_At allele, particularly in TP F 2 -65 (Table 4).In the TP F 2 -65 population, when HD1_At was homozygous for the 'T586' allele, the HD1_Dt allele, whether it was from 'T586' or 'Pima S-7', had little effect on lint%; however, when HD1_At was heterozygous (HD1_At T586/PS7 ), the plants with homozygous 'Pima S-7' HD1_Dt (i.e.HD1_Dt PS7/PS7 ) had a significantly higher lint% than the plants with HD1_Dt T586/T586 or HD1_Dt T586/PS7 ; when HD1_At was homozygous for the 'Pima S-7' allele (i.e.HD1_At PS7/PS7 ), both HD1_Dt PS7/PS7 and HD1_Dt T586/PS7 plants had a significantly higher lint% than the HD1_Dt T586/T586 plants (Table 4).These results confirmed the negative impact of the 'T586' HD1_Dt allele on lint development and suggest that its impact is genetic background dependent.In the TP F 2 -88 population, thanks to the presence of a functional MYB25-like (i.e.MYB25-like_At PS7/PS7 and MYB25-like_Dt T586/T586 ), the impact of the HD1_Dt allele of 'T586' and 'Pima S-7' on lint development was hardly distinguishable, although the lint% of the plants with HD1_At PS7/PS7 tended to be lower than that of the plants with HD1_At T586/T586 or HD1_At T586/PS7 (Table 4).Together, these results indicate that HD1_At has a larger impact on lint development than HD1_Dt.
Regarding individual SNPs, we found a nonsynonymous SNP (A vs T at A12-91193018) located in the 3rd exon of MYB25-like_At being significantly associated with lint% (P = 2.72 × 10 −25 ), with the TT allele being the LP allele and present in most (84.76%) of the G. hirsutum accessions, particularly among the fuzzy seeded accessions (Fig. 4a; Table 6).The SNP converts the 105th amino acid (within the 2nd MYB-DNA binding domain) of MYB25-like_At from lysine (AAG) to methionine (AUG), which may affect the DNA binding ability of MYB25-like_At and hence its function.
When the 723 G. hirsutum accessions were separated into fuzzless (81) and fuzzy (642) groups, for each of the 4 genes; first, the average lint% of the fuzzy seed accessions was higher than that of the fuzzless accessions regardless of their haplotypes (Table 6), suggesting a negative impact of the fuzzless gene(s) on lint%, likely due to a pleiotropic effect of the fuzzless gene(s) on Table 2. Distribution of the TPT BC 1 F 2 [('T586' × 'Pima S-7') × 'T586'] segregants with homozygous 'T586' alleles at both MYB25-like_At and MYB25-like_Dt loci in different ranges of lint percentage.lint development; second, the difference of the average lint% between the LP and lp haplotype in the fuzzless group was much larger than that in the fuzzy group (Table 6), implying that the negative impact of the fuzzless gene(s) on lint% can be partially compensated for by the presence of the LP haplotype of each gene.Among the fuzzy seeded accessions, the number (and proportion) of the accessions with the LP haplotype was more (and higher) than that with the lp allele/haplotype at the MYB25-like_Dt and HD1_At loci, but less (and lower) at the MYB25-like_At and HD1_Dt loci.However, at the MYB25-like_At locus, when the accessions were grouped based on the A12-91193018 SNP, most fuzzy seeded accessions but not fuzzless seed accessions had the LP allele (TT) (Table 6).Among the fuzzless accessions, at the MYB25-like_At locus, there were many more lp accessions than LP ones; while at the HD1_Dt locus, the number of lp accessions was only slightly more than that of LP accessions; in contrast, slightly more LP accessions than lp accessions were observed at the MYB25-like_Dt locus; and an equal number of LP and lp accessions were observed at the HD1_At locus (Table 6).These results also showed a strong correlation between the fuzzless seed trait and low lint% at the MYB25-like_At locus and a decreasing correlation in the order of MYB25-like_Dt, HD1_At, and HD1_Dt, in line with MYB25-like being important for the development of both fuzz and lint fibers (Zhu et al. 2018).Nevertheless, the lp allele (AA) at A12-91193018 itself did not seem to contribute to the fuzzless seed trait because ∼2% of the fuzzy accessions contained this allele (Table 6).

Genotype at
The same approach was applied to a G. barbadense population with 326 accessions (Supplementary Table 5) for identification of haplotypes at the 4 genetic loci.At the HD1_At locus, the 326 accessions could be grouped into 3 haplotypes (LP, hetero, and lp), with majority of the accessions (73.0%) containing the lp haplotype; however, the average lint% of the 3 haplotypes was not significantly different (Supplementary Fig. 10).Given that the loss-of-function of HD1_At in G. barbadense caused by a retrotransposon insertion is associated with hairless stems (Ding et al. 2015;Niu et al. 2019), it will be of interest to know the association between the haplotypes and the density of their stem trichomes.
Only a single SNP and a few SNPs without linkage relationships were found at the MYB25-like_At and HD1_Dt loci, respectively, so no haplotype could be defined.At the MYB25-like_Dt locus, no SNP was found among the 326 accessions.These results indicate that the G. barbadense population used here has a relatively uniform genetic make-up in 3 of the 4 genetic loci and imply that these loci may have been fixed by natural and artificial selection because of their importance in the determination of lint development.

The effect of allelic combinations on lint percentage
Given that nearly all commercial Upland cotton varieties have the fuzzy seed trait, we further explored the effect of the favorable haplotype (i.e. the LP allele) of the 4 loci on lint% using the fuzzy seeded G. hirsutum accessions (Supplementary Table 4).Of the 641 accessions that could be used in the analysis, 29 (4.52%) lacked the LP allele (0 LP) at all 4 loci, and 15.76%, 46.02%, 28.71%, and 4.99% of the accessions carried 1, 2, 3, and 4 LP alleles, respectively.The proportion of the accessions with 0, 1, and 4 LP alleles was lower than the expected (6.25%, 25%, and 6.25%, respectively), while that with 2 and 3 LP alleles was higher than the expected (37.5 and 25%, respectively), indicating that the LP alleles of the 4 loci, with different combinations of 2 or 3 but not all 4, have been preferentially retained through breeding practices.The mean lint% of the accessions with 4, 3, 2, 1, and 0 LP allele were 33.38%, 32.69%, 31.92%,32.34%, and 31.27%,respectively.The mean lint% of the accessions with 3-4 LP alleles was significantly higher than that with 2 LP alleles (Fig. 5), indicating that combining more LP alleles at the MYB25-like and HD1 loci tends to increase lint%.Similarly, the pyramiding effect of LP alleles on lint% was reported for 2 other candidate genes associated with lint% (Chen et al. 2022).
Nevertheless, we also noticed significant variation of lint% within each group that contained 1-4 LP alleles, and a high lint% was observed even in some fuzzless seed accessions, such as sGuokang36 (41.6%;Supplementary Table 4), implying that, while MYB25-like and HD1 are the major genes in the determination of Table 3. Distribution of the TPP a SBC 1 F 2 segregants with the genetic background of MYB25-like_At T586/T586 , MYB25-like_Dt PS7/PS7 , and HD1_At PS7/PS7 in different ranges of lint percentage.lint%, development of lint fibers is regulated by complicated gene networks, which involve many other genetic loci that act independently or interactively with MYB25-like and HD1.

Discussion
Improving lint yield is the final goal of every cotton breeding program.Cotton lint yield is determined by seed cotton yield and lint %.Three strategies can be adopted to achieve the goal of high lint yield-enhancing seed cotton yield without changing lint%, increasing lint% without changing seed cotton output, and improving both seed cotton yield and lint%.Seed cotton yield is often affected by environmental conditions, while lint% is mainly genetically determined and less affected by environmental conditions compared to seed cotton yield (Zhu, Hou, et al. 2021).
Increasing lint% has thus been the main strategy adopted by cotton breeders for improving overall cotton lint yield.As a result, lint % of modern commercial cotton varieties is much higher than those of the older obsolete varieties (Ma et al. 2019;Conaty and Constable 2020), however this comes with a tradeoff in the size and nutritional composition of the cotton seeds (that tend to get smaller), causing negative effect on seed germination, seedling vigor, and on the downstream industries using cotton seeds as raw materials (Maeda et al. 2023).
Given the importance of lint% in developing high-yielding elite commercial cotton varieties, numerous studies, including genetic mapping using segregating populations derived from biparental crosses and GWAS based on natural populations or germplasm collections, have identified hundreds of lint% QTLs (Niu et al. 2022).Individually, most lint% QTLs have a small genetic effect and are population dependent, so are rarely applicable in different breeding programs.The GWAS-based QTLs may be more breederfriendly than the QTLs identified based on specific biparental crosses, but they may also be only relevant to the accessions used in the analysis, making them less likely to be useful in breeding programs based on different germplasm.Identifying major genetic loci responsible for lint% and knowing how they were shaped to regulate lint% during cotton evolution and by historical breeding practices will provide practical genetic solutions for improving lint%, and consequently lint yield of cotton, on the basis of without compromising the development of cotton seeds.
Fuzzless and fiberless mutants provide a unique opportunity to study the genetic mechanisms underlying lint initiation and to shed insight on enhancing lint development and improving lint %, because the number of lint fiber initials is one of the major determinants of lint%.The fuzzless seed phenotype of the dominant N 1 mutant is because of the loss-of-function of MYB25-like_At (Wan et al. 2016), and that of the recessive n 2 mutant is likely to be caused by a dysfunctional MYB25-like_Dt (Zhu et al. 2018).Significant variation in lint% (0.7-23.6%) has been observed in different fuzzless seed accessions (Turley et al. 2007), and fiberless segregants were found among the progeny derived from crosses between the N 1 and n 2 mutants (Turley and Kloth 2002;Zhu et al. 2018;this study), suggesting that MYB25-like_At alone, MYB25-like_Dt alone, and interactions between these 2 homeologs, or their interaction with other gene(s) play an important role in the determination of initiation of lint fibers and lint%.In this study, the fiberless seed phenotype of 'SL1-7-1' was linked to dysfunctional MYB25-like_At and HD1_At (Figs. 2, a and b, and 3).Further investigation using a population derived from the cross between lines with both MYB25-like_At and MYB25-like_Dt mutated but with different lint% also linked the HD1_At locus to initiation of lint fibers and lint% (Fig. 2c).Using a population derived from the cross between lines with dysfunctional MYB25-like_At, MYB25-like_Dt, and HD1_At, but still showing different lint% further identified the HD1_Dt locus to be another genetic component associated with initiation of lint fiber and lint% (Fig. 2d).Comprehensive analyses of the genetic effect of single and different combinations of the 4 loci in different genetic backgrounds and the G. hirsutum diversity panel revealed that while the negative impact of dysfunctional MYB25-like_Dt, HD1_At, and HD1_Dt on lint% is often masked by the presence of a functional MYB25-like_At, it is clear that both homeologs of MYB25-like and HD1, especially MYB25-like_At and HD1_At, are the major genes determining initiation of lint fiber and consequently lint%.In line with this observation, at the key time points of fiber initiation (from −1 to 0 dpa), MYB25-like_At and HD1_At are the 2 most important transcription factors (TFs) at −1 dpa, and HD1_At and MYB25-like_Dt are the 2 most important TFs at −0.5 dpa involved in regulating fiber initiation.At 0 dpa, while MYB25-like_Dt and HD1_At still play a key role in regulating the gene networks underlying fiber initiation, HD1_Dt also becomes one of the key TFs of the networks (Qin et al. 2022).
MYB25-like genes also regulate initiation and development of fuzz fibers (Wan et al. 2016;Zhu et al. 2018).According to the SNP-based genotype information, 723 G. hirsutum accessions could be separated into 3 alleles (LP, hetero, and lp) at MYB25-like_At and MYB25-like_Dt (Figs. 4a and 4b).Both LP and lp alleles were found in accessions with fuzzy or fuzzless seeds, suggesting that the lp alleles themselves are not the causes of the fuzzless seed phenotype.In other words, the sequence variants of the lp haplotype are only associated with development of lint fibers but not fuzz fibers.Among the fuzzy seeded accessions, more contain the LP allele at MYB25-like_Dt and HD1_At, and at MYB25-like_At, the LP allele at A12-91193018, a SNP identified to be associated with higher lint%, was the predominant one (Table 6); meanwhile the lint% of the accessions containing 3-4 LP alleles was significantly higher than those with 2 LP alleles (Fig. 5).These observations suggest that the LP alleles of MYB25-like and HD1 have experienced positive selection in breeding practices though further testing is required for confirmation, and that pyramiding the LP alleles of the 4 loci would be the basis for further improving lint%.
None of previous QTL mapping and GWAS studies had identified MYB25-like or HD1 as being associated with lint% (Niu et al. 2022).One reason could be that these loci have been almost completely fixed due to their importance in determining initiation of lint fiber and lint%, as indicated by the observation that most fuzzy seeded G. hirsutum accessions contained the LP alleles/haplotypes at the 4 loci (Table 6, Supplementary Table 4) and no different haplotypes could be identified at 3 of the 4 loci in the G. barbadense accessions analyzed.Alternatively, the functionality of some loci may be repressed because of epistasis.In this study, a SNP in MYB25-like_At was found to be significantly associated with  lint% thanks to the inclusion of fuzzless/fiberless accessions, with 68% of them containing the lp allele (11% with LP and 21% being heterozygous) and 33% of them having a lint% ≤ 20% (the lowest lint% of the fuzzy seeded accessions is 20.9%;Supplementary Table 4).Therefore, diversifying the genetic background and phenotype of the materials used is crucial for GWAS.These 2 factors also determine whether the genetic loci with relatively weak impact can be uncovered, particularly for those with epistatic relationships.For instance, we demonstrated HD1_Dt as one of the major loci in the determination of lint%, largely due to using of an innovative strategy in creating the segregating population, i.e. the 2 parents used had dysfunctional alleles in all known major loci affecting lint% but still showed distinct phenotypes.This strategy is generally applicable for dissecting the role of genes with redundant and/or additive functionality.
Given the observation of MYB25-like and HD1 being the major genetic elements responsible for lint%, the straightforward approach for improving lint% seems to be to enhance their expression levels.Indeed, overexpressing HD1_At increases the number of fiber initials (Walford et al. 2012).But field experiments using transgenic cotton overexpressing HD1_At and MYB25-like individually or in combination showed that, generally, enhancing the expression of the gene(s) had little effect on lint% and lint yield, despite occasional observation of a few lines with higher lint% (Liu et al. 2020).One reason could be that while overexpressing HD1_At increases the number of fiber initials, they might not fully develop into lint fibers since increased numbers of short fibers were observed on the ginned seeds from the transgenic cotton overexpressing HD1_At (Walford et al. 2012).Simply enhancing the overall expression levels of MYB25-like and HD1, even when using an ovule-specific promoter (FBP-7), was unable to achieve what had been expected (Liu et al. 2020), implying that the expression levels of the genes involved in the biological processes following fiber initiation have to be tempo-spatially synchronized and coordinated with the increased expression of MYB25-like and HD1 to ensure proper development of all fiber initials into lint fibers.Many genes have been individually demonstrated to be important for fiber elongation, cellulose synthesis, and secondary cell deposition, such as homeobox-leucine zipper gene GhHOX3 (Shan et al. 2014), alanine-rich-protein gene GhAlaRP (Zhu, Li, et al. 2021), GhTCP4 (TCP for Teosinte branched 1, Cycloidea, PCF1; Cao et al. 2020), and sucrose synthase gene GhSusA1 (Jiang et al. 2012), but how they are regulating fiber development coordinately with MYB25-like and HD1 and others, such as phytohormone related genes, is yet to be explored using systems biology approaches.
Given that the number of the LP alleles of MYB25-like and HD1 seems to be positively correlated with lint% (Fig. 5), from a breeder's perspective, pyramiding the LP alleles of MYB25-like and HD1 would be the first choice for improving lint%.Indeed, significant room still exists in terms of pyramiding the LP alleles of the 4 loci.For instance, among the fuzzy seeded G. hirsutum accessions used in this study, at the MYB25-like_At locus, while the majority already have the LP allele at A12-91193018, about half of the accessions still have the lp haplotype; at the other 3 loci, a quarter to half of the accessions still possess the lp allele (Table 6); and the proportion of the G. hirsutum accessions containing 4 LP alleles (4.99%) is slightly lower than the expected 6.25% (Fig. 5; Supplementary Table 4).The next step would be to check for the presence of the reported lint% QTLs using molecular markers and to investigate new lint% QTLs in the breeder's germplasm based on genetic mapping and/or GWAS.The known (if present) and the newly identified QTLs could then be further pyramided with the LP alleles of MYB25-like and HD1.Nevertheless, choosing the QTLs to be used in further pyramiding demands careful consideration so as to minimize or avoid any possible negative impact on other traits because almost all lint% QTLs characterized so far have pleiotropic negative effects on traits like seed size and fiber quality (Fang et al. 2017;Ma et al. 2019;Li et al. 2023).A first step will be to identify lint% QTLs without negative effects on other traits by evaluating all of those reported to date or by using new germplasm and populations to identify new ones.Because many of the reported lint% QTLs have pleiotropic negative effects, the lint% QTLs that contain genes, such as Gh_D02G0025 (Ma et al. 2018) and Gh_D07G0463 and Gh_D01G0162 (Chen et al. 2022) with a potential role in enhancing fiber initiation and rapid elongation, might be an initial choice for pyramiding because they are less likely to have negative impact on seed development and fiber quality.Major genes for cotton lint% | 11 Loss-of-function of MYB25-like contributes to low lint% (Turley et al. 2007;Walford et al. 2011;Wan et al. 2016;Zhu et al. 2018), but the nonsynonymous A to T change in the 2nd MYB-DNA binding domain of MYB25-like_At seems to play a positive role in increasing lint% (Table 6).Incorporating the favorable TT allele in the background of the dominant fuzzless mutation N 5 (containing the AA allele of MYB25-like_At) that was reported to have a weaker impact on lint% (Zhu, Stiller, et al. 2021) would combine the fuzzless seed trait with high lint% to mitigate the negative impact of high lint% on seed development because the resources normally devoted to fuzz development would presumably be redirected for seed and/or lint development.The cutting-edge base editing technology would be the ideal approach for achieving this goal without extensive breeding (Qin et al. 2020), although the bottleneck of genotype-dependent transformation in cotton needs to be solved.
In summary, this study uncovered the homeologs of both MYB25-like and HD1 as the major genetic components responsible for fiber initiation and lint% using fiber mutants and innovative methodology in the generation of segregating populations for dissecting the components in a step-by-step manner.The importance of MYB25-like and HD1 in the determination of lint% was demonstrated using an Upland cotton population, in which the lint% of the accessions with ≥1 LP allele at MYB25-like and HD1 was higher than those without any LP allele, especially those containing >2 LP alleles.Strategies for breeding high-yielding cotton varieties by improving lint% without compromising other important agronomic traits are proposed.

Fig. 4 .
Fig. 4. Haplotypes of the MYB25-like_At, MYB25-like_Dt, HD1_At, and HD1_Dt loci (including 2-kb flanking region) and their association with lint percentage in the 723 G. hirsutum accessions.a) MYB25-like_At.b) MYB25-like_Dt.c) HD1_At.d) HD1_Dt.At each locus, the left panel shows the SNPs in the annotated gene with the haplotype of LP (higher lint percentage) and lp (lower lint percentage) shown below.Dark blue rectangles represent exons.The purple arrow indicates transcription orientation of the gene.On the right, the top is a haplotype heatmap with the X-axis representing individual accessions grouped based on the SNPs of the locus distributing across the Y-axis; the dot plot shows the association of the LP and lp haplotypes with lint percentage (each dot representing an accession).In each dot plot, the left and right vertical red lines represent quartiles and the middle one median.The significance of association between lint percentage and the LP or lp haplotype is indicated by the P-value (based on Mann-Whitney test).

aFig. 5 .
Fig. 5. Violin plot shows the effect of the number of LP alleles at the 4 loci on lint percentage of the fuzzy seeded G. hirsutum accessions.The mean lint percentage and the number of accessions of each group are shown below the violin image, with those being significantly different (P < 0.05; based on Mann-Whitney test) indicated by the P-value.The dot lines represent quartiles (the top and bottom ones) and medians (the middle ones).Each gray dot represents an accession.

Table 4 .
The effect of the HD1_At and HD1_Dt loci on lint percentage.
a Different letters indicate statistical significance at P < 0.05 based on 2-tailed Student's t-test.

Table 5 .
The effect of the individual MYB25-like and HD1 locus on lint percentage.

Table 6 .
The mean lint percentage (%) of the 3 haplotypes of each gene in accessions with fuzzless or fuzzy seeds.