Multiple QTL Mapping in Autopolyploids: A Random-Effect Model Approach with Application in a Hexaploid Sweetpotato Full-Sib Population

Genetic analysis in autopolyploids is a very complicated subject due to the enormous number of genotypes at a locus that needs to be considered. For instance, the number of...

like diploids, existing approaches can be readily applied. However, despite many successful studies in diploids and allopolyploids, QTL mapping in autopolyploids remains difficult. In fact, unlike diploid mapping populations, which can have two to four segregating QTL genotypes (in case of inbred or outbred species, respectively), autopolyploid mapping populations can have a much wider range of possible genotypes per locus. For example, there are up to 36, 400, or 4900 possible genotypes from crosses between two tetra-, hexa-, or octoploid outbred parents, respectively.
Single-dose markers, segregating in a 1:1, 3:1, or 1:2:1 fashion, have limited information for building integrated genetic maps in autopolyploids, and generally result in either separate parental maps (e.g., Shirasawa et al. 2017) or limited map integration (e.g., Balsalobre et al. 2017). In order to make use of multiple-dose markers, the first step is to perform dosage or quantitative SNP calling. Although most methods were designed for tetraploid species (e.g., Voorrips et al. 2011;Schmitz Carley et al. 2017), additional studies have produced methods that can analyze data for higher ploidy levels (Serang et al. 2012;Gerard et al. 2018). For building integrated genetic maps in tetraploid species, one can use the well-established TETRAPLOIDSNPMAP ) as well as TETRAORIGIN (Zheng et al. 2016), which also considers multivalent pairing. For higher ploidy species, MAPPOLY (Mollinari and Garcia 2019) is a better option than POLYMAPR (Bourke et al. 2018), because the former has implemented a hidden Markov model (HMM) general enough to analyze higher ploidy levels, whereas the latter is limited to tetra-and hexaploid species, and lacks HMM implementation to robustly map all multiple-dose markers (see Mollinari et al. 2020). With an integrated map, one can calculate the genotype conditional probabilities of putative QTL, ideally using appropriate HMM Mollinari and Garcia 2019). Based on a polyploid model in Kempthorne (1955), a single-QTL model, hereinafter referred to as fixed-effect interval mapping (FEIM), was proposed for autotetraploids (Hackett et al. 2001(Hackett et al. , 2014, and later extended for autohexaploids (van Geest et al. 2017).
For an even ploidy level m, the FEIM model can be written as where y i is the phenotypic value of individual i, m9 is the intercept, a j is the additive effect of allele j, X ij is the conditional probability of allele j in individual i, and e i is the residual error. The constraints a 1 = 0 and a mþ1 ¼ 0 are naturally imposed to satisfy the conditions P m j¼1 X j ¼ m=2 and P 2m j¼mþ1 X j ¼ m=2, so that m9 is a constant that is hard to interpret due to these constraints (Hackett et al. 2001(Hackett et al. , 2014. Note that 2m22 additive allele effects need to be estimated, i.e., tetra-, hexa-, or octoploid models will have 6, 10, or 14 main effects, respectively. In order to test whether the additive allele effects are different from zero (the null hypothesis), likelihood-ratio tests (LRT) are performed along positions on a genetic map. Commonly, the tests are presented as "logarithm of the odds" (LOD scores), where LOD ¼ LRT=½2 3 lnð10Þ. In order to declare a QTL, empirical LOD thresholds are computed for each trait using permutations (Churchill and Doerge 1994). This approach has been used widely so far (e.g., Schumann et al. 2017;van Geest et al. 2017;Massa et al. 2018). However, limitations in fitting multiple-QTL models have been presented, due mostly to the possibility of over-parameterization or the lack of optimized algorithms for model selection (Mengist et al. 2018;Klaassen et al. 2019).
Variance component methods have been used for performing QTL mapping in related individuals of complex population structures or families in humans (Lippert et al. 2014), animals (Druet et al. 2008), and plants (Crepieux et al. 2005). In common, these approaches take into account the flexibility of mixed models in dealing with the correlated QTL genotype effects among individuals due to shared alleles identical-by-descent (IBD) by each relative pair at a particular location in the genome. Since a higher ploidy level leads to a much larger number of allele combinations, genotypic effects may be very hard to assess from the small population sizes usually available. In this case, the integrated genetic map provides key information on the inheritance of sets of chromosomal segments from parents to progeny (Mollinari and Garcia 2019), making up the basis for IBD-based additive relationship estimation. If a locus is linked to a region underlying the variation of a trait of interest, more shared alleles IBD for that locus are expected among individuals with similar phenotypic values (Almasy and Blangero 2010). Thus, the key parameters in this model are the variance components attributable to putative QTL, which determine the presence of linkage. Because only one parameter per QTL (the variance component) needs to be estimated, one could try to build a multiple-QTL model for polyploids, inspired by the corresponding multiple interval mapping (MIM) for diploid mapping populations , without the risk of model over-parameterization.
A multiple-QTL mapping approach is expected to increase detection power, enable separation of linked QTL, and provide the basis for studying QTL interaction (epistasis). Thus, such a model may benefit several autopolyploid horticultural (e.g., potato, blueberry, kiwifruit, strawberry), ornamental (e.g., rose, chrysanthemum), forage (e.g., alfalfa, guinea grass), and field (e.g., sugarcane) crops. The sweetpotato [Ipomoea batatas (L.) Lam. ð2n ¼ 6x ¼ 90Þ] is a staple food in several developing countries, with a production of 112 million tons worldwide in 2017 (FAO 2019). Particularly, it has attracted growing interest due to its characteristics for food and nutrition security . In addition to carbohydrates, dietary fiber, vitamins, and minerals, orange-fleshed sweetpotatoes provide high levels of b-carotene to fight vitamin A deficiency in vulnerable populations, such as those in sub-Saharan Africa . In order to increase production and meet farmer's and market needs, it is imperative to make molecularassisted selection an effective part of sweetpotato breeding programs. Toward this end, one of the first steps is to characterize the genetic architecture of traits of interest, such as those related to storage root yield and quality, and resistance to biotic and abiotic stresses (Khan et al. 2016). In spite of being considered an "orphan" crop, there have been recent advances in building genome references from its wild diploid relatives , optimizing a genotyping-by-sequencing protocol (GBSpoly) for highthroughput SNP genotyping (Wadl et al. 2018), and building a high-density integrated genetic map . In this paper, we introduce a random-effect multiple interval mapping (REMIM) model for autopolyploids. Using a genome-assisted, GBSpoly-based integrated genetic map from a sweetpotato biparental population, we map QTL for yield-related traits with our open-source software, QTLPOLY.

Full-sib population
A bi-parental mapping population (named BT) comprising 315 individuals was developed by crossing an orange-fleshed American variety, 'Beauregard' (CIP440132), and a nonorangefleshed African landrace, 'Tanzania' (CIP440166), as male and female parents, respectively. The parents show contrasting phenotypes for several traits such as dry matter, b-carotene and sugar content, and susceptibility to biotic (e.g., virus disease) and abiotic (e.g., drought) stresses. 'Beauregard' is known as to have higher yield than 'Tanzania', and the current QTL mapping study will focus on yield components.

Phenotypic analyses
Field trials: In addition to the 315 full-sibs, parents (each replicated twice) and another variety, 'Daga' (CIP199062.1), were used as checks, making up a total of 320 individuals per replication in an 80 3 4 alpha-lattice design. Virus-free planting material derived from tissue culture was obtained from the CIP-Peru Genebank in La Molina. The clones were grown in a screen house in CIP substation San Ramon, and the planting material multiplied under low-disease pressure field conditions in Satipo, where cuttings for the six experiments were obtained. Four experiments were conducted in Ica (14°019 S and 75°449 W, 420 m), with two independent trials over two seasons, and one experiment each was conducted in San Ramon (11°079 S and 75°219 W, 828 m) and Pucallpa (8°239 S and 74°319 W, 154 m). The number of replications were two at Ica and three at San Ramon and Pucallpa. In all trials, 1 m and 0.3 m of inter-and intra-row spacing was used, respectively. In the first season at Ica (from 25 February to 29 June 2016), the plot size was 6 m 2 of 16 plants arranged in four rows (4 plants per row) with one empty row between plots. In the second season at Ica (from 15 November 2016 to 17 March 2017), the plot size was 4.8 m 2 of 16 plants arranged in two rows (8 plants per row) with no empty row between plots. In San Ramon (from 14 May to 15 September 2016) and Pucallpa (from 1 July to 4 November 2016), the plot size was 9 m 2 of 30 plants arranged in three rows (10 plants per row) with no empty row between plots.
Phenotypic data: Eight yield-related phenotypes (see File S1) were collected per plot at harvest, $120 days after transplanting. For analysis purposes, foliage and root yield data were standardized by plot size (relative to the largest) and converted to tons per hectare (t ha 21 ) to allow comparisons across trials. Number of roots was divided by the number of plants in the plot. The total number of storage roots per plant (TNR) and total root yield (RYTHA) considered all storage roots from the whole plot regardless of their individual weight. Number of commercial roots per plant (NOCR) and commercial root yield (CYTHA) considered only storage roots of marketable size ($100 g for African market). Number of noncommercial roots per plant (NONC) and noncommercial root weight (NCYTHA) were obtained from the difference between total and commercial roots. Foliage yield (FYTHA) was measured by weighing all above-ground biomass per plot. Finally, commercial index (CI) was calculated as the ratio between CYTHA and total biomass (i.e., the sum of RYTHA and FYTHA).
Multi-environment phenotypic model: We considered each one of the six field trials as an environment. Jointly adjusted means for each individual were obtained by using the following mixed-effect model where y ijkl is the phenotypic observation of the i th genotype in the j th block within the k th replicate at the l th environment, m is the overall mean, e l is the random effect of the l th environment (l ¼ 1; . . . ; L; L ¼ 6) with e l $ N ð0; s 2 e Þ, r kðlÞ is the random effect of the k th replicate (k ¼ 1; . . . ; K; K ¼ 2 or 3 depending on the environment) at the l th environment with r kðlÞ $ N ð0; s 2 r Þ, b jðklÞ is the random effect of the j th block (j ¼ 1; . . . ; J; J ¼ 80) within the k th replicate at the l th environment with b jðklÞ $ N ð0; s 2 b Þ, g i is the fixed effect of the i th genotype (i ¼ 1; . . . ; I; I ¼ 318), ge il is the random effect of genotype-by-environment (G 3 E) interaction with ge il $ N ð0; s 2 ge Þ, and e ijkl is the random residual error with e ijkl $ N ð0; s 2 l Þ (i.e., environment specific variances). Variance components were estimated by restricted maximum likelihood (REML) using ASREML-R (v4.1; https://www.vsni.co.uk/software/asreml-r).
Mean-basis broad-sense heritabilities (H 2 ) were approximate as the ratio between genotypic and phenotypic variances as where s 2 g is the variance component associated with the g i term from Equation 2 when treated as a random effect, i.e., g i $ N ð0; s 2 g Þ, s 2 e is the variance component associated with the residual error but with a common variance for all environments, i.e., e ijkl $ N ð0; s 2 e Þ, and K ¼ 2:25 is the harmonic mean of the number of replicates across environments.

Genotypic analyses
GBSpoly and dosage calling: A modified GBS protocol called GBSpoly was carried out according to Wadl et al. (2018) and described in detail for the BT population by Mollinari et al. (2020). In brief, total DNA was extracted and double restricted using a CviAII-TseI enzyme combination for all full-sibs and parents (each parent replicated 10 times). Restriction fragments were ligated to adapters, size selected, and amplified. Adapters contained an 8-bp buffer sequence in addition to sample-specific variable length barcodes (6-9 bp). Each 64-plex library was sequenced using eight lanes of an Illumina HiSeq 2500 system in order to ensure optimal read depth for dosage calling. We trimmed the 8-bp buffer sequence from the reads using the FASTX-TOOLKIT (available at hannonlab.cshl.edu/fastx_toolkit/). A modified version of TASSEL-GBS pipeline (v4.3.8), called TASSEL4-POLY (Pereira et al. 2018, available at https://github.com/gramarga/tassel4poly) was used to demultiplex and to count and store the actual read depth for all loci in variant call format (VCF) files. We used BOWTIE2 (Langmead and Salzberg 2012) to align 64-bp tags against the I. trifida and I. triloba genomes, two sweetpotato wild relative diploid species , available at http://sweetpotato.plantbiology.msu.edu). Finally, the software SUPERMASSA (Serang et al. 2012, available at https://bitbucket.org/orserang/supermassa) was used to perform multi-threading dosage call through a wrapper function named VCF2SM (Pereira et al. 2018, available at https:// github.com/gramarga/vcf2sm).
Linkage mapping: A linkage map was constructed by Mollinari et al. (2020) using the R package MAPPOLY (Mollinari and Garcia 2019, available at https://github. com/mmollina/mappoly; see File S2). In brief, we computed two-point recombination fractions between all 38,701 nonredundant, high quality GBSpoly-based markers, and sorted the most likely linkage phase between each marker pair. Markers were then grouped into 15 linkage groups (LGs) by using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) hierarchical clustering method. For each LG, large-scale ordering was obtained using multidimensional scaling as implemented in the R package MDSMAP , and then small-scale ordering was refined based on the reference genomes (see details in Mollinari et al. 2020). Map distances, computed using Haldane's map function, were re-estimated based on the individual posterior probabilities from SUPER-MASSA dosage calls. The final integrated, completely phased map was composed of 30,684 markers distributed along 15 LGs with a total length of 2708.3 centiMorgans (cM) and no major gaps between markers (11.3 markers every cM, on average). Multi-point genotype conditional probabilities of putative QTL were estimated for every individual given the final map using an HMM algorithm (Lander and Green 1987;Jiang and Zeng 1997) adapted for polyploids ) as implemented in MAPPOLY. Since 17 full-sibs were filtered out along the map construction , only the remaining 298 individuals were ultimately used for QTL mapping.

QTL mapping analyses
Under random bivalent pairing, an autopolyploid individual of a species with an even ploidy level m can produce m m=2 , or "m choose m=2", different gametes with the same probability, and a cross between two individuals can generate p ¼ m m=2 2 different genotypes. As an example, consider two contrasting parents, A and B, of a hexaploid species (such as sweetpotato) and their respective genotypes for a QTL as abcdef and ghijkl, each one with potentially six different alleles. As each parent can produce 20 different gametes, the cross A 3 B would generate p ¼ 20 2 ¼ 400 possible different genotypes. The model detailed next can be adapted easily to any polyploid species with an even ploidy level by simply changing p accordingly, e.g., p ¼ 6 2 ¼ 36 for autotetraploid and p ¼ 70 2 ¼ 4,900 for autooctoploid full-sib families.
REMIM model and hypothesis testing: Taking a full-sib population with n individuals derived from a cross between two hexaploid parents, A and B, the multiple-QTL mapping model is expressed by where y is the n31 vector of phenotypic values (in our case, the jointly adjusted means from the phenotypic analysis), m is the fixed effect of population mean, u q is the p31 random vector of additive genetic values of QTL q ðq ¼ 1; . . . ; QÞ with u q $ N ð0; Ps 2 q Þ, and e is the n31 random vector of residual error with e $ N ð0; Is 2 Þ. 1 and I are an n31 vector of 1's and an n3n identity matrix, respectively, Z q is the n3p incidence matrix of genotype conditional probabilities of QTL q, and P is a p3p additive relationship matrix between the p possible QTL genotypes. This matrix is fixed for a given ploidy level, and, for a hexaploid species, P cells assume one out of seven different values (P jj9 ¼ f0=6; 1=6; 2=6; 3=6; 4=6; 5=6; 6=6g with j; j9 ¼ f1; . . . ; pg), depending on how many alleles IBD are shared between two genotypes. For example, the genotype pair abcghi-defjkl shares 0 out of 6 alleles, hence P jj9 ¼ 0=6, whereas abcghi-abcghi shares 6 out of 6 alleles, hence P jj9 ¼ 6=6.
Assuming that the random-effect QTL are uncorrelated, each with expectation zero, the expectation of the vector of phenotypic values y is EðyÞ ¼ 1m and its variance-covariance matrix is is the n 3 n additive relationship matrix between the n individuals on the putative QTL q, and s 2 q and s 2 are the respective variance components associated with QTL q and the residual error. In other words, each G q cell G ii9ðqÞ ranges from 0 to 1, as the result of the sum over the products of QTL genotype probabilities, Z ijðqÞ ¼ PrðG ijðqÞ jmapÞ and Z i9j9ðqÞ ¼ PrðG i9j9ðqÞ jmapÞ, from Z q of a sib-pair i and i 9 ði; i9 ¼ f1; . . . ; ngÞ, weighted by the proportion of shared alleles IBD P jj9 from P between each pair of possible genotypes j and j9 ðj; j9 ¼ f1; . . . ; pgÞ, i.e.
If the respective genotypes j and j9 of two sibs i and i9, G ijðqÞ and G i9j9ðqÞ , are observed with certainty, computing pairwise additive relationship is straightforward, e.g., PrðG i1ðqÞ ¼ abcghijmapÞ ¼ 1 and PrðG i92ðqÞ ¼ abcghjjmapÞ ¼ 1, hence G ii9ðqÞ ¼ 5=6. However, if the probability of the genotype of i is split in two, e.g., consisting of the realized additive relationship matrix.
Here, our interest is in testing i.e., whether QTL q contributes to the variation in y or not, so that several tests have to be performed along the genome. As part of the algorithm described next, we test for the presence of multiple QTL in consecutive rounds. In practice, we compute and store a G q matrix for every putative QTL q, representing genomic positions at a certain step size (e.g., every 1 cM). In this case, Equation 3 can be rewritten as where g q is an n31 random vector of the individual breeding values on the QTL q with g q $ N ð0; G q s 2 q Þ. Notice how Equation 3 and 4 are closely connected. At the QTL peak q, u q (Equation 3) are the best linear unbiased prediction (BLUP) values of the p QTL genotype effects, whereas g q (Equation 4) are the BLUP values for the n full-sibs, i.e., the individual breeding values based on the additive allele effects. Therefore, the QTL-based breeding valuesĝ q are equivalent to those from Z qûq . These alternatives forms are conveniently used in different contexts of QTL detection and characterization, as described next. While Equation 4 is preferred in order to make the multiple-QTL model selection less computational intensive because the stored G q can be used recursively, Equation 3 is used to describe the QTL genotype effects, which are ultimately used to compute additive allele effects.
We computed linear score statistics according to Qu et al. (2013) at every position and compared its P-value with a prescribed critical value, as part of the algorithm used to declare QTL as described in the next section. In order to compute the test statistics, we assume that y has normal distribution, i.e., y $ N ð1m; VðtÞs 2 Þ, with the variance components of QTL q rescaled in relation to s 2 as s 2 q ¼ s 2 t q , so that where r is the putative QTL being tested in a model with q ¼ 1; . . . ; r; . . . ; Q QTL and t ¼ ðt r ; t ðQ21Þ Þ9. Using this form, the hypothesis testing becomes H 0 : t r ¼ 0 vs: H a : t r . 0, and t ðQ21Þ components are regarded as nuisance parameters. The other nuisance parameters m and s 2 are removed when writing the log REML profiled function (Qu et al. 2013). Accordingly, it follows that y $ N ð0;ṼðtÞs 2 Þ, withṼðtÞ ¼G r t r þ P q6 ¼rG q t q þ Iñ and G q ¼ AG q A9. Based on ℓðtÞ, the score function of t r is given by S r ðtÞ ¼ @ℓ=@t r , i.e. On the one hand, when there is only one QTL ðr ¼ Q ¼ 1Þ, i.e.,t 0 ¼t r ¼ 0 under H 0 , S r (0) is the exact (nonasymptotic) test statistic. On the other hand, a moment-based approximation to the null distribution is used when two or more QTL are present in the model (Q . 1), i.e.,t 0 ¼ ð0;t ðQ21Þ Þ9 under H 0 (Qu et al. 2013). The validity of the moment-based approximation was assessed through simulations, as suggested by Qu et al. (2013). In any case, large score value indicates departure from H 0 . The P-values associated with the linear score test are continuous over the unit interval as a result of weighted sums of the scores from the profiled likelihood. Herein, we conveniently take the "logarithm of P" as LOP ¼ 2 log 10 ðPÞ for graphic representation and supporting interval calculation purposes. Support intervals are defined as the QTL peak neighboring region with LOP greater than, or equal to, LOP2d, where d is a constant that subtracts the highest LOP (thus from the QTL peak) in that region, as similarly proposed for the LOD score statistics (Lander and Green 1987).

QTL detection and characterization:
In order to select QTL, we adapted the MIM methodology described by Kao et al. (1999) to a random-effect model framework as follows: 1. Forward search adds one QTL at a time to the model at the position with the highest score statistic if the P-value is smaller than a genome-wide significance threshold level (e.g., a=0.20), and fits it into the model. Consecutive rounds of search for a new QTL are carried out conditioning to the one(s) in the model until no more QTL positions can reach the threshold. A window size (e.g., of 20 cM) is avoided on either side of QTL already in the model when searching for a new QTL; 2. Model optimization follows rounds of position refinement and backward elimination when no more QTL can be added in the forward search step. In turn, a QTL position is updated conditional to all the other QTL in the model, and its score statistic is re-evaluated at a more stringent significance threshold level (e.g., a=0.05), when the QTL may be dropped. The final set of QTL is defined when all selected positions are significant, and, thus, no more positions change or QTL are dropped; 3. Forward search (now with a threshold value as stringent as the one used for backward elimination, e.g., a=0.05) as well as model optimization procedures are repeated until no more QTL are added (via forward search) or dropped (via backward elimination). Finally, QTL profiling is performed with the remaining significant QTL after the last round of model optimization has been carried out. The score statistics and their associated P-values are computed for all genomic positions conditional to the final set of QTL.
Notice that, as part of the strategy for selecting QTL, we were less stringent during the first step of forward search, so that we were able to allow more positions to be tested again during model optimization. In fact, power for detecting significant positions is expected to increase when conditioning the forward search as well as the backward elimination to other QTL already in the model (Da Costa E Silva et al. 2012a). For the forward search performed after the first backward elimination, we used the last threshold set from the backward elimination in order to avoid false positives. Here, we adopted the score-based resampling method to assess the genome-wide significance level proposed by Zou et al. (2004). In brief, n independent samples from a standard normal distribution were used to obtain the score statistic for every map position under evaluation (e.g., every 1 cM) and the P-value of the highest score was stored. After repeating this N = 1000 times (resampling), the score-based threshold for a significance a level was then defined from the 100(12a) percentile of ascending ordered P-values from the N samples.
Once the QTL were selected, we were able to estimate their variance components and compute QTL heritabilities, h 2 q , as the ratio between the QTL variance component and total variance. Given the parameter estimates, QTL-based breeding values are directly obtained as the BLUPs of the QTL genotypes (i.e., the vectorû q ) from Equation 3. BLUPs of the p = 400 possible genotypes were further decomposed in order to compute the additive effects as the average of u q (BLUPs) containing the respective alleles. Note that, in an F 1 population, the allele substitution effects (the very definition of additive effects) can be assessed only among alleles within each parent. Due to the model assumptions of zero mean for random effects, additive allele effects sum up to zero. These effects should be interpreted as the heritable contributions from parent to offspring, hence providing straightforward estimation of QTL-based breeding values to be used for selection.

Simulations
We conducted the following simulation study to examine the performance of REMIM (Equation 4) and compare it with FEIM (Equation 1). We simulated quantitative traits with three QTL each (q ¼ 1; . . . ; Q; Q ¼ 3) positioned along the BT linkage map (Mollinari et al. 2020, n = 298;see File S2) according to three scenarios (1000 simulations each): (i) unlinked, where all QTL were positioned in different LGs each; (ii) random, where each QTL was positioned randomly, but no closer than 20 cM from each other in case of being assigned to the same LG; and (iii) linked, where at least two QTL were positioned in the same LG, but no closer than 50 cM from each other. The QTL heritabilities were simulated as h 2 q ¼ f0:3; 0:2; 0:1g following their respective QTL genotype effect distributions as g q $ N ð0; G q s 2 q Þ, where s 2 q ¼ f0:75; 0:50; 0:25g. The environmental error was simulated from a standard normal distribution ðs 2 ¼ 1Þ, while the population mean was simulated as zero ðm ¼ 0Þ.
One round of forward search followed by model optimization (steps 1 and 2 from the algorithm described above) was carried out for each simulated trait using different genomewide significance forward ða ¼ 0:20Þ and backward ða ¼ f0:15; 0:10; 0:05; 0:01gÞ P-value thresholds based on the score-based resampling method (Zou et al. 2004). For comparison, we ran FEIM (Equation 1) with the same simulated traits using different genome-wide significance LOD thresholds ða ¼ f0:20; 0:15; 0:10; 0:05; 0:01gÞ based on 1000 permutations (Churchill and Doerge 1994). We also stored the error vectors used to add noise to each simulated phenotype, and ran FEIM and REMIM models again, now using the respective error vectors as offset variables, which simply subtract the error from its respective phenotype. In this case, we expected that both FEIM and REMIM models would perform similarly, since all the noise had been controlled, so that the only variation left was thus due to the QTL. We used the same step size of 2 cM as well as the same window size of 20 cM for both models. LOP2d (from REMIM) and LOD2d (from FEIM) support intervals were calculated for three different d values ðd ¼ f1:0; 1:5; 2:0gÞ.
Following the definitions and summary statistics from Da Costa E Silva et al. (2012b), all QTL kept after model optimization were considered "mapped." A mapped QTL was considered "paired" if ,20 cM apart from the simulated position, and a paired QTL was considered "matched" (true QTL) if its support interval included a simulated QTL. Finally, a mapped QTL was considered "mismatched" (false QTL) if it was not matched. We summarized detection power and empirical false discovery rate (FDR) for each support interval. Power was calculated as the ratio between the number of matched QTL over the total number of simulated QTL. FDR was estimated as the ratio between the number of mismatched QTL over the total number of mapped QTL. The absolute distance differences between simulated and mapped positions of paired QTL (precision) were averaged out. The proportion of matched QTL over the total number of paired QTL as an approximation of support intervals (coverage) was provided for each d value.

Software implementation
We implemented the algorithm for detection and characterization of multiple QTL based on the REMIM model in an R package called QTLPOLY (available at https://github.com/guilherme-pereira/qtlpoly). We integrated functions from the R package VARCOMP (v0.2-0; Qu et al. 2013) to compute the score statistics. The rounds of QTL search and model optimization use the variance components estimated in the previous round, so that the new estimates iterate faster. In addition, calculations for different genomic positions were paralleled in order to speed up the process by using the R base package PARALLEL (v3.5.2; R Core Team 2019). Final models were fitted using the R package SOMMER (v3.6; Covarrubias-Pazaran 2016), from which BLUPs were extracted and used for the computation of additive allele effects and QTL-based breeding values. Both VARCOMP and SOMMER packages use REML estimation to compute the variance components from the random-effect QTL model. Functions for plotting QTL profiles, effects and support intervals were based on GGPLOT2 (v3.1.0; Wickham 2016). Additional functions for running FEIM model and multi-threaded permutations were included in QTLPOLY and were based on the lm() function from R base package STATS (v3.5.2; R Core Team 2019).

Gene expression profiling
A developmental time-course gene expression profiling data of 'Beauregard' (NCBI BioProject PRJNA491292) was reported previously , and a parallel time-series of development with 'Tanzania' (NCBI BioProject PRJNA549660) roots was recently analyzed, including identification of differentially expressed genes . In brief, 'Beauregard' and 'Tanzania' roots were harvested from four biological replicates at 10, 20, 30, 40, and 50 days after transplanting (DAT), and 30, 40, and 50 DAT roots were classified into fibrous and storage roots based on diameter as described by Wu et al. (2018). RNA-sequencing (RNA-seq) datasets from both genotypes were generated and processed as described previously , with the one exception that the 'Tanzania' 30 DAT storage root sample was subsampled for 30 million reads. Differentially expressed genes were determined as described in Gemenet et al. (2020) using DESEQ2 (v1.22.2;Love et al. 2014) with a log2 fold-change (lfc) threshold of 2 and an adjusted P-value cutoff of 0.01, based on the fragments per kilobase exon model per million mapped reads (FPKM). To provide a comparison of expression abundances in the roots to leaves, 'Beauregard' and 'Tanzania' plants were grown as described in Lau et al. (2018) for control conditions and RNA-seq libraries from leaves processed as described above.

Data availability
The authors state that all data necessary for confirming the conclusions presented in the article are represented fully within the article. Raw RNA-seq reads are available at NCBI under BioProject numbers PRJNA491292 and PRJNA549660. Raw GBSpoly reads and VCF files are available via Mollinari et al. (2020). File S1 contains phenotypic data. File S2 contains genetic map information used in this study (also available via Mollinari et al. 2020). File S3 is the raw expression abundance matrix (also available via Gemenet et al. 2020). File S4 is the log2 FPKM expression matrix (also available via Gemenet et al. 2020). File S5 contains the differentially expressed genes associated with this study. MAPPOLY software used for linkage mapping analyses is available at GitHub (https://github. com/mmollina/mappoly). QTLPOLY software used for QTL mapping analyses and simulations is available at GitHub (https://github.com/guilherme-pereira/qtlpoly). Supplemental material available at figshare: https://doi.org/ 10.25386/genetics.12246134.

Trait heritabilities and correlations
Each one of the eight yield-related traits from six environments were analyzed using a multi-environment mixed-effect model, from which we were able to obtain jointly predicted means for each full-sib and variance component estimates (Table 1). Parents showed contrasting means for all traits, with 'Beauregard' presenting higher means for number of roots and root yield (both commercial and noncommercial) and commercial index when compared to 'Tanzania', which surpassed 'Beauregard' only for foliage yield. Interestingly, transgressive segregation was observed among the full-sibs for all traits, with emphasis on several individuals, with CYTHA higher than the most productive parent. Broad-sense heritabilities (H 2 ) were generally high, ranging from 70.39% (NCYTHA) to 88.42% (CI). Correlations between the predicted means were also estimated ( Figure 1). Low to nonsignificant correlations (from 20.04 to 0.16**) were observed between FYTHA and root yield traits. The highest correlation (0.99***) was between CYTHA and RYTHA, which was expected, since most RYTHA is derived from CYTHA. Among the traits used for CI calculation, CYTHA also had the highest correlation with CI (0.78***), likely because it is its main component. TNR components were also highly correlated with TNR, namely NOCR (0.89***) and NONC (0.86***). Finally, NOCR and NONC turned out to be highly correlated with CYTHA (0.81***) and NCYTHA (0.86***), respectively.

Mapping QTL in the BT population
Simulations: The BT linkage map based on 298 F 1 progenies was used to simulate 1000 quantitative traits with three QTL each. We ran FEIM (Equation 1) and REMIM (Equation 4) for each simulated trait in order to assess their power and empirical FDR in three increasingly difficult scenarios. We notice that the proportions of mapped QTL paired to the simulated positions with the highest heritability ðh 2 1 ¼ 0:3Þ were similar regardless of the method and criterion. However, higher proportions of simulated QTL with low heritability ðh 2 3 ¼ 0:1Þ were consistently mapped under the multiple-QTL mapping approach (see Supplemental Material, Table S1). In general, the average absolute difference between the simulated and mapped QTL peak location did not differ whatsoever when comparing models or thresholds. For REMIM, different genome-wide a level forward thresholds did not impact power or FDR (results not shown), but varying a level backward thresholds was critical. From testing different d values for LOD2d and LOP2d, we learned that d = 1.5 was a good approximation of 95% support interval for both FEIM and REMIM (see Table S2). Based on results for such a support interval, Figure 2 compares different threshold criteria for declaring a QTL during FEIM and REMIM (for forward a=0.20 threshold). On the one hand, both FEIM and REMIM have shown a relative control of FDR, with ,15% of false discoveries for most a levels, regardless of scenario. On the other hand, power differed in up to $18% when comparing different a levels that delivered ,10% FDR. Such a drop in detection power is more noticeable when FEIM is dealing with linked QTL, whereas power barely changes for REMIM across scenarios. Interestingly, even with the most stringent criteria of a=0.01 backward threshold for REMIM, we were able to map as many QTL as using a=0.20 for FEIM, but with better FDR control ($7% for REMIM in comparison to .15% for FEIM). By fitting the model without the simulated residual error in the "unlinked" scenario, we observed $92% power for FEIM. The small difference in comparison to $100% power for REMIM is likely due to the fact that the QTL genotype effects were simulated based on the REMIM additive relationship model (Equation 4). Despite the relative bias, FEIM failure in separating linked QTL became more evident in the "random" and "linked" scenarios. As a consequence, detection power plateaus $87% and 66% were observed, while REMIM exhibited $100% power (Figure 2).
Yield-related traits: We adopted genome-wide a levels of 0.20 and 0.05 as the respective forward and backward thresholds for detecting QTL in eight yield-related traits in the BT population using REMIM (Equation 4), such that respective resampling-based P-value thresholds were defined as 5.83310 24 and 1.42310 24 . In total, 13 QTL were identified (Figure 3), with P-values ranging from 1.42310 27 (QTL 2 for TNR) to 1.37310 24 (QTL 2 for NOCR) ( Table 2). The number of QTL per trait ranged from one (CYTHA, RYTHA and FYTHA) to four (NONC and TNR); NOCR had two QTL, and no QTL were found for NCYTHA and CI. Four LG harbored QTL regions: LGs 1, 3, and 10 harbored three QTL each, and LG 15 harbored four QTL. Approximate 95% support intervals computed as LOP21.5 (see Figure S1) showed that QTL were colocalized mostly within each LG. QTL peaks for NOCR, NONC, and TNR can be found from 137.60 to Parental ( B and T) and progeny ð F 1 Þ means, minimum, and maximum F 1 means, and genetic ðs 2 g Þ, genotype-by-environment interaction ðs 2 ge Þ and residual ðs 2 Þ variance components and heritability (H 2 ) estimates are shown for eight traits: number of commercial (NOCR), noncommercial (NONC) and total (TNR) roots per plant, commercial (CYTHA), noncommercial (NCYTHA) and total (RYTHA) root yield in t ha 21 , foliage yield (FYTHA) in t ha 21 , and commercial index (CI) 142.07 cM on LG 1, and from 13.11 to 20.18 on LG 3. On LG 15, QTL peaks were localized either at 67.20 and 78.04 cM for NONC and TNR, respectively, or at 5.34 cM for both CYTHA and RYTHA. QTL variance ðs 2 q Þ and heritability ðh 2 q Þ estimates from Equation 3 are shown in Table 2, where the subscript q denotes the QTL number for a specific trait. QTL heritabilities ranged from 8.99 (QTL 3 for TNR) to 22.04% (QTL 2 for TNR), representing the proportion of the total variance explained by that QTL, conditional to all the other QTL in the model. Out of 13 QTL, 4 were considered major QTL ðh 2 q . 15%Þ, which happened as pairs of colocalized QTL at the beginning of LGs 3 for NOCR/TNR and 15 for RYTHA/ CYTHA (see Figure S2). Altogether, multiple QTL explained as much as 35. 63%, 49.19%, and 55.06% of the total variance for NOCR, NONC, and TNR, respectively. In order to compare these QTL detection results with FEIM, we adopted an a=0.05, so that permutation-based LOD score thresholds ranged from 7.63 to 7.85, depending on the trait (see Figure  S4). A total of 12 QTL were mapped (see Table S3): one for each CYTHA, RYTHA, and FYTHA; two for NOCR; three for TNR; and four for NONC. No QTL were found for NCYTHA and CI either. The same four LGs harbored QTL: LGs 1 and 10 had two QTL each, LG 3 had three and LG 15 had five QTL, with the most significant QTL (adjusted R 2 . 11) found on LGs 1, 3, and 15. In comparison with REMIM, FEIM did not detect QTL for NONC or TNR on LGs 1 and 10, respectively. Instead, it allowed two QTL on LG 15 for NONC (one at 56.86 cM and another at 119.08 cM).
From REMIM, additive allele effects (see Table S4) were derived from the QTL genotype BLUPs (Equation 3). In general, although both parents have shown allele contributing to either decreasing or increasing the trait means, 'Beauregard' seemed to contribute more, increasing the number of roots, while 'Tanzania' exhibited major alleles increasing root yield. These effects represent the parental contributions to the population mean, i.e., how much one adds to, or subtracts from, the mean, given 1 of the 400 possible genotypes. For instance, Figure 4 shows the additive allele effects of QTL 1 for CYTHA. Inferences on which alleles contribute more to the mean as well as which ought to be selected for breeding purposes are straightforward. For example, individuals with the haplotypes b from 'Beauregard' and i from 'Tanzania', and without the haplotypes c and j through l from the respective parents will have the highest QTL-based breeding value estimates for CYTHA. By computing QTL-based breeding values, we could hypothesize on the genetic basis of trait correlation (see Figure S3). For instance, NOCR and NONC were highly correlated (0.51***) given a couple of colocalized QTL. While TNR is significantly correlated to NOCR/NONC (.0.76***), smaller correlation was observed between TNR and CYTHA/RYTHA (,0.15*). Interestingly, QTL-based breeding values for FYTHA did not seem to correlate to any other root-related trait. Finally, the absolute positive correlation of 1.00*** between predicted means from CYTHA and RYTHA (Figure 1) could be explained by a single colocalized QTL, since the correlation between QTL-based breeding values was also very high (0.99***).

Candidate genes underlying major QTL
We elected to examine putative candidate genes under two QTL with the highest heritabilities: the QTL for TNR on LG 3 (colocalized for NOCR and NONC) and the QTL for CYTHA on LG 15 (colocalized for RYTHA). The QTL peak on LG 3 was positioned at 1,591,872 bp relative to I. trifida genome (see Table S5), and 75 genes were found within a $500-kb window around this peak. Examination of functional annotation of these genes, coupled with expression profiles in leaves, as well as a time course of developing roots in both 'Beauregard' and 'Tanzania' ) (see File S3, File S4), revealed three candidate genes of interest (see Figure S5). The first I. trifida gene, itf03g02930, encodes a homolog of SKU5, a glycosyl phosphatidylinositol modified protein in Arabidopsis thaliana; the second candidate gene, itf03g03280, encodes a protein with sequence similarity to annexin (ANN1 and ANN2); and the last candidate gene, itf03g03460, encodes a protein similar to the WUSCHEL homeobox family protein (AtWOX13). In general, these genes were expressed at a low level in leaves, but highly in roots. Six additional genes were found to be differentially expressed in 'Beauregard' storage roots relative to fibrous roots across the time course, while only a single gene was differentially expressed in 'Tanzania' (see File S5).
On LG 15, a major QTL for CYTHA, with the peak at 477,772 bp, spanned positions from 21,822 to 1,915,814 bp (see Table S5) and over 300 genes. As this was too large a distance to manually curate candidate genes responsible for the trait, we restricted our manual review to 25 genes distal and proximal to the most significant marker. Within this region, two genes encoded functions that may be associated with storage root development and had expression profiles that supported a role in storage root development (see Figure S5, File S3, File S4). The hormone ethylene has diverse roles in cell proliferation and elongation, and the itf15g01020 gene encodes a protein with similarity to the A. thaliana CONSTITUTIVE TRIPLE RE-SPONSE 1 gene (CTR1) with functions in the ethylene signaling pathway. This gene was expressed in leaves but expressed at twice the levels in developing roots. Storage roots are grown for their high starch content, and itf15g01120 encodes a protein with similarity to starch branching enzyme 2.2. This gene was expressed in leaves and roots with the highest expression levels detected in storage, not fibrous nor developing roots. Analysis of differentially expressed genes in storage vs. fibrous roots of 'Beauregard' and 'Tanzania' within this QTL region revealed 39 unique differentially expressed genes in 'Beauregard' across the time course, and 12 unique differentially expressed genes in 'Tanzania' (see File S5). Detection power (in percentage) vs. empirical false discovery rate (FDR, in percentage) from QTL mapping analyses of simulated traits in 'Beauregard' 3 'Tanzania' (BT) full-sib family. Each trait was simulated with three QTL ðq ¼ f1; 2; 3gÞ with different heritabilities ðh 2 q ¼ f0:3; 0:2; 0:1gÞ positioned along the BT linkage map (n = 298). At least two out of three QTL were linked or not depending on three scenarios (linked, random, and unlinked), with 1000 simulations each scenario. Fixed-effect interval mapping (FEIM, red) and random-effect multiple interval mapping (REMIM, blue) were carried out with (solid lines) and without (dotted lines) the simulated error. FEIM and REMIM used different genome-wide significance thresholds (a ¼ f0:20; 0:15; 0:10; 0:05; 0:01g, symbols) based on permutation tests or resampling method, respectively. For a $95% support interval coverage, power was computed as the proportion of true QTL over the total number of simulated QTL, and FDR as the proportion of false QTL over the total number of mapped QTL.

Discussion
Polyploid single-vs. multiple-QTL models QTL mapping in autopolyploid species has been limited to a fixed-effect interval mapping (FEIM) model proposed for tetraploids (Hackett et al. 2001(Hackett et al. , 2014 and also expanded for hexaploids (van Geest et al. 2017). Consisting of a single-QTL model, 2m22 main effects are fitted (m is the ploidy level), and this model is compared to a null model (with no QTL) using LRT, ultimately expressed as LOD scores. Permutation-based genome-wide significance LOD thresholds are then used to declare a QTL. Trying to fit additional QTL into FEIM model could rapidly lead it to over-parameterization, since each QTL requires as many as 6 (for tetraploids), 10 (for hexaploids), or 14 (for octoploids) main effects to be tested and estimated in such a fixed-effect multiple-QTL model. Furthermore, new rounds of permutation tests, based on a model with QTL, would need to be carried out in order to provide an updated LOD score threshold (Klaassen et al. 2019). In contrast, the random-effect multiple interval mapping (REMIM) model presented here is designed to fit multiple random-effect QTL by estimating only one single parameter ðs 2 q Þ per QTL. Score statistic tests are performed in order to assess whether a QTL variance component is zero or not, conditional to other QTL in the model. These tests provide an approach for comparing two nested models with the reduced model having a random effect excluded, similar to what restricted LRT (RLRT) would do. However, (R)LRT is more prone to numerical errors because the null hypothesis ðH 0 : s 2 q ¼ 0Þ falls on the boundary of the parameter space, whereas score-based methods can be robust to eventual misspecification of the distribution of random effects (Verbeke and Molenberghs 2003). A score-based resampling method (Zou et al. 2004) was used for setting genome-wide significance thresholds, which facilitates a forward-backward search to identify an optimal multiple QTL-model in a computationally tractable manner.
Here, we used the BT population genetic map to simulate quantitative traits based on multiple QTL with different heritabilities each, in order to compare FEIM and REMIM performances under three increasingly difficult scenarios ( Figure  2). Both approaches would potentially detect similar number of QTL in case they were all unlinked. However, despite the small bias created by the way QTL were simulated (based on REMIM model), FEIM showed a relative loss of power. Multiple-QTL model approaches have proven to provide greater power and better FDR control than single-QTL models for both univariate Laurie et al. 2014) and multivariate (Da Costa E Silva et al. 2012b) models, due mostly to the differences in detecting QTL with smaller effects. In fact, this is rather expected as a multiple-QTL model has a smaller residual variance, which helps to detect additional QTL. Multiple-QTL models are also supposed to improve detection of more than one QTL on the same LG (Mayer 2005), as they are usually hard to separate from each other due to the large extension of linkage disequilibrium in mapping populations. For polyploids, a nonoptimized approach of using residuals from a fitted single-QTL model as phenotypic data to find a second linked QTL has been proposed (Mengist et al. 2018), as it requires additional manual steps. In contrast, the forward-backward search employed here has been shown to be optimized to detect linked QTL. The consistently superior results in comparison to FEIM pointed out that the linear score statistics behaved well as part of our algorithm, and the impact of using different a level thresholds for QTL detection was also assessed here. In QTL mapping analysis, it is important to have a reasonable balance between detection power and FDR, as we are interested in mapping as many true QTL as possible. When deciding on which a level to adopt, one should consider the goals of the study, i.e., whether it is intended to use a few very reliable QTL for marker-assisted breeding, or to discover as many QTL-related putative genes as possible for further validation. Although one could use a more relaxed criteria in Table 2 Random-effect multiple interval mapping (REMIM) of yield-related traits from 'Beauregard'3'Tanzania' (BT) full-sib family Linkage group (LG), map position (in centiMorgans) and its $95% support interval (within parenthesis), score statistic and its corresponding P-value, variance ðs 2 q Þ; and heritability (h 2 q , in percentage) of mapped QTL using resampling-based genome-wide significance P-value threshold of 0.05 (backward elimination) Trait abbreviations: number of commercial (NOCR), noncommercial (NONC) and total (TNR) roots per plant, commercial (CYTHA), noncommercial (NCYTHA) and total (RYTHA) root yield in t ha 21 , foliage yield (FYTHA) in t ha 21 , and commercial index (CI).
order to increase the power of detection while still maintaining an acceptable level of FDR, REMIM with respective forward and backward a levels of 0.20 and 0.05 seemed reasonable.

QTL mapping for yield traits in sweetpotato
Most of the linkage and QTL mapping work done for sweetpotato so far has relied on strategies based on a double pseudotestcross approach for diploid species (Grattapaglia and Sederoff 1994). For example, separate parental maps have been built based on this diploid-based simplification, using qualitative marker systems such as randomly amplified polymorphic DNA (RAPD; Ukoskit and Thompson 1997), amplified fragment length polymorphism (AFLP; Kriegner et al. 2003;Cervantes-Flores et al. 2008a;Nakayama et al. 2012), retrotransposon insertion polymorphisms (Monden et al. 2015), and simple sequence repeats (SSR; Kim et al. 2017). A recent map was developed from a selfing population and used only single-dose SNPs, resulting in higher marker saturation in comparison to the previous maps (Shirasawa et al. 2017), though the map was still not integrated. In some of these cases, QTL mapping analyses were performed for several traits, related mostly to quality (Cervantes-Flores et al. 2011;Zhao et al. 2013;Yu et al. 2014;Kim et al. 2017) and resistance to biotic stresses (Cervantes-Flores et al. 2008b;Yada et al. 2017a). For yield-related traits, only two studies have been reported to date (Chang et al. 2009;Li et al. 2014). The use of DNA markers with unknown DNA sequence limited our ability to compare their results with I. trifida and I. triloba genomes , and, ultimately, with our present QTL study (see Table S5). Moreover, although these diploid-based strategies were the state-of-the-art at that time for qualitative marker-based, low density genetic maps, they imposed significant restrictions on statistical power for QTL detection and its genetic interpretation.
Recently, more improved methods and computational tools that take into account autopolyploid complexity for dosage SNP calling (Voorrips et al. 2011;Serang et al. 2012;Schmitz Carley et al. 2017;Gerard et al. 2018) and integrated linkage map construction Bourke et al. 2018; Mollinari and Garcia 2019) have become available, mostly dedicated to tetraploids. Taking advantage of the newly developed MAPPOLY package, Mollinari et al. (2020) built the first integrated genetic map for sweetpotato, from the BT population used here. For a hexaploid species, this has opened up new opportunities for more interpretable QTL genetic models due to MAPPOLY implementation of a HMM that delivers QTL genotype conditional probabilities along a fully integrated genetic map (Mollinari and Garcia 2019). Based on this map, we detected 13 QTL ( Figure 3) using REMIM, with QTL heritabilities ranging from 8.99 to 22.05% (Table 2, see Figure S2). Most of these QTL were also mapped among the 12 QTL using FEIM (see Figure S4), with proportion of variance explained (PVE) ranging from 8.42 to 12.43% (see Table S3). Based on the double pseudotestcross approach, previous studies found nine QTL for storage root yield ð17:7% # PVE # 59:3%Þ , seven QTL for root and top (foliage) weight ð16:0% # PVE # 29:5%Þ; plus one QTL for root number ðPVE ¼ 14:8%Þ (Chang et al. 2009). Because of likely estimation bias due to reduced population sizes (n , 200), and the use of not very informative markers and linkage maps, these previous PVE findings are hard to compare with our results.
Although number of roots seemed to be as heritable as root yield (Table 1), only one colocalized QTL was detected for CYTHA/RYTHA. These traits are likely more complex in terms of their genetic architecture, though. That is, not only number of roots contributes to yield, but also size and composition, so we can expect that more regions are involved in root yield, in addition to those involved in number of roots. Nevertheless, Yada et al. (2017b) found a rather low trait heritability (likely individual-basis) for commercial root yield (H 2 = 24%) among 278 full-sibs of a cross between 'New Kawogo', a Ugandan landrace, and 'Beauregard', possibly due to stronger G 3 E interaction, which adds to the trait complexity. Here, G 3 E interaction seemed important for all traits, and its consequences to QTL mapping and breeding will be explored in future studies. As QTL mapping targets major QTL, usually stable across environments, most of the minor ones must have gone undetected. Moreover, additional genetic variation could be due to higher-order allele interactions and genetic epistasis, which the current models do not account for. Colocalized QTL among number of roots and yield traits explain some of the correlations among QTL-based breeding values (see Figure S3), partially explaining the correlations among the predicted means for these traits (Figure 1). Based on the correlation between QTL-based breeding values, FYTHA does not seem to be useful in indirect selection for CYTHA, as suggested previously (Chang et al. 2009).
'Beauregard' and 'Tanzania' contributed more importantly with positive and negative major effects, respectively. However, the presence of both favorable and unfavorable QTL alleles were observed in either parents (see Table S4), which possibly explains the presence of transgressive segregants for all traits. Transgression in polyploids seems to be due to cumulative complementary alleles not only at different loci (Tanksley 1993), but also from the same QTL. In fact, increased heterozygosity has been suggested as one of the major forces of polyploid evolutionary success, as a broader allele repertoire may result in the variation of gene expression and regulation needed to thrive in more diverse environmental conditions (Van de Peer et al. 2009). As an example, 'Tanzania' exhibited alleles contributing to increase CYTHA from a major QTL (Figure 4), although this landrace was not very productive in our environments overall. The additive effects are the most important from a breeding point-of-view, and their estimation provides straightforward direction on which alleles to select. Simpler biallelic-based models proposed previously (Hackett et al. 2014;Chen et al. 2018) may be used to estimate other interactions. The effective use of these allele interactions in QTL detection and breeding remains limited, though. As noted by Gallais (2003), estimating multi-allelic interactions reliably would require larger populations.
Several studies have looked at genes involved in storage root initiation and development in sweetpotato as reviewed by Khan et al. (2016). The storage roots differentiate from lateral roots by development of cambia around the protoxylem and secondary xylem, while lignification of the steles of some lateral roots inhibits this transformation (Villordon et al. 2012). Using the expression profile of the parents of the current mapping population, we found genes in leaves and roots (see Figure S5) related to root directional growth (e.g., SKU5, itf03g02930 homolog; Sedbrook et al. 2002) and lateral development (e.g., AtWOX13, itf03g03460 homolog; Deveaux et al. 2008) as well as with sugar transport to the root tip (e.g., ANN1 and ANN2, itf03g03280 homolog; Wang et al. 2018) within the QTL region on LG 3 associated with number of storage roots. Thus, both root restructuring and carbon supply is likely involved in the number of lateral root that transform to storage root. Other genes such as MADS-box (e.g., itf03g02230 homolog; Kim 2002), expansin (EXP, itf03g05010 homolog), and BEL1-like homeodomain (e.g., itf03g02670 homolog; Ponniah et al. 2017) have been strongly implicated in storage root formation and development in sweetpotato, and were found within the QTL region on LG 3. On the QTL related to storage root weight on LG 15, we found genes related with the hormonal control of cell proliferation (e.g., CTR1, itf15g01020 homolog; Street et al. 2015) and with starch biosynthesis (e.g., starch branching enzyme 2.2, itf15g01120 homolog; Li et al. 2014). The association between these and other differentially expressed genes listed in this study (see File S5) is yet to be defined, and suggests the complex nature of storage root formation and development.

Final considerations
Here, we present a stepwise-based algorithm for multiple-QTL model selection in full-sib populations of autopolyploid species with a fully integrated map, from which QTL genotype conditional probabilities can be calculated. The use of score statistics is a key component of this new method, which depends on a dynamic and fast-computing test for model selection during the QTL detection process. Simulations were performed in order to assess the impact of using different threshold criteria and to provide some empirical sense on how to use the method in practice. REMIM has been carried out in a hexaploid sweetpotato population to detect major loci contributing to the variation of yield-related traits that may be targeted in molecular-assisted breeding. The use of randomeffect models has created the context for fitting multiple QTL, providing straightforward information on variance components, important for computing QTL heritabilities. Finally, QTL genotype predictions (BLUPs) allowed us to estimate additive effects for characterizing major allele contributions, and compute QTL-based breeding values that can be used for performing selection. This novel approach may enable more complex models, such as those accounting for interaction among QTL as well as multiple traits or multiple environments in order to study shared genetic control in different traits/ environments and G 3 E interaction at QTL level. Such a model can also be expanded in order to consider multi-parental designs and double reduction or preferential pairing as long as the reconstructed haplotypes can be used to inform on shared alleles IBD. For these more complex models, one could expect additional computational cost, for which further investigation is needed. Understanding the genetic architecture of root yield and other traits related to quality and resistance to biotic and abiotic stresses represents great opportunity for improving characteristics of interest in sweetpotato and other polyploids. Most of these important traits are polygenic in nature and only assessed later in a breeding program, where marker-assisted selection could help to speed up the process.