Abstract

Motivation

Empirical Bayes techniques to genotype polyploid organisms usually either (i) assume technical artifacts are known a priori or (ii) estimate technical artifacts simultaneously with the prior genotype distribution. Case (i) is unappealing as it places the onus on the researcher to estimate these artifacts, or to ensure that there are no systematic biases in the data. However, as we demonstrate with a few empirical examples, case (ii) makes choosing the class of prior genotype distributions extremely important. Choosing a class is either too flexible or too restrictive results in poor genotyping performance.

Results

We propose two classes of prior genotype distributions that are of intermediate levels of flexibility: the class of proportional normal distributions and the class of unimodal distributions. We provide a complete characterization of and optimization details for the class of unimodal distributions. We demonstrate, using both simulated and real data that using these classes results in superior genotyping performance.

Availability and implementation

Genotyping methods that use these priors are implemented in the updog R package available on the Comprehensive R Archive Network: https://cran.r-project.org/package=updog. All code needed to reproduce the results of this article is available on GitHub: https://github.com/dcgerard/reproduce_prior_sims.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

Because of their importance in agriculture (Udall and Wendel, 2006), evolution (Soltis et al., 2014) and biodiversity (Soltis and Soltis, 2000), scientists are increasingly interested in studying the genomic landscape of polyploid organisms (organisms with more than two copies of their genome). In these studies, researchers usually first estimate the number of copies of each allele (dosage) at given polymorphic loci in a sample. These dosages may then be fed into downstream analyses such as genome-wide association studies (Benevenuto et al., 2019; Ferrão et al., 2018; Rosyara et al., 2016), genomic prediction (Amadeu et al., 2019; de Bem Oliveira et al., 2019; Endelman et al., 2018; de C Lara et al., 2019) and genetic mapping (Ferreira et al., 2019; Shirasawa et al., 2017). To determine allele dosage, researchers first measure the reference and alternative counts of small reads from some sequencing technology (Baird et al., 2008; Elshire et al., 2011). Based on these read counts, they then infer the dosage of reference and alternative alleles using statistical methods.

Due to constraints in sequencing depth, many genotyping methods incorporate empirical Bayes techniques to borrow strength between individuals (Blischak et al., 2018; Clark et al., 2019; Gerard et al., 2018; Maruki and Lynch, 2017; Serang et al., 2012; Voorrips et al., 2011). These methods posit some class of prior distributions for the possible genotypes, select a distribution among this class using maximum marginal likelihood, and incorporate this prior genotype distribution in a Bayesian inference scheme to derive a posterior genotype distribution from which genotype calls are made. Adaptively estimating the prior distribution allows methods to automatically gauge the level of prior information in the data and use this prior information to improve genotyping.

In (empirical) Bayesian inference, researchers typically first navigate toward prior classes that are either ‘uninformative’ or ‘flexible’. In the case of genotyping, a typical ‘uninformative’ prior would be the discrete uniform distribution, which assumes that each genotype is equally probable. This distribution was one of the first used in the literature (McKenna et al., 2010). If a researcher wanted no constraints on the genotype distribution, this can be accomplished with the class of general categorical distributions, which allows each genotype to occur with any frequency, subject to the constraint that the genotype frequencies sum to one. That is, it places no assumptions on the structure of the prior distribution and simply tries to fit the best genotype distribution to the data. This class was also one of the early priors used for genotyping polyploids (DePristo et al., 2011; Li, 2011).

Besides the genotype of the individual, there are usually other, more technical, aspects of the data that should be taken into consideration during allele dosage inference, including sequencing error rate, allele bias and overdispersion (Gerard et al., 2018). A principled empirical Bayes approach would integrate over the uncertainty in these other parameters prior to estimating the genotype prior. However, computation is a major concern as there are often tens or hundreds of thousands of SNPs to genotype in a single dataset, and even increases of seconds can make a genotyping method untenable for applied researchers. Thus, for computational reasons, genotyping methods either assume these parameters are known a priori (Blischak et al., 2018; Clark et al., 2019; Maruki and Lynch, 2017) or estimate them simultaneously with the prior genotype distribution (Gerard et al., 2018; Voorrips et al., 2011).

Assuming these parameters are known a priori is unappealing as it prohibits adapting to the features present in the data. However, using the second strategy of simultaneously estimating technical features with the genotype prior makes the choice of the class of priors of vital importance—priors that are either too flexible or too constrained perform poorly in practice. To demonstrate this, we present a genotype plot (Gerard et al., 2018) of a single SNP from the autotetraploid potato data from Uitdewilligen et al. (2013),Figure 1A. A genotype plot contains the reference counts of an individual on the y-axis and the alternative counts on the x-axis. Individuals with a dosage of four reference alleles would lie near the x = 0 vertical line, while individuals with a dosage of zero reference alleles would lie near the y = 0 horizontal line. The plot seems to display two clusters of individuals, which would indicate to a researcher that there is a group of potatoes with a genotype of four reference alleles and a group of potatoes with a genotype of three reference alleles. However, when we fit updog (the software implementation of the empirical Bayes genotyping method described in Gerard et al., 2018) to these data, we obtain unreasonable genotype estimates when we either use the class of general categorical distributions (labeled ‘General’ in Fig. 1B) or we use the discrete uniform distribution (labeled ‘Uniform’ in Fig. 1C). These results generalize to other datasets (Supplementary Fig. S1). As a preview, Figure 1D demonstrates that using one of our proposed classes of prior distributions results in intuitive genotype estimates.

The number of reference reads (y-axis) versus the number of alternative reads (x-axis) from 84 autotetraploid potatoes for a SNP from Uitdewilligen et al. (2013). The raw unannotated data are presented in (A). The other panels are color-coded by estimated genotype according to updog fits using either (B) the general categorical class of priors, (C) the discrete uniform prior distribution or (D) the class of proportional normal priors. The dashed lines in (B) through (D) are the estimated expected counts, and are functions of the estimated allele bias and estimated sequencing error rate (see Gerard et al., 2018). Only the proportional normal class of priors provides intuitive genotyping
Fig. 1.

The number of reference reads (y-axis) versus the number of alternative reads (x-axis) from 84 autotetraploid potatoes for a SNP from Uitdewilligen et al. (2013). The raw unannotated data are presented in (A). The other panels are color-coded by estimated genotype according to updog fits using either (B) the general categorical class of priors, (C) the discrete uniform prior distribution or (D) the class of proportional normal priors. The dashed lines in (B) through (D) are the estimated expected counts, and are functions of the estimated allele bias and estimated sequencing error rate (see Gerard et al., 2018). Only the proportional normal class of priors provides intuitive genotyping

This indicates that using a class of priors that is either too flexible (the general categorical) or too restrictive (the discrete uniform) can result in poor genotyping. In the case of the discrete uniform prior, updog effectively tries to estimate the allele bias and sequencing error to make the empirical genotype distribution as close to a discrete uniform as possible. As the real genotype distribution is not discrete uniform, this results in extreme estimates of the allele bias and unreasonable genotype estimates. In the case of the general categorical prior, since we do not have a data-adaptive prior distribution on the bias, there is little loss in marginal likelihood for having an extreme allele bias. And so the bias parameter is free to vary as long as it increases the marginal likelihood. Indeed, if we include a slight penalty on the bias parameter, the general categorical class of prior distributions results in reasonable genotype estimates (Supplementary Fig. S2). These results indicate that having too flexible a class of prior distributions can result in unstable genotype estimates; while having too constrained a class of prior distributions can result in the likelihood parameters adapting to the prior distribution instead of having the prior distribution improve our estimates of the genotypes.

In this article, we propose two classes of prior distributions for genotyping polyploids which are designed to have intermediate levels of flexibility (Section 2). One is the class of discrete unimodal distributions, for which we provide a complete characterization. The other is the class of distributions whose probability masses are proportional to a normal density. We provide optimization details for these classes, along with other classes of priors used in the literature. We demonstrate, through simulation and on real data, that these two classes, and in particular the proportional normal class, are more robust to varying genotype distributions (Section 3). The proportional normal prior is the default in the updog software available on the Comprehensive R Archive Network: https://cran.r-project.org/package=updog.

2 Materials and methods

We now provide a brief description of the model and genotyping procedure we study. We assume that all inference is done one SNP at a time (though see the discussion in Section 4). Let xi be the reference read counts for individual i, let ni be the total number of reads for individual i, let K be the ploidy of the species under study, let yi{0,1,,K} be the (unknown) genotype for individual i, let η be a vector of likelihood-specific parameters (such as the allele bias, sequencing error rate or overdispersion), and let θ be a vector of prior-specific parameters. Different values of θ index the class of allowable genotype distributions. Let f(xi|yi,ni,η) denote the likelihood of xi. This could, e.g. be the likelihood from Gerard et al. (2018) or Voorrips et al. (2011), but we allow it to be more general. Let f(yi|θ) denote the prior probability of genotype yi. We suppose that the inference strategy consists of the following two steps: First, estimate the likelihood- and prior-specific parameters by maximum marginal likelihood:
(1)
and second, derive the posterior distribution of the genotypes for each individual:
(2)

These posterior probabilities are then used to make genotype calls. Deriving a general-purpose expectation-maximization algorithm (Dempster et al., 1977) to obtain η^ and θ^ is not difficult (Supplementary Section S2.1), though special optimization considerations are needed for some of the priors we discuss in this section (Supplementary Sections S2.1 and S2.2).

Prior distributions that have been used in the past (though not necessarily in the above framework) include:

1. The discrete uniform distribution (McKenna et al., 2010):
(3)
2. The distribution of autopolyploid sibling genotypes (Serang et al., 2012):
(4)

Equation (4) is merely a convolution of two hypergeometric distributions. The 1 and 2 parameters are the genotypes of the two parents. We will denote this distribution as the ‘F1 distribution’ as it results from an F1 cross of autopolyploid individuals.

3. The binomial distribution (Li, 2011; Martin et al., 2010):
(5)

The α parameter is the allele frequency of the reference allele. The binomial distribution results by assuming the individuals are in Hardy-Weinberg equilibrium.

4. The beta-binomial distribution (Blischak et al., 2018):
(6)
where B(a,b) represent the beta function with parameters a and b. The α parameter is the allele frequency of the reference allele, while γ is an overdispersion parameter. This is slightly different than the prior used in Blischak et al. (2018) as there the overdispersion parameter is individual-specific, while in Equation (6) it is locus-specific.
5. The general categorical distribution (DePristo et al., 2011; Li, 2011):
(7)

There are many issues with these previously used priors. The issues with the discrete uniform and general categorical distributions were already noted in Section 1. The F1 and binomial distributions are only applicable in specific situations. The F1 distribution can only be applied if the individuals are known siblings; the binomial distribution relies on the strong assumptions of Hardy-Weinberg equilibrium (Crow, 1988). Though the beta-binomial distribution can account for overdispersion, it is unable to account for underdispersion, which may be caused by, e.g. unobserved relatedness (e.g. the F1 distribution is generally less dispersed than a binomial, and hence a beta-binomial).

In order to account for both over- and underdispersion, in our first proposal we model genotype probabilities as being proportional to a normal density:
(8)
where N(·|μ,σ2) represent the normal density with mean μ and variance σ2. This distribution, though somewhat ad hoc, can approximate very well distributions from the classes of binomial, beta-binomial, and F1 distributions (Supplementary Fig. S3). One can thus essentially consider it a general class that encompasses as special cases these classes of distributions. Varying σ2 allows for the prior distribution to accommodate both over- and underdispersion. We call this the ‘proportional normal distribution’.

Motivated by the fact that most of the prior distributions previously proposed in the literature are unimodal, our second proposal is to model genotype probabilities within the flexible class of unimodal probability mass functions. In Supplementary Section S1 we exactly characterize this class of distributions using mixtures of discrete uniform distributions. This idea was motivated by the work of Stephens (2017) who approximated continuous densities using mixtures of continuous uniform distributions. Our characterization lends itself to a convex optimization problem during the M-step of the EM algorithm (Supplementary Section S2.1). But rather than use standard convex optimization techniques, we found it faster to develop a weighted EM scheme to optimize over the class of unimodal distributions (Supplementary Section S2.2).

All priors explored in this article are summarized in Supplementary Table S2.

3 Results

3.1 Simulation results

In this section we compare, through simulation, the seven different classes of prior distributions from Section 2. Our strategy was to generate genotypes under seven different genotype distributions, where each genotype distribution was designed to be favorable to one of the classes of prior distributions. The discrete uniform, binomial (Hardy-Weinberg), beta-binomial and F1 favorable genotype distributions were all generated according to their assumed priors. For the genotype distribution favorable to the proportional normal prior, we generated genotypes according to the proportional normal distribution with a small variance since we motivated this distribution by the need to account for underdispersion. For the unimodal favorable genotype distribution, we allowed genotype probabilities to decline linearly from K to 0 since none of the other more structured prior classes have such linear declines. Finally, for the genotype distribution favorable to the general categorical prior, we specified non-unimodal genotype proportions, which none of the other priors should be able to accommodate. The specific genotype distributions are listed in Supplementary Section S3 and are graphically represented in Supplementary Figure S6.

Given individual genotypes, we generated sequencing data according to the likelihood in Gerard et al. (2018), setting the sequencing error rate to 0.001, varying the overdispersion parameter over {0,0.005,0.01} (where larger values indicate greater dispersion), and varying the allele bias parameter over {1,0.75,0.5} (where values further from 1 indicate greater bias). Each repetition, we generated thirty hexaploid individuals at a sequencing depth of 100 reads. For each unique combination of allele bias, overdispersion, and genotype distribution, we generated 500 datasets. For each dataset, we fit updog assuming one of the seven prior classes described in Section 2.

The results under conditions of no allele bias and no overdispersion are presented in Figures 2 and 3. The results under varying levels of bias and overdispersion are presented in Supplementary Figures S7–S27. Figure 2 plots the proportion of individuals genotyped correctly (with larger values being better) whereas Figure 3 plots the difference between the estimated proportion of individuals misclassified and the actual proportion of individuals misclassified (with values closer to 0 being better). In Figures 2 and 3, we see that each class of prior distributions performs best when the genotype distribution is designed to be favorable to it. Except for the classes of proportional normal distributions and unimodal distributions, every class performs very poorly under some genotype distribution. However, the class of unimodal distributions performs poorly under some genotype distributions under larger levels of bias and overdispersion (Supplementary Figs S8, S11, S15 and S18). The class of proportional normal distributions only performs poorly under the (unrealistic) general categorical-favorable genotype distribution with large levels of bias and overdispersion (Supplementary Figs S13 and S20). Only the class of proportional normal priors performs adequately when the gentoypes are very underdispersed (Supplementary Figs S11 and S18). Estimates of the allele bias when using the class of proportional normal priors appear to be unbiased (Supplementary Figs S23–S27). These results indicate that the class of proportional normal distributions performs the best under a wide variety of settings.

Proportion of individuals genotyped correctly (y-axis) stratified by the assumed class of prior distributions (x-axis) used in an updog fit. The facets index the distributions used to generate the genotypes, with each genotype distribution designed to be favorable to one of the classes of prior distributions (orange). The distributions are ordered roughly from least flexible (the discrete uniform) to most flexible (the general categorical class). The read counts were generated assuming no allele bias and no overdispersion
Fig. 2.

Proportion of individuals genotyped correctly (y-axis) stratified by the assumed class of prior distributions (x-axis) used in an updog fit. The facets index the distributions used to generate the genotypes, with each genotype distribution designed to be favorable to one of the classes of prior distributions (orange). The distributions are ordered roughly from least flexible (the discrete uniform) to most flexible (the general categorical class). The read counts were generated assuming no allele bias and no overdispersion

Estimated proportion of individuals incorrectly genotyped subtracted from the actual proportion of individuals incorrectly genotyped (y-axis) stratified by the assumed class of prior distributions (x-axis) used in an updog fit. The horizontal line (at y = 0) indicates unbiased estimation of the misclassification error rate. The facets index the distributions used to generate the genotypes, with each genotype distribution designed to be favorable to one of the classes of prior distributions. The distributions are ordered roughly from least flexible (the discrete uniform) to most flexible (general categorical). The read counts were generated assuming no allele bias and no overdispersion
Fig. 3.

Estimated proportion of individuals incorrectly genotyped subtracted from the actual proportion of individuals incorrectly genotyped (y-axis) stratified by the assumed class of prior distributions (x-axis) used in an updog fit. The horizontal line (at y = 0) indicates unbiased estimation of the misclassification error rate. The facets index the distributions used to generate the genotypes, with each genotype distribution designed to be favorable to one of the classes of prior distributions. The distributions are ordered roughly from least flexible (the discrete uniform) to most flexible (general categorical). The read counts were generated assuming no allele bias and no overdispersion

3.2 Sweet potato results

In this section, we compare the seven classes of prior distributions discussed in Section 2 using the data from Shirasawa et al. (2017). The hexaploid sweet potatoes in these data are known siblings, resulting from a single generation of selfing (an S1 cross). Thus, the class of genotype distributions is known to be F1 (Equation 4) with the constraint that 1=2 (Gerard et al., 2018), which we will simply call ‘S1’. We fit updog, using all seven prior classes in Section 2 as well as the S1 prior class, to the 1000 SNPs with mean read-depth closest to 100. We ran updog only on a sample of twenty individuals, as having fewer individuals makes the class of prior distributions most important. For each SNP, we calculated the Euclidean distances between all pairs of posterior mean genotypes resulting from the eight different fitted prior classes. We also calculated the Hamming distances (divided by the number of individuals) between the posterior mode genotypes resulting from the eight different fitted prior classes.

The results are presented in Figure 4. As the genotype distribution is known to belong to the S1 class, we will consider its genotypes a proxy for the true genotypes. Thus, prior classes that are closest to S1 (in terms of either Euclidean distance from posterior mean genotypes or Hamming distance from posterior mode genotypes) perform the best. We see in Figure 4 that F1 performs the best. This is expected since S1 is a small subclass of the class of F1 distributions. The second best performance comes from the class of proportional normal priors. Thus, on real data, we see the gains attained by using the class of proportional normal priors. Mean distances between the results of all methods are presented in Supplementary Figures S28 and S29.

Boxplots of distances (x-axis) stratified by prior class (y-axis). The left facet contains the Euclidean distances between posterior means when either using the S1 prior or the specified prior on the y-axis. The right facet contains the proportion of individuals genotyped differently when either using the S1 prior or the specified prior on the y-axis. Distances closer to 0 indicate superior performance
Fig. 4.

Boxplots of distances (x-axis) stratified by prior class (y-axis). The left facet contains the Euclidean distances between posterior means when either using the S1 prior or the specified prior on the y-axis. The right facet contains the proportion of individuals genotyped differently when either using the S1 prior or the specified prior on the y-axis. Distances closer to 0 indicate superior performance

4 Discussion

In this article, we reviewed the classes of prior distributions used to genotype polyploids, found these classes either too flexible or too restrictive, and proposed two new classes of priors of intermediate levels of flexibility. This required us to characterize the class of discrete unimodal distributions using mixtures of discrete uniform distributions. We then demonstrated, through simulation, that our proposals work better under the genotyping inference scheme in updog. In particular, we recommend to use by default the class of proportional normal distributions as it resulted in robust performance under different genotype distributions. The proportional normal prior is the default in the updog software available on the Comprehensive R Archive Network: https://cran.r-project.org/package=updog. Our classes of priors are also more generally applicable to other inference schemes.

Our priors are appropriate defaults for a wide variety of genotyping scenarios. For example, breeding populations might exhibit violations in Mendelian segregation proportions (Equation 4) due to double reduction (Stift et al., 2010), preferential pairing (Voorrips and Maliepaard, 2012), or resulting from an allele being semi-lethal. Additionally, unstructured populations might exhibit violations from Hardy-Weinberg equilibrium (Equation 5) due to selection or non-random mating (Crow, 1988). Our proportional normal and unimodal classes of priors may be used in all of these scenarios without explicitly modeling these processes.

Accurate dosage estimation is vital for many statistics/bioinformatics procedures. We note, however, that not all genomic studies require dosage estimates. For example, transcriptional (Colle et al., 2019), genomic rearrangement (Meng et al., 2019) and even genomic prediction studies (de Bem Oliveira et al., 2019; Sverrisdóttir et al., 2017) have been performed without dosage estimates.

Here, we only analyzed one SNP at a time, though some recent approaches have also attempted to borrow strength between SNPs (Blischak et al., 2018; Clark et al., 2019). In particular, Clark et al. (2019) tries to account for linkage disequilibrium (LD) between SNPs. However, their method, though intuitive, is somewhat ad hoc. Generalizing our proportional normal prior would allow for a more principled approach to accounting for LD by simply letting the genotype distribution across multiple SNPs be proportional to a multivariate normal. To account for the high-dimensionality present due to the large number of SNPs, one could impose strong structural constraints on the covariance matrix, such as letting it decay exponentially according to recombination rate (Wen and Stephens, 2010).

Acknowledgements

All graphics were made using ggplot2 (Wickham, 2016) in the R statistical language (R Core Team, 2019). We would like to thank Matthew Stephens for some useful conversations. We thank the College of Arts and Sciences at American University for providing a travel grant from the Mellon Fund. This allowed the authors to meet and collaborate on this article.

Conflict of Interest: none declared.

References

Amadeu
 
R.R.
 et al. (
2019
)
Impact of dominance effects on autotetraploid genomic prediction
.
Crop Sci
.

Baird
 
N.A.
 et al. (
2008
)
Rapid SNP discovery and genetic mapping using sequenced RAD markers
.
PLoS One
,
3
,
e3376
e3377
.

Benevenuto
 
J.
 et al. (
2019
)
How can a high-quality genome assembly help plant breeders?
Gigascience
,
8
.

Blischak
 
P.D.
 et al. (
2018
)
SNP genotyping and parameter estimation in polyploids using low-coverage sequencing data
.
Bioinformatics
,
34
,
407
415
.

Clark
 
L.V.
 et al. (
2019
)
polyRAD: genotype calling with uncertainty from sequencing data in polyploids and diploids
.
G3
,
9
,
663
673
.

Colle
 
M.
 et al. (
2019
)
Haplotype-phased genome and evolution of phytonutrient pathways of tetraploid blueberry
.
Gigascience
,
8
,
giz012
.

Crow
 
J.F.
(
1988
)
Eighty years ago: the beginnings of population genetics
.
Genetics
,
119
,
473
476
.

de Bem Oliveira
 
I.
 et al. (
2019
)
Genomic prediction of autotetraploids; influence of relationship matrices, allele dosage, and continuous genotyping calls in phenotype prediction
.
G3
,
9
,
1189
1198
.

Dempster
 
A.P.
 et al. (
1977
)
Maximum likelihood from incomplete data via the EM algorithm
.
J. R. Stat. Soc. B
,
39
,
1
22
.

DePristo
 
M.A.
 et al. (
2011
)
A framework for variation discovery and genotyping using next-generation DNA sequencing data
.
Nat. Genet
.,
43
,
491
498
.

Elshire
 
R.J.
 et al. (
2011
)
A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species
.
PLoS One
,
6
,
e19379
.

Endelman
 
J.B.
 et al. (
2018
)
Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato
.
Genetics
,
209
,
77
87
.

Ferrão
 
L.F.V.
 et al. (
2018
)
Insights into the genetic basis of blueberry fruit-related traits using diploid and polyploid models in a GWAS context
.
Front. Ecol. Evol
.,
6
,
107
.

Ferreira
 
R.C.U.
 et al. (
2019
)
Genetic mapping with allele dosage information in tetraploid Urochloa decumbens (stapf) r. d. webster reveals insights into spittlebug (Notozulia entreriana berg) resistance
.
Front. Plant Sci
.,
10
,
92
.

Gerard
 
D.
 et al. (
2018
)
Genotyping polyploids from messy sequencing data
.
Genetics
,
210
,
789
807
.

de C Lara
 
L.A.
 et al. (
2019
)
Genomic selection with allele dosage in Panicum maximum Jacq
.
G3
,
9
,
2463
2475
.

Li
 
H.
(
2011
)
A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data
.
Bioinformatics
,
27
,
2987
.

Martin
 
E.R.
 et al. (
2010
)
SeqEM: an adaptive genotype-calling approach for next-generation sequencing studies
.
Bioinformatics
,
26
,
2803
2810
.

Maruki
 
T.
,
Lynch
M.
(
2017
)
Genotype calling from population-genomic sequencing data
.
G3
,
7
,
1393
1404
.

McKenna
 
A.
 et al. (
2010
)
The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
.
Genome Res
.,
20
,
1297
1303
.

Meng
 
Z.
 et al. (
2019
) Characterization of a Saccharum spontaneum with a basic chromosome number of x = 10 provides new insights on genome evolution in genus Saccharum. Theor. Appl. Genet.

R Core Team. (

2019
)
R: A Language and Environment for Statistical Computing
.
R Foundation for Statistical Computing
,
Vienna, Austria
.

Rosyara
 
U.R.
 et al. (
2016
)
Software for genome-wide association studies in autopolyploids and its application to potato
.
Plant Genome
,
9
.

Serang
 
O.
 et al. (
2012
)
Efficient exact maximum a posteriori computation for Bayesian SNP genotyping in polyploids
.
PLoS One
,
7
,
e30906
13
.

Shirasawa
 
K.
 et al. (
2017
)
A high-density SNP genetic map consisting of a complete set of homologous groups in autohexaploid sweetpotato (Ipomoea batatas)
.
Sci. Rep
.,
7
.

Soltis
 
D.E.
 et al. (
2014
)
The polyploidy revolution thenand now: Stebbins revisited
.
Am. J. Bot
.,
101
,
1057
1078
.

Soltis
 
P.S.
,
Soltis
D.E.
(
2000
)
The role of genetic and genomic attributes in the success of polyploids
.
Proc. Natl. Acad. Sci. USA
,
97
,
7051
7057
.

Stephens
 
M.
(
2017
)
False discovery rates: a new deal
.
Biostatistics
,
18
,
275
294
.

Stift
 
M.
 et al. (
2010
)
Inheritance in tetraploid yeast revisited: segregation patterns and statistical power under different inheritance models
.
J. Evol. Biol
.,
23
,
1570
1578
.

Sverrisdóttir
 
E.
 et al. (
2017
)
Genomic prediction of starch content and chipping quality in tetraploid potato using genotyping-by-sequencing
.
Theor. Appl. Genet
.,
130
,
2091
2108
.

Udall
 
J.A.
,
Wendel
J.F.
(
2006
)
Polyploidy and crop improvement
.
Crop Sci
.,
46(Suppl. 1
),
S3
.

Uitdewilligen
 
J.G.
 et al. (
2013
)
A next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potato
.
PLoS One
,
8
,
e62355
.

Voorrips
 
R.E.
,
Maliepaard
C.A.
(
2012
)
The simulation of meiosis in diploid and tetraploid organisms using various genetic models
.
BMC Bioinformatics
,
13
,
248
.

Voorrips
 
R.E.
 et al. (
2011
)
Genotype calling in tetraploid species from bi-allelic marker data using mixture models
.
BMC Bioinformatics
,
12
,
172
.

Wen
 
X.
,
Stephens
M.
(
2010
)
Using linear predictors to impute allele frequencies from summary or pooled genotype data
.
Ann. Appl. Stat
.,
4
,
1158
1182
.

Wickham
 
H.
(
2016
)
ggplot2: Elegant Graphics for Data Analysis
.
Springer-Verlag
,
New York
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
Associate Editor: Russell Schwartz
Russell Schwartz
Associate Editor
Search for other works by this author on:

Supplementary data