Recursive Algorithms for Modeling Genomic Ancestral Origins in a Fixed Pedigree

The study of gene flow in pedigrees is of strong interest for the development of quantitative trait loci (QTL) mapping methods in multiparental populations. We developed a Markovian framework for modeling ancestral origins along two homologous chromosomes within individuals in fixed pedigrees. A highly beneficial property of our method is that the size of state space depends linearly or quadratically on the number of pedigree founders, whereas this increases exponentially with pedigree size in alternative methods. To calculate the parameter values of the Markov process, we describe two novel recursive algorithms that differ with respect to the pedigree founders being assumed to be exchangeable or not. Our algorithms apply equally to autosomes and sex chromosomes, another desirable feature of our approach. We tested the accuracy of the algorithms by a million simulations on a pedigree. We demonstrated two applications of the recursive algorithms in multiparental populations: design a breeding scheme for maximizing the overall density of recombination breakpoints and thus the QTL mapping resolution, and incorporate pedigree information into hidden Markov models in ancestral inference from genotypic data; the conditional probabilities and the recombination breakpoint data resulting from ancestral inference can facilitate follow-up QTL mapping. The results show that the generality of the recursive algorithms can greatly increase the application range of genetic analysis such as ancestral inference in multiparental populations.

ABSTRACT The study of gene flow in pedigrees is of strong interest for the development of quantitative trait loci (QTL) mapping methods in multiparental populations. We developed a Markovian framework for modeling ancestral origins along two homologous chromosomes within individuals in fixed pedigrees. A highly beneficial property of our method is that the size of state space depends linearly or quadratically on the number of pedigree founders, whereas this increases exponentially with pedigree size in alternative methods. To calculate the parameter values of the Markov process, we describe two novel recursive algorithms that differ with respect to the pedigree founders being assumed to be exchangeable or not. Our algorithms apply equally to autosomes and sex chromosomes, another desirable feature of our approach. We tested the accuracy of the algorithms by a million simulations on a pedigree. We demonstrated two applications of the recursive algorithms in multiparental populations: design a breeding scheme for maximizing the overall density of recombination breakpoints and thus the QTL mapping resolution, and incorporate pedigree information into hidden Markov models in ancestral inference from genotypic data; the conditional probabilities and the recombination breakpoint data resulting from ancestral inference can facilitate follow-up QTL mapping. The results show that the generality of the recursive algorithms can greatly increase the application range of genetic analysis such as ancestral inference in multiparental populations. Many complicated experimental crosses have recently been produced for mapping QTL, particularly in plants (e.g., Kover et al. 2009;Sannemann et al. 2015). In contrast to traditional biparental populations, multiple parents have been used to increase genetic diversity and thus QTL segregation probability, and many generations of intercross mating have been used to increase accumulated recombination breakpoints in sampled offspring in the final generation and thus QTL mapping resolution. The expected density of recombination points is one of key quantities for optimizing experimental designs prior to collecting genotypic and phenotypic data (Rockman and Kruglyak 2008). Furthermore, the identification of recombination breakpoints from genotypic data provides useful information for increasing detection power and mapping resolution (Xu 2013;Li et al. 2015). The primary aims of this paper are to develop the theory of gene flow in a fixed pedigree with arbitrary structure to calculate the prior distribution of recombination points, and to apply the theory for ancestral inference and detection of recombination points from genotypic data in multiparental populations. The theory of one-locus gene flow in a pedigree has been well developed. The haploid genomes of pedigree founders are designated by distinct labels, called founder genome labels (FGLs). A set of genes are distinct individuals. Rostron (1978) developed a recursive algorithm for calculating the two-gene coefficients of inbreeding and kinship. Karigl (1981) developed a recursive algorithm for calculating the coefficients of the fifteen gene identity states for four genes between two individuals. Thompson (1983) showed that a similar algorithm applies to the probabilities of joint descent of multiple genes from a specific FGL. Bauman et al. (2008) and Zhou et al. (2012) developed a recursive algorithm for calculating the ancestral coefficients such as the probability of one gene descending from a given FGL and the probability of two genes descending from two given FGLs (distinct or not).
The theory of gene flow at two linked loci in a pedigree has also been developed. Weir and Cockerham (1969) and Cockerham and Weir (1973) developed a recursive algorithm for calculating the two-locus inbreeding coefficient of an individual that is defined as a linear function of the probabilities of the fifteen identity states for four genes, two at each of two linked loci. Thompson (1988) developed a recursive algorithm for calculating the two-locus kinship between two individuals defined as the probability of IBD at both loci, see Hill and Weir (2007) for calculating multi-locus IBD probabilities in random mating populations. In contrast to the identity coefficients, the ancestral coefficient at two loci is defined as the two-locus diplotype probabilities, and there are 4 L possible diplotypes for four genes in an individual with respect to L distinct FGLs. Thus, the calculation of the two-locus ancestral inferences has been developed only for simple breeding systems including selfing, brother-sister (or sibling) mating, and parent-offspring mating (Haldane and Waddington 1931;Johannes and Colome-Tatche 2011;Broman 2012).
There has been much interest in developing the theory of gene flow in a pedigree with chromosomes assumed to be a genomic continuum. Donnelly (1983) developed a formal mathematical framework, where the crossover processes in a chromosome pedigree follow jointly a continuous time Markov random walk on the vertices of a hypercube, the time parameter being position along homologous chromosomes. Here a vertex represents one of possible gene transmissions from founders to all non-founders, and the number of vertices in a d-dimensional hypercube is 2 d , d being the number of non-founders. This framework has been used in many kinds of small pedigree systems (e.g., Bickeböller and Thompson 1996;Stefanov 2000;Martin and Hospital 2011). However, it is impractical to apply Donnelly's Markovian framework to a large complex pedigree since the number of states (hypercube vertices) increases exponentially with the pedigree size.
Alternatively, the theory of junctions has been developed to study the gene flow for a genomic continuum (Fisher 1949(Fisher , 1954. A junction is defined as a boundary point (recombination breakpoint) on a chromosome where two distinct FGLs meet. Fisher (1949Fisher ( , 1954 and Bennett (1953Bennett ( , 1954 developed the theory of junctions for simple breeding systems including selfing, brother-sister mating, and parent-offspring mating. It has been extended to populations (Stam 1980;Baird et al. 2003;Chapman and Thompson 2003;MacLeod et al. 2005;Zheng et al. 2014;Zheng 2015). In particular, Zheng et al. (2014) and Zheng (2015) developed a Markovian framework for modeling ancestral origins within an individual with the state space being the possible pairs of FGLs, and thus the number of states is much smaller than that of Donnelly's Markovian framework (Donnelly 1983). On the other hand, our Markovian framework (Zheng et al. 2014;Zheng 2015) is restricted by two assumptions: the mating scheme from one generation to the next is random mating, and the FGLs are assumed to be exchangeable so that the probability distribution of ancestral origins in offspring is invariant to all possible permutations of the FGLs. However, the exchangeability generally does not hold for a mapping population produced via an arbitrary breeding pedigree.
In this paper, we relax the restriction of our previous Markovian framework by extending it to a fixed pedigree. We first describe a recursive algorithm (denoted by EXCH) under the assumption of FGLs being exchangeable, which is applicable for the construction of Markovian framework in simple breeding schemes. Then we develop a recursive algorithm (denoted by NON-EXCH) for modeling ancestral origins to relax the exchangeability assumption. The two recursive algorithms apply to both autosomes and sex chromosomes if they exist. The results of both recursive algorithms are compared with those from extensive simulations on a classical pedigree. We first apply the two algorithms to compare different population designs (or breeding pedigrees) prior to experiments, for example, in terms of the overall density of junctions (recombination breakpoints), an important factor of QTL mapping resolution. In addition, the non-exchangeability can be illustrated under various breeding designs. Then we apply the two algorithms to incorporate pedigree information for ancestral inference from genotypic data in simulated and real collaborative cross (CC) populations, resulting in conditional probabilities that are necessary for downstream QTL mapping.

Notations and overview
The symbols are briefly explained in Table 1, and some of them are illustrated in Figure 1. Pedigree members with unspecified mother and father are called the founders of the pedigree, and the other members are non-founders. The recursive algorithm presupposes that pedigree members are ordered under the constraint that parents always precede children. We denote by subscripts a, b, c the members of a pedigree, and denote by a . b individual a comes after individual b ð6 ¼ aÞ: We denote by superscripts m the maternally derived genes or chromosomes, and p for the paternally derived. We denote by superscripts o, o 1 , o 2 , and o 3 the unspecified parental origins (m or p) of genes or chromosomes. For the sake of brevity, f ðaÞ ¼ fAA; XX; XYg denotes a horizontal piecewise equation, so that f ðaÞ ¼ AA; XX; and XY for the autosomes of individual a, XX chromosomes of female a, and XY chromosomes of male a, respectively.
In the derivations of recurrence relations of a quantity, we will focus on a particular ordering a . b if the quantity concerns genes in two individuals; we will always trace the maternally derived gene or chromosome within non-founder a back to its two parental genes or chromosomes if the maternally derived gene or chromosome concerns the quantity, and otherwise trace the paternally derived gene or chromosome. Throughout this paper, ♂ and ♀ always denote the father and mother of individual a, rather than any other, respectively.
The identity coefficient f o1o2 ab ð12Þ denotes that two genes at a single locus have identity state ð12Þ; where the first gene is in individual a and has parental origin o 1 ; and the second gene is in individual b and has parental origin o 2 ( Figure 1A). There are only two two-gene identity states: IBD (11) or non-IBD (12), and f o1o2 ab ð11Þ ¼ 1 2 f o1o2 ab ð12Þ: Similarly, we denote by f o1o2o3 abc ð123Þ the three-gene identity coefficient with the identity state being the non-IBD state ð123Þ; which is required together with two-gene identity coefficients for deriving the recursive relations for junction densities.
An expected identity junction density is defined as the expected number of the specified type of recombination breakpoints per Morgan along two homologous chromosomes. Here expectation concerns the stochasticity of gene flow from founders to descendants on a fixed pedigree. And identity indicates that only the identity patterns (not specific FGLs) on the two sides of junctions matter. In the recursive algorithm NON-EXCH, we will introduce ancestral junction densities where the specific FGLs do matter. For example, J o1o2 ab ð1232Þ denotes that the four genes, two at each side of a breakpoint along two chromosomes, have genetic identity state ð1232Þ ( Figure 1B). Here the first and third genes are on the left and right sides of a breakpoint, respectively, and they are in individual a and have parental origin o 1 : And the second and fourth genes are on the left and right sides, respectively, and they are in individual b and have parental origin o 2 : We adopt the notation of Nadot and Vayssiex (1973) for an identity state: the genes are labeled by natural integers starting 1, and the same integer is assigned to the gene that is IBD with a previous gene.
Following the previous framework (Zheng et al. 2014;Zheng 2015), we model ancestral origins along two homologous chromosomes within an individual by a continuous time Markov process, which can be described by an initial distribution p of ancestral origins and a rate matrix Q. Under the assumption of exchangeability among FGLs, the initial distribution p of individual a is determined by the identity coefficient f mp aa ð11Þ; and the rate matrix Q of individual a is determined by the five expected identity junction densities J  Zheng (2015) for an illustrative construction of rate matrix Q from the five junction densities.
In the following, we describe a recursive algorithm for calculating the identity coefficients and the expected junction densities for an individual in a given pedigree. In short, let 1 S be an indicator function that equals 1 if statement S is true and 0 otherwise, and let 0 S2 S1 ¼ 1 2 1 S1 1 S2 be an indicator function that equals 0 if statements S1 and S2 are true and 1 otherwise.

Identity coefficients
The recurrence relations for the two-and three-gene identity coefficients are similar to the recursive algorithm of Karigl (1981). However, the calculation proceeds by tracing back genes instead of individuals, see also e.g., Jacquard (1974), Nadot and Vayssiex (1973), Denniston (1974), and Garcia-Cortes (2015). Thus, we can account for the asymmetry between maternally and paternally derived chromosomes, and account for autosomes and sex chromosomes simultaneously. Throughout this paper, we assume that there are no crossovers between X and Y sex chromosomes within a male. In addition, we focus on non-IBD state ð12Þ instead of IBD state ð11Þ to simplify the recurrence relations.
We derive the recurrence relations according to the Mendelian inheritance rules: (1) a maternally (or paternally) derived autosomal gene descends from the two genes within the mother (or father, respectively) with equal probability 1=2; (2) similar to (1) for a maternally derived X-linked gene, and (3) a paternally derived sex-linked gene within a female (or male) must descend from the X-linked (or Y-linked, respectively) gene of the father. The recurrence relations of the two-gene identity coefficient f o1o2 ab ð12Þ for non-founder a are given by n Table 1 List of symbols and their brief descriptions Symbol Description 1 S An indicator function that equals 1 if statement S is true and 0 otherwise 0 S2 S1 An indicator function that equals 0 if S1 and S2 are true and 1 otherwise.
Pedigree members ♂, ♀ Father of a, mother of a f mp ab ð11Þ Probability of IBD between maternal gene of a and paternal gene of b f o1o2 ab ð11Þ Probability of IBD between two genes of ab with parental origins o 1 o 2 f o1o2 ab ð12Þ Probability of non-IBD between two genes of ab with parental origins Probability of non-IBD among three genes of abc with parental origins o 1 o 2 o 3 R m a Expected junction density (per Morgan) on the maternal chromosome of a R o a On the chromosome of a with parental origin o ðg 1 g 2 g 3 g 4 Þ A junction type denoted by two-locus four-gene identity state: g 1 g 2 (g 3 g 4 ) are on the left-hand (right-hand) side of junction, and g 1 g 3 and g 2 g 4 are two haplotypes. Seven types: 1112, 1121, 1122, 1211, 1213, 1222, and 1232 J o1o2 ab ðg 1 g 2 g 3 g 4 Þ Expected density of junction type ðg 1 g 2 g 3 g 4 Þ; where haplotype g 1 g 3 of a has parental origin o 1 and haplotype g 2 g 4 of b has parental origin o 2 r mp aa Expected overall junction density for a. r apply to three-gene identity coefficients by permuting the three genes, and thus we consider only a particular ordering a $ b $ c; and we have for o 1 ; o 2 2 fm; pg; where the first (second) equation traces the maternally (or paternally, respectively) derived gene within individual a.
The boundary conditions of these recurrence relations are given according to the assignment of FGLs to the founders in a pedigree. If we assign distinct FGLs to each haploid genome of an outbred founder, all multi-gene non-IBD probabilities are 1 if any pair of genes is not from the same haploid genome, and 0 otherwise. Four-gene or higher order identity coefficients may be derived similarly, but they are not required for the derivations of junction densities.

Expected identity junction densities
Let R m a ðR p a Þ be the maternal (paternal) map expansion, the expected density of recombination breakpoints on the maternally (paternally) derived chromosome of individual a. The implicit two-gene identity state for the map expansions is ð12Þ: It holds that  (Zheng 2015). The expected overall junction density for individual a is given by r The recurrence relations for the two map expansions of non-founder a are relatively simple, and they are given by (Zheng 2015 according to the theory of junctions that a new identity junction is formed whenever a recombination event occurs between two homologous chromosomes that are non-IBD at the location of a crossover (Fisher 1954). Since chromosomes are modeled as a genomic continuum, the shared junction type (1122) between two chromosomes can be formed only by duplicating the chromosome segment harboring the recombination breakpoint. For the expected junction density J o1o2 aa ð1122Þ of nonfounder a, we have Ã ; for o; o 1 ; o 2 2 fm; pg; where and the last equation is obtained by reversing the ordering of the two haplotypes. In the first equation, J oo aa ð1122Þ refers to the same chromosome in individual a, and thus equals R o a by definition. We trace the maternally derived haplotype within non-founder a back to its two parental haplotypes if the maternally derived haplotype concerns the junction density, and otherwise track the paternally derived haplotype.
We derive the recurrence relations for J o1o2 ab ð1213Þ and J o1o2 ab ð1232Þ jointly. The recombination breakpoint for ð1232Þ occurs on the first chromosome of a homologous pair, and it is on the second chromosome for ð1213Þ: The junction type ð1213Þ becomes ð1232Þ when reversing the ordering of two haplotype of junction type ð1213Þ: We have  Table 1 for brief explanations of these quantities. J mo ab ð1232Þ ¼  The boundary conditions are given by the assignment of FGLs to the founders. The within-and between-founder identity junction densities are 0 if a single FGL is assigned to the whole haploid genome of each founder.

RECURSIVE ALGORITHM NON-EXCH
Notations and overview Let V denote the set of distinct FGLs assigned to the founders of a pedigree. We adopt the same notations in the recursive algorithm with FGL exchangeability except for the following changes. As shown in Figure 1, we replace identity states by ancestral states: ð1122Þ/ðiijjÞ; ð1211Þ/ðijiiÞ; ð1213Þ/ðijikÞ; ð1222Þ/ðijjjÞ; and ð1232Þ/ðijkjÞ for the two-chromosome expected junction densities, and R m a /R m a ðijÞ and R p a /R p a ðijÞ explicitly for the map expansions, where i; j; k 2 V are different from each other (denoted by i 6 ¼ j 6 ¼ k).
We represent a two-gene ancestral state by ðijÞ; and a three-gene ancestral state by ðijkÞ; where i; j; k 2 V are not necessary different from each other. In addition we introduce one-gene ancestral coefficient f o a ðiÞ; the probability that the maternally ðo ¼ mÞ or paternally ðo ¼ pÞ derived gene in individual a carries FGL i 2 V: The identity coefficients can be obtained by summing the corresponding ancestral coefficients. For example, f o1o2 ab ð11Þ ¼ P i2V f o1o2 ab ðiiÞ: Similar relationships hold between the expected identity junction densities and the expected ancestral junction densities, for example, J o1o2 ab ð1122Þ ¼ : Similarly, the initial distribution p of the Markov chain can be constructed from the two-gene ancestral coefficients, and the rate matrix Q can be constructed from the expected ancestral junction densities, without assuming the exchangeability of FGLs. Bauman et al. (2008) and Zhou et al. (2012) derived the recurrence relations of the one-and two-gene ancestral coefficients for autosomes by tracing two distinct genes simultaneously in their parents. We derive equivalent recurrence relations for both autosomes and sex chromosomes by tracing one gene once in its parent, instead of simultaneously tracing two genes within an individual. In addition, we derive the recurrent relations of the three-gene ancestral coefficients that are required in the recurrence relations of the expected ancestral junction densities.

Ancestral coefficients
The relations of the ancestral coefficients are very similar to those of the identity coefficients, and they are based on the same rules of Mendelian inheritance. For example for the one-gene ancestral coefficient f o a ðiÞ of non-founder a, we have f m a ðiÞ ¼ ' : See Appendix A for the recurrence relations of the two-gene ancestral coefficient f o1o2 abc ðijÞ and three-gene ancestral coefficient f o1o2o3 abc ðijkÞ: The boundary conditions of the recurrence relations of the ancestral coefficients are given by the FGLs assigned to the founders in a pedigree.

Expected ancestral junction densities
The ancestral map expansions can be expressed in terms of the expected ancestral junction densities as follows Here R o a ðijÞ; o 2 fm; pg refers to the junctions on the single chromosome of a with parental origin o, while the terms on the right-hand side refer to the junctions on the two homologous chromosomes of a. We have J where the contributions of the two-gene ancestral coefficients account for the asymmetry of FGLs i and j, for example, f mp ♀♀ ðijÞ ¼ f pm ♀♀ ðjiÞ 6 ¼ f pm ♀♀ ðijÞ: A new ancestral junction is formed whenever a recombination event occurs between two homologous chromosomes that have the unordered FGLs i and j at the location of a crossover.
The recurrence relations for J o1o2 ab ðiijjÞ are the same as those for J o1o2 ab ð1122Þ; except that identity states are replaced by ancestral states. We have for o; o 1 ; o 2 2 fm; pg; where the last equation is obtained by reversing the ordering of the two haplotypes. See Appendix B for the recurrence relations of J o1o2 ab ðijkjÞ; J o1o2 ab ðijikÞ J o1o2 ab ðijjjÞ, and J o1o2 ab ðijiiÞ: As before, the within-and between-founder ancestral junction densities are 0 if a single FGL is assigned to the whole haploid genome of each founder.

Data availability
Ancestral inference using the two recursive algorithms is implemented in the previous developed RABBIT software, where the function mag-icReconstruct is extended to incorporate pedigree information. RABBIT is available at https://github.com/chaozhi/RABBIT.git, and it is offered under the GNU Affero general public license, version 3 (AGPL-3.0). Example scripts for using the recursive algorithms, simulating genotypic data, and ancestral inference are included. The real CC data have been described by Durrant et al. (2011).

Simulation evaluation
Before applying the two recursive algorithms EXCH and NON-EXCH to multiparental populations, we evaluate their accuracy by forward simulations on the classical pedigree of Native Americans (Figure 3) (Chapman and Jacquard 1971). The pedigree has been previously used for evaluating recursive algorithms (Jacquard 1974;Karigl 1981;Garcia-Cortes 2015). We simulate two linkage groups: one pair of homologous autosomes and one pair of sex chromosomes. We assign FGLs to the haploid or diploid genomes of the founders. Each descendant gamete is specified as a list of FGL segments determined by chromosomal crossovers. The number of crossovers in a linkage group of a gamete follows a Poisson distribution with mean 1. We assume no genetic interference so that the positions of crossovers are independently and randomly distributed across the chromosomes. We obtain simulated results for the pedigree member "M22" (Figure 3) by averaging over 10 6 simulations. Table 2 shows the comparisons between the numerical results from the recursive algorithm EXCH and the simulated results. A unique FGL is assigned to each haploid genome of each founder, so that in total twelve distinct FGLs are assigned to the fixed founders. The differences between the numerical and simulated results are less than 0.002, which is very likely due to the stochasticity of gene flow from founders to descendants. The identity coefficient f mp aa ð11Þ for autosomes is in agreement with the previous result (Karigl 1981). Table 3 evaluates the recursive algorithm NON-EXCH by the forward simulations. A unique FGL is assigned to the whole diploid genome of each founder, so that in total six distinct FGLs are assigned to the fixed founders. Here we use a different assignment of FGLs because Table 3 was otherwise too large. The results show that founder "J" does not contribute to the maternally derived autosome and X chromosome in offspring "M22", which is relatively straightforward from the pedigree structure in We confirmed that the numerical results from the algorithm NON-EXCH are reduced to those from the algorithm EXCH, by summing over possible FGLs under the given identity pattern. In addition, we confirmed the consistency between the two algorithms in the multiparental populations shown in Figure 4.  Table 1 for other notations.

Breeding design
We apply the recursive algorithm EXCH to multiparental populations, which have become attractive for QTL mapping in many animal and plant species. Specifically, we study the inbreeding process by calculating the IBD probability f mp aa ð11Þ; and measure the QTL mapping resolution by calculating the expected overall junction density r mp aa : We focus on multi-way funnel breeding schemes (Figure 4). In the funnel scheme, the founders of each line are randomly permuted, and each line is produced by a intercross scheme that combines all founder genomes through several generations of pairwise crosses prior to repeated generations of inbreeding by e.g., sibling mating. The alternating backcross and the father-daughter backcross have been studied by Welsh and McMillan (2012) for accelerating inbreeding by a simulation approach. Figure 5A and B show the IBD probability and the expected overall junction density as a function of generation for the recombinant inbred lines (RIL) by selfing and the RIL+ by selfing. These plant breeding schemes have been adopted in many recently produced crop multiparent advanced generation intercross (MAGIC) populations (e.g., Huang et al. 2012;Mackay et al. 2014;Pascual et al. 2015). The RIL+ by selfing has one additional generation of intercrossing instead of selfing in the RIL scheme (Teuscher and Broman 2007). Thus, the RIL+ scheme has smaller IBD probability f mp aa ð11Þ ( Figure 5A), and has about 1 per M larger overall junction density r mp aa in a given later generation (t $ 5) ( Figure 5B). Figure 5C and E show that the IBD probabilities f mp aa ð11Þ for the alternating backcross and the father-daughter backcross are the same as those of the RIL by sibling for autosomes and they are smaller than those of the RIL by sibling for sex chromosomes. Welsh and McMillan (2012) measured the inbreeding process by the number of generations to reach complete fixation of genome as homozygous, and showed that the number of generation increases from the alternating backcross to the father-daughter backcross and to the RIL by sibling. The differences in the number of generation can be partially because the differences of the IBD probabilities for sex chromosomes, and partially because the complete fixation refers to one individual for the alternating backcross and two individuals for the father-daughter backcross. Figure 5D and F show that the expected overall junction densities r mp aa for the alternating backcross and the father-daughter backcross are lower than those of the RIL by sibling for autosomes and sex chromosomes, and that the junction densities for the alternating backcross are slightly smaller than those of the father-daughter backcross for autosomes. These results are consistent with those of Welsh and McMillan (2012): the number of chromosome segments in the final inbred lines increases from the alternating backcross to the father-daughter backcross and to the RIL by sibling.

FGL exchangeability
We apply the recursive algorithm NON-EXCH to study the prior FGL exchangeability for the multi-way funnel breeding schemes (Figure 4). We assume that all founder parents are fully inbred, and assign FGLs from A to H to the eight inbred parents in order from left to right. To examine the FGL exchangeability, we calculate the ancestral coefficient f mp aa ðijÞ and the expected ancestral junction density J mp aa ðiijjÞ; the latter being the only ancestral junction type for complete inbred individuals. Figure 6 shows the exchangeability in terms of f mp aa ðijÞ and J mp aa ðiijjÞ for the females in generation t ¼ 6; 11, and 22 for the RIL by sibling. The left panels show the results for autosomes. The FGL nonexchangeability confirms the pedigree inconsistency introduced by The right panels of Figure 6 show the exchangeability patterns for sex chromosomes of the RIL by sibling. The founder parents D, G, and H cannot pass their X chromosomes beyond F1, and thus their FGLs are impossible in generation t $ 2: Similar to autosomes, f mp aa ðijÞ ¼ 0 for ðjiÞ or ðijÞ ¼ ðABÞ and ðEFÞ. For f mp aa ðijÞði ¼ jÞ or f m a ðiÞ; the probability of FGL C is around twice as large as that of A, B, E or F in generation t $ 2; because the X chromosome carrying FGL C in generation 1 is inherited with probability 1, whereas each of the X chromosomes carrying FGLs A, B, E, and F in generation 1 is inherited with probability 1/2. Similarly, the values of J mp aa ðiijjÞ involving FGL Figure 3 The pedigree of Native Americans. It consists of 6 founders and 16 non-founders. Circles denote females, and rectangles for males.
n Table 2 The identity coefficients and the expected identity junction densities for pedigree member "M22" a A unique FGL is assigned to each haploid genome of each founder in Figure 3.
C increase with inbreeding generation, they are about twice as large as others in generation 22. We examine the exchangeability patterns for the other multi-way funnel breeding schemes, and focus on autosomes since the FGL exchangeability does not hold in general for sex chromosomes because of the asymmetry between X and Y chromosomes. The non-exchangeability of f mp aa ðijÞ disappears in a completely inbred individual for all 4-8-and 16-way breeding schemes, because of random chromosomal segregations over many inbreeding generations. However, the FGL exchangeability of J mp aa ðiijjÞ often does not hold even after for a completely inbred individual, for example, see Figure 7 for the breeding schemes after 100 generations of selfing inbreeding.

Ancestral inference
Either one of the two recursive algorithms can be used as a prior to incorporate pedigree information for ancestral inference in multiparental populations from genotypic data, which will be illustrated as follows by simulated and real CC populations.
Simulated CC: We simulate a CC population (Churchill et al. 2004) with 100 independent funnels, and take the female at the last generation. The SNP data for the eight founder mouse strains are from Collaborative Cross Consortium (2012), with marker density 5 SNPs per cM. The genotypic data of a sample female are obtained by simulating the FGLs forwardly and then combining them with the founder SNP data. Allelic errors are assumed to occur independently, and we simulate genotypic data of founders and sampled individuals with the same allelic error probability 0.005.
We analyze four simulated data sets: F11-AA, F11-XX, F22-AA, and F22-XX, where the first part denotes the generation, and the second part denotes the 19 pairs of autosomes (AA) or the sex chromosomes (XX). We perform two analyses for each data set by applying each of the two recursive algorithms as the prior for modeling ancestral origins along two homologous chromosomes within an individual. See Figure 6 for the prior distributions of f mp aa ðijÞ and J mp aa ðiijjÞ as a part of results obtained from the algorithm NON-EXCH. The true allelic error probability is used for the genotypic data of founders and sampled individuals. Table 4 shows the evaluation of the prior FGL exchangeability on ancestral inference in the simulated CC population. The two analyses of the data set F22-AA are almost the same because there is approximately no prior FGL exchangeability for the autosomes in generation t ¼ 22; Figure 4 Illustration of different 8-way funnel breeding schemes. The alternating backcross scheme alternates between mother-son and father-daughter matings in generation t $ t b 0 ; and the father-daughter backcross alternates between father-daughter and random sibling matings in generation t $ t b 0 : Circles denote females, and rectangles for males.
n Table 3 The ancestral coefficients and the expected ancestral junction densities for pedigree member "M22" and the improvement of NON-EXCH over EXCH for F11-AA is 3% because of the FGL non-exchangeability such as that of J mp aa ðiijjÞ in Figure 6. The improvements for sex chromosomes are larger than those for autosomes because of the more pronounced FGL nonexchangeability ( Figure 6). The improvement for the data set F22-XX is 6%, and it increases to 9% for F11-XX.  The identity coefficient f mp aa ð11Þ and the expected overall junction density f mp aa for the breeding schemes in Figure 4. (A-B) The results for selfing mating schemes 1-2 in Figure 4. (C-D) The results for the autosomes of mating schemes 3-5 with backcross starting from t b 0 ¼ 12 in Figure 4. (E-F) The results for the sex chromosomes.
Real CC: The real CC population consists of 120 lines that are sampled at generation t in the range of 8 # t # 14 (Durrant et al. 2011). For each line, we estimate the sampling generation by using the algorithm EXCH and taking the generation with the maximum likelihood, where the funnel code is not required. Then we estimate the funnel code by using the algorithm NON-EXCH and an iterative maximum likelihood, where a funnel code is proposed by slightly disturbing the current funnel code and it is accepted if the likelihood is increased. We compare the algorithms EXCH and NON-EXCH with GAIN (Liu et al. 2010), the latter being specifically developed for the CC.
To study the effect of marker densities on ancestral inference, we analyze only the first pair of homologous autosomes and thin the full dataset by taking every second SNP markers, and repeat to obtain nested sub datasets. The data fractions or the relative marker densities are given by r M ¼ 1; 2 21 ; :::; 2 27 : The absolute maximum marker density is 145 SNPs per cM, 28 times higher than that of simulated CC. From the full dataset r M ¼ 1; we calculate the marginal posterior probabilities by NON-EXCH, EXCH, and GAIN, and set the pseudo-true values to the most probable ancestral origins if they are the same among the three methods. Overall the pseudo-true ancestral origins for 98:4% observed genotypes are obtained. Figure 8 shows that the results on ancestral inferences are only slightly different among the three methods. The results of GAIN and NON-EXCH contain no pedigree inconsistencies, whereas NON-EXCH assigns ancestral origins to the impossible mating pairs of founder parents with probability around 0.005. The wrongly assigned probability for GAIN is larger than those of NON-EXCH and EXCH, and the difference increases with the decreasing marker density. The ranking of the wrong called probability from lowest to highest is NON-EXCH, EXCH, and GAIN, and the improvement of NON-EXCH over Figure 7 The FGL non-exchangeability patterns of the expected ancestral junction density J mp aa ðiijjÞ: The results are for autosomes of the multi-way RIL by 100 generations of selfing or sibling. The FGLs for founder parents are letters starting from A up to P from left to right.
n a The percentage decrease of the wrongly assigned (or called) probability for the analysis using the algorithm NON-EXCH, relative to the algorithm EXCH. b One minus the posterior probability of being true ancestral states. c One minus the fraction of called ancestral states being true ancestral states. At each SNP location within an individual, the ancestral state is called by its maximum posterior probability.
EXCH is around 4:2% and the improvement of NON-EXCH over GAIN is around 7:2% when relative density r M $ 1=8: The similar performances between algorithms NON-EXCH and EXCH indicate that the prior FGL exchangeability is a reasonable approximation for autosomes in the CC.

DISCUSSION
We have developed two novel recursive algorithms for modeling genomic ancestral origins along two homologous chromosomes in a pedigree with arbitrary but known structure. The algorithms apply to both autosomes and sex chromosomes when they exist, and allow selfing in the pedigree. The extensive simulations on a real example pedigree show that the numerical results from the two recursive algorithms are consistent with the simulated results, apart from the stochasticity of gene flow. The Markov property is assumed for the ancestral origin process along two chromosomes, and thus genetic interference is assumed to be absent. The assumption does not affect the expected junction densities calculated from the two recursive algorithms, although it does affect the distribution of inter-junction distances including the variance of junction densities.
One important application of the recursive algorithms is to design a breeding scheme for accelerating the inbreeding process to obtain immortal lines and for maximizing the resolution of mapping QTL. Welsh and McMillan (2012) studied inbreeding processes among different types of multiparental crosses by simulations, and it takes approximately 5.5 hr to complete 100,000 simulations of eight-way RILs. In contrast, it takes less than one second for our recursive algorithms. Rockman and Kruglyak (2008) compared different intercross breeding designs in multiparental RILs by simulations, aiming at increasing the density of junctions (recombination breakpoints) and thus fine-mapping resolution. Our previous recursive algorithms (Zheng et al. 2014;Zheng 2015) can be used for calculating the junction density in random mating schemes, and the two new algorithms extend the calculation for any breeding schemes with fixed pedigrees. The two recursive algorithms can also be used in random mating schemes by applying them to many pedigrees that are simulated according to specified mating schemes and averaging the results, which would still require less number of replicates and less computational time than simulation studies.
The second important application of the two recursive algorithms is to provide an appropriate way of incorporating pedigree information for analyzing genotypic data in bi-or multiparental populations. Specifically, the two recursive algorithm can be used to calculate the process parameter values of hidden Markov models for genotypic data, that is, the prior probability distribution of ancestral origins (FGLs) at an initial site and the prior transition probability matrix describing how ancestral origins change along two homologous chromosomes within an individual. See Figure 1 of Zheng (2015) for an example. The application to the ancestral inference in the CC shows that the new algorithms implemented in RABBIT performs only slightly better than GAIN (Figure 8). However, GAIN is specifically designed for the CC, and our previous algorithm applies only to breeding schemes with stage-wise random mating. The new recursive algorithms have pronounced advantages of generality and computational efficiency, and they apply to arbitrary breeding pedigrees that are far away from random mating. For example, one of the multiparental barley populations consists of backcrossing, half-diallel crossing, and selfing, where one inbred founder is much stronger represented in the offspring lines (Liller et al. 2017).
Furthermore, we have applied the two recursive algorithms to incorporate pedigree information for genotype imputation in multiparental populations (Zheng et al. 2018), and the application for genetic linkage map construction in multiparental populations is under development.

ACKNOWLEDGMENTS
The authors thank two anonymous reviewers for their valuable comments. This research was supported by the Stichting Technische Wetenschappen (STW) -Technology Foundation, which is part of the Figure 8 Evaluation of the recursive algorithms that are applied in the ancestral inference for the first pair of autosomes of the 120 real CC lines. Panels (A-C) show the results for the wrongly assigned probability, the wrongly called probability, and the pedigree inconsistency, respectively. The pedigree inconsistency is measured by the sum of the posterior probabilities over the four mating pairs of founder parents.
Nederlandse Organisatie voor Wetenschappelijk Onderzoek -Netherlands Organization for Scientific Research, and which is partly funded by the Ministry of Economic Affairs. The specific grant number was STW-Rijk Zwaan project 12425. CZ designed the study, created the model, developed the software and algorithm, and wrote the first draft of the manuscript. MPB and FAE provided critical feedback, helped shape the manuscript, and acquitted financial support. All authors read and approved the final manuscript.