The Evolutionary Dynamics of Genetic Incompatibilities Introduced by Duplicated Genes in Arabidopsis thaliana

Abstract Although gene duplications provide genetic backup and allow genomic changes under relaxed selection, they may potentially limit gene flow. When different copies of a duplicated gene are pseudofunctionalized in different genotypes, genetic incompatibilities can arise in their hybrid offspring. Although such cases have been reported after manual crosses, it remains unclear whether they occur in nature and how they affect natural populations. Here, we identified four duplicated-gene based incompatibilities including one previously not reported within an artificial Arabidopsis intercross population. Unexpectedly, however, for each of the genetic incompatibilities we also identified the incompatible alleles in natural populations based on the genomes of 1,135 Arabidopsis accessions published by the 1001 Genomes Project. Using the presence of incompatible allele combinations as phenotypes for GWAS, we mapped genomic regions that included additional gene copies which likely rescue the genetic incompatibility. Reconstructing the geographic origins and evolutionary trajectories of the individual alleles suggested that incompatible alleles frequently coexist, even in geographically closed regions, and that their effects can be overcome by additional gene copies collectively shaping the evolutionary dynamics of duplicated genes during population history.


Introduction
Genetic incompatibilities describe the decrease of fitness due to dysfunctional allele combinations in hybrid individuals (Maheshwari and Barbash 2011). The evolution of genetic incompatibilities has often been explained by the Bateson-Dobzhansky-Muller (BDM) model (Bateson 1909;Dobzhansky 1937;Muller 1942), where independent mutations in interacting genes get fixed in different populations, which cause deleterious epistasis and reduced fitness in their hybrids. Over the past decades, many studies have elucidated the genetic basis of such genetic incompatibilities including reciprocal pseudofunctionalization (i.e., loss of function) of duplicated genes (Fishman and Sweigart 2018;Vaid and Laitinen 2019). Gene duplications can provide genetic backup of essential genes and the basis for evolutionary novelties by allowing for new genetic and epigenetic variations (Conant and Wolfe 2008;Kondrashov 2012;Panchy et al. 2016). However, in some cases, pseudofunctionalization of duplicated essential genes may occur independently in both copies in different individuals. This in turn can lead to the loss of any functional gene copy in the hybrid offspring of such individuals, and thereby cause severe genetic incompatibilities (Lynch and Force 2000).
Genetic incompatibilities introduced by duplicated genes have been reported within inter/intraspecific hybrids of Arabidopsis thaliana (Bikard et al. 2009;Durand et al. 2012;Agorio et al. 2017), rice (Mizuta et al. 2010;Yamagata et al. 2010;Nguyen et al. 2017), and Mimulus . Identification of these incompatible alleles, however, often relied on genetic mapping in experimental populations, which is a time consuming and costly process. As incompatible alleles at duplicated genes are frequently introduced by loss-of-function (LoF) mutations (such as stopcodon gain, frameshift, gene deletion) and epimutations (Bikard et al. 2009;Blevins et al. 2017), initial examination of LoF (epi)mutations within whole-(epi)genome sequence data could be a shortcut to quickly target promising candidates.
Although several incompatible alleles from duplicated genes have been identified in A. thaliana, it is still unclear how these incompatible alleles originate and evolve in natural populations, and how the populations adapt to the reduction in fitness. Untangling the complex evolutionary process would require accurate (epi)genotypes of incompatible genes across sufficiently large natural populations. The Arabidopsis 1001 Genomes Project (Alonso-Blanco et al. 2016) and 1001 Epigenomes Project (Kawakatsu et al. 2016) have released substantial omics data, which can be used to unravel the evolutionary trajectory of such incompatible alleles.
Here, we created an extended version of the Arabidopsis multiparent RIL population (Huang et al. 2011) to identify genetic incompatibilities between several different genotypes simultaneously. Based on distorted segregation of duplicated genes, we mapped four genetic incompatibilities. Unexpectedly, however, we identified several, healthy RILs which carried presumably incompatible allele combinations. Further analysis of their genomes revealed additional gene copies rescuing these severe incompatibilities. Encouraged by this, we searched for incompatible allele combinations within 1,135 accessions of the 1001 Genomes Project (Alonso-Blanco et al. 2016), where these combinations were surprisingly common. Using the incompatible allele combinations as phenotypes, we mapped modifiers of all four incompatibilities using GWA. The LoF alleles from duplicated genes were geographically widely distributed, and coexisted with additional gene copies in the same regions showing how additional gene copies can overcome differential copy loss in a population.

Identification of Incompatible Gene Pairs within an Intercross Population
We used the Arabidopsis Multiparent RIL (AMPRIL) population to find incompatible alleles that arose from duplicated genes. The eight AMPRIL founder accessions (An-1, C24, Col-0, Cvi-0, Eri-1, Kyo, Ler, and Sha) were selected across the entire geographic distribution of A. thaliana including the Northern hemisphere and the Cape Verde Islands. Recently, we generated chromosome-level genome assemblies of all seven, nonreference founder genomes (Jiao and Schneeberger 2020). The first release of the AMPRIL population (AMPRIL I) contained six subpopulations (referred to as ABBA, ACCA, ADDA, BCCB, BDDB, CDDC) derived from reciprocal diallel crosses between four hybrids (A: Col-0 Â Kyo, B: Cvi-0 Â Sha, C: Eri-1 Â An-1, D: Ler Â C24) and the subsequent selfing to the F4 generation by single-seed descent (Huang et al. 2011). Here, we present the extension of the AMPRIL population with six new subpopulations referred to as EFFE, EGGE, EHHE, FGGF, FHHF, and GHHG (E: Col-0 Â Cvi-0, F: Sha Â Kyo, G: Ler Â An-1, H: Eri-1 Â C24) based on different diallel intercrossing scheme and selfing of the recombinant genomes until the F6 generation ( fig. 1a). Each subpopulation consists of approximately 90 individuals representing recombinants of four founders. In total, 992 RILs from all 12 subpopulations were sequenced and analyzed using RAD-seq (Baird et al. 2008) (supplementary data 1, Supplementary Material online) and genotyped with $2 million high-quality SNP markers. We used a Hidden Markov Model to reconstruct the parental haplotypes (identity-bydescent) including residual heterozygous regions (Rowan et al. 2015) (supplementary fig. 1 and note 1, Supplementary Material online). The genotyping resulted in 12,878 different recombination breakpoints (on average one breakpoint per 9.3 kb) across the entire population. This allowed us to divide the genome of each progeny into 12,883 haplotype blocks, where each block relates to the haplotype(s) of only one (homozygous regions) or two (heterozygous regions) of the founder haplotypes.
We developed a two-step workflow to combine genetic and genomic evidence to quickly identify incompatible alleles of duplicated genes ( fig. 1b). In the first step, we selected 781 distal (interchromosome) duplicated gene pairs including 612 gene pairs in which the reference sequence contains two copies and other founder genomes feature at least one copy (supplementary data 2, Supplementary Material online). In the remaining 169 gene pairs, the reference sequence only has one copy and at least one other parental genome has an additional copy in a different chromosome. As genetic incompatibility leads to the underrepresentation of incompatible allele combinations (Ackermann and Beyer 2012;Corbett-Detig et al. 2013), we searched for significant distortions from the expected frequencies of all parental allele combinations across all 781 duplicated gene pairs in all 12 subpopulations, two merged subpopulations (ABBA and EFFE, CDDC and GHHG as they share the same founders), and the whole population (see Materials and Methods). These tests revealed significant distortions in 236 gene pairs (v 2 test, P value <0.05, multiple testing corrected) in at least one of the populations.
However, the observed distortions do not necessarily result from genetic incompatibilities in the tested gene. Alternatively, such distortions can also occur if the tested gene duplicate is closely linked to a genetic incompatibility. Hence, in a second step, we examined the alleles of the gene pairs in the founder genomes for LoF variations or hypermethylated promoters ( fig. 1b, see Materials and Methods). This examination revealed three gene pairs with functional disruption in both of the duplicates in at least one of the founder genomes. These three duplicated genes included two, HISTIDINOL PHOSPHATE AMINOTRANSFERASE (HPA) (Bikard et al. 2009) and FOLATE TRANSPORTER (FOLT) (Durand et al. 2012), which were already known for their ability to introduce genetic incompatibilities, as well as one gene pair, which so-far was not reported as the genetic basis Jiao et al. . doi:10.1093/molbev/msaa306 MBE for a genetic incompatibility, TIM22 (TIM22-1: AT1G18320, TIM22-2: AT3G10110) ( fig. 1c). For all other 233 gene pairs, we could not identify nonfunctional alleles in both copies.
We noted that another duplicated gene, tRNA ADENOSINE DEAMINASE 3 (TAD3), which is also know to introduce a genetic incompatibility (Agorio et al. 2017), was not considered by our initial testing even though the genotypes of the AMPRIL founders should lead to incompatible allele combinations in the RIL populations: all founder genomes except for Kyo have a functional TAD3-1 (supplementary table 1, Supplementary Material online), whereas the Kyo TAD3-1 gene is silenced most likely due to its methylated promoter (similar to the Nok-1 and Est-1 accessions in which the incompatibility was described originally; Agorio et al. 2017). The lack of a functional TAD3-1 in Kyo is counterbalanced by additional copies (TAD3-2) (which however were not part of the main chromosome scaffolds of the Kyo genome assembly). Therefore, the TAD3 gene duplicate was not considered initially, however, we also used this incompatibility for further analysis.

Genetic Incompatibility Introduced by Diverged Copies of TIM22
Before we started our analysis of the four incompatibilities, we verified that the LoF alleles of TIM22 are in fact the causal basis of the genetic incompatibility that we observed in the AMPRIL population. TIM22 encodes for a mitochondrial import inner membrane translocase subunit of the TIM17/ TIM22/TIM23 family protein (Murcha et al. 2007). All eight founders feature two annotated TIM22 copies, whereas Cvi-0 includes an extra truncated copy $31 kb downstream to TIM22-2 ( fig. 2a and b). We found significant segregation distortions in the complete AMPRIL population and one subpopulation, EGGE, where the double homozygous allele combination TIM22-1 Col-0 TIM22-2 Cvi-0 was significantly underrepresented (supplementary tables 2 and 3, Supplementary Material online).
We observed an in-frame premature stop-codon in Col-0 (mis-annotated in the reference annotation) suggesting that TIM22-1 is not functional in Col-0 ( fig. 2a j {a,b,c,d,e,f,g,h} FIG. 1. Identification of genetic incompatibilities introduced by duplicated genes in an intercross population. (a) Construction of the extended AMPRIL population. Eight different Arabidopsis thaliana accessions were used as founder lines. For each of the six subpopulations, two F1* hybrids, which were generated by crossing two founder lines, were again crossed to give rise to the F1 individuals of each population. The F1 individuals were further self-crossed to the F6 generation. (b) Workflow for the identification of potentially incompatible alleles in duplicated genes. Unlinked (i.e., on separate chromosomes) duplicated genes were selected and the expected and observed frequencies of all haplotype combinations between the two copies were calculated for each subpopulation and the whole population. Underrepresented allele combinations were identified using v 2 test. Each gene duplication with significantly underrepresented allele combinations was evaluated for nonfunctionalized or deleted gene copies in the respective parental genomes. (c) The location of incompatible alleles in four duplicated gene pairs identified in the AMPRIL population including one so-far unknown incompatibility (red) and one detected a posteriori in an informed way (TAD3).
Evolutionary Dynamics of Duplicated Gene-based Genetic Incompatibilities . doi:10.1093/molbev/msaa306 truly nonfunctional in Col-0, we used the segregation of two T-DNA insertion mutants in the two TIM22 paralogs in Col-0. This showed that tim22-1 could be homozygous for the T-DNA insertion allele but the T-DNA in tim22-2 could not be found in homozygous state ( fig. 2c). This suggests that, in Col-0, TIM22-1 is not functional and TIM22-2 is the only functional copy.
The TIM22 paralogs colocated within the regions of a previously reported genetic incompatibility (hereafter named as LD2: LD2.1 for the locus at chromosome 1 and LD2.3 for the locus at chromosome 3) which was mapped in a Cvi-0 Â Col-0 RIL population (Simon et al. 2008). The genetic underpinnings of this incompatibility however were still unknown. This incompatibility was expressed by a striking underrepresentation of homozygous LD2.1 Col-0 combined with homozygous LD2.3 Cvi-0 , which was in agreement with the reduced allele combinations in the AMPRIL population (supplementary tables 2 and 3, Supplementary Material online). Therefore, we generated heterogeneous inbred family (HIF) lines from Cvi-0 Â Col-0 RILs to fine-map LD2.1 and LD2.3 to respectively 70-and 34-kb intervals (supplementary fig. 2, Supplementary Material online). The two candidates TIM22-1 and TIM22-2 remained within the intervals.
To validate their causative role, we conducted a complementation cross between a heterozygous T-DNA mutant in TIM22-2 in a Col-0 background (i.e., TIM22-1 was nonfunctional) and the original HIF line in which TIM22-1 was homozygous for the Col-0 genotype (i.e., also nonfunctional) and TIM22-2 was heterozygous for Col-0/Cvi-0 ( fig. 2d). Within 123 hybrids of the offspring, among the four possible allelic combinations at LD2.3, we did not find any hybrids combining a Cvi-0 and a T-DNA alleles at TIM22-2 ( fig. 2d), providing strong genetic evidence that the Cvi-0 allele cannot complement a knockout (T-DNA) allele at TIM22-2 (in a background without other functional TIM22 allele) and is thus nonfunctional.
Collectively, these segregation and complementation crosses show that the combination of different nonfunctional alleles of the TIM22 copies leads to a drastic reduction of allelic combinations in offspring populations, and thus evidence the causative role of TIM22 in this genetic incompatibility.

Natural Modifiers Can Rescue Incompatible Allele Combinations
Incompatible allele combinations can result in severe phenotypic defects, which lead to the reduction or the full absence Cvi-0 tim22-2 (het) 8HV408 (RIL) of specific allele combinations. For example, the double homozygous nonfunctional allele combination of HPA1/HPA2 results in embryo lethality (Bikard et al. 2009) and thereby wipes out all carriers of the incompatible allele combination.
HPA encodes a histidinol-phosphate amino-transferase for the biosynthesis of histidine, an essential amino acid (Muralla et al. 2007). All eight AMPRIL founders except of Cvi-0 have a functional HPA1 and a nonfunctional HPA2 due to a premature stop codon or a hypermethylated promoter, whereas Cvi-0 carries a functional HPA2 allele, but does not carry HPA1 at all (supplementary table 5, Supplementary Material online). Unexpectedly, however, we did observe homozygous HPA1/HPA2 incompatible allele combinations (HPA2 À/À HPA1 À/À ) in 11 of the AMPRIL lines within the ABBA and EFFE subpopulations which were derived from Col-0, Cvi-0, Kyo, and Sha (supplementary tables 6-8, Supplementary Material online). Further analysis of these populations revealed an extremely high frequency of the Kyo allele on chromosome 4 (supplementary fig. 3, Supplementary Material online), making us recognize that the 11 AMPRIL lines with the incompatible allele combinations all carried at least one Kyo allele at chr4:9.2-13.7 Mb ( fig. 3a). This suggested that Kyo might contain a modifying allele of the incompatibility complementing the lethal allele combination in this region (a similar feature was described as "conditional incompatibility" by Bikard et al. [2009] when analyzing crosses between Jea and Col-0). Indeed, when we checked the long-read assembly of the Kyo genome, we identified an additional HPA copy (hereafter named as HPA3) which colocated with the mapping interval at chr4:11.11 Mb ( fig. 3b), whereas we could not find this allele in any of the other AMPRIL founder genomes.

Mapping Modifiers of Incompatible Allele Combinations in Natural Populations
This analysis showed that the negative effects of incompatible allele combinations can be overcome if they are rescued by modifying alleles. Therefore, the virtual absence of functional alleles (among the known loci) of a duplicated gene could act as a molecular phenotype to map the location of additional (rescuing) alleles by genome-wide association (GWA) mapping even in natural populations.
To do this, we searched for incompatible allele combinations (separately for all four incompatibilities) in each of the 1,135 accessions of the 1001 Genomes Project (Alonso-Blanco et al. 2016) and used these allele combinations as phenotype for a GWA ( fig. 4). To define nonfunctional alleles, we used the resequencing data to search for LoF variations, and the methylome data from the 1001 Epigenomes Project (Kawakatsu et al. 2016) to identify methylated (silenced) promoters ( fig. 4a, see Materials and Methods). Additionally, RNA-seq data were explored to distinguish pseudoheterozygous variants (which exist due to inaccurate short-read alignment at duplicated genes) and to check for gene silencing.
For the first incompatibility, in HPA, we found pseudoheterozygous LoF variations including two premature stopcodons and three frameshifts at both reference copies of HPA ( fig. 4c and supplementary table 9, Supplementary Material online) due to the repetitiveness of the gene sequences. As some accessions did not show expression of HPA2 most likely due to hypermethylation of their promoters, we tested which of the HPA alleles was present in the RNA-seq read data to assign the LoF to either of the HPA copies ( fig. 4b). With this, we could assign one stop-codon gain (LoF-300) and two frameshifts (LoF-170, LoF-230) to HPA2  By combining the LoF variant genotyping and the methylation analyses, we found 767 accessions with a nonfunctional HPA2 (HPA2 À/À HPA1 þ/þ ) allele and five accessions Un-met. Met. Un-met. Ref.
Alt. Schematic example to illustrate how genome-wide methylation and RNA-seq data are used to assign pseudoheterozygous variants to a specific gene copy despite the repetitive nature of the short-read alignments within duplicated genes. If a specific variation is present in DNA data, but absent in RNA data and one of the gene copies is methylated (i.e., likely expression silenced) the pseudoheterozygous variation is assigned to this (expression silenced) gene copy. Light green, untranslated region; green, coding region; gray, intron or gene up/down-stream. (c) Pseudoheterozygous LoF variants found in the short read alignments (of 1,135 Arabidopsis thaliana genomes; Alonso-Blanco et al. 2016) at HPA2 and HPA1. LoF-170, LoF-230, and LoF-300 could be assigned to HPA2 (using the procedure of (b)), whereas LoF-190 and LoF290 could not be assigned to either of the gene copies. In addition, 49 accessions did not feature any functional alleles (HPA2 À/À HPA1 À/À ) and were expected to carry additional modifier(s) to complement the loss of functional copies. To find the locations of these modifiers, we used the absence of functional copies as phenotype (supplementary data 3, Supplementary Material online) to run a GWA under the mixed linear model using the SNP markers from the 1001 Genomes Project (www.1001genomes.org). This GWAS revealed a significantly associated region at chr4:11.00-11.15 Mb ( fig. 4g and supplementary fig. 5, Supplementary Material online, alpha level of 0.05, Bonferroni correction), corresponding to the HPA3 locus found in Kyo (chr4:11.11 Mb). Though other peaks in unlinked regions were present, these additional loci explained only a small proportion of the heritability.
Analyzing the haplotypes at the 39 significantly associated SNP markers at HPA3 locus revealed a somewhat, but not entirely homogenous haplotype in many of the HPA2 À/À HPA1 À/À carriers ( fig. 4h). To confirm that the modifying haplotype is still identical to the Kyo allele, we aligned the short reads of all accessions of the 1001 Genomes Project against the Kyo reference sequence and found that overall 162 accessions carried the HPA3 allele (supplementary data 3, Supplementary Material online). However, unexpectedly, the Kyo HPA3 was only found in 13 of the 49 accessions without functional HPA2 and HPA1 alleles, suggesting the presence of two different modifiers at this locus on chromosome 4. To find more support for this, we ran a new GWA without the 162 HPA3 (Kyo-like allele) carriers, which still led to a significantly associated locus at the region of HPA3 (supplementary fig. 6, Supplementary Material online). Further short-read mappings of all 1,135 genomes against the Cvi-0 reference sequence (where only one copy of HPA exists) revealed the presence of (at least) two additional copies (in addition to HPA1 and HPA2) in a total of 42 accessions including all the 36 accessions with the unknown rescuing alleles ( fig. 4i and supplementary data 3, Supplementary Material online). Together this suggested that HPA3 also rescues incompatible HPA1 and HPA2 allele combinations in natural populations and that the locus of HPA3 contains an additional haplotype (hereafter named HPA4) which also rescues the incompatibility between HPA1 and HPA2.
We continued to apply the same approach to the other three incompatibilities. For TIM22, 16 accessions of the 1001 Genome Project revealed nonfunctional allele combinations (supplementary table 11 and data 4, Supplementary Material online), again indicating the existence of modifying alleles. However, a GWAS using the presumably-incompatible allele combinations as molecular phenotype did not reveal only one, but numerous significantly associated loci (supplementary fig. 7, Supplementary Material online). This might be explained by the low number of incompatible allele carriers, which could affect the power of association mapping leading to false-positive associations. However, even though we could not locate the modifying allele, further analysis of read mapping coverage in TIM22 revealed that all 16 accessions with nonfunctional TIM22-1/TIM22-2 allele combinations carried at least one additional third copy of TIM22, whereas among all 1,119 other genomes of the 1001 Genomes Project only three genomes carried additional copies. This suggests that, like for HPA, known nonfunctional allele combinations of TIM22 are in fact rescued by additional copies. Moreover, in ten of the 16 accessions, TIM22-2 showed not only one but multiple additional copies (hereafter named as TIM22-3), which was further supported by the genome assembly of Ty-1 (https://genomevolution.org/CoGe/GenomeInfo.pl? gid¼54584, unpublished), where we could find a cluster of four tandemly arranged TIM22 gene copies at the TIM22-2 locus.
Because the reference sequence only contained one copy of TAD3 (TAD3-1) (Agorio et al. 2017) and FOLT (FOLT1) (Durand et al. 2012), we modified our GWA method and only used nonfunctional alleles at the reference gene as the phenotype to map modifiers for the two remaining incompatibilities ( fig. 5 and supplementary fig. 8, Supplementary Material online). Due to this modification we would expect to map also the location of the duplicated genes as we had found them in the AMPRIL founders. For TAD3, an essential ortholog of the yeast tRNA Adenosine deaminase 3 (Gerber and Keller 1999), we did not find any accessions with LoF alleles, but we found 150 accessions with a hypermethylated promoter in TAD3-1 similar to the methylated promoters found in Nok-1 and Est-1, which are known to be nonfunctional due to this methylated promotor region (Agorio et al. 2017) ( fig. 5a and b). The GWA result revealed one significant peak ( fig. 5c and d), which as expected colocated with the TAD3-2 locus (Agorio et al. 2017). All the 150 accessions carried multiple additional copies of TAD3-2 (supplementary data 5, Supplementary Material online), similar to Nok-1 and Est-1. This suggests that the expression silencing of TAD3-1 is common in natural populations and that the rescue of this loss-of-function allele is generally mediated by additional gene copies at the TAD3-2 locus as it was shown in the original description of this incompatibility (Agorio et al. 2017).
The last of the four genetic incompatibilities, introduced by FOLT encoding for a folate transporter, was previously discovered in hybrids from crosses between Col-0 Â C24/ Sha (Törj ek et al. 2006;Simon et al. 2008;Durand et al. 2012). Col-0 has only one copy of FOLT (FOLT1) at chromosome 5, whereas the C24 and Sha have an additional copy, FOLT2, at chromosome 4 including some extra truncated copies near FOLT2 (supplementary table 12, Supplementary Material online). In the earlier study, the truncated copies were shown to express siRNAs and activate the RNAdirected DNA methylation pathway to silence FOLT1 in C24 and Sha (Durand et al. 2012), which resulted in a lethal allele combination in F2 hybrids between Col-0 (FOLT1 þ/þ ) and C24/Sha (FOLT1 À/À FOLT2 þ/þ ) ( fig. 5f). Analyzing the accessions of the 1001 Genome Project for functional and nonfunctional alleles of FOLT, we found 75 accessions with methylated promoters of FOLT1 likely leading to expression silencing ( fig. 5e), which was supported by the lack of FOLT1specific pseudoheterozygous SNPs in the RNA-seq data Evolutionary Dynamics of Duplicated Gene-based Genetic Incompatibilities . doi:10.1093/molbev/msaa306 (supplementary fig. 9, Supplementary Material online). Besides providing evidence for the expression silencing of FOLT1, this also suggested the existence of additional FOLT gene copies in these 75 accessions. When we repeated our GWA approach to find these interacting loci of FOLT1, we found multiple significantly associated loci including one region corresponding to FOLT2 ( fig. 5g and h).
Further analyses of the FOLT gene copies using short read mapping against the C24 and Sha reference sequences revealed the presence of FOLT2 in all the 75 accessions with methylated promoter of FOLT1 and truncated copies of FOLT2 genes in only 46 out of such 75 accessions ( fig. 5e and supplementary table 13 and data 6, Supplementary Material online) suggesting that the methylation of the FOLT1 promoter remains stable even after the truncated copies are segregated out. This is consistent to what was shown in successive generations of Sha Â Col-0 RILs where the inducing locus was segregated away six generations ago (Durand et al. 2012).
Taken together, we found evidence that all four incompatible allele combinations, initially identified within an artificial intercross population, also occur in nature, and that the incompatible allele combinations of three of them were surprisingly common. For all incompatible allele combination carriers, we found evidence for the existence of additional copies, and could even map the locations of some of those using GWA.

Geographic Distribution of Incompatible Allele Combinations
We next asked how prevalent the potential for incompatible allele combinations was within natural populations of A. thaliana by analyzing the presence of different haplotypes in different geographic regions. For this, we first analyzed the allele frequencies of different haplotypes (i.e., the different combinations of functional alleles within individual plants), which varied substantially across the accessions of the 1001 Genomes Project (fig. 6).
For example, among the accessions that carried only one functional copy of HPA, we found 661, 4, 13, and 36 accessions with a single functional copy of HPA1, HPA2, HPA3, or HPA4, respectively ( fig. 6a and b and supplementary fig. 10, Supplementary Material online). Similarly, most accessions only featured one functional copy of TAD3 including 971 accessions with only TAD3-1 and 150 with only TAD3-2 ( fig. 6a and supplementary table 14 and supplementary fig.  11, Supplementary Material online). In contrast, most of the Un-met. Un-met.

TAD3-2
(c)   Genetic incompatibilities become effective in the offspring of accessions with incompatible alleles. In particular, the offspring of accessions with only one functional gene copy will lead to the highest reduction in fitness. We found numerous accessions with the potential to create incompatible allele combinations in their offspring in geographically close regions ( fig. 6c-e). For example, among 30 accessions in the South of Italy we found nine accessions with only a functional HPA1 closely located with 14 accessions with either a functional HPA3 or a functional HPA4 only ( fig. 6c). Likewise, accessions with contrasting functional copies of FOLT and TAD3 were collected from the same regions in Central Asia and Germany ( fig. 6d and e) evidencing that incompatible alleles could segregate within the same populations.
Interestingly, within some of these contact zones of accessions with incompatible alleles, we also found accessions with  HPA1+  HPA2+  HPA3+  HPA4+  HPA1+HPA2+  HPA1+HPA3+  HPA2+HPA3+  HPA1+HPA4+  HPA1+HPA2+HPA3+  FOLT1+  FOLT2+  FOLT1+FOLT2+  TAD3-1+  TAD3-2+  TAD3-1+TAD3-2+  TIM22-2+  TIM22-1+  TIM22-3+  TIM22-1+TIM22-2+ 36 HPA FOLT TAD3 TIM22    Evolutionary Dynamics of Duplicated Gene-based Genetic Incompatibilities . doi:10.1093/molbev/msaa306 multiple functional gene copies. For example, in South Italy there were seven of the 30 accessions that featured multiple functional copies of HPA, and similarly in Central Asia, we could find multiple accessions with two functional copies of FOLT in addition to the accessions with only one functional copy ( fig. 6d and supplementary figs. 10-12, Supplementary Material online). In contrast, however, we only observed a few accessions with multiple functional alleles in the contact zone of incompatible TAD3 alleles, most likely because only 14 such accessions were found in the entire set of accessions. Even though we could not find evidence that such contact zones are enriched for additional copies or that those additional copies would evolve in these regions, the presence of haplotypes with additional gene copies in these contact zones have the potential to mediate the gene flow between the haplotypes with the incompatible allele combinations.

Origin and Evolution of Incompatible Alleles
To figure out when these incompatible alleles originated and how they evolved, we investigated their ancestral genotypes within African genotypes which likely represent the ancestral populations of the Eurasian accessions (Durvasula et al. 2017).
Although almost all (99.7%) of the Eurasian accessions had both HPA2 and HPA1 (either functional or not), we found that only 27% (20 out of 75) of the African accessions carried only HPA2 ( fig. 6f and g; supplementary data 3, Supplementary Material online). This suggested that HPA2 was the ancestral copy of HPA which was further supported by the synteny alignment with the close relative Arabidopsis lyrata (Blevins et al. 2017), which only featured a single HPA gene in a region syntenic to HPA2 and that the duplication events leading to HPA1 happened early on in Africa ( fig. 6h). One accession, Khe32, from Morocco carried HPA1 along with the most frequent LoF variant LoF-300 in HPA2 in the Eurasian accessions, suggesting that the first accession with only a functional HPA1 possibly arose in North-West Africa and thereby suggests that the genetic incompatibility with HPA2 carriers was already possible before the Eurasia colonization of A. thaliana.
In contrast, HPA3 and HPA4 only occur in Eurasian accessions, most likely indicating that the additional duplication events happened later ( fig. 6h). Comparing the gene sequences of the HPA genes revealed that HPA3 was duplicated from HPA2 (supplementary fig. 13, Supplementary Material online), whereas HPA4 might be a tandem duplicate of HPA3 as it is likely located close to HPA3. Interestingly, HPA3 and HPA4 carriers segregated for the ancestral LoF-300 variant at HPA2 (HPA3: 68 with and 94 without; HPA4: 10 with and 32 without, supplementary fig. 14, Supplementary Material online), suggesting free segregation of different HPA alleles. In striking contrast to this, the inactive alleles of HPA1 (i.e., alleles with a nonfunctional HPA1 sequence, but excluding the accessions with full deletion alleles as found in Cvi-0) were almost perfectly coupled (48 of 50) with inactive alleles of HPA2. Such carriers nearly all had HPA4 (35) or HPA3 (13) (supplementary fig. 15, Supplementary Material online) and were mainly located in the Iberian Peninsula and Southern Italy suggesting that the additional HPA copies were necessary to buffer the incompatibility and to allow the foundation of these populations (fig. 6c).
Unlike the incompatibility in HPA, genetic incompatibilities can also arise in recent population history. All African accessions only had the TAD3-1 copy, whereas accessions with multiple copies of TAD3 can only be observed in Eurasia (supplementary fig. 10, Supplementary Material online). Similarly, 38 of the African accessions only have FOLT1, whereas the other 37 accessions have both FOLT1 and FOLT2, but none of the accessions featured a truncated copy of FOLT2 (FOLT2tr), which is the mechanistic origin of the incompatible alleles at FOLT1 and FOLT2 (supplementary fig. 16 and data 6, Supplementary Material online). In contrast, within the Eurasian accession, we even found three different haplotypes of FOLT2tr within a total of 46 accessions based on short read alignments against the eight A. thaliana genomes (Jiao and Schneeberger 2020) (supplementary fig. 17 and data 6, Supplementary Material online). This together suggests that also the genetic incompatibility that is based on FOLT2tr has evolved recently after the migration A. thaliana to Eurasia ( fig. 6i).

Discussion
Although gene duplication provides genetic backup of essential genes, duplicated genes can also lead to incompatible allele combinations when the duplicated genes undergo reciprocal pseudofunctionalization in separate genomes (Lynch and Force 2000). Here, we first identified incompatible allele combinations of four duplicated gene pairs by integrating genetic and genomic information of a multiparental intercross population. Unexpectedly, when we searched the genomes of 1,135 accessions released by the 1001 Genomes Project (Alonso-Blanco et al. 2016), we identified many natural accessions with putatively incompatible allele combinations of these four genes and could elucidate the genetics of their incompatibilities and modifiers as they occur in natural populations using GWA. As duplicated genes are common in plant genomes (Lynch and Conery 2000;Maere et al. 2005;Panchy et al. 2016), duplicated gene based incompatibilities might be more common than typically anticipated, suggesting that our GWA approach could be further applied to search for more candidates in other genes.
Using multiomics data from hundreds of A. thaliana accessions, we found that incompatible alleles are surprisingly frequent in nature and also occur in sympatry, implying that the evolution of genetic incompatibilities does not require separated populations as proposed by BDM model (Bateson 1909;Dobzhansky 1937;Muller 1942). Moreover, incompatible allele combinations can persist over long periods as some of the alleles studied here may have originated before A. thaliana colonized Eurasia. This might be due to the inefficient elimination of nonfunctional alleles in a predominately selfing species (Bustamante et al. 2002). Arabidopsis thaliana wild populations have an outcrossing rate of around 1%, which however, can reach up to 15% in some outcrossing hotspots (Bakker et al. 2006;Bomblies et al. 2010), suggesting that the incompatible alleles might be recombined in the same Jiao et al. . doi:10.1093/molbev/msaa306 MBE individuals at some regions. Some recombined alleles (such as double homozygous recessive alleles) will be removed when no mutation of extrafunctional copy appears, which in turn can reduce the frequency of incompatible allele combinations.
As incompatibilities introduced by duplicated genes are likely caused by neutral mutations and genetic drift (Lynch and Force 2000), such neutral incompatibility alleles of duplicated genes are supposed to be purged even under low level of gene flow (Bank et al. 2012). Alternatively, when duplicated alleles are linked with other regions under constrains of selection, they could be also maintained during evolution. Taken together, the frequency and distribution of incompatible alleles at duplicated genes are dynamically affected by multiple factors including mating system, gene flow, and perhaps selection.
Although it remains unclear if incompatibilities in sympatry are common in other species as well, they have also been identified in Minulus species . Using genotyping data from globally distributed individuals of other species such as rice where major resource have been generated (Huang et al. 2012;Wang et al. 2018) have the potential to reveal the frequency and distribution of incompatible duplicated alleles in these species in future (Mizuta et al. 2010;Yamagata et al. 2010;Nguyen et al. 2017).
Here, new gene copies arose either from distal duplications (in HPA and FOLT) or tandem duplications (in all four cases) and counteract incompatibilities. Even though additional copies reduce the frequency and impact of genetic incompatibilities, they could also increase the potential for more incompatible allele combinations. Subsequent subfunctionalization could be a way out of the trap of genetic incompatibilities, as there would be a selective pressure to keep both gene copies. Although we have assumed functional redundancy of all gene copies in this work, an extensive number of accessions shared the functional copies of both FOLT genes (865 of 1,135), which could indicate that these copies are not fully redundant and thereby limit the establishment of incompatible allele combinations. Thus, the incompatible alleles in sympatry may slow down neofunctionalization or pseudogenization of apparently redundant copies in multiple-copy carriers, and may partially explain why more duplicated genes are preserved than one would expect from theory (Lynch and Conery 2003;Maere et al. 2005).
Taken together, our work demonstrates that the potential for genetic incompatibilities due to duplicated essential genes is surprisingly high in nature. However, the effects of such incompatibilities are counteracted by additional gene copies, which undergo dynamic changes shaped by the recurrent events of gene duplication and nonfunctionalization during population history.

AMPRIL Construction
The eight A. thaliana accessions An-1, C24, Col-0, Cvi-0, Eri-1, Kyo, Ler, and Sha were selected as the AMPRIL founders. We previously constructed a first version of AMPRIL population (AMPRIL I) including six RIL subpopulations (Huang et al. 2011). Here, AMPRIL I was extended with six additional subpopulations (called AMPRIL II) based on different pairwise intercrosses of the eight founders ( fig. 1a). Each subpopulation contains approximately 90 individuals (supplementary data 1, Supplementary Material online). All plants were grown under the normal growth conditions in greenhouse at the MPI-PZ (Huang et al. 2011). DNA of 1,100 samples was extracted from the flower buds and prepared for RAD-seq sequencing (Baird et al. 2008).

RAD-Seq Library Preparation and Sequencing
All plants were grown in the greenhouse. The total DNA from each of the 1,100 samples was extracted from the flower buds using DNeasy Plant Kit 96 (Qiagen) and eluted in 200 ml Elution buffer (EB). DNA of each genotype was isolated twice. Both genotype samples were pooled in 1.5 ml tube and the DNA was concentrated by isopropanol precipitation for 2 h at À20 C. The samples were centrifuged at 12,000 Â g for 5 min at 4 C, the supernatant was removed, and the DNA pellet was washed with ice-cold 70% ethanol. Centrifugation was repeated, the supernatant was removed. Air-dried DNA was resuspended in a nuclease free water to 26 ng/ml and stored at À20 C until use. RAD-seq sequencing libraries were prepared as described (Etter et al. 2011) with modifications. Per genotype, 500 ng DNA were digested with 10 units of CviQI (NEB, cutting site G'TAC) at 25 C for 2 h. The number of expected cutting sites was estimated around 236,000 based on the Col-0 genome sequence. Cut DNA was purified using 96 DNA clean and concentrator kit (Zymo) a diluted in 25 ml EB. The 192 different (selected out of total 210 designed, supplementary data 1, Supplementary Material online) P1 adapters (200 nM) containing unique 12-bp barcodes were ligated by incubation with T4 ligase (NEB) at room temperature for 30 min and the reaction was terminated by 20 min at 65 C. After 30 min at room temperature, 5 ml from each 192 P1-barcoded sample were combined in a 2 ml low bind tube. About 3 Â 130 ml aliquots were transferred to fresh tubes and DNA was fragmented to average size of 500 bp using Covaris. Sheared DNA was purified using QiaMinElute columns (Qiagen), eluted in 10 ml EB, and the three samples were first pooled and then divided into two 15 ml samples that were run on 1% agarose gel. Regions of 300-to 500-bp fragments were dissected and DNA was isolated using MinElute gel extraction kit (Quiagen), eluted in 10 ml EB, the samples were pooled, DNA fragment ends were repaired using Quick Blunting kit (NEB), purified with QIAquick column (Qiagen), and eluted in 43 ml EB. 3 0 -deoxy-adenine overhangs were added using Klenow Fragment (NEB), the sample was purified with QIAquick column, eluted in 45 ml EB, the P2 adapter was ligated, the sample was purified with QIAquick column and eluted in 53 ml EB. To determine the library quality, 10 ml RAD library were PCR amplified (1: 98 C 30 s; 2: 14Â 98 C 10 s, 67 C 30 s, 68 C 30 s; 3: 68 C 5 min, 4: 4 C hold) using NEB Next High-Fidelity master mix (NEB) in 25 ml reaction volume using RAD-Marker for/RAD-Marker rev primers (25 nM each) and 5 ml PCR product were loaded into 1% Evolutionary Dynamics of Duplicated Gene-based Genetic Incompatibilities . doi:10.1093/molbev/msaa306 agarose gel next to the 1 ml RAD library template. If the PCR product smear was at least twice as intense as the template smear, the library was considered as of high quality and the amplification was repeated in 50 ml reaction volume. The product was cleaned using AMPure magnetic beads (Beckman Coulter) and dissolved in 20 ml EB. Finally, the sample was run on 1% agarose gel, the region of 300-to 500-bp fragments was cut out, DNA was isolated using MinElute Gel extraction kit (Qiagen), and eluted in 20 ml EB. The library was sequenced in an Illumina Hiseq2000 sequencing machine. Sequencing reads were demultiplexed according to the barcodes (supplementary data 1, Supplementary Material online).

AMPRIL Genotyping
We used previously released whole-genome short read data (Jiao and Schneeberger 2020) to generate markers for genotyping the AMPRILs. We mapped the reads to the reference sequence and called SNPs using SHORE (version 0.9) (Ossowski et al. 2008) with default parameter settings. Only homozygous SNP calls were selected after removing SNP calls with low quality (quality < 30), in the repetitive regions or in regions with low mapping quality (quality < 30). The actual marker sets for each of the 12 subpopulations (ABBA, ACCA . . . GHHG) were selected based on respective parental genomes, exclusively selecting biallelic SNP markers. After excluding samples with replicates or too few sequencing reads, we performed genotyping on 992 AMPRILs using a Hidden Markov Model-based approach similar to the recently presented method for the reconstruction of genotypes derived from two parental genomes (Rowan et al. 2015) (for a detailed description, see supplementary note 1, Supplementary Material online).

Identifying Genetic Incompatibilities Based on Duplicated Genes
Duplicated gene pairs were selected based on gene family clustering of protein-coding genes from all eight parental genomes using OrthoFinder (version 2.2.6) (Emms and Kelly 2015). We only selected interchromosome duplicated genes to avoid the effects of intrachromosome linkage (supplementary data 2, Supplementary Material online). For each duplicated gene pair, we required two copies in the reference sequence and at least one copy in one of the other genomes (DupGene2), for example, the genes HPA and TIM22 (the reference Col-0 has two copies from two different chromosomes), or one copy in reference sequence and at least one copy on a different chromosome in at least one of the other parental genomes (DupGene1), for example, the gene FOLT (the reference Col-0 only has one copy, but Sha has two copies from two different chromosomes). We determined the genotypes (parental haplotypes) of each of the gene copies in each AMPRIL using the genotypes predicted in the middle of the respective reference gene, or in the case a gene was not present in the reference sequence-using the midpoint between the two closest flanking syntenic regions of the nonreference gene copy (based on synteny calculations from a previous study; Jiao and Schneeberger 2020). For example, the parental haplotype of one progeny at chromosome 5:24-27 Mb (where FOLT2 is inside at 26.513-26.516 Mb) was homozygous Col-0, whereas it was homozygous for An-1 at chromosome 4:10-16 Mb (where FOLT1 is inserted at 13.577 Mb according to synteny in the wholegenome alignment of the Col-0 and An-1 genomes). Therefore, the observed allele pair of FOLT in this progeny was FOLT1 Col-0/Col-0 FOLT2 An-1/An-1 .
We predicted candidate genetic incompatibilities using a two-step approach. In the first step, we performed v 2 tests (eq. 1) to check whether the frequency of allele pairs in duplicated genes was significantly distorted in any of the subpopulations or in any of two merged subpopulations (ABBA and EFFE or CDDC and GHHG which shared the same four founders, respectively).
Here, the o ij and e ij represent the observed and expected allele pair frequency of duplicated genes, respectively, and a, b, c, d represent the parental genotypes in each subpopulation. For example, the two copies of Tim22, Tim22-1 and Tim22-2, are both present in the reference Col-0 genome. The observed and expected allele pair frequencies of Tim22 in the subpopulation EGGE can be found in supplementary table 3, Supplementary Material online. There are 88 individuals in this subpopulation. If the genotype at these two loci in one individual genotype was aabb, the count of the allele combination "ab" was increased by two. However, if the genotype was heterozygous, for example, "abbc," the counts of the allele combinations "ab" and "bc" were both increased by 1. The P value of the v 2 test is 2.5e-09, suggesting a significant distortion of allele combinations.
Additionally, we applied a modified v 2 test (eq. 2) for all the duplicated genes in the whole AMPRIL population by considering the effects of population structure.
Here, the o ij and e ij represent the observed and expected allele pair frequency of interchromosome duplicated genes, respectively, and a, b, c, d, e, f, g, and h represent all parental genotypes. The observed and expected allele pair frequencies in the whole population was the sum of observed ( P o ij ) and expected ( P e ij ) allele pair frequencies in each of the subpopulations.
The gene pairs with at least one significant segregation distortion in their observed allele combinations (FDR < 0.05) were kept for the next step. In this second step, we checked whether the respective gene copies contained LoF variation or methylated promoters (as described below). To address alternative splicing which is known to rescue LoF variation, we also checked the gene annotation within the parental genome assemblies to confirm the LoF. Only duplicated genes with confirmed LoF alleles in both copies of the duplication (in at least one of the parental Jiao et al. . doi:10.1093/molbev/msaa306 MBE genomes) were kept as candidates for genetic incompatibilities based on duplicated genes.

Identification of LoF Variants in Candidate Genes
We mapped all whole-genome resequencing reads of 1,135 accessions from the 1001 Genomes Project (Alonso-Blanco et al. 2016), 75 accessions from Africa (Durvasula et al. 2017), and 118 accessions from China (Zou et al. 2017), to the Col-0 reference genome (TAIR10) (The Arabidopsis Genome Initiative 2000; Lamesch et al. 2012) using BWA (version 0.7.15) (Li and Durbin 2009) with the default parameter settings. SNPs and small indels were called using SAMtools (version 1.9) using default parameters (Li et al. 2009). Homozygous variants with mapping quality of more than 20 and with at least four reads aligned were kept. Pseudoheterozygous variants in HPA and TIM22 were also recorded. The large insertion and deletions in the 40-kb extended genic region of focal genes were predicted using Pindel (version 0.2.5) (Ye et al. 2009) with parameter settings "-T 1 -x 5 -k -r -j" and Delly (version 0.8.1) (Rausch et al. 2012) with parameter settings "delly call -q 20 -r 20 -n -u 20 -g." The functional effects of these variations were annotated using SnpEff (version 4.3p) (Cingolani et al. 2012) using the default parameter settings. The LoF effects include loss of start codon, loss of stop codon, gain of premature stop codon, damage of splicing acceptor or donor sites, frameshift, and CDS loss.

Clustering of Cytosine Methylation Profile in Gene Promoter
Cytosine methylation data (the tab separated file of methylated cytosine positions) of 1,211 samples from the 1001 Epigenomes Project (Kawakatsu et al. 2016) were downloaded from NCBI (927 samples under GEO accession GSE43857 and 284 samples under GEO accession GSE54292). After removing the redundant samples, 888 and 161 data sets from GSE43857 and GSE54292, respectively, were retained. For each sample, we calculated the percentage of methylated cytosines in CG, CHG, and CHH contexts from 500 bp upstream of the transcription start sites to 300 bp downstream of the transcription termination sites of each candidate gene in 100-bp nonoverlapping sliding windows. These methylation profiles were hierarchically clustered using the hclust function implemented in R (version 3.5.1). The pairwise, Euclidean distances between all methylation profiles were calculated and Ward's method was used to cluster the samples into two groups (hypermethylated and unmethylated). This clustering was performed for the samples of GSE43857 and GSE54292 separately as these two data sets were processed in two studies with different pipelines (Dubin et al. 2015;Kawakatsu et al. 2016). The heatmap of methylation patterns was drawn in R using the heatmap.2 function.

Analysis of DNA Methylation within the Genomes of the AMPRIL Founders
For six accessions (Col-0, An-1, C24, Cvi-0, Ler, Kyo), we downloaded the whole-genome DNA methylation data from NCBI from the 1001 Epigenomes Project (Kawakatsu et al. 2016) (GSE43857). For Eri-1 and Sha, DNA methylation data were generated using whole-genome bisulfite sequencing by the Max Planck Genome center. DNA was extracted from plants grown in the greenhouse under standard conditions using the Qiagen DNEasy Plant Mini Kit (Qiagen, Germany) and a sequencing library was prepared using the NEXTflex Bisulfite Library Prep Kit. This library was sequenced on an Illumina HiSeq2000 machine. Sequencing reads were aligned the reference sequence using Bismark (version 0.20.0) (Krueger and Andrews 2011) with these parameters "-q -bowtie2 -N 1 -L 24 -p 20." The cytosine methylation profiles in candidate genes were calculated using the same sliding window method as described above. The cytosine methylation profiles together with the profiles from GSE43857 were then clustered again with the same clustering method as described above.

Mapping Modifiers of Incompatible Alleles Using GWA
We predicted the presence and absence of functional copies of duplicated genes (HPA, TIM22) in each of 1,135 accessions from the 1001 Genomes Project according to the annotations of LoF variations and the clustering of cytosine methylation profiles in promoters (supplementary data 3-6, Supplementary Material online). For accessions without available methylation sequencing data, we assume the focal genes are expressed. The presence or absence of any functional copies of the reference genes were used as binary phenotype (presence: 1, absence: 0). For the association, we selected 238,166 high-quality SNP markers (minor allele frequency >0.05 and missing rate <0.1) from 1001 Genomes Project and imputed missing alleles with IMPUTE2 (Howie et al. 2009;Howie et al. 2012). An in-house R script (https://github.com/ schneebergerlab/AMPRIL-GI/blob/master/GWAS/gwas.LM. multipro.r) implementing the mixed linear model with correction of kinship bias was used to perform the GWA (Bonferroni correction, P < 0.05).

Copy Number Variation Analysis
To test for the existence of HPA3 in a genome, we mapped whole-genome short reads of 1,135 accessions from 1001 Genomes Project (Alonso-Blanco et al. 2016), of 75 accessions from Africa (Durvasula et al. 2017) and of 118 accessions from China (Zou et al. 2017) to the Kyo genome assembly (Jiao and Schneeberger 2020) using BWA (version 0.7.15) with default parameters. Accessions with an average mapping coverage !5 along the HPA3 region and duplicated region breakpoints (identified based on the sequence alignment against Col-0 genome using SyRI; Goel et al. 2019; version 1.0; with default parameters) were considered as HPA3 carriers.
The copy number of duplicated genes was estimated by the ratio between the average mapping coverage within the focal gene and the average mapping coverage across the whole genome. To estimate the copy number of HPA, we mapped the short reads to the Cvi-0 genome assembly using BWA (version 0.7.15) with default parameters as the Cvi-0 genome only has one copy of HPA. For TAD3, FOLT, and TIM22, the copy number was predicted based on short reads mapping against the reference genome where both TAD3 and FOLT only have one copy. For the TIM22, copy number Evolutionary Dynamics of Duplicated Gene-based Genetic Incompatibilities . doi:10.1093/molbev/msaa306 was estimated based on the average mapping coverage at both TIM22-1 and TIM22-2.
Fine-Mapping of Incompatible Allele at LD2 in Cvi-0 Â Col-0 RIL After the observation that it was not possible to obtain a RIL homozygous for the Col-0 allele at LD2.1 while homozygous for the Cvi-0 allele at LD2.3 (Simon et al. 2008), we derived two distinct heterogeneous inbred families (HIF) from two segregating RILs from this population (supplementary fig. 2, Supplementary Material online). 8RV467 is segregating for a region largely encompassing LD2.1 while fixed Cvi-0 at LD2.3. 8RV408 is segregating for LD2.3 while fixed Col-0 at LD2.1. In each derived HIF family, it is again not possible to fix the incompatible allele combination, so the families were used to exclude intervals that do not contribute to the incompatible interaction. Two rounds of fine-mapping were conducted by genotypically screening increasing populations of descendants (at the seedling stage) for recombinants in the interval. Gradually, the causal interval is reduced and delineated by markers exploiting known SNPs and indels in the Cvi-0 sequence.
Segregation of T-DNA Mutant Line at Gene TIM22-1 and TIM22-2 T-DNA lines GABI_848H04 segregates for an insertion in TIM22-1, whereas GABI_626H10 segregates for an insertion in TIM22-2, both in a Col-0 background. Descendant was screened genotypically at the seedling stage to characterize the segregation of the T-DNA insertion allele.
Complementation Cross to Validate the Incompatible Alleles of TIM22 A LD2.3 HIF line (8HV408-Het) was crossed with a GABI_626H10 line, both at the heterozygous state, in order to segregate potentially for all four possible hybrid allelic combinations at LD2.3, while maintaining a fixed Col-0 allele at LD2.1. The F1 descendant (123 individuals, from both cross directions) was screened genotypically at the seedling stage for the presence/absence of both the T-DNA insertion and the Cvi-0 allele.

Code Availability
Custom code used in this study can be freely accessed at https://github.com/schneebergerlab/AMPRIL-GI.

Seed Availability
We are currently preparing seeds of new AMPRIL populations for submission to NASC (http://arabidopsis.info/). Note heterozygous regions as reported in the genotype data might be fixed for one of the alleles in the requested seeds.

Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.