Dominance between self-incompatibility alleles determines the mating system of Capsella allopolyploids

Abstract
The shift from outcrossing to self-fertilization is one of the main evolutionary transitions in plants and has broad effects on evolutionary trajectories. In Brassicaceae, the ability to inhibit self-fertilization is controlled by 2 genes, SCR and SRK, tightly linked within the S-locus. A series of small non-coding RNAs also encoded within the S-locus regulates the transcriptional activity of SCR alleles, resulting in a linear dominance hierarchy between them. In Brassicaceae, natural allopolyploid species are often self-compatible (SC) even when one of the progenitor species is self-incompatible, but the reason why polyploid lineages tend to lose self-incompatibility (SI) and the timing of the loss of SI (immediately after ancestral hybridization between the progenitor species, or at a later stage after the formation of allopolyploid lineages) have generally remained elusive. We used a series of synthetic diploid and tetraploid hybrids obtained between self-fertilizing Capsella orientalis and outcrossing Capsella grandiflora to test whether the breakdown of SI could be observed immediately after hybridization, and whether the occurrence of SC phenotypes could be explained by the dominance interactions between S-haplotypes inherited from the parental lineages. We used RNA-sequencing data from young inflorescences to measure allele-specific expression of the SCR gene and infer dominance interactions in the synthetic hybrids. We then evaluated the seed set from autonomous self-pollination in the synthetic hybrids. Our results demonstrate that self-compatibility of the hybrids depends on the relative dominance between S-alleles inherited from the parental species, confirming that SI can be lost instantaneously upon formation of the ancestral allopolyploid lineage. They also confirm that the epigenetic regulation that controls dominance interactions between S-alleles can function between subgenomes in allopolyploids. Together, our results illustrate how a detailed knowledge of the mechanisms controlling SI can illuminate our understanding of the patterns of co-variation between the mating system and changes in ploidy.


Identification of S-alleles and classification into four dominance classes
Although most S-alleles are shared trans-specifically between A. halleri and A. lyrata (Castric et al. 2008) and even trans-generically between Arabidopsis and Capsella (Paetsch et al. 2006), S-allele identifications/names have been assigned independently in each species, leading to much confusion in the literature. In this paper we introduce an identification scheme based on phylogenetic relationships among SRK sequences, in the form HXYYY, with X corresponding to the dominance class of the allele (ranging from 1 to 4), and YYY corresponding to its functional specificity (ranging from 001 up to a limit of 999). The functional specificity of SRK sequences from different species is assumed to be shared if they are phylogenetically closer to each other than to any other sequence from the same species (Castric et al. 2008). For instance, the most recessive allele in Arabidopsis and Capsella, called AhSRK01 in A. halleri, AlSRK01 in A. lyrata and CgrSRK03 in C. grandiflora, is here noted H1001 because it belongs to dominance class I and is the first and only specificity in this class. The equivalence between species-specific identifications and our new generalized identification system is listed in Table S2. The dominance classes are also defined based on phylogenetic relationships among SRK sequences, as suggested initially by Prigoda et al. (2005), who defined four classes, from the most recessive to the most dominant: A1, B, A3, A2. However, for clarity and simplicity, we instead use the notation introduced by Goubet et al. (2012), from I to IV in order of increasing dominance: I (A1), II (B), III (A3) and IV (A2).
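As an illustration of the HXYYY scheme, the following minimal Python sketch splits an allele name into its dominance class and specificity. The names `SAllele`, `parse_allele` and `class_label` are hypothetical and not part of any published tool. Two conventions are taken from usage elsewhere in this text rather than from the definition above: a trailing "n" marks a non-functional allele (as in H4004n), and X = 0 appears only in the names of putative paralogous sequences (such as H0003), so it is reported here as "unclassified".

```python
import re
from typing import NamedTuple

# Roman-numeral labels for the four dominance classes (I = most recessive).
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV"}


class SAllele(NamedTuple):
    name: str
    dominance_class: int   # X digit of HXYYY; 0 is treated as "unclassified"
    specificity: int       # YYY as an integer (1..999)
    non_functional: bool   # True when the name carries a trailing "n"


# H, one class digit, three specificity digits, optional "n" suffix.
_PATTERN = re.compile(r"H(\d)(\d{3})(n?)")


def parse_allele(name: str) -> SAllele:
    """Parse a name such as 'H1001' or 'H4004n' into its components."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not an HXYYY allele name: {name!r}")
    return SAllele(
        name=name,
        dominance_class=int(match.group(1)),
        specificity=int(match.group(2)),
        non_functional=match.group(3) == "n",
    )


def class_label(allele: SAllele) -> str:
    """Return the class as a Roman numeral, or 'unclassified' for X = 0."""
    return ROMAN.get(allele.dominance_class, "unclassified")
```

For example, `parse_allele("H1001")` yields class 1 and specificity 1, whereas a species-specific name such as "CgrSRK03" is rejected and would first have to be translated through Table S2.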

Reference SRK and SCR sequences from Arabidopsis and Capsella
Obtaining S-locus genotypes using the NGSgenotyp pipeline (Genete et al. 2020) requires a series of reference SRK and/or SCR sequences, so we started by assembling them from the literature. For A. halleri, the reference SRK sequences were taken from Table S1 of Genete et al. (2020). For A. lyrata they were taken from Table S14 of Mable et al. (2018) and from Takou et al. (2021). For C. grandiflora we used a set of partial S-domain sequences obtained by Paetsch et al. (2006), Guo et al. (2009), Bachmann et al. (2019), Nasrallah et al. (2007) and Neuffer et al. (2023). Reference sequences of SCR from A. halleri and A. lyrata were taken from Guo et al. (2011), Goubet et al. (2012), and Durand et al. (2014). All SRK and SCR reference sequences used are posted at https://www.doi.org/10.6084/m9.figshare.22567558.v2 together with their Genbank accession numbers.

S-locus genotypes and SRK sequences of the Cg-9 population of C. grandiflora
We expanded the database of Capsella reference SRK sequences by using the de novo assembly module (haploasm) of the NGSgenotyp bioinformatic pipeline (Genete et al. 2020) on Illumina short-read whole genome resequencing data of 180 individuals from the Cg-9 population of C. grandiflora from Monodendri, Greece (Josephs et al. 2015; Sequence Read Archive (SRA) BioProject PRJNA275635, Table S1). This allowed us first to obtain full sequences of exon 1 of SRK for the C. grandiflora alleles for which only partial sequences were available (49 in total), and also to obtain full sequences of exon 1 of SRK for the newly identified alleles (25 in total). The sequences of these 74 SRK alleles are available at https://www.doi.org/10.6084/m9.figshare.22567558.v2. Based on this updated database of reference sequences, we then launched a new round of the genotyping module (genotyp) to obtain SRK genotypes. We obtained fully resolved diploid S-locus genotypes for 177 individuals (Table S1), while sequence read depth was too low for two additional individuals. One individual was missing a single allele, and one individual had four alleles, which could correspond to an admixed sample or an autotetraploid individual. Besides these 74 S-alleles, we also identified five sequences clustering with class II SRK alleles (H0002, H0003, H0011, H0012 and H0013 in Table S2, named, respectively, CgrSRK01, CgrSRK06, CgrSRK09, CgrSRK51 and CgrSRK63 in Paetsch et al. 2006 and Neuffer et al. 2023) that we considered as paralogous sequences unlinked to the S-locus for the following reasons: (1) two of them (H0002 and H0003) showed strong sequence similarity with sequences for which paralogy was confirmed with segregation analyses in A. lyrata and A. halleri (Aly13-2 and Aly13-7; Charlesworth et al. 2003; Castric & Vekemans 2007); (2) the remaining three sequences had higher population frequencies than the other class II SRK alleles (11, 23 and 12 copies for H0011, H0012 and H0013, respectively, while true class II SRK alleles had frequencies ranging from 1 to 10 copies); and (3) they systematically occurred in individuals which already carried two other unambiguous SRK alleles (see Table S1). Among the 74 S-alleles co-segregating in this population, one belongs to the most recessive class (class I), 16 alleles belong to the second most recessive class (class II), 20 alleles belong to the more dominant class III and 37 alleles belong to the most dominant class IV (Table below). As theoretically expected (Schierup et al. 1997; Billiard et al. 2007), we observed that the more dominant classes contain a larger number of different S-alleles, each at a lower population frequency.

[…] S1), for which both types of sequence data were available. We used as reference database of SRK sequences a combination of sequences from A. halleri, A. lyrata, and our enlarged set of C. grandiflora sequences.

Number and mean population frequency of S-alleles found in the population
For C. grandiflora, all four individuals were heterozygous at the S-locus, with allele H1001 shared by three individuals, while all other gene copies belonged to distinct S-alleles (H2002, H2006, H2017, H3014, and H4021) (Table S3). In addition, two individuals carried one copy each of putative paralogous sequences (H0011 and H0003 in individuals 85.3 and 86.12, respectively).

[…] orientalis parental allele, and all individuals had two copies of the H4047n allele, derived from the functional C. grandiflora parental allele H4047 (Table S3). Determining the exact copy number of alleles is possible using this dataset because analyses are based on genomic data, so that relative read depth informs about copy number (Genete et al. 2020). Indeed, as previously observed by Bachmann et al. (2021), accession DUB-RUS9 was lacking the H4004n allele inherited from C. orientalis, but we found that this individual was carrying two copies of a distinct allele, H2002, in the C. orientalis subgenome. Interestingly, another accession, LAB-RUS-4, also carried one copy of the H2002 allele, but this accession also carried one copy of the C. orientalis H4004n and two copies of the C. grandiflora H4047 alleles.
Hence, it appears that Russian accessions of C. bursa-pastoris have a second allele segregating at the S-locus of the C. orientalis subgenome, which was not reported by Bachmann et al. (2021). In addition, we found that all but five C. bursa-pastoris individuals have the C. orientalis subgenome fixed for the H0014 paralogous sequence, and all but two individuals were fixed, probably in the C. grandiflora subgenome, for the H0003 paralogous sequence (Table S3). The genomic locations of H0014 and H0003 were confirmed to be distinct from the S-locus region by BLAST analysis against a public genome assembly of C. bursa-pastoris (Kasianov et al. 2017).
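The idea that relative read depth informs about copy number in genomic data can be illustrated with a minimal Python sketch. This is not the procedure implemented in NGSgenotyp; the function name `estimate_copy_numbers`, the assumption of four S-locus copies in an allotetraploid, and the read depths in the usage example are all hypothetical.

```python
def estimate_copy_numbers(depths: dict[str, float], total_copies: int) -> dict[str, int]:
    """Scale per-allele read depths to a known total number of gene copies.

    depths       -- mean read depth per allele in one individual
    total_copies -- assumed number of S-locus copies (e.g. 4 in a tetraploid)

    Each allele gets round(total_copies * depth / sum of depths). Rounding is
    done per allele, so the estimates are not guaranteed to add up to
    total_copies when depths are noisy.
    """
    total_depth = sum(depths.values())
    if total_depth <= 0:
        raise ValueError("read depths must sum to a positive value")
    return {
        allele: round(total_copies * depth / total_depth)
        for allele, depth in depths.items()
    }
```

With hypothetical depths of 20, 19 and 41 for H4004n, H2002 and H4047n in a tetraploid, the sketch returns one, one and two copies, respectively. Such a simple rule is only meaningful for genomic reads, where depth is expected to scale with copy number.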
Analysis of RNA-seq data from flower buds from the same set of individuals gave results strictly identical to those obtained from genomic data, confirming the codominant expression of SRK transcripts (Burghgraeve et al. 2020). Also as expected, RNA-seq data detected no expression of SRK in leaf or root tissues, with only three exceptions: two S-alleles from class II (H2002 and H2006) and one S-allele from class III (H3014). Expression of SRK in leaf tissues has already been reported for some SRK alleles in A. lyrata, all from class II (Prigoda et al. 2005). Overall, these results confirm that the NGSgenotyp pipeline can be used to reliably infer S-locus genotypes from RNA-seq data from flower buds.

Determining patterns of allelic expression in SRK and SCR in diploid and tetraploid parents
We combined RNA-seq data from seven diploid and six synthetic tetraploid individuals of C. grandiflora used as parents for the hybrid material with C. orientalis (Duan et al. 2023, BioProject PRJNA848625) with newly obtained RNA-seq data from additional hybrid individuals (BioProject PRJNA946929) and applied the NGSgenotyp pipeline to obtain their S-locus genotype, focusing first on SRK. Overall, five different S-alleles were segregating among the 13 individuals: H1001, H2008, H2022, H4015 and H4035 (Fig. 2). All these alleles were present in the Cg-9 population and thus their sequences were present in the database. We used the "average read depth" computed by the genotyping module of NGSgenotyp to quantify the relative expression of the SRK alleles present within each diploid and tetraploid individual. The results showed overall co-dominant transcript levels of the different SRK alleles (Table S4). Note that in the tetraploid individuals analyzed, we observed at most three different SRK alleles, which implies that at least one allele is present in more than one copy. However, RNA-seq data do not allow reliable determination of SRK copy numbers, unlike genomic resequencing data (see above).
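The conversion of per-allele average read depths into relative expression can be sketched as follows. This is a minimal illustration in Python, assuming that relative expression is simply each allele's share of the summed read depth within an individual; the function names and the depths in the usage example are hypothetical and do not come from NGSgenotyp output.

```python
def relative_expression(mean_depths: dict[str, float]) -> dict[str, float]:
    """Return each allele's share of the total read depth in one individual."""
    total = sum(mean_depths.values())
    if total <= 0:
        raise ValueError("read depths must sum to a positive value")
    return {allele: depth / total for allele, depth in mean_depths.items()}


def predominant_allele(mean_depths: dict[str, float]) -> tuple[str, float]:
    """Return the most highly expressed allele and its relative expression."""
    shares = relative_expression(mean_depths)
    allele = max(shares, key=shares.get)
    return allele, shares[allele]
```

For instance, hypothetical depths of 52 and 48 for two SRK alleles give shares of 0.52 and 0.48, the kind of balanced pattern described as co-dominant, whereas depths of 90 and 10 would give a predominant allele with a share of 0.9.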
We then aimed to obtain the associated SCR sequences, so we again applied the de novo assembly module (haploasm) of NGSgenotyp to the same individuals, this time using the library of SCR reference sequences from A. halleri and A. lyrata described above and shorter k-mers (length of 15 instead of the default value of 20). Although reference SCR sequences were available only for H1001 and H2008 (from A. lyrata), we successfully obtained full coding sequences for all five S-alleles. The sequences are available at https://www.doi.org/10.6084/m9.figshare.22567558.v2. We used the SCR reference sequence database enriched with these five newly obtained SCR sequences to launch a new round of the genotyping module (genotyp) to estimate the relative expression of SCR alleles. We removed the exon 1 sequence of each SCR allele from the reference database because exon 1 shows lower allelic specificity, which introduces noise into the analyses. As the SCR sequences in the database are short, the alignment mode was set to "local" to allow partial mapping of the reads (soft clipping) in a way that optimizes the alignment score. The results showed strongly asymmetric allele-specific expression (see main text and Table S4). They also showed that only one allele was highly expressed in tetraploid individuals, suggesting that the small RNA-mediated regulation of SCR expression is fully functional in tetraploids. These results also allowed us to determine the pattern of hierarchical dominance among these five S-alleles, as follows: (1) H4035 dominant over H4015 (individual cg4-1-3-F); (2) H4035 dominant over H2022 and H2008 (cg4-8-3-F, cg4-12-4-F, cg4-9-2-F); (3) H4015 dominant over H2022 and H2001 (cg2-1-2-F, cg2-2-6-F, cg2-1-6-F, cg2-7-3-F); (4) H2022 dominant over H2008 (cg2-12-3-F, cg2-14-5-F, cg4-6-4-F). Hence, assuming transitivity of the dominance relationships: H4035 > H4015 > H2022 > H2008 > H1001.
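The step from pairwise observations to a linear hierarchy under transitivity can be sketched as a topological sort. The Python example below is an illustration only, not the procedure used in the study. It uses only the pairs listed above that involve the five alleles named for these individuals, and it adds one labeled assumption: H1001 is placed below H2008 because class I is described as the most recessive class. The function name `linear_hierarchy` is hypothetical.

```python
from graphlib import TopologicalSorter

# (dominant, recessive) pairs among the five alleles named in the text.
OBSERVED_PAIRS = [
    ("H4035", "H4015"),
    ("H4035", "H2022"),
    ("H4035", "H2008"),
    ("H4015", "H2022"),
    ("H2022", "H2008"),
]

# Assumption, not an observed pair from the list above: H1001 (class I)
# is taken to be recessive to H2008 because class I is the most recessive.
ASSUMED_PAIRS = [("H2008", "H1001")]


def linear_hierarchy(pairs):
    """Order alleles from most dominant to most recessive.

    Raises ValueError if the pairs contain a cycle or do not determine a
    single linear order (two neighbours in the order lack a direct pair).
    """
    pairs = set(pairs)
    predecessors = {}
    for dominant, recessive in pairs:
        predecessors.setdefault(dominant, set())
        predecessors.setdefault(recessive, set()).add(dominant)
    # static_order() lists every allele after all alleles dominant over it;
    # a cycle raises graphlib.CycleError, a subclass of ValueError.
    order = list(TopologicalSorter(predecessors).static_order())
    for upper, lower in zip(order, order[1:]):
        if (upper, lower) not in pairs:
            raise ValueError(f"order of {upper} and {lower} is not determined")
    return order


hierarchy = linear_hierarchy(OBSERVED_PAIRS + ASSUMED_PAIRS)
print(" > ".join(hierarchy))
```

The check on neighbouring alleles makes the sketch fail loudly when the data leave two alleles unordered, rather than silently returning one of several compatible orders.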

Determining patterns of allelic expression in SRK and SCR in diploid and tetraploid hybrids
Finally, we applied the genotyp module of NGSgenotyp with SRK and SCR reference sequences, separately, to RNA-seq data from 27 diploid C. orientalis x C. grandiflora hybrids (F), seven tetraploidized hybrids (Sh) and 19 hybrids between autotetraploid parents (Sd), obtained by crossing the parents described above (Fig. 1, Table 1). Overall, five different S-alleles were segregating among these hybrid individuals: the non-functional C. orientalis H4004n allele along with four C. grandiflora alleles, two of which are predicted to be more recessive (H2008, H2022) and two of which are predicted to be more dominant (H4015, H4035) than H4004n in pollen (Fig. 2). Among the 27 diploid hybrids (F individuals), 21 carried at least one copy of the H4004n allele, whereas six did not, so the latter can be hypothesized to be SI. Accordingly, these six individuals expressed a functional SCR allele from C. grandiflora, either H2008, H2022 or H4015 (Table 1). Among the 21 homoploid hybrids carrying the H4004n allele, three individuals carried H4015, assumed to be dominant over H4004n. Hence, we can hypothesize that these three individuals are also SI, whereas the 18 remaining individuals are expected to be SC. Accordingly, these three individuals expressed the H4015 functional allele from C. grandiflora, confirming that H4015 is indeed dominant over H4004n. The remaining 18 individuals predominantly expressed the H4004n allele, with two exceptions: individuals F-5-6 and F-10-4 carried H2022 and H4004n, but expression of the former was higher than that of the latter. All seven tetraploidized hybrids (Sh individuals) carried H4004n and the recessive H2022 allele, and are thus hypothesized to be SC. Again, SCR results confirm that these individuals expressed the H4004n allele only. Finally, the 19 hybrids between tetraploid individuals of C. grandiflora and C. orientalis (Sd individuals) were all found to carry H4004n, presumably inherited from their C. orientalis parent. Nine of them carried at least one of the more dominant C. grandiflora alleles H4015 or H4035 and can be assumed to be SI, whereas the other ten individuals received a more recessive S-allele from C. grandiflora and are expected to be SC. These predictions were confirmed by SCR expression results (Table 1). Overall, in these hybrid individuals, the asymmetric expression of SCR sequences was strong, as observed above in C. grandiflora individuals, but the range of relative expression was wider (relative expression of the most dominant SCR allele ranging from 0.638 to 1.0). This wider range was also observed in diploid hybrids, so it is not caused by polyploidy.
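The prediction rule applied above can be written as a short Python sketch: an individual is predicted SC when the most dominant S-allele it carries is the non-functional H4004n, and SI otherwise. The ordering below combines the hierarchy inferred in the parents with the predicted position of H4004n between H4015 and H2022; the names `POLLEN_HIERARCHY` and `predict_mating_phenotype` are hypothetical, and the rule is a simplification that ignores cases such as F-5-6 and F-10-4, where observed expression departed from the predicted dominance.

```python
# Assumed pollen dominance order among the alleles found in the hybrids,
# most dominant first (H4004n placed as predicted in the text).
POLLEN_HIERARCHY = ["H4035", "H4015", "H4004n", "H2022", "H2008"]
RANK = {allele: rank for rank, allele in enumerate(POLLEN_HIERARCHY)}

# Alleles that do not encode a functional SCR/SRK specificity.
NON_FUNCTIONAL = {"H4004n"}


def predict_mating_phenotype(alleles) -> str:
    """Predict 'SC' or 'SI' from the S-alleles carried by one individual.

    The most dominant allele is assumed to be the only one expressed in
    pollen. A KeyError is raised for alleles missing from POLLEN_HIERARCHY.
    """
    carried = set(alleles)
    if not carried:
        raise ValueError("at least one allele is required")
    most_dominant = min(carried, key=lambda allele: RANK[allele])
    return "SC" if most_dominant in NON_FUNCTIONAL else "SI"
```

Applied to a hybrid carrying H4004n and H2022 the sketch returns "SC", and for one carrying H4004n and H4015 it returns "SI", matching the predictions stated above. Copy number plays no role in this rule, so diploid and tetraploid genotypes are handled identically.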

S-haplotype phylogeny based on full SRK exon 1 sequences
To understand which S-haplotype specificities are shared within and between the Arabidopsis and Capsella genera, 23 SRK (exon 1) nucleotide sequences from the two genera were aligned with Muscle v.3.5 (Edgar 2004) using the default strategy and manually checked with Seaview v.5 (Gouy et al. 2021). Because the ARK3 and SRK genes are paralogous, an ARK3 sequence of Capsella rubella (Cru) was used as outgroup. Gblocks version 0.91b (Castresana 2000) was applied to the alignment to remove poorly aligned regions, which resulted in a 1236-nucleotide dataset. The best-fitting model under the ML criterion was selected with ModelTest-ng v.

Fig. S1 Crosses and induced whole-genome duplication (WGD) for generating diploid (F) and two types of tetraploid hybrids (Sd and Sh) from diploid Capsella orientalis (Co2) and Capsella grandiflora (Cg2; Duan et al. 2023). Autotetraploid Capsella orientalis (Co4) and Capsella grandiflora (Cg4) were created as intermediate steps. The generation numbers of the synthetic polyploids or hybrids are indicated by subscripts, and plant materials used in the present study are highlighted in bold.

Fig. S2 Phylogeny of SRK alleles identified in parents or hybrids between Capsella grandiflora (Cgr) and C. orientalis (Co). The phylogeny was reconstructed by maximum likelihood under the TPM3uf+I+G4 model as implemented in PhyML (see Supplementary Information). We also included sequences identified in C. bursa-pastoris (Cbp), and trans-specific alleles from Arabidopsis halleri (Ah), A. lyrata (Al) or A. arenosa (Aar) when available. Functional specificities and dominance classes of the S-alleles are indicated. The outgroup is represented by an ARK3 sequence from Capsella rubella (Cru). Bootstrap values are represented by dots on the nodes: black: BP = 100, grey: 95 < BP < 99, white: BP = 89, no dot: BP < 80.

Fig. S3 Observed SCR read depth of H4004n relative to all other alleles in diploid and tetraploid hybrids as a function of the expected dominance of H4004n (green, H4004n is dominant over C. grandiflora allele(s); brown, H4004n is recessive to C. grandiflora allele(s)).

Validation of the NGSgenotyp pipeline for use on RNA-seq data
We evaluated how the NGSgenotyp pipeline performed when applied to RNA-seq data (from either flower bud, leaf or root tissues) by comparing the results with those obtained from whole genome resequencing data. We performed this comparison on four C. grandiflora and four C. orientalis individuals, as well as on 16 C. bursa-pastoris individuals from Kryvokhyzha et al. (2019, Table […]

[…] all 4 individuals were homozygous for the non-functional allele H4004n (n is for non-functional), in agreement with Bachmann et al. (2019; note that these authors refer to this allele as CoS12). All C. orientalis individuals were also homozygous for an unlinked paralogous sequence (H0014, named CorSRK06 by Neuffer et al. 2023). Unlike the putative paralogous sequences observed in C. grandiflora, which are segregating at low or moderate frequencies at unlinked loci, the H0014 sequence appears to be fixed in C. orientalis. For the allotetraploid C. bursa-pastoris, in agreement with the results of Bachmann et al. (2021), all individuals but two had two copies of the H4004n allele, derived from the non-functional C.

Table S1.
S-locus genotypes of 176 individuals from population Cg-9 of Capsella grandiflora, as obtained by the NGSgenotyp pipeline on whole-genome resequencing data from Josephs et al. (2015). Allele sequences are identified according to a Brassicaceae functional sequence group nomenclature, as determined based on sequence similarity with alleles from related species of Capsella and Arabidopsis genera (see Table S2 for correspondence with

Table S2.
List of the 74 SRK alleles of C. grandiflora identified within population Cg-9. References and Genbank accession numbers of previously published allele sequences are given, as well as the source individual from population Cg-9 used to assemble a full sequence of exon 1 using the NGSgenotyp pipeline (available from file S1). References to five putative paralogous sequences are also given.

Table S3.
S-locus genotypes of individuals from C. grandiflora, C. orientalis and C. bursa-pastoris, as inferred from genomic DNA or RNA-seq data from flower buds, leaf or root tissues (based on data from Kryvokhyzha et al. 2019). References and specific notations for C. grandiflora alleles are given in Table S2. H4004n is a non-functional allele present in C. orientalis and C. bursa-pastoris (Bachmann et al. 2019; Bachmann et al. 2021), derived from the functional H4004 allele from C. grandiflora. H4047n is a non-functional allele present in C. bursa-pastoris that has not yet been identified in C. grandiflora (Bachmann et al. 2021). The presence of paralogous sequences (H0003, H0011, H0014) is also indicated (references given in Table S2; H0014 is equivalent to CorSRK06 described in Neuffer et al. 2023).

Table S4.
Relative expression levels of co-existing S-alleles in SRK and SCR in diploid and tetraploid individuals of C. grandiflora (cg2 and cg4) and C. orientalis (co2 and co4). The predominantly expressed allele in pollen is compared to putative dominance determined by previous knowledge about the dominance classes of the alleles. The difference in expression of the dominant allele is reported as a ratio of mean read depth, obtained by the genotyping pipeline NGSgenotyp. When two alleles belonging to the same dominance class co-occur, we cannot determine a priori which one is the most dominant, but SCR expression data allows us to resolve this uncertainty. § Undetermined as the individual is homozygous. *

Table S5.
S-locus genotypes and inference of putative self-compatible (SC) or self-incompatible (SI) phenotype of diploid and tetraploid individuals from C. grandiflora (cg2 and cg4), C. orientalis (co2 and co4) and experimental hybrids (F, Sd, Sh), as inferred from RNA-seq data from flower buds and analysis of SRK with the NGSgenotyp pipeline. Uncertainties in the relative number of gene copies in some tetraploid individuals are indicated. The putative self-incompatible (SI) vs self-compatible (SC) phenotype of a given individual is inferred from previous knowledge of relative dominance of S-alleles in the pollen and from whether the non-functional allele H4004n from C. orientalis is present and putatively dominant in this individual. The presence of paralogous sequences (H0002, H0003, H0011, H0012, H0013, H0014) is also indicated.

Table S6.
Prediction of self-compatibility phenotype (self-compatible, SC; or self-incompatible, SI) of diploid and tetraploid hybrid individuals based on the SRK genotype of individuals and previous knowledge about dominance relationships among S-alleles, and based on actual dominance of the non-functional allele copy of C. orientalis as estimated from relative expression levels of SCR. Grey boxes indicate the presence of an S-allele as detected by SRK genotyping. Numbers within boxes correspond to the mean SCR read depth of the corresponding allele. Individual F-5-3 was suspected to be a spontaneous allotetraploid based on its phenotypes.