Repeated Evolution Versus Common Ancestry: Sex Chromosome Evolution in the Haplochromine Cichlid Pseudocrenilabrus philander

Abstract Why sex chromosomes turn over and remain undifferentiated in some taxa, whereas they degenerate in others, is still an area of ongoing research. The recurrent occurrence of homologous and homomorphic sex chromosomes in distantly related taxa suggests their independent evolution or continued recombination since their first emergence. Fishes display a great diversity of sex-determining systems. Here, we focus on sex chromosome evolution in haplochromines, the most species-rich lineage of cichlid fishes. We investigate sex-specific signatures in the Pseudocrenilabrus philander species complex, which belongs to a haplochromine genus found in many river systems and ichthyogeographic regions in northern, eastern, central, and southern Africa. Using whole-genome sequencing and population genetic, phylogenetic, and read-coverage analyses, we show that one population of P. philander has an XX–XY sex-determining system on LG7 with a large region of suppressed recombination. However, in a second bottlenecked population, we did not find any sign of a sex chromosome. Interestingly, LG7 also carries an XX–XY system in the phylogenetically more derived Lake Malawi haplochromine cichlids. Although the genomic regions determining sex are the same in Lake Malawi cichlids and P. philander, we did not find evidence for shared ancestry, suggesting that LG7 evolved as sex chromosome at least twice in haplochromine cichlids. Hence, our work provides further evidence for the labile nature of sex determination in fishes and supports the hypothesis that the same genomic regions can repeatedly and rapidly be recruited as sex chromosomes in more distantly related lineages.


Introduction
Sexual reproduction is nearly universal across eukaryotes (Speijer et al. 2015;Garg and Martin 2016). One of the most puzzling aspects of this ancient trait is the remarkable contrast between ultra-conserved features (e.g., meiosis, ploidy changes, and cell fusion) and plastic components (e.g., sex determination and modes of reproduction) (Lode 2012;Heule et al. 2014;Capel 2017;Pannell 2017). In particular, the great diversity of sex-determining (SD) mechanisms suggests their repeated and continuous evolution throughout the eukaryotic tree of life (reviewed by Heitman [2015], Blackmon et al. [2017], Capel [2017], and Pannell [2017]), supporting the view of sex as a threshold phenotype that can be canalized into either one of two discrete states by a variety of extrinsic or intrinsic factors as well as a combination thereof (Perrin 2016;Capel 2017). The involvement of extrinsic factors in SD is summarized under the term environmental sex determination (ESD). Intrinsic factors, commonly referred to as genetic sex determination (GSD), comprise systems ranging from single base pair differences between the sexes (Kamiya et al. 2012) to highly differentiated sex chromosomes as in mammals or birds (Graves 2006(Graves , 2008(Graves , 2014 and including polyfactorial SD (Moore and Roberts 2013) and even SD via RNA instead of protein-coding genes (Akagi et al. 2014;Kiuchi et al. 2014). Sex chromosomes originate from autosomes when one locus acquires a mutation such that heterozygous individuals develop into one sex, whereas homozygous ones develop into the other sex. If sex chromosomes evolve within an ancestrally hermaphroditic (or monoecious) species, at least two mutations are necessary to induce the evolution of GSD (Muller 1932;Westergaard 1958;Charlesworth and Charlesworth 1978) and hence of sex chromosomes.
The canonical model of sex chromosome evolution predicts that suppression of recombination between such proto-sex chromosomes is favored (Muller 1918) and adjacent sexually antagonistic mutations may cause the spread of reduced recombination along the chromosome (Charlesworth [2017] but see also Cavoto et al. [2018]). Suppressed recombination will lead to a reduced effective population size of the sexlimited chromosome (Y in male-heterogametic species; W in female-heterogametic species) and an increase of Hill-Robertson interferences (Charlesworth et al. 1987;Charlesworth and Charlesworth 2000;Charlesworth 2017). Deleterious mutations on the Y/W can no longer be purged and, consequently, accumulate under the impact of Muller's ratchet, background selection, and selective sweeps (Charlesworth and Charlesworth 2000;Charlesworth et al. 2005). This can lead to chromosomal decay, as exemplified by the mammalian Y chromosome (reviewed by Graves [2006], Bellott and Page [2009], and Schartl et al. [2016]). One escape route to this "evolutionary trap" can be sex chromosome turnover suggested to be induced by deleterious mutation load (Blaser et al. 2013) or sex-antagonistic mutations occurring on autosomes (van Doorn and Kirkpatrick 2007;van Doorn and Kirkpatrick 2010) driving the evolution of a new sex chromosome pair. Sex chromosome turnovers have indeed been shown in, for example, fishes and amphibians (Miura 2007;Volff et al. 2007;Kitano and Peichel 2012;Sessions et al. 2016;Jeffries et al. 2018), with cichlids illustrating the role of sexual antagonism as a driving force in this process (Roberts et al. 2009).
Alternatively, low levels of recombination might be maintained between the two sex chromosomes that are sufficient to allow the purging of deleterious mutations from Y or W chromosomes (Guerrero et al. 2012;Dufresnes et al. 2014). The loci that pave the way for sex chromosome evolution are often unknown. Still, comparisons across different animal taxa revealed the recurrent evolution of certain genes as master SD genes. This has led to the proposition that there are "limited options" for SD genes or even sex chromosomes (Marshall Graves and Peichel 2010).
With $3,000-4,000 species, cichlid fishes are one of the largest vertebrate families (Salzburger and Meyer 2004).
Because of their taxonomic richness, their phenotypic and ecologic diversity, and their propensity to diversify, cichlids are an important model system in evolutionary biology (Kornfield and Smith 2000;Henning and Meyer 2014;Seehausen 2015;Salzburger 2018). The most species-rich lineage within Cichlidae is Haplochromini, which includes the members of the adaptive radiations in Lakes Victoria and Malawi (together $1,200 species), many riverine and lacustrine species elsewhere in Africa (Turner et al. 2001;Verheyen et al. 2003;Schwarzer et al. 2009;Schwarzer et al. 2012), as well as $30 species endemic to Lake Tanganyika (the "Tropheini") (Salzburger et al. 2005).
Cichlid fishes perfectly exemplify the plastic components of sexual reproduction in that closely related species feature various breeding systems and a variety of SD mechanisms including ESD and GSD systems (Rö mer and Beisenherz 1996;Cnaani et al. 2008;Ser et al. 2010;Yoshida et al. 2011;Parnell et al. 2012;Parnell and Streelman 2013;Reddon and Hurd 2013;Kudo et al. 2015;Bö hne et al. 2016;Roberts et al. 2016;Peterson et al. 2017;Feulner et al. 2018;Gammerdinger et al. 2018a, b) (fig. 1A). Cichlids are, thus, an excellent model to study the dynamics of SD system evolution. Previous research on the evolution of SD systems in African cichlids lends some support to the "limited options" hypothesis. Two particular chromosomes (corresponding to LG5 and LG7 in the Nile tilapia genome, an outgroup species to the East African Great Lakes, often used as reference) have repeatedly been recruited as sex chromosomes in different species of the East African Great Lakes (Parnell et al. 2012;Kudo et al. 2015;Bö hne et al. 2016;Roberts et al. 2016;Peterson et al. 2017;Gammerdinger et al. 2018a;Ser et al. 2010).
In this study, we approach cichlid sex determination from a phylogenetic perspective by investigating sex chromosome signatures in the Pseudocrenilabrus philander species complex, a member of a sister-clade to the modern haplochromines of Lakes Victoria, Malawi, and Tanganyika. We sampled two populations for whole-genome sequencing in northern Zambia: Mbulu creek and Lake Chila, a small lake 20 km south of Lake Tanganyika, which is connected to the Mbulu creek via its outflow ( fig. 1B and C). The P. philander species complex (Katongo et al. 2005;Koblmuller et al. 2012) comprises two major mitochondrial lineages, one representing the Zambezi-Kafue drainage and one lineage of mainly Congolese origin (Egger et al. 2015). Both lineages occur in Lake Chila, with the Zambezi-Kafue lineage being far more frequent (Egger et al. 2015). Population assignment tests based on microsatellite data suggest that the two lineages represent a single panmictic population. The Mbulu creek population belongs to the Zambezi-Kafue lineage and experienced genetic bottlenecks probably induced by strong seasonal variation in water volume (Egger et al. 2015). Upon the inspection of 24 newly sequenced P. philander genomes and a marker-based approach in a larger set of individuals, we provide strong evidence for an XX-XY SD system on LG7 in  Cnaani et al. (2008), Feulner et al. (2018, Gammerdinger et al. (2018a, b), Kudo et al. (2015), Parnell and Streelman (2013), Peterson et al. (2017), Roberts et al. (2016), Ser et al. (2010), and Yoshida et al. (2011). Haplochromine lineages are depicted in green. (B) Map of East Africa and a zoom on the sampling locations: 1, Lake Chila and 2, Mbulu creek. (C) Male specimen of Pseudocrenilabrus philander. (D) PCA on genome-wide variant data of all P. philander individuals of this study. PC1 separates the lake individuals from the creek population. PC3 separates males from females. The outlier MJB7 and the potential sex-reversed individual MJA8 are highlighted: dark gray: Lake Chila, light gray: Mbulu creek, red: females, and blue: males. the lake population. We could not detect this or any other GSD system in the genomes of the creek population. We compare our results to an XX-XY system in the same genomic region of cichlids from Lake Malawi (Ser et al. 2010;Parnell and Streelman 2013;Peterson et al. 2017). Finally, we show that the XX-XY SD system on LG7 in P. philander possibly evolved within Lake Chila, because it seems absent in other populations of the P. philander species complex.

Materials and Methods
Sampling, DNA Extraction, and Sequencing For this study, we sampled six males and six females of P. philander from Lake Chila and 12 individuals (4 males, 3 females, and 5 juveniles) from the adjacent Mbulu creek for wholegenome sequencing ( fig. 1). In addition, we included 78 specimens sampled for a previous study (Egger et al. 2015) for polymerase chain reaction (PCR) genotyping (see below). Fin clips and whole specimens were preserved in ethanol. Individuals were sexed by visual inspection of the gonads and body coloration. Five specimens from Mbulu creek did not show distinguishable gonads and were defined as juveniles. DNA was extracted from fin clips with EZNA Tissue DNA Kit (Omega Bio-Tek). Individual genomic libraries were prepared with TruSeq DNA PCR-free Low Sample Kit (Illumina), pooled per population and subsequently sequenced (150 bp pairedend) on four lanes of an Illumina HiSeq3000 by the genomics facility of the D-BSSE (Basel, Switzerland; supplementary table S1, Supplementary Material online). Sequencing data were deposited in the SRA (SRP148476). Research involving animals was performed with approval of the Swiss authorities under a research permit issued by the Lake Tanganyika Research Unit, Department of Fisheries, Mpulungu, Zambia.

Population Structure and Phylogeny
To assess population structure, between-population genomewide F ST , average d xy (absolute divergence), and average p (nucleotide diversity) were calculated in 10 kb windows on the filtered VCF file including single nucleotide polymorphisms (SNPs) and indels using evo (https://github.com/millanek/evo/; last accessed January 23, 2019). Average d a (net divergence) was calculated using Nei and Li's formula: d a ¼ d xy -(p x þ p y )/2 (Nei and Li 1979). Tajima's D was calculated for each population in 10 kb windows in VCFtools 0.1.14 (Danecek et al. 2011). Population structure was examined on the whole-genome VCF data set with a principal component analysis (PCA) using smartPCA (Eigensoft 6.1.4) (Patterson et al. 2006). Alignments to the mitochondrial reference scaffold NC_013663.1 were extracted from individual BAM files, sorted with SAMtools 1.3.1. ) and converted to fastq format using Picard 2.8.0 SamToFastq (http://broadinstitute.github.io/picard; last accessed January 23, 2019). Mitochondrial genomes were reconstructed from these reads with MIRA 4 (Chevreux et al. 1999). The regions corresponding to the control region (D-Loop) were subsequently extracted and aligned with additional public sequences from the P. philander species complex (sequences from Egger et al. [2015]); using MAFFT online service 7 (Katoh et al. 2017) under the FFT-NS-i option, that is, with fast construction of an initial alignment followed by iterative refinement until convergence. Identical sequences were collapsed into haplotypes using DNA collapser (FaBox) (Villesen 2007). Bayesian inference of phylogeny was done in MrBayes 3.2.2 (Ronquist et al. 2012). Posterior probabilities were obtained from Markov chain Monte Carlo simulations in two independent runs (10 chains with 10 Mio generations each, chain temperature: 0.25, trees sampled every 1,000 generations) using the best-fit model of molecular evolution as suggested by jModelTest (Posada 2008). A 50% majority-rule consensus tree was constructed after a 1 million generation burn-in (chain stationarity and run parameter convergence were checked with Tracer 1.6, http://tree.bio.ed.ac.uk/software/ tracer/; last accessed January 23, 2019, using posterior probability as a measure of clade support). A whole nuclear genome phylogeny was built by reconstructing for each individual a sequence corresponding to the first haplotype of each linkage group using samtools faidx (LG) ) and bcftools consensus -haplotype 1 (BCFtools 1.5, https://samtools.github.io/bcftools/; last accessed January 23, 2019). The sequences of each linkage group were then concatenated and merged into one sequence per individual using EMBOSS union (Rice et al. 2000). Maximum likelihood inference was done with RAxML 8.2.11 (-k, À# 100, Àf a) (Stamatakis 2014). Branch length estimation (Àk) is given in number of mutations per bp per generation. In order to obtain divergence times in number of generations, we used the Lake Malawi cichlid mutation rate estimation of 3.5 Â 10 À9 per bp per generation (95% CI: 1.6 Â 10 À9 to 4.6 Â 10 À9 ) from Malinsky et al. 2018. The VCF file was phased and genotypes were imputed with Beagle 4.1 Browning 2007, 2016). For topology weighting, we used Twisst (Martin and Van Belleghem 2017) with 1, 5, and 10 kb windows to infer if Chila and Mbulu males were more closely related to each other than to the females of their respective population in a specific region of LG7.

Sex Chromosome Identification and Characterization of the Type of SD System on LG7
Male-female F ST and difference in nucleotide diversity (p diff ¼ p males À p females ) were calculated in 10 kb windows on the filtered VCF file including SNPs and indels with evo (https:// github.com/millanek/evo/; last accessed January 23, 2019).
We tested for a difference in nucleotide diversity between males and females of each population with a Welch two sample t-test in R 3.4.2. (R Core Team 2017). We calculated malefemale F ST per population (five males vs. five females for Lake Chila; four males vs. three females for Mbulu creek) as well as for both populations combined. A maximum likelihood phylogeny was reconstructed as described above on LG7 only and on all chromosomes excluding LG7. A relatedness statistic (unadjusted Ajk statistic) (Yang et al. 2010), of all individuals was calculated separately for LG7 and for all the remaining chromosomes in VCFtools (-relatedness and -chr LG7 ornot-chr LG7). F IS (inbreeding coefficient) was calculated separately for LG7 and all LGs excluding LG7 in VCFtools (-het and -chr LG7 or -not-chr LG7) for each individual within its respective population. The inbreeding coefficient F IS was also calculated in 10 kb windows per sex within each population along LG7 and correlated to male-female F ST following the method described by Rodrigues and Dufresnes (2017). To obtain the average normalized F IS value per sex for each 10 kb window, the per individual genome-wide F IS value excluding LG7 was subtracted from the LG7 individual F IS value. Then, the individual normalized F IS values were averaged per sex. Next, we selected biallelic sites from the initial filtered, unphased VCF file for five males and five females from the lake population of the same mitochondrial linage resulting in a total of 30,811,926 sites. We selected sites for which all females were homozygous and all five males heterozygous (XY-sites) as proposed by Brelsford et al. (2017). XY-sites on LG7 were annotated using SnpEff 4.3 (Cingolani et al. 2012).

De Novo Genome Assemblies and Alignment
We followed the pipeline described in Malmstrøm et al. (2017) to generate a female and male draft genome de novo assembly for Lake Chila P. philander using CeleraAssembler 8.3 (Myers et al. 2000) and FLASh 1.2.11 (Magoc and Salzberg 2011), pooling the raw reads of three females and three males. Assembly quality was assessed with QUAST 4.5 (Gurevich et al. 2013) and assembly completeness with BUSCO 3 (Simao et al. 2015) (supplementary table S2, Supplementary Material online). To anchor contigs onto the O. niloticus reference genome, we used LAST 861 (lastdb -uNEAR -cR11; lastal -m75 -E0.05) (Kiełbasa et al. 2011). MAF alignment output was converted into tabular format with LAST. Female alignments to LG7 were extracted from the tabular output and filtered to keep scaffolds of >2 kb length and alignment sequence coverage of 50% resulting in 3,340 contigs representing the X chromosome. Scaffolds were ordered based on the start position of their longest alignment. For comparative purposes, we extracted the female scaffolds aligning to LG6 with the same settings (2,048 contigs).

Sequence Coverage Analysis
Coverage was calculated for each sex from mapping against the de novo assembled genomes. Quality filtered reads of the five male and five female individuals of Lake Chila were mapped against the female and male draft genome using bwa-mem of BWA (Li and Durbin 2009). Alignments were converted to BAM format, sorted, and indexed with SAMtools 1.3.1 ). Coverage per individual per site was calculated with samtools depth -aa (SAMtools 1.3.1) ). The median coverage against the female de novo assembly over all sites and all individuals per sex for each population was calculated in R 3.3.1, resulting in 17 for Lake Chila males and 18 for Lake Chila females. We did the same analysis keeping only alignments with zero mismatches resulting in a median coverage of 3 in Lake Chila males and 4 in females. Next, we calculated median coverage per site and sex for the scaffolds anchored to LG7 and LG6 (for comparative reasons) and normalized it by the sex-specific median. From these values, we calculated averages of 10 kb windows, which were log 2 transformed for plotting. These steps were run in R using the packages reshape 0.8.7 (Wickham 2007), miscTools 0.6-22 (Henningsen and Toomet 2016), zoo 1.8 (Zeileis and Grothendieck 2005), and ggplot2 2.2.1 (Wickham 2009). From the mapping against the male de novo assembly, we identified regions of "male-only-coverage" (potential Y-specific sequences) as regions in which consecutive positions of 1 kb length had coverage in at least four out of the five males with a total coverage >5 and a coverage over all females <3.

K-mer Analysis and Assemblies
To assemble Y chromosome-specific sequences, we followed a method described by Akagi et al. (2014). We identified Yspecific reads over their difference in k-mer composition compared with female reads. Raw reads were filtered with Trimmomatic 0.36 (Bolger et al. 2014) (PE mode, adapters. fasta.2:30:10LEADING:3TRAILING:3SLIDINGWINDOW:4: 15MINLEN:5). From the trimmed reads, we generated k-mer tables for all 37 k-mers starting with the trigger sequence "AG" and having at least 5 counts reducing the k-mer complexity and computational cost as established by Akagi et al. (2014) using a Python script provided by the Comai lab. Using "CT" as the trigger sequence yielded similar results (data not shown). For comparative reasons, we applied the same method to a human data set of Great Britain ancestry from the 1000 Genomes Project Consortium (http://www.internationalgenome.org/; last accessed January 23, 2019, samples ERR020230, ERR050089, SRR189815, ERR050086, SRR068180, and SRR190845) that had already been used in a k-mer assembly for Y chromosomes (Carvalho and Clark 2013).
Resulting male and female k-mer counts were compared and potential Y-k-mers identified as k-mers that had >9 counts in males but <5 counts in females resulting in 3,612,202 unique Y-k-mers (out of 130,094,951 total unique k-mers). We extracted male reads matching these Y-k-mers and their mate with bbduk (BBTools 37.57, https://jgi.doe. gov/data-and-tools/bbtools/; last accessed January 23, 2019).
The resulting 55,627,673 read pairs were de novo assembled with MEGAHIT 1.1.1 (Li et al. 2015) with stepsize 10, kmin 21, kmax 121, and minimum length 1 kb. Male and female reads were back-mapped on the so-obtained 122,977 contigs. We removed contigs that had over 50 reads coverage at a single position in at least one male (likely individual specific repetitive elements) and those with a 5 read coverage in females. The resulting 233 contigs were blasted against the male and female draft genomes (Blastþ 2.6.0, BlastN with -qcov_hsp_perc 70 and -num_alignments 10, all other settings in default) (Camacho et al. 2009), and discarded if they had a match to the female genome with !95% sequence identity. From these remaining 138 contigs, 35 were also present in the full male genome assembly. The 138 contigs were loaded into Blast2GO (Conesa et al. 2005) and scanned for coding sequences with the integrated version of AUGUSTUS (Hoff and Stanke 2013) and Danio rerio as reference organism. Obtained genes were blasted against nr (BlastX), searched against Interpro, mapped, and annotated with default settings within Blast2GO. We calculated male and female coverage for these contigs following the same method as described for the stringent method of X-chromosomal coverage.

K-mer Composition of the X Chromosome
To extract k-mers from the X chromosome, we selected kmers that had a female/male count ratio between 1.75 and 2.25. The obtained 7,227,218 k-mers were blasted against the reconstructed X chromosome of P. philander with BlastNshort allowing only for perfect matches and maximum 10 alignments per query (Blastþ 2.6.0) (Camacho et al. 2009) resulting in 424,156 k-mers placed on the X chromosome.

Comparison of LG7 in Other Cichlids
LG7 carries an XY system in cichlids from Lake Malawi. WGS sequences for Astatotilapia calliptera, Aulonocara stuartgranti, and Lethrinops lethrinus were downloaded from the SRA (accession numbers in supplementary table S3, Supplementary Material online), transformed to fastq, trimmed, quality filtered, and mapped to the Nile tilapia genome as described above. Variant calling, filtering, and phasing were also performed as described above. For each individual (24 P. philander and 6 from Lake Malawi), a sequence corresponding to the first haplotype of LG7 was extracted using bcftools consensus -haplotype 1 BCFtools 1.5 (https://github.com/samtools/bcftools; last accessed January 23, 2019). Maximum likelihood inference and subsequent divergence time estimation were performed as described above. To infer if P. philander males and Lake Malawi males were more closely related in a specific region of LG7 than to their respective females, fixedlength phylogenies were calculated with Twisst (Martin and Van Belleghem 2017) using 1, 5, and 10 kb window sizes. For each window size, the support for each topology was quantified by counting the number of windows supporting strongly (100% data; >75% data) or moderately (>66% data) each topology. For comparative purposes, the same topology weighting analysis was also performed on LG6.
We extracted reads aligning to the genomic region of gsdf plus 2 kb up-and down-stream (O. niloticus: NC_031972.1: 17,568,814-17,579,211). Alleles per individual of the gsdf region were de novo assembled using SeCaPr (Andermann et al. 2018) and maximum likelihood phylogenies were conducted as described above. In the same way, we constructed phylogenies for eight candidate genes of sex determination (supplementary table S7, Supplementary Material online, candidate genes are marked in yellow and genomic coordinates from the reference genome are indicated).
Sex-specific variant sites for Lake Malawi cichlids were retrieved from O'Quin (2014) and visually inspected. Sequences for two XX-XY loci described by Parnell et al. (2012) and Parnell and Streelman (2013) were downloaded from SNPdb and placed on the Nile tilapia genome using Blast (Camacho et al. 2009). Marker 27028 (SNP: rs267732628) is located on scaffold NW_017615339.1: 59,608-59,966.
Marker 45045 (SNP: rs267732730) is located on NC_031972.1: 1,010,601-1,010,981. We extracted raw reads corresponding to these regions with SAMtools ). BAM files were sorted and indexed using SAMtools ). Genotypes of the two SNPs for each individual (24 P. philander and 6 Lake Malawi cichlids) were visually inspected using SAMtools tview ).

Genome-Wide Statistics, Population Structure, and Demography
The 24 individuals sequenced in this study could all be assigned to previously identified mitochondrial DNA (mtDNA) haplotypes and fell into clades described by Egger et al. (2015). All specimens from Mbulu creek and eight Lake Chila specimens featured mtDNA haplotype Ht13 (supplementary fig. S1A, Supplementary Material online) (Egger et al. 2015), three Lake Chila specimens had Ht18 of the Kafue-Zambezi lineage; and one Lake Chila sample, MJB7, displayed Ht32 (table 1 and  Aligning the P. philander genome sequences to the O. niloticus reference genome resulted in 38,260,972 variant sites (SNPs and indels, table 1). The mean sequencing coverage per individual ranged from 12.7Â to 16.4Â (table 1) being in a range that allows accurate genotyping of heterozygous sites with the GATK multisample caller (Cheng et al. 2014;Meynert et al. 2014). A whole-genome nuclear phylogeny showed that Lake Chila and Mbulu creek populations are reciprocally monophyletic, with an estimated coalescence time of about 620,000 generations for Mbulu creek (95% CI: 472,000-1,357,000) and 912,000 generations for Lake Chila (95% CI: 694,000-1,996,000, supplementary table S4 and supplementary fig. S1B, Supplementary Material online). Genome-wide F ST between the two populations was 0.538, average d xy (absolute genetic divergence) was 0.00764, and average d a (net genetic divergence) was 0.00411 (table 2). Lake and creek individuals were clearly separated on PC1 in a genome-wide PCA (fig. 1D). The Mbulu creek population displayed low levels of within population nucleotide diversity p (0.00193; $2.6-fold smaller than Lake Chila) and a highly positive Tajima's D (0.42; $8-fold larger than Lake Chila, table 2), indicative of an excess of haplotypes compared with the number of segregating sites, compatible with an ongoing population contraction event, that is, a bottleneck. This reduction in effective population One female of the Lake Chila population (MJB7) displayed a negative genome-wide F IS (table 1 and supplementary table  S5, Supplementary Material online), indicating that its heterozygosity is higher than expected under Hardy-Weinberg equilibrium; furthermore, it belongs to a different mtDNA lineage and is clearly separated from all other individuals in PC2 of the genome-wide PCA ( fig. 1D). Taken together, this is suggestive of MJB7 being a hybrid between a Lake Chila Pseudocrenilabrus individual and an unknown second parent. To avoid any bias potentially induced by the high levels of heterozygosity, MJB7 was excluded from further analyses.
Interestingly, PC3 of the genome-wide PCA separated males and females from Lake Chila ( fig. 1D). This signal cannot be explained by intralake genetic structure, as males and females share mtDNA haplotypes (table 1 and 1D). One phenotypic male from Lake Chila (MJA8) clustered with the Lake Chila females, suggesting that it is a sex-reversed individual ( fig. 1D). Therefore, MJA8 was also excluded from further analyses.
LG7 Functions as a Sex Chromosome in the Lake Population of P. philander Given the clear-cut separation of males and females in PC3 of the genome-wide PCA ( fig. 1D), we next aimed to identify the genomic region responsible for the differentiation between the sexes. We first calculated genome-wide F ST between males and females within each population and F ST per chromosome. The average genome-wide male-female F ST within the lake population was 0.04 (average male-female F ST excluding LG7: 0.032), whereas the average F ST for LG7 was 0.18 indicating a large region of male-female differentiation on this chromosome ( fig. 2A and supplementary fig. S2, Supplementary Material online). Next, males and females formed distinct clades in a phylogeny on variant data of LG7 only, whereas no such grouping was found when all LGs excluding LG7 were considered ( fig. 3) fig. S4, Supplementary Material online). Finally, we performed a topology weighting analysis, using four different "populations": Lake Chila males, Lake Chila females, Mbulu creek males, and Mbulu creek females. This analysis did not reveal any region where Chila and Mbulu males were more closely related to each other than they were to females (supplementary fig. S6, Supplementary Material online). Therefore, we did not find any evidence for a common sex locus between the two populations.  Male-female F ST and difference in nucleotide diversity between sexes (p diff ¼ p males À p females ) along LG7. Each gray dot represents a single value per 10 kb window. Black line: smoothed value (loess parameter ¼ 0.01) and red line: no difference in nucleotide diversity between males and females.
LG7 Harbors an XY System in the Lake Population In a simple sex-chromosomal system, the heterogametic sex shares half of its sex-chromosomal alleles with the homogametic sex (e.g., X alleles in an XX-XY system and Z alleles in a ZZ-ZW system), whereas Y/W alleles are specific to the heterogametic sex. This results in an expected maximum malefemale F ST of 0.5 for completely sex-differentiated sites (i.e., if the allele frequency for the heterogametic sex is 0.5 and the allele frequency for the homogametic sex is 1, then the expected F ST is 0.5 in an infinite population) (Brelsford et al. 2017;Fontaine et al. 2017;Rodrigues and Dufresnes 2017). Furthermore, the heterogametic sex (XY or ZW) shows an excess of heterozygous sites compared with the homogametic sex, reflected by negative F IS values. Consequently, F ST and F IS show a negative correlation in the heterogametic sex (Rodrigues and Dufresnes 2017).
In Lake Chila P. philander, males had negative F IS values on LG7 (table 1 and supplementary table S5, Supplementary Material online), indicating higher levels of heterozygosity in males. Furthermore, females had higher F IS values on LG7 (0.31-0.45) compared with the rest of the genome excluding LG7 (0.15-0.22), denoting low levels of heterozygosity on LG7 (table 1). Males of the lake population also showed significantly higher nucleotide diversity (p) compared with females ( fig. 2B  and supplementary fig.S7, Supplementary Material online) and a negative correlation between F IS and male-female F ST on LG7 (supplementary fig. S8A, Supplementary Material online), strongly suggesting that males are the heterogametic sex and that LG7 functions as an XX-XY system.
In the Mbulu creek population, males also displayed higher p compared with females (supplementary fig. S7, Supplementary Material online), yet the male-female difference in mean p was much smaller than for males and females of Lake Chila (mean p Lake Chila males: 0.0039; mean p Lake Chila females: 0.0030; mean p Mbulu creek males: 0.00103; and mean p Mbulu creek females: 0.00094). Moreover, individual F IS values did not differ between males and females on LG7. They were higher in both sexes than their corresponding Sex Chromosome Differentiation and the SD Region in P. philander from Lake Chila To further delimit the SD region in Lake Chila fish, we identified sites that showed an XY sex-specific pattern, that is, sites for which all females are homozygous and all males heterozygous. We identified a total of 41,309 XY-patterned sites across the genome, of which the great majority (38,429; 93%) is placed on LG7 (fig. 4A). The XY-sites of P. philander were distributed along the entire chromosome with a slightly higher frequency at $7-12 Mb, (fig. 4B) and less to no sites between 27 and 60 Mb with the exceptions of two peaks between 45 and 55 Mb. This block-like distribution of XYsites might indicate regions of suppressed recombination (e.g., sex chromosome strata) (Lahn and Page 1999), probably caused by chromosomal rearrangements. Alternatively, and probably more likely, these blocks indicate chromosomal rearrangements between P. philander and the used reference genome O. niloticus and hence a difference in sequence order. The distribution of XY-patterned sites suggests that the SD region is located in the first 25 Mb of LG7. Again, we did not observe such a The number of XY-sites in Lake Chila P. philander exceeded that reported for other cichlid sex-chromosomal system. In the Nile tilapia (O. niloticus), for example, LG1 has a 9-Mb large XY SD region, which contains 12,225 such sites (out of 38,718 total sex-differentiated sites) (Conte et al. 2017). In the blue tilapia, Oreochromis aureus, LG3 carries a ZZ-ZW SD system, which shows 24,983 sex-differentiated sites (total differentiated sites in the genome 103,406) (Conte et al. 2017).
As a next step, we functionally annotated the Lake Chila XYsites to investigate the effect of variants on coding sequences. The highest density of nonsynonymous sites with "moderate" or "high effect" (i.e., coding sequence variant, frameshift, missense mutation, insertions, deletions, and inversions) was detected between 22 and 23 Mb of LG7 (supplementary fig.  S10 and supplementary table S6, Supplementary Material online). The 43 "high effect" variants were located in 33 genes. As expected, Lake Chila males were heterozygous and females homozygous for these SNPs and, all Mbulu creek individuals were homozygous, matching the female lake genotype.
To further investigate the extent of sex-chromosomal differentiation and delimit the SD region, we analyzed malefemale differences in sequence coverage along the sex chromosomes. To avoid any potential bias introduced by using the O. niloticus reference genome, we generated a male and a female draft genome assembly for the lake population. In species with heteromorphic XY sex chromosomes, the X chromosome is present in a hemizygous state in males, resulting in $50% reduced sequencing coverage for the X in males compared with the X in females or any autosome. When all read alignments with default mapping parameters in the two sexes were considered, which is the standard approach (e.g., Vicoso et al. 2013), no difference in sequence coverage was visible along the X chromosome ( fig. 5B, coverage follows the expected black line). This indicates that X and Y in P. philander from Lake Chila are at early stages of sex chromosome differentiation. However, when considering only perfect alignments (excluding alignments that contain any mismatch), a drop in male sequence coverage became To further investigate this pattern of sex chromosome differentiation, we built a catalog of 37-bp-long subsequences (k-mers) and counted their presence in the male and female reads ( fig. 6). Although the sex chromosomes of P. philander are certainly much younger and much less differentiated than the one in humans, the k-mer comparison between males and females is similar in these two species (fig. 6). X-linked k-mers are clearly visible in both species as the second largest cloud with higher counts in females than in males ( fig. 6, red circle). We investigated the location of potential X-linked kmers in the female X chromosome assembly which revealed their highest frequency at $12.5 Mb (corresponding to $15.4 Mb on LG7 in the reference genome, supplementary  fig. S11, Supplementary Material online). Combining the analyses of XY-sites, coverage and X-linked k-mers, the SD region of P. philander Lake Chila is likely located at 0.3-16 Mb on LG7.
This region has 518 protein-coding gene annotations in the reference genome assembly. A full overview of these genes with corresponding gene ontologies is provided in supplementary table S7, Supplementary Material online, and genes with a potential role in SD are highlighted in yellow. These include two HMG-domain genes, a protein domain also encoded by the mammalian SD gene Sry (Sinclair et al. 1990), and foxl1 and foxd1, belonging to the forkhead box family of transcription factors, which play a role in ovarian development and function (Ottolenghi et al. 2005;Uhlenhaut and Treier 2011). They further include wt1, which regulates early gonad development in mammals (Wilhelm and Englert 2002).

Two Reference-Free Approaches to Detect Y-Chromosomal Candidates in the Lake Population
In an XX-XY system, Y chromosome-specific sequences are not present in females resulting in zero sequencing coverage of such regions by female sequencing reads. We searched the male de novo genome assembly for regions of male-only coverage of at least 1 kb in length and detected 12 such regions located on 11 different scaffolds. The longest region was 2,124 bp long. When compared with the reference genome, ten of these scaffolds were placed on LG7 (eight within the first 10 Mb of LG7, supporting the analyses above that this is the SD region) and one on the unplaced scaffold NW_017613955.1. A BlastX search of the candidate regions revealed similarities to five coding sequences (the ubiquitinprotein ligase herc3 in the 2,124-bp region, two transposable element related sequences, two uncharacterized proteins) and two ncRNAs (supplementary table S8, Supplementary Material online). In the creek population, all but three of these regions showed sequence coverage in both sexes. These three remaining regions, which included the one with herc3, do apparently not exist in the creek population genomes (supplementary fig. S12, Supplementary Material online).
Although X and Y are clearly differentiating in Lake Chila P. philander, (most of) our analyses revealed a substantial degree of sequence similarity between X and Y and also could not delimit the SD region further than to the first $16 Mb of LG7. Our male de novo genome assembly likely contains a consensus assembly for X/Y haplotypes of LG7. When sequencing a male genome of a diploid XY species, Y-specific sequences will have reduced coverage in comparison to autosomal regions. Also, differentiating Y chromosomes typically accumulate repetitive sequences (Chalopin et al. 2015). These two factors may hamper the reconstruction of Y chromosomes using standard assembly tools (Tomaszkiewicz et al. 2017). To identify sequence information derived from Y-specific male-only regions also potentially missing in the reference genome, we applied a method described by Akagi et al. (2014) that makes use of k-mers. We extracted male-specific k-mers from the above-mentioned k- Counts of 37 bp k-mers in human males and females. Humans have strongly differentiated sex chromosomes. K-mers derived from the Y chromosome are expected to have zero counts in females; k-mers derived from the X chromosome should have half the count in males than in females. Potential Y-k-mers are highlighted with a blue circle, X-mers with a red circle. mer catalog and used reads containing them for a targeted assembly of putative Y-chromosomal contigs. We obtained 138 Y-contigs containing 48 potential genes (supplementary table S9, Supplementary Material online), of which 38 could be functionally annotated. Strikingly, 15 of these genes ($30%) showed strong similarities to transposable elements, suggesting a higher transposable element content on the P. philander Y chromosome than the genome-wide average for cichlids of 16-19% (Brawand et al. 2014), a characteristic feature of sex chromosomes (Chalopin et al. 2015).
Among the other genes, we detected two genes involved in spermatogenesis, psmb2 (Gupta 2005) and kelch10 (Yan et al. 2004). We also recovered one of the uncharacterized proteins that we previously identified in the full de novo male assembly in a region with zero female coverage (uncharacterized protein K02A2.6-like), which functions in nucleic acid and zinc ion binding (supplementary tables S8 and S9, Supplementary Material online). This gene contains a retropepsinlike domain of invertebrate retrotransposons (DeMarco et al. 2005).
LG7 Probably Evolved Twice as a Sex Chromosome in Haplochromine Cichlids LG7 is known to function as XX-XY system in many haplochromine species endemic to Lake Malawi (Ser et al. 2010;Parnell and Streelman 2013;Peterson et al. 2017) and likely represents the ancestral sex chromosome state of the radiation in this lake (Peterson et al. 2017). We therefore aimed to examine whether or not the Lake Malawi SD system corresponds to the one we identified in P. philander of Lake Chila. To this aim, we performed a topology weighting analysis on LG7 to infer if Lake Chila and Lake Malawi males were more closely related to each other compared with the females of their respective population/species in specific genomic regions. If the XX-XY system was ancestral and shared between Lake Malawi cichlids and P. philander, one would expect that the SD locus and closely linked loci that do not recombine between X and Y cluster by sex and not by species in a phylogeny (Stock et al. 2011). We included three species (A. calliptera XX-XY on LG7 [Peterson et al. 2017]; A. stuartgranti and L. lethrinus), as these represent, to the best of our knowledge, the only currently available full-genome data covering both sexes per species in Lake Malawi cichlids. Our analyses indicated no strongly supported region in which Lake Malawi and Lake Chila males were more closely related to each other than to the females of their respective species ( fig. 7 and supplementary table S10, Supplementary Material online). Rather, the species topology was strongly supported for each window size on LG7, as well as for the non sex-linked LG6 (supplementary fig. S13 and supplementary table S10, Supplementary Material online).
An outstanding candidate gene for the SD locus on LG7 is gsdf (gonadal soma-derived factor), which has been described as a master SD gene in several fish species (e.g., Myosho et al. 2012;Rondeau et al. 2013). In agreement with this, Peterson et al. (2017) proposed gsdf as the SD gene of Lake Malawi cichlids. In another study on Lake Malawi cichlids, focusing on Metriaclima zebra and M. mbenji, O'Quin (2014) also reported sex-patterned sites in gsdf. We thus reconstructed a phylogeny for the gsdf locus in the set of Lake Malawi cichlids and P. philander. Again, male sequences of the different species did not group together (supplementary fig. S14, Supplementary Material online). When examining the sequences for individual sites, none of them supported a shared sex pattern (supplementary table S11, Supplementary Material online).
In a previous study on the Lake Malawi cichlids Cynotilapia afra and Pseudotropheus elongates, Parnell and Streelman (2013) detected two distinct XX-XY loci on LG7. We also inspected these XX-XY markers (RAD-tags 27028 and 45045, see figure 4 in Parnell et al. [2012] and Parnell and Streelman [2013]) in P. philander. The marker 45045 was homozygous (C/C) in all individuals of all species and all P. philander, A. stuartgranti, and L. lethrinus individuals were homozygous (C/C) for the marker 27028. The A. calliptera male was heterozygous (C/T) at this site and the A. calliptera female was homozygous (C/C), supporting an XX-XY pattern for this marker only in this species.
Finally, a full LG7 phylogeny including all 24 P. philander individuals and male and female individuals from Lake Malawi provides further support for a young age of P. philander's sex chromosomes, with a divergence time of $423,000 generations for females and $455,000 generations for males in LG7 (supplementary fig. S15 and supplementary table S12, Supplementary Material online). Assuming one generation per year, it is reasonable to conclude that X and Y of P. philander in Lake Chila diverged less than a million years ago, as the 95% confidence interval did not reach 1 Myr.
We also investigated eight additional single gene phylogenies for genes on LG7 with a potential role in sex determination identified as candidate genes in the SD region of P. philander from Lake Chila in this study (supplementary table S7, Supplementary Material online). Similar to gsdf and the topology weighting analysis, these gene trees mostly recovered the species tree (supplementary fig. S16, Supplementary Material online). Two genes showed differing topologies, however, with overall low support and not indicative of a shared sex locus.
LG7 Likely Evolved as a Sex Chromosome within Lake Chila The LG7 system detected in the Lake Chila population likely evolved independently from the one in Lake Malawi cichlids. Furthermore, we could not detect this system in the adjacent and closely related Mbulu creek population. Given the size of our data set used for full-genome sequencing and the question the origin of this XX-XY system, we aimed to test for the presence/absence of the LG7 XX-XY system in additional individuals of the P. philander species complex. We tested 78 individuals belonging to five clades of the P. philander species complex and P. nicholsi (Egger et al. 2015) by PCR for two markers which were Y chromosome linked in P. philander from Lake Chila, namely herc3 (identified as the largest region absent from the female genomes) and K02A2.6-like (also identified as a region with zero female coverage in Lake Chila and over Y-k-mer specific assembly). Within the Lake Chila samples (additional n ¼ 34), herc3 was present in all tested males (15), and 12 males were positive for K02A2.6-like. All but two phenotypic females were negative for the two markers. We can thus largely confirm male sex-linkage of the two markers within Lake Chila and hence the presence of an XX-XY SD system in this population. Our PCR assay also included individuals of the two divergent mtDNA haplotype lineages. However, all populations other than Lake Chila did not show sex linkage for the two markers, which were either present or absent in both sexes (supplementary table S13 and supplementary fig. S17, Supplementary Material online).

Discussion
Cichlid fishes display a breathtaking diversity in basically every phenotypic trait investigated so far including, coloration, morphology, habitat use, breeding systems, or diet (Albertson and Kocher 2006;Sefc 2011;Muschick et al. 2012;Miyagi and Terai 2013;Salzburger 2018) and sex determination is likely another flexible property of this astonishing group of fish. Here, we investigated sex chromosome evolution in a phylogeographically complex species, the haplochromine cichlid P. philander (Egger et al. 2015). We detected an XX-XY system in the Lake Chila P. philander population, whereas this signature was not detectable in the genomes of an adjacent riverine stock. The creek population likely underwent a genetic bottleneck, so it is possible that the apparent absence of any detectable SD system in this population may be due to demographic events. The creek population may have been founded by XX individuals only, or XY recombination resumed in the creek population. However, markers that were male specific in Lake Chila did not show a sex-specific pattern in specimen from six other P. philander populations nor in P. nicholsi. Given the nested placement of the Lake Chila population FIG. 7.-Topology weighting analysis of LG7. Topology weighting analysis using 1-, 5-, and 10-kb windows between the four "populations" Lake Chila males, Lake Chila females, Lake Malawi males, and Lake Malawi females.  S1, Supplementary Material online), the most parsimonious explanation for this pattern is that the XX-XY system evolved or at least differentiated within the Lake Chila population.
In agreement with this scenario, we could also not find support for a shared (ancestral) XX-XY LG7 system between Lake Malawi cichlids, and P. philander from Lake Chila neither in our divergence time estimates nor in a topology weighting analysis nor in single gene phylogenies for candidate genes of sex determination. We therefore propose that LG7 evolved repeatedly (convergently) as a sex chromosome in different lineages of haplochromine cichlids. This would lend further support to the limited options theory, that is, that certain chromosomes are particularly well suited to become sex chromosomes and evolve as such more often than other chromosomes (Marshall Graves and Peichel 2010). Marshall Graves and Peichel proposed that a likely candidate for a "limited option" is the ancestral teleost chromosome TEL6 (Marshall Graves and Peichel 2010). The sex chromosomes of several fish species are derived from TEL6 including those of the medaka Oryzias luzonensis, the sablefish Anoplomba fimbria as well as the guppy Poecilia reticulata (Marshall Graves and Peichel 2010;Myosho et al. 2012;Rondeau et al. 2013). The marker SLC45A2 that Marshall Graves and Peichel (2010) used to identify these sex chromosomes as being syntenic to TEL6 is indeed located on LG7 of cichlids (O. niloticus LG7: 17,405,428,885). Together with our study, LG7/TEL6 has been described three times as a sex chromosome in cichlids, suggesting that TEL6 evolved to become a sex chromosome in at least five lineages of teleost fish (Lake Malawi cichlids Ser et al. 2010, Lake Tanganyika cichlid Hemibates stenosoma Gammerdinger et al. 2018a, P. philander Lake Chila, medaka, guppy, and sablefish) supporting the "limited options" theory. However, other genes on cichlid LG7 are syntenic to TEL 7 (Marshall Graves and Peichel 2010), indicating additional rearrangements of LG7 in cichlids or in the lineage leading towards them. Certainly more data on cichlid sex chromosomes is needed to properly test the "limited options" theory within cichlids.
With our limited data set, we cannot exclude the presence of yet another SD system in the other P. philander populations or that the LG7 XX-XY system was present also in the creek but has secondarily been lost or started to recombine again. In addition, in the genome-wide data of the creek population we failed to detect any other sex-chromosomal system. It is possible that a SD region in this population is too small to be detected with our limited sample size. We can also not exclude that this population might rely on an ESD system or a multifactorial combination of environmental and genetic factors. We also report a sex-reversed individual in Lake Chila, as well as mismatches between phenotypic sex and Y chromosome markers in a PCR genotypic assay. If we exclude any sexing errors, this could mean an occurrence of 6-10% of individuals that also do not underlie the XX-XY system within Lake Chila. It could also mean that the markers we tested by PCR genotyping still recombine and are hence not fully Y-linked. Note that the two markers also differed in their presence-absence pattern with K02A2.6-like showing only two genotype-phenotype mismatches. It might thus be closer to the actual SD locus than herc3. Further (genome-wide) data would be needed to support either of these scenarios.
Still, specimens from the Lake Chila population showed clear signs of sex chromosome differentiation along large sections of LG7, especially in the first 16 Mb. Yet, there were also peaks in male-female F ST , XY sex-patterned sites as well as male-reduced coverage in other regions along LG7. This block-like distribution of signatures of differentiation (especially visible at 45-50 and 53-55 Mb; figs. 2 and 4) might reflect "sex-chromosome strata," which are parts of a chromosome that stopped recombining at different points of time in the past (Lahn and Page 1999). This strata formation can result from chromosomal rearrangements such as inversions, which immediately cause suppression of recombination (Sturtevant 1921). Alternatively and probably more likely, these blocks result from genome rearrangements between the reference genome O. niloticus and P. philander.
We identified several candidate genes for the SD locus in P. philander from Lake Chila based on male-specific sequence features. Among these, the most promising ones were herc3 and the uncharacterized gene K02A2.6-like. We would like to point out that herc3 is also located on the sex chromosomes in another fish, the medaka (Kondo et al. 2006), but is not the master SD gene in this species. We could not find any support for the previously known SD gene gsdf as the master SD locus in P. philander. The dating of the split between X and Y chromosomes in P. philander from Lake Chila to <$1 Myr suggests a similar age as the one proposed for the origin of the XX-XY system on LG7 in Lake Malawi cichlids (Peterson et al. 2017). When compared with the ZZ-ZW sex determination system on LG3 of another cichlid, the blue tilapia O. aureus (Conte et al. 2017), we found that there are more sex-patterned sites in P. philander than in O. aureus. This suggests a higher level of sex chromosome differentiation in P. philander. The LG3 sex chromosome system is ancestral in the Oreochromini lineage (Lee et al. 2004;Cnaani et al. 2008;Cnaani 2013), dating back to the split before O. aureaus and O. niloticus, estimated to $3 Ma (Xiao et al. 2015). Also, our comparison of male-female k-mer compositions in P. philander and humans points to a remarkable level of differentiation of the P. philander sex chromosomes despite their probably young age.
Autosomes are recruited as sex chromosomes and subsequently follow the path of sex chromosome differentiation as in P. philander or the Oreochromini lineage. Demographic events such as lake colonizations or population size fluctuations might impact the patterns of differentiation. Under which conditions a differentiated sex chromosome system represents a selective advantage remains an open question, at least for cichlids. Elegant work on sticklebacks demonstrated that newly evolving sex chromosomes contribute to phenotypic divergence and reproductive isolation between sympatric species, probably facilitating speciation (Yoshida et al. 2014). Whether or not the XX-XY system of the Lake Chila individuals causes this population to be reproductively incompatible with other populations remains to be tested. Taken together, our study highlights the contrast between genomic signatures that fit the canonical view on sex chromosome evolution (recombination suppression and sequence differentiation) and the instability that such systems nevertheless face. Remarkably, we show that sex-chromosomal systems can differ within a single cichlid species, at the level of geographically separated populations (see also Bö hne et al. [2016]), suggesting that demographic events can impact sex chromosome evolution and, vice versa, that changes in SD systems might contribute to diversification.

Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.