Extensive Phylogenomic Discordance and the Complex Evolutionary History of the Neotropical Cat Genus Leopardus

Abstract Even in the genomics era, the phylogeny of Neotropical small felids comprised in the genus Leopardus remains contentious. We used whole-genome resequencing data to construct a time-calibrated consensus phylogeny of this group, quantify phylogenomic discordance, test for interspecies introgression, and assess patterns of genetic diversity and demographic history. We infer that the Leopardus radiation started in the Early Pliocene as an initial speciation burst, followed by another in its subgenus Oncifelis during the Early Pleistocene. Our findings challenge the long-held notion that ocelot (Leopardus pardalis) and margay (L. wiedii) are sister species and instead indicate that margay is most closely related to the enigmatic Andean cat (L. jacobita), whose whole-genome data are reported here for the first time. In addition, we found that the newly sampled Andean tiger cat (L. tigrinus pardinoides) population from Colombia associates closely with Central American tiger cats (L. tigrinus oncilla). Genealogical discordance was largely attributable to incomplete lineage sorting, yet was augmented by strong gene flow between ocelot and the ancestral branch of Oncifelis, as well as between Geoffroy's cat (L. geoffroyi) and southern tiger cat (L. guttulus). Contrasting demographic trajectories have led to disparate levels of current genomic diversity, with a nearly tenfold difference in heterozygosity between Andean cat and ocelot, spanning the entire range of variability found in extant felids. Our analyses improved our understanding of the speciation history and diversity patterns in this felid radiation, and highlight the benefits to phylogenomic inference of embracing the many heterogeneous signals scattered across the genome.

1 SUPPLEMENTARY TEXT 1.1 Mapping to Geoffroy's cat reference assembly We mapped filtered reads of each sample to the Geoffroy's cat (Leopardus geoffroyi) reference assembly (GenBank assembly accession GCA_018350155.1, Bredemeyer et al. in prep).Mapping, base calling and generating a set of GFs proceeded as described for the Canada lynx reference genome (see Materials and Methods).The resulting effective sequencing depths varied between 17-27× (excl.the outgroup sample at 30×).We called bases from the BAM files for each sample to retrieve pseudohaploid consensus genomes, which we masked for the 34% of repetitive content found in the reference genome.The remaining bases in the consensus genomes covered 83-95% of the nonrepetitive part of the reference (Table S4).These genomic FASTAs were split into 24,233 GFs with a length of 100kb.A set of 21,294 GFs (88%) remained after removing all alignments in which at least one sample had more than 60% missing data.In addition to GFs from consensus genomes, we also created a variant call set from the BAM files, calling 3-9 million SNPs per sample, excluding the outgroup samples with 17-21 million SNPs (Table S4).After masking regions with repetitive content in the reference genome, 2-6 million SNPs remained per ingroup sample.

Consensus genomes not masked for repetitive content
To evaluate the consistency of phylogenomic results with and without inclusion of repetitive genomic regions, we redid our phylogenetic analyses on unmasked versions of the pseudohaploid consensus genomes, using the same methodology as described in the Materials and Methods.
Between 81-91% of the reference genome was covered by the unmasked consensus genomes with the Canada lynx reference and 81-95% with the Geoffroy's cat reference.The genomic FASTAs were split in 100kb GFs, with 23,007 GFs (96%) and 23,252 GFs (96%) remaining for the respective references after removing all alignments in which at least one sample had more than 60% missing data.Across phylogenetic results derived from both reference genomes, all but one (ASTRAL using ML tree set with Canada lynx reference) of the summary methods applied to unmasked data supported the topology shown in Figure 2A.

Demographic history of sample 'Pampas cat (Zoo)'
WGS data of the sample 'Pampas cat (Zoo)', as for other samples used in this study sequenced by Li et al. (2019), was sequenced from a cell culture derived from an individual in captivity with unknown provenance.For that reason, until more pampas cat samples with documented morphology and geographic origin become available for sequencing, we cannot allocate the current sample to any of the proposed clades in the pampas cat species complex (Kitchener et al. 2017, Nascimento et al. 2020).The demographic history of our pampas cat sample from Chile (Figure 5), which represents the Central Chilean pampas cat Leopardus colocola colocola according to Kitchener et al. (2017), is very different from that of the captive sample (Figure S17), indicating that each pampas cat group conceivably has its own distinct demographic history.Therefore, we cannot currently know which group is represented by the demography inferred from sample 'Pampas cat (Zoo)'.In addition, its demographic trajectory shows a large and abrupt increase, then decrease in equal measure, in effective population size (Ne) within the most recent estimates.This result could be an artefact of potential out-and inbreeding during breeding programs in captivity, a conclusion partially support by the high fraction of ROHs in the sample.A similar spike in estimates of recent Ne was observed in a puma (Puma concolor) sample with mixed ancestry followed by close inbreeding (Saremi et al. 2019) and in samples of captive lions (Panthera leo) (Armstrong et al. 2020).* In this study we conservatively adhere to the taxonomic scheme published by the IUCN cat specialist group (Kitchener et al. 2017) with the exception of including the subspecies eastern tiger cat (L.tigrinus emiliae), a choice that was made to distinguish this northeast Brazilian population of northern tiger cat (L.tigrinus) from the species' unsampled type population in French Guiana (L.tigrinus tigrinus).It should be noted that, based on morphological, ecological and limited genetic evidence, the eastern tiger cat has been proposed as a full species (L.emiliae) (Nascimento and Feijó 2017) and five distinct species have been proposed for the pampas cat species complex (Nascimento et al. 2020).The latter is a closely related, monophyletic group represented in our study by two samples belonging to one or two of these five taxa.The number of reads retained in the filtered BAM files was counted with BamQC (Babraham Bioinformatics 2014) and divided by the initial number of raw reads to obtain the percentage.The effective sequencing depth was estimated from the filtered BAMs with deepTools (Ramirez et al. 2016).Coverage refers to the percentage of successfully called bases for sites that are present in the reference sequence and is shown for the total reference genome and the non-repetitive (repeatmasked) part of the reference.The number of SNPs per sample in the unmasked SNP set was counted with VariantQC (Yan et al. 2019).Cf. caption of Table S3.Fractions below 1% are not shown.The analysis was conducted with QuIBL (Edelman et al. 2019).

SUPPLEMENTARY TABLES
Table S7: Genome-wide pairwise nucleotide diversity/divergence (/  ) The percentage of base differences are shown for all pairwise whole-genome comparisons.Data below the diagonal (yellow-green color scheme) are calculated from repeatmasked pseudohaploid consensus genomes.Data above the diagonal (yellow-blue color scheme) are calculated from unmasked pseudohaploid consensus genomes and represent, on average, a 5% increase to the repeatmasked data.The highest values are colored yellow while the lowest values are the most saturated green and blue.Canada lynx was used as reference genome.
Table S10: Autosomal heterozygosity and ROHs (Geoffroy's cat reference) Cf. caption of Table S8.Geoffroy's cat was used as a reference genome.The MSMC-IM model was run on coalescence rates for samples 'Andean tiger cat 32451' and 'C.Am. tiger cat' as obtained with MSMC2.We explored 16 different values for the fitting parameter "-beta", centered around the default value of b1=1×10 -8 , b2=1×10 -6 (default result indicated in red and shown in Figure S4C).For each "-beta" setting, the yellow line shows the cumulative distribution function of migration, M(t).The thin, solid line shows the MSMC rCCR and the dashed line the MSMC-IM rCCR.When converted to years, the full range of time points at 50% of the M(t), subject to different values of "-beta", amounts to 81-176 kya.For 25%, this is 44-61 kya, for 75% this is 89-470 kya.Bayesian inference of ROHs along all autosomes in sample 'C.Am. tiger cat'.Regions with a low posterior probability (low on y-axis) are identified as ROHs, equivalent to 10.0% of the autosomal part of the genome in this sample.The green line in the ROH plots denotes inference from the point estimates of local heterozygosity rates, the magenta and red lines denote inference from the upper resp.lower bounds of the heterozygosity estimates.Reads were mapped to the congeneric Geoffroy's cat reference.Demographic history of each Leopardus sample inferred with MSMC2 (Wang et al. 2020).The x-axis shows the coalescent time transformed to years, using a mutation rate μ = 8.6 × 10 -9 and a generation time of 3.8 years (Wang et al. 2022).The y-axis shows the effective population size (Ne).For each sample, dotted lines show 20 bootstrap replicates and the consensus of the replicates is shown as a bold line.

Figure S1 :
Figure S1: Summary trees inferred from unmasked genomes mapped to Canada lynx reference A) Greedy consensus of 23,007 local NJ trees, with node frequency support.B) Greedy consensus of 23.007 local ML trees, with node frequency support.C) ASTRAL consensus of the NJ tree set, with local PP support.D) ASTRAL consensus of the ML tree set, with local PP support.E) Bootstrapped NJ tree from the genome-wide distance matrix, which is the sum total of all local distance matrices.F) Topology obtained in A-C and E. G) Topology obtained in D.

Figure S2 :
Figure S2: Summary trees inferred from repeatmasked genomes mapped to Geoffroy's cat reference A) Greedy consensus of 21,028 local NJ trees, with node frequency support.B) Greedy consensus of 21,028 local ML trees, with node frequency support.C) ASTRAL consensus of the NJ tree set, with local PP support.D) ASTRAL consensus of the ML tree set, with local PP support.E) Bootstrapped NJ tree from the genome-wide distance matrix, which is the sum total of all local distance matrices.F) Overall topology obtained in A-E.G) Geoffroy's cat position in A-D.H) Geoffroy's cat position in E.

Figure S3 :
Figure S3: Summary trees inferred from unmasked genomes mapped to Geoffroy's cat reference A) Greedy consensus of 23,136 local NJ trees, with node frequency support.B) Greedy consensus of 23,136 local ML trees, with node frequency support.C) ASTRAL consensus of the NJ tree set, with local PP support.D) ASTRAL consensus of the ML tree set, with local PP support.E) NJ tree from the genome-wide distance matrix, which is the sum total of all local distance matrices.F) Topology obtained in A-E.

Figure S4 :
FigureS4: Estimates of the time of divergence between Andean tiger cat and C. Am. tiger cat A) Demographic history derived from samples 'Andean tiger cat 32451' and 'C.Am. tiger cat' with MSMC2, shown as effective population sizes over the last 2 million years.Samples were mapped to the Geoffroy's cat reference.B) Relative cross-coalescent rate (rCCR) between both samples, computed with MSMC2.Conventionally, 50% of the maximum rCCR is taken to be the time of divergence between the two populations, which here corresponds to 157 kya.C) Isolation-Migration model applied to coalescence rates with MSMC-IM, showing the inferred cumulative distribution function of migration, M(t).Dotted blue lines represent 25%, 50% and 75% probability that populations are completely admixed.The 50% probability value is the best estimate of divergence time, with the 25-75% interval used to express uncertainty.The divergence time is estimated at 171 (61-453) kya.

Figure S5 :
Figure S5: Fitting of the Isolation-Migration model with MSMC-IM

Figure S6 :
Figure S6: Branch Attachment Frequencies in the ML tree set Branch Attachment Frequencies (BAF) for a selection of samples (one representative for each species-level population) in a set of 16.186 local ML trees.The BAF indicates the frequency with which a given sample (color-coded) is found attached to a given branch, in the set of local trees.BAFs <1% are not shown.Samples were mapped to the Canada lynx reference and repeatmasked.

Figure S7 :
Figure S7: Implicit consensus network of the NJ tree set Implicit network generated from a random subset of 3,268 local NJ trees (20% of the full set) with the Consensus Network algorithm in SplitsTree5 (Huson and Bryant 2006) using median edge weights and a minimum threshold of 20% for inclusion of edges.Samples were mapped to the Canada lynx reference and repeatmasked.

Figure S8 :
Figure S8: Pairwise average f4-ratios, with the Canada lynx referenceGenome-wide pairwise f4-ratios, computed from the SNP data set.Colors correspond to statistical support (light to dark) and magnitude (blue to red) of the f4-ratio.Samples were mapped to the Canada lynx reference and repeatmasked.

Figure S9 :
Figure S9: Pairwise average Patterson's D, with the Geoffroy's cat reference Genome-wide pairwise Patterson's D, calculated from ABBA/BABA site patterns in the SNP data set.Colors correspond to statistical support (light to dark) and magnitude (blue to red) of the Dstatistic.Samples were mapped to the Geoffroy's cat reference and repeatmasked.

Figure S10 :
Figure S10: Pairwise average f4-ratios, with the Geoffroy's cat referenceGenome-wide pairwise f4-ratios, computed from the SNP data set.Colors correspond to statistical support (light to dark) and magnitude (blue to red) of the f4-ratio.Samples were mapped to the Geoffroy's cat reference and repeatmasked.

Figure S11 :
Figure S11: Pairwise f-branch-statistics, with the Geoffroy's cat reference Pairwise f-branch (fb) statistic as a measure of the fraction of introgression between extant and ancestral populations or species, inferred from the SNP data set.Samples were mapped to the Geoffroy's cat reference and repeatmasked.

Figure S12 :
FigureS12: Slope heuristic and phylogenetic networks, with the Canada lynx reference A) Slope of the negative log-pseudolikelihood of the optimal network under different numbers of hybrid edges (0-3).B-E) Optimal phylogenetic network after 50 iterations, constrained to 0, 1, 2 or 3 hybrid edges.Samples were mapped to the Canada lynx reference and repeatmasked.

Figure S13 :
Figure S13: Slope heuristic and selected phylogenetic network, with the Geoffroy's cat reference Optimal phylogenetic network inferred with PhyloNetworks (Solís-Lemus et al. 2017) under a maximum of 2 hybrid edges, using a single representative sample per (sub)species and the ML tree set as input.

Figure S14 :
Figure S14: Principal Component Analysis (PCA), with the Canada lynx reference (A) PCA with all Leopardus samples; (B) PCA with a subset comprising samples from the subgenera Leopardus and Lynchailurus; and (C) PCA with a subset comprising samples from the subgenus Oncifelis.The Canada lynx was used as a reference genome.

Figure S15 :
Figure S15: Runs of homozygosity in all autosomes of sample 'C.Am. tiger cat'.

Figure S16 :
Figure S16: Runs of homozygosity in all autosomes of sample 'Andean cat'.Bayesian inference of ROHs along all autosomes in sample 'Andean cat', equivalent to 0.8% of the autosomal part of the genome in this sample.Caption cfr. Figure S15.

Table S1 :
Samples used in this study

Table S2 :
Sequencing detailsRaw read sequencing depth was calculated as (read length x number of raw reads)/length of the reference genome.Number of duplicate reads, total GC content and the Q30 statistic were calculated with FastQC (Babraham Bioinformatics 2005).

Table S6 :
Fractions of introgression inferred with QuIBL