Genomic Reconstruction of the Successful Establishment of a Feralized Bovine Population on the Subantarctic Island of Amsterdam

Abstract The feral cattle of the subantarctic island of Amsterdam provide an outstanding case study of a large mammalian population that was established by a handful of founders and thrived within a few generations in a seemingly inhospitable environment. Here, we investigated the genetic history and composition of this population using genotyping and sequencing data. Our inference showed an intense but brief founding bottleneck around the late 19th century and revealed contributions from European taurine and Indian Ocean Zebu in the founder ancestry. Comparative analysis of whole-genome sequences further revealed a moderate reduction in genetic diversity despite high levels of inbreeding. The brief and intense bottleneck was associated with high levels of drift, a flattening of the site frequency spectrum and a slight relaxation of purifying selection on mildly deleterious variants. Unlike some populations that have experienced prolonged reductions in effective population size, we did not observe any significant purging of highly deleterious variants. Interestingly, the population’s success in the harsh environment can be attributed to preadaptation from their European taurine ancestry, suggesting no strong bioclimatic challenge, and also contradicting evidence for insular dwarfism. Genome scan for footprints of selection uncovered a majority of candidate genes related to nervous system function, likely reflecting rapid feralization driven by behavioral changes and complex social restructuring. The Amsterdam Island cattle offers valuable insights into rapid population establishment, feralization, and genetic adaptation in challenging environments. It also sheds light on the unique genetic legacies of feral populations, raising ethical questions according to conservation efforts.


ID
Table S1: Description of the individual sample from the bovine population of the Amsterdam island.Birth year were approximated from individual dental age estimation.All genotyped data (including the new ones) are available from the WIDDE repository (Sempéré et al., 2015) and newly generated whole genome sequence data are publicly available in the NCBI SRA repository (PRJNA1010533 project).35 (Gautier et al., 2009) Table S2: Sample description with country of origin and population type (i.e.EUT=European Taurines, AFT=African Taurines, AFZ=African Zebus, ZEB=Zebus of Indian origin, or HYB=Hybrids); the country of sampling; the GPS coordinate (in decimal degrees) used to retrieve environmental covariates; the number of individuals genotyped on the BovineSNP50 Illumina assay and included in the W50K dataset; and the number of individuals sequenced (WGS) with associated reference.GPS coordinates specifying the environment of each population were taken as the birthplace of the breed including for GIR and NEL in Brazil (Gautier et al., 2010).Individuals from the Jersey (EUT) breed were divided into two distinct groups corresponding to i) 21 individuals (coded JE2) sampled in northwestern France (Maine-et-Loir county) and born between 1982 and 1990 (Gautier et al., 2010); and ii) 28 individuals (coded JER) sampled in the USA (Matukumalli et al., 2009).The newly genotyped (n=49) and sequenced (n=24) individuals are highlighted in bold.All genotyped data (including the new ones) are available from the WIDDE repository (Sempéré et al., 2015) and the newly generated whole genome sequence data are publicly available in the NCBI SRA repository (project PRJNA1010533).
Pop.      et al., 2010).The names of the 12 (resp.6) individuals collected in 1992 (resp. 2006) are colored in blue (resp.light blue).The eight WGS individuals (all sampled in 1992) are highlighted in bold, and the names of the ten individuals deemed unrelated (up to the second degree) followed by an asterisk ( ⋆ ) for identification.Among the 66 pairs of 1992 samples, the estimated kinship coefficients ranged from -0.039 to 0.26 (median of 0.014).The one between TAF 5609 (born in 1985) and TAF 5608 (born in 1989) was within [0.177, 0.354], suggesting a first degree relationship inferred to be a parent/offspring relationship from IBD-segment sharing (here mother/daughter from the birthdates and sex).The four coefficients between TAF 5608 and TAF 5612 (ϕ = 0.144); TAF 5610 and TAF 5617 (ϕ = 0.131); TAF 5609 and TAF 5619 (ϕ = 0.101) and TAF 5608 and TAF 5610 (ϕ = 0.0887) lie within [0.0884, 0.177] suggesting a second degree relationship.Analysis of IBD-segment sharing allowed upgrading the relationship between TAF 5608 and TAF 5612 to a full-sib relationship, birthdates suggesting halfsibs or aunt-niece relationships for the three other ones.The 61 remaining coefficients between the 1992 individuals were all < 0.0884 (relationship more distant than second-degree).Likewise, among the 15 pairs of 2006 samples, the estimated kinship coefficients were < 0.0884, ranging from -0.035 to 0.074 (median of 0.013); as the 72 pairwise comparisons between individuals collected in 1992 and 2006 which ranged from -0.091 to 0.062 (median of -0.0081).The TAF/PMT pairs have the lowest pairwise-F S T compared to others, likely because the PMT (EUT) breed has a small amount of ZEB ancestry, similar to the TAF (see Figure S3 and (Gautier et al., 2010).A tree consisting of two ZEB (GIR and NEL), one AFT (NDA) and two distantly related EUT breeds (JE2 and TAR) was first used as a scaffold to which the SHK (East-African AFZ) and ZMA were jointly added to represent the history of Indian Ocean Zebus (Magnier et al., 2022).The resulting best fitting graph is plotted in A) and was in agreement with results obtained by (Magnier et al., 2022)   F 4 (EUT a , AFT ; X, ZEB) = αe a + γ Hence, the EUT population EUT a , the closest to the European source of X (eu X ) is expected to display the highest f 4 value for the (EUT, AFT ; X, ZEB) configurations.In addition, F 4 (EUT b , ZEB; Z MA, EUT a) = (z 1 + b)(β − 1) − e = τ F 4 (EUT b , ZEB; Z MA, X) = ατ The F 4 -ratio= F 4 (EUT b ,ZEB;Z MA,X) F 4 (EUT b ,ZEB;Z MA,EUT a) then provides an estimator of the European source admixture rate α.

Figure S1 :
FigureS1: Heatmap of the kinship matrix among the 18 TAF individuals estimated with King(Manichaikul et al., 2010).The names of the 12 (resp.6) individuals collected in 1992 (resp. 2006) are colored in blue (resp.light blue).The eight WGS individuals (all sampled in 1992) are highlighted in bold, and the names of the ten individuals deemed unrelated (up to the second degree) followed by an asterisk ( ⋆ ) for identification.Among the 66 pairs of 1992 samples, the estimated kinship coefficients ranged from -0.039 to 0.26 (median of 0.014).The one between TAF 5609 (born in 1985) and TAF 5608 (born in 1989) was within [0.177, 0.354], suggesting a first degree relationship inferred to be a parent/offspring relationship from IBD-segment sharing (here mother/daughter from the birthdates and sex).The four coefficients between TAF 5608 and TAF 5612 (ϕ = 0.144); TAF 5610 and TAF 5617 (ϕ = 0.131); TAF 5609 and TAF 5619 (ϕ = 0.101) and TAF 5608 and TAF 5610 (ϕ = 0.0887) lie within [0.0884, 0.177] suggesting a second degree relationship.Analysis of IBD-segment sharing allowed upgrading the relationship between TAF 5608 and TAF 5612 to a full-sib relationship, birthdates suggesting halfsibs or aunt-niece relationships for the three other ones.The 61 remaining coefficients between the 1992 individuals were all < 0.0884 (relationship more distant than second-degree).Likewise, among the 15 pairs of 2006 samples, the estimated kinship coefficients were < 0.0884, ranging from -0.035 to 0.074 (median of 0.013); as the 72 pairwise comparisons between individuals collected in 1992 and 2006 which ranged from -0.091 to 0.062 (median of -0.0081).

Figure S2 :
FigureS2: Heatmap of the pairwise-population F S T values between all the 32 populations estimated from genotyping data on the 40,426 autosomal SNPs.The estimated pairwise F S T values range from 0.0247 for the MAY/ZMA pair, up to 0.472 for the LAG/NEL pair, with a median of 0.203.For pairs including newly genotyped sample MOK, F S T values range from 0.0578 (MOK/ZMA) to 0.303 (MOK/LAG).For pairs including TAF, values range from 0.208 (TAF/PMT) to 0.436 (TAF/NEL).The TAF/PMT pairs have the lowest pairwise-F S T compared to others, likely because the PMT (EUT) breed has a small amount of ZEB ancestry, similar to the TAF (see FigureS3and(Gautier et al., 2010).

Figure S4 :
Figure S4: Neighbor-Joining tree relating the 876 individuals from the 32 populations based on allele-sharing distance computed on the 40,426 autosomal SNPs.Edges are colored according to the population group (top legend) and tip individual names according to the population of origin (bottom legend).
Figure S5: Admixture graph construction based on the W50K dataset with the R package poolfstat (Gautier et al., 2022).
using BovineHD data.Yet, the positioning of the ancestral source population of SHK that is related to the ZEB ancestor of GIR and NEL (i.e., either on the branch leading to GIR as in A) or on the branch leading to NEL or ancestral to ZEB) lead to graph with closely related BIC.B) Best fitting admixture graph among all possible ways of positioning TAF onto the scaffold graph obtained in A. This graph showing that the TAF population has an admixed origin with two sources related to JE2 (α = 75%) and ZMA (1 − α = 25%) had a very high support (BIC 10.9 units lower than the graph with the second lowest BIC) and good fit to the observed f -statistics (the worst fitted f-stats associated |Z| = 2.23).C) Best fitting admixture graph among all possible ways of positioning MOK onto the scaffold graph obtained in A. This graph showing that the MOK population has an admixed origin with two sources, JE2 (α = 22%) and ZMA (1 − α = 78%), had good support (BIC 4.2 units lower than the graph with the second lowest BIC) but less optimal fit to the observed f -statistics (the worst fitted f-stats associated |Z| = 4.95).Note that the worst fitted f -statistics ( f 4 (MOK, ZMA; NEL, S HK)) suggests that this graph fails to capture some direct ZEB (represented by NEL) ancestry to the MOK.Such ancestry would indeed contribute positively to f 4 (MOK, ZMA; NEL, S HK) making the observed f -statistics higher than the fitted f -statistics (and the associated Z-score negative).Such a three-way admixture origin (EUTxZMAxZEB) of the MOK would be consistent with historical record.D) Best fitting admixture graph among all possible ways of jointly positioning TAF and MOK onto the scaffold graph obtained in A. This graph is in agreement with B) and C) and suggests that MOK might not be viewed as the closest proxy to the source populations from which TAF originates although MOK and TAF share closely related ancestral EUT sources (the relative positioning of which led to graphs with close BIC).

Figure S6 :
FigureS6: Schematic representation of the expected F 4 for various configuration under the (simplified) scenario associated to the inferred admixture graph describing the origin of an admixed population X (e.g., X=TAF or X=MOK) with one source related to one EUT and the other to the ZMA which is itself admixed.Note the ZEBxAFT admixed origin of one of the two ZMA sources can be disregarded for our configurations of interest.Under this scenario:F 3 (X; Z MA, EUT b ) = τ X − β(1 − β) (1 − α) 2 (z 1 + b) + α 2 a 1 + a 2 + e = γ F 3 (X; Z MA, EUT a ) = γ − β(1 − β)e aHence, the EUT population EUT a , the closest to the European source of X (eu X ) is expected to display the lowest f 3 value for the (X; Z MA, EUT ) configurations.Likewise, F 4 (EUT b , AFT ; X, ZEB) = (αe − (1 − α)βa 1 ) = γ

Figure S9 :
Figure S7: Partitioning of inbreeding levels in all TAF individuals using 50K SNP genotyping data (W50K dataset).A) Partitioning of the inbreeding levels into 6 different HBD classes for each individual separated by sampling campaign.B) Overall distribution of inbreeding levels among all individuals for the two sampling campaign (1992 and 2006).

Figure S10 :
Figure S10: Manhattan plot over the bovine genome of the local scores derived from the within TAF iHS and the three pairwise Rsb statistics used to identify footprints of selection.The horizontal dashed lines indicate the chromosome specific 1% P-value threshold on the local score that was apply to select the significant windows.21

Table S3 :
Within population genetic diversity estimated on the W50K data set.The table report for each population i) the percentage of monomorphic SNPs; ii) the average heterozygosity estimated with poolfstat(Gautier et al.,

Table S4 :
Description of the whole genome sequences, including the 24 newly generated ones, used in the study.Mean coverage for autosomes and X chromosome were estimated with Samtools (Li et al., 2009)ram(Li et al., 2009)after read duplicates removal.

Table S5 :
Estimated F 3 values in increasing order for all (TAF;ZMA,EUT) configurations where.The EUT population the closest to the original European source population of the TAF is expected to display the lowest value (see FigureS6).

Table S6 :
F 4 estimated values for all (EUT,NDA;TAF,GIR) configurations where NDA and GIR are representative of AFT and ZEB ancestry.The EUT population the closest to the original European source population of the TAF is expected to display the highest value (see FigureS6).

Table S7 :
Number of segregating sites observed in the different populations in WGS data.The table gives the total number of variant sites and for each functional classes (S=Synonymous; NS=Non-Synonymous; Tol.NS= Tolerated NS; Del NS=Deleterious NS; and LoF=Loss-of-Function).For the four functional classes, the percentage with respect to the overall number of sites and the proportions relative to that observed in TAF population are given in parentheses.

Table S8 :
Number of fixed sites observed in the different populations in WGS data.Sites with non-polarized variants or found monomorphic over all samples were discarded from the counts.The table gives the total number of fixed sites and for each functional classes (S=Synonymous; NS=Non-Synonymous; Tol.NS= Tolerated NS; Del NS=Deleterious NS; and LoF=Loss-of-Function).For the five functional classes, the percentage with respect to the overall number of sites and the proportions relative to that observed in TAF population are given parentheses.

Table S9 :
Description of the regions harboring footprints of selection based on the Rsb JER/Z MA tests.Tests were carried unilaterally to provide insights into the origin of the signals.The two regions that overlap with those identified in Table1highlighted with a ⋆ .

Table S10 :
List of significant diseases and functions associated with the 17 candidate genes detected under selection in the cattle population from Amsterdam island.Results were generated with Ingenuity Pathway Analysis(Ingenuity Systems, Inc., 2023)