Abstract

Background

Baboons are a widely used nonhuman primate model for biomedical, evolutionary, and basic genetics research. Despite this importance, the genomic resources for baboons are limited. In particular, the current baboon reference genome Panu_3.0 is a highly fragmented, reference-guided (i.e., not fully de novo) assembly, and its poor quality inhibits our ability to conduct downstream genomic analyses.

Findings

Here we present a de novo genome assembly of the olive baboon (Papio anubis) that uses data from several recently developed single-molecule technologies. Our assembly, Panubis1.0, has an N50 contig size of ∼1.46 Mb (as opposed to 139 kb for Panu_3.0) and has single scaffolds that span each of the 20 autosomes and the X chromosome.

Conclusions

We highlight multiple lines of evidence (including Bionano Genomics data, pedigree linkage information, and linkage disequilibrium data) suggesting that there are several large assembly errors in Panu_3.0, which have been corrected in Panubis1.0.

Data Description

Introduction

Baboons are ground-living monkeys native to Africa and the Arabian Peninsula. Owing to their relatively large size, abundance, and omnivorous diet, baboons have increasingly become a major biomedical model system (reviewed in [1]). Baboon research has been facilitated by the creation (in 1960) and maintenance of a large, pedigreed, well-phenotyped baboon colony at the Southwest National Primate Research Center (SNPRC) and an ability to control the environment of subjects in ways that are obviously not possible in human biomedical studies. For example, baboons have been used to study the effect of diet on cholesterol and triglyceride levels in experiments where all food consumption is completely controlled [2–4]. In recent years, linkage studies in baboons have helped identify genetic regions affecting a wide range of phenotypes, such as cholesterol levels [5, 6], estrogen levels [7], craniofacial measurements [8], bone density [9, 10], and lipoprotein metabolism [11]. In addition, studies have also documented that the genetic architecture of complex traits in baboons can be directly informative about analogous traits in humans (e.g., [10, 12]). In parallel, baboons have been widely used in studies of animal behavior and evolution. For example, the Amboseli Baboon Research Project has studied wild baboon troops continuously since 1971 and produced ∼300 scientific publications, including the first study of whole-genome sequence data in baboons [13].

The success of these and other studies has been mediated in part by recent advances in molecular genetics technologies. In particular, the ability to cheaply genotype and/or sequence samples of interest has led to a revolution in genetic studies of the associations between genotype and phenotype. While human genetic studies now routinely include the analyses of whole-genome sequence data from many thousands of samples (e.g., [14–18]), comparable studies in model organisms have lagged far behind. Part of the reason for this is the lack of genetic resources in non-human species. Large international projects such as the Human Genome Project [19, 20], International HapMap Project [21–23], and the 1000 Genomes Project [24–26] have provided baseline information on sequences and genetic variation, and subsequent human genetic studies have used this background information.

The first published baboon genome assembly was from a yellow baboon [13]. This assembly used a combination of Illumina paired-end and Illumina mate-pair sequence data (with mean library insert sizes ranging from 175 bp to 14 kb) to produce a highly fragmented assembly with a contig N50 of 29 kb and a scaffold N50 of 887 kb. The public olive baboon assembly, Panu_3.0, has the same problem of small contigs and scaffolds (contig N50 of 139 kb and de novo scaffold N50 of 586 kb) [27]. The authors of the public olive baboon assembly chose to distribute a reference-guided assembly with scaffolds mapped onto rhesus (Macaca mulatta) chromosomes. As a consequence, any syntenic differences between rhesus and baboon will result in large-scale assembly errors in Panu_3.0. One additional drawback of this baboon genome assembly was its informal embargo from 2008 to 2019 under the guidelines of the Fort Lauderdale agreement. Hence, its influence on scientific research has been negligible.

In this project, we focus on providing a high-quality, de novo genome assembly for olive baboon (Papio anubis, NCBI:txid9555), which we call Panubis1.0, with the hope that this resource will enable future high-resolution genotype-phenotype studies. Unlike previous baboon genome assembly efforts, we use a combination of 3 recently developed technologies (10x Genomics linked reads, Oxford Nanopore long reads, and Hi-C) to increase the long-range contiguity of our assembly. These technologies enable us to generate an assembly in which the autosomes (and the X chromosome) are each spanned by a single scaffold, at a cost that is orders of magnitude lower than that of the Panu_3.0 assembly. We also verify that many of the large-scale syntenic differences between Panubis1.0 and Panu_3.0 are due to errors in the public assembly rather than in our own. Our assembly is available for scientific use without any restrictions.

Genome sequencing

Index animal

We used individual number 15,944 (currently deceased) from the SNPRC pedigreed baboon colony for all of the sequencing and genome assembly work associated with this project.

10x Genomics sequencing

High molecular weight genomic DNA extraction, sample indexing, and generation of partition barcoded libraries were performed according to the 10x Genomics (Pleasanton, CA, USA) Chromium Genome User Guide and as published previously [28]. A mean depth of ∼60× was produced and analyzed for this project.

Oxford Nanopore sequencing

Libraries for the Oxford Nanopore sequencing were constructed as described previously [29] using DNA derived from whole blood. The sequencing was conducted at Genentech, Inc. (South San Francisco, CA, USA); we analyzed data with a mean depth of ∼15× for this project.

Bionano optical maps

High molecular weight DNA was extracted, nicked, and labeled using the enzyme Nt.BspQI (New England Biolabs, Ipswich, MA, USA) and imaged using the Bionano Genomics Irys system (San Diego, CA, USA) to generate single-molecule maps for assessing breaks in synteny between Panu_3.0 and Panubis1.0.

Hi-C sequencing

High molecular weight DNA provided by Jenny Tung (Duke University) was sent to Phase Genomics. We obtained ∼15× Hi-C data using previously described techniques [30].

Genome assembly

The main strength of our approach is in combining data from multiple platforms (10x Genomics linked reads, Oxford Nanopore long reads, Illumina paired-end short reads, and Hi-C), which have complementary advantages. Fig. 1 describes our assembly strategy. We began by assembling 10x Genomics reads generated with their Chromium system (mean depth ∼60×) using the Supernova assembler (Supernova assembler, RRID:SCR_016756) version 1.1, default parameters [28], which yielded an assembly with a contig N50 of ∼84 kb and a scaffold N50 of ∼15.7 Mb (Table 1). The gap lengths between the contigs in a scaffold obtained by assembling 10x linked reads are arbitrary [31]. Hence, to leverage the Oxford Nanopore long reads for gap closing, we split the 10x scaffolds at every run of 1 or more N's to obtain a collection of contigs.
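The splitting step can be illustrated with a minimal Python sketch. This is not the script used for Panubis1.0; the function name and the toy sequence are hypothetical, and scaffolds are assumed to be plain strings over A, C, G, T, and N.

```python
import re


def split_scaffold_into_contigs(scaffold_seq):
    """Split one scaffold into contigs at every run of one or more N's.

    Hypothetical helper for illustration only. Matching is case-insensitive,
    so soft-masked gaps written as 'n' are treated as gaps too.
    """
    # re.split yields empty strings when the scaffold starts or ends with
    # N's, so those empty pieces are filtered out.
    return [piece for piece in re.split(r"[Nn]+", scaffold_seq) if piece]


# Toy scaffold with a 4-N gap and a 1-N gap (made-up sequence).
toy_scaffold = "ACGTNNNNGGCCNTTAA"
contigs = split_scaffold_into_contigs(toy_scaffold)
print(contigs)
```

Because the gap sizes are discarded in this step, the later scaffolding stage is free to re-estimate them from reads that span each gap.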

Figure 1:

Illustration of our genome assembly strategy.

Table 1:

Assembly statistics for each step of the adopted assembly strategy

| Assembly | 10x | 10x Contigs | 10x Contigs + Nanopore scaffolding | 10x Contigs + Nanopore scaffolding + Nanopore gap filling | 10x Contigs + Nanopore scaffolding + Nanopore gap filling + Illumina polishing | Panubis1.0 | Panu_3.0 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total length of scaffolds (bp) | 2,892,554,220 | 2,809,352,255 | 2,871,292,557 | 2,871,210,925 | 2,870,847,162 | 2,869,821,163 | 2,959,373,024 |
| No. of scaffolds | 24,513 | 87,632 | 15,803 | 15,803 | 15,803 | 11,145 | 63,235 |
| Scaffold N50 (bp) | 15,720,195 | 84,258 | 1,695,573 | 1,695,772 | 1,695,642 | 140,274,886 | 585,721 |
| Total gap length (bp) | 83,203,960 | 0 | 50,344,034 | 2,030,908 | 2,030,908 | 2,321,983 | 22,434,732 |
| Total length of contigs (bp) | 2,809,350,260 | 2,809,352,255 | 2,820,948,523 | 2,869,180,017 | 2,868,816,254 | 2,867,510,325 | 2,937,001,527 |
| No. of contigs | 87,347 | 87,632 | 62,252 | 17,004 | 17,004 | 15,243 | 122,216 |
| Contig N50 (bp) | 84,258 | 84,258 | 134,222 | 1,469,760 | 1,469,602 | 1,455,705 | 138,819 |

Total length of scaffolds is the sum of lengths of scaffolds (including A, C, G, T and N) in each scaffold. Total gap length is the total number of N's in the assembly. Contigs are constructed by splitting the assembly at every stretch of ≥1 N. The total length of contigs is the sum of the number of sequenced base pairs (including only A, C, G, and T) in each scaffold.
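To make these definitions concrete, the following Python sketch computes the same kinds of statistics for a toy assembly. It is an illustration under stated assumptions rather than the code used to produce Table 1: the function names and sequences are hypothetical, and the input is assumed to contain only A, C, G, T, and N.

```python
import re


def n50(lengths):
    """Length of the shortest piece in the smallest set of longest pieces
    that together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(scaffolds):
    """Summary statistics for a list of scaffold sequences (hypothetical helper)."""
    # Contigs are obtained by splitting at every run of one or more N's.
    contigs = [c for s in scaffolds for c in re.split(r"[Nn]+", s) if c]
    return {
        "total_scaffold_length": sum(len(s) for s in scaffolds),
        "total_gap_length": sum(s.upper().count("N") for s in scaffolds),
        "total_contig_length": sum(len(c) for c in contigs),
        "num_scaffolds": len(scaffolds),
        "num_contigs": len(contigs),
        "scaffold_n50": n50([len(s) for s in scaffolds]),
        "contig_n50": n50([len(c) for c in contigs]),
    }


# Toy assembly with three scaffolds (made-up sequences).
toy_assembly = ["AAAAAAAANNCCCC", "GGGGGG", "TT"]
stats = assembly_stats(toy_assembly)
print(stats)
```

In this toy case the scaffold lengths are 14, 6, and 2, and the contig lengths are 8, 4, 6, and 2, so the scaffold N50 is 14 and the contig N50 is 6.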

We scaffolded the resulting contigs with Oxford Nanopore long reads (mean depth ∼15×) using LR_Scaf (version 1.1.4, default parameters) [32]. (According to the Canu assembler documentation [33], this depth of coverage is not sufficient to perform de novo assembly directly from the Nanopore reads.) This resulted in an assembly with a contig N50 of ∼134 kb and a scaffold N50 of ∼1.69 Mb (Table 1). These scaffolds are more amenable to gap closing because their gap lengths (the number of N's between 2 consecutive contigs) are estimated from long reads that span each gap and align to its flanking regions.

Upon performing gap closing with the same set of Oxford Nanopore long reads using LR_Gapcloser (v1.1, default parameters) [34], we obtained an assembly with a contig N50 of ∼1.47 Mb and a scaffold N50 of ∼1.69 Mb (Table 1). Note that this increase in contig N50, from ∼84 kb in the 10x Genomics linked-read assembly to ∼1.47 Mb, would not have been possible if we had simply performed gap closing with the Oxford Nanopore long reads directly on the 10x-based assembly without first splitting it into its constituent contigs. Finally, we polished the resulting assembly by aligning Illumina paired-end reads (mean depth ∼60× in PE150 reads) using Pilon (Pilon, RRID:SCR_014731) version 1.22, default parameters [35].

To scaffold the resulting assembly with Hi-C data, we first set aside scaffolds shorter than 50 kb, which made up only ∼1.8% of the total sequence base pairs. We did this because Hi-C–based scaffolding is more reliable for longer scaffolds, to which more Hi-C reads align. We then ordered and oriented the remaining scaffolds using the 3D de novo assembly (3d-dna) pipeline (3D de novo assembly, RRID:SCR_017227) version 180419, default parameters [36], with ∼15× Hi-C data generated by Phase Genomics [37]. Finally, we manually corrected misassemblies in the resulting Hi-C–based assembly by visualizing the Hi-C reads aligned to the assembly, using Juicebox Assembly Tools (version 1.6.11) [38], following the strategy described in [39]. Fig. 2 shows Hi-C reads aligned to the resulting assembly, with the blue squares on the diagonal representing chromosomes.
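The length filter applied before Hi-C scaffolding can be sketched as follows. This is an illustrative Python example, not part of the 3d-dna pipeline. The function name, the scaffold names, and the lengths are hypothetical, and the fraction is computed on total scaffold length as a simplifying assumption.

```python
def partition_by_length(scaffold_lengths, min_length=50_000):
    """Separate scaffolds shorter than min_length from the rest.

    scaffold_lengths: dict mapping scaffold name -> length in bp.
    Returns (kept, set_aside, fraction_of_bases_set_aside).
    """
    kept = {name: n for name, n in scaffold_lengths.items() if n >= min_length}
    set_aside = {name: n for name, n in scaffold_lengths.items() if n < min_length}
    total = sum(scaffold_lengths.values())
    fraction = sum(set_aside.values()) / total if total else 0.0
    return kept, set_aside, fraction


# Toy input (made-up scaffold names and lengths).
toy_lengths = {"scaf_a": 900_000, "scaf_b": 60_000, "scaf_c": 40_000}
kept, set_aside, fraction = partition_by_length(toy_lengths)
print(sorted(kept), sorted(set_aside), round(fraction, 3))
```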

Figure 2:

Hi-C map of our Panubis1.0 genome. The figure represents the Hi-C map obtained by aligning Hi-C paired-end reads to the Panubis1.0 genome assembly laid out on the X-axis as well as the Y-axis. Because each read-pair consists of 2 reads, a position (i, j) on this map represents the number of read-pairs where one read aligned to position i and the other read aligned to position j on the Panubis1.0 genome. The intensity of each pixel in this Hi-C map represents the number of reads aligning within that bin. The Hi-C map has been drawn at a resolution of 1.25 Mb. Each blue square on the diagonal represents a chromosome-length scaffold. Autosomes are listed first, ordered by size, and the last square corresponds to the X chromosome. The axes are labeled in units of megabases.
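The way such a map is built from read-pairs can be sketched in a few lines of Python. This is a simplified illustration, not the code used to draw Fig. 2. The function name and the toy read-pairs are hypothetical, and positions are assumed to be genome-wide coordinates in base pairs with chromosomes laid end to end.

```python
from collections import Counter


def contact_map(read_pairs, bin_size):
    """Count read-pairs per (bin_i, bin_j) cell of a symmetric contact map.

    read_pairs: iterable of (pos_i, pos_j) alignment positions in bp.
    bin_size: map resolution in bp (1_250_000 matches the 1.25 Mb resolution).
    """
    counts = Counter()
    for pos_i, pos_j in read_pairs:
        bin_i, bin_j = pos_i // bin_size, pos_j // bin_size
        counts[(bin_i, bin_j)] += 1
        if bin_i != bin_j:
            # Mirror off-diagonal cells so the map is symmetric.
            counts[(bin_j, bin_i)] += 1
    return counts


# Toy read-pairs (made-up positions).
toy_pairs = [
    (100_000, 200_000),      # both reads in bin 0
    (300_000, 2_600_000),    # bins 0 and 2
    (1_300_000, 1_400_000),  # both reads in bin 1
    (2_700_000, 400_000),    # bins 2 and 0
]
cmap = contact_map(toy_pairs, bin_size=1_250_000)
print(dict(cmap))
```

Cells near the diagonal accumulate the most read-pairs because Hi-C contacts are far more frequent between nearby loci on the same chromosome, which is what produces the squares along the diagonal.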

The resulting P. anubis genome assembly, which we name Panubis1.0, contains ∼2.87 Gb of sequenced base pairs (non-N base pairs) and 2.3 Mb (<0.1%) of gaps (N's). Single scaffolds spanning the 20 autosomes and the X chromosome together contain 95.14% (∼2.73 Gb) of the sequenced base pairs. We number the autosomes as chr1–chr20, in decreasing order of scaffold length, so some chromosome numbers in our convention differ from those in Panu_3.0. We note that Panubis1.0 has a contig N50 of 1.46 Mb, which is a >10-fold improvement over the contig N50 (∼139 kb) of the Panu_3.0 assembly. Panubis1.0 also contains more than 5 times fewer scaffolds (11,145 scaffolds with a scaffold N50 of ∼140 Mb) than the Panu_3.0 assembly (63,235 scaffolds with a scaffold N50 of ∼586 kb); see Table 1 for a further comparison of the 2 assemblies. Gene completion analysis of the assembly using BUSCO version 3 (BUSCO, RRID:SCR_015008) and the euarchontoglires odb9 ortholog dataset [40] suggests that chromosomes in the Panubis1.0 assembly contain 5,167 of 6,192 (83.4%) complete genes, comparable to the 5,166 of 6,192 (83.4%) complete genes found in the chromosomes of the Panu_3.0 assembly. Furthermore, the chromosomes in the Panubis1.0 assembly contain 247 of 6,192 (4.0%) fragmented genes, comparable to the 262 of 6,192 (4.2%) fragmented genes in the chromosomes of the Panu_3.0 assembly.

Y chromosome assembly

The Hi-C scaffolding with 3d-dna yielded an ∼8 Mb scaffold that putatively represents part of the baboon Y chromosome. Because rhesus macaque is the phylogenetically closest species to baboons that has a chromosome-scale assembly, we aligned this putative baboon Y chromosome scaffold with the rhesus macaque Y chromosome (Fig. 3). We observed a substantial amount of synteny between the putative baboon Y and the rhesus Y, comparable to what is observed between the chimpanzee Y and the human Y chromosomes. This suggests that the Panubis1.0 chromosome Y captures at least part of the true chromosome Y. (For comparison, genetic divergence between baboon and rhesus is similar to human–chimpanzee divergence [41].) The observed breaks in synteny are consistent with the well-documented high rate of chromosomal rearrangements on mammalian Y chromosomes [42].

Figure 3:

Dot plots showing chromosome Y synteny suggest that the Panubis1.0 chromosome Y is putatively at least a part of the true chromosome Y. A dot plot between rhesus chromosome Y and Panubis1.0 putative chromosome Y is shown on the left, while a dot plot between the chimpanzee chromosome Y and the human chromosome Y is shown on the right. Each dot represents an aligned block, with purple representing an alignment on the positive strand and cyan an alignment on the negative strand. The axis labels are in units of megabases. The phylogenetic distance between baboon and rhesus macaque is similar to that between human and chimpanzee. Hence, the broadly conserved synteny between the rhesus and baboon putative chromosome Y as compared to the synteny between the chimp and human chromosome Y suggests that the scaffold representing the putative chromosome Y in the Panubis1.0 assembly is indeed capturing at least a large part of chromosome Y.

Genome annotation

Annotation of the protein and non-protein coding genes was performed by NCBI's RefSeq database (RefSeq, RRID:SCR_003496), based on RNA sequencing (RNA-seq) of 4 captive baboons at the SNPRC (BioProject PRJNA559725), as well as other publicly available baboon expression data. Panubis1.0 contains 21,087 protein-coding genes and 11,295 non-coding genes. Relative to Panu_3.0, this is a slight decrease in the number of protein-coding genes (21,087 vs 21,300), which can be explained by the merging of genes (n = 252), and an increase in the number of non-coding genes (11,295 vs 8,433). Panubis1.0 also contains slightly more pseudogenes (6,680 vs 5,998) and genes with splice variants (14,526 vs 13,693). Many of these differences may reflect insights gained from an improved assembly leading to an increased ability to map sequencing data; indeed, during genome annotation, 88% of RNA-seq reads mapped to Panubis1.0 while only 80% mapped to Panu_3.0.

Overall, most genes (66%) are highly similar or identical between Panubis1.0 and Panu_3.0. Of the remainder, 13% of genes contain major changes (e.g., were split, moved, changed gene type, or changed substantially in completeness), 20% are novel in Panubis1.0, and 12% were deprecated from Panu_3.0.

Comparisons with the publicly available Panu_3.0 assembly

Fig. 4 presents a dot plot between the chromosomes of the Panubis1.0 and the Panu_3.0 assemblies. Several chromosomes differ substantially between the 2 assemblies, and these differences are evident even in the chromosome-scale dot plots. Table 2 lists the large (>100 kb) differences between the Panubis1.0 and Panu_3.0 assemblies for which Hi-C data suggest that the Panubis1.0 assembly is correct. We used several orthogonal sources of information to assess whether these were errors in our Panubis1.0 assembly or in the Panu_3.0 assembly. These included Bionano Genomics optical maps obtained from the same individual used for generating Panubis1.0, linkage information from a pedigree of baboons that were all sequenced to high coverage, and linkage disequilibrium information from 24 unrelated olive baboons from the SNPRC pedigreed baboon colony. We manually examined each of these breaks in synteny between Panubis1.0 and Panu_3.0 to determine whether these independent sources of evidence supported one assembly over the other (summarized in Table 2). Overall, in 11 of 12 large syntenic differences between Panubis1.0 and Panu_3.0 where the Hi-C data support the Panubis1.0 assembly, ≥1 of these independent sources provided additional evidence that the Panubis1.0 assembly is correct (Fig. 5, Supplementary Figs S1–S5).

Figure 4:

Dot plots showing alignment of Panu_3.0 reference-assisted chromosomes vs Panubis1.0 chromosome-length scaffolds. The Panu_3.0 assembly is shown on the Y-axis and the Panubis1.0 assembly is shown on the X-axis. Each dot represents the position of a syntenic block between the 2 assemblies as determined by the nucmer alignment. The color of the dot reflects the orientation of the individual alignments (purple indicates consistent orientation and blue indicates inconsistent orientation). The dot plots illustrate that there are chromosomes containing large inversions and translocations in the Panu_3.0 assembly with respect to the Panubis1.0 assembly.

Figure 5:

Evidence for misassembly on chromosome NC_018167.2 in Panu_3.0. (a) Bionano optical map alignment to the Panu_3.0 assembly demonstrates an inversion on chromosome NC_018167.2 beginning at ∼29.38 Mb and ending at ∼44.71 Mb. (b) Estimates of the population recombination rate ρ near the potential synteny breaks of the inversion identified on chromosome NC_018167.2. (c) The x-axis shows positions along chromosome NC_018167.2 in Panu_3.0, where each row represents 1 of the 9 offspring of sire 10,173. Switches between red and blue within a row represent a recombination event. The 2 vertical black lines represent locations where ≥3 recombinations occur at the same locus, indicating a potential misassembly.
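The criterion in panel (c), several offspring showing a recombination at the same locus, can be illustrated with a short Python sketch. This is not the analysis code behind Fig. 5. The function name, the 100-kb window, and the toy switch positions are hypothetical, and each offspring is assumed to have a list of positions (in bp) where its inferred paternal haplotype switches.

```python
from collections import defaultdict


def flag_shared_switches(switches_by_offspring, window=100_000, min_offspring=3):
    """Return (start, end) windows in which at least min_offspring distinct
    offspring show a haplotype switch.

    Fixed, non-overlapping windows are a simplification: a cluster of
    switches that straddles a window boundary can be missed.
    """
    offspring_per_window = defaultdict(set)
    for offspring, positions in switches_by_offspring.items():
        for pos in positions:
            offspring_per_window[pos // window].add(offspring)
    return sorted(
        (w * window, (w + 1) * window)
        for w, members in offspring_per_window.items()
        if len(members) >= min_offspring
    )


# Toy data (made-up offspring labels and switch positions).
toy_switches = {
    "offspring_1": [1_210_000, 5_020_000],
    "offspring_2": [1_250_000, 8_000_000],
    "offspring_3": [1_290_000, 5_060_000],
    "offspring_4": [5_090_000],
    "offspring_5": [3_400_000],
}
flagged = flag_shared_switches(toy_switches)
print(flagged)
```

True crossovers are scattered along a chromosome, so many independent meioses appearing to recombine at one locus is more plausibly explained by a misjoined or inverted segment in the reference than by biology.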

Table 2:

Likely large (>100 kb) assembly errors in Panu_3.0, ordered by size

| Panu_3.0 chromosome | Panu_3.0 start (Mb) | Panu_3.0 end (Mb) | Panu_2.0 start (Mb) | Panu_2.0 end (Mb) | Type | Linkage support | BNG support | LDhelmet support |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| NC_018164.2 | 88.05 | 104.99 | 87.61 | 104.98 | Inv | Start¹ | Yes | Unknown¹ |
| NC_018167.2 | 29.38 | 44.71 | 29.25 | 44.53 | Inv | Start + end | Yes | Start + end |
| NC_018156.2 | 4.04 | 8.67 | 4.18 | 8.63 | Inv | No | Yes² | No |
| NC_018162.2 | 82.42 | 86.47 | 81.91 | 84.01 | Trans | Start + end | No³ | No |
| NC_018166.2 | 104.28 | 108.05 | 103.66 | 107.44 | Inv | No | Yes | No |
| NC_018165.2 | 15.93 | 19.48 | 15.85 | 19.40 | Inv | No | No | No |
| NC_018166.2 | 96.94 | 100.12 | 96.39 | 99.54 | Trans | Start + end | Yes⁴ | Start + end |
| NC_018160.2 | 36.05 | 36.75 | 35.88 | 36.55 | Trans | No | Yes⁴ | Start |
| NC_018163.2 | 23.19 | 23.66 | 0 | 0.47 | Trans | No | Yes² | No |
| NC_018164.2 | 4.05 | 4.49 | 3.99 | 4.45 | Trans | No⁵ | Yes | No |
| NC_018165.2 | 100.91 | 101.18 | 100.31 | 100.59 | Trans | No | Yes | No |
| NC_018152.2 | 166.73 | 166.89 | 169.86 | 170.10 | Trans | Start + end | Yes | End |

Note that a “no” in the “Linkage support” or “LDhelmet support” columns is inconclusive and should not be interpreted as support for the Panu_3.0 assembly being correct.

¹ Unable to determine whether linkage and LDhelmet provide support at the end breakpoint due to a lack of synteny between Panu_2.0 and Panu_3.0.

² Panu_2.0 assembly seems to be correct.

³ Bionano Genomics (BNG) maps do not support a translocation with these breakpoints. However, they do support a potential large structural variant at the starting breakpoint.

⁴ BNG maps support the presence of a large structural variant, which may be a translocation.

⁵ Linkage data suggest a potential polymorphic inversion (in 16,413) partially overlapping with this interval.

Table 3 presents an additional list of large inversion differences between Panubis1.0 and Panu_3.0, where, on the basis of the current data, it is difficult to conclude whether it is Panubis1.0 or Panu_3.0 that is correct. For these regions, Hi-C data only weakly support the Panubis1.0 assembly and do not provide direct evidence that the Panu_3.0 assembly is incorrect. In addition, the aforementioned orthogonal sources of information are inconclusive as to which assembly is correct for each of these regions. Further research will be needed to assess the correct orientation of the baboon genome sequence in each of these problematic regions.

Table 3:

Additional large (>100 kb) inversion differences between Panubis1.0 and Panu_3.0, ordered by size

| Panubis1.0 chromosome | Panubis1.0 start (Mb) | Panubis1.0 end (Mb) | Panu_3.0 chromosome (GenBank) | Panu_3.0 chromosome (RefSeq) | Panu_3.0 start (Mb) | Panu_3.0 end (Mb) |
| --- | --- | --- | --- | --- | --- | --- |
| NC_044992.1 | 28.89 | 45.01 | CM001506.2 | NC_018167.2 | 29.38 | 44.79 |
| NC_044995.1 | 0.00 | 13.00 | CM001509.2 | NC_018170.2 | 0.00 | 13.31 |
| NC_044987.1 | 101.26 | 106.48 | CM001504.2 | NC_018165.2 | 101.44 | 107.53 |
| NC_044978.1 | 176.83 | 181.37 | CM001495.2 | NC_018156.2 | 175.08 | 180.09 |
| NC_044986.1 | 86.61 | 90.73 | CM001499.2 | NC_018160.2 | 85.56 | 90.30 |
| NC_044988.1 | 0.00 | 3.50 | CM001505.2 | NC_018166.2 | 0.00 | 3.78 |
| NC_044996.1 | 86.67 | 89.58 | CM001511.2 | NC_018172.2 | 86.91 | 90.23 |
| NC_044982.1 | 154.35 | 156.82 | CM001497.2 | NC_018158.2 | 155.71 | 158.53 |
| NC_044984.1 | 7.96 | 10.58 | CM001501.2 | NC_018162.2 | 8.03 | 10.83 |
| NC_044991.1 | 33.09 | 35.09 | CM001500.2 | NC_018161.2 | 32.46 | 35.05 |
| NC_044996.1 | 93.67 | 95.52 | CM001511.2 | NC_018172.2 | 94.22 | 96.59 |
| NC_044981.1 | 68.61 | 71.05 | CM001494.2 | NC_018155.2 | 69.37 | 71.65 |
| NC_044996.1 | 40.49 | 42.78 | CM001511.2 | NC_018172.2 | 41.15 | 43.34 |
| NC_044996.1 | 10.01 | 11.79 | CM001511.2 | NC_018172.2 | 10.20 | 12.06 |
| NC_044996.1 | 31.80 | 33.37 | CM001511.2 | NC_018172.2 | 32.11 | 33.97 |
| NC_044979.1 | 142.32 | 144.05 | CM001493.2 | NC_018154.2 | 141.96 | 143.71 |
| NC_044996.1 | 90.77 | 92.54 | CM001511.2 | NC_018172.2 | 91.42 | 92.99 |
| NC_044993.1 | 63.59 | 65.52 | CM001510.2 | NC_018171.2 | 62.31 | 63.73 |
| NC_044991.1 | 26.79 | 28.49 | CM001500.2 | NC_018161.2 | 26.52 | 27.82 |
| NC_044980.1 | 0.02 | 0.78 | CM001496.2 | NC_018157.2 | 0.02 | 1.26 |
| NC_044979.1 | 0.00 | 0.73 | CM001493.2 | NC_018154.2 | 0.00 | 0.75 |

We cannot definitively determine which orientation is correct for these inversions, and they should be considered provisional.

Linkage disequilibrium analyses

We estimated the scaled recombination rate ρ (=4Nr, where N is the effective population size and r is the recombination rate per generation) using LDhelmet [43] from 24 unrelated olive baboons [44]. We then identified potential breaks in synteny as regions with total ρ > 500 and ρ/bp > 0.2. We considered there to be evidence of a synteny break if 1 of these regions was within 50 kb of a potential breakpoint (as identified in Panu_3.0 vs. Panubis1.0 comparisons). The false discovery rate for this definition is ∼4%.
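
The thresholding rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RhoInterval` record, the function names, and the choice to treat each interval of a per-interval ρ/bp estimate as one candidate region are hypothetical and are not taken from LDhelmet or from the original analysis scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RhoInterval:
    """Hypothetical record: one interval with an estimated rho per bp."""
    start: int          # bp, inclusive
    end: int            # bp, exclusive
    rho_per_bp: float

    @property
    def total_rho(self) -> float:
        # Total rho accumulated across the interval.
        return self.rho_per_bp * (self.end - self.start)


def is_candidate(iv: RhoInterval, min_total: float = 500.0,
                 min_per_bp: float = 0.2) -> bool:
    """Apply both thresholds from the text: total rho > 500 and rho/bp > 0.2."""
    return iv.total_rho > min_total and iv.rho_per_bp > min_per_bp


def supports_synteny_break(intervals, breakpoint: int,
                           window: int = 50_000) -> bool:
    """True if any candidate interval lies within `window` bp of the breakpoint."""
    for iv in intervals:
        if not is_candidate(iv):
            continue
        # Distance is 0 when the breakpoint falls inside the interval.
        dist = max(iv.start - breakpoint, breakpoint - iv.end, 0)
        if dist <= window:
            return True
    return False


intervals = [
    RhoInterval(1_000_000, 1_004_000, 0.25),  # total rho 1000: candidate
    RhoInterval(2_000_000, 2_001_000, 0.30),  # total rho ~300: not a candidate
]
print(supports_synteny_break(intervals, 1_040_000))
print(supports_synteny_break(intervals, 2_000_500))
```

Requiring both thresholds means that a short interval with a high per-bp rate but little total ρ is not flagged on its own.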

To calculate recombination rates, we used a variant call set mapped onto the old assembly Panu_2.0, as described in Robinson et al. [44]. For the potential breaks in synteny identified above, we used liftover to convert the breakpoints into Panu_3.0 coordinates and verified that Panu_2.0 and Panu_3.0 were syntenic with each other across the breakpoints.

Finally, owing to the inherent noise in linkage disequilibrium–based estimates of ρ, the lack of evidence for a synteny break in Panu_3.0 is not positive evidence that the Panu_3.0 assembly is correct.

Inference of crossovers in a baboon pedigree

We used a previously described vcf file for the baboons shown in Fig. 6, which was mapped using Panu_2.0 coordinates and lifted over to Panu_3.0 coordinates. We considered only biallelic single-nucleotide polymorphisms (SNPs), and required a depth ≥15, QUAL >50, and genotype quality ≥40 to make a genotype call. We further required an allelic balance of >0.3 for heterozygote calls and <0.07 for homozygote calls, and excluded all repetitive regions as described in [44].
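
As a rough illustration of the genotype filters listed above, the following Python sketch applies them to a single call. The function name and arguments are hypothetical, and the definition of allelic balance used here (the fraction of reads carrying the less common of the two alleles) is an assumption, because the text does not spell it out.

```python
def passes_filters(depth: int, qual: float, genotype_quality: int,
                   ref_reads: int, alt_reads: int, is_het: bool) -> bool:
    """Return True if a biallelic SNP genotype call passes the stated thresholds."""
    # Site-level and genotype-level thresholds from the text:
    # depth >= 15, QUAL > 50, genotype quality >= 40.
    if depth < 15 or qual <= 50 or genotype_quality < 40:
        return False
    total = ref_reads + alt_reads
    if total == 0:
        return False
    # Assumed definition: fraction of reads carrying the less common allele.
    balance = min(ref_reads, alt_reads) / total
    # Heterozygotes need balance > 0.3; homozygotes need balance < 0.07.
    return balance > 0.3 if is_het else balance < 0.07


print(passes_filters(30, 60.0, 45, 14, 16, is_het=True))
print(passes_filters(30, 60.0, 45, 27, 3, is_het=False))
```

Calls that fail any threshold would simply be treated as missing rather than being corrected.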

Figure 6:

Pedigree of baboons used in linkage analysis. Circles represent females, and squares, males.

We focused our analyses on those SNPs that were most informative about recent crossover events. For example, to detect paternal crossovers, we restricted our analyses to SNPs where 10173 was heterozygous, both 9841 and 12242 were homozygous, and all 9 offspring had genotype calls. (For maternal crossovers, we required 10173 to be homozygous and both 9841 and 12242 to be heterozygous.) For these sites, it is straightforward to infer which allele (coded as 0 for reference allele and 1 for alternative allele) was passed on from 10173 to his offspring. While the haplotypic phase of 10173 is unknown, we can infer crossover events on the basis of the minimum number of crossovers needed to be consistent with the observed patterns of inheritance in the offspring of 10173 [45]. For example, Fig. 5c shows that the inheritance pattern near position 29.38 requires ≥3 crossovers (e.g., in individuals 17199, 18385, and 19348).

For each potential error in the Panu_3.0 assembly, we converted the breakpoint location into Panu_2.0 coordinates and verified synteny between Panu_2.0 and Panu_3.0 across the breakpoint region. We then determined whether there was an abnormally large number of crossovers inferred right at the breakpoint. Specifically, if we inferred ≥3 crossover events (out of 18 total meioses, 9 paternal and 9 maternal), then we considered this as evidence that the Panu_3.0 assembly is incorrect, as in Fig. 5c (cf. “Linkage support” column in Table 2). Note that the converse is not true: <3 inferred crossover events is not evidence that the Panu_3.0 assembly is correct at a particular location.
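
The minimum-crossover idea can be illustrated with a small parsimony sketch in Python. It is a simplification under stated assumptions, not the procedure of [45] or the original analysis code: the function names and data layout are hypothetical, and only the two informative SNPs flanking a breakpoint are compared. Because the parent's phase is unknown, the transmitted-allele vectors at two sites are each defined only up to a global flip, so if d offspring differ between the sites, the minimum number of crossovers is min(d, n − d) for n offspring.

```python
def min_crossovers(alleles_a, alleles_b) -> int:
    """Minimum crossovers between two informative sites, parent phase unknown.

    Each argument lists the allele (0 or 1) transmitted to every offspring.
    """
    if len(alleles_a) != len(alleles_b):
        raise ValueError("both sites must cover the same offspring")
    d = sum(x != y for x, y in zip(alleles_a, alleles_b))
    # Flipping the unknown phase at one site turns d switches into n - d.
    return min(d, len(alleles_a) - d)


def crossovers_across(sites, breakpoint: int):
    """Minimum crossovers between the adjacent informative sites flanking a breakpoint.

    `sites` is a position-sorted list of (position, transmitted_alleles).
    Returns None if the breakpoint is not flanked by two sites.
    """
    for (pos1, a1), (pos2, a2) in zip(sites, sites[1:]):
        if pos1 <= breakpoint < pos2:
            return min_crossovers(a1, a2)
    return None


# Made-up transmitted alleles for 9 offspring at two sites around a breakpoint.
sites = [
    (29_300_000, (0, 0, 1, 1, 0, 1, 0, 1, 1)),
    (29_400_000, (1, 0, 0, 1, 0, 1, 0, 1, 0)),
]
n_events = crossovers_across(sites, 29_380_000)
print(n_events, n_events is not None and n_events >= 3)
```

Under the criterion described above, counts obtained in this way for the paternal and maternal meioses would be added together before comparison with the threshold of 3.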

Repeat analysis

We analyzed the repeat content of the Panubis1.0 and Panu_3.0 genome assemblies using RepeatMasker (RepeatMasker, RRID:SCR_012954) [46] version open-4.0.8 in sensitive mode and with blastp version 2.0MP-WashU using the RepeatMasker Combined Database: Dfam_Consensus-20181026, RepBase-20181026. The following parameters were used to run RepeatMasker: RepeatMasker -engine wublast -species "papio anubis" -s -no_is -cutoff 255 -frag 20000.

Fig. S6 summarizes the distribution of various types of repeats found in the 2 genome assemblies. We found that the genome assemblies are comparable in terms of their repeat content.

Conclusion

The development and commercialization of new technologies by companies such as Illumina, 10x Genomics, Bionano Genomics, Dovetail Genomics, and Phase Genomics has enabled researchers to cheaply generate fully de novo genome assemblies with high scaffold contiguity (e.g., [36, 39, 47–49]). When used in combination with long-read sequences (e.g., from Oxford Nanopore or Pacific Biosciences), these technologies can produce high-quality genome assemblies at a fraction of the cost of traditional clone library–based approaches (e.g., [48, 50]). In this context, our assembly Panubis1.0 provides a 10-fold increase in contig N50 size and a 240-fold increase in scaffold N50 size relative to Panu_3.0 at <1% of the reagent cost. The contiguity of this assembly will be especially useful for future studies where knowing the genomic location is important (e.g., hybridization or recombination studies).

One natural question that arises with any new genome assembly is how one assesses that an assembly is “correct.” Indeed, some of the recently published Hi-C–based assemblies have not provided any corroborating evidence supporting their assemblies (e.g., [51]). Here, we used 3 independent sources of information to provide evidence that 11 of 12 large syntenic differences identified from the dot plots are correct in our new baboon assembly (Panubis1.0) relative to the previous assembly Panu_3.0 (Table 2). In all, the incorporation of optical maps and linkage and linkage disequilibrium data provides substantially more support for our assembly than was produced by previous Hi-C–based assemblies (e.g., [48–50]) and counters any potential criticism of the fact that our genome assembly (using individual 15944 from the SNPRC baboon colony) comes from a different individual from the previous baboon assembly (individual 1X1155 from the SNPRC baboon colony).

There is, however, a larger list of 21 inversion differences between Panubis1.0 and Panu_3.0 where the Hi-C data do not provide definitive evidence on which orientation is correct (Table 3). While Hi-C–based assemblies may be prone to small contig inversions within scaffolds, this should be less of a problem for the large inversions outlined here because there will be few interactions that span the full length of the contig, and the correct orientation is generally apparent from the higher weight of links. These changes to the baboon assembly should be considered provisional until additional data can be collected (e.g., high-coverage long-read data) that provide a more definitive answer.

Data Availability

All of the raw sequence data from individual 15944, as well as the Panubis1.0 assembly, are available without restriction from NCBI under BioProject PRJNA527874. New RNA-seq data used for genome annotation are available under BioProject PRJNA559725. The genome annotation report and raw files can be found at [52]. All supporting data and materials are available in the GigaScience GigaDB database [53].

Additional Files

Supplemental Figure S1: Linkage disequilibrium–based evidence for misassemblies in Panu_3.0. Estimates of the population recombination rate ρ near the potential synteny breaks of the misassemblies identified in chromosomes (a) NC_018166.2, (b) NC_018160.2, and (c) NC_018152.2. Red represents the beginning of a misassembly event, and blue, the end of a misassembly event.

Supplemental Figure S2: Recombination-based evidence for misassemblies in Panu_3.0. Shown on the x-axis are positions along chromosomes in Panu_3.0, where each row represents 1 of the 9 offspring of sire 10173. Switches between red and blue within a row represent a recombination event. The vertical black lines represent locations where ≥3 recombinations occur at the same locus, indicating a potential misassembly; except in (d), where recombination occurs at ∼167 Mb but is not shown by a vertical black line.

Supplemental Figure S3: Evidence for inversions in Panu_3.0 based on Bionano alignment. (a) Inversion on chromosome NC_018164.2 demonstrated by Bionano optical map alignment. (b) Alignment to Bionano optical map shows inverted coordinates due to an inversion on chromosome NC_018156.2. (c) Bionano optical map alignment shows an inversion on chromosome NC_018166.2.

Supplemental Figure S4: Evidence for translocations in Panu_3.0 based on Bionano alignment. (a) Breaks in Bionano alignment on chromosome NC_018166.2 indicate a misassembly. (b) Bionano optical map alignment demonstrates a misassembly on chromosome NC_018160.2. (c) Bionano optical map alignment shows a translocation between chromosomes NC_018163.2 and NC_018164.2.

Supplemental Figure S5: Evidence for translocations in Panu_3.0 based on Bionano alignment. (a) Breaks in Bionano alignment on chromosome NC_018164.2 indicate a misassembly. (b) Bionano optical map alignment demonstrates a misassembly on chromosome NC_018165.2. (c) Bionano optical map alignment shows a translocation on chromosome NC_018152.2.

Abbreviations

BLAST: Basic Local Alignment Search Tool; bp: base pairs; BUSCO: Benchmarking Universal Single-Copy Orthologs; Gb: gigabase pairs; kb: kilobase pairs; Mb: megabase pairs; NCBI: National Center for Biotechnology Information; NIH: National Institutes of Health; RNA-seq: RNA sequencing; SNP: single-nucleotide polymorphism; SNPRC: Southwest National Primate Research Center.

Competing Interests

The authors declare that they have no competing interests.

Funding

The work was supported in part by NIH grants R24 OD017859 (to L.A.C. and J.D.W.), R01 GM115433 (to J.D.W.), R01 GM094402 (to Y.S.S.), R35 GM134922 (to Y.S.S.), R01 HG005946 (to P.Y.K.) and by a Packard Fellowship for Science and Engineering (to Y.S.S.). Y.S.S. is a Chan Zuckerberg Biohub Investigator.

Authors' Contributions

J.D.W., L.A.C., and Y.S.S. conceived the project. J.G., S.D., S.S., M.L.S., and P.Y.K. generated data for the project. M.L.S. and S.S.B. performed the genome assembly. S.S.B., M.L.S., J.R., T.P.V., and J.D.W. performed the other analyses. S.S.B. and J.D.W. wrote the manuscript with contributions from all authors.

Acknowledgements

We thank Jenny Tung for providing some of the high molecular weight DNA used in this study.

References

1. VandeBerg JL, Williams-Blangero S, Tardif SD, eds. The Baboon in Biomedical Research. Springer; 2009.

2. McGill HC, McMahan CA, Kruski AW, et al. Responses of serum lipoproteins to dietary cholesterol and type of fat in the baboon. Arteriosclerosis. 1981;1:337–44.

3. Kushwaha RS, Reardon CA, Lewis DS, et al. Effect of dietary lipids on plasma activity and hepatic mRNA levels of cholesteryl ester transfer protein in high- and low-responding baboons (Papio species). Metabolism. 1994;43:1006–12.

4. Singh AT, Rainwater DL, Kammerer CM, et al. Dietary and genetic effects on LDL size measures in baboons. Arterioscler Thromb Vasc Biol. 1996;16:1448–53.

5. Kammerer CM, Rainwater DL, Cox LA, et al. Locus controlling LDL cholesterol response to dietary cholesterol is on baboon homologue of human chromosome 6. Arterioscler Thromb Vasc Biol. 2002;22:1720–5.

6. Rainwater DL, Kammerer CM, Mahaney MC, et al. Localization of genes that control LDL size fractions in baboons. Atherosclerosis. 2003;168:15–22.

7. Martin LJ, Blangero J, Rogers J, et al. A quantitative trait locus influencing activin-to-estrogen ratio in pedigreed baboons maps to a region homologous to human chromosome 19. Hum Biol. 2001;73:787–800.

8. Sherwood RJ, Duren DL, Havill LM, et al. A genomewide linkage scan for quantitative trait loci influencing the craniofacial complex in baboons (Papio hamadryas spp.). Genetics. 2008;180:619–28.

9. Havill LM, Mahaney MC, Cox LA, et al. A quantitative trait locus for normal variation in forearm bone mineral density in pedigreed baboons maps to the ortholog of human chromosome 11q. J Clin Endocrinol Metab. 2005;90:3638–45.

10. Havill LM, Cox LA, Rogers J, et al. Cross-species replication of a serum osteocalcin quantitative trait locus on human chromosome 16q in pedigreed baboons. Calcif Tissue Int. 2005;77:205–11.

11. Rainwater DL, Cox LA, Rogers J, et al. Localization of multiple pleiotropic genes for lipoprotein metabolism in baboons. J Lipid Res. 2009;50:1420–8.

12. Cox LA, Glenn J, Ascher S, et al. Integration of genetic and genomic methods for identification of genes and gene variants encoding QTLs in the nonhuman primate. Methods. 2009;49:63–9.

13. Wall JD, Schlebusch SA, Alberts SC, et al. Genomewide ancestry and divergence patterns from low-coverage sequencing data reveal a complex history of admixture in wild baboons. Mol Ecol. 2016;25:3469–83.

14. Telenti A, Pierce LCT, Biggs WH, et al. Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci U S A. 2016;113:11901–6.

15. McCarthy S, Das S, Kretzschmar W, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48:1279–83.

16. Halldorsson BV, Palsson G, Stefansson OA, et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science. 2019;363(6425):eaau1043.

17. Karczewski KJ, Francioli LC, Tiao G, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–43.

18. Taliun D, Harris DN, Kessler MD, et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. bioRxiv. 2020.

19. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.

20. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–45.

21. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–320.

22. International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449(7164):851.

23. International HapMap 3 Consortium, Altshuler DM, Gibbs RA, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–8.

24. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, et al. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–73.

25. 1000 Genomes Project Consortium, Abecasis GR, Auton A, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.

26. 1000 Genomes Project Consortium, Auton A, Brooks LD, et al. A global reference for human genetic variation. Nature. 2015;526:68–74.

27. Rogers J, Raveendran M, Harris RA, et al. The comparative genomics and complex population history of Papio baboons. Sci Adv. 2019;5(1):eaau6947.

28. Weisenfeld NI, Kumar V, Shah P, et al. Direct determination of diploid genome sequences. Genome Res. 2017;27:757–67.

29. Jain M, Koren S, Miga KH, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018;36:338–45.

30. Rao SS, Huntley MH, Durand NC, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014;159:1665–80.

31. Ma ZS, Li L, Ye C, et al. Hybrid assembly of ultra-long Nanopore reads augmented with 10x-Genomics contigs: demonstrated with a human genome. Genomics. 2019;111(6):1896–901.

32. Qin M, Wu S, Li A, et al. LRScaf: improving draft genomes using long noisy reads. BMC Genomics. 2019;20(1):955.

33. Canu assembler documentation.

34. Xu GC, Xu TJ, Zhu R, et al. LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly. GigaScience. 2019;8(1):giy157.

35. Walker BJ, Abeel T, Shea T, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9(11):e112963.

36. Dudchenko O, Batra SS, Omer AD, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–5.

37. Phase Genomics. https://www.phasegenomics.com/. Accessed 10 January 2019.

38. Dudchenko O, Shamim MS, Batra S, et al. The Juicebox Assembly Tools module facilitates de novo assembly of mammalian genomes with chromosome-length scaffolds for under $1000. bioRxiv. 2018.

39. Matthews BJ, Dudchenko O, Kingan SB, et al. Improved reference genome of Aedes aegypti informs arbovirus vector control. Nature. 2018;563:501–7.

40. Simão FA, Waterhouse RM, Ioannidis P, et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.

41. Perelman P, Johnson WE, Roos C, et al. A molecular phylogeny of living primates. PLoS Genet. 2011;7(3):e1001342.

42. Skov L, Schierup MH, Danish Pan Genome Consortium. Analysis of 62 hybrid assembled human Y chromosomes exposes rapid structural changes and high rates of gene conversion. PLoS Genet. 2017;13(8):e1006834.

43. Chan AH, Jenkins PA, Song YS. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genet. 2012;8(12):e1003090.

44. Robinson JA, Harrahill NJ, Yetman AT, et al. Analysis of 100 high-coverage genomes from a pedigreed captive baboon colony. Genome Res. 2019;29:848–56.

45. Coop G, Wen X, Ober C, et al. High-resolution mapping of crossovers reveals extensive variation in fine-scale recombination patterns among humans. Science. 2008;319:1395–8.

46. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinform. 2009;25(1):4–10.

47. Mostovoy Y, Levy-Sakin M, Lam J, et al. A hybrid approach for de novo human genome sequence assembly and phasing. Nat Methods. 2016;13:587–90.

48. Bickhart DM, Rosen BD, Koren S, et al. Single-molecule sequencing and chromatin conformation capture enable de novo reference assembly of the domestic goat genome. Nat Genet. 2017;49:643–50.

49. Rice ES, Kohno S, John JS, et al. Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Res. 2017;27:686–96.

50. Kalbfleisch TS, Rice ES, DePriest MS, et al. Improved reference genome for the domestic horse increases assembly contiguity and composition. Commun Biol. 2018;1:197.

51. Nuss AB, Sharma A, Gulia-Nuss M. Chicago and Dovetail Hi-C proximity ligation yield chromosome length scaffolds of Ixodes scapularis genome. bioRxiv. 2018.

52. NCBI Genome Annotation. NCBI Papio anubis Annotation Release 104.

53. Batra SS, Levy-Sakin M, Robinson J, et al. Supporting data for “Accurate assembly of the olive baboon (Papio anubis) genome using long-read and Hi-C data.” GigaScience Database. 2020.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.