Next-Generation Sequencing of the Complete Mitochondrial Genome of the Endangered Species Black Lion Tamarin Leontopithecus chrysopygus (Primates) and Mitogenomic Phylogeny Focusing on the Callitrichidae Family

We describe the complete mitochondrial genome sequence of the Black Lion Tamarin (BLT), an endangered primate species endemic to the Atlantic Rainforest of Brazil. We assembled the Leontopithecus chrysopygus mitogenome through analysis of 523M base pairs (bp) of short reads produced by next-generation sequencing (NGS) on the Illumina platform, and investigated the presence of nuclear mitochondrial pseudogenes and heteroplasmic sites. Additionally, we conducted phylogenetic analyses using all complete primate mitogenomes available as of June 2017. The single circular mitogenome of the BLT shows the organization and gene arrangement typical of vertebrates, with a total length of 16618 bp, containing 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region (D-loop region). Our full phylogenetic tree is based on the most comprehensive mitogenomic dataset for Callitrichidae species to date; it adds new data for the genus Leontopithecus and is discussed in light of previous studies on primates. Moreover, the mitochondrial genome reported here was assembled at 3000X coverage and should be useful for further phylogenetic and evolutionary analyses of Callitrichidae and higher taxa.

Several Callitrichidae species are under threat, mainly due to habitat fragmentation, gathering, and illegal trade. According to the IUCN Red List, around 21 Neotropical primates from the Platyrrhini parvorder are threatened with extinction. Of these, five species are considered among the most endangered primates in the world (Schwitzer et al. 2014; IUCN 2017).
Despite numerous phylogenetic studies performed on platyrrhines in recent decades, evolutionary relationships among species and higher taxa remain unclear, and different phylogenies have been proposed for the Callitrichidae family (Barroso et al. 1997; Schneider 2000; Schneider et al. 2012; Harris et al. 2014). The commonly accepted consensus is that Saguinus and Leontopithecus were the first genera to separate from the others, and that Callimico and Cebuella are the closest sister groups of Callithrix (Kay 1990; Schneider et al. 1993, 2012; Schneider 2000; Opazo et al. 2006; Wildman et al. 2009; Springer et al. 2012; Duran et al. 2013; Finstermeier et al. 2013; Rosenberger et al. 2013; Harris et al. 2014). Some studies have also placed Saguinus as a sister group of Leontopithecus (Finstermeier et al. 2013). However, the phylogenetic reconstructions for this family remain insufficiently resolved.
Recent studies in primates have shown that mitochondrial-based phylogenies may provide more reliable information about evolutionary relationships among species and higher taxa than nuclear genes, and can also be successfully used to determine the timescale of their evolution (Finstermeier et al. 2013). Nevertheless, analyses of mitochondrial sequences (mtDNA) can yield results distinct from those of nuclear sequences (nDNA) (Mundy and Kelly 2001; Perelman et al. 2011; Liedigk et al. 2014), although nDNA and mtDNA approaches have also shown congruent results (Schneider 2000; Schneider et al. 2012).
Alternatively, phylogenies using complete mitochondrial genomes may provide more robust statistical support than analyses based on single genes (Liedigk et al. 2014). Recent studies using whole mitochondrial genomes from a wide range of primates have shown that mitogenomics may be more effective in resolving relationships among some taxa than analyses of smaller mitochondrial fragments (Malukiewicz et al. 2017). However, contrasting results have been found among representatives of the Callitrichidae family.
In this study, we characterized a high-coverage complete mitochondrial genome of L. chrysopygus and performed phylogenetic analysis using 124 complete mitogenomes available for the Primates order. The phylogenetic discussion focuses on the Callitrichidae family, from the Platyrrhini parvorder (Rylands and Mittermeier 2009); a full phylogeny has also been obtained for 65 Old World Monkeys (OWMs) and 34 Strepsirrhini, providing a highly robust mitogenomic phylogeny for Primates, and adding new data for callitrichids. Our complete tree includes 90 Haplorrhini species. Of these, 25 are New World Monkeys (NWMs) and nine are from the Callitrichidae family.

Ethical statement
Sample collection followed all ethical requirements proposed by the American Society of Primatologists for the Ethical Treatment of Non-Human Primates, and was approved by SISBIO #50616-1 (Authorization System and Biodiversity Information, Chico Mendes Institute for Biodiversity Conservation, Ministry of Environment, Brazil) and CEUA #9805200815 (Ethics Committee on Animal Experimentation and Research, UFSCar, São Carlos, São Paulo, Brazil).

Sample collection
The biological sample of L. chrysopygus was obtained from the Primatology Center of Rio de Janeiro (CPRJ), located in Guapimirim (Rio de Janeiro, Brazil). One adult male Black Lion Tamarin, born in captivity in 2007, was anesthetized using an inhalation mask with isoflurane (2%) and oxygen (2 L/minute), and 2 mL of peripheral venous blood were then collected in a vacutainer containing EDTA (3.6 mg). The sample was stored at -20 °C until DNA extraction.

DNA extraction, next generation sequencing experiments
Total genomic DNA was extracted using a ReliaPrep Blood gDNA Miniprep System Kit (Promega, Fitchburg, WI, USA), and DNA quality and quantity were evaluated on a NanoDrop Spectrophotometer (Thermo Fisher, Waltham, MA, USA). About 2 µg of DNA were used to construct short-insert libraries using a Nextera DNA Library Prep Kit (Illumina, San Diego, CA, USA). A HiSeq SBS Kit v4 PE was used to sequence paired-end reads (2 × 101 bp) on the HiSeq 2500 Illumina Platform (Illumina, San Diego, CA, USA).

Mitogenome assembly
The mitogenome of the BLT was assembled in several steps. First, we employed the LeeHom tool (Renaud et al. 2014) to merge read pairs that overlapped by ten or more bases. Then, we trimmed low-quality bases using an in-house program. Trimmed sequences were retained only if they were at least 30 bases long. We mapped the reads using mtDNA sequences of four close relatives of the BLT (Callithrix jacchus; Callithrix pygmaea, here named Cebuella; Saguinus oedipus; and Leontopithecus rosalia) as references (see Table S1), and assembled them with Velvet 1.2.10 (Zerbino and Birney 2008). To obtain a contiguous sequence, we applied the de Bruijn graph approach, using long k-mers. More details on the complete mitogenome assembly are given in the Supplemental Material (see Appendix S1).
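The de Bruijn graph idea mentioned above can be illustrated with a minimal sketch. This is not Velvet's implementation and not the pipeline used here; the function names, the toy reads, and the value of k are assumptions chosen only for illustration. Each k-mer contributes an edge between its (k-1)-mer prefix and suffix, and a contig is read off by following nodes that have a single distinct successor.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one
    distinct successor; stop at branches, dead ends, or revisited nodes."""
    contig = start
    node = start
    seen = {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break
        nxt = successors.pop()
        if nxt in seen:  # avoid looping forever on a circular path
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Toy reads (hypothetical) that overlap to spell one short sequence.
reads = ["ACGTTGCA", "GTTGCATC", "GCATCGGA"]
graph = build_de_bruijn(reads, k=5)
contig = walk_unambiguous(graph, "ACGT")
print(contig)
```

Longer k-mers reduce the number of branching nodes caused by short repeats, which is one reason long k-mers help in obtaining a single contiguous sequence.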

NUMTs and heteroplasmies searching
We mapped the fastq reads to our consensus sequence and generated pileup files using SAMtools (Li et al. 2009; Li 2011). At almost 600 positions, the consensus base was seen in less than 99% of the reads. We then filtered out each read (and its partner, if it was part of a read pair) with an edit distance of two or more from the consensus sequence. After this filtering step, only three positions remained at which the consensus base was seen in less than 99% of the reads. We then filtered out each read (and its partner, if it was part of a read pair) with soft-clipped bases. Such bases are typical of the boundary between a NUMT (nuclear mitochondrial pseudogene) and the rest of the chromosomal sequence. After filtering these reads, and considering alternate bases seen at least three times, the most frequent alternative base was present at a frequency of less than 0.25% (supported by 7 reads at most), which allowed us to assess the presence of heteroplasmies (see Appendix S1).
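The two read filters and the 99% consensus check described above can be sketched as follows. This is an illustrative sketch, not the code used in this study: it assumes each alignment has already been reduced to an (edit distance, CIGAR string) pair, applies both filters in one pass rather than sequentially, and all function names are hypothetical.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_soft_clip(cigar):
    """True if the CIGAR string contains a soft-clip (S) operation."""
    return any(op == "S" for _, op in CIGAR_OP.findall(cigar))


def keep_fragment(alignments, max_edit_distance=1):
    """alignments: list of (edit_distance, cigar) for a read and, if paired,
    its partner. The whole fragment is dropped if either read has an edit
    distance of two or more, or carries soft-clipped bases."""
    return all(
        nm <= max_edit_distance and not has_soft_clip(cigar)
        for nm, cigar in alignments
    )


def consensus_fraction(column_bases):
    """Fraction of bases in one pileup column that match the most common base."""
    counts = Counter(column_bases)
    return counts.most_common(1)[0][1] / len(column_bases)


def low_support_positions(columns, threshold=0.99):
    """Indices of pileup columns where the consensus base falls below threshold."""
    return [i for i, col in enumerate(columns) if consensus_fraction(col) < threshold]


columns = ["A" * 100, "A" * 98 + "GG", "C" * 99 + "T"]
print(low_support_positions(columns))
```

Dropping the partner together with the offending read keeps pairs consistent, so that a NUMT-derived fragment does not contribute through only one of its two reads.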

Mitogenome characterization and phylogenetic analyses
We performed an initial automatic annotation in the MITOS webserver (Bernt et al. 2013). Next, we conducted a more accurate annotation in the BioEdit software (Hall 1999, 2011), using both L. rosalia (NC_021952) and Homo sapiens (NC_012920) mitochondrial genomes as references.
We downloaded all complete mitogenomes available by June 2017 for 123 primate species, and for three other mammal species, using the NCBI's taxonomy browser (https://www.ncbi.nlm.nih.gov/genome) (see Table S1), and aligned them using the MAFFT program (Katoh and Standley 2013). We used TranslatorX (Abascal et al. 2010) for the protein-coding regions and concatenated them. We conducted Maximum Likelihood (ML) analyses using RAxML (Stamatakis 2014) and implemented the GTR+G model for the partition identified in PartitionFinder (Lanfear et al. 2012).
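The concatenation step can be illustrated with a short sketch that joins per-gene alignments into one supermatrix and records the coordinate range of each gene. It is a minimal sketch under stated assumptions, not the procedure used here: the input layout, the taxon labels, the toy sequences, and the choice to pad missing genes with gaps are all hypothetical, and the partition lines follow the "DNA, name = start-end" style commonly used for RAxML partition files.

```python
def concatenate(gene_alignments):
    """gene_alignments: dict gene -> dict taxon -> aligned sequence.
    Returns (supermatrix, partitions), where partitions holds 1-based,
    inclusive (gene, start, end) coordinates in the concatenated alignment."""
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are not aligned to one length")
        length = lengths.pop()
        for t in taxa:
            # Assumption: a taxon lacking this gene is padded with gaps.
            pieces[t].append(aln.get(t, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {t: "".join(parts) for t, parts in pieces.items()}
    return supermatrix, partitions


# Toy input (hypothetical sequences and labels).
genes = {
    "ND1": {"L_chrysopygus": "ATGACC", "L_rosalia": "ATGACT"},
    "COX1": {"L_chrysopygus": "ATGTTC"},
}
matrix, parts = concatenate(genes)
partition_lines = [f"DNA, {g} = {s}-{e}" for g, s, e in parts]
print("\n".join(partition_lines))
```

Keeping the gene boundaries alongside the supermatrix is what allows a partitioning tool to evaluate candidate schemes before a substitution model is assigned.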

Data availability
The complete mitochondrial genome sequence described here is available at GenBank (https://www.ncbi.nlm.nih.gov/genbank/) under accession number MG933868. Supplemental Material was uploaded to the Figshare data repository (https://figshare.com/s/5d20d00529afaf60f390). Figure S1 contains the quartiles for the distribution of fragment lengths for the mitochondrial genome assembly; Figure S2 contains the illustration of the mitochondrial tRNAs; Figure S3 contains the full phylogenetic tree, using Maximum Likelihood for 124 primates and three outgroup species; Table S1 contains the list of the GenBank accession numbers for the complete mitochondrial genome sequences previously published by different authors and used in our phylogenetic analysis. Details of the methodology employed to perform the mitogenome assembly and the NUMT and heteroplasmy searches are described in Appendix S1.

Mitogenome organization and nucleotide composition
The mitogenome of L. chrysopygus was assembled, with 3000X coverage, as a single circular molecule of 16618 bp (Figure 1), which is comparable to other Callitrichidae mitochondrial genomes, including the 16,499 bp of C. jacchus and 16,872 bp of L. rosalia (Finstermeier et al. 2013; Malukiewicz et al. 2017). We do not report any heteroplasmy after the several filtering steps (see Figure S1). We annotated 37 genes, including 13 protein-coding, 22 tRNA, and 2 rRNA genes (Table 1).
The D-loop region has a total length of 1192 bp and contains an STR region, (TA)14, between nucleotides 957 and 984. Seventeen intergenic spacers were found, with a total length of 75 bp and individual lengths ranging from 1 to 35 bp; the longest is located between tRNA-Asn and tRNA-Cys. Gene overlap is observed between fifteen contiguous genes, for a total of 76 bases, with individual overlaps ranging from 1 to 46 bp. The overlapping genes tRNA-Ile and tRNA-Gln, and ND5 and ND6, are encoded on opposite strands. The composition of the L. chrysopygus mtDNA is biased toward adenine and thymine. The A+T content is 62.65% for protein-coding genes, 65.64% for tRNAs, 60.54% for rRNAs, and 65.52% for the D-loop region. The protein-coding genes have almost equal amounts of A and T; however, they are GC-skewed. The A+T content increases and the GC-skew decreases with codon position. tRNAs preferentially contain A and G, while rRNAs have a greater fraction of A and C (Table 2).
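The composition statistics reported above can be computed along the following lines. This sketch assumes the commonly used definition GC-skew = (G - C)/(G + C), which is not stated explicitly in the text, and uses a made-up sequence; the function names are hypothetical.

```python
def at_content(seq):
    """A+T fraction among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    total = sum(seq.count(b) for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("A") + seq.count("T")) / total


def gc_skew(seq):
    """GC-skew, assumed here to be (G - C) / (G + C)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        raise ValueError("sequence contains no G or C")
    return (g - c) / (g + c)


def by_codon_position(cds):
    """Split an in-frame coding sequence into its 1st, 2nd and 3rd codon positions."""
    return [cds[i::3] for i in range(3)]


# Toy coding sequence (hypothetical), analysed per codon position.
cds = "ATGCCAGTA"
for position, bases in enumerate(by_codon_position(cds), start=1):
    print(position, round(at_content(bases), 3))

# The reported STR coordinates are consistent with 14 TA repeats.
str_length = 984 - 957 + 1
print(str_length == len("TA" * 14))
```

Computing the statistics separately for each codon position is what makes a trend such as rising A+T content from the first to the third position visible.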
Although the characterization of mitochondrial genomes has increased steadily in recent years, the number of complete mitogenomes for primate species is small in light of the importance and high species diversity of this group. This is especially true when we consider the utility of mitochondrial data in resolving phylogenetic relationships among taxa, including those with recent divergence times, and in detecting evolutionary events involving gene duplication, loss, and rearrangement. Comparative analyses of complete mitogenomes can also be successfully used to provide insights into adaptive processes (Finstermeier et al. 2013; Oceguera-Figueroa et al. 2016; Wang et al. 2016). Despite this, there are currently only 65 complete mitochondrial genomes described for catarrhines, and 24 for platyrrhines. Of those, eight are from the Callitrichidae family and one from the genus Leontopithecus.
Despite this, taxonomic studies using morphological, reproductive, and molecular data have indicated that Callimico is a sister group of marmosets (Barroso et al. 1997; Harris et al. 2014; Buckner et al. 2015; Schneider and Sampaio 2015).
With regard to the Callitrichidae family, our phylogenetic tree links Callithrix to Cebuella and places Callimico as the sister group to Callithrix/Cebuella, supported by high bootstrap values (100 and 97, respectively). Saguinus appears as the basal genus among the callitrichines, as in other studies previously performed on Callitrichidae (Schneider 2000; Opazo et al. 2006; Wildman et al. 2009; Springer et al. 2012; Menezes et al. 2013; Rylands et al. 2016; Malukiewicz et al. 2017). However, our data did not support the characterization of Saguinus and Leontopithecus as sister groups (Figure 2).
Previous studies using nDNA and mtDNA sequences have described Saguinus and Leontopithecus as non-sister groups. Wildman et al. (2009) obtained a well-supported tree with non-coding genomic regions, which exhibits the same relationship found by Opazo et al. (2006) using seven nuclear genes. Springer et al. (2012) reported similar results for Saguinus, Leontopithecus, Callimico, and Callithrix when they performed a concatenated phylogenetic analysis using both mtDNA and nDNA data. In a recent study based on nuclear data produced by Perelman et al. (2011), Rylands et al. (2016) also report Saguinus as a non-sister group of Leontopithecus, and Callithrix as more closely related to Cebuella, as we also found in this study. Callimico and Mico form a clade that is linked to that of Callithrix/Cebuella. Although we did not include Mico in our phylogenetic analysis, due to the absence of a complete mitogenome for this genus, our tree is more congruent with these arrangements than with the complete-mitogenome phylogenetic analyses performed by Finstermeier et al. (2013) and Malukiewicz et al. (2017), which place Leontopithecus as a sister group of Saguinus. In sum, our phylogenetic tree recovers the Callitrichidae family as a well-supported monophyletic group. Nonetheless, the internal phylogeny does not support the genus Leontopithecus as a sister group of Saguinus.

Conclusions
In this study, we have successfully assembled (with high coverage) the whole mitochondrial genome of L. chrysopygus, and have obtained a well-resolved phylogeny for primates based on all the protein-coding mitochondrial genes. These data contribute to our knowledge of the evolutionary relationships within Callitrichidae and can be useful in further phylogenetic and evolutionary analyses of this family and higher taxa. Considering that Leontopithecus is a rare, endangered genus, understanding its phylogenetic relationships within Callitrichidae can also benefit the conservation of these animals in cases where management decisions depend upon a robust phylogeny.