Independent Occurrences of Multiple Repeats in the Control Region of Mitochondrial DNA of White-Tailed Deer

Deer in the genera Mazama and Odocoileus generally have two copies of a 75-base-pair (bp) repeat in the left domain of the control region of the mitochondrial DNA (mtDNA). Phylogenetic analyses further suggest an ancient origin for the duplication, supporting a previously stated contention that this event occurred before the separation of Mazama and Odocoileus. However, 7.8% of the white-tailed deer (Odocoileus virginianus) analyzed had three or four copies of the 75-bp repeat in the control region of their mtDNA, and all of these animals were from the coastal plain of the southeastern United States. When copy 3 is present, it is very similar in sequence to copy 2, but sequence variation suggests that copy 3 probably evolved from copy 2 multiple times. The pattern of phylogenetic clustering of haplotypes from across the coastal plain also suggests that phenotypes with three or four copies of the repeat have originated multiple times. The 44 observed haplotypes showed strong spatial subdivision across the area, with subpopulations frequently showing complete shifts in haplotype frequencies relative to subpopulations from nearby areas.

Survival and evolution of genes in space and time is the central question of phylogeography (Avise 2000), and population dynamics has important effects on fluctuations of genetic landscapes (Rhodes et al. 1996). While evolution of genes at the macrogeographic scale is well understood, gene dynamics at a finer scale are not so well studied, especially for large, highly mobile wildlife. Use of DNA markers has simplified the study of such spatial genetic patterns, but the choice of an appropriate genetic system depends on the scale of the study. In general, the finer the geographic scale the more variable the marker must be to detect the influence of local population dynamics on spatial genetic patterns (Avise 1994).
The control region, or D-loop, of mitochondrial DNA (mtDNA) is highly variable in mammals. It is noncoding but important for the initiation of replication of the molecule (Clayton 1992; Douzery and Randi 1997; Wilkinson et al. 1997). The control region is divided into three domains: a central conserved region flanked by left and right domains, which tend to be variable (Douzery and Randi 1997). Both the left and right domains can have variable numbers of tandem repeats (VNTRs). Some mammals, such as bats (Wilkinson and Chapman 1991), shrews (Stewart and Baker 1994), cats (Lopez et al. 1996), and sheep (Wood and Phua 1996), commonly have long VNTRs in the left domain. In the Cervidae, only Cervus nippon and species of Mazama and Odocoileus show a repeat in the left domain, specifically two copies of a 75-base-pair (bp) sequence in an area known as the RS2 region (Douzery and Randi 1997) (Figure 1A). Because C. nippon clearly falls in a different subfamily from Mazama and Odocoileus, Douzery and Randi (1997) concluded that the 75-bp repeat evolved independently in each subfamilial lineage. They also suggest that the repeat in RS2 had an ancient origin. More recently, some white-tailed deer (Odocoileus virginianus) have been shown to have three or four copies of the repeat, suggesting that further evolution in the RS2 region is occurring at the present time (Purdue et al. 2000).
Even though the D-loop is noncoding, increased sequence length, such as that introduced by multiple copies of the repeat in the RS2 region, may slow the rate of replication of mtDNA (Wilkinson et al. 1997). Variable numbers of tandem repeats within the variable domains have been described for numerous species of mammals, including deer, for more than a decade (Hoelzel 1993; Hoelzel et al. 1994; Mahmut et al. 2002), and they may affect survival (Wilkinson et al. 1997). Selection may act against individuals with multiple copies of the repeat in the RS2 region and constrain the evolutionary pathways taken by the D-loop. The mtDNA genes are important in cell metabolism and the rate of mtDNA transcription. The southeastern coastal plain of the United States may be an ideal region in which to look for such phenomena in local whitetail populations, because the region has not been subject to large-scale migrations of species caused by the movement of glaciers (Hewitt 2001) and has been relatively free of the reintroductions of deer that occurred during the last century (Blackard 1971).
In addition, there are no absolute barriers to gene flow in the region, although females are very philopatric (Purdue et al. 2000). We have observed deer swimming across major rivers in the area and across the Intracoastal Waterway. Male dispersal to the barrier islands off the coast must be relatively common, because the populations on some of these islands have normal levels of genetic variability for allozymes (Hillestad 1984; Rowland 1989). Whitetails are among the most genetically variable mammals (Breshears et al. 1988) and also show strong spatial and temporal heterogeneity in gene frequencies (Chesser et al. 1982; Smith et al. 1990).
Our primary objective was to document the spatial pattern for different numbers of copies of the repeat in a large sample of white-tailed deer from the southeastern coastal plain of the United States. To put this pattern into a larger geographical context, we have included whitetails from other regions of North America. We also sequenced the control region for a subsample of deer, especially from localities where individuals display varying numbers of copies of the repeat. This allowed us to estimate the order of events that led to the various copies. Finally, we tested the idea that whitetails show fine-scale spatial structuring over the southeastern coastal plain as suggested by Purdue et al. (2000) and compared this degree of structuring to that occurring on a larger geographical scale.
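One common way to put a number on the kind of spatial structuring tested here is Nei's G_ST computed from haplotype counts: it is 0 when populations share identical haplotype frequencies and approaches 1 when they show complete shifts. The following is only an illustrative sketch, assuming equally weighted populations and no small-sample correction; it is not presented as the analysis used in this study, and the haplotype labels and counts in the example are hypothetical.

```python
from collections import Counter


def gene_diversity(freqs):
    """Haplotype (gene) diversity, 1 - sum(p_i^2), for one set of frequencies."""
    return 1.0 - sum(f * f for f in freqs)


def nei_gst(pop_counts):
    """Nei's G_ST = (H_T - H_S) / H_T from per-population haplotype counts.

    pop_counts: list of Counter objects (haplotype -> number of deer), one per
    population. Assumption for this sketch: populations are weighted equally
    and no small-sample correction is applied.
    """
    pop_freqs = []
    for counts in pop_counts:
        total = sum(counts.values())
        pop_freqs.append({hap: n / total for hap, n in counts.items()})

    # H_S: mean within-population haplotype diversity
    h_s = sum(gene_diversity(f.values()) for f in pop_freqs) / len(pop_freqs)

    # H_T: diversity of the mean (pooled) haplotype frequencies
    haplotypes = set().union(*pop_freqs)
    mean_freqs = [sum(f.get(hap, 0.0) for f in pop_freqs) / len(pop_freqs)
                  for hap in haplotypes]
    h_t = gene_diversity(mean_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical counts for two nearby populations sharing no haplotypes
pop_a = Counter({"hap1": 12, "hap2": 3})
pop_b = Counter({"hap3": 10})
print(round(nei_gst([pop_a, pop_b]), 3))
```

Comparing the index for pairs of nearby coastal-plain localities with the index for widely separated regions is one simple way to frame the fine-scale versus large-scale comparison; model-based statistics that also use sequence divergence between haplotypes would be a natural refinement.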

Materials and Methods
Muscle samples were collected from hunter-killed white-tailed deer from 30 localities in the United States and Guatemala (Table 1). Total DNA from each sample was extracted with a high salt-alcohol precipitation protocol (Medrano et al. 1990). The left domain of the control region was amplified by the polymerase chain reaction (PCR) using a forward primer (LGL283: 5′-TACACTGGTCTTGTAAACC-3′) set in the flanking tRNA-Pro gene and a reverse primer (ISM015: 5′-ATGGCCCTGTAGAAAGAAC-3′) located in the central conserved region of the control region. PCR was performed using Taq polymerase. Generally, 50 or 100 μl of PCR product was produced with the following constituents and parameters: 0.5 μl DNA template added to 1× PCR buffer (supplied by the manufacturer); 0.2 μM of each primer; 1 mM each of the four deoxynucleotide triphosphates (dNTPs: dATP, dCTP, dGTP, and dTTP); and 1.25 units of Taq. Usually, 32–34 thermal cycles (denaturation at 95°C for 60 s; annealing at 54°C for 40 s; and extension at 70°C for 60 s, with 3 s added per cycle) were necessary to complete PCR. For some samples, slight alterations in the amount of template, annealing temperature, or number of cycles were necessary for success. The RS2 region was included in the resulting PCR product, and the number of copies of the 75-bp repeat was easily determined by comparing the position of the product in a 1% agarose electrophoretic gel with that of a standard.

Figure 1. (A) … [from Douzery and Randi (1997)]. The region spanning TAS3 and TAS4 is replicated in tandem repeats in O. virginianus. All individuals have two copies of RS2 (copies 1 and 2), and some have up to four copies (Table 1). (B) Sequence divergence between the consensus sequence of O. virginianus and the control region of Mazama [from Douzery and Randi (1997)]. Divergence is calculated in 100-bp sliding windows with 1-bp steps using VISTA (Mayor et al. 2000). RS2 sequences are conserved and are followed by a transversional hot spot between positions 355 and 360, indicated by an arrow.
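Because each extra copy of the repeat lengthens the amplicon by 75 bp, the copy number follows from how far the band sits above the usual two-copy product. A minimal sketch of that arithmetic, assuming the two-copy product size is known from a reference sample (the 500-bp value below is hypothetical, since the amplicon length is not stated here) and assuming an arbitrary gel-reading tolerance:

```python
REPEAT_LEN = 75  # length of one RS2 repeat unit in bp (from the text)


def estimate_copies(product_bp, two_copy_bp, tolerance_bp=20):
    """Infer the number of 75-bp RS2 repeats from a PCR product size.

    product_bp:   band size read off the gel against a size standard.
    two_copy_bp:  size of the product for a haplotype with the usual two
                  copies (a calibration value; hypothetical in the example).
    tolerance_bp: assumed maximum distance from a whole-repeat size before
                  the band is rejected as uninterpretable.
    """
    extra_bp = product_bp - two_copy_bp
    extra_copies = round(extra_bp / REPEAT_LEN)
    residual = extra_bp - extra_copies * REPEAT_LEN
    if abs(residual) > tolerance_bp:
        raise ValueError(
            f"{product_bp} bp is {residual:+.0f} bp from the nearest whole-repeat size"
        )
    return 2 + extra_copies


# Hypothetical two-copy baseline of 500 bp
for band in (500, 575, 650):
    print(band, "bp ->", estimate_copies(band, two_copy_bp=500), "copies")
```

A band that falls between whole-repeat sizes raises an error rather than being forced to a copy number; a double-banded product, such as the possible heteroplasmy noted in the Results, would need to be scored band by band.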
Genetic variants (haplotypes) among the PCR products were identified by restriction fragment length polymorphism (RFLP) analysis. Haplotypes were determined from the positions of bands in 2.5% MetaPhor (FMC Corporation, Rockland, ME) electrophoretic gels after independent digestions with the restriction enzymes AluI, AseI, BfaI, DdeI, and RsaI (Purdue et al. 2000).
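To illustrate how a composite RFLP haplotype arises from five independent digests, here is an in-silico sketch. It uses the standard recognition sequences and top-strand cut positions of the five enzymes named above (AluI AG^CT, AseI AT^TAAT, BfaI C^TAG, DdeI C^TNAG, RsaI GT^AC). The test strings are toy sequences, not deer control-region sequence, and the sketch ignores that very small fragments would not be resolved on a gel.

```python
import re

# Recognition sequence and top-strand cut offset for each enzyme.
ENZYMES = {
    "AluI": ("AGCT", 2),
    "AseI": ("ATTAAT", 2),
    "BfaI": ("CTAG", 1),
    "DdeI": ("CTNAG", 1),
    "RsaI": ("GTAC", 2),
}


def _site_pattern(site):
    # 'N' matches any base; the lookahead lets overlapping sites all be found
    return re.compile("(?=" + site.replace("N", "[ACGT]") + ")")


def digest(seq, enzyme):
    """Fragment lengths for one enzyme, largest first (as read down a gel)."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = [m.start() + offset for m in _site_pattern(site).finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])),
                  reverse=True)


def composite_haplotype(seq):
    """Combine the five single-enzyme fragment patterns into one haplotype."""
    return tuple((name, tuple(digest(seq, name))) for name in sorted(ENZYMES))


toy = "AAAAGCTAAAAGTACAAA"      # hypothetical sequence with one AluI and one RsaI site
mutant = "AAAAGGTAAAAGTACAAA"   # same sequence with the AluI site destroyed
print(digest(toy, "AluI"))      # [13, 5]
print(composite_haplotype(toy) == composite_haplotype(mutant))
```

A single substitution that removes one site changes only one enzyme's pattern, yet it yields a different composite haplotype, which is why five enzymes together can separate many variants.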
Purdue et al. mtDNA Repeats in Deer

A representative of each RFLP haplotype with more than two copies of the 75-bp repeat was sequenced for the entire length of the control region. Also sequenced were representatives of all haplotypes found at localities where variants with more than two copies occurred, as well as selected haplotypes from a variety of other populations. A total of 33 individuals were sequenced. For sequencing, forward primer LGL283 was paired with reverse primer CST-39 (5′-GGGTCGGAAGGCTGGGACCAAACC-3′) set in the flanking tRNA-Phe gene (Strobeck 1992). Usually, several internal primers were needed to produce a complete sequence (forward primers: ISM001, 5′-GCCATATTACATTCTTTAATACC-3′; ISM005, 5′-GTATCCCGTCCCYTAGATCACCAC-3′; and ISM016, 5′-CATCTCGATGGACTAATGAC-3′; reverse primers: ISM002UGA, 5′-GATTTGACTTAATGTGCTATG-3′ and ISM017, 5′-CCAAACCTATGTGTTTATGG-3′). Sequencing was performed on automated sequencers in the Molecular Genetics Instrumentation Facility of The University of Georgia in Athens. Sequences were aligned by eye using the Eyeball Sequence Editor (ESEE, version 1.09d; Cabot 1989). Alignment was straightforward except for a long string of guanines (Gs) in the right domain that tended to disrupt the sequencing reaction; this segment was excluded from subsequent analyses. The sequence numbering system of Douzery and Randi (1997) was adopted. Phylogenetic analyses were performed with the beta version of PAUP* (Swofford 1998) using maximum likelihood (ML). For each analysis, default parameters were assumed: equal rates for all sites, empirical base frequencies, and a transition/transversion ratio of 2. Particulars of each analysis are outlined in the text and figure captions.

Table 1 notes (table not reproduced here): b RFLP haplotypes were determined using protocols described in Purdue et al. (2000). c Abbreviations used elsewhere in the paper.
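The ML analyses assumed a transition/transversion ratio of 2. A quick check on how the data compare with that setting is to tally the two kinds of differences between aligned haplotypes. The sketch below does only that raw count. It is not PAUP*'s likelihood estimate, and a raw count ratio is not the same quantity as the model parameter, which also depends on base frequencies and multiple substitutions at the same site. Gaps and ambiguity codes (such as the Y in primer ISM005) are simply skipped, which is an assumption of this sketch.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    Sites where either sequence has a gap ('-') or any character other than
    A, C, G, or T are skipped. Returns (transitions, transversions, sites_compared).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    return transitions, transversions, compared


# Hypothetical aligned fragments; the final column is a gap and is skipped
ts, tv, n = count_substitutions("AACGT-", "GACTTA")
print(ts, tv, n)
```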

Results
We examined the length of the left domain of the mtDNA for 1,135 deer (Table 1). RFLP analysis revealed much sequence variation, with 44 haplotypes identified. All deer examined had at least two copies of the 75-bp repeat in the RS2 region. However, some individuals (7.8%) displayed haplotypes with one or two additional copies of the repeat. PCR product for a few individuals was double-banded, which suggested that heteroplasmy might be present. However, slight alterations of one or more PCR parameters nearly always produced a single-banded product. The individuals with additional copies of the 75-bp repeat occurred in populations either along or near the Savannah River drainage or along the coast of southern Georgia (Figure 2). In five populations, haplotypes with three or four copies were the predominant variant present. No added repeats were detected in deer from localities outside the southeastern United States (N = 244). Gene trees based on the sequences of the control region were constructed using maximum parsimony, ML, and Neighbor-Joining algorithms (Swofford 1998). All analyses produced a common set of well-supported major clades (results not shown), but only the ML tree is illustrated here (Figure 3). One major branch, depicted in Figure 3 by dashed lines, contains haplotypes that occur in the southeastern United States, Illinois, and Guatemala. All haplotypes in this lineage have only two copies of the repeat in their sequences.
Another major clade, shown by solid lines in Figure 3, consists of variants that have two, three, or four copies of the repeat in the RS2 region. Within this lineage, haplotypes with three or four copies appear in at least four separate, well-supported clades (Figure 3). Within these clades, most haplotypes with extra copies of the repeat are paired with sister haplotypes that contain only two copies. Two haplotypes displayed four copies of the repeat, and both types occurred in the same clade along with variants that display two or three copies of the repeat. A geographic trend is apparent in the gene tree. Clade D (Figure 3) is composed of haplotypes found in populations CKL, HI, POB, SPR, and WC (Table 1), which are located along the southern portion of the Savannah River or on the lower coastal plain of a nearby river drainage to the north (Figure 2). The geographic distribution of Clade B, a star phylogeny of unresolved relationships, overlaps that of Clade D but is offset north along the Savannah River. Haplotypes included in Clade C are found in southern Georgia. Nodes within Clade C are well supported with three lineages, two of which are found only on barrier islands [Cumberland (CUM) and Jekyll (JKL) islands in Figure 2]. Haplotypes from a mainland locality, PPA, represent the third lineage in Clade C. Surprisingly, haplotype SRS-d, which is found on the Savannah River Site about 240 km north of PPA, is included in the lineage. Finally, Clade A, which contains just two haplotypes, appears restricted to HI.
Copies of the repeat in the RS2 region are not identical. For convenience, repeats are numbered from the 3′ to the 5′ end of the control region, which presumably reflects the order of their evolution (Douzery and Randi 1997). For the three clades in Figure 3 that included haplotypes with more than two copies, we treated each copy of the repeat as an independent operational taxonomic unit and examined their phylogenetic relationships (Figure 3, panels B, C, and D). For Clade B, the first copies of all haplotypes form an unresolved group together with SRS-c copy 2, and this group is separated with strong support from a loose group of second and third copies. A similar pattern exists for Clades C and D, where the first copies group together, separate from another supported cluster of all the second, third, and fourth (Clade D only) copies.
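Treating each copy as its own unit starts with cutting the tandem array into 75-bp pieces and numbering them from the 3′ end, as described above. The sketch below does that and then uses simple p-distances to ask which other copy a given copy most resembles. This nearest-copy comparison is a much cruder stand-in for the ML trees used in the study, and it assumes every unit is exactly 75 bp and that the input string contains only the array, written 5′ to 3′. The sequences in the example are hypothetical.

```python
from itertools import combinations

REPEAT_LEN = 75  # bp per RS2 repeat unit


def split_copies(array_seq, n_copies, repeat_len=REPEAT_LEN):
    """Split a tandem RS2 array (written 5'->3') into numbered repeat copies.

    Copies are numbered from the 3' end, so copy 1 is the last unit in the
    string. Assumes equal-length units and no flanking sequence.
    """
    if len(array_seq) != n_copies * repeat_len:
        raise ValueError("array length must equal n_copies * repeat_len")
    copies = {}
    for k in range(1, n_copies + 1):
        end = len(array_seq) - (k - 1) * repeat_len
        copies[k] = array_seq[end - repeat_len:end]
    return copies


def p_distance(a, b):
    """Proportion of differing sites between two equal-length units."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(copies):
    return {(i, j): p_distance(copies[i], copies[j])
            for i, j in combinations(sorted(copies), 2)}


def closest_copy(copies, k):
    """The copy most similar to copy k: a crude pointer to its likely source."""
    others = [j for j in sorted(copies) if j != k]
    return min(others, key=lambda j: p_distance(copies[k], copies[j]))


# Hypothetical three-copy array in which copy 3 differs from copy 2 at one site
copy1, copy2, copy3 = "A" * 75, "C" * 75, "C" * 74 + "G"
copies = split_copies(copy3 + copy2 + copy1, 3)
print(pairwise_distances(copies))
print("copy 3 is closest to copy", closest_copy(copies, 3))
```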

Discussion
Our data suggest that white-tailed deer with three or more copies of the 75-bp repeat in the RS2 region occur only in one major lineage of the species, restricted among our populations to the southeastern United States. Deer were sampled from a large geographic area, ranging from South Dakota south to Guatemala and southeast to South Carolina and Georgia. Only in South Carolina and Georgia did we find haplotypes with more than two copies of the repeat. Other population-based analyses have examined mtDNA variability in white-tailed deer (Carr et al. 1986;Cronin et al. 1991;Ellsworth et al. 1994), but the techniques used in these studies were not sensitive enough to detect the length variation observed in the control region. Even at a smaller geographic scale, a spatial pattern in the occurrence of more than two copies is also evident. In the southeastern United States, longer-than-normal haplotypes appear along and near the Savannah River and again in southern Georgia, but the two areas are interrupted in northern coastal Georgia by a suite of haplotypes with the customary two copies of the repeat. Because female deer are not strong dispersers (Purdue et al. 2000) and because mtDNA is transmitted only through the female line, the geographic distribution of haplotypes with more than two copies of the repeat suggests that independent events have been involved in the evolution of the molecule.
The phylogenetic analysis of the control region strongly supports the notion that extra copies of the RS2 repeat have evolved multiple times in the major lineage of haplotypes that occur in the southeastern United States. Extirpation and subsequent transplanting efforts complicate the picture, but the counties along the Savannah River in the upper and lower coastal plain and the lower coastal plains of South Carolina and Georgia only rarely received relocated deer (Blackard 1971). In contrast, virtually all deer in Illinois derive from restocking efforts (Pietsch 1954). Northern stocks provided most of the introduced animals in Illinois (Pietsch 1954) and parts of Georgia (Blackard 1971), but these lineages apparently carry only two copies of the repeat. The sequence of the deer from Illinois (RAN-a in Figure 3) is divergent from those representing southeastern deer (SRS-e, which is similar to RAN-a, is thought to stem from introduced deer that probably spread from Georgia to South Carolina). In Illinois, 11 RFLP haplotypes have been identified (Purdue, unpublished data), and based on similarities in RFLP banding patterns, all appear to be similar to RAN-a. Unlike the lineage of deer in the southeast, there is no known instance of haplotypes with more than two copies of the repeat occurring in animals from Illinois or elsewhere, even though many mtDNA variants have been identified. Both Neigel and Avise (1993) and Templeton et al. (1995) note the tendency for closely related mtDNA variants to be found in geographic proximity.

Figure 2. … Table 1. Solid circles represent populations whose haplotypes contain only two copies of RS2. Open circles represent populations with at least one haplotype containing more than two copies. Letters within circles indicate clades defined in Figure 3. The shaded area indicates the recognized extent of the North American range of white-tailed deer, Odocoileus virginianus (Wilson and Ruff 1999).
Four clades containing haplotypes with more than two copies of the repeat are identified in Figure 3. All have high levels of support (bootstrap values are 87 for Clade A, 85 for Clade B, 61 for Clade C, and 80 for Clade D), although internal nodes within Clade C are better supported than the basal node (Figure 3). This may indicate that within Clade C, third copies of the repeat have evolved independently three times. On Cumberland Island, haplotypes CUM-a and -b are unique to the island and form a well-differentiated branch on which one haplotype has two and the other three copies of the repeat. A similar situation is present on Jekyll Island, where the JKL-series haplotypes form a distinct clade with variants carrying two and three copies in the RS2 region. On the mainland, another well-supported clade includes haplotypes with two and three copies. The presence of three haplotypes with three copies of the repeat in Clade C is probably the result of convergent evolution.

Figure 3. Unrooted bootstrap consensus tree using ML based on 1,207 bp of the control region of the mtDNA in white-tailed deer. The tree was generated in beta version 4.0 of PAUP* (Swofford 1998) using a heuristic search, with bootstrap support estimated from 100 replicates. Each sequenced haplotype is identified by a two- or three-letter locality code given in Table 1, followed by a variant notation. All of the haplotypes on the dashed branches have no more than two repeats. In addition, three independent gene trees (panels B, C, and D) depict relationships between individual copies of the 75-bp repeat for three of the four clades defined (B, C, and D). Each copy of the repeat within each haplotype was treated as an independent operational taxonomic unit and analyzed in a separate ML tree. Copy numbers follow the order shown in Figure 1 and are represented by different symbols. Copy 1 (dark circles) always forms a separate group with a high level of bootstrap support (68%–99%). The remaining clade (A) represents two sequences from a single location: Halls Island, Beaufort County, SC (HI).
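Bootstrap values such as the 87, 85, 61, and 80 reported for Clades A to D are the percentage of resampled trees in which a clade reappears. The sketch below shows only that tallying step and a majority-rule filter, assuming the replicate trees have already been built (in this study, by PAUP* from 100 replicates) and reduced to sets of clades. The haplotype labels h1 to h4 are hypothetical; for unrooted trees each clade stands for one side of a split.

```python
from collections import Counter


def clade_support(replicates):
    """Percentage of bootstrap replicates containing each clade.

    replicates: one entry per replicate tree, each an iterable of clades,
    where a clade is a frozenset of haplotype labels.
    """
    counts = Counter()
    for clades in replicates:
        counts.update(set(clades))  # count each clade at most once per replicate
    n = len(replicates)
    return {clade: 100.0 * k / n for clade, k in counts.items()}


def majority_rule(support, threshold=50.0):
    """Clades kept in a majority-rule consensus (support above threshold)."""
    return {clade: s for clade, s in support.items() if s > threshold}


# Four hypothetical replicate trees
reps = [
    [frozenset({"h1", "h2"}), frozenset({"h3", "h4"})],
    [frozenset({"h1", "h2"}), frozenset({"h3", "h4"})],
    [frozenset({"h1", "h2"})],
    [frozenset({"h1", "h3"})],
]
support = clade_support(reps)
for clade, s in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(clade), s)
print(majority_rule(support))
```

A basal node with lower support than the nodes nested inside it, as for Clade C, simply means the larger grouping reappeared in fewer replicates than its subgroups did.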
Unlike Clade C, Clade B is internally unresolved. It contains six haplotypes, half with two copies of the repeat and half with three. Although no internal structure was decipherable, the overall distinctness of Clade B is well supported, and it thus likely represents at least one other instance of the independent evolution of a third copy of the repeat. A similar situation and conclusion apply to Clade A, although fewer haplotypes are involved.
Clade D, with five haplotypes, is more complex (Figure 3). One haplotype carries two copies of the repeat, two display three copies, and the remaining two contain four copies. The branching pattern within the clade is only moderately supported. The haplotypes within the clade are also distributed over a fairly wide geographic area along drainages of the middle and lower Savannah River and the adjacent Broad River to the north. However, no geographic patterning is evident within Clade D, and resolution is insufficient to evaluate the particulars of evolution within the clade. The events that produced haplotypes with extra copies of the repeat within the clade probably occurred independently of, and convergently with, the evolution of variants with added copies elsewhere in the southeast.
Among the enumerated clades, there is a continuum of internal support: Clade C has well-defined internal nodes, whereas Clade B has none and Clade D is somewhere in between. This feature may reflect the timing of the formation of the third and fourth copies of the repeat. The unresolved Clade B may stem from relatively recent events, especially when compared to those that produced the well-structured Clade C. Clade D, with its intermediate level of internal support, may have evolved after Clade C but before Clade B. The presumed timing of the mutational events supports the contention that some copies of the repeat have evolved independently.
The phylogenetic comparisons of individual copies, shown in panels B, C, and D alongside the corresponding clades of Figure 3, address the order of events in the evolution of the RS2 region. Within each of the three clades analyzed, the first copies form a cluster distinctly different from a second cluster comprising the second, third, and fourth (Clade D only) copies (represented by the darker circles in panels B, C, and D of Figure 3). Only in Clade B does a second copy (SRS-c, copy 2) appear in a first-copy cluster, and even then it is quite different from all of the copy 1 sequences in the cluster (bootstrap support of 68; Figure 3B). Douzery and Randi (1997) concluded that the second copy of the repeat in the RS2 region of Mazama and Odocoileus evolved early in the history of the Odocoileini. Our data support this view, given that all the deer we examined carried at least two copies of the repeat and that the first and second copies were always clearly differentiated. However, copies 3 and 4, when present, seem to represent more recent events. Given the sequence similarities, our data suggest that the third copy derives from the second. The origin of the fourth copy is unresolved by our data, but it probably stems from the third.
Repeats are probably formed by intra- or intermolecular recombination (Rand and Harrison 1989), strand slippage, or competitive displacement during replication (Levinson and Gutman 1987; Wilkinson et al. 1997). Strong secondary structure may stabilize the repeat region (Wilkinson et al. 1997), which may account for the apparent antiquity of the RS2 repeats in cervids (Douzery and Randi 1997). All the repeats show high levels of sequence conservation relative to Mazama, likely the closest relative of Odocoileus that also has multiple RS2 copies (Figure 1B). On the other hand, these RS2 repeats contain a 22-bp sequence called RS-XXII, which can form a single-strand hairpin (Douzery and Randi 1997). This hairpin may mark the site of DNA slippage during replication and may be the source of the instability of RS2 repeats in white-tailed deer. Because more than two copies of the repeat can occur within white-tailed deer (Purdue et al. 2000) and domestic sheep (Wood and Phua 1996), the addition of new copies of the repeat may not be rare. New combinations with additional repeats are found in a number of different areas on the southeastern coastal plain. This situation may indicate one way in which new DNA variants can become established during the evolution of a species. Wilkinson et al. (1997) suggest that the number of copies present in the left domain of the control region in mammals is the result of a balance between selection and mutation. They note further that there may be a fixed probability that a repeat in an array will fold and either be duplicated or deleted during replication. If that were the case, length mutation rates would increase additively with the number of copies of the repeat. Deer, like many other mammals, have three conserved sequence elements (TAS, mt5, and mt6) in each copy of the RS2 repeat (Wilkinson et al. 1997).
Apparently, the order of these elements is critical for the stable secondary structure needed to bind a regulatory protein crucial for replication. Douzery and Randi (1997) also note that a robust secondary structure could stabilize the repeat region in Mazama and Odocoileus. While it is unclear how the secondary structure of the RS2 region and regulatory proteins interact during replication of the mtDNA, added copies of the repeat could provide redundancy in the event that binding is hampered by a point mutation, which apparently can be a common event (Wilkinson et al. 1997). However, selection may favor short sequences in the RS2 region because shorter mtDNA molecules replicate faster overall (Wilkinson et al. 1997). Consequently, selection for rapid replication of the mtDNA genome would tend to keep the number of copies of the RS2 repeat low.
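The fixed-probability idea attributed to Wilkinson et al. (1997) can be made concrete. If each copy independently folds with probability p per replication, the chance that an array of n copies undergoes a length change is 1 - (1 - p)^n, which is approximately n·p when p is small, that is, roughly additive in copy number. The sketch below computes that quantity and runs a toy random walk in copy number. The value of p, the equal chance of duplication versus deletion, the limit of one length change per replication, and the floor of one copy are all assumptions for illustration, and the selection for short, fast-replicating molecules discussed above is deliberately left out.

```python
import random


def length_mutation_prob(n_copies, p_fold):
    """P(at least one of n copies folds) in one replication, copies independent."""
    return 1.0 - (1.0 - p_fold) ** n_copies


def simulate_copy_number(n_start, p_fold, replications, rng, min_copies=1):
    """Toy random walk in RS2 copy number with no selection.

    Assumptions: at most one length change per replication, and a folded copy
    is duplicated or deleted with equal probability.
    """
    n = n_start
    for _ in range(replications):
        if rng.random() < length_mutation_prob(n, p_fold):
            n += 1 if rng.random() < 0.5 else -1
            n = max(n, min_copies)
    return n


p = 1e-3  # hypothetical per-copy folding probability
for n in (2, 3, 4):
    print(n, "copies:", round(length_mutation_prob(n, p), 6),
          "additive approximation:", n * p)

rng = random.Random(42)
print("copies after 10,000 replications:", simulate_copy_number(2, p, 10_000, rng))
```

Because longer arrays mutate more often under this model, copy number would drift upward and downward more readily once a third copy appears; adding a length-dependent selection term to the walk would be the natural next step for exploring the proposed balance.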
The major lineage of deer native to the southeastern United States apparently is more susceptible to the mutational events that produce additional copies of the 75-bp repeat. Alternatively, it may be under selection that favors additional redundancy and therefore greater retention of the added copies. If, however, the balance between redundancy and control region length is shifted toward more copies of the repeat, then either selection is not uniform across our study area or a stochastic component is also involved in the appearance of the additional repeats. Variants with more than two copies of the repeat are distributed in at least two discrete geographic pockets, which may imply that the direction of selection has a spatial component. If this is the case, it is unlikely that such conditions occur only on the coastal plains of South Carolina and Georgia, and the apparent restriction may reflect our intensive sampling effort in the southeastern United States.
The spatial pattern of occurrence of the additional repeats may be due to stochastic population processes. The characteristics of the base-pair sequences in the central region of the D-loop in the major lineage of southeastern coastal plain deer may allow the occasional establishment of an additional copy of the repeat. Once present, the copy persists until random drift eventually results in either the elimination or the fixation of the variant in a population. Among our study localities, the frequency of individuals with more than two copies of the repeat varied. In three populations (HI, POB, and WC), haplotypes with more than two copies were dominant and conceivably could be drifting toward fixation. Selection and drift interact, and their interaction could be the most important cause of the observed spatial pattern (Wright 1978). A shifting balance between selection and drift could be caused by dynamics within the metapopulation of white-tailed deer on the coastal plain (Hanski 1996). Allozyme studies have also documented strong spatial heterogeneity in allele frequencies for a variety of loci (Chesser et al. 1982; Scribner et al. 1997; Smith et al. 1990). These results, along with the high degree of spatial divergence in mtDNA haplotype frequencies observed in this study and by Purdue et al. (2000), suggest that a metapopulation structure is characteristic of this species on the southeastern coastal plain. However, this structure may be modified where the metapopulation has been heavily affected by reintroductions. This paper gives an example of how the repeat in the RS2 region could have evolved. Many attempts to establish a second copy of this repeat may have occurred before a more stable combination became established. Because of female philopatry and the maternal inheritance of mtDNA, drift likely occurred in local deer populations within the overall metapopulation.
Many of the combinations were probably lost due to chance before fixation occurred. Whether this will occur for the third and fourth copies is questionable. There are undoubtedly limits to the amount of extra material that can be added to the control region because of its effects on the speed of replication for the mtDNA. This scenario is very similar to the way in which Wright (1978) envisioned evolution occurring in his shifting balance model.
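The drift argument can be illustrated with a neutral haploid Wright-Fisher model of one local maternal population. The sketch below assumes a constant, hypothetical number of breeding females, no selection, no migration, and no new mutation. Under those assumptions a variant such as a three-copy haplotype is eventually either lost or fixed, and its probability of fixation equals its starting frequency, so most rare new variants are lost by chance.

```python
import random


def wright_fisher_fate(n_females, k_start, rng, max_generations=100_000):
    """Neutral drift of one mtDNA haplotype in a haploid Wright-Fisher model.

    n_females: constant number of breeding females (hypothetical); only
    females are tracked because mtDNA is maternally inherited.
    k_start: initial number of females carrying the variant.
    Returns (fixed, generations_elapsed).
    """
    k, gen = k_start, 0
    while 0 < k < n_females and gen < max_generations:
        p = k / n_females
        # each female in the next generation inherits from a randomly chosen mother
        k = sum(rng.random() < p for _ in range(n_females))
        gen += 1
    return k == n_females, gen


def fixation_probability(n_females, k_start, replicates=2000, seed=1):
    """Fraction of simulated populations in which the variant becomes fixed."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fate(n_females, k_start, rng)[0]
                for _ in range(replicates))
    return fixed / replicates


# Neutral expectation: fixation probability = starting frequency (5/20 = 0.25)
print(fixation_probability(20, 5))
```

Starting the variant as a single copy (k_start = 1) shows how often a newly arisen repeat variant would be lost before fixation in a small local population, which is the sense in which many combinations were probably lost by chance.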