Genetic diversity and population structure of Arabidopsis thaliana along an altitudinal gradient

In this study Tyagi et al. determined the genetic diversity and population structure of the model plant, Arabidopsis thaliana. These populations inhabit west Himalaya, an undersampled region. Using 19 genomic SSR and 11 chloroplast markers they determined that these populations are highly structured and genetically distinct from the rest of the world populations. They also observed that the populations were structured at the altitudinal level. Additionally their molecular clock analysis showed that these populations are not recent introductions and have inhabited the Himalayan region for about 130,000 years.


Introduction
An understanding of the amount and distribution of genetic variability and structure in natural populations is fundamental in ecological genetic studies. The amount of genetic variability found within and among populations will both influence the response to, and be influenced by, selective pressures and evolutionary processes such as gene flow and genetic drift (Hamrick and Godt 1996;Futuyama 1998), whereas the studies on population structure can be useful for the understanding of population dynamics, historical events, etc., which in turn are greatly influenced by environmental conditions. Variations in environmental conditions may lead to phenotypic (Gugerli 1998;Till-Bottraud and Gaudeul 2002;Pluess and Stöcklin 2005;Haider et al. 2012) as well as genetic differentiation among populations (Ohsawa and Ide 2008).
Arabidopsis thaliana, commonly known as rock cress, is an annual weed having a worldwide geographical distribution. It has been extensively employed as a model plant not only for plant molecular genetics but also for ecological and evolutionary studies (Mitchell-Olds 2001;Pigliucci 2002;Weigel 2012). Thus, it is important to quantify the genetic diversity and the distribution pattern across its geographical range to understand the demographic and ecological effects (Mitchell-Olds and Schmitt 2006). There are several studies on natural genetic variation of A. thaliana on a regional scale in the regions of its native distribution such as Europe (Le Corre 2005;Stenøien et al. 2005;Bakker et al. 2006;Pico et al. 2008), Central Asia (Schmid et al. 2006), China (He et al. 2007) and Africa (Brennan et al. 2014), as well as in regions of presumed recent introduction and expansion such as Japan (Todokoro et al. 1995) and North America (Jørgensen and Mauricio 2004;Bakker et al. 2006). These studies have provided an insight into the genetic diversity and population structure of A. thaliana at a regional scale.
Being a weed, A. thaliana is frequently found in humandisturbed areas such as roadsides, agricultural fields, mountain slopes, etc. Seeds of A. thaliana are tiny and prone to transportation by humans. This artificial and unnoticed seed dispersal can disturb its phylogeographic structure. While earlier studies indicated that the present worldwide distribution of A. thaliana populations is unstructured or highly disturbed (King et al. 1993;Miyashita et al. 1999;Vander Zwan et al. 2000;Maloof et al. 2001), more recent studies have observed global population structure (Nordborg et al. 2005;Beck et al. 2008) in A. thaliana. Due to its highly self-fertilizing nature, few heritable variations exist within the populations of A. thaliana, and a majority of the genetic variations have been found among the populations (Hanfstingl et al. 1994;Todokoro et al. 1995;Bergelson et al. 1998;Breyne et al. 1999;Miyashita et al. 1999). There are also contradictory reports on correlation between geographical origin and the level of genetic variability in A. thaliana populations. Some studies did not find a correlation between geographical origin and genetic variability (Todokoro et al. 1995;Bergelson et al. 1998;Miyashita et al. 1999;Provan and Campanella 2003;Bakker et al. 2006;Ostrowski et al. 2006), while others have indicated a significant correlation (Sharbel et al. 2000;Schmuths et al. 2004;Schmid et al. 2006;Beck et al. 2008).
The origin and the biogeography of A. thaliana has been extensively described in several studies (Sharbel et al. 2000;Symonds and Lloyd 2003;Schmuths et al. 2004;Nordborg et al. 2005;Bakker et al. 2006;Jakobsson et al. 2006;Ostrowski et al. 2006;Schmid et al. 2006;Beck et al. 2008). However, none of these studies, including the RegMap panel study proposed by Horton et al. (2012), included the populations from India. This partial sampling has consistently limited the insight into A. thaliana population genetics, and a call for expanded sampling has been echoed by many (Hoffmann 2002;Bakker et al. 2006;Mitchell-Olds and Schmitt 2006;Ostrowski et al. 2006).
In India, A. thaliana is primarily found in the western Himalayas (Hooker 1875;Hoffmann 2002) with its specimen preserved in some of the herbaria in India. This region is one of the biodiversity hot spots and includes several climatic zones (Joshi and Joshi 2011). However, no detailed studies on this model species have been conducted from this region. There are only two ecotypes in Arabidopsis stock Centre (ABRC), which have been reported from this region, viz. Kas-1 and Kas-2, collected from Kashmir region of India, but concerns have been raised about the origin of Kas-1 accession, as it stands aloof from the rest of Asian accessions in phylogenetic studies (Yin et al. 2010). Additionally, its genomic signatures show higher affinity towards European accessions, and it is presumed that Kas-1 was transported from Europe to India sometime in the 19th century (Vander Zwan et al. 2000). In a recent report, 77 accessions from Asia, Europe, North Africa, North America and Atlantic islands using 11 chloroplast (CP) markers were studied (Yin et al. 2010). Their study suggested that the extant populations along the Yangtze River might have dispersed there via the west Himalaya or Kunlun mountains during postglacial expansion and underwent rapid expansion around 90 000 years ago. Conversely, they have further suggested that Yangtze River populations might be the source of present Himalayan populations. Inclusion of populations from west Himalaya might shed light in this direction.
In the present study, our goals were (i) to estimate the genetic diversity of A. thaliana populations from the west Himalayan region, which had never before been analysed, (ii) to establish the relationship of A. thaliana of west Himalaya with respect to other accessions of the world and (iii) to determine the probable origin of west Himalayan populations and their expansion. We attempt to provide here the link between populations of China and west Himalaya and further extend the demographic model proposed by Sharbel et al. (2000), where the authors suggested that the central Asian and Peninsula region might be the source of European colonization.
Our sample sets include 48 accessions from six populations and were analysed using nuclear microsatellite (MS) and CP loci. These populations were further compared with worldwide accessions reported by Yin et al. (2010). Recently, our group has characterized four of these populations using morphological traits and has shown that some of the morphological traits were highly correlated with their corresponding altitudes (Singh et al. 2015). Inclusion of Indian populations of A. thaliana will greatly enhance our understanding of global population structure and evolution of this model species.

Collection of accessions
A total of 48 A. thaliana accessions were collected from six different populations located at Dehradun [Deh, 700 m above mean sea level (a.m.s.l.)] designated as low altitude; Dhapa (Dha, 1800 m a.m.s.l.) and Munsiyari (Mun, 2000 m a.m.s.l.), designated as medium altitude and Sangla (San, 2600 m a.m.s.l.), Koksar (Kok, 3400 m a.m.s.l.) and Chitkul (Chi, 3400 m a.m.s.l.), designated as high-altitude localities of the west Himalayan region (Fig. 1). No specific permissions were required for sample collections from these locations as these do not come under any protected area. Further, A. thaliana is not an endangered or protected species. Each population was geo-referenced for their latitude, longitude and altitude with a global positioning system receiver (Garmin, USA). The minimum and maximum distances between populations were 4 km (Mun and Dha) and 384 km (Kok and Mun), respectively. The accessions were named by the first three letters of the name of the nearest location or village from where samples were collected, followed by a numeric indicating the individual accession number. The habitats of the populations varied from humandisturbed lawn to undisturbed mountainous river valley and slopes [see Supporting Information- Table S1].

DNA isolation, sequencing and genotyping
Genomic DNA was isolated from field-collected leaf samples using the Qiagen TM DNeasy Plant mini kit (USA) following the manufacturer's protocol. The quality of the DNA was checked on 0.8 % agarose gel and quantified using Nanodrop spectrophotometer (Thermofisher, USA). For amplification of CP loci, the same set of primer pairs was used as described in Yin et al. (2010). Polymerase chain reaction (PCR) was performed in 20 mL reaction volume containing 10 mL of 2× Amplitaq Gold 360 Master Mix (Applied Biosystems, USA), 40 -50 ng template DNA and 10 pmol of each of the forward and reverse primers. The thermocycler programme was an initial denaturation at 94 8C for 4 min followed by 35 cycles of denaturation at 94 8C for 30 s, annealing at 50 -59 8C for 30 s (depending on T m of primer pairs) and extension at 72 8C for 1 min followed by a final extension step at 72 8C for 5 min. The amplified products were cleaned up using the Qiaquick PCR purification kit (Qiagen, USA). Sequencing PCR was performed using ABI Prism BigDye Terminator v3.1 cycle sequencing Kits (Applied Biosystems) in 10 mL reaction volume containing 2 mL cleaned PCR product as template, 1 mL of either forward or reverse primer, 0.5 mL of RRmix and 1.75 mL of 5× dilution buffer. The products were ethanol precipitated and denatured with Hi-Di formamide and sequenced on ABI 3730XL Automated DNA Sequencer (Applied Biosystems). All the samples were genotyped for 19 previously described MS loci (Bell and Ecker 1994) through fluorescent labelling PCR using M13 universal primer following Schuelke (2000). An 18-bp tail (5 ′ -TGTAAAACGACGGCC AGT-3 ′ ) complementary to M13 sequence was added to each forward MS primer. Four different fluorescent dyes were used to label the M13 primer: 6-FAM (blue), NED (yellow), PET (red) and VIC (green). Each MS marker was optimized for annealing temperature. Each 20 mL reaction contained 2.5 mm MgCl 2 , 0.2 mm deoxyribonucleotides, 4 pmol fluorescently labelled M13 primer, 4 pmol of forward primer with M13 tail and 8 pmol of MS reverse primer, 2 mL Promega buffer and 0.5 U of Promega Taq polymerase. The PCR reactions were performed in the following conditions: 94 8C (1 min), then 30 cycles at 94 8C (1 min)/specific annealing temperature (56 -60 8C) for each MS marker, (1 min)/72 8C (1 min), followed by 8 cycles at 94 8C (1 min)/53 8C (1 min)/72 8C (1 min) and a final extension at 72 8C for 10 min. The amplified products were run on a capillary-based 3730xl DNA Analyser (Applied Biosystems) and the products were precisely sized for major, comparable and conspicuous peaks using GeneMapper v4.0 (Applied Biosystems), using default quality indicators of GeneMapper. Details of the MS data are shown in Supporting Information- Table S2.

Sequence alignment
We sequenced 11 CP loci from 8 individual plants collected from each of the 6 locations (total 528 sequences).
Each sequence was visually inspected and matched with its corresponding electropherogram. Only the highquality sequences (Phred score .20) were considered for analysis. Besides these 528 sequences, we included the sequences of worldwide samples provided in for comparative analyses. The average sequence length of our samples was slightly shorter than the sequence length obtained by Yin et al. (2010). Therefore, the sequences were trimmed individually from both sides to avoid large gaps in the aligned data set. DNA sequences were aligned using MUSCLE (Edgar 2004) implemented in MEGA 5 (Tamura et al. 2011). After alignment, the sequences were edited 'by eye' to minimize any misalignments or erroneous introduction of indels. The sequences were concatenated and the resulting aligned length of the sequences was 6299 nucleotides.

Population data analysis
Three sets of population data were considered for analysis: (i) 48 accessions from six different populations from west Himalaya using 19 MS markers, (ii) 48 west Himalayan accessions using 11 CP loci and (iii) 48 west Himalayan accessions and 77 worldwide accessions using 11 CP loci from Yin et al. (2010). While the data sets (i) and (ii) were used for genetic analyses of only Indian populations, data set (iii) compared west Himalayan and the rest of the world populations. Population genetic parameters, viz. genetic diversity measures, such as the percentage of polymorphic loci (%PL), mean number of alleles (Na), mean number of effective alleles (Ne), mean allelic richness (Rs), mean private allelic richness (Rp), mean gene diversity (Hs), number of multilocus haplotypes (Nh), expected heterozygosity (He) and observed heterozygosity (Ho) were estimated for the three data sets using software programs GENALEX 6.5 (Peakall and Smouse 2012), FSTATv 2.9.3 (Goudet 1995) and HP-Rare v.1.0 (Kalinowski 2005). Allelic richness and private allelic richness were estimated to the smallest sample size (n ¼ 2) using HP-rarefaction tools to avoid sampling disparity.
Genetic differentiation among populations was estimated by analysis of molecular variance (AMOVA) using Arlequin v3.5 (Excoffier and Lischer 2010), which calculates F ST like statistics. Statistical significance of AMOVA was tested from 20 000 permutations. To determine whether the genetic distances between populations were determined by geographic distance or difference in their altitude, IBD analysis between populations was performed. To determine the IBD, partial Mantel test (PMT) were performed. This test allows two pairwise distance matrices to be compared, while controlling for the effect of a third. Partial Mantel test was performed between pairwise F ST and the corresponding paired population matrix of geographic distances while controlling for pairwise altitudinal distances. Alternatively PMTs were performed between pairwise F ST and pairwise altitudinal distances while controlling for geographic distances. Pairwise F ST between six populations were calculated using the Arlequin program. Geographic distances between location pairs were calculated using Geographic Distance Matrix Generator v1.2.3 software (Ersts 2011). Partial Mantel tests were performed with 999 permutations using the PASSaGE program (Rosenberg and Anderson 2011). To determine whether the within-population genetic diversity was correlated with altitudes, Pearson's correlation test was performed between %PL and corresponding altitudes of the sites.
The population structure and assignment of individual west Himalayan accessions to populations on the basis of MS genotype data were performed using STRUCTURE v2.3.3 (Pritchard et al. 2000) and principal coordinate analysis (PCoA). In STRUCTURE, model settings included diploid multilocus genotypes, correlated allele frequencies between populations, admixture model and running the algorithm with 200 000 burn-in MCMC iterations and 500 000 MCMC post burn-in repetitions for parameter estimation. To determine the appropriate value of K, STRUCTURE was run in 20 replicates with assumed K from 2 to 10. The K value where the posterior probability (ln P(D)) began to plateau was selected as the true K (Wang et al. 2013). When varied from 2 to 10, the ln P(D) began to plateau at K ¼ 6 [see Supporting Information- Fig. S1], which was also incidentally the number of our populations. To determine how the individuals of populations respond to increasing K, we plotted STRUCTURE results from K ¼ 2 to K ¼ 6. Principal coordinate analysis was performed using Nei's distance matrix as implemented in GENALEX 6.5, and the resulting eigenvalues were plotted using k-means clustering implemented in cluster package R v3.2 (http://www.R-project.org). Phylogenetic relationships among accessions were determined by the neighbour-joining (NJ) method using PAUP*4.0 (Swofford 2003). Visual observation showed the presence of an insertion, TTCTAAT, at position 245 in Himalayan and Yangtze River accessions, TTCTA at position 465 and TGGAT at position 1528 in the rest of the world accessions in the concatenated sequence data set. Therefore, we decided to include indels in the analysis. It has been suggested that inclusion of indels in phylogenetic analysis is a reasonable choice to avoid loss of important character states and has been used for CP sequences to resolve branching order, phylogenetic resolution and nodal branch support (Crayn and Quinn 2000;Ingvarsson et al. 2003;Egan and Crandall 2008). The indels were coded using the modified complex indel coding (MCIC) method (Mü ller 2006) using SeqState version 1.4.1 (Mü ller 2005). The NJ analysis was performed with 1000 bootstrap replicates.
Haplotype diversity in the combined sequence data was determined using DnaSP v5 (Rozas et al. 2003). Haplotype network was generated using TCS v1.21 (Clement et al. 2000) implemented in PopART (http://popart. otago.ac.nz). The haplotype with the highest out group weight as estimated by TCS was considered as the ancestral haplotype.
The more intensive sampling of west Himalayan populations can cause sampling bias when comparing with the rest of the world populations. Therefore, one accession per population was randomly selected for phylogenetic and haplotype analyses.
For MS loci, pairwise Nei's genetic distances were determined using GENALEX 6.5 and the matrix was used to construct majority rule consensus tree of 500 unrooted NJ trees using PHYLIP package (Felsenstein 2005).

Demographic expansion and estimation of divergence time
To test the neutral mutation hypothesis, different statistical tests such as Tajima's D (Tajima 1989) and Fu and Li's D*, D, F* and F test (Fu and Li 1993) were conducted for each marker using DnaSP v5 (Rozas et al. 2003) with A. arenosa as out group (Yin et al. 2010). Chloroplast markers are reported to be reliable indicators of demographic events due to their lack of ability to recombine (Beck et al. 2008). Therefore, we chose CP markers to determine the mismatch distribution using ARLEQUIN. The 'raggedness statistic' (Rag), mode of mismatch distribution (t) and 'sum of squared deviation' (SSD) values were determined to infer the demographic expansion event. To estimate the time since expansion, substitution rate of 2.9 × 10 29 changes per site per year for non-coding sequence of the A. thaliana CP was considered as reported by Sall et al. (2003). For estimation of time since expansion, the following equation was adopted: where t is the expansion time in number of generations, t is the mode of the mismatch distribution and m is the mutation rate per generation for the entire DNA sequence.
The divergence time between the accessions was estimated using Bayesian methods implemented in BEAST v1.8.0 (Drummond et al. 2012 (Jakobsson et al. 2006) and third, 5.0 mya divergence time between A. thaliana and A. arenosa (Koch et al. 2000;Yin et al. 2010). The TMRCA reference point between A. suecica and A. arenosa was not used as A. arenosa is the paternal parent of A. suecica and all the sequences in our data set are of CP origin. We excluded Kas-1 (IND) and XJalt, XJqhx (CHN) from their parent data set and included in the rest of the world data set, as the phylogenetics and haplotype network analyses grouped these accessions with the rest of the world (see Results). Four independent BEAST runs of 3 × 10 7 generations saving logs every 200 generations were performed. Results of four BEAST runs were combined and then subsampled at a lower frequency using Log Combiner v1.8.0; this produced 3 × 10 4 age estimates under the coalescent constant size tree prior and the uncorrelated lognormal relaxed clock. The count of beast runs was decided to achieve an effective sample size of at least 100 for all TMRCA estimates. The MCMC data output for each individual run was visualized using Tracer v1.5. Chronogram trees were generated using TreeAnnotator v1.8.0 program.

Genetic diversity and structure of west Himalayan populations of A. thaliana
Genetic diversity of the west Himalayan populations was estimated using 19 MS and 11 CP loci. Among the 19 MS markers, no single marker was found to be monomorphic [see Supporting Information- Table S2]. The various measures of gene diversity parameters for the overall population are depicted in Table 1, and population-level diversity parameters for MS and CP markers are depicted in Tables 2 and 3, respectively. The data indicated that San and Chi were the most and the least diverse populations, respectively ( Table 2). The amount of genetic variation in terms of %PL and allelic richness per population did not exhibit any significant correlation with their corresponding altitudes (r ¼ 0.06, P ¼ 0.905). All the 11 CP loci were found to be polymorphic. At population level, Deh and Chi populations were found to be the most and the least diverse with respect to %PL and Na using CP markers. The He (assuming panmictic populations) was higher than the Ho. The haplotype diversity of each of the six populations was 1.0 indicating high degree of CP sequence diversity across all the populations (Table 2).
In the phylogenetic analysis using NJ method with MS markers, clustering of accessions at two hierarchical levels was observed. The three major branches   Table S4] and corresponding F ST values (while controlling for pairwise altitudinal differences) showed a significant correlation (r 2 ¼ 0.57, P ¼ 0.028) indicating IBD, whereas no significant correlation was found when pairwise F ST were correlated with pairwise altitudinal differences (r 2 ¼ 0.54, P ¼ 0.07).
To further assess the genetic structure, populations were analysed using two clustering approaches, PCoA and STRUCTURE. Principal coordinate analyses of the MS markers explaining 19.02, 9.98 and 9.03 % of the overall genetic variation identified three groups representing three altitudinal classes ( Fig. 3; Supporting Information-Fig. S3).  STRUCTURE analysis indicated that at K ¼ 2, the populations were grouped into two: one representing populations of low and medium altitude, and the other one representing populations of high altitude. At K ¼ 3, all the populations grouped according to their altitudinal assignment (described in Methods). Further, at K ¼ 4 and 5, San formed a separate group from Chi and Kok, and at K ¼ 6, all the populations except Mun and Dha formed separate groups. San showed large withinpopulation variations at K ¼ 5 and 6 (Fig. 4). The accessions that were highly variable in STRUCTURE analysis were intermixed in NJ analysis as well. For example, Kok-20, San-29 and San-3 were found to be highly variable in both the analyses. . K-means clustering (circled) of first three coordinates from principal co-ordinate analysis of the MS markers. Accessions from low, medium and high altitude formed different clusters. The first three co-ordinates (x-axis, first co-ordinate; y-axis, second co-ordinate; z-axis, third co-ordinate) are shown.

Comparison between west Himalayan and the rest of the world populations
The west Himalayan populations were compared with the Yangtze River and the rest of the world populations using CP sequences (data set iii). The results of comparative genetic diversity parameters between the three groups are depicted in Table 1. West Himalaya and the rest of the world showed high and comparable genetic diversity within populations. Thirty-seven haplotypes were detected from 75 accessions of the world data set, whereas 48 haplotypes were identified from the 49 accessions of west Himalaya. Yangtze River populations could be defined by four haplotypes. Haplotype diversity of the west Himalayan population was slightly higher than world populations (Table 1), whereas Yangtze River populations had the smallest haplotype diversity.
In the concatenated data set of 11 CP loci, 1.5 % were parsimony informative characters (PIC). The genetic relationship among west Himalayan and the rest of the world accessions was first analysed by constructing NJ trees with complete deletion of gaps. This method resulted in a poorly supported tree (data not shown). However, inclusion of indels greatly increased the statistical branch support without changing the clustering pattern as observed with complete deletion methods (Fig. 5). There were two distinct clusters with high bootstrap support: Cluster I consisted exclusively of the accessions from west Himalaya and the Yangtze River, whereas Cluster II consisted of the accessions from the rest of the world. The accessions in Cluster I formed a highly polytomous branch. The rest of the world's accessions clustered into sub-clusters without any preference to geographical affiliation. Interestingly, the accessions from Xinjiang province of China (XJalt, XJqhx) and Indian accession Kas-1 did not group with Cluster I. Further, the haplotype network was constructed with 95 % of steps connection limit. The network consisted of 46 haplotypes ( Fig. 6; Supporting Information-   Table S6]. Hatch marks denote nucleotide changes. The results of mismatch distribution analysis were also consistent with this finding [see Supporting Information- Table S7]. The SSD and raggedness index between the observed and expected mismatch distribution in west Himalayan (SSD ¼ 0.1, P ¼ 0.6, Rag ¼ 0.004, P ¼ 0.43) were smaller than that of Yangtze River (SSD ¼ 0.15, P ¼ 0.9, Rag ¼ 0.57, P ¼ 0.04). These results provided evidence of rapid population expansion in the west Himalayan population. Based on the t value, the initial time of expansion of west Himalayan populations and Yangtze River populations was found to be 130 000 and 84 000 years, respectively. Using Bayesian approaches, we estimated the divergence time with 95 % highest probability density (HPD) [see Supporting Information- Table S8]. The TMRCA of combined populations of western Himalaya and Yangtze River was estimated to be 0.45 mya (95 % HPD, 0.35 -0.56), which was exactly same as the TMRCA of western Himalaya lineage. The TMRCA of Yangtze River populations stood at 0.29 (95 % HPD, 0.20-0.39). Further, the origin of the rest of the world was found to be 0.62 mya (95 % HPD, 0.49-0.74). Although the tree prior for A. arenosa and A. thaliana split was 5.0 mya, the relaxed molecular clock found it to be 6.61 mya (95 % HPD, 5.94-7.34). The rest of TMRCA reference points, O. cabulica (12.42 mya; 95 % HPD, 12.17-12.68) and A. suecica (0.64 mya; 95 % HPD, 0.51-0.76), were within the range of the given priors.
The chronogram resulting from Bayesian phylogenetic time tree analysis had two distinct clades. One included accessions from west Himalaya, Yangtze River, and the other included all the accessions from the rest of the world accessions. The chronogram clearly showed a split between European and Himalaya -Yangtze River accessions at 0.49 mya (Fig. 7). AoB PLANTS www.aobplants.oxfordjournals.org

Discussion
The Himalayan region is one of the mega-biodiversity hot spots of the world, bestowed with rich flora and fauna (Chatterjee 1939;Nayar 1996;Pandit et al. 2000). Despite its floristically rich heritage, A. thaliana, a well-studied model plant species, has found no detailed studies from this region. We collected accessions of A. thaliana from the west Himalaya ranging altitudes of 700 -3400 m a.m.s.l. The populations were collected from diverse habitats of highly disturbed lawn (Deh) to mountainous slopes (Mun) to undisturbed mountainous river valley (Chi). To the best of our knowledge, this is the first report on A. thaliana populations from altitudes ranging from such a large altitudinal range.
The six west Himalayan populations analysed with MS and CP loci indicated substantial genetic variation, with all the pairwise F ST values being significant. Most of the earlier studies have reported less genetic variation within populations, consistent with A. thaliana life history and more among populations (Hanfstingl et al. 1994;Bergelson et al. 1998;Breyne et al. 1999;Miyashita et al. 1999;Jørgensen and Mauricio 2004;Nordborg et al. 2005;Beck et al. 2008;Brennan et al. 2014). In other cases, higher genetic variation was observed within population than among populations (Jørgensen and Mauricio 2004;Stenøien et al. 2005;Bakker et al. 2006;Pico et al. 2008;Lundemo et al. 2009). These differences in genetic differentiation may arise to some extent due to use of different markers and sampling strategies (Nordborg et al. 2005). High variations within populations have been attributed to differential contributions of migration, outcrossing and de novo mutations (Pico et al. 2008). In the case of west Himalayan populations, owing to its high mountain ranges, migration may play a limited role in the advent of within-population variations. Only in Mun and Dha populations, for their geographical and altitudinal proximity and small F ST value, migration might play a role in withinpopulation variability. With the present data set, it was not possible to comment on the contribution of neutral changes to the within-population variability.
The results of MS marker analysis using STRUCTURE and NJ tree indicated that the west Himalayan A. thaliana populations were structured. In STRUCTURE analysis, at lower K, the accessions grouped at altitudinal level but gradually; at higher K, they assembled according to their population affiliation. This finding was corroborated by NJ tree where major branches represented altitudes and external clades represented populations. Only Mun and Dha showed lack of population-level structure. On account of the geographical and altitudinal proximity of this pair of populations, migration might play a key role in determining their inter-population variability, although in PCoA analysis, population-level clusters could not be detected; but clustering was clearly evident at altitudinal level. For a highly selfing plant species, the structured distribution of genetic variation is consistent with A. thaliana and has been reported previously (Todokoro et al. 1995;Kuittinen et al. 2002;Ostrowski et al. 2006). On the other hand, the absence of population structure in earlier studies has been interpreted as the result of human disturbance (King et al. 1993;Miyashita et al. 1999;Vander Zwan et al. 2000;Maloof et al. 2001). But the present populations were mostly from the undisturbed areas except Deh, which is otherwise geographically isolated from the rest. The observation of significantly positive correlations with geographical distances in these populations further supported the evidence of IBD in the west Himalayan region. These results are consistent with similar analyses of genetic structure both at regional level (Berge et al. 1998;Pico et al. 2008) and at global level (Nordborg et al. 2005;Ostrowski et al. 2006;Schmid et al. 2006;Beck et al. 2008). The low Ho when compared with He indicates a high inbreeding rate, a common phenomenon in highly selfing species like A. thaliana (Platt et al. 2010). Such patterns of heterozygosities and population structures have been previously observed in A. thaliana populations using different markers (Abbott and Gomes 1989;Stenøien et al. 2005;Pico et al. 2008;Bomblies et al. 2010).
The genetic clustering observed in case of MS markers was absent with CP loci due to rarity of PICs in them. Yin et al. (2010) have also encountered low nucleotide diversity in Chinese populations resulting in a poorly resolved tree. The CP genome is known to have low nucleotide substitution rate when compared with the nuclear genome (Wolfe et al. 1987;Smith 2015). Several studies have reported poor phylogenetic resolution at species and lower taxonomic levels due to small substitution rates in plants (Lee et al. 2005;Roy et al. 2010). Although the nucleotide diversity of Himalayan accessions was much higher than that of Yangtze River, still they were not high enough to achieve significant clustering.

West Himalayan populations with respect to world populations
The six populations analysed using CP loci contained a substantial amount of genetic variation, since all of them included several haplotypes. However, caution must be taken in interpreting these data since genetic diversity of local populations might depend on population size, age and habitats as well as types of markers used for diversity estimation (Pico et al. 2008). The west Himalayan populations analysed here were from diverse habitat types and mostly growing in natural sites. Such populations may contain large genetic variations when compared with smaller or younger populations. But our study may be limited due to the small number of populations. It is worth mentioning here that despite all of our efforts, we could not locate more populations in this rugged mountainous region.
Phylogenetic tree analysis using CP markers showed no geographical affiliation in clustering of the rest of the world accessions, but accessions from Yangtze River and west Himalaya (excluding Kas-1) formed one group distinct from the rest of the world accessions. This polytomous clustering was primarily attributable to lack of PIC's. The two strongly supported branches in the NJ tree were obtained as a result of the use of indels as phylogenetic information. Several other studies involving regional (Sharbel et al. 2000;Jørgensen and Mauricio 2004;Stenøien et al. 2005) as well as global (Bergelson et al. 1998;Miyashita et al. 1999;Erschadi et al. 2000;Barth et al. 2002;Bakker et al. 2006) accessions have reported little or no clustering of accessions based on their geographical location on the basis of different markers, whereas clustering on the basis of geographical location has also been reported (Sharbel et al. 2000;Lundemo et al. 2009). This pattern of neutral genetic variability has been explained by rapid spread followed by little gene flow among populations of A. thaliana. The inference of rapid expansion of these populations is further strengthened by the results of neutral mutation hypothesis test where most of the CP loci were found to have significantly negative Tajima's D, FL-D and FL-F. A significantly negative Tajima's D value is a general feature of the A. thaliana genome (Schmid et al. 2005) and is consistent with demographic forces, such as population growth, rather than non-neutral forces such as selection (Beck et al. 2008).
The results of haplotype analysis further corroborate the genetic distinction between these two groups. As observed by Yin et al. (2010), this study also indicated the presence of two major haplogroups in the combined data set. However, we found one haplogroup consisted entirely of populations from the west Himalaya and Yangtze River and the other consisted of the rest of the world populations. Inclusion of the west Himalayan populations clearly indicated a dimorphism between the rest of the world populations and west Himalaya -Yangtze River populations, in contrast to what was observed in the Yin et al. (2010) study. Although this dimorphism was supported by only two nucleotide character states, high statistical branch support achieved in the phylogenetic analysis further substantiates the presence of these two worldwide groups. It was notable that the cpDNA diversity of Yangtze River accessions was the lowest among all the regions (0.00017). This was further validated by the low haplotype diversity of Yangtze River populations. Ancestral haplotype analysis showed one of the highfrequency haplotypes (CHN-AHyxx) as the ancestral haplotype of Haplogroup I and this haplotype also notably included the Kas-2 accession from the west Himalaya. This suggests that the Yangtze River populations might have arisen from the west Himalaya population but experienced founder effect phenomena leading to its near complete homogenization in cpDNA diversity. Our results further clarified the doubt about the origin of Kas-1, which was not clustered with accessions of west Himalaya or Yangtze River accessions. Earlier studies have had raised doubts about the origin of Kas-1 and they were proposed to include more accessions from this region to clarify its origin more conclusively (Vander Zwan et al. 2000;Yin et al. 2010).

In isolation evolution of west Himalayan populations
The divergence time of the west Himalaya population was about 0.45 mya. This comparatively long demographic history and the existing geographical and environmental barriers of the Himalaya might have aided in isolation evolution of these populations. Further, the common ancestor of the west Himalayan and Yangtze River populations might have split from the rest of the world, most probably from the central Asian region at 0.49 mya. Arabidopsis thaliana collected from the west Himalayan region occupied a broad diversity of habitats with varied climatic conditions that can be primarily attributed to the steep altitudinal gradient. These diverse habitats together with large genetic variations observed within the west Himalayan populations also suggest a longer demographic history of A. thaliana in this region. This was further supported by the results of mismatch distribution analyses. The results suggest that the initial time of expansion of west Himalayan populations to be about 130 000 years, whereas population expansion in Yangtze River began at about 84 000 years ago, which is slightly lower than the estimation of Yin et al. (2010). The results of haplotype and divergence time analyses of the west Himalaya and Yangtze populations ruled out any recent colonization of west Himalayan populations. Our results further suggest in-isolation evolution of west Himalayan populations and might have served as a source for eastward expansions as this region contains an amount of neutral genetic diversity similar to that of the rest of the world. The decrease in genetic diversity in the Yangtze River indicates west -east expansions. We hypothesize that besides being the source of European populations as suggested by Sharbel et al. (2000), central Asian populations might be the source of west Himalayan populations as well, which further expanded eastwards. High AoB PLANTS www.aobplants.oxfordjournals.org genetic diversity of west Himalayan populations suggests this region as a possible centre of diversity and source of worldwide populations, but extensive sampling of large number of populations may be required to prove this hypothesis. Further, analysis including collections from the east Himalaya, eastern China and Korean peninsula may provide further insight to the west -east expansion model. However, as indicated by other studies, populations of Japan may not be part of this natural expansion due to the existing physical barrier (Sea of Japan) rather than a recent colonization (Todokoro et al. 1995).

Conclusions
The six west Himalayan A. thaliana populations analysed in this study were highly diverse and structured, even at altitudinal level. Although we compared the west Himalayan populations with diverse collections of worldwide accessions, the genetic diversity indices were comparable with the world populations. It is clear from this study that these populations are genetically distinct from the rest of the world populations. West Himalayan populations are not recently colonized as indicated by time of divergence estimate and demographic expansion model test. The geophysical nature of this region might have played a major role in shaping the population structure in this region. This region might be the source of west -east expansion model, but further studies are needed to corroborate the hypothesis.
A.S. and University Grants Commission, New Delhi, to A.M.T., S.S. and P.M. Authors are also thankful to anonymous reviewers for constructive comments in improving the presentation and Gail Rice for editing the manuscript.

Supporting Information
The following additional information is available in the online version of this article - Table S1. Detailed geographical locations of sample collection sites and their habitats. Table S2. Details of 19 MS loci data. Table S3. Pairwise genetic distances (F ST ) between west Himalayan populations. All the values are highly significant (P , 0.0001). Table S4. Pairwise geographical distances between west Himalayan populations (in km). Table S5. Results of haplotype analysis of west Himalayan (WH) populations and the rest of the world (RW) accessions using CP markers. Showing outgroup weights (in decreasing order) of the haplotypes and the corresponding accessions in the haplotype. Table S6. Nucleotide diversity (p) and the results of neutral mutation hypothesis tests for the 11 CP marker data sets of WH populations. ns, not significant (P . 0.05); s, significant (P , 0.05); vs, very significant (P , 0.01). Table S7. Results of mismatch distribution analysis using CP markers of west Himalaya (WH) and Yangtze River (YR) populations. Table S8. Divergence time estimation. Mean divergence time in million years ago of outgroup species (calibration time points), rest of world (RW), west Himalaya (WH), west Himalaya + Yangtze River (HY) and Yangtze river (YR). Also included is 95% upper and lower higher probability density (HPD) and effective sample size (ESS). Figure S1. Mean probability (D) of ln(K). Figure S2. Consensus NJ tree of 1000 bootstrap replicates for WH accessions (total 48) constructed using concatenated sequences of 11 CP markers. The tree was constructed employing indel coding method MCIC. Figures on the nodes represent bootstrap branch support. Figure S3. Percentage contributions to the genetic variability of the co-ordinates calculated using principal co-ordinate analysis of west Himalayan samples on the basis of MS markers. x-axis ¼ co-ordinate, y-axis ¼ percentage contribution of the corresponding co-ordinate to the overall variability.