South Asia is home to more than 1.5 billion people, almost one-quarter of humanity, representing many diverse ethnic, linguistic and religious groups. Modern humans arrived here soon after their departure from Africa ∼50 000–70 000 years before present (YBP), and several subsequent human migrations and invasions, as well as the unique social structure of the region, have helped shape the pattern of genetic diversity currently observed in these populations. Over the last few decades population geneticists and molecular anthropologists have analyzed DNA variation in indigenous populations from this region in order to catalog their genetic relationships and histories. The emphasis is gradually shifting from the study of population origins to high resolution surveys of DNA variation to address issues of population stratification and genetic susceptibility or resistance to diseases in genome-wide association surveys. We present a historical overview of the genetic studies carried out on populations from this region in order to understand the influence of geographic, linguistic and religious factors on population diversity in this region, and discuss future prospects in light of developments in high throughput genotyping and next generation sequencing technologies.
South Asia encompasses the Indo–Pak sub-continent and includes India, Pakistan, Bangladesh, the Kingdoms of Nepal and Bhutan and the islands of Sri Lanka, the Maldives and the Chagos Archipelago. The land mass is bordered by Iran and Afghanistan on the west, China in the north, and Myanmar on its eastern fringes. The Indian Ocean straddles its entire southern coast line. Its population of around 1.5 billion individuals constitutes approximately one-fourth of humanity. Over the centuries this area has witnessed many invasions and migrations mainly from the West. Genetically, it contains the site of what was once called ‘the grandest genetic experiment ever performed on Man’ and, in several surveys of worldwide diversity, one of the most outstanding populations making up a sixth grouping of humanity comparable in these analyses to major continental groupings [1, 2]. In this review, we will consider how our understanding of the historical, cultural and environmental factors that have shaped its inhabitants has developed, something of what we have learned, and prospects for future developments.
This now-densely occupied land was encountered by the first populations of modern humans that ventured out of Africa more than 50 000 years ago. It is suggested that they arrived in this part of Asia via a southern coastal route and continued to Southeast Asia. The fossil and archaeological evidence for human settlements in this part of the world prior to 9000 YBP is, unfortunately, sparse, although settlements dating much further back are now beginning to emerge, including from Patne in western India and Batadomba-lena in Sri Lanka, dated to between 30 000 and 34 000 YBP [3–5]. There is evidence for indigenous animal and plant domestication in many places in the region, the earliest being found at a Neolithic site in Mehrgarh, in Southwest Pakistan. Subsequent epochal events included the development and decline of the Indus Valley civilization's Harappan culture, the arrival of Indo-European speakers from central or west Asia, linked to the introduction of the caste system in India and the possible displacement of Dravidian speakers to their current location in south central India. More recent historical events have included Alexander's invasion in 327 BC, the Arab invasion and subsequent Muslim conquest and rule of India, followed by the British Raj that lasted until the partition in 1947.
More than 4000 well-defined population groups, including ∼500 tribal and approximately 30 hunter-gatherer groups, reside in this region. The overwhelming majority are endogamous. The endogamous Hindu caste system, and the presence of large consanguineous families in the region, especially among the Muslim populations, provide unique resources for unraveling the genetic basis of disease. As elsewhere, demographic events such as genetic bottlenecks, population expansions and admixture have shaped the genetic diversity that is observed in this region today.
Linguistically the region comprises major groups of Indo-European, Dravidian, Tibeto-Burman and Austro-Asiatic speakers, minor groups like the Andamanese, and even language-isolate groups like the Hunza Burusho in northern Pakistan and Nihali in Madhya Pradesh, India (Figure 1). More than 70% of the population speak Indo-European languages. Tibeto-Burman speakers are present in the north and northeast and form a majority in Bhutan. Dravidian languages are prevalent in south India, Sri Lanka and, intriguingly, in Balochistan province of Pakistan. Austro-Asiatic languages are spoken by many Indians in the south and close to the border with Myanmar. The region is home to followers of many religions, the major among them being Islam, Hinduism, Buddhism and Sikhism. The population also includes sizeable Christian, Jewish and Zoroastrian minorities. All have contributed to the genetic and cultural diversity found here.
During the past few decades molecular geneticists and anthropologists have analyzed DNA variation among human populations in order to catalog their genetic relationships and to glean information about recent human evolution. Several such studies have been carried out in populations from South Asia, especially Pakistan, which is well-represented in the Human Genome Diversity Project (HGDP), and more recently India, in order to study their origins and their susceptibility, or resistance, to disease. We first present an overview of the genetic studies carried out on populations from this region in order to understand the influence of geographic, linguistic and religious patterns on population diversity in this region and go on to discuss future prospects for analyses of genetic variation in this region.
Throughout the 1970s and 1980s classical serological markers such as human blood groups, leukocyte antigens (HLA), glucose-6-phosphate dehydrogenase and other isozymes were used to study a limited number of populations from South Asia. The advent of polymorphic DNA markers such as restriction fragment length polymorphisms (RFLPs), variable number of tandem repeats (VNTRs), short tandem repeats (STRs) or microsatellites, and single-nucleotide polymorphisms (SNPs), together with the technological breakthroughs that made it relatively easy to genotype large numbers of markers in thousands of samples, permitted extensive analyses of genetic variation in humans. Initially the laborious techniques of RFLP and hybridization were applied to analyze a few hundred samples, but the development of polymerase chain reaction (PCR) amplification made possible extensive analyses of DNA variation in thousands of samples.
Several initial studies analyzed mitochondrial DNA and the Y chromosome in selected populations from Pakistan and India in order to obtain glimpses into their female and male ancestry, respectively. Sequence differences among individual mitochondrial DNAs separate them into haplotypes and haplogroups that provide a snapshot of population origins from the female perspective. Similarly, the male-specific part of the Y chromosome is passed down from father to son without change, except for the gradual accumulation of mutations which appear as DNA polymorphisms and provide a male perspective to human evolution. Human Y chromosomes are delineated into distinct haplogroups and haplotypes, defined by a combination of unique event or bi-allelic polymorphisms alone or their combination with Y-STRs, respectively. A number of other studies analyzed autosomal STRs and Alu insertion polymorphisms to gauge an overall view of autosomal genetic diversity in the region and address the issue of population origins in light of historical context [13–16].
Populations from Pakistan and north India share a sizable proportion of variation with European populations, and Central and West Asians have been major contributors to the gene pool of this region, consistent with the arrival of migrants from Northwest Asia [13, 14, 17]. In Pakistan the Karakoram Mountain ranges that are part of the Himalayan Mountains in the north have been a major barrier to gene flow from China, although this does not appear to be the case in India, where Sino-Tibetan populations entered through the north-east corridor and Nepal.
Most Y-chromosomal lineages found in both tribal and non-tribal populations from South Asia date back to the Paleolithic period. Common haplogroups (markers) include C5 (M356), F* (M89), H1 (M52), H2 (Apt), L1 (M27; M76), R1a1 (M17) and R2 (M124) [19, 20]. All except for R1a1, which constitutes 55% of Y chromosomes in Pakistan, have long coalescent times and appear to be indigenous. Y haplogroup J2* (M172), which is prevalent in Pakistan and north and west India, represents a West Asian contribution to the genetic diversity of the sub-continent (although the migrants would of course have carried other lineages as well). O2* (P31), O3* (M122) and Q* (M242) lineages in India and Pakistan are probably representative of migrants from East and Southeast Asia [21, 22]. Generally haplotype variation between populations is higher, especially between tribes [23, 24].
More than 60% of mitochondrial lineages in this region are represented by haplogroup M and mitochondrial variation, especially in haplogroup M, is largely local in origin with estimated coalescence times in the Paleolithic. Approximately 30% of Indian mtDNA haplotypes belong to West Eurasian haplogroups. Haplogroup R7 and U2 lineages are specific to India and the variation in their distribution also suggests an ancient origin [25, 26]. It has been argued that the presence of Eurasian and Australasian mitochondrial haplogroups M, N and R and Y haplogroups C* (M130), D* (M174) and F* in South Asia supports the migration of humans out of East Africa via a southern coastal route through the sub-continent. The Andaman and Nicobar Islanders, who comprise six tribal populations and live on the southern coastal route to Oceania, show a high level of population differentiation, with unique mitochondrial sequences in the Andamanese probably arising by genetic drift in an isolated population. However, the Nicobarese share genetic relatedness with Southeast Asians and are probably more recent migrants in comparison with their neighbors. The presence of haplogroups B5a and F1 near the border between India and Nepal suggests movement across the Himalaya Mountains and this possibility is also supported by the detection of Indian haplogroup R6 in the Nepalese.
The Indian caste system represents a unique socio-economic hierarchy associated with the Hindu religion, that broadly distinguishes upper (Brahmins/priests), middle (Kshatriyas/warriors; Vaishyas/farmers and traders; Sudras/laborers) and lower (Panchama/untouchables) or scheduled castes. It is associated with the spread of Indo-European languages after ∼3500 YBP, but the extent of gene flow from outside remains a matter of debate. Early studies were interpreted to suggest extensive gene flow from West and Central Asia but recent data, obtained using a larger number of markers, have been used to argue that most DNA variation was indigenous in origin [20, 29]. Male substructure has been observed in the higher (priest and warrior) classes from Jaunpur District in central India and male gene flow between the castes was low (<1% per generation) there, as expected from the social rules. Some studies show a genetic affinity between the lower castes and tribal populations and the frequency of these haplotypes is proportional to caste rank, the highest frequency of West Eurasian haplotypes being found in the upper castes [13, 30]. Upper caste Indians share 10–20% of variation with European populations [13, 14].
Several populations stood out as being genetically distinct in the initial studies based upon Y chromosomal and autosomal STRs. In the Kalash population from Pakistan, referred to at the beginning of this review as a sixth grouping of humanity, this was most likely due to drift in a geographically and religiously isolated group that has undergone a population bottleneck during their recent migration to their present day settlements in the Hindu Kush Mountain valleys in northern Pakistan from where the individuals were sampled. A larger survey that includes populations from their ancestral homeland in Nuristan, Afghanistan, would provide more insights about their unique genetic structure. They have a major West Eurasian mitochondrial component along with certain population-specific Y lineages (L3a; PK3) with little Y-STR variation and no genetic affiliation with East Asian populations [21, 26, 31].
The origins of the Parsi are well documented and, although only a few thousand now live in Pakistan, there are many in India. They migrated to Gujarat in India after the collapse of the Sassanian empire, and their name relates to their geographic origin, meaning ‘from Iran’. Their Y chromosomes are closer to those of populations from present-day Iran but 60% of their maternal gene pool belongs to South Asian haplogroups not found in Iran, highlighting their affinities with local Gujarati women [17, 26].
A number of East African slaves were brought to this region as involuntary migrants and among such groups are the Makranis, who have physical features typical of African populations. The contribution of sub-Saharan African Y chromosomes to this population was estimated to be ∼12% and, combined with the presence of sub-Saharan mitochondrial haplogroups, represents the genetic legacy of the East African slave trade that existed in this region [17, 26].
Preliminary data on the Nepalese and Bhutanese populations using autosomal and Y-STRs show significant differences in comparison with their geographic neighbors, consistent with genetic drift in small geographically isolated populations, the differentiation being more striking in the Bhutanese [32–34]. A more detailed genetic analysis of the samples collected under the ‘Languages and Genes in the Greater Himalaya Region’ Project should provide a clearer picture.
A genealogical approach based on the hierarchical use of Y chromosomal markers in ethnic groups from this region has provided some interesting insights into historical events. In particular, analyses of Y lineages in the Hazara population from Pakistan established the presence of a ‘star haplotype’ that could be directly linked to Genghis Khan or his male ancestors and that was spread throughout Eurasia by his sons. An examination of their mitochondrial DNA suggests that they were accompanied by women of East Asian ancestry and their autosomal DNA also shows genetic relatedness with East Asians [2, 26].
Although several populations from northern Pakistan claimed that they were the descendants of Greek soldiers left behind in this region by Alexander the Great, this was not generally borne out by genetic analyses. Only a small proportion of haplogroup E1b1b1a (M78) Y chromosomes in the Pathan population of Pakistan provided strong evidence of a small Greek contribution.
Similarly, the Muslim invasions do not appear to have left a readily detectable genetic imprint in India. Islam seems to have spread by conversions and cultural diffusion: overall the Muslims in India are genetically closer to their non-Muslim geographical neighbors than to other Muslim populations and no correlation exists between genetic variation and religious beliefs. This is also true historically. Muslims constitute more than 400 million of the population of this region and since their advent in the seventh century they have included many ethnicities from Arabia, Turkey, Persia and Central Asia. Some studies have attempted to analyze these Muslim populations based along the sectarian divide between the two major Muslim sects (Shias and Sunnis) but genetic analyses are not appropriate for analyses of recent political and religious events and any differences that are observed are likely to relate to the ethnic or geographic origins of the source populations.
Indo-European, Dravidian, Tibeto-Burman and Austro-Asiatic languages are spoken in this region and overall genetic relationships, as ascertained by STRs, SNPs and other genetic markers, are dictated primarily by geographic proximity rather than linguistic origin [9, 38].
Indo-European languages form the predominant language group and the genetic relatedness between the European and South Asian populations indicates that they may well have shared a common language superfamily, Dene-Caucasian. The Indo-European family may have spread to South Asia 6000–10 000 YBP, replacing the languages spoken earlier almost everywhere [39, 40]. The Y haplogroup R1 (M173) is often referred to as an Indo-European marker and its associated haplogroup R1a1 is present at high frequency in many regions where Indo-European speakers live. The worldwide distribution of this haplogroup indicates frequency peaks in Eastern Europe and West and South Asia, which fits in with historical records of nomadic settlements in Europe and India. However, its presence in 15% of Dravidian speakers in India argues against a simple correlation. Although several new SNPs that refine this branch of the tree have recently been identified, they are restricted to a few individuals in the same population or show genetic exchange across the Gulf of Oman [21, Underhill et al., unpublished data].
An elite dominance model of the Indo-European speakers partly explains the genetic similarities observed between the Dravidian and Indo-European groups and the seclusion of Dravidians in southern India and parts of Sri Lanka, but it does not explain the enigma of the Brahui. This Dravidian-speaking population resides in the Balochistan province in southwestern Pakistan and is surrounded on all sides by Indo-European speakers. Quintana-Murci and co-workers argued that they arrived in South Asia from southwestern Iran with the expansion of Dravidian-speaking farmers, although this was contested by later studies using extensive Y markers [20, 29].
The Austro-Asiatic languages may be the most ancient in the region and preliminary analyses of mitochondrial DNA and a limited number of Y-chromosomal and autosomal markers had suggested that the Austro-Asiatic tribes appear to be descendants of remnant earlier settlers and share genetic affinity with Tibeto-Burman populations. Subsequent analyses using a higher resolution of Y haplotypes in a larger sample of the three major Austro-Asiatic groups of India (Mundari, Khasi-Khmuic and Mon-Khmer) demonstrated a strong paternal genetic link amongst these populations and those from Southeast Asia. The authors estimated that the protohaplogroup O2a originated in the Indian Austro-Asiatic populations ∼65 000 years ago and entered Southeast Asia via the Northeast Indian corridor. Mitochondrial DNA varied between Austro-Asiatic groups from Southeast Asia and India, reflecting a difference in the history of the sexes. Since some estimates of the most recent common ancestor (MRCA) of all extant Y chromosomes are as recent as ∼59 000 YBP, such time estimates should be treated with caution.
Several Tibeto-Burman groups reside in north and north east India, and in Pakistan the Balti population from the Karakoram Mountains also speaks a Tibeto-Burman language. Y-chromosomal analyses of only a limited number of these Balti speakers have been carried out and they are not noticeably different from their neighbors in northern Pakistan.
The Hunza Burusho were of particular genetic interest because their language, Burushaski, is one of the few remaining language isolates in the world. However, they are genetically close to their geographic neighbors in Pakistan. Their isolation in the Karakoram Mountains habitat may have preserved their language but any differences between them and their neighbors, who have acquired new languages, have been greatly diluted by genetic exchange. Analyses of the language isolate Nihali speakers in India will show whether they exhibit similar genetic affinity with their geographic neighbors.
Families from South Asia have been invaluable in understanding the genetic basis of several Mendelian disorders. The high rate of consanguineous marriages, large family sizes and contemporary inbreeding among tribes, clans and ethnicities make them suitable for linkage analyses [42, 43]. The genetic basis of several single gene disorders leading to syndromic and non-syndromic blindness, deafness, thalassemias, skeletal, hair, skin and nail disorders has been unraveled in families and populations from this region [44–49]. Several of these mutations are family- or population-specific [44, 50]. The immediate benefits of these analyses include genetic testing to exclude such disease in the unborn child. However, the real challenge is to translate these findings into public health education programs that will benefit these families and communities.
Determination of HLA frequencies in ethnic groups from this region by DNA-based genotyping, at higher allelic resolution than is achieved by serological methods, and analysis of their association with diseases like malaria, tuberculosis, leprosy and rheumatic heart fever have identified several high risk alleles [51–53]. Genetic association studies using single, or a few, genetic variants have also been carried out but their associations have not been replicated across populations, possibly due to ascertainment bias, choice of markers, insufficient statistical power, population stratification, or differences in linkage-disequilibrium patterns in patients and controls.
As a consequence of advances in genotyping technologies, the focus of current studies is shifting from investigating population origins using small numbers of loci to analyzing large numbers of SNPs and structural and copy number variants (CNVs) to address issues of population sub-structure and group membership that will have practical applications in the design of disease association studies, in rationalizing the use of medicines tailored to an individual's genetic make up, and in DNA-based forensic analyses. The discovery of a single high risk variant in the cardiac myosin binding protein C (MYBPC3) that is restricted to 4% of South Asians and predisposes strongly to heart failure in later life is testament to both the promise of such studies and the distinct genetic features of the region.
The wide availability of DNA samples of ethnic populations from Pakistan through the Foundation Jean Dausset's HGDP-Centre d’Etude du Polymorphisme Humain (CEPH) Human Genome Diversity Cell Line Panel has permitted extensive analyses of STRs, SNPs and CNVs in these populations [2, 54–57]. Although some similar work has subsequently been carried out on tribal and caste populations from India and Indian expatriates settled in the USA the lack of readily-available cell lines or DNA from indigenous Indian populations has greatly hindered large-scale studies of other parts of South Asia [58, 59].
Using samples in the HGDP panel, Conrad et al. demonstrated that HapMap populations capture common haplotypes well in non-HapMap South Asian populations and that HapMap data could be used for imputing missing genotypes in these populations. South Asians were expected to have intermediate levels of linkage disequilibrium (LD) between Europeans and East Asians, two populations that were part of the first phase of the HapMap Project. Overall, Indian and Pakistani populations are tagged most effectively by European populations but optimal HapMap mixtures increased ‘the fraction of polymorphic non-tag SNPs in a target population that are in LD with at least one tag SNP above a specified cut off point’.
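The tagging statistic quoted above can be illustrated with a minimal sketch. It assumes phased haplotypes coded as tuples of 0/1 alleles, uses the squared correlation r² as the LD measure, and takes 0.8 as the cut off; the function names, the data layout and the threshold are illustrative assumptions rather than details taken from the studies cited.

```python
def r_squared(haps, i, j):
    """LD (r^2) between SNPs i and j, from phased haplotypes coded 0/1."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n          # allele frequency at SNP i
    p_j = sum(h[j] for h in haps) / n          # allele frequency at SNP j
    p_ij = sum(h[i] * h[j] for h in haps) / n  # frequency of the 1-1 haplotype
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:                             # at least one SNP is monomorphic
        return 0.0
    d = p_ij - p_i * p_j                       # disequilibrium coefficient D
    return d * d / denom


def tagged_fraction(haps, tag_snps, threshold=0.8):
    """Fraction of polymorphic non-tag SNPs whose r^2 with at least one
    tag SNP reaches the threshold (0.8 is an assumed, commonly used value)."""
    tags = set(tag_snps)
    n = len(haps)
    targets = [
        k for k in range(len(haps[0]))
        if k not in tags and 0 < sum(h[k] for h in haps) / n < 1
    ]
    if not targets:
        return 0.0
    covered = sum(
        1 for k in targets
        if any(r_squared(haps, k, t) >= threshold for t in tags)
    )
    return covered / len(targets)


# Toy target population: SNP 1 mirrors SNP 0, SNP 2 is independent of it,
# and SNP 3 is monomorphic and therefore excluded from the denominator.
toy_haps = [(0, 0, 0, 1), (0, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1)]
coverage = tagged_fraction(toy_haps, tag_snps=[0])
```

In this toy panel a single tag SNP captures one of the two polymorphic non-tag SNPs. In practice the tags would be chosen in a reference panel such as HapMap and the coverage evaluated in the target population.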
Analyses of populations from Pakistan in the HGDP Panel using the Illumina HumanHap 550K and 650K bead chips and expatriate populations from India, Pakistan and Sri Lanka using Affymetrix GeneChip Mapping Array 500K gave broadly similar results to an earlier study that had employed STR variation in this panel [56, 57, 62]. The South Asians were separated as a distinct cluster when regional identity was inferred for six groups using 650 000 SNPs, a better resolution than was obtained through analyses of autosomal STRs or the 550K chip in these populations, which could not clearly distinguish between populations from South Asia, Europe and West Asia. As expected, the Hazara shared ancestry with East Asians and there was a small East Asian contribution to the gene pool of Burusho, Pathan and Sindhi populations that was not apparent with use of STR datasets [2, 57]. Although the Kalash individuals reportedly harbored more than the average CNVs this was not replicated in the most recent survey of CNV in human populations using the Illumina 650Y arrays [56, 63].
Examination of variability in Indian expatriates in the USA using 471 insertion/deletion polymorphisms and autosomal STRs revealed low levels of genetic divergence but these samples, like the Indian Gujarati population from Texas that is included in the HapMap 3 sample collection, are hardly representative of the diversity of extant Indian populations. This problem is being addressed by the Indian Genome Variation Consortium, which recently analyzed more than 400 SNPs in an indigenous sample of 1871 individuals representing numerous geographical, linguistic and religious groups from India. They observed relatively low genetic differentiation overall, with a mean Fst value of 0.03, although this value is still higher than in Europe (∼0.01), but these results could also reflect marker or sampling bias. The maximum genetic differences were observed among tribal populations speaking Indo-European languages, and certain populations and isolated ethnic groups clustered on the basis of ethnicity or language, suggesting that care should be taken in the selection of cases and controls while designing Genome Wide Association Studies (GWAS) in these populations.
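For readers unfamiliar with Fst, the sketch below computes Wright's fixation index for a single biallelic marker from subpopulation allele frequencies, as Fst = (H_T − H_S)/H_T, where H_T is the expected heterozygosity of the pooled population and H_S the mean expected heterozygosity within subpopulations. It is a textbook illustration only: the function name and the toy frequencies are assumptions, and the estimator actually used by the Consortium is not specified here.

```python
def fst_biallelic(freqs, sizes=None):
    """Wright's F_ST for one biallelic marker.

    freqs: allele frequency of the same allele in each subpopulation.
    sizes: optional relative subpopulation sizes (equal weights if omitted).
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = sum(sizes)
    weights = [s / total for s in sizes]

    # Pooled allele frequency and expected heterozygosity of the total population.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2 * p_bar * (1 - p_bar)

    # Weighted mean expected heterozygosity within subpopulations.
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))

    if h_t == 0:  # marker is monomorphic overall, so no differentiation is measurable
        return 0.0
    return (h_t - h_s) / h_t


# Two hypothetical groups with allele frequencies of 0.4 and 0.6.
example_fst = fst_biallelic([0.4, 0.6])
```

Modest frequency differences of this kind give values of a few percent, the same order of magnitude as the mean reported above, whereas fixed differences between groups would give a value of 1.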
Following the comprehensive documentation of genetic stratification within South Asia, a next logical step would be association mapping to identify genetic variants associated with complex multi-genic diseases and with resistance to pathogens such as the malarial parasite, viral hepatitis, HIV, or Mycobacterium tuberculosis and Mycobacterium leprae that afflict a large section of South Asians. The increase in non-infectious diseases such as cardiovascular disease, type 2 diabetes mellitus and metabolic disorder in this region also places an enormous burden on these developing economies and their already stretched health care systems.
The development of dense SNP chips such as Illumina's Human1M-Duo BeadChip and Affymetrix v6.0 will enable an even denser and more uniform genomic coverage of both SNPs and CNVs to be obtained. GWAS will benefit from the identification of haplotypes that better describe association than single SNPs. The recent success of the Wellcome Trust Case Control Consortium (WTCCC) genetic association studies in identifying SNPs associated with cardiovascular disease, diabetes and other disorders, which are being replicated by the National Human Genome Research Institute (NHGRI) and other research centres, provides an ideal model for conducting collaborative large scale disease association studies in South Asian populations with a common set of controls [64, 65]. Besides raising ethical concerns and needing strict quality control, these studies are expensive and require very large sample sizes. The presence of large South Asian diasporas in Europe and the USA should also facilitate these collaborative studies. The Pakistan Risk of Myocardial Infarction Study (PROMIS), being carried out as part of an international collaboration, plans to analyze ∼20 000 myocardial infarction patients and appropriately matched controls and is one of the largest such studies on populations from South Asia, providing one model for future studies. Like many studies from Pakistan, it includes Mohajirs, or Urdu ethnicities, which encompass diverse ethnic groups from India that migrated to Pakistan after independence and whose only commonality is the language (Urdu) that they speak. In such studies careful consideration must be given to ethnicity and geographic origins to ensure that unsuspected population structure does not confound genetic analysis.
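One widely used diagnostic for unsuspected population structure in a case-control scan is the genomic inflation factor λ, the ratio of the median observed 1-d.f. χ² association statistic to its expected median under the null hypothesis (about 0.455). The sketch below is a generic illustration with made-up statistics; it is not drawn from PROMIS or any other study mentioned here, and the function name is an assumption.

```python
import statistics

# Median of the chi-square distribution with one degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: median observed statistic / null median.

    Values close to 1 suggest little confounding; values well above 1
    suggest stratification or other systematic bias.
    """
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical per-SNP association statistics from a stratified sample.
observed = [0.30, 0.55, 0.62, 0.71, 1.90]
lam = genomic_inflation(observed)
```

Here λ is well above 1, which in a real study would prompt closer matching of cases and controls by ethnicity and geographic origin, or a statistical correction for ancestry.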
An increasing number of studies are also focusing on the role that selection may have played in recent human evolution and studying the constraints imposed by local environment of pathogens, nutrients, toxins and climate on genetic variation. A majority of genes that are under selection due to pathogens are involved in host invasion (e.g. FY), innate (CASP12, CD40, TLR) or adaptive (HLA, interleukins) immunity [67, 68]. Others include those that aid adaptation to climate, exposure to toxins and dietary nutrients. The genetic variant responsible for lactose tolerance (LCT –13910 T allele), which has been under positive selection in response to dietary milk in European populations, appears to have a recent origin in South Asia, being frequent in pastoral groups and present at low frequency in non-pastorals in Pakistan. Analyses of variation in drug metabolizing enzymes such as alcohol dehydrogenase and cytochrome p450 enzymes are of immediate clinical significance.
The technological advances in DNA sequencing will soon make it economically feasible to resequence entire genomes of chosen individuals. The current 1000 Genomes Project lacks South Asian samples, but the HapMap3 Gujarati in Houston sample meets the necessary international criteria for consent and availability and, in the absence of more suitable indigenous samples, may provide the first insights into South Asian diversity from whole-genome resequencing. We hope that indigenous South Asian samples will soon be available for resequencing as well.
The complexity of human genetic variation in South Asians and its role in gene regulation, expression and influencing disease and non-disease phenotypes in diverse populations from this region has yet to be fully unraveled. The challenge is to obtain clinically significant results that can translate into benefits for the general population leading eventually to early diagnosis, prevention and therapeutic intervention. Only then can this quarter of humanity truly benefit from the promise of the ‘genetic revolution’.
1000 Genomes Project: http://www.1000genomes.org/page.php
Ethnologue: Languages of the World: http://www.ethnologue.com/web.asp
Human Genome Diversity Project: http://www.stanford.edu/group/morrinst/hgdp
Human Genome Diversity Cell Line Panel:
Languages and Genes of the Greater Himalayan Region: http://www.sanger.ac.uk/Teams/Team19/himalayas.shtml; http://www.le.ac.uk/genetics/maj4/Himalayan_OMLLreport.pdf
Pakistan Risk of Myocardial Infarction Study (PROMIS): http://www.phpc.cam.ac.uk/MEU/PROMIS/
The Indian Genome Variation Consortium: http://www.igvdb.res.in/
The Wellcome Trust Case-Control Consortium: http://www.wtccc.org.uk/
South Asia is home to a quarter of humanity and harbors many diverse ethnicities, linguistic and religious groups.
We examine the influence of geographic, linguistic and religious factors on genetic variation in this region and discuss the importance of population stratification and its implication for genome-wide association surveys of South Asian populations.
Future prospects are discussed in light of developments in high throughput genotyping and next generation sequencing technologies.
The Wellcome Trust.
The authors would like to thank Daniel MacArthur and the reviewers for their helpful comments and Ambareen for the illustration.