Marcel Lucas-Sánchez, Jose M Serradell, David Comas, Population history of North Africa based on modern and ancient genomes, Human Molecular Genetics, Volume 30, Issue R1, 1 March 2021, Pages R17–R23, https://doi.org/10.1093/hmg/ddaa261
Abstract
Compared with the rest of the African continent, North Africa has provided limited genomic data. Nonetheless, the genetic data available show a complex demographic scenario characterized by extensive admixture and drift. Despite the continuous gene flow from the Middle East, Europe and sub-Saharan Africa, an autochthonous genetic component that dates back to pre-Holocene times is still present in North African groups. The comparison of ancient and modern genomes has evidenced a genetic continuity in the region since Epipaleolithic times. Later population movements, especially the gene flow from the Middle East associated with the Neolithic, have diluted the genetic autochthonous component, creating an east to west gradient. Recent historical movements, such as the Arabization, have also contributed to the genetic landscape observed currently in North Africa and have culturally transformed the region. Genome analyses have not shown evidence of a clear correlation between cultural and genetic diversity in North Africa, as there is no genetic pattern of differentiation between Tamazight (i.e. Berber) and Arab speakers as a whole. Besides the gene flow received from neighboring areas, the analysis of North African genomes has shown that the region has also acted as a source of gene flow since ancient times. As a result of the genetic uniqueness of North African groups and the lack of available data, there is an urgent need for the study of genetic variation in the region and its implications in health and disease.
Introduction
The genetic study of North African human groups has been generally neglected. Instead, the focus of population genetic analyses has been placed on neighboring areas, thereby overshadowing the relevance of North Africa. The African continent has captured most of the attention for being the cradle of humankind; however, the focus of genetic studies has been mainly set on Eastern and South Africa as the suggested geographical origins of our species (1–4). The expansion of Bantu-speaking groups from western Africa, associated with one of the major population movements, has also received much attention (5–7). Therefore, North Africa has been neglected in genetic studies compared with the rest of the continent. Moreover, North Africa has been considered an extension of the Middle East into the African continent, and therefore received little recognition as a unique entity until recently (8,9). Thus, less genetic data has been collected in North Africa than in other regions. Even recent global genome databases, such as the Human Genome Diversity Project (10) and the Simons Genome Diversity Project (11), only considered a single population (the Mozabite) and four individual genomes (two from the Mozabite and two from the Saharawi), respectively. Fortunately, in the last few years, genetic data, including ancient and current-day whole genomes, have been analyzed in order to refine the population history of North Africans.

Figure 1. Scheme of the main population movements in North Africa. Movements from Europe (green), the Middle East (blue), sub-Saharan Africa (dark orange) and North Africa (yellow) are shown. Arrows are approximations and show direction rather than specific migration routes for the major migrations, although additional migrations may have occurred.
North African Data: From Classical Markers to Whole Genomes
Despite the limited population data in North Africa, most analyses have shown a complex pattern of genetic diversity, characterized by extensive admixture, and differentiation of the North African area from the rest of the African continent. The study of classical genetic markers, compiled in the seminal work by Cavalli-Sforza et al. (12), evidenced the differentiation of North Africa from the rest of the continent. This is shown in the first component of the African principal component analysis, which suggested a North African demographic history more related to the out-of-Africa (OOA) populations. In a specific North African compilation of classical markers, Bosch et al. (13) also showed the distinction of North Africa in comparison to other African groups, and pointed to a gradient of genetic diversity along an east–west axis, as a result of human movements limited by the Mediterranean Sea and the Sahara Desert. The uniparental marker analyses (mitochondrial DNA and Y chromosome) have also evidenced the uniqueness of North Africa within the continent and the admixture of lineages from neighboring areas. The presence of uniparental lineages that originated in sub-Saharan Africa, the Middle East or Europe suggests a complex pattern of gene flow toward North Africa; however, autochthonous lineages have also been described in the region, pointing to extensive admixture of local and external groups with different gradients of lineages in the area (14–18). During the last decade, the analyses of genome-wide SNPs refined our knowledge about the North African genetic landscape (8,19,20) and reinforced the idea of complex demographic patterns of admixture and isolation in the region that differentiated it from the rest of the African continent. This idea has also been corroborated by the analysis of the still-limited data on complete genomes from ancient and modern samples (21–23).
The North African Genetic Component
Genetic data from current-day populations suggest a complex pattern of admixture, with a minimum of four main sources of genetic ancestry for North African people. Henn et al. (8) first showed the presence of an autochthonous ancestral component (also known as the Maghrebi component), as well as European, Middle Eastern and sub-Saharan components, in the current North African populations. This result showed that North African populations exhibit their own ancestral component and cannot be considered as the result of a mere admixture of exogenous ancestries from neighboring regions. This component is related to an early North African population that diverged from the rest of the OOA groups before the Holocene, more than 12 000 years ago. The component was possibly introduced in a back-to-Africa movement, already suggested by the mitochondrial DNA (mtDNA) haplogroups U6 and M1 (24–26), and more recently by nuclear data (27); it is distributed in a west-to-east declining gradient across the region (8). Later studies confirmed the presence of this autochthonous component by comparing current-day genomes with data from ancient anatomically modern human samples recovered from different locations in North Africa (23). This analysis refined the sources of ancestry in current North African populations, adding a Caucasian hunter–gatherer/Neolithic Iranian-related component and locating the possible origin of the autochthonous North African component in Epipaleolithic or Early Neolithic times, given that it is prevalent in Moroccan Epipaleolithic and Early Neolithic samples.
Ancient Genomes in North Africa
The retrieval of stone artifacts and cutmarked bones from an archeological site in Algeria places the first peopling of North Africa around 2.4 million years ago (28), whereas direct bone dating of the oldest human remains from the Moroccan site of Jebel Irhoud points to 300 thousand years ago (ka) (9). Many more fossils have been recovered in North Africa (29), but genomes could be extracted and analyzed for only a few of them. The Taforalt site in Morocco (dated between 15 100 and 13 900 calibrated years before present) is the oldest site to date to yield DNA data, not only in North Africa but in Africa as a whole. The analyzed Taforalt individuals show high affinity toward Near Eastern populations, especially Epipaleolithic Natufians, with whom they share 63.5% of their ancestry on average. These individuals present mtDNA haplogroups U6 and M1, concordant with the pre-Holocene back-to-Africa event (22,26,30). An ancient sub-Saharan ancestral component is also present, showing a higher affinity with Taforalt than with any combination of Yoruba–Natufian ancestry. Also, no gene flow from Paleolithic Europeans is observed (22).
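An ancestry share such as the 63.5% figure above is commonly read through a two-source mixture model, in which the allele frequency of the admixed group at each site is a weighted average of the frequencies in two source groups. The sketch below illustrates that idea only: it is not the method of the cited studies, the function name `estimate_admixture_fraction` is hypothetical, and the allele frequencies are synthetic values made up for the illustration.

```python
def estimate_admixture_fraction(target, source_a, source_b):
    """Least-squares estimate of alpha in
    target[i] ~ alpha * source_a[i] + (1 - alpha) * source_b[i].

    Each argument is a list of allele frequencies (0..1), one per site.
    """
    numerator = 0.0
    denominator = 0.0
    for t, a, b in zip(target, source_a, source_b):
        diff = a - b
        numerator += (t - b) * diff
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("source groups have identical allele frequencies")
    alpha = numerator / denominator
    # An ancestry proportion must lie between 0 and 1.
    return min(1.0, max(0.0, alpha))


# Synthetic allele frequencies (illustrative only, not real data).
source_a = [0.10, 0.80, 0.45, 0.30, 0.95]
source_b = [0.60, 0.20, 0.50, 0.90, 0.05]

# Build a noise-free admixed target with 63.5% ancestry from source A.
true_alpha = 0.635
target = [true_alpha * a + (1 - true_alpha) * b
          for a, b in zip(source_a, source_b)]

alpha_hat = estimate_admixture_fraction(target, source_a, source_b)
print(round(alpha_hat, 3))
```

With real data, drift and sampling noise mean the fit is never exact, which is one reason published analyses rely on dedicated statistical frameworks rather than a direct regression like this one.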
In addition to these ancient North African Epipaleolithic genomes, five individuals from the Early Neolithic Ifri n’Amr or Moussa (IAM) site were analyzed together with four Late Neolithic samples from Kelif el Boroud (KEB) (21). IAM individuals (7000 years old) showed close genome-wide affinities with the Taforalt individuals. This was also supported by the presence of similar mtDNA haplogroups (U6, M1) associated with the back-to-Africa migration (30), suggesting a continuity between Later Stone Age and Early Neolithic populations in the Maghreb (21). On the other hand, the genome analysis of the KEB population suggests that it can be modeled as a mixture of IAM and Anatolian/European Neolithic, and it also presents a lower sub-Saharan component than IAM or Taforalt (21). Mitochondrial and Y-chromosome haplogroups in these samples are prominently found in Anatolian and European Neolithic samples (31,32). Recently, two 7000-year-old mtDNA samples have been extracted from the Takarkori Rockshelter site (Libya) and attributed to a newly identified haplotype in Africa in the basal branch of haplogroup N (33). This haplotype could have arrived with a back-to-Africa event in the spread of pastoralism from the Levant, or it could have differentiated from the L3 haplogroup inside Africa and later spread out of the continent. The Sahara aridification could have caused the isolation and survival of the haplotype in Takarkori while it was being replaced in other parts of Africa (33).
Regarding Egypt, Schuenemann et al. (34) analyzed 151 individuals from the Abusir el-Meleq settlement, carbon-dated from 1388 BCE to 395 CE. Ninety mtDNA and three genome-wide SNP data samples show highly similar haplogroup profiles with low genetic distances between all samples, supporting the idea of genetic continuity in the region. The absence of sub-Saharan African mtDNA haplogroups in ancient samples as compared with modern Egyptians (with 20% sub-Saharan mtDNA) may be explained by recent sub-Saharan gene flow. Nuclear DNA data from these ancient Egyptians further support these results and reveal a larger Neolithic Near Eastern component than in modern Egyptians, in agreement with the rest of the ancient North African samples (34).
Outside mainland Africa, other ancient DNA samples have been useful to assess the genetic ancestry of North African populations. The ancient Guanche samples from the 7th–11th centuries CE, analyzed with uniparental markers and whole-genome data, suggest a North African origin of the Canary Islands settlers, based on the presence of mtDNA U6 and Y-chromosome E1b1b1b1 haplogroups, which are autochthonous to North Africa (32,35), and a significant genetic component shared with Epipaleolithic North Africans in the whole-genome data (23). Autosomal data show an admixture profile similar to that of Moroccan KEB (21) and are consistent with a single ancestral North African origin, but with possible small introgression events after the first settlement of the islands (32). Additionally, European ancient DNA samples from Iberia and Mediterranean islands confirm a widespread sporadic gene flow from North Africa to the north during the early Bronze Age (36,37).
Population Replacement versus Demographic Continuity Hypotheses
North Africa has been populated since the early stages of humankind (28), but the (still limited) ancient genomic data are available only from the Epipaleolithic population of Taforalt onward, whose members might be considered direct descendants of the autochthonous population of North Africa. The continuity of this autochthonous component until the present time has been challenged by constant gene flow into the region from neighboring populations (Fig. 1), which has partially replaced the original population in North Africa at different times (during the Paleolithic and Neolithic ages, and in historical times).
Gene flow in prehistoric times from Middle Eastern Natufians has been observed (21,22). This gene flow coincided with the last humid period of the region (38), which could have eased the connection between both populations (17). The Sahara experienced strong climatic oscillations during the Late Pleistocene and Early Holocene. In what is known as the ‘Holocene climatic optimum’, warmer and wetter environmental conditions appeared after the Last Glacial Maximum (from 12 to 5 ka), leading to an increase of waterways, flora and fauna that facilitated the spread of human groups across the Sahara (39–42). Later arid periods could have isolated some of these populations in refugia, causing the disappearance of genetic lineages due to genetic drift (33).
Concerning the Neolithic transition, controversy exists with hypotheses defending either cultural diffusion of agriculture (43) or replacement of the indigenous hunter–gatherer populations with Neolithic groups (44,45). The demic diffusion hypothesis has traditionally been accepted as the Neolithization mechanism in North Africa, with Middle Eastern populations suggested as the source of the transition (46,47), although contact with Iberian populations during Late Neolithic has also been observed (21,47,48) (Fig. 1). Nonetheless, recent analysis of contemporary genomes and their comparison with the Taforalt remains, as well as the endemic element shared between Taforalt and early Neolithic Ifri n’Amr or Moussa genomes (21), has shown a continuity of the Paleolithic component in North Africa (23), although this autochthonous Paleolithic component is much lower than the Paleolithic component observed in current Europeans. Therefore, although the impact of the Neolithic was dramatic in North Africa, it did not completely erase the autochthonous component, thereby also suggesting that cultural diffusion had taken place before the demic diffusion.
Eurasian gene flow after Neolithization seems to have had a lower genetic impact, as shown by Serra-Vidal et al. (23). The post-Neolithic movements with high genetic impacts on the region are: (i) a sub-Saharan gene flow, which was mainly due to trans-Saharan slave trade routes from the Roman period (1st century BCE) through the Arab conquest and lasting until the 19th century (8,20) and (ii) the Arabization, which started in the 7th century and introduced gene flow from the Middle East across all of North Africa, thereby helping to shape the east-to-west cline of the Middle Eastern component found in current North Africans (23,49) (Fig. 1). Other historical movements had only minor impacts on the genetic history; these movements include the arrivals of Phoenicians, Romans, Vandals, Byzantines, Ottoman Turks and other Mediterranean European populations (49).
Cultural and Genetic Differentiation: Arabs and Imazighen (Berbers)
From a cultural point of view, populations in North Africa have been traditionally divided into Arabs and Berbers. Notably, ‘Berbers’ is a misnomer that traces back to Greco-Roman times (from the Latin word barbarus) for the original inhabitants of the region (50,51), who call themselves Amazigh (sing.)/Imazighen (pl.) (free people) (52). This differentiation has its origin in the Arab conquest of North Africa, when Arab groups occupied the region and imposed a new language, religion and customs (between the 7th and 11th centuries) (53–55). Most North Africans incorporated the new culture, admixed with the newcomers, and began to identify themselves as Arabs (56). But others escaped this influence and receded to the mountains and to remote villages, where they maintained their previous way of life, along with an Amazigh identity and language (Tamazight) (15,16,52). Imazighen are considered the autochthonous inhabitants of North Africa, as historical records account for their existence before the arrival of Phoenicians (814 bce) (54,57), and an archeological link with the pre-Holocene North African Capsian culture has been suggested (58). As mentioned previously, the comparison between Epipaleolithic and modern North African genomes has reinforced the idea of genetic continuity in the region (23).
Different studies have assessed the cultural differentiation in North Africa from a genetic point of view, using several types of markers, such as classical markers (13), mtDNA lineages (15), Y-chromosome haplogroups (16,18,59) and, more recently, genome-wide data (8,20) and whole genomes (23). These studies revealed a remarkable heterogeneity within North African populations as well as a lack of clear differentiation between the Arab and the Amazigh populations as a whole. Although some Amazigh groups differ from Arab populations, others share more genetic similarities with certain Arab groups than with other groups with whom they share a cultural identity. Nonetheless, some Tamazight-speaking populations are outliers and show sharp genetic differences from their neighboring Arabic-speaking or even Tamazight-speaking populations. This can be attributed to processes of isolation and genetic drift (15,16,60), as well as to asymmetrical sub-Saharan genetic influence (20). Defining cultural and genetic populations in North Africa is thus challenging. Differential contact with arriving populations, and acceptance of the newcomers’ cultures, have led to heterogeneous admixture and local isolation processes, creating a complex mosaic of genetically diverse populations with dissimilar roles of culture in different parts of the region.
North Africa as a Source of Gene Flow
North Africa, the destination of diverse demographic movements throughout history, has not only been a sink but also an important source of gene flow to its surrounding regions (Mediterranean Europe, the Canary Islands and some sub-Saharan populations) (Fig. 1). Historical and archeological evidence exists for North African influence over its neighboring regions, and recent genetic studies using both present-day and ancient samples corroborate this. Due to its proximity and relatively recent historical events, the Iberian Peninsula is one of the main recipients of North African gene flow (14,61) (Fig. 1). The Arab expansion brought mainly Amazigh people to the Peninsula, where they stayed for more than 700 years (62,63). Nonetheless, archeological and anthropological findings account for much older contacts between both shores of the western Mediterranean, pointing to the Neolithic or even the late Paleolithic (64,65) (Fig. 1). The presence of mtDNA (14,66,67) and Y-chromosome sequences (59,68) of North African origin in Iberia, as well as evidence of admixture revealed with genome-wide data (19,61), supports this trans-Mediterranean gene flow. Dates inferred with present-day samples place the Iberian admixture pulse at the time of the Arab conquest (19,61), which probably masks older events; however, ancient DNA studies provide genetic evidence for the previously reported prehistoric contacts between both coasts (36,69). Other regions of southern Europe, such as Italy and the south of France, were also destinations of North African gene flow (19,61), although to a lesser degree than the Iberian Peninsula. Dating of the North African components in such regions places the migration events at least 5–7 generations ago (61), but a recent study estimated that the admixture pulse in Italy was much older, suggesting movements from North Africa coinciding with the fall of the Roman Empire around the fourth century (61).
The Canary Islands, consistent with their closeness to the western African coast, also show traces of migrations from the continent (70,71) (Fig. 1), and strong evidence in current and ancient genomes corroborates the North African origin of their first settlers (19,32,35,61,72,73). Genome-wide data revealed that the geographical location of the source of admixture differs between the Canary Islands and southern Europe: the Atlantic coast for the former and the Mediterranean coast for the latter (61).
Southward gene flow from North Africa into sub-Saharan populations has been related to the spread of pastoralism (Fig. 1). Cattle domestication appeared in North Africa during the Neolithic (74–76), and contacts with southern populations introduced lactase persistence alleles of North African origin, which can be detected in some current-day sub-Saharan populations (4,77–79). Pastoralist migrations from North to East Africa date to around 4.5–3.3 ka (74,79), whereas contacts between North and West Africa seem younger (around 2 ka) and are also contemporary with the first traces of pastoralism in western Africa (77).
Concluding Remarks
One main challenge for studying the genetic landscape of North Africa is the insufficiency of available data. No data existed for individual complete genomes until recently, and there is still a lack of whole genomes at the population level. Ancient genome data are also limited despite recent efforts. There is an urgent need for genomic data in the region, not only to unravel the questions related to its population history, but also to understand the genetic variants and genomic regions involved in health and disease conditions. Given the extensive and bi-directional connections between North Africa and its surrounding regions, studying the genetic and genomic variation as well as disease risk patterns of its populations could also have an impact outside North African borders, in European, Middle Eastern and sub-Saharan populations.
Acknowledgements
The authors thank Gerard Serra-Vidal and Lara R. Arauna for helpful comments on the manuscript.
Conflict of Interest statement. None declared.
Funding
Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación (PID2019-106485GB-I00/AEI/10.13039/501100011033) and “Unidad de Excelencia María de Maeztu” (AEI – CEX2018-000792-M); and Agència de Gestió d’Ajuts Universitaris i de la Recerca (2017SGR00702).
References
Author notes
Marcel Lucas-Sánchez and Jose M. Serradell have contributed equally to this work.