Major genomic mitochondrial lineages delineate early human expansions
Published 13 Aug 2001
The phylogeographic structuring of human mitochondrial DNA variation has made possible a genetic approach to studying the dispersals of modern Homo sapiens throughout the world from a female perspective.
Complete mitochondrial DNA (mtDNA) sequences from 42 human lineages, representing major clades with known geographic assignment, have been analyzed phylogenetically. The relationships among them, together with more accurate temporal calibrations, give new perspectives on how modern humans spread in the Old World.
The first detectable expansion occurred around 59,000-69,000 years ago from Africa, independently colonizing western Asia and India and, following this southern route, swiftly reaching east Asia. Within Africa, this expansion did not replace but mixed with older lineages detectable today only in Africa. Around 39,000-52,000 years ago, the western Asian branch spread radially, bringing Caucasians to North Africa and Europe, also reaching India, and expanding to north and east Asia. More recent migrations have entangled but not completely erased these primitive footprints of modern human expansions.
Human mtDNA is a non-recombining molecule with maternal inheritance and practically haploid genetics. Differences between mtDNA sequences are only due to mutation. As time passes, mutations accumulate sequentially along less and less related molecules that constitute independent lineages known as haplotypes.
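Because differences between mtDNA sequences arise only by mutation, the divergence between two lineages can be summarized by counting the positions at which their aligned sequences differ. The following is a minimal sketch of that count and of the mean pairwise difference across a sample; the function names and the toy sequences are made up for illustration and are not data from the study.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def mean_pairwise_differences(sequences):
    """Average the difference count over every pair of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(count_differences(a, b) for a, b in pairs) / len(pairs)


# Toy aligned fragments (hypothetical, for illustration only).
toy_sequences = ["ACGTACGT", "ACGTACGA", "ACCTACGA"]
toy_mean = mean_pairwise_differences(toy_sequences)
```

Real analyses work on full alignments and must decide how to treat insertions, deletions and ambiguous bases, which this sketch ignores.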
Relationships among lineages can be estimated by phylogenetic networks where mutations are classified in hierarchical levels. Basal mutations are shared by clusters of lineages, defined as haplogroups, whereas those at the tips characterize individuals. Major haplogroups are specific to a continent or an ethnic group. Three of them (L1, L2, and L3) group sub-Saharan African lineages; nine (H, I, J, K, T, U, V, W and X) encompass almost all mtDNAs from European, North African and Western Asian Caucasians. Finally, haplogroups A, B, C, D, E, F, G and M embrace the majority of the lineages described for Asia, Oceania and native Americans.
The geographic distribution of derived branches of these haplogroups has shed light on crucial aspects of human history, such as the probable origin and approximate dating of migrations into the New World and Polynesia, or quantitative estimates of the relative Paleolithic and Neolithic contributions to the extant European mtDNA diversity. At the other end, the ultimate coalescence of all worldwide mtDNA lineages into Africa has favored, from the beginning, the recent African origin hypothesis for all modern humans.
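The broad haplogroup-to-region assignment described above can be written as a simple lookup. This is a minimal sketch: the region labels, the function names and the rule for reducing a sub-clade label to its major haplogroup are assumptions made for illustration, and the assignment is only approximate, since the article itself discusses exceptions such as M1 in Eastern Africa and U6 in Northwest Africa.

```python
import re

AFRICA = "sub-Saharan Africa"
WEST_EURASIA = "Europe, North Africa and Western Asia"
ASIA_PACIFIC = "Asia, Oceania and the Americas"

# Major haplogroups as listed in the text.
MAJOR_HAPLOGROUP_REGION = {
    **{h: AFRICA for h in ("L1", "L2", "L3")},
    **{h: WEST_EURASIA for h in "HIJKTUVWX"},
    **{h: ASIA_PACIFIC for h in "ABCDEFGM"},
}


def major_haplogroup(label):
    """Reduce a label such as 'U5' or 'L1a' to its leading haplogroup name.

    L haplogroups keep their first digit (L1, L2, L3); for other labels
    the leading run of capital letters is used, so 'HV' stays 'HV'.
    """
    match = re.match(r"(L[123]|[A-Z]+)", label.strip())
    return match.group(1) if match else None


def region_of(label):
    """Return the broad region for a label, or None if it is not listed."""
    return MAJOR_HAPLOGROUP_REGION.get(major_haplogroup(label))
```

Labels that the text does not list among the major haplogroups, such as N1b or HV, deliberately return None rather than being forced into a region.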
The analyses of the complete mtDNA sequence of 53 humans of diverse origins have added statistical support to this hypothesis. However, as the current definition of the major haplogroups is not based on total genomic sequences, there is not yet a clear resolution of their basal relationships. This genomic phylogenetic reconstruction is necessary to infer the early human dispersal routes after the African exodus. We present the phylogenetic network of 42 complete mtDNA sequences including representatives of the major haplogroups. Based on their relative clustering and coalescence ages we propose a tentative model of the way the Old World could have been colonized by modern humans.
Results and Discussion
The phylogenetic network of the 42 mtDNA sequences was free of reticulations when mutations 150, 152, 303i and 16519 were omitted in its construction. The tree topology was the same as that of the bootstrap-supported neighbor-joining tree. We detected 35 parallel substitutions from 124 variable positions (28%) in the non-coding region (1,122 bp in length), and 45 from 409 (11%) in the coding region (15,447 bp in length). Shared mutations in basal branches of the tree relate haplogroups; however, parallel mutations should be avoided when establishing their global affiliations.
As can be expected from haplotypes of well-differentiated haplogroups, the majority of mutations are in the external branches of the tree, including those that specifically define them. Nevertheless, it is well known that in population studies these main lineages sprout into several sub-clusters, sometimes with interesting geographic localization. In the cases where representatives of these sub-clusters have also been analyzed, it is evident that the African ones are at the same level of divergence as non-African clusters.
More information on cluster structure in Africa is necessary. In non-African groups, two haplotypes belonging to sub-haplogroup U2 have a divergence similar to that found between other sub-clusters of the Caucasian U haplogroup. One of them, lacking mutations 16129C and 15907, which are present in all western Eurasian representatives, resembles haplotypes found in India.
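The percentages quoted above follow directly from the reported counts. As a minimal sketch, the snippet below recomputes the share of variable positions that show parallel substitutions in each region, and also the density of variable positions per 100 bp, which the text does not report but which follows from the same figures. The dictionary layout and function name are illustrative assumptions.

```python
# Counts and lengths as reported in the text.
REGIONS = {
    "non-coding": {"parallel": 35, "variable": 124, "length_bp": 1122},
    "coding": {"parallel": 45, "variable": 409, "length_bp": 15447},
}


def summarize(region):
    """Return (percent of variable sites with parallel substitutions,
    variable sites per 100 bp) for one region record."""
    percent_parallel = 100.0 * region["parallel"] / region["variable"]
    variable_per_100bp = 100.0 * region["variable"] / region["length_bp"]
    return percent_parallel, variable_per_100bp


summary = {name: summarize(data) for name, data in REGIONS.items()}
```

On these counts, variable positions are both denser and more often affected by parallel substitution in the non-coding region than in the coding region.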
The proposed inclusion of haplogroup K in the U cluster is confirmed, with U7 as its most probable related sub-clade. The main Asian haplogroups belong to two different major clusters: whereas A and B root with Caucasoid haplogroups, C, D, G and M constitute a monophyletic cluster. Likewise, African haplogroup L3 is more closely related to Eurasian haplogroups than to the most divergent African clusters L1 and L2.
Chimpanzee rooting shows that the oldest lineage of extant modern humans is the African L1a cluster. In addition, the significant bootstrap values on the deep African branches reinforce the statistical support that the out of Africa hypothesis has obtained through a parallel genomic mtDNA study. We have estimated a minimum total coalescence for modern human lineages from 156,000 to 169,000 years before present (yr BP).
The two subsequent ancient splits also happened inside Africa, giving rise to the L1b/c and L2 haplogroups with ages of 122,000-132,000 yr BP and 85,000-95,000 yr BP respectively. These three clades still have an overwhelmingly sub-Saharan African distribution. The next branching, dated to 59,000-69,000 yr BP, also occurred in Africa, but it comprises both clades currently found only in this continent (L3) and others with a first expansion out of Africa.
Today, L3 derivatives are present in nearly all the African populations. This ancient spread inside Africa has been directly detected by the ages of several sub-clade expansions and indirectly confirmed by genetic admixture, involving archaic and modern autosomal gene alleles, detected only in Africa. The coexistence in African populations of very divergent non-recombining lineages may erroneously bias demographic estimations based on pair-wise nucleotide differences.
Two hypothetical routes for the Asian colonization have been proposed, one through Central Asia and one through South Asia. Consistent with this, we detect at least two independent lineages spreading out of Africa. One comprises all M derivatives that radiated 30,000-57,600 yr BP. Subsequent expansions of this clade have been found in India and Eastern Asia, where it possibly originated and expanded as haplogroups C, D, G and others.
The star-like radiation of these clades suggests that this wide geographic colonization could have happened in a relatively short time. Genetic support for this southern spread of M through Ethiopia and the Arabian Peninsula along South Asia was recently proposed on the basis of the presence of subclade M1 in Eastern Africa. However, a later return of these lineages from Asia to Africa is a more plausible explanation, because the genetic diversity of M is much greater in India than in Ethiopia.
In fact, M1 could be a branch of the Indian cluster M, as ancestral motifs of the African M1 are found in the M*, M3 and M4 Indian subclusters. Furthermore, one of the most derived M3 haplotypes in India (10398, 10400, 16086, 16129, 16223, 16249, 16259, 16311) has all the basic substitutions that define the Ethiopian clade, except the highly variable 16189. This supposed Indian expansion to the west also reached northern areas, since evolved representatives of M4 have also been detected in Central Asia. We may consider the upper bound for this return to Africa to be 25,000-47,000 yr BP, the age calculated for M1 in Eastern Africa based on HVSI sequences, or 33,000-63,000 yr BP, obtained using RFLPs.
The other major branch that left Africa gave rise mainly to Caucasoid lineages, which is congruent with a northern route through the Levant. With a lower bound of 43,000-53,000 yr BP, this branch spread into at least three main clusters. One comprises haplogroups X and A, with only one shared mutation between them and different geographic distributions. Whereas A is widespread in Asia, X is mainly restricted to Europe. Curiously, representatives of both clusters have been detected in native Americans, raising the possibility that some American Indians could have European ancestry.
Nevertheless, X haplotypes have recently been detected in Central Asia. These Asian X haplotypes lack the 225A mutation, as the majority of the American X, pointing to this area as the most probable source for the dispersal of the New World founders.
The second cluster groups the minor haplogroups W, I and N1b. All three are present, although at low frequencies, in Europe, the Near East and the Caucasus, but only I and N1b have also been detected in Egypt and Arabia. The last group radiated around 39,000-52,000 yr BP, giving at least four ancestral clusters. One of them gave rise to haplogroup B, which expanded to Eastern Asia, reaching Japan and the southeastern Pacific archipelagos.
In early studies, this clade was defined by the 9-bp COII-tRNALys deletion but after that it has been found with independent origins on other haplogroup backgrounds. In this study we have detected this deletion on an Iberian haplotype belonging to haplogroup I. Curiously, it was also found in an Italian haplotype I. However, the 9-bp deletion was absent in a wide screen that we carried out on Iberian and Northwest African I haplotypes.
The detection in two Mediterranean populations of I haplotypes harboring the 9-bp deletion points to the existence in this area of a subset of the I haplotypes that share a recent common ancestor. As happens with A, haplogroup B has not been found in northern India but is present in Mongolia, favoring a Central Asian route for the expansion of these prominent Asian haplogroups.
Two additional clades join haplogroups J and T and haplogroups H, V and HV respectively. Derivatives of at least some of them are found in Europe, North Africa, Central Asia and even India, but the most probable origin for all these expansions is the Near East-Caucasus area. Finally, cluster U seems to have undergone a radial spread, with subsequent diversification in different geographic areas. Three sub-haplogroups, U2, U5 and U6, had their major expansions in India, Europe and North Africa respectively. U2 split into two branches. One, characterized by mutations 16129C and 15907, is geographically scattered from Western Europe to Mongolia but has not been detected in North Africa.
The other reached India, where it gave rise to several sub-clusters with global frequencies around 10%, making it the second most abundant haplogroup in India after its predecessor, haplogroup M (53%). U7, which has a minor presence in Europe, is third in frequency in India and has also not been detected in North Africa, might have had an expansion similar to that of U2.
The main radiation of haplogroup U5 occurred in Europe. It has been stated that this lineage entered Europe during the Upper Paleolithic, most probably from the Middle East-Caucasus area. The great divergence found here for the two U5 representatives is in agreement with the old age proposed for this haplogroup. Finally, U6 traces the first detectable Paleolithic return to Africa of ancient Caucasoid lineages. It has been mostly found in Northwest Africa, with a global estimated age of 47,000 years, reflecting an old human continuity in that rather isolated area. The fact that in Europe it has only been detected in the Iberian Peninsula rules out a possible European route, unless a total lineage extinction along the whole path is invoked. On the other hand, its presence in Northeast Africa, albeit at low frequencies, supports a route through North Africa. A third possibility could be that this lineage never left Africa, but its coalescence with clades that all had prominent expansions in Eurasia weakens this option.
U3 has also been found at a comparatively higher frequency in Northwest Africa and might have followed the same route as U6. However, as its star-like expansion in the Caucasus has been dated to around 30,000 yr BP, it most probably reached Africa in a later expansion. This out of Africa and back again hypothesis has also been suggested for Y-chromosome lineages. Subsequent Neolithic and historic expansions have doubtlessly reshaped the human genetic pool in wide geographic areas but mainly as limited gene flow, not admixture, between populations. Consequently, the continental origin of the major haplogroups can still be detected and the earliest human routes inferred through them.
After leaving Africa, modern humans first spread to Asia following two main routes. The southern one is represented by haplogroup M and related clades that are overwhelmingly present in India and eastern Asia. The northern one gave a later radiation that, through Central Asia, again reached North and East Asia carrying, among others, the prominent lineages A and B. Later expansions can be detected by the presence of subclades of haplogroup U in India and Europe. There were also returns to Africa, most probably along the same two routes. The return from India can be detected by the presence of derivatives of M in Northeast Africa, and the arrival of Caucasoids by the existence of a subclade of haplogroup U that, today, is mainly confined to Northwest Africa.
Materials and Methods
We have manually sequenced 33 complete mtDNA genomes from available samples previously assigned to major haplogroups. To include lacking haplogroups we added 9 published sequences to the analyses.
Complete mtDNA sequences
Complete mtDNAs were amplified in 32 overlapping fragments with primers and PCR conditions described in Table 2. The same primers were used to directly sequence both strands of the fragments using the Promega fmol® DNA Cycle Sequencing System and the USB Thermo Sequenase Radiolabelled Terminator Cycle Sequencing Kits.
Sequences were aligned manually. Phylogenetic relationships were estimated using median-joining networks as implemented in Network 2.0d (http://www.fluxus-engineering.com) and refined by hand. The same topology was obtained using the neighbor-joining method. A chimpanzee sequence (GenBank accession n° D38113) was added to root the networks. Statistical significance of the branches was assessed by bootstrap resampling with 1000 replications (PHYLIP Package 3.5c, http://evolution.genetics.washington.edu/phylip.html). Minimum estimates of coalescence ages, and 95% confidence intervals, were based on mean divergence among lineages for the coding region and a constant evolutionary rate of 1.7 × 10⁻⁸ per site per year that has been inferred for this region on the basis of 53 complete mtDNA sequences.
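The age calculation can be illustrated with a short sketch. It assumes that an age is obtained by dividing the mean number of coding-region substitutions separating a clade's lineages from their common ancestor by the number of substitutions expected per year over the whole coding region. The article gives the rate and the region length but not the exact formula, so the estimator, the function names and the example input are assumptions, and the confidence intervals are not reproduced.

```python
CODING_REGION_BP = 15_447        # coding-region length given in the text
RATE_PER_SITE_PER_YEAR = 1.7e-8  # constant rate given in the text


def years_per_substitution(length_bp=CODING_REGION_BP,
                           rate=RATE_PER_SITE_PER_YEAR):
    """Expected waiting time, in years, for one substitution in the region."""
    return 1.0 / (length_bp * rate)


def coalescence_age(mean_substitutions,
                    length_bp=CODING_REGION_BP,
                    rate=RATE_PER_SITE_PER_YEAR):
    """Convert a mean substitution count to an age in years (assumed estimator)."""
    if mean_substitutions < 0:
        raise ValueError("mean_substitutions must be non-negative")
    return mean_substitutions * years_per_substitution(length_bp, rate)


# Hypothetical input, not a value from the study.
example_age = coalescence_age(10.0)
```

Under these assumptions one coding-region substitution corresponds to roughly 3,800 years, so the result is sensitive to both the assumed rate and the way mean divergence is measured.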
This article is available from: http://www.biomedcentral.com/1471-2156/2/13
© 2001 Maca-Meyer et al; licensee BioMed Central Ltd. Verbatim copying and redistribution of this article are permitted in any medium for any non-commercial purpose, provided this notice is preserved along with the article's original URL. For commercial use, contact email@example.com