Saturday, March 8, 2014
I'll expand on this entry later today. For now, the spreadsheet below should make sense to a lot of people. Please note the close correlation between the ANE levels among the Karitiana Indians here (around 43%) and the Karitiana genome featured in the TreeMix analysis in Raghavan et al. 2013 (41.6%, plus or minus 3.4%). That's pretty damn close. Nevertheless, I think there is an issue with this ANE component, which is that it contains South Asian admixture, but more on that soon.
ANE K=4 ADMIXTURE Test spreadsheet
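As a quick sanity check on the comparison above, the gap between the two Karitiana figures can be set against the quoted uncertainty. This is only an arithmetic sketch: the function name is made up for illustration, and it treats the "plus or minus 3.4%" as a simple symmetric range, which is an assumption about how that figure should be read.

```python
def gap_in_error_units(estimate, reference, plus_minus):
    """Absolute gap between two estimates, divided by the quoted +/- value."""
    return abs(estimate - reference) / plus_minus


# Figures quoted in the entry above (percent ANE among the Karitiana).
admixture_estimate = 43.0   # "around 43%" from the K=4 ADMIXTURE test
treemix_estimate = 41.6     # TreeMix figure from Raghavan et al. 2013
treemix_plus_minus = 3.4    # quoted uncertainty, assumed symmetric

gap = gap_in_error_units(admixture_estimate, treemix_estimate, treemix_plus_minus)
inside = gap <= 1.0  # True when the estimate falls inside the quoted range
print(f"gap = {gap:.2f} x the quoted uncertainty, inside range: {inside}")
```

A gap well below 1 means the ADMIXTURE figure sits comfortably inside the range reported for the TreeMix estimate.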
Thursday, February 27, 2014
Human Biology recently posted several open access manuscripts dealing with the topic of Jewish origins (see submissions from 2013 here). One of these preprints is essentially a rebuttal to an Eran Elhaik paper from a couple of years ago, which argued that a substantial part of Ashkenazi Jewish ancestry was derived from within the Khazar Empire. The leading author of the new preprint is Doron M. Behar, but thirty people in all, many of them well known scientists, have put their names on it. Here's the abstract:
The origin and history of the Ashkenazi Jewish population have long been of great interest, and advances in high-throughput genetic analysis have recently provided a new approach for investigating these topics. We and others have argued on the basis of genome-wide data that the Ashkenazi Jewish population derives its ancestry from a combination of sources tracing to both Europe and the Middle East. It has been claimed, however, through a reanalysis of some of our data, that a large part of the ancestry of the Ashkenazi population originates with the Khazars, a Turkic-speaking group that lived to the north of the Caucasus region ~1,000 years ago. Because the Khazar population has left no obvious modern descendants that could enable a clear test for a contribution to Ashkenazi Jewish ancestry, the Khazar hypothesis has been difficult to examine using genetics. Furthermore, because only limited genetic data have been available from the Caucasus region, and because these data have been concentrated in populations that are genetically close to populations from the Middle East, the attribution of any signal of Ashkenazi-Caucasus genetic similarity to Khazar ancestry rather than shared ancestral Middle Eastern ancestry has been problematic. Here, through integration of genotypes on newly collected samples with data from several of our past studies, we have assembled the largest data set available to date for assessment of Ashkenazi Jewish genetic origins. This data set contains genome-wide single-nucleotide polymorphisms in 1,774 samples from 106 Jewish and non-Jewish populations that span the possible regions of potential Ashkenazi ancestry: Europe, the Middle East, and the region historically associated with the Khazar Khaganate. The data set includes 261 samples from 15 populations from the Caucasus region and the region directly to its north, samples that have not previously been included alongside Ashkenazi Jewish samples in genomic studies.
Employing a variety of standard techniques for the analysis of population-genetic structure, we find that Ashkenazi Jews share the greatest genetic ancestry with other Jewish populations, and among non-Jewish populations, with groups from Europe and the Middle East. No particular similarity of Ashkenazi Jews with populations from the Caucasus is evident, particularly with the populations that most closely represent the Khazar region. Thus, analysis of Ashkenazi Jews together with a large sample from the region of the Khazar Khaganate corroborates the earlier results that Ashkenazi Jews derive their ancestry primarily from populations of the Middle East and Europe, that they possess considerable shared ancestry with other Jewish populations, and that there is no indication of a significant genetic contribution either from within or from north of the Caucasus region.
I'm really not sure what to make of all the attention that the Khazar hypothesis is still getting. It's been obvious for a while now that in terms of genetic structure Ashkenazi Jews are basically a group of East Mediterranean origin. But Elhaik's paper did get a fair bit of media coverage, so I suppose after that a rebuttal was to be expected.
In any case, I'm not complaining. This paper includes a very interesting genotype dataset of many previously unpublished samples, which I tested last week with PCA (see here).
Behar, Doron M.; Metspalu, Mait; Baran, Yael; Kopelman, Naama M.; Yunusbayev, Bayazit; Gladstein, Ariella; Tzur, Shay; Sahakyan, Hovhannes; Bahmanimehr, Ardeshir; Yepiskoposyan, Levon; Tambets, Kristiina; Khusnutdinova, Elza K.; Kusniarevich, Aljona; Balanovsky, Oleg; Balanovsky, Elena; Kovacevic, Lejla; Marjanovic, Damir; Mihailov, Evelin; Kouvatsi, Anastasia; Triantaphyllidis, Costas; King, Roy J.; Semino, Ornella; Torroni, Antonio; Hammer, Michael F.; Metspalu, Ene; Skorecki, Karl; Rosset, Saharon; Halperin, Eran; Villems, Richard; and Rosenberg, Noah A., No Evidence from Genome-Wide Data of a Khazar Origin for the Ashkenazi Jews (2013). Human Biology Open Access Pre-Prints. Paper 41.
Elhaik E. The missing link of Jewish European Ancestry: contrasting the Rhineland and Khazarian hypotheses. Genome Biol Evol. 2012. doi:10.1093/gbe/evs119, Advance Access publication December 14, 2012.
Near Eastern origin of Ashkenazi Levite R1a
Tuesday, February 18, 2014
I'm just testing and exploring these genomes at the moment using Principal Component Analysis (PCA) and the ADMIXTURE software.
The first PCA below features MA-1 or Mal'ta boy, a child from an Upper Paleolithic hunter-gatherer burial site near Lake Baikal, South Siberia. He appears closest to Central Asians, but in fact falls within a cline that runs from Europe to the Americas. Hopefully that's not too confusing, because to me it actually makes good sense based on the formally published analyses of his genome to date.
But this doesn't necessarily mean that MA-1 is of mixed origin, because today’s Europeans, Central Asians and Native Americans didn't come into existence until well after the Upper Paleolithic. Rather, what it suggests is that MA-1's close relatives mixed with the ancestors of many present-day Eurasians and Amerindians.
The other results also turned out as expected, so the ancient individuals should be easy to spot for those of you familiar with them (click on the "world PCA" links below to download the PCA plots in PDF format). By the way, to maximise the number of samples and high-quality SNPs in each analysis, I didn’t prune the markers to correct for the effects of Linkage Disequilibrium. I plan to do that in future analyses.
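The LD pruning mentioned above usually means dropping one SNP from each pair of nearby markers that are strongly correlated. Here's a minimal sketch of that idea, assuming genotypes are coded as 0/1/2 allele counts with no missing calls. The function names, the window of 50 SNPs and the r-squared threshold of 0.5 are illustrative choices only, and this greedy pass is not the algorithm used by PLINK or any other specific tool.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two SNPs across the same samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def ld_prune(snps, window=50, threshold=0.5):
    """Greedy pruning: keep a SNP only if it is not in high LD with any
    already-kept SNP that lies within `window` positions of it.

    snps: list of SNPs in genomic order, each a list of 0/1/2 counts per sample.
    Returns the indices of the SNPs that survive.
    """
    kept = []
    for i, snp in enumerate(snps):
        nearby = [j for j in kept if i - j <= window]
        if all(r_squared(snp, snps[j]) < threshold for j in nearby):
            kept.append(i)
    return kept


# Toy data: the second SNP duplicates the first, so it gets pruned.
toy = [[0, 1, 2, 1], [0, 1, 2, 1], [2, 0, 1, 1]]
print(ld_prune(toy))
```

The trade-off is the one described above: pruning removes redundant markers, but with only 13K to 106K usable SNPs per ancient genome it also eats into an already small marker set.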
MA-1, 24,000 YBP, South Siberia, world PCA, 13K SNPs
Anzick-1, 12,000 YBP, North America, world PCA, 106K SNPs
La Brana-1, 7,000 YBP, Iberia, world PCA, 23K SNPs
Saqqaq, 4,000 YBP, West Greenland, world PCA, 68K SNPs
Aus_Aboriginal, 100 YBP, Western Australia, world PCA, 46K SNPs
To further investigate the genetic affinities of the two European-like ancient individuals, MA-1 and La Brana-1, I ran them against more limited reference sets.
Below is a PCA of West Eurasia featuring MA-1, using just over 15K SNPs. In the first eigenvector, the ancient Siberian is very close to present-day Northeast Europeans, and most distant from Middle Eastern groups, like the Bedouin. In eigenvector 2 he seems to be allergic to everyone, particularly Sardinians, northwest Africans and Basques. The result actually looks very similar to the one from Lazaridis et al., where MA-1 was projected onto the plot (see here). Also, please note, I rotated and stretched out the canvas horizontally to fit geography.
On a PCA of North Eurasia (also using 15K SNPs), MA-1 does not cluster with present-day Siberians. This suggests his relatives were absorbed and/or replaced by genetically more easterly groups sometime after the Upper Paleolithic.
In comparison, La Brana-1 appears significantly more western than MA-1. This of course makes sense for a Mesolithic hunter-gatherer from Iberia, and again correlates well with the result from the Lazaridis et al. preprint. Nevertheless, much like MA-1, he's also out of range of modern West Eurasian genetic variation. Both of these PCA were done with just over 26K SNPs.
Indeed, when MA-1 and La Brana-1 are featured together on the same West and North Eurasian plots (using 10K SNPs), they create what appears to be an ancient Eurasian hunter-gatherer cluster, despite their very different origins in space and time. If this outcome isn’t an artefact of the relatively low quality of the data, then I’d say it might have something to do with the suspected high mobility of forager groups on the expansive Eurasian Mammoth steppe.
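For anyone curious about the mechanics, here's a rough sketch of what placing an ancient sample on a PCA of present-day samples involves: compute an axis from the reference genotypes, then project the ancient genotypes onto it, skipping sites with no call. It only finds the first eigenvector, uses plain power iteration, and all names are illustrative. It isn't the code behind the plots above, and dedicated tools handle missing data and normalisation far more carefully.

```python
import math


def first_component(reference, iterations=100):
    """First principal axis of a samples-by-SNPs matrix of 0/1/2 genotypes.

    Returns (per-SNP means, unit-length axis). Uses power iteration on X^T X,
    where X is the mean-centred genotype matrix.
    """
    n, m = len(reference), len(reference[0])
    means = [sum(row[j] for row in reference) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in reference]
    axis = [1.0 / (j + 1) for j in range(m)]  # arbitrary deterministic start
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, axis)) for row in centred]
        axis = [sum(centred[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(w * w for w in axis))
        if norm == 0:
            raise ValueError("no variance along the starting direction")
        axis = [w / norm for w in axis]
    return means, axis


def project(sample, means, axis):
    """Coordinate of one sample on the axis; None marks a missing genotype."""
    return sum((g - mu) * w for g, mu, w in zip(sample, means, axis) if g is not None)


# Toy reference panel with two clearly separated groups.
panel = [[0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0], [2, 2, 0, 0]]
means, axis = first_component(panel)
ancient = [0, None, 2, None]  # low-coverage sample, half the sites missing
print(project(ancient, means, axis))
```

Note that skipping missing sites pulls the projected sample towards the centre of the plot, which is one reason why low-coverage genomes can look less extreme than they really are.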
I also ran two different ADMIXTURE analyses. The results can be seen in the spreadsheets here and here. Both are supervised tests, but the former attempts to fit the ancient genomes to twelve present-day population sets, while the latter to thirteen so called ancestral components inferred from a wide range of present-day samples.
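To give a feel for what a supervised test does, here's a minimal sketch that fits one individual as a mixture of reference populations whose allele frequencies are held fixed. It uses a simple EM update as a stand-in. This is not the optimiser inside the ADMIXTURE software, and the function name, iteration count and clamping value are assumptions made for the example.

```python
def supervised_ancestry(genotypes, ref_freqs, iterations=200, eps=1e-6):
    """Estimate ancestry proportions for one individual by EM.

    genotypes: allele counts (0, 1 or 2) per SNP, or None where missing.
    ref_freqs: one list of allele frequencies per reference population,
               kept fixed throughout (this is what makes the fit supervised).
    Returns one proportion per reference population, summing to 1.
    """
    k = len(ref_freqs)
    # Clamp frequencies away from 0 and 1 so no likelihood term is exactly zero.
    freqs = [[min(max(p, eps), 1.0 - eps) for p in pop] for pop in ref_freqs]
    used = [j for j, g in enumerate(genotypes) if g is not None]
    if not used:
        raise ValueError("no called genotypes to fit")
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(iterations):
        totals = [0.0] * k
        for j in used:
            g = genotypes[j]
            # Mixture frequencies of the counted and the other allele.
            freq_1 = sum(q[m] * freqs[m][j] for m in range(k))
            freq_0 = sum(q[m] * (1.0 - freqs[m][j]) for m in range(k))
            for m in range(k):
                # Expected number of this individual's alleles drawn from pop m.
                totals[m] += g * q[m] * freqs[m][j] / freq_1
                totals[m] += (2 - g) * q[m] * (1.0 - freqs[m][j]) / freq_0
        q = [t / (2.0 * len(used)) for t in totals]
    return q


# Toy example: the genotypes line up with population A at every SNP.
pop_a = [0.9, 0.1, 0.8, 0.2]
pop_b = [0.1, 0.9, 0.2, 0.8]
print(supervised_ancestry([2, 0, 2, 0], [pop_a, pop_b]))
```

With only a handful of overlapping markers the likelihood surface is flat, so a fit like this can easily jump to 100% of whichever reference happens to match best.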
Interestingly, in the first test La Brana-1 is fitted as 100% Lithuanian, while MA-1 appears to be a mixture between Orcadians and Amerindians. Perhaps this is because Orcadians, like other Northwest Europeans, often show pronounced genetic affinities to West and Central Asia that are less evident, or even lacking, among populations of the East Baltic region? However, when I remove the Orcadians from the analysis, the Lithuanians very capably take their place (see here), so perhaps this isn’t something worth mulling over too much?
All of the other K=12 results appear fairly straightforward and easy to interpret, which is something that unfortunately can’t be said for any of the K=13 outcomes.
The problem with the K=13 is that its markers overlap very poorly with those of the ancient genomes, which basically creates uncertainty and thus high levels of noise. Nevertheless, if the results aren’t taken too literally and interpreted in the right context they do make sense. Indeed, here's a PCA of selected European groups, as well as MA-1 and La Brana-1, based on population averages from the K=13. I think it looks very reasonable.
Many more analyses featuring ancient genomes are on the way at Eurogenes, so please stay tuned.
Raghavan et al., Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans, Nature, (2013), Published online 20 November 2013, doi:10.1038/nature12736
Rasmussen et al., The genome of a Late Pleistocene human from a Clovis burial site in western Montana, Nature, (2014), Published online 12 February 2014, doi:10.1038/nature13025
Ancient North Eurasian (ANE) levels across Asia
Friday, February 14, 2014
This paper by Hellenthal et al. is a valiant attempt to map out the history of human admixture with modern DNA. Hopefully it's the last.
I don't want to sound too harsh, because it's a fascinating read, and the companion website is a lot of fun, but studies like this really need ancient genomes nowadays to look convincing. Let's hope these guys can find the resources to repeat this effort using a wide range of carefully chosen ancient remains.
Indeed, here are a couple of examples of the sort of stuff that makes me sceptical about the accuracy of the methods employed by the authors. First of all, unless I'm reading this map wrong, what I'm seeing here is a 23% contribution from the Hadza of East Africa to the Lithuanians. Apparently, this supposed admixture event happened during the early Middle Ages. Is this a typo or what?
Secondly, what's with the difference between the Orcadians and Norwegians? How can the Orcadians be unmixed if a large part of their recent ancestry derives from Norse settlers (which it certainly does)? So, did all of this admixture in Norway take place after the pure Norwegians settled the Orkneys? It's possible, but hardly plausible.
Hellenthal et al., A Genetic Atlas of Human Admixture History, Science 343, 747 (2014), DOI: 10.1126/science.1243518
Tuesday, February 4, 2014
A paper at PLoS ONE describes the discovery of a new, but possibly now extinct, mtDNA C1 lineage from the Yuzhnyy Oleni Ostrov Mesolithic archaeological site in far Northwestern Russia. The enigmatic lineage, classified as C1f, is the bridge between the various C1 subclades carried by modern-day Native Americans, Siberians and Icelanders.
The human mitochondrial haplogroup C1 has a broad global distribution but is extremely rare in Europe today. Recent ancient DNA evidence has demonstrated its presence in European Mesolithic individuals. Three individuals from the 7,500 year old Mesolithic site of Yuzhnyy Oleni Ostrov, Western Russia, could be assigned to haplogroup C1 based on mitochondrial hypervariable region I sequences. However, hypervariable region I data alone could not provide enough resolution to establish the phylogenetic relationship of these Mesolithic haplotypes with haplogroup C1 mitochondrial DNA sequences found today in populations of Europe, Asia and the Americas. In order to obtain high-resolution data and shed light on the origin of this European Mesolithic C1 haplotype, we target-enriched and sequenced the complete mitochondrial genome of one Yuzhnyy Oleni Ostrov C1 individual. The updated phylogeny of C1 haplogroups indicated that the Yuzhnyy Oleni Ostrov haplotype represents a new distinct clade, provisionally coined “C1f”. We show that all three C1 carriers of Yuzhnyy Oleni Ostrov belong to this clade. No haplotype closely related to the C1f sequence could be found in the large current database of ancient and present-day mitochondrial genomes. Hence, we have discovered past human mitochondrial diversity that has not been observed in modern-day populations so far. The lack of positive matches in modern populations may be explained by under-sampling of rare modern C1 carriers or by demographic processes, population extinction or replacement, that may have impacted on populations of Northeast Europe since prehistoric times.
Der Sarkissian C, Brotherton P, Balanovsky O, Templeton JEL, Llamas B, et al. (2014) Mitochondrial Genome Sequencing in Mesolithic North East Europe Unearths a New Sub-Clade within the Broadly Distributed Human Haplogroup C1. PLoS ONE 9(2): e87612. doi:10.1371/journal.pone.0087612
Post-Mesolithic population replacements/extinctions in Northeastern Europe
Monday, January 27, 2014
Nature today published the eagerly awaited paper on the complete genome of La Brana 1: Olalde et al. 2014. The relatively high quality genome (almost 4x read depth) suggests that the Iberian hunter-gatherer had blue eyes, dark hair and deep brown skin.
Moreover, he was probably lactose intolerant (in other words, unlike most Europeans today, he couldn't drink milk as an adult), and his Y-chromosome belonged to the European-specific, but extremely rare, haplogroup C6 (aka. C-V20), and mtDNA to haplogroup U5b2c1, which again is a European-specific marker. Below is an artist's impression of his mug (courtesy of CSIC), and below that the paper abstract.
Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe1, 2, 3. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet4. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer.
Indeed, the pigmentation traits are basically the same as those of Loschbour, a Mesolithic genome from Luxembourg, featured recently in the groundbreaking Lazaridis et al. preprint (see here). So we can already speculate with some confidence that this was a common, and perhaps dominant, trait combination among European hunter-gatherers.
However, early European farmers, whose ancestors almost certainly migrated to Europe from the Near East during the Neolithic, probably had somewhat different pigmentation traits. We know this because a 7,500 year-old Linearbandkeramik (LBK) farmer genome from Stuttgart, Germany, also featured in Lazaridis et al., showed markers for brown eyes, dark hair, and relatively light skin.
So as things stand, it appears that Europeans only acquired their present coloring, including pale skin and a high incidence of light eyes, relatively recently, well after the hunter-gatherers and farmers began mixing, and their hybrid DNA had time to go through some really powerful selective sweeps. These sweeps were possibly in part a reaction to the Neolithic diet, rich in carbohydrates but poor in vitamin D, amongst other things. Vitamin D doesn't have to be acquired from food because the body can synthesize it from the sun, but this is done more effectively by people with fair skin, giving them an advantage, especially in places like Europe, which has fairly long winters and lots of cloud cover.
But perhaps this isn't the full story, and present-day European pigmentation traits are also sourced from a late migration into Europe of a prevailingly blond people from somewhere in what is now Russia?
This might sound far-fetched, but during the middle Bronze Age the Eurasian steppe was home to the Andronovo culture, with archeological links to earlier cultures in what is now southern Russia. Based on the DNA of Andronovo nomads from Kurgans in South Siberia, it seems they had fair skin and a lot of blue eyes and blond hair (see here). They also overwhelmingly belonged to Y-chromosome haplogroup R1a1a, which is very common today in Central and Eastern Europe and also parts of Scandinavia. So it'll be interesting to see the pigmentation markers of Mesolithic Eastern Europeans and Central Asians when their genomes become available, probably in the not too distant future, and whether they contributed any ancestry to present-day Europeans. Early indications are that they did, and I discussed that in my previous blog entry here.
Now, La Brana 1 and Loschbour were both classified as part of the West European Hunter-Gatherer (WHG) meta-population by Lazaridis et al., even though only a partial sequence from La Brana 1 was available at the time. As far as I can see, the results in Olalde et al. based on the complete genome don't contradict this classification, because they show that La Brana 1 is most similar to present-day Europeans from around the Baltic Sea, just like Loschbour. Note, for instance, the position of Swedes (SE) and Poles (PL) on the far right of these graphs, indicating inflated allele sharing between them and La Brana 1 relative to other Europeans.
But the paper also underlines the very close genetic relationship between La Brana 1 and the 24,000 year-old genome from Siberia known as MA-1, which was used as the proxy for the Ancient North Eurasian (ANE) ancestral component in Lazaridis et al. The authors' comments remind me of an earlier study based on ancient mtDNA data which argued that the region from Iberia to Central Siberia was home to a relatively homogeneous gene pool during the Mesolithic, with high frequencies of mtDNA haplogroups U2, U4 and U5 (see here). Please note, MA-1 belonged to Y-haplogroup R*, but carried mtDNA haplogroup U*.
Outgroup f3 and D statistics (16,17), using different modern reference populations, support that Mal’ta is significantly closer to La Brana 1 than to Asians or modern Europeans (Extended Data Fig. 5 and Supplementary Information). These results suggest that despite the vast geographical distance and temporal span, La Brana 1 and Mal’ta share common genetic ancestry, indicating a genetic continuity in ancient western and central Eurasia. This observation matches findings of similar cultural artefacts across time and space in Upper Paleolithic western Eurasia and Siberia, particularly the presence of anthropomorphic ‘Venus’ figurines that have been recovered from several sites in Europe and Russia, including the Mal’ta site.
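For readers unfamiliar with D statistics like those mentioned in the quote, here's a minimal sketch of the usual allele-frequency form: the sum of (p1 - p2)(p3 - p4) across SNPs, divided by a normalising term that keeps the result between -1 and 1. The function and argument names are placeholders, sign and ordering conventions differ between papers, and this isn't a reconstruction of the exact tests run by Olalde et al.

```python
def d_statistic(p1, p2, p3, p4):
    """D statistic from per-SNP allele frequencies in four populations.

    A value near zero means p1 and p2 are symmetrically related to p3 and p4.
    Which sign means what depends on the ordering convention being used.
    """
    numerator = sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4))
    denominator = sum(
        (a + b - 2 * a * b) * (c + d - 2 * c * d)
        for a, b, c, d in zip(p1, p2, p3, p4)
    )
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Toy frequencies at two SNPs; real tests use hundreds of thousands of sites.
print(d_statistic([0.1, 0.2], [0.3, 0.1], [0.5, 0.5], [0.4, 0.7]))
```

In practice the statistic only means something alongside a standard error, usually reported as a Z-score.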
Unfortunately, I have to say that the main Principal Component Analysis (PCA) from the paper isn't as informative as it could have been, due to the large number of Finnish individuals included in the analysis. It's largely a reflection of the recent population growth, founder effect and genetic drift among Finns, particularly those from eastern Finland. This is exactly the same problem that affected the PCA in Sánchez-Quinto et al. 2012, the paper which reported on the partial genomic sequences of La Brana 1 and 2 (see here).
Nevertheless, note that all of the non-Finnish Europeans more or less fall along the cline that runs from La Brana 1 to present-day Cypriots. This suggests that Europeans today are mostly the product of mixture, in varying degrees, between indigenous European hunter-gatherers, like La Brana 1 and Loschbour, and immigrant Neolithic farmers from the East Mediterranean. So it's a result that basically agrees with the findings of Lazaridis et al.
Interestingly, Loschbour and four other Mesolithic samples from Lazaridis et al. belonged to Y-chromosome haplogroup I, which is not at all closely related to C6. This hints at the presence of a diverse Y-chromosome gene pool in pre-Neolithic Europe, and indeed I'm still confident of seeing R1 and/or R1a among Mesolithic remains from Eastern Europe.
Even though the vast majority of haplogroup C clades are today specific to Eastern Asia, Oceania and the Americas, C6 has only been found among a handful of individuals from across Southern, Western and Central Europe, many of whom are listed at the FTDNA haplogroup C project (look for the V20+ results here). It's difficult to say when this marker or its ancestral lineage migrated to Europe, but C is one of the most basal human Y-chromosome clades, so it could represent the very first Anatomically Modern Human (AMH) wave into Europe, which actually isn't a new concept (see Scozzari et al. 2012).
The Olalde et al. paper includes a lot more information than I'm willing to cover in this blog entry. If you don't have access to the main report, please note that the extended and supplementary data are very detailed and open access.
Olalde et al., Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European, Nature (2014), doi:10.1038/nature12960
Wednesday, January 8, 2014
I've now had a chance to look over the Lazaridis et al. preprint a few times, and also take part in several online discussions about the results, at these blogs and elsewhere. So I thought it might be useful to put together another post on the paper to report what I've learned and reiterate a few points. First of all, to understand the results, it's really important to know what the four main ancestral components in this study represent:
- West European Hunter-Gatherer (WHG), based on an 8,000 year-old genome from Loschbour, Luxembourg
- Ancient North Eurasian (ANE), based on a 24,000 year-old genome from South Siberia (dubbed Mal'ta boy or MA-1)
- Early European Farmer (EEF), based on a 7,500 year-old genome from Stuttgart, Germany, belonging to the Neolithic Linearbandkeramik (LBK) culture
- Eastern non-African (ENA), this basically means East Eurasian, and is based on samples of present-day Onge, Han Chinese and Atayal from Taiwan
Now, from what I've seen online, many people seem to think that ANE is more East Asian than European, and can be considered a signal of pretty much any population expansion from the east into Europe. This is not true. ANE is Amerindian-like, but actually also very similar to WHG. In fact, they're equidistant from ENA:
The results of Table S12.1 provide suggestive evidence that Onge share more common ancestry with hunter-gatherers than with Stuttgart. All statistics involving two hunter-gatherer populations have |Z|<0.9, so ancient Eurasian hunter-gatherers are approximately symmetrically related to Onge, and they are all more closely related to them than is Stuttgart.
We next consider the relationship of ancient samples to East Asia using the set (Ami, Atayal, Han, Naxi, She). East Asians are more closely related to all hunter-gatherers than to Stuttgart, but there are no significant differences between hunter-gatherers (all such statistics have |Z|<1.1) (Table S12.2).
We have conveniently labeled MA1-related ancestry “Ancient North Eurasian” because of the provenance of MA1 in Siberia, but at present we cannot be sure whether this type of ancestry originated there or was a recent migrant from some western region.
The various Uralic, Turkic and Mongolian groups expanding into Europe, usually after the Bronze Age, no doubt carried significant ENA, so these groups can't be the source of the fairly high levels of ANE across Europe today, because most Europeans lack ENA. Below is a graph based on two f4 tests, comparing ANE and ENA ancestry among Europeans, this time with the Han Chinese as ENA proxies. Note that most of the samples fall within a cline that runs from the Stuttgart sample to Estonians. The only outliers in the direction of the Han are groups from current or former Uralic and Turkic speaking areas of Europe.
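Since f4 tests and Z-scores come up repeatedly here, this is a minimal sketch of how such a statistic and its significance can be computed from allele frequencies. The f4 value is just the average of (a - b)(c - d) across SNPs, and the Z-score comes from a delete-one-block jackknife. The function names, the number of blocks and the use of equal-sized blocks of consecutive SNPs are simplifying assumptions. Published analyses use a weighted jackknife over genomic blocks, so treat this as an illustration rather than the method behind the graph.

```python
import math


def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b) * (c - d) over SNPs."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)


def f4_z_score(a, b, c, d, n_blocks=10):
    """Z-score for f4 from a simple delete-one-block jackknife."""
    n = len(a)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least 2 blocks and one SNP per block")
    bounds = [round(i * n / n_blocks) for i in range(n_blocks + 1)]
    leave_one_out = []
    for start, stop in zip(bounds, bounds[1:]):
        keep = [j for j in range(n) if not start <= j < stop]
        subsets = [[pop[j] for j in keep] for pop in (a, b, c, d)]
        leave_one_out.append(f4(*subsets))
    mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - mean) ** 2 for x in leave_one_out)
    if variance == 0:
        return float("nan")  # no spread between blocks, Z is undefined
    return f4(a, b, c, d) / math.sqrt(variance)


# Toy frequencies at six SNPs, split into three blocks of two.
pop_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
pop_b = [0.3, 0.1, 0.5, 0.2, 0.7, 0.4]
pop_c = [0.5, 0.5, 0.6, 0.6, 0.7, 0.7]
pop_d = [0.4, 0.7, 0.5, 0.8, 0.6, 0.9]
print(f4(pop_a, pop_b, pop_c, pop_d), f4_z_score(pop_a, pop_b, pop_c, pop_d, n_blocks=3))
```

An absolute Z-score below about 1, like the values quoted from the preprint above, means the statistic can't be told apart from zero, while values above 3 are usually treated as significant.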
ANE was actually present in Scandinavia during the Mesolithic, because Motala12, the 8,000 year-old hunter-gatherer genome from Sweden, has an ANE ratio of 19%. But this isn't enough to explain the ANE levels carried by most present-day Europeans, so it's very likely there were at least two expansions of ANE into Europe.
Considering that Loschbour and Stuttgart totally lack ANE, it's plausible that a major wave of ANE moved across much of Europe sometime after the early Neolithic, but obviously before the Uralic and Turkic expansions, which, as per above, were rich in ENA. Based on recently published ancient mtDNA evidence from Central Europe (see here), Lazaridis et al. propose that this timeframe was the Copper Age, which is also often called the Late Neolithic/Early Bronze Age.
This of course is the generally accepted Proto-Indo-European timeframe. Indeed, the theory I put forward in the previous blog entry (see here) that most of the ANE in Europe today was the result of the Proto-Indo-European expansion, probably from Eastern Europe, looks even better on closer inspection.
Note the elongated cline formed by the European samples running from WHG to EEF on Fig 2B, shown below. It correlates well with latitude, and very likely reflects northward migrations of Neolithic farmers into Europe from the Mediterranean Basin, followed by isolation-by-distance. In other words, this cline probably took thousands of years to form.
On the other hand, there is no cline running from WHG/EEF to ANE, but all of the Indo-European and/or Eastern European samples are fairly evenly lifted up towards ANE relative to a few outliers. These outliers are all southwestern Europeans: Basques, Pais Vasco (Basque Country) Spaniards, southern French and Sardinians.
Of course, southwestern Europe is the most distant part of the continent from the generally accepted Indo-European homeland near the middle Volga. Moreover, Basques don't speak an Indo-European language, while Sardinians were only Indo-Europeanized during historic times.
Indeed, even though a couple of tables in the study report considerable ANE ancestry among Basques and Pais Vasco Spaniards, the authors admit that this need not be the case. For instance:
We next attempted to fit individual West Eurasian populations as a mixture of Loschbour and Stuttgart, as representatives of Early European farmers and West European Hunter Gatherers.
Fig. 1B suggests that this is not possible, as most Europeans form a cline that cannot be reconciled with such a mixture [Davidski's note: I think they actually mean Fig. 2B]. Nonetheless, for Sardinians (Extended Data Table 1), the most negative f3-statistic is of the form f3(Test; Loschbour, Stuttgart), which suggests that at least some Europeans may be consistent with having been formed by such a mixture. We thus fit each European population into the topology of Fig. S12.6. Only Basques, Pais_Vasco, and Sardinians, can be fit successfully with this model. Fig. S12.8 shows a successful fit.
Most European populations cannot be fit as this type of 2-way mixture and, intuitively, this is due to their tendency (Fig. 1B) towards Ancient North Eurasians that is not modeled by such a mixture.
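The f3(Test; Loschbour, Stuttgart) statistic in the quote works like this: if the test population is a mixture of two sources, its allele frequencies tend to sit between theirs, which drives the average of (t - a)(t - b) below zero. Here's a minimal sketch under that definition. The names are placeholders, and the small-sample correction that published analyses apply to the test population is left out to keep things short.

```python
def f3(test, source_a, source_b):
    """f3(Test; A, B): mean of (t - a) * (t - b) over SNPs.

    Clearly negative values are evidence that Test is admixed between
    populations related to A and B. Positive values are not evidence against
    admixture, because drift in Test since the mixture pushes f3 upwards.
    """
    return sum((t - a) * (t - b) for t, a, b in zip(test, source_a, source_b)) / len(test)


# Toy case: the test population is an even mix of the two sources.
source_a = [0.2, 0.8]
source_b = [0.6, 0.4]
mixed = [0.4, 0.6]
print(f3(mixed, source_a, source_b))
```

This is why the most negative f3 for Sardinians matters in the passage above: it's the signal that a simple two-way mixture of hunter-gatherers and early farmers can work for at least some Europeans.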
Another intriguing thing about the results shown in Fig 2B is that the expansions of ANE across Europe appear not to have disturbed the presumably Neolithic WHG/EEF cline to any great extent. What this suggests is that ANE was spread largely independently of EEF and even WHG. In other words, the groups that pushed ANE deep into Europe probably had very high ratios of this component. This also seems to be true for the groups that brought ANE to the Near East:
A geographically parsimonious hypothesis would be that a major component of present-day European ancestry was formed in eastern Europe or western Siberia where western and eastern hunter-gatherer groups could plausibly have intermixed. Motala12 has an estimated WHG/(WHG+ANE) ratio of 81% (S12.7), higher than that estimated for the population contributing to modern Europeans (Fig. S12.14). Motala and Mal’ta are separated by 5,000km in space and about 17 thousand years in time, leaving ample room for a genetically intermediate population. The lack of WHG ancestry in the Near East (Extended Data Fig. 6, Fig. 1B) together with the presence of ANE ancestry there (Table S12.12) suggests that the population who contributed ANE ancestry there may have lacked substantial amounts of WHG ancestry, and thus have a much lower (or even zero) WHG/(WHG+ANE) ratio.
So I think it's plausible that the 17,000 year-old Afontova Gora 2 (AG2) genome from Central Siberia, classified as part of the ANE meta-population by Lazaridis et al., is genetically the closest sample we have to the Proto-Indo-Europeans. Based on a couple of the PCA from Lazaridis et al. (below) and Raghavan et al. (see Figure SI 29 here), this genome doesn't appear to be 100% ANE. My very rough estimate is 85/15 ANE/WHG.
If my assumptions are correct here, then it's no wonder that this Bronze Age Danish sample (M4) from the recent Carpenter et al. paper (see here) shows a clear shift towards the Americans on the global PCA. M4 is better known as "the old man" from the giant Borum Eshøj barrow (see here), presumably built by some of the earliest Indo-Europeans in Scandinavia. We can probably expect such Afontova Gora 2-like results from many European samples archeologically linked to the early Indo-Europeans.
As for the first major expansion of ANE into Europe, here's an interesting map that I spotted in one of the online discussions on the paper, which shows the spread of microblade technology in almost all directions from around Lake Baikal just after the LGM (source). Among other things, it offers a very attractive explanation for the presence of ANE in Mesolithic Sweden, as well as the current distributions of Y-chromosome haplogroups R and Q (note that MA-1 belonged to R, which is the brother clade of Q).
But the problem with this scenario is the tight phylogenetic relationship between ANE and WHG. If the former expanded after the LGM from a refugium in South Siberia, then why is it so closely related to the latter, which presumably recolonized Europe from a Southern European LGM refugium, basically at the opposite end of Eurasia?
There have also been a lot of comments online about the potential correlations between ANE and certain clusters generated from modern samples with the ADMIXTURE software. I think it's obvious from just looking at the ADMIXTURE bar graph from Lazaridis et al. that ANE is linked in one way or another to the clusters that peak in Northeastern Europe, the North Caucasus, and South Central Asia (especially among the Indo-Iranian Kalash).
Below is the bar graph from the optimal ADMIXTURE run, the K=16. Note that ANE proxy MA-1 mostly shows membership in the cream and light blue clusters, which peak among the Kalash and Lithuanians, respectively. Click on the image to enlarge.
The Kalash-centered cluster, which actually first appears at K=14, and is more or less repeated in four runs, is particularly interesting, because it shows fairly similar distribution patterns to ANE. Note, for instance, that after South Central Asia it reaches its highest levels in the North Caucasus, which is where ANE also shows a major peak today (see here). Moreover, in Europe it's most pronounced in the east and north, but appears at comparatively trivial levels among the Basques, southern French and Pais Vasco Spaniards, and doesn't show up at all among Sardinians or the ancient European genomes.
However, it's often very difficult to make inferences about ancient population movements from ADMIXTURE results, and I think this is one of those cases. Just because this cluster peaks among the Kalash doesn't mean that it has its origins within this group, or even in Asia. I'd say the most plausible explanation for its existence is that it represents ANE that expanded rapidly across Eurasia, probably during the early Indo-European dispersals, and today reaches its highest frequencies among some of the most isolated and genetically drifted recipients of this ANE gene flow (i.e. those in the Caucasus and Hindu Kush).
By the way, the difference in ANE levels between southwestern Europeans and most other West Eurasians clearly shows on my own PCA and MDS maps. Below is the latest Eurogenes PCA of West Eurasia from a few months ago. Note the pronounced eastern shift among almost all the samples relative to the Basques, Pais Vasco Spaniards, and Sardinians. As per the f4 graph above, only in some instances is this shift also the result of significant ENA ancestry.
It's incredible what a few ancient genomes can add to the context of these sorts of analyses using modern DNA. I didn't really know what was causing this eastern shift when I posted the PCA, and guessed that it might simply be a lack of Mediterranean ancestry across Northern and Eastern Europe (see here).
I also just noticed that Razib posted two articles on the pigmentation traits of the ancient individuals (see here and here). The sample is tiny, but looking back, the fact that the Loschbour hunter-gatherer probably had blue eyes and dark skin, while, on the other hand, the Stuttgart farmer had relatively light skin, is actually quite remarkable.
We'll have a major story on our hands if several other hunter-gatherer genomes come back with similar results. It's just not something anyone would've predicted from modern DNA. Apart from that, there's also the slight shock factor of learning that our not too distant indigenous European ancestors were probably of a deep shade of brown. Imagine that, Europe might have only really lightened up and become white after Near Eastern migrants made their way over. Well, let's wait and see.
Iosif Lazaridis, Nick Patterson, Alissa Mittnik, et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, bioRxiv, posted December 23, 2013, doi:10.1101/001552
Raghavan et al., Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans, Nature (2013), published online 20 November 2013, doi:10.1038/nature12736
Carpenter et al., Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries, The American Journal of Human Genetics (2013), http://dx.doi.org/10.1016/j.ajhg.2013.10.002
Ancient human genomes suggest (more than) three ancestral populations for present-day Europeans
EEF-WHG-ANE test for Europeans
Mesolithic genome from Spain reveals markers for blue eyes, dark skin and Y-haplogroup C6