Friday, July 3, 2015

ADMIXTURE analysis of Allentoft et al. and Haak et al. ancient genomes

I haven't had a chance to study the output in detail yet, and I don't know what the cross-validation errors are for each of these unsupervised runs, but I'd say they all look pretty good. A Principal Component Analysis (PCA) of some of the K=10 data, showing how present-day Armenians compare to two Bronze Age Armenians, can be seen here.

K=6 spreadsheet

K=7 spreadsheet

K=8 spreadsheet

K=9 spreadsheet

K=10 spreadsheet

I did attempt to go up to K=11, but the algorithm appeared to be struggling to find a solution, so I killed the run. I'll have another go when more samples come in.

By the way, the analysis is based on the Human Origins fully public dataset available at the Reich lab website here.

To reduce errors, I limited the markers to transversion SNPs, and only kept samples with minimum call rates of 20%. This left 113K SNPs and 101 ancient genomes; 47 from Allentoft et al., 36 from Haak et al., and 18 from other recent papers. I didn't thin the markers to correct for LD, because in my experience this often results in less accurate outcomes.
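The two filters described above (transversions only, minimum call rate of 20%) can be sketched roughly as follows. This is only an illustration: the data layout, the function names and the choice to compute the call rate on the transversion subset are all assumptions, and the post doesn't say which tools were actually used.

```python
# Hypothetical sketch: function names and data layout are assumptions,
# not the pipeline used for the runs in this post.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(allele1, allele2):
    """True if the two alleles form a purine/pyrimidine pair (a transversion)."""
    a, b = allele1.upper(), allele2.upper()
    return (a in PURINES and b in PYRIMIDINES) or (a in PYRIMIDINES and b in PURINES)


def call_rate(genotypes, missing="0"):
    """Fraction of markers with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


def filter_dataset(snps, samples, min_call_rate=0.20):
    """Keep transversion SNPs, then keep samples meeting the call-rate cutoff.

    snps:    list of (snp_id, allele1, allele2)
    samples: dict of sample_id -> list of genotype strings, one per SNP
    """
    keep_idx = [i for i, (_, a1, a2) in enumerate(snps) if is_transversion(a1, a2)]
    kept_snps = [snps[i] for i in keep_idx]
    kept_samples = {}
    for sample_id, genos in samples.items():
        subset = [genos[i] for i in keep_idx]
        # Assumption: call rate is measured on the retained markers only.
        if call_rate(subset) >= min_call_rate:
            kept_samples[sample_id] = subset
    return kept_snps, kept_samples
```

Dropping transitions (A/G and C/T) first matters for ancient DNA because those are the changes most affected by post-mortem damage, which is presumably the kind of error the filter is meant to reduce.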


Unknown said...

Looks very nice

Couldn't you include the EHG and WHG samples?

Davidski said...

I did include all of the EHG and WHG samples. But yes, not in the PCA, if that's what you mean.

I'll put up other PCA on the weekend based on this output, and I'm pretty sure Matt will cook something up as well.

Aram said...

This is a face reconstruction from a 'soldier' grave north of Lake Sevan (Lchashen, 15-14th centuries BC; reconstructed by Prof. A. D. Tchagharian, Sardarapat museum, Armenia). The RISE samples from Nerqin Getashen (also near Lake Sevan) are close to that grave.

Unknown said...


Thanks. I think it would be very informative if you added the HG groups (SHG, WHG, EHG).

Can you quantify on your graph the relative admixture rates between eastern and MN for BBs? On pure inspection, it looks almost 50/50, especially if one focuses on the MN German & Spanish groups, and cryptically high-WHG farmers like the recent Vinca.

Helgenes50 said...

Thanks for all these new results

Alberto said...

Thanks David. Finally we get a better idea of the BA Armenians.

I looked mostly at K10 for now. At that level they cluster with North Caucasus. One clusters exactly with an Adygei (RISE423 with HGDP01398) and the other exactly with a Lezgin (RISE397 with lez36). I guess mostly as expected, but it's good to see some confirmation.

Interesting the (small) difference between the EHGs. Samara_HG always looked a bit more Central Asian, while Karelia_HG a bit more Siberian. Here Samara_HG gets some 10% Hindu_Kush while Karelia_HG just 2%. Maybe that's related to Samara_HG being R1b?

It's also interesting that the BA Hungarian with high WHG (RISE479) gets 0% Hindu_Kush. He still doesn't seem to have any Yamnaya or CW ancestry (or West Asian/Caucasus). Quite a mystery.

Chad said...

All of the BA Hungarians show Yamnaya ancestry. I'll put up a K10 later, but I can only add a couple Allentoft samples at a time. They tend to push the Yamnaya cluster to the east, where most Yamnaya samples come up 80-90% of that, plus 10-20% WHG.

Unknown said...

I know it's perhaps the way the plot is set, and one can't make too much from visual inspections of PCA, but the major shift of BBs compared to MN predecessors is a northward one, rather than east. (?)

Davidski said...

There's significant shift to the northeast from MN to BB, because in reality Yamnaya are more eastern than Armenians.

Unknown said...

The similarity of Andronovo to east-central European groups, which we've all noted, is striking, especially given that it is found where we'd expect Yamnaya-like predecessors during the early Bronze Age. I'm sure if we had Catacomb culture samples, they'd look more east-central ("Baltoid") European too (that's certainly what the mtDNA evidence suggests).

To me this suggests there was a replacement/displacement of a south-steppe/Transcaucasian population (=Yamnaya) by a forest-steppe/north European plain one during the middle Bronze Age. Thus, I wouldn't attribute the displacement of Yamnaya types from southern Russia solely to events as late as the medieval period (i.e. Slavs).

On the other hand, the genesis of Yamnaya must have had southern impetus. Despite it still being very north-eastern, and wherever one places the origin of M269, the mtDNA profile of Yamnaya samples makes this irrefutable.

So we have on the steppe, first, a clearly southern input during the Eneolithic, then a shift back to a more northern one during the MBA. Clearly the steppe was a volatile and fluctuant place liable to receive marked influences from elsewhere, and often be wholly swept away also.

Unknown said...

Rise 436 M R1a1a1

Looks SW European.

BA Montenegrins look Basque.

Chad said...

K10 output. Some look like garbage, fair warning.

Chad said...

Excuse me, one BaMon looks Basque, the other Scandinavian. Two CW samples look like BaHun. Most BaHun samples look between BR1 and Brits.

Davidski said...


You need to run these samples with the main Human Origins file (not the Haak one), and only with transversion SNPs.

I'll send you the SNP list in a few minutes. When you get it, merge the ancient samples from Haak and Allentoft with the Human Origins (but be careful not to duplicate some of the samples, like Stuttgart and Loschbour).
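The "be careful not to duplicate" step can be sketched like this. It only works on sample ID lists, not on the genotype files themselves, and the alias table and example IDs are invented for illustration.

```python
# Hypothetical sketch: sample IDs and the alias table are assumptions.

def merge_sample_lists(base, extra, aliases=None):
    """Append samples from `extra` to `base`, skipping any already present.

    `aliases` maps alternative labels to a canonical name, so a genome that
    appears under two different labels is only kept once.
    """
    aliases = aliases or {}

    def canonical(name):
        return aliases.get(name, name).lower()

    seen = set()
    merged = []
    for name in list(base) + list(extra):
        key = canonical(name)
        if key in seen:
            continue  # already in the merged set, so skip the duplicate
        seen.add(key)
        merged.append(name)
    return merged
```

The first occurrence wins, so listing the Human Origins samples first keeps that version of Stuttgart and Loschbour and drops the copies from the other files.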

Unknown said...


Will do. Once I get it, I'll run them again.

Alberto said...


Yes, I think we're seeing that the steppe was unstable, but at the same time a very effective transit zone that played a very important role in spreading populations, cultures and languages east-west (both directions). A different thing is if it was the origin of any of them or not.

Now for some speculative thinking:

This leads me more to think that R1b kind of "passed" through the steppe. The native people before and after look to be clearly R1a. R1b coincides with the southern impetus of Yamnaya, and apparently disappears with it (leaving just some traces in a few populations, which is to be expected).

It would be interesting to know if Afanasievo was R1a or R1b. I remember Nirjhar said he thinks they'll be R1a, and he might be right if the 2 populations (Yamnaya and Afanasievo) were not descended one from the other, but they formed at the same time in different places (as similar as they look, it's hard to establish a chronological and spatial connection, both being contemporary, quite distant apart and without a continuum in the space between them, that was occupied by distinct cultures). The historical entrance into the steppe and Siberia from south/west Eurasia is through the eastern part of Central Asia. That area could have been R1a dominated. While R1b might have been more to the west, around the south Caspian. And only entered the steppe through the Caucasus at a much later time.

Alberto said...

Related to the above speculation about Afanasievo possibly being R1a, I find this stat interesting:

Yoruba baCw baAfan baYam -0.0069193453 -2.6182852949

Why would CW be closer to Afanasievo than to Yamnaya?

Matt said...

As expected, a quick couple of PCAs on the K10 values.

The first has lots of the samples from West Eurasia, with a cutoff point set at about where it normally goes on these PCAs. It's limited to 5 samples with the same population label (which did unintentionally end up with tons of Spanish, as they all have slightly different labels by default). You'll be able to work out the colour scheme: black for present day, then various colours for different sets of ancient samples.

The second is as above, but with a higher cutoff of not having more than 20% in any combined total of non-West Eurasian components, other than Amerind. That eliminates most of the Middle East, South-Central Asia and many Russian ethnic groups from the PCA.
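For anyone wanting to reproduce that kind of sample selection from the spreadsheets, here's a rough sketch of the two rules (a 20% cap on the combined excluded components, and at most 5 samples per population label). Which components count as non-West Eurasian, and the row layout, are assumptions made for the example.

```python
# Hypothetical sketch: the row layout and the list of excluded components
# are assumptions, not taken from the actual spreadsheets.
from collections import defaultdict


def passes_cutoff(proportions, excluded_components, cutoff=0.20):
    """True if the combined share of the excluded components is at most `cutoff`."""
    total = sum(proportions.get(c, 0.0) for c in excluded_components)
    return total <= cutoff


def select_samples(rows, excluded_components, cutoff=0.20, max_per_label=5):
    """Pick sample IDs for the PCA.

    rows: list of (sample_id, population_label, {component: proportion})
    """
    taken = defaultdict(int)  # samples already selected per population label
    selected = []
    for sample_id, label, props in rows:
        if not passes_cutoff(props, excluded_components, cutoff):
            continue
        if taken[label] >= max_per_label:
            continue
        taken[label] += 1
        selected.append(sample_id)
    return selected
```

Leaving Amerind out of `excluded_components` mirrors the "other than Amerind" exception, so samples with a high Amerind-like share aren't dropped by the cutoff.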

Some MDS plots of FST matrix for the components - You can see that the Sub_Saharan component is much less of a neutral outgroup to Eurasia than San_Bushmen. The FST matrix shows that pattern again where the Hindu_Kush component is relatively close to everyone, either through being very admixed (and thus high internal genetic diversity) or low drift, e.g. vs San Bushmen: Hindu_Kush - 0.185, Middle East - 0.193, Euro HG - 0.201, or vs Sub_Saharan: Hindu_Kush - 0.134, Middle East - 0.143, Euro HG - 0.152.

Looking at the Hindu Kush component, it looks like the Euro HG component but with uniformly less drift, an FST lower by around 0.016 to 0.018, relative to what should be more or less outgroups.
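The 0.016 to 0.018 figure can be checked directly against the FST values quoted above. A small sketch, where the dictionary layout is just for illustration:

```python
# FST values as quoted in the comment; the dictionary layout is an assumption.
fst = {
    "San_Bushmen": {"Hindu_Kush": 0.185, "Middle_East": 0.193, "Euro_HG": 0.201},
    "Sub_Saharan": {"Hindu_Kush": 0.134, "Middle_East": 0.143, "Euro_HG": 0.152},
}


def drift_gap(outgroup, a="Euro_HG", b="Hindu_Kush"):
    """Difference in FST to `outgroup` between components a and b (3 decimals)."""
    return round(fst[outgroup][a] - fst[outgroup][b], 3)


# Gap between Euro_HG and Hindu_Kush against each outgroup-like component.
gaps = {outgroup: drift_gap(outgroup) for outgroup in fst}
print(gaps)
```

The gap is 0.016 against San_Bushmen and 0.018 against Sub_Saharan, which is where the range comes from.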

Although it seems really unlikely, I wonder if this is the truth of what this component represents, a kind of Euro_HG related ancestry which branched off from Euro_HG early on and is much less drifted than it, unlike Middle East which is in theory Euro_HG like ancestry plus the separated Basal Eurasian. Or if this is just an odd effect of combining different kinds of ancestry or something and how ADMIXTURE separates them.

Davidski said...

Thanks Matt. Yeah, I think the Hindu Kush component is very mixed and basically represents the complex ancestry and isolation of the Kalash.

At K6 and K7 the Kalash are mostly Middle Eastern and European with various Eastern Asian influences, and that's essentially what the Hindu Kush component is, making it look sort of Euro HG.

Tobus said...

Alberto: What are the SNP counts for that D-stat? -0.006 is a very low result and not usually associated with such a high Z-score. I'm wondering if it's running on only a handful of sites and so throwing the Z-scores out somewhat.

Unknown said...

@ Alberto

"This leads me more to think that R1b kind of "passed" through the steppe. The native people before and after look to be clearly R1a. R1b coincides with the southern impetus of Yamnaya"

I really don't know and can only speculate without direct evidence. But it's long been noticed that R1a is found extensively from Scandinavia to India, but R1b has a more broken & localized appearance on the steppe.

And clearly R1b had an early presence on the steppe too (since at least Mesolithic). But it really depends on whether the Samara sample discovered is actually ancestral to other, later ones, or - as it appears- a dead offshoot, replaced by newer arrivals from elsewhere.

Unknown said...

What about M73 Samara and M269 Sredni Stog, pushing M73 to Central Asia?

Unknown said...

Yep. That's certainly possible.

Alberto said...


Yes, the result is very low compared to the at least slightly significant Z-score. Here's the complete D stat:

Yoruba baCw baAfan baYam -0.0069193453 0.0026427011 -2.6182852949 506941

The quality of the genomes is certainly not the worst in the paper, but I guess none are really that great either. In any case there's probably not too much to read into it. I just found it odd that CW would look closer to Afanasievo than to Yamnaya, against geography, archaeology and K8 admixture results. So I'm just speculating whether Yamnaya being R1b vs. CW being R1a could have some impact on it (in the also speculative case that Afanasievo were R1a, which we obviously don't know).
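As a sanity check on the numbers, the Z-score should just be the D value divided by its standard error. A minimal sketch, assuming the columns of the line above are D, standard error, Z and SNP count (the line itself isn't labelled):

```python
# Values copied from the D-stat line; reading the columns as
# D, standard error, Z, SNP count is an assumption.
d_stat = -0.0069193453
std_err = 0.0026427011
reported_z = -2.6182852949
snp_count = 506941

# Z is the estimate divided by its standard error.
z = d_stat / std_err
print(f"Z = {z:.4f} (reported {reported_z:.4f}) on {snp_count} SNPs")
```

So a small D can still give a moderate Z when the standard error is small too, which is what you'd expect with roughly half a million SNPs rather than a handful of sites.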

Alberto said...


Yes, pure speculation on my side based on the bits of information we have. It could all be random founder effects, sampling bias and what not. But I'm usually not very happy with those explanations and try to speculate to find some reason to it (a defect, certainly. It's wiser to just wait for more data that would allow us to really know).

Alberto said...


Thanks, very interesting. I think we still have quite a bit to learn about Basal Eurasian, ANE and Near Eastern components.

At K6 the Middle_Eastern looks like Basal Eurasian + ANE, being very high in modern Near Easterners, Iranians and S-C Asians. But rather low in EEF, that look very European.

At K10, it seems that Hindu_Kush takes the ANE with it + Basal Eurasian, while the Middle Eastern also has Basal Eurasian + Near Eastern specific drift (whatever that is, maybe some kind of WHG?).

I don't think that Hindu_Kush is taking much (if any) Euro_HG or East_Asian, though.

But this still leaves me with too many questions that I'm unable to answer about how Euro_HG relates to the other two clusters. Ok, Euro_HG does have some ANE (courtesy of Motala and EHGs), and Middle_Eastern might have some WHG-like component. So that would somehow relate it to the other two, that would also be related via Basal Eurasian.

But who knows, probably again just thinking aloud...

Alberto said...


In these kinds of unsupervised runs, wouldn't it help to be more selective with the samples used? For example, leaving Amerindians out would improve our understanding of the components, I think. Eurasians don't have any Amerindian admixture (it's the opposite), but they immediately create their own cluster that hides the origin of some components (EHG being 10-20% Amerindian is not very helpful).

Another example would be not using Motala, that clusters with WHG but gives it ANE, making for example the BA Hungarians look like they have no ANE at all. Or making EHGs have inflated Euro_HG ancestry.

I know they're supposed to be unsupervised, but that doesn't mean we can't try to improve the output by manipulating a bit the input with what we know.

Alberto said...

Euro_HG is the one that seems to have also Basal Eurasian (or Middle_Eastern), since Unetice, Lithuanians and others (BB, Sintashta) are scoring 80%. So all 3 components sharing BE and Euro_HG sharing some ANE with Hindu_Kush might explain why they are so related.

Davidski said...

Yes, leaving out the Amerindians and SHG might reveal some interesting things. I'll try a couple of runs without them later today.

By the way, here's the K15 for Oase1.

North_Sea 8.12
Atlantic 13.45
Baltic 9.65
Eastern_Euro 0
West_Med 4.51
West_Asian 0.4
East_Med 0
Red_Sea 0
South_Asian 26.1
Southeast_Asian 7.65
Siberian 0
Amerindian 0
Oceanian 9.36
Northeast_African 4.16
Sub-Saharan 16.6

Unknown said...

Very "South Asian" and very little Levantine stuff. Same as K14.
Makes one wonder what was happening in the Palaeolithic Near East.

Unknown said...

What I've seen is that if you remove Native Americans, you have to take out EHG. That's the only way to get a Yamnaya cluster.