Wednesday, July 15, 2015

Population genomics of Early Bronze Age Europe in three simple graphs

Thanks to recent advances in ancient genomics there's very little doubt now that the Pontic-Caspian Steppe was the source of massive population movements deep into Europe during the Late Neolithic/Early Bronze Age.

But some people still don't get it, maybe because genomics isn't their thing? Others just refuse to get it probably because it's at odds with what they've been hoping to see (fine example here).

To help the former, and piss off the latter some more, I've put together three simple TreeMix graphs featuring ancient samples from a wide range of European archeological cultures, along with a little bit of commentary. Enjoy.

Full output from the analysis above is available in a zip file here. The samples and markers are listed here and here. The ancient samples are from Allentoft et al. 2015 and Haak et al. 2015. The Sub-Saharan Africans are from the fully public Human Origins dataset available here.


Mike Thomas said...

I see. Very nice
But can you add Mesolithic samples? Wouldn't it give a fuller picture if they were also included, rather than starting part-way into the admixture process?

Davidski said...

That doesn't work, because the algorithm is forced to interpret several mixture events across time that involved the same signals.

It's impossible to show accurately mixture events from the Neolithic and then also from the Bronze Age. It has to be one or the other.

Mike Thomas said...

I see. And I guess we don't have wide enough data to do a Mesolithic-Neolithic look?

Davidski said...

Ideally I'd need some Neolithic farmers from the Near East. But I might be able to do a very basic graph of the Neolithic transition in Western and Central Europe. I'll have a go at that tomorrow.

Mike Thomas said...

Ok great. And the steppe ?

Simon_W said...

It's necessary to stay open-minded and flexible enough to adapt one's theories in the light of new evidence; that's scientific progress. Some people are just trapped by their old theories.

Others may have personal aversions against the idea they might have something in common with those in the east, in the "eastern Bloc" as someone called it, as if such a thing had already existed in the Copper Age.

To Basques it may look like an indispensable matter of ethnic identity not to be descended from IEs in the purely male line. Although one of them criticized the patrilineal way of thinking as an unjustified bias.

Still others have a soft spot for outlandish theories, to fight against the establishment etc...

Helgenes50 said...


Thanks for your work,

On these new TreeMix graphs, the Bell Beakers are closer to the Yamnaya than the Corded Ware are, with more Central European ancestry, which is what we already see in the admixture analyses.

Davidski said...

Yes, the Bell Beakers are a little closer to the eastern Yamnaya than Corded Ware are when their Old European admixture is disregarded.

That might have something to do with their R1b-M269.

Simon_W said...

Of course more or less anonymous commenters on blogs have it easier than professionals who may have invested their entire career into a theory...

Simon_W said...

Well, Bell Beakers are closer to Yamnaya in the second tree, because there they have an admixture edge from Copper Age Old Europe, which allows them to be a mix of Yamnaya-like people with Copper Age European admixture. Corded Ware isn't allowed to have an admixture edge in the second tree. That's why it is more distant from Yamnaya. (Probably because the admixture is stronger in Bell Beaker the only admixture edge goes to Bell Beaker.)

In the first tree, without admixture edges, Corded Ware is closer to Yamnaya.

When both are allowed to have admixture edges, in the third tree, Corded Ware and Bell Beaker look like twins.

Davidski said...

I think people should move with the data no matter who they are and what they've said or published in the past.

If there aren't any strong arguments left, and as far as I can see no one has been able to come up with any, certainly not Paul Heggarty in that recent article, then it makes no sense to keep up the charade.

Shaikorth said...


I don't think having Indo-European derived R1b would cause identity issues to Basques, at least not more than having I1 would cause to Swedes or N1c1 to Lithuanians. Perm and Baimakskiy Bashkirs have R1b and R1a frequencies near fixation, and to my knowledge this has caused no issue regarding their identity which remains firmly Turkic of local variety.

Kalash have a high frequency of presumably non-IE Y-DNA but maintain an I-E identity to an exceptional degree, and the current threat that identity is facing from their (also I-E speaking) neighbours is fully cultural and has nothing to do with haplogroups. If Y-DNA causes identity problems they are definitely what might be called "First World Problems", as are problems Y-DNA might cause to someone's academic legacy.

bellbeakerblogger said...

Yamnaya and Bell Beaker have a lot in common, culturally and genetically.

That doesn't logically mean that Bell Beaker derives from Yamnaya. Both populations may be derived from another more distant population "Q" from which both partly descend.

Chad Rohlfsen said...

Khvalynsk, maybe.

Matt said...

Does Sintashta get Copper Age Europe signal at 3 edges? Or IA/BA Armenia at 4?

Different branch lengths should technically mean different drift histories, and populations falling more basal, such as Yamnaya falling more basal than Afanasievo. But aDNA probably means that doesn't hold.

bellbeakerblogger said...

@Chad Rohlfsen

I really don't know. One question is the degree to which immigration played a role in the formation of the Khvalynsk and Samara (Repin) cultures in the first place.
There are conservative elements and foreign elements. So do the rocker-stamp pottery and pressure-retouched projectiles of Khvalynsk and those of southern Libya around the same time indicate a common genetic/pastoral heritage? If so, what does that population look like and where did it come from?

Also, two good books...
"Ancient Metallurgy in the USSR", Evgenii Chernykh
"The Emergence of Pressure Blade Making", Pierre M. Desrosiers

Davidski said...


You almost got it, well done. On the last tree note the simultaneous migration edges from the Copper Age Europe branch to Sintashta, and from the Sintashta branch to BA/IA Armenia.

The basal edges into Yamnaya/Afanasievo are also interesting. I'm certain now that they're related to gene flow to the steppe from the North Caucasus.


I'm aware of unpublished data which show that Khvalynsk are very similar to Yamnaya in that they look like a two-way mixture of Eastern Euro foragers and Caucasians/Near Easterners, but their ancestry proportions are different. Yamnaya are ~50/50, while Khvalynsk are ~75/25.

I don't know much about rocker-stamp pottery and pressure retouched projectiles, but I suspect that a lot of cultural traits spread from early Mesopotamia to the Mediterranean and the Caucasus, and eventually also into Africa and the Eurasian steppe. Some gene flows might have accompanied these cultural transmissions, but it looks like things really started to move in this respect when the steppe people began to migrate out of the steppe.
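
As an illustration of what a "two-way mixture" with different proportions means at the allele-frequency level, here is a minimal sketch. It assumes a simple linear mixture model and reads the ~50/50 and ~75/25 figures above in the order given (forager share first); the function names and all frequencies are invented for the example and are not taken from any dataset discussed here.

```python
def mix_frequency(p_a, p_b, alpha):
    """Expected allele frequency when a fraction `alpha` of ancestry comes from source A."""
    return alpha * p_a + (1.0 - alpha) * p_b


def estimate_alpha(target, src_a, src_b):
    """Least-squares estimate of alpha from per-SNP allele frequencies.

    Solves target - src_b = alpha * (src_a - src_b) across all SNPs.
    """
    num = sum((t - b) * (a - b) for t, a, b in zip(target, src_a, src_b))
    den = sum((a - b) ** 2 for a, b in zip(src_a, src_b))
    return num / den


# Hypothetical allele frequencies at five SNPs (illustrative only, not real data).
forager = [0.10, 0.80, 0.35, 0.60, 0.05]
caucasus = [0.70, 0.20, 0.55, 0.10, 0.45]

# Build two idealised mixed populations with the proportions quoted in the comment.
yamnaya_like = [mix_frequency(f, c, 0.50) for f, c in zip(forager, caucasus)]
khvalynsk_like = [mix_frequency(f, c, 0.75) for f, c in zip(forager, caucasus)]

print(round(estimate_alpha(yamnaya_like, forager, caucasus), 2))
print(round(estimate_alpha(khvalynsk_like, forager, caucasus), 2))
```

Real estimates are noisier than this, because the true source populations are rarely sampled and drift since the mixture blurs the signal.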

bellbeakerblogger said...


I'm partly playing devil's advocate here since I realize the supernova in the steppe is the main thrust of these profile components. I'm just a little cautious and think there are some missing parts to the story.

Mike Thomas said...


Is the migration edge from Copper Age Europe to Sintashta the MN admixture in Corded Ware groups?

And the migration edge from the Caucasus must have ultimately derived from the South Caucasus, because North Caucasus groups would essentially have just been EHG, natively (?)

Romulus said...

Heggarty never denied population movement from Steppe->Europe in the BA, his whole post was on the linguistic affinities of these people.

Davidski said...

Yes, it went something like this: Corded Ware > Abashevo > Sintashta > Asia.

But I doubt the Caucasus admixture in Yamnaya/Afanasievo is from the south Caucasus, because BA/IA Armenians are on most of these trees and there's not a single migration edge from them to Yamnaya.

Also, the migration edge that runs from the branch leading to the Lezgins and Chechens starts at a point before a migration edge from the Stuttgart/Sardinian node hits the Caucasus branch. So again, what this suggests is that Yamnaya acquired their Caucasus ancestry from an unusual population that is no longer represented all that well by any Caucasians because it lacked the typical Neolithic admixture.

My bet is that this unusual population lived somewhere in the north Caucasus before all of the Caucasus was affected by population movements from Anatolia.

Davidski said...


Heggarty argued that:

- the population movements from the steppe might have only carried Balto-Slavic and maybe Tocharian languages, so he was implying that they were much more limited than what the data in Allentoft et al. and Haak et al. showed

- steppe-related admixture might have only arrived in southern Europe during the Medieval or Migration Period, which was again an attempt on his part to downplay the impact of the Bronze Age population movements from the steppe.

Allentoft et al. did have data from southern Europe, and it showed clearly enough that there was a shift in genetic substructure there during the Bronze Age similar to the one that happened in Northern and Central Europe.

Thus, Heggarty is a fine example of someone who's not getting it because he can't or doesn't want to get it. Judging by his history, it's the latter.

You seem very fond of what he's written in that online article. Are you sure you read it carefully enough?

Nirjhar007 said...

Though I am not stubborn on this, IMO Khvalynsk will have some mixed results, with some I haplogroups.

Mike Thomas said...

Slow to the boat on this one, but why weren't there any Y haplogroups from Afanasievo?

Davidski said...

All of the Afanasievo samples were females.

Sisophon said...

I think it is useful to display the percentages on the migration edges, rather than relying only on color. Some edges can be greater than 0.5. For example, in tree2 the edge from Copper_Age_Europe to Bell_Beaker is 35% and to Corded_Ware 14%. I did this with an old version of treemix (version 1.11) some time ago, so I think it should be an easy modification.

Helgenes50 said...


Some edges can be greater than 0.5. For example, in tree2 the edge from Copper_Age_Europe to Bell_Beaker is 35% and to Corded_Ware 14%.

How do you calculate the different migration percentages? How do I get the results mentioned above?

Davidski said...

Well, I wouldn't pay too much attention to the values of the migration edges, because they're influenced by the positions of the populations on the trees, which are in turn affected by the choice of the populations.

The important thing to note is that these migration edges are consistent, easily reproducible, and in line with other types of population genetic analyses and archeology.

But I uploaded all of the output files here...

So the values of the migration edges can be looked up manually by unzipping the treeout files, or maybe plotting them again with a modified version of plotting_funcs.
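
For anyone who would rather not read the treeout files by eye, here is a rough sketch of pulling the edge weights out with a script. It assumes the layout described in the TreeMix manual as I understand it: the first line is the Newick tree, and each later line is a migration edge whose first column is the weight and whose last two columns are the source and destination subtrees. The function names and the example lines are made up for illustration.

```python
import gzip


def parse_migration_edges(lines):
    """Return (weight, source_subtree, target_subtree) for each migration line.

    Assumption: line 1 is the Newick tree; every later line looks like
    'weight jackknife_weight jackknife_se p_value source_subtree target_subtree'.
    """
    edges = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        edges.append((float(fields[0]), fields[-2], fields[-1]))
    return edges


def read_treeout(path):
    """Read a gzipped treeout file (hypothetical path) and parse its migration edges."""
    with gzip.open(path, "rt") as handle:
        return parse_migration_edges(handle.read().splitlines())


def as_percent(weight):
    """Format an edge weight given as a fraction (0.35) as a percentage string."""
    return f"{weight * 100:.0f}%"


# Invented example in the assumed layout; the populations are placeholders.
example = [
    "((A:0.01,B:0.02):0.005,(C:0.01,D:0.03):0.004);",
    "0.35 NA NA NA C:0.01 B:0.02",
    "0.14 NA NA NA C:0.01 A:0.01",
]
for weight, source, target in parse_migration_edges(example):
    print(as_percent(weight), source, "->", target)
```

With a real file, `read_treeout` would be called on the unzipped output directory's treeout file instead of the `example` list.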

Davidski said...

By the way, 0.5 is 50%, so a migration edge of 35% isn't greater than 0.5.

Rokus said...

Bell Beakers are a little closer to the eastern Yamnaya than Corded Ware are when their Old European admixture is disregarded.
What Old European-like admixture may be disregarded for Corded Ware to have them 'closer' again, I wonder?

Corded Ware isn't allowed to have an admixture edge in the second tree. That's why it is more distant from Yamnaya. (Probably because the admixture is stronger in Bell Beaker the only admixture edge goes to Bell Beaker.)
Yes, this may be an artifact of the method.

Davidski said...

See here...

Bell Beakers are indeed somewhat closer to Yamnaya when 4 and 5 migration edges are allowed.

Helgenes50 said...

@ David

By the way, 0.5 is 50%, so a migration edge of 35% isn't greater than 0.5.


Sisophon said...

Davidski said...
"By the way, 0.5 is 50%, so a migration edge of 35% isn't greater than 0.5."

I only meant to say migration edges can be greater than 0.5. I did not mean to imply that they were in these cases. But I downloaded treemix 1.12 and I see the code has been updated to cover this case, so it is no longer an issue.

If you or anybody else wishes to see the percentages printed on the graphic you can add this code

text((v1[1,]$x+v2[1,]$x)/2, (v1[1,]$y+v2[1,]$y)/2, labels = paste(floor(e[i,4]*100),"%"), adj = 0, cex = cex)

to plotting_funcs.R on line 165.


Davidski said...

Thanks, I updated the code...

Davidski said...

Here are the trees with 3, 4 and 5 migration edges done with the new code.

Mike Thomas said...


Thinking about a couple of comments from yesterday:
"But I doubt the Caucasus admixture in Yamnaya/Afanasievo is from the south Caucasus, because BA/IA Armenians are on most of these trees and there's not a single migration edge from them to Yamnaya"

Certainly true, but we wouldn't expect a migration edge from Bronze Age Armenia to something which predates it (Yamnaya). We need Neolithic and Copper Age samples from the Caucasus to disprove the growing "2nd consensus": a migration from the south to the steppe in the 5th-4th millennium.

* "So again, what this suggests is that Yamnaya acquired their Caucasus ancestry from an unusual population that is no longer represented all that well by any Caucasians because it lacked the typical Neolithic admixture...My bet is that this unusual population lived somewhere in the north Caucasus before all of the Caucasus was affected by population movements from Anatolia."

- This would need to be a special population indeed. How can what was, in the Neolithic, an otherwise unremarkable area be so different from its EHG 'cousins', from Karelia to Samara?

Helgenes50 said...

Thanks to this code, these graphs get more and more interesting, and very telling.

Davidski said...


I'd say it's almost certain that the southern/Armenian-like/teal/Caucasus-related, or whatever you want to call the non-EHG admixture in the Khvalynsk and Yamnaya, is going to be the result of long-distance contacts and intermarriages with early Maikop. And in the end I don't think it'll come out looking very Armenian-like.

I'm pretty sure of this. But of course let's wait and see.

Mike Thomas said...

Well, I respect your predictive ability. To me it's just surprising, at least at first thought. I'd have thought the 'natives' of the Kuban to be similar to those elsewhere in Russia - perhaps a bit more WHG, a bit more MNE. Anyhow, I, like all, eagerly await the results of the vanguard researchers.

Helgenes50 said...


Did you run the RISE samples with Teal K9?

Davidski said...

I haven't, and I don't think I can because that dataset doesn't have enough transversion SNPs that overlap with the RISE samples.

Karl_K said...

So what's up with the African edge into Yamnaya and Afanasievo? Is this a signal from a different subset of 'Basal Eurasian' present in the different population of Early Farmer they admixed with? Or a little bit of different Neanderthal from a bit of different Paleolithic European Cro-Magnon? Or is it just noise? 2% seems high for noise, but that could be due to some statistical bias?

Karl_K said...

I guess I left out the most obvious option, admixture with Sub-Saharans?

Shaikorth said...

It's likely just Basal Eurasian/Near Eastern. We have word from Lazaridis that 61% EHG 39% BedouinB fit works for Yamnaya so it isn't unexpected an edge like that could happen.

Chris Davies said...

Present-day populations of the Caucasus and Volga-Urals regions bear quite a number of HLA haplotypes of Northern African and Western Asian provenance which likely did not enter them recently.

One such example:-

Haplotype: A33-B14-DR1


Shared with:-

-Iraqi Kurds,
-Balearic Islanders,
-Northern Italians,
-Southern Spanish,
-Parsis [India & Pakistan],
[More data needed]

There are several such haplotypes found among the data.

André de Vasconcelos said...

I might be wrong, but a study from Department of Immunology, Universidad Complutense, Madrid, reported that the Portuguese samples did not present the A33-B14-DR1 haplotype.

Chris Davies said...

@ Andre -

The data I used was derived from bone marrow registry data from Portugal, several thousands of people from all regions of the country.

[A33-B14-DR1 in annotated form].

Open access:

André de Vasconcelos said...

@ Chris

Thanks for the info.

My assumption was actually from a study that happens to be old, so I suppose it's not worth much these days. Anyway, I leave the abstract here just because:

Davidski said...


Those 1-2% basal migration edges running to Yamnaya/Afanasievo are probably just Basal Eurasian admixture via the Caucasus and Near East.

Or they might represent excess ANE or archaic ancestry, because what I find is that trees which are missing the correct Eurasian references also show such basal edges running to samples with very little expected Near Eastern ancestry.

Aram Palyan said...


I think the Basque question is a linguistic classification issue.
There are growing voices that Basque is in some way related to PIE.

So how did this happen? The initial R1b-L11 (PIE) wave enters the Iberian Peninsula, where there is a dense Neolithic population. There it undergoes a dramatic linguistic change. Proto-Basque is a synthesis of PIE and some Neolithic language.

Then later, Celto-Italic evolves somewhere in the Alps. The Alps are close to the Balkans and an R1a-rich environment. This Celto-Italic has an additional layer of IE words and phonetic laws that is absent in Basque.

The hallmark of this additional layer is the Caucasus-Gedrosia component.
Basques lack the Caucasus-Gedrosia component. It is quite ironic, because linguists tried so hard to link Basque to Caucasian languages (but maybe there are some links through the Neolithic layer).
This slight difference in autosomal components finally made a big difference in linguistics.

Grey said...

very cool, well done

Alberto said...


I'd say it's almost certain that the southern/Armenian-like/teal/Caucasus-related, or whatever you want to call the non-EHG admixture in the Khvalynsk and Yamnaya, is going to be the result of long-distance contacts and intermarriages with early Maikop. And in the end I don't think it'll come out looking very Armenian-like.

Yes, BA Armenians plot with North Caucasus, but they don't actually share that much drift with them. Modern Caucasus populations got influences from north and south (Armenians a lot from the south) that make them still plot around the same place, but I think they're quite different from the ancient Caucasus populations. One of the trees (tree6) shows this with an admixture edge going from EEF to the branch with modern Caucasus, after that branch went to the Yamnaya/Afanasievo base.

I'm trying to see what there is to the "Teal"/"Hindu_Kush" component apart from a mere ENF+ANE mix. If you (or anyone else) get a chance to run these D-stats I'd appreciate:

Mike Thomas said...


Your speculations about Basque have little empirical support. "Synthetic" or mixed languages are rare, and almost certainly wouldn't have occurred in EBA Iberia.

We might have to objectively consider why R1b-heavy Basques aren't in any way, shape or form IE, nor are significant portions of people with R1b (xL51), and the implications for historical linguistics.

Aram Palyan said...

OK, it was just a speculation. Is there any theory out there explaining this issue? It would be interesting to read it.

Alberto said...

@Aram, Mike

Actually Basque is not such an exception in being high R1b and non-IE speaking. Catalonia is 82.5% R1b (Eupedia figure) and they didn't speak IE by the time the Romans entered Iberia. Actually Spain as a whole is 70% R1b, but half of the territory (the most populated half) didn't speak IE before adopting Latin (the other half spoke Celtic).

I guess this means that R1b spread through Iberia with Bell Beakers, not with Celts. And that Bell Beakers simply didn't speak IE. I don't see how else so many populations with such high R1b could not speak IE languages.

Mike Thomas said...


To re-answer: no, not surprisingly, there isn't a published theory yet. If I told you what I think, you wouldn't understand, because one needs a certain base understanding of the social anthropologies, and by base I mean rather developed.
But two quick points for you:
1) Maybe there isn't a simple, neat correlation between Y haplogroups and language?
2) Language expansion is a socio-cultural phenomenon, and this is not to deny that there were instances of "mass migration". In fact, there were several.

Davidski said...


Here you go. But I don't have any Malays in this dataset.

P.S. This was based on transversion SNPs only.

Aram Palyan said...


///To re- answer, no surprisingly there isn't a published theory yet. ////
So how do you want me to unlearn if there is no theory?
What I see at the present moment is this citation from Mallory and others about the BB culture:

///Bell Beaker has been suggested as a candidate for an early Indo-European culture; more specifically, an ancestral proto-Celtic. However, it has most recently been suggested that the Beaker culture was associated with a European branch of Indo-European dialects, termed "North-west Indo-European", ancestral to not only Celtic but equally Italic, Germanic and Balto-Slavic.///

BTW, don't think that as an Armenian I am too attached to the IE identity. And I will not understand something that doesn't link Armenian to Yamna or R1b-Z2103. My comments are just in line with the mainstream state of ideas :)
I don't venture to propose something different (because it is too revolutionary) despite the fact that I have a huge book from the Mekhitaristes on my table that predicted many things that we see now. :)

Mike Thomas said...

Yes of course, I know where your comments are coming from. As I said, sociolinguistics is still in its infancy, and more so for PIE than other aspects (e.g. there have been very good articles written on the polyglot situation in pre-Roman Massalia, or the Black Sea). It's in the pipeline...

Suffice it to say, maybe IE came with R1b indeed. But I'd contend that Europe - and the Caucasus - was at its peak in linguistic diversity between the Late Neolithic and Bronze Age. This changed after the developed Bronze Age, and especially the pre-Roman Iron Age, with the advent of true chiefdoms. And I think BB is far too early for anything Celtic, or even pre-Celtic...

Mike Thomas said...

And there's nothing wrong with being attached to one's PIE identity. It's pretty darn amazing that languages from Ireland to India are linked. Isn't that why we're all so interested in uncovering the nitty gritty?

Aram Palyan said...

I agree, it's wonderful to be part of a great community from Ireland to India. This brings us all here. I wanted to say that for me the pre-IE layer is also very important. And it could be that the pre-IE layer was also an important actor in creating this great community.

Alberto said...

Thanks David.

I certainly don't see a relationship between populations that are usually very "Teal", "Hindu_Kush", "West_Asian",... So probably that's a mixed bag of components without a real tie between them.

However, I do see some other interesting things, mostly related to ENF. There seems to be a great variation in ENF "subcomponents". At least 3 main ones. 2 are related, but the 3rd one seems rather unrelated.

- ENF in Early European Farmers (and by extension in modern Europeans)
- ENF in modern Middle Easterners (related to the first one, but with significant differences)
- ENF in North Indian (and related) populations (which I doubt is ENF at all).

I'll write a few observations/questions later. If anyone wants to look at the spreadsheet (easier to the eye than the text file), it's here:

Mike Thomas said...

On what basis are you isolating ENF in modern Near Easterners? How can one discern what is "ENF" in a population lacking aDNA? (I'd treat Barcin as a Southeastern WHG - acculturated farmer.)

Alberto said...


I'm just calling it ENF based on what we usually call Near_Eastern component in admixture, like in K8.

We really don't know exactly how ancient Near Easterners were, but the clues we have point to, at least in the northern part of the Near East, them being quite EEF-like, though we'll need confirmation to make any statement about it.

But the point is not so much how ancient Near Easterners were, but rather how this ENF component used in K8 seems to be a mix of different things, one of them apparently quite unrelated to the others.

Pathans in K8 score some 45% Near_Eastern and some 1.3% WHG. However:

WHG LBK_EN Pathan Mbuti 0.0069 1.71

This stat makes no sense. Whatever ASI or East Asian could be in Pathans is missing in both LBK_EN and WHG, so they should not bias the stat significantly towards one or the other. However, if WHG has 0% Near_Eastern and LBK_EN has 70%, Pathans should be much closer to LBK_EN and the stat should be very negative. But it isn't.

One could think that Pathans might have much more than 1.3% WHG, but even then LBK_EN has 30% WHG, so unless Pathans had something like 70% WHG the stat shouldn't be positive. but:

LBK_EN Pathan WHG Mbuti 0.036 10.026

And then again:

WHG LBK_EN GujaratiA Mbuti 0.0107 2.409
WHG LBK_EN GujaratiD Mbuti 0.0136 2.961

As you go deeper into South Asia, the stat becomes more positive. Meaning it's not a possible high WHG admixture that makes it positive, but rather the almost absolute lack of Near_Eastern admixture (in spite of Pathans scoring 45% and Gujaratis something less - don't know exactly, but maybe 35-40%).

So it seems (unless someone else has a better explanation) that S-C Asian populations have very little Near Eastern admixture, and what we are calling ENF in them is something else.
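
The argument above depends on the sign convention of the D-statistic. As a minimal sketch, assuming the allele-frequency form D(W, X; Y, Z) = sum((w - x)(y - z)) / sum((w + x - 2wx)(y + z - 2yz)) used by ADMIXTOOLS-style tools, a test population Y that shares more drift with X gives a negative value, and one that shares more with W gives a positive value. The variable names and all frequencies below are invented for illustration; they are not the real WHG, LBK_EN or Mbuti data.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (four equal-length lists)."""
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    return num / den


# Hypothetical frequencies at four SNPs (placeholders, not real populations).
whg = [0.9, 0.1, 0.6, 0.4]
lbk = [0.3, 0.8, 0.2, 0.7]
mbuti = [0.5, 0.5, 0.5, 0.5]

near_lbk = [0.35, 0.75, 0.25, 0.65]   # test population resembling LBK
near_whg = [0.85, 0.15, 0.60, 0.45]   # test population resembling WHG

# Negative: the test population is closer to X (LBK) than to W (WHG).
print(d_statistic(whg, lbk, near_lbk, mbuti))
# Positive: the test population is closer to W (WHG) than to X (LBK).
print(d_statistic(whg, lbk, near_whg, mbuti))
```

Under this convention, a population carrying a large share of LBK-like ancestry would be expected to push D(WHG, LBK_EN; Pop, Mbuti) below zero, which is why a value near zero or positive looks surprising.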

Alberto said...

Another point that supports the northern part of the ancient Near East being EEF-like is the fact that this kind of admixture seems to have survived much better in the Caucasus than in their southern neighbours, in spite of having received an ANE rich admixture from another population. It's probably not by chance that G2a peaks in the Caucasus today.

Mike Thomas said...

I see, thanks Alberto

Chad Rohlfsen said...


I see similar things with admixture. South Indians will get 1-2% WHG and none in Pathans. It could simply be a two to three wave Neolithic, where the first is more Barcin like, and latter more basal than Bedouins.

Alberto said...


But the interesting point is not so much the WHG/ENF ratio, but rather the absolute amount of ENF. Or more precisely, the lack of it.

What I see is that there is some ENF in Pathans, but probably single digit instead of the 45% they score in K8. And as you move south, it becomes even less. GujaratiD might be < 5%. Of course it's difficult to estimate just like this, but I'd say that the K10 run in this post by David might be quite right when it comes to Middle_Eastern admixture:

The question is: if those populations don't have much ENF, what is it they have? I don't think it's more ANE, but rather a different component, something North Indian.

Though the question I wanted to answer in the first place was not that one exactly, though it's related. Basically I wanted to know if the Hindu_Kush cluster in the K10 above was something real, but after looking at the stats and thinking about it I think it's not possible to check it with formal stats. The ASI admixture in S-C Asians and the Sub-Saharan admixture in Near Easterners, plus both having ANE makes it impossible to know with this method.

Grey said...


1) if there was an early copper based connection between Yamnaya and various other locations and the miners married local women in those locations but also arranged long-distance marriages between them as well then that might explain the Yamnaya mtdna.

2) if there were two waves from the steppe and the first R1b wave was more of a catalyst wave i.e. small groups of trader/artisans, then they might adopt the language of the people they settled among

3) Iberia was chock full of all kinds of soft metals

4) if there were multiple waves and some of the later ones were more elite replacement than population then maybe BB could be responsible for Celtic genetics but a later wave responsible for the culture (specifically around the times of the shifts first to bronze weapons and later iron with the people sitting on good sources of bronze/iron being the ones doing the elite replacement)

Grey said...

Chris Davies

"Present-day populations of the Caucasus and Volga-Urals regions bear quite a number of HLA haplotypes of Northern African and Western Asian provenance which likely did not enter them recently."

This may be a dumb question but could some of this be archaic rather than specifically African?

(so it survived in the Caucasus rather than moved there)

Chris Davies said...

@ Grey - "This may be a dumb question but could some of this be archaic rather than specifically African?
(so it survived in the Caucasus rather than moved there)"

If the haplotype arrived in the Caucasus, say, 40kya it would be highly equilibrated. That is not the case; in fact it is in strong disequilibrium, suggesting more recent asymmetric migration, probably on the order of <15kya.
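
The reasoning here is that linkage disequilibrium between the alleles of a haplotype erodes with each generation of recombination, so an old arrival should look more "equilibrated" than a recent one. A rough sketch of that idea, assuming the textbook decay D_t = D_0 * (1 - r)^t under random mating; the recombination fraction and generation time below are placeholders chosen only for illustration, not values from the comment or from HLA studies, and selection on HLA is ignored.

```python
def ld_coefficient(p_ab, p_a, p_b):
    """Classic LD coefficient D = P(AB) - P(A) * P(B)."""
    return p_ab - p_a * p_b


def ld_after(d0, recomb_fraction, generations):
    """Expected LD remaining after `generations` of random mating."""
    return d0 * (1.0 - recomb_fraction) ** generations


YEARS_PER_GENERATION = 28   # assumption, illustrative only
RECOMB_FRACTION = 0.001     # assumption: hypothetical per-generation recombination fraction

for years in (15_000, 40_000):
    generations = years // YEARS_PER_GENERATION
    remaining = ld_after(1.0, RECOMB_FRACTION, generations)
    print(f"{years} years (~{generations} generations): "
          f"{remaining:.3f} of the initial LD expected to remain")
```

Whatever values are plugged in, the older arrival always retains less of the original disequilibrium, which is the direction of the argument.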

In the case of A33-B14-DR1 the haplotype is found in disequilibrated form only outside Africa. The haplotype itself is found in Africa, but the greatest frequency and diversity of recombinants of the haplotype's allelic components is generally within Africa.

Outside Africa, A*33:01-bearing haplotypes are generally limited only to the one mentioned above [A*33:01-B*14:02, or A33-B14 for short].

Whereas West Africa has several variants:-

Matt said...

@ Alberto:

Very weird bunch of stats, huh?

D(Lithuanian, Georgian; LBK_EN, Mbuti) D=0.009, Z=4.16
D(WHG, Georgian; LBK_EN, Mbuti) D=0.0082, Z=1.984
D(Lithuanian, Palestinian; LBK_EN, Mbuti) D=0.0328, Z=16.39
D(WHG, Palestinian; LBK_EN, Mbuti) D=0.0317, Z=7.909

Implies Lithuanian essentially no closer to LBK than WHG (although you could test that directly via D(Lithuanian,WHG;LBK_EN,Mbuti)) and that they are both closer to LBK_EN than Georgian and Palestinian.

The D stats with Palestinian and Syrian place them *really* far away from everyone. Syrian is even further from WHG than Pathan. Power of even a little African admixture? Or just something to do with ENF ancestry generally? Either way, seems like a super powerful effect on D-statistics.

(Reminds me of the pattern with the stats which David ran for me, D(Pop,Corded Ware;BedouinB,Mbuti), where the populations Ashkenazi Jew and Adygei were negative (less shared drift with BedouinB than Corded Ware) while Georgian was equally positive to Icelandic. These populations who were together classed as high ENF were not close via a D-stat. Although those stats weren't based on the transversions in CW, I think, but on Haak, which may have been high quality enough that that's not an issue.)

Also, I predict Sein will love these stats:

D(Lithuanian, Georgian; Pathan, Mbuti) D= 0.0051, Z= 3.294
D(RISE_baAfan, Georgian; Pathan, Mbuti) D= 0.0079, Z=2.512
D(Lithuanian, Iranian; Pathan, Mbuti) D= 0.0128, Z=7.764

At the same time, Georgian and LBK are closer to WHG than Pathan:

D(Georgian, Pathan; WHG, Mbuti) D=0.0108, Z=5.23
D(LBK_EN, Pathan; WHG, Mbuti) D=0.036, Z=10.026

And also consider (as you've pointed out):

D(WHG,LBK_EN;Pathan, Mbuti) D=0.0069, Z=1.71
D(WHG,LBK_EN;GujaratiA, Mbuti) D=0.0107, Z=2.409
D(WHG,LBK_EN;GujaratiD, Mbuti) D=0.0136, Z=2.961

Interesting that GujaratiA and D are more tilted towards WHG, relative to LBK_EN, than Pathan is. It's hard to see why that is.

I would say that's an indirect consequence of WHG to ENA affinity, where WHG is closer to ENA than LBK_EN is, as it seems unlikely the GujuratiA and GujuratiD have more actual ANE or WHG related ancestry than Pathans.

Chad Rohlfsen said...

SC Asians have a good amount of Bedouin like admixture.

Chad Rohlfsen said...

Gujarati are less Near Eastern than Pathans, and Pathans are less Near Eastern than Georgians. Not a real surprise. Tajiks may be significantly closer to Lithuanians than Pathans.

Alberto said...


Yes, indeed some weird results there. I think a good part of the weirdness of the ones with Syrian and Palestinian is because of the small amount of Sub-Saharan, especially when those populations are in positions A or B and Mbuti is in D. I didn't think about it, but probably Mbuti is not a neutral outgroup for Near Eastern populations.

The Pathan-Lithuanian connection is interesting. It's difficult to say how strong it really is, but I'm surprised it seems stronger than with Afanasievo (in indirect comparison with Georgian).

But for me the most surprising is the last part you commented on. Going to the basics of Admixture, by K8:

WHG: 100% WHG
LBK_EN: 30% WHG, 70% Near_Easterner
Pathan: 1.3% WHG, 45% Near_Easterner, 33.5% ANE, 15.5% South_Eurasian, 2% Sub-Saharan

I can't find any workaround that explains the stat where Pathan is closer to WHG unless its 45% Near_Easterner is completely different from the 70% Near_Easterner in LBK_EN.

And in general, it looks like the gene flow from West Eurasians to S-C Asians is quite limited (if GujaratiD has basically 0% WHG and it has even less LBK_EN affinity, we're talking about almost 0 gene flow, unless I'm missing something obvious).

I wonder... so why the affinity of S-C Asians with Lithuanians? Gene flow the other way around, but not mediated by Afanasievo??

Quite mysterious for me these results...

Alberto said...


SC Asians have a good amount of Bedouin like admixture.

But see this:

GujaratiD LBK_EN Palestinian Mbuti -0.0519 -18.272
Pathan LBK_EN Syrian Mbuti -0.0299 -11.908
WHG Syrian GujaratiD Mbuti 0.0244 6.87

While the Sub-Saharan and ASI might be influencing these stats, it still doesn't look to me like S-C Asians have much ENF (either Bedouin-like or EEF-like).

Chad Rohlfsen said...

Gujaratis have a good amount of ENA. You need to compare to other mixed pops. See here:

result: Mbuti Lithuanian Tajik_Pomiri Pathan -0.0129 -10.321 17050 17496 354212
result: Mbuti Georgian Tajik_Pomiri Pathan -0.0088 -7.393 17045 17347 354212
result: Mbuti LBK_EN Tajik_Pomiri Pathan -0.0125 -9.122 16963 17394 353603
result: Mbuti Kharia Tajik_Pomiri Pathan 0.0069 5.578 17079 16845 354212
result: Mbuti Vishwabrahmin Tajik_Pomiri Pathan 0.0056 4.889 17152 16960 354212
result: Mbuti Onge Tajik_Pomiri Pathan 0.0033 2.345 16918 16807 354212
result: Mbuti Yamnaya Tajik_Pomiri Pathan -0.0116 -8.859 17016 17415 352864
result: Mbuti Loschbour Tajik_Pomiri Pathan -0.0150 -7.655 16839 17350 351075
result: Mbuti BedouinB Tajik_Pomiri Pathan -0.0075 -6.282 16800 17052 354212
result: Mbuti BedouinB LBK_EN Pathan -0.0399 -24.303 16512 17886 353603
result: Mbuti BedouinB Spain_MN Pathan -0.0352 -16.885 16024 17194 343660
result: Mbuti BedouinB Loschbour Pathan -0.0161 -5.365 16675 17222 351075
result: Mbuti MA1 Tajik_Pomiri Pathan -0.0090 -4.755 12198 12419 253297
result: Mbuti Karelia_HG Tajik_Pomiri Pathan -0.0141 -7.545 16467 16940 341554
result: Mbuti Corded_Ware_LN Tajik_Pomiri Pathan -0.0129 -8.881 17014 17458 353010
result: Mbuti Bell_Beaker_LN Tajik_Pomiri Pathan -0.0129 -9.666 16750 17189 348317
result: Mbuti Atayal Tajik_Pomiri Pathan 0.0006 0.420 16986 16965 354212
result: Mbuti Karitiana Tajik_Pomiri Pathan -0.0055 -3.416 17001 17188 354212
result: Mbuti Cambodian Tajik_Pomiri Pathan 0.0003 0.249 16965 16954 354212
result: Mbuti Han Tajik_Pomiri Pathan -0.0009 -0.645 16944 16974 354212
result: Mbuti Lithuanian Kharia Pathan 0.0514 32.690 18436 16633 354212
result: Mbuti Georgian Kharia Pathan 0.0542 36.257 18409 16515 354212
result: Mbuti LBK_EN Kharia Pathan 0.0512 29.625 18333 16547 353603
***warning: repeated population: Mbuti Kharia : Kharia Pathan
result: Mbuti Vishwabrahmin Kharia Pathan -0.0135 -9.123 17293 17766 354212
result: Mbuti Onge Kharia Pathan -0.0477 -26.947 16576 18238 354212
result: Mbuti Yamnaya Kharia Pathan 0.0514 31.918 18381 16585 352864
result: Mbuti Loschbour Kharia Pathan 0.0391 16.950 18054 16697 351075
result: Mbuti BedouinB Kharia Pathan 0.0475 32.018 18026 16391 354212
result: Mbuti MA1 Kharia Pathan 0.0283 10.853 12916 12206 253297
result: Mbuti Karelia_HG Kharia Pathan 0.0427 18.239 17687 16239 341554
result: Mbuti Corded_Ware_LN Kharia Pathan 0.0541 29.863 18448 16556 353010
result: Mbuti Bell_Beaker_LN Kharia Pathan 0.0508 29.398 18102 16352 348317
result: Mbuti Atayal Kharia Pathan -0.0572 -30.730 16538 18544 354212
result: Mbuti Karitiana Kharia Pathan -0.0198 -9.848 17207 17902 354212
result: Mbuti Cambodian Kharia Pathan -0.0558 -32.426 16546 18502 354212
result: Mbuti Han Kharia Pathan -0.0552 -32.020 16549 18482 354212
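
For reading output like the block above, here is a minimal Python sketch that parses a `result:` line and recomputes D from the two count columns. It assumes the columns are four population labels, D, Z, two site-pattern counts and the SNP count, and that D is the normalized difference of the two counts. That reading is consistent with the numbers shown here, but it is an assumption, and the names `DResult`, `parse_result_line` and `d_from_counts` are hypothetical.

```python
from typing import NamedTuple, Optional


class DResult(NamedTuple):
    pops: tuple   # four population labels, in the order printed
    d: float      # reported D
    z: float      # reported Z-score
    count_a: int  # first site-pattern count (assumed column meaning)
    count_b: int  # second site-pattern count (assumed column meaning)
    nsnps: int    # number of SNPs used


def parse_result_line(line: str) -> Optional[DResult]:
    """Parse one 'result:' line; return None for warnings or other text."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "result:":
        return None
    return DResult(
        pops=tuple(fields[1:5]),
        d=float(fields[5]),
        z=float(fields[6]),
        count_a=int(fields[7]),
        count_b=int(fields[8]),
        nsnps=int(fields[9]),
    )


def d_from_counts(r: DResult) -> float:
    """Assumed relation: D = (count_a - count_b) / (count_a + count_b)."""
    return (r.count_a - r.count_b) / (r.count_a + r.count_b)


LINES = [
    "result: Mbuti Lithuanian Tajik_Pomiri Pathan -0.0129 -10.321 17050 17496 354212",
    "***warning: repeated population: Mbuti Kharia : Kharia Pathan",
    "result: Mbuti Atayal Kharia Pathan -0.0572 -30.730 16538 18544 354212",
]

for line in LINES:
    r = parse_result_line(line)
    if r is None:
        print("skipped:", line)
        continue
    print(r.pops, "reported D:", r.d, "recomputed D:", round(d_from_counts(r), 4))
```

Warning lines are skipped rather than treated as errors, so a pasted block can be fed through line by line.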

Chad Rohlfsen said...

Sorry, for the repeats. I'm multi-tasking a little too much.

Alberto said...

Thanks Chad.

But I think those results rather suggest that Pathans have little ENF (and they are at the top of S-C Asians; as you get into India it's less). For example:

Mbuti BedouinB Loschbour Pathan -0.0161 -5.365 16675 17222 351075

How can Pathans have much BedouinB, when Loschbour has 0% Bedouin admixture and it's still closer to Bedouin than Pathan?

Chad Rohlfsen said...

result: Mbuti Lithuanian GujaratiD Pathan 0.0211 14.578 17631 16904 354212
result: Mbuti Georgian GujaratiD Pathan 0.0208 14.848 17559 16845 354212
result: Mbuti LBK_EN GujaratiD Pathan 0.0240 15.670 17578 16754 353603
result: Mbuti Kharia GujaratiD Pathan -0.0213 -15.485 16742 17471 354212
result: Mbuti Vishwabrahmin GujaratiD Pathan -0.0142 -10.737 16938 17427 354212
result: Mbuti Onge GujaratiD Pathan -0.0157 -9.476 16718 17251 354212
result: Mbuti Yamnaya GujaratiD Pathan 0.0197 12.510 17558 16881 352864
result: Mbuti Loschbour GujaratiD Pathan 0.0155 6.945 17360 16830 351075
result: Mbuti BedouinB GujaratiD Pathan 0.0188 13.809 17258 16621 354212
result: Mbuti MA1 GujaratiD Pathan 0.0103 4.237 12462 12207 253297
result: Mbuti Karelia_HG GujaratiD Pathan 0.0168 7.478 16984 16424 341554
result: Mbuti Corded_Ware_LN GujaratiD Pathan 0.0198 11.594 17587 16904 353010
result: Mbuti Bell_Beaker_LN GujaratiD Pathan 0.0219 13.375 17330 16588 348317
result: Mbuti Atayal GujaratiD Pathan -0.0124 -7.378 16864 17288 354212
result: Mbuti Karitiana GujaratiD Pathan -0.0023 -1.270 17115 17196 354212
result: Mbuti Cambodian GujaratiD Pathan -0.0124 -8.073 16850 17274 354212
result: Mbuti Han GujaratiD Pathan -0.0127 -8.400 16841 17274 354212

Chad Rohlfsen said...

Because of the ASI, would be my guess. Look at them compared to ASI heavy folks. Much closer to Bedouins. Plus, it's Loschbour admixture into Bedouins, not Pathan admixture.

Chad Rohlfsen said...

For instance, Loschbour can go into Bedouins at 35-40%, maybe. Whereas it would be less than that for Pathans, and probably a little less Bedouin into Pathans than that figure for Loschbour. Dstats don't differentiate between direction of flow.

Chad Rohlfsen said...

More examples here. It doesn't matter the direction of flow. Loschbour like admixture in the Bedouins is stronger than any signal for a few pops. Either into, or from Bedouins. Look at the Caucasus pops! Even Yamnaya is further from Bedouins than Loschbour. Add a little Bedouin, Onge, and Atayal, then you have Pathans. That is why they are further. Not because they lack Near Eastern, but Loschbour is just a better fit into Bedouins than Bedouin is either way, for most of the others.

result: Mbuti BedouinB Yamnaya Loschbour 0.0039 1.178 16359 16233 349736
result: Mbuti BedouinB Karelia_HG Loschbour 0.0158 3.664 15544 15059 338485
result: Mbuti BedouinB Georgian Loschbour -0.0107 -3.551 16695 17057 351075
result: Mbuti BedouinB Lezgin Loschbour -0.0066 -2.166 16642 16865 351075
result: Mbuti BedouinB Kumyk Loschbour -0.0039 -1.273 16710 16841 351075
result: Mbuti BedouinB Nogai Loschbour 0.0096 3.086 16971 16648 351075
result: Mbuti BedouinB Atayal Loschbour 0.0641 18.520 18503 16272 351075
result: Mbuti BedouinB Onge Loschbour 0.0724 20.305 18638 16121 351075

Seinundzeit said...

Very interesting stuff. And yes, I am definitely pleased to see these stats (hat tip to Matt):

D(Lithuanian, Georgian; Pathan, Mbuti) D= 0.0051
D(RISE_baAfan, Georgian; Pathan, Mbuti) D= 0.0079
D(Lithuanian, Iranian; Pathan, Mbuti) D= 0.0128

Not to mention these stats, from Chad:

D(Mbuti, Corded_Ware_LN; Kharia, Pathan) D= 0.0541
D(Mbuti, Bell_Beaker_LN; Kharia, Pathan) D= 0.0508

D(Mbuti, Corded_Ware_LN; GujaratiD, Pathan) D= 0.0198
D(Mbuti, Bell_Beaker_LN; GujaratiD, Pathan) D= 0.0219

I think these provide nice verification of the qpAdm models.

Davidski said...

I have solid evidence that the qpAdm estimates of 60-70% steppe ancestry in the Hindu Kush might well be correct. I'll post it later today.

Seinundzeit said...

That sounds very interesting.

Alberto said...

Ok, so the explanation could be that ASI and Basal Eurasian are extremely distant from each other and have a huge effect on the stats?

So the stat:

D(WHG,LBK_EN;Pathan, Mbuti) D=0.0069, Z=1.71

could be explained by this other one:

LBK_EN WHG Dai Mbuti

being extremely negative?

Matt said...

Alberto: Ok, so the explanation could be that ASI and Basal Eurasian are extremely distant from each other and have a huge effect on the stats?

Could be (and the effect of even low level African admixture in the recent Near East, for the stats that involve those populations, via IBD, etc).

I think David has noted before that D-stats place huge weight on how early populations branch from one another in the phylogeny, relative to other measures, so populations that branch early (or have ancestry from population that branch very early) from one another can be very distant by this measure.

It's hard to get a handle on how big the effects are without literally running and looking at tons of different stats.

You could also look at:

D(WHG,LBK_EN;Pathan,Dai) or D(WHG,LBK_EN;Pathan,Onge)

Where the stat would probably find that LBK_EN would be closer than WHG to Pathan on the Pathan-Dai/Onge axis (e.g. Pathan-Dai axis / Pathan-Onge axis = what Pathan is after ASI admixture is taken into account).

And also D(WHG,Georgian;Pathan,Dai) and D(WHG,Georgian;Pathan,Onge)

Then again, these would only provide a very rough measure of ASI free closeness, and it might actually lose some "true" WHG affinity (because WHG shares some drift / phylogenetic position with Dai more than LBK_EN does, which is the same reason D stats D(WHG,LBK_EN;Pathan,Mbuti) are coming positive).

I think the evidence from the classic ADMIXTURE clusters and PCA has weight, as does the fact that what seem like essentially North Caucasus populations plus ENA on ADMIXTURE (e.g. Nogais) are systematically closer to WHG than Pathan is. But looking at single D-stats for such a central population, affinity to a single population can come from different sources, and I guess it is hard to adjust for each of them (that's an advantage of qpAdm, but you do have to feed it the informative comparison populations, and you can't have the same pop in left and right (or even closely related ones in both), which is its pitfall).

Alberto said...


Yes, I agree with all you said, and it seems difficult to really grasp the whole picture without spending hours doing/analysing tons of stats. They seem to be very sensitive to certain events (early branching, small amount of "exotic" admixture,...)

Since I don't have that possibility for now, from the stats above:

Mbuti Kharia GujaratiD Pathan -0.0213 -15.485 16742 17471 354212
Mbuti LBK_EN GujaratiD Pathan 0.0240 15.670 17578 16754 353603
Mbuti Loschbour GujaratiD Pathan 0.0155 6.945 17360 16830 351075

It seems that indeed ASI and Basal Eurasian are like cat and dog, while ASI and WHG get along significantly better. Again:

Mbuti Loschbour Kharia Pathan 0.0391 16.950 18054 16697 351075
Mbuti LBK_EN Kharia Pathan 0.0512 29.625 18333 16547 353603

Something to remember when looking at these kinds of stats in the future.

Davidski said...

Here it is...

Balaji said...


Thanks for bringing up some interesting discussion. Davidski had calculated some months ago some of the same statistics that you requested but with Chimp instead of Mbuti. I compare them below

D(WHG,LBK_EN;Pathan,Mbuti) = 0.0069 z = 1.71
D(WHG,LBK_EN;GujaratiA,Mbuti) = 0.0107 z = 2.409
D(WHG,LBK_EN;GujaratiD,Mbuti) = 0.0136 z = 2.961

D(WHG,LBK_EN;Pathan,Chimp) = -0.002 z= -0.659
D(WHG,LBK_EN;GujaratiA,Chimp) = 0.0025 z = 0.793
D(WHG,LBK_EN;GujaratiD,Chimp) = 0.0055 z = 1.69

The results follow the same trend. However the statistics are shifted in a positive direction for Mbuti with respect to Chimp. I think this suggests a small amount of LBK_EN admixture in Mbuti.
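
To make the size of that shift concrete, here is a minimal Python sketch that subtracts each Chimp-outgroup D from the Mbuti-outgroup D in the same row, pairing the three rows in the order listed above. It only reproduces the arithmetic of the comparison; it does not test the admixture explanation, and the variable names are hypothetical.

```python
# D values from the two lists above, paired row by row in the order given.
D_MBUTI = [0.0069, 0.0107, 0.0136]   # outgroup: Mbuti
D_CHIMP = [-0.0020, 0.0025, 0.0055]  # outgroup: Chimp

# Per-row shift: D with Mbuti as outgroup minus D with Chimp as outgroup.
shifts = [m - c for m, c in zip(D_MBUTI, D_CHIMP)]
mean_shift = sum(shifts) / len(shifts)

for i, s in enumerate(shifts, start=1):
    print(f"row {i}: shift = {s:+.4f}")
print(f"mean shift = {mean_shift:+.4f}")
```

The three differences come out between roughly 0.008 and 0.009, so the offset is close to constant across the three test populations.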

I have also given some of the other interesting numbers that Davidski calculated below.
D(WHG,LBK_EN;Dai,Chimp) = 0.0167 z = 4.828
D(WHG,LBK_EN;Motala_HG,Chimp) = 0.0948 z = 25.802
D(WHG,LBK_EN;EHG,Chimp) = 0.0571 z = 13.719

Dai prefer WHG to LBK_EN because LBK_EN has BEA.

The ENA in Pathans will shift the D statistics in a positive direction, but to do so appreciably will probably require more than the 9% that Davidski has estimated using qpAdm. The approximately 30% ASI estimated by the Reich lab should do the trick.

Sein loves the following statistic.

D(Lithuanian, Georgian; Pathan, Mbuti) D= 0.0051, Z= 3.294

Part or perhaps all of the reason for this statistic is the ASI in Pathans. This makes them prefer Lithuanians who have less BEA than Georgians.

Chad Rohlfsen said...

But, then there's this...

result: Mbuti Lithuanian Georgian Pathan -0.0181 -14.882 16997 17622 354212
result: Mbuti Lithuanian Armenian Pathan -0.0155 -12.174 17066 17605 354212
result: Mbuti Kharia Georgian Pathan 0.0251 20.835 17366 16515 354212
result: Mbuti Kharia Armenian Pathan 0.0270 22.271 17427 16511 354212
result: Mbuti Kharia Lithuanian Pathan 0.0162 12.346 17180 16633 354212
result: Mbuti Kharia Yamnaya Pathan 0.0141 8.526 17059 16585 352864
result: Mbuti Onge Georgian Pathan 0.0208 14.341 17194 16495 354212
result: Mbuti Onge Armenian Pathan 0.0218 15.112 17244 16506 354212
result: Mbuti Onge Lithuanian Pathan 0.0107 6.974 16993 16631 354212
result: Mbuti Onge Yamnaya Pathan 0.0086 4.266 16869 16583 352864

Balaji said...

Thanks Chad for providing these statistics. They suggest that Pathans must have significantly more ENA than the 9% in Davidski's qpAdm modeling of them.

Chad Rohlfsen said...

I agree. The South Asian cluster, no matter if it is 74% or 88%, always comes out just over 60% ENA. I've checked three unsupervised tests. Paniyas were at 74, 84, and 88% of this cluster. The amount dictates how much excess shows up in Onge, Papuan, and Atayal. I used Kharias (76% ENA on Admixture and qpAdm) to gauge the amount, in all three tests, and all three are about as identical as can be. Kharias always get over 60% of what the Paniya get (plus much more Atayal in excess), and Pathans are getting 33% of what the Paniya do, over and over. I think the real number for the Pathan is closer to 20% ENA.

Matt said...

Interesting stats. Although Lithuanians share more drift with Pathans than Georgians do on the Pathan-Mbuti axis, Georgians are much closer than Pathans to Lithuanians on the Lithuanian-Mbuti axis.

Or put another way, the Georgians have a bit more non-Pathan relatedness than Lithuanians, while the Pathans have quite a bit more non-Lithuanian relatedness than Georgians.

D(Mbuti, Lithuanian; Georgian, Pathan) D= -0.0181, Z= -14.882
D(Lithuanian, Georgian; Pathan, Mbuti) D= 0.0051, Z= 3.294
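
Since these two stats use the same four populations in different arrangements, here is a minimal Python sketch that checks whether two orderings are the same statistic up to sign. It assumes the standard definition in which the numerator of D(A, B; C, D) is built from (a - b)(c - d), so swapping within a pair flips the sign and swapping the two pairs changes nothing. The function names are hypothetical.

```python
def equivalent_orderings(a, b, c, d):
    """Map each reordering of D(a, b; c, d) that is the same statistic to its sign."""
    out = {}
    for (w, x), s1 in (((a, b), 1), ((b, a), -1)):
        for (y, z), s2 in (((c, d), 1), ((d, c), -1)):
            out[(w, x, y, z)] = s1 * s2  # pairs kept in place
            out[(y, z, w, x)] = s1 * s2  # the two pairs swapped
    return out


def relation(q1, q2):
    """Return +1 or -1 if q2 equals q1 up to that sign, else None."""
    return equivalent_orderings(*q1).get(tuple(q2))


first = ("Mbuti", "Lithuanian", "Georgian", "Pathan")
second = ("Lithuanian", "Georgian", "Pathan", "Mbuti")

print(relation(first, second))
print(relation(first, ("Lithuanian", "Mbuti", "Georgian", "Pathan")))
```

For the two stats above the check returns `None`: they pair the populations differently, so they are distinct statistics and one cannot be derived from the other by a sign change.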

At the same time, although they have more Lithuanian relatedness than Pathans, Georgians also seem to have more WHG-related drift than Pathans:

D(Georgian, Pathan; WHG, Mbuti) D=0.0108, Z=5.23

Also, Georgians are further from the LBK side of Europeans than WHG are:

D (WHG, Georgian; LBK_EN, Mbuti) D= 0.0082,Z = 1.984