Sunday, March 6, 2016

D-stats/4mix tour of ancient Eurasia

This 4mix experiment is based on a series of statistics of the form D(Chimp,Reference_pop/Test_pop)(Mbuti,X), where X represents one of 9 ancient and present-day outgroups. The input data is available here. Feel free to try it yourself and post your models in the comments below.
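For readers who want to see what a statistic of this form measures, here is a minimal sketch of the usual allele-frequency version of the D-statistic, written as D(W,X)(Y,Z). The function name and the five toy SNP frequencies are made up for illustration; the datasheet itself was not produced with this code.

```python
def d_stat(w, x, y, z):
    """D(W,X)(Y,Z) from per-SNP allele frequencies (equal-length lists)."""
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        # Numerator: covariance-like term between the two frequency contrasts.
        num += (wi - xi) * (yi - zi)
        # Denominator: normalises the statistic into the range [-1, 1].
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Toy, made-up frequencies for five SNPs (NOT real data).
chimp = [0.0, 0.0, 0.0, 0.0, 0.0]
test_pop = [1.0, 0.8, 0.1, 0.6, 0.0]
mbuti = [0.2, 0.1, 0.3, 0.0, 0.5]
outgroup_x = [0.9, 0.7, 0.2, 0.5, 0.4]

d = d_stat(chimp, test_pop, mbuti, outgroup_x)
print(f"D(Chimp,Test_pop)(Mbuti,X) = {d:.4f}")
```

With this sign convention the value is positive when the test population shares more derived alleles with X than with Mbuti, which is why larger values in the datasheet are read as closer relatedness to X.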

Here's a Principal Component Analysis (PCA) based on the D-stats. As far as I can see, it makes very good sense. Click to enlarge.
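A PCA like this treats each population as a row of D-stats and looks for the directions of greatest variance. The sketch below finds only the first component, using plain power iteration so that it needs no extra libraries. The four populations and their values are invented for illustration, and this is an assumption about the general approach, not the code used for the plot.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every stat is centered on zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_pc(rows, iterations=200):
    """Return (loadings, scores) for the first principal component."""
    x = center_columns(rows)
    k = len(x[0])
    # Scatter matrix X^T X; its top eigenvector is the first PC direction.
    scatter = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    # Power iteration. The start vector must not be orthogonal to the answer;
    # starting on the first axis is fine for this toy data.
    v = [1.0] + [0.0] * (k - 1)
    for _ in range(iterations):
        w = [sum(scatter[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Hypothetical populations on a simple cline across three stats (NOT real data).
names = ["PopA", "PopB", "PopC", "PopD"]
stats = [
    [0.40, 0.30, 0.35],
    [0.38, 0.32, 0.35],
    [0.36, 0.34, 0.35],
    [0.34, 0.36, 0.35],
]

loadings, scores = first_pc(stats)
for name, score in zip(names, scores):
    print(f"{name}: PC1 = {score:+.4f}")
```

The third stat is constant across the toy populations, so it gets a loading of essentially zero. That is the same effect as a column that contributes no variance in a real datasheet.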

See also...

Yamnaya = Khvalynsk + extra CHG + maybe something else

PC/nMonte open thread


Matt said...

Nice. Will need to have a further look, but it looks like the stats fit what would be expected for position on the European PCA and for relatedness between populations and with ancients.

Vaguely on topic, and continuing from the last post, btw, I was using 4mix and target stats D(Chimp,Ancient)(Mbuti,Ref) for Ancients WHG, Samara, MA1 and Kostenki14 to try and estimate how much WHG is in the Early and Middle Neolithic farmers, by trying to fit the target stats with combinations of La Brana, Karelia, Satsurblia and BedouinB.

The results tended to be pretty consistent whether I used (La Brana, Karelia, Satsurblia, a copy of Satsurblia), (La Brana, Karelia, BedouinB, Satsurblia) or (La Brana, Karelia, BedouinB, a copy of BedouinB), within 5-10%. The Satsurblia and BedouinB were proxies for EN's ancestors before they picked up any WHG.

The results tended to come out at around 20% WHG in the Early Neolithic (more in Spain, slightly less in Hungary), 40% in the Middle Neolithic samples, and 30% in the Copper Age Iceman from Italy.

So I guess I'd estimate from that, that the Anatolian_Neolithic should have around 10% WHG, since the EN are supposed to have picked up another 10% in Europe, which gets us up to the 20% for Early Neolithic, and then another 10-20% by the Middle Neolithic, which finally gets diluted by half in the Yamnaya expansion to get to roughly the present day levels (with a little resurgence around the Baltic / Eastern Europe and Basque?).

Another thing I tried with those was to use regression equations on the results for WHG level vs the stats I had, to generate a set of estimated stats for what the pre-HG Neolithic farmers should be. That set of stats sat in the right place on a PCA, kind of parallel to the EN and WHG but displaced further away from WHG. And crucially, statistically that stat construct was more or less equally related to WHG and Samara_HG.

I'd like to see if the set of groups and stats you're using here can be used to make the same estimates at some point, as well!
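The kind of fit described in this comment, where a target's stats are approximated by a mix of four reference populations, can be sketched as a grid search. This is only an assumption about the general idea: 4mix itself is an R script, and the Euclidean distance, the function names and the toy numbers below are all illustrative rather than taken from it.

```python
import math


def mix_distance(target, mixed):
    """Euclidean distance between the target stats and the mixed stats."""
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mixed)))


def four_mix(target, sources, step=1):
    """Try every percentage split (in `step` increments) over four sources."""
    names = list(sources)
    if len(names) != 4:
        raise ValueError("exactly four source populations are required")
    if step <= 0 or 100 % step != 0:
        raise ValueError("step must be a positive divisor of 100")
    best_weights, best_dist = None, float("inf")
    for a in range(0, 101, step):
        for b in range(0, 101 - a, step):
            for c in range(0, 101 - a - b, step):
                weights = (a, b, c, 100 - a - b - c)
                # Weighted average of the source stats, one value per outgroup.
                mixed = [
                    sum(w / 100 * sources[n][i] for w, n in zip(weights, names))
                    for i in range(len(target))
                ]
                dist = mix_distance(target, mixed)
                if dist < best_dist:
                    best_weights, best_dist = weights, dist
    return dict(zip(names, best_weights)), best_dist


# Hypothetical stats for three outgroups (NOT real data).
sources = {
    "A": [0.40, 0.30, 0.30],
    "B": [0.30, 0.40, 0.30],
    "C": [0.30, 0.30, 0.40],
    "D": [0.30, 0.30, 0.30],
}
target = [0.35, 0.32, 0.33]  # built as 50% A + 20% B + 30% C

weights, dist = four_mix(target, sources, step=5)
print(weights, f"@ D = {dist:.4f}")
```

Using the same population twice as a source, as with "a copy of Satsurblia", just means two of the four slots share one row of stats, so their percentages can be added together afterwards.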

Krefter said...

What outgroups did you use?

Grey said...

looking good

Matt said...

Hey, Krefter, I used the Outgroups:

Samara_HG, WHG*, Kostenki14 and MA1

* WHG was average of the stat for Bichon and Loschbour

This was with the D(Chimp,Ancient)(Mbuti,Ref) stats.

But I basically got the same result with the full set of Outgroups:

Samara_HG, Loschbour, Bichon, Kostenki14, MA1, LBK_EN, Kotias, Yamnaya


The difference was around +3% more La Brana in each group with the full set of stats.

Btw. The "Estimate 0% WHG Neolithic Farmer" I generated using regressions on the results from the fit with the full outgroups had the following stats:

D(Chimp,LBK)(Mbuti,Est0%)= 0.4493
D(Chimp,Loschbour)(Mbuti,Est0%)= 0.3692
D(Chimp,MA1)(Mbuti,Est0%)= 0.3616
D(Chimp,Samara_HG)(Mbuti,Est0%)= 0.3724
D(Chimp,Kotias)(Mbuti,Est0%)= 0.3942
D(Chimp,Yamnaya)(Mbuti,Est0%)= 0.3914
D(Chimp,Kostenki14)(Mbuti,Est0%)= 0.3824
D(Chimp,Bichon)(Mbuti,Est0%)= 0.3705

So more or less
- equally related to Loschbour, Bichon, and Samara
- equally related to Kotias and Yamnaya (and most strongly related to them)
- least related to MA1, with Kostenki14 showing intermediate relatedness, between the HGs and the Neolithic / Post Neolithic.
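The ordering summarised in the list above is easy to check by sorting the eight posted values. The numbers below are copied from this comment; the variable names are just for illustration.

```python
# Values as posted for D(Chimp,Ancient)(Mbuti,Est0%).
est0 = {
    "LBK": 0.4493,
    "Loschbour": 0.3692,
    "MA1": 0.3616,
    "Samara_HG": 0.3724,
    "Kotias": 0.3942,
    "Yamnaya": 0.3914,
    "Kostenki14": 0.3824,
    "Bichon": 0.3705,
}

# Highest stat first, i.e. strongest relatedness first.
ranked = sorted(est0.items(), key=lambda kv: kv[1], reverse=True)
for name, value in ranked:
    print(f"{name:<12}{value:.4f}")

# Spread across the three hunter-gatherer outgroups said to be about equal.
hg_values = [est0[name] for name in ("Loschbour", "Bichon", "Samara_HG")]
hg_spread = max(hg_values) - min(hg_values)
print(f"HG spread: {hg_spread:.4f}")
```

Sorted this way, the LBK stat comes out highest, followed by Kotias and Yamnaya, with the three hunter-gatherer stats within about 0.003 of each other and MA1 lowest.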

Krefter said...

The above percentages are the most accurate I've seen. In my spreadsheet Lithuanians didn't fit well as Sweden MN+Yamnaya. When I added WHG and SHG, it chose SHG, and Lithuanians came out 20%. I did the same for Basque, and they also chose SHG over WHG. So, it's really hard to say whether this means the high WHG affinity in Balts/Estonians is due to SHG or WHG admixture.


I don't understand the second part of your post. LBK was modeled as 0% WHG? And is equally related to EHG and WHG?

Matt said...

Krefter, sorry. To explain:

So I modelled the Early and Middle Neolithic as WHG plus Bedouin / Satsurblia as above. That gave results of around 20% WHG in Early Neolithic (more in Spain, slightly less in Hungary), 40% in the Middle Neolithic samples, and 30% in the Copper Age Iceman from Italy.

From that, I could estimate what the stats would be, for the outgroups, for an Early Neolithic population, if it had 0% WHG (using a regression line equation).

That's what the stats for D(Chimp,Ancient)(Mbuti,Est0%) that I posted are: an estimate of the value each stat would have for an ancient Neolithic farmer population if it had 0% WHG but was otherwise like the Early and Middle Neolithic groups.
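The regression step can be sketched like this: for each outgroup stat, fit a straight line through (WHG fraction, stat value) across the farmer groups and read off the intercept, which is the predicted stat at 0% WHG. The WHG fractions echo the rough 20/30/40% figures above, but the stat values and outgroup labels are invented for illustration, so this is a sketch under those assumptions rather than the actual calculation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fitted WHG fractions for three farmer groups (EN, Iceman, MN).
whg_fraction = [0.20, 0.30, 0.40]

# Hypothetical stat values for the same three groups (NOT real data).
stats_by_outgroup = {
    "Outgroup_1": [0.39, 0.40, 0.41],
    "Outgroup_2": [0.36, 0.35, 0.34],
}

est_zero_whg = {}
for outgroup, values in stats_by_outgroup.items():
    slope, intercept = fit_line(whg_fraction, values)
    # The intercept is the extrapolated stat for a farmer with 0% WHG.
    est_zero_whg[outgroup] = intercept
    print(f"{outgroup}: slope = {slope:+.3f}, Est0% = {intercept:.4f}")
```

Because this extrapolates beyond the observed range of WHG fractions, the estimates are only as good as the assumption that each stat changes linearly with WHG level.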

Matt said...

Btw, Krefter, as a last thing: using the stats for that virtual "0% WHG Neolithic farmer" population to generate proportions generates a pretty good PCA, like this -

The reason seems to be that the stats for the virtual Neolithic farmer can mix nicely with the Satsurblia stats to accommodate the Middle East, which the European EEF can't do because they are too related to WHG. It's a bit of an imperfect mix (judging by the D-stats) but sort of works.

I guess that's just virtual until they dig up and test the right bones though ;)

Davidski said...

OK, the link to the datasheet is now up, and I've also posted a few extra results.

Seinundzeit said...

This does seem to be the most accurate set of estimates we've seen yet, everything makes sense.

It'll be interesting to see Iranians in this context.

truth said...

Isn't it a bit redundant to have both a Caucasus HG component and Yamnaya? Aren't the latter also half CHG?

Davidski said...

Not at all.

Splitting Indo-European CHG from pre- and post-Indo-European CHG is essentially what this analysis is all about.

Rob said...


Do you suppose that Anatolian farmer-like people didn't reach India, or do you suspect it's subsumed somewhere within the yet-to-be-defined "Dravidian" component?

Also, can you throw in BA_Hungary as one of the ancients ?

Davidski said...

As far as I can tell, Kotias does have a few percent of Anatolia Neolithic-related ancestry. And so do the Karasuk guys. So the Anatolia Neolithic admixture that shows up in Indians in ADMIXTURE analyses is probably related to that.

I'm sure Krefter and Matt will try some models, including also with Hungary BA. I gotta run for the time being.

Krefter said...

I combined my outgroups with Davidski's. I have a CHG outgroup. When I tried to make modern Europeans and West Asians CHG/Yamnaya/EEF/WHG, the fits weren't good for West Asians and some South Europeans. But when I replaced CHG with Cypriot the fits improved.

IMO, in order to get accurate CHG percentages, you need a CHG outgroup. To get accurate Middle Eastern percentages in general, we need many Middle Eastern outgroups. We need close relatives of Cypriot as an outgroup, to know whether those Cypriot percentages are realistic. Cypriot in my test, is nothing more than Early Neolithic European with less affinity to WHG and EEF. The Cypriot ancestry percentages might not be from Cypriot-like people.

Krefter said...

Sicilians are probably mostly descended from non-EEF people from West Asia. Look at these results.

WestSicily: 28% CHG + 5% Yamnaya + 0% WHG + 67% Sardinia @ D = 0.024
WestSicily: 81% Cypriot + 4.99999999999999% Yamnaya + 4% WHG + 10% Sardinia @ D = 0.0055

In D-stats, they have a similar relation to North Eurasian hunter-gatherers (WHG, EHG, MA1) and non-West Eurasians as Early Neolithic, but their relationship to EEF is about as strong as North Europeans' relation to EEF. The only explanation is that they mostly descend from people who had a similar amount of Basal Eurasian as EEF but weren't EEF.

Cypriot is a much better Basal Eurasian/Near Eastern proxy than Sardinia and CHG. This is not the case for other Italians (Tuscan, Bergamo). Maybe if we had Cypriot outgroups, Cypriot wouldn't be a good proxy anymore.

Whether the Near Eastern (xEEF, CHG) ancestors of Sicilians were like Cypriot or not doesn't matter; all we know is they probably lived in the Levant or maybe Mesopotamia. CHG makes the Caucasus unlikely, and North Africa/Arabia is unlikely because Sicilians have very little Sub-Saharan ancestry.

Davidski said...

Whether the Near Eastern (xEEF, CHG) ancestors of Sicilians were like Cypriot or not doesn't matter; all we know is they probably lived in the Levant or maybe Mesopotamia. CHG makes the Caucasus unlikely, and North Africa/Arabia is unlikely because Sicilians have very little Sub-Saharan ancestry.


The problem you're talking about is indeed caused by Sub-Saharan ancestry in much of the Near East and North African/Near Eastern ancestry, which includes Sub-Saharan ancestry, in much of Southern Europe. That's because Sub-Saharan ancestry is phylogenetically very distinct, so even when it's minor, it has a significant impact.

So don't worry about CHG or Cypriot outgroups for now; try modeling most Near Easterners and Southern Europeans as partly Bedouin, Mozabite and Yoruba.

But it won't be easy. That's why I only attempted to model a few Near Eastern populations, and left out Southern Euros like Sicilians.

George Okromchedlishvili said...

Great Stuff!

My only suggestion is to use something Sintashta-like for Caucasus pops. I believe that a large degree of the Northern Euro-like ancestry they carry came relatively late, when there was enough blondism in the IE-carriers. So Yamnaya is a little too early IMO.

Matt said...

Ah. Interesting choice of outgroups / stats: Eastern_HG, Esperstedt_MN, Han, Hungary_EN, Iberia_Chalcolithic, Iberia_EN, Iberia_Mesolithic, Iberia_MN, Ju_hoan_North, Karitiana, LBK_EN, Motala_HG.

That covers most of the bases except a CHG outgroup. I kind of don't feel you need as many EN / MN to get correct results and I'm not sure if having as many in will change the results a little (have to run a few experiments using more and less to see how it systematically changes). Plus if they're not used as outgroups, they can be used in the 4mix run.

Personally I still feel I kind of prefer using one of EHG as a D-stat (outgroup) and then one in the run, and the same for the CHG. I do think a CHG outgroup is important, as I don't quite trust it to get it right without it (have to run a few checks on that though). Although I can understand that doing it your way here maximizes some sample sizes.

Looking at the datasheet and also looking at the results in PCA view, this is pretty clearly more useful than either just using the ancient stats or outgroup stats for working out South Asia and the populations with East Asian contributions. The only stat that really doesn't matter (contribute to any variance) is Ju_Hoan_North, because it's basically around 0 for all populations. I don't know if a Yoruba group wouldn't be helpful instead (since some ME populations seem to have actually West African admixture?), then use another African population in place of Yoruba in the 4mix run.

You can see why the combinations of the Karasuk_subset, CHG, AN and Dravidian India should work, since they can contain all the South Asian populations.

OTOH Ust Ishim would also seem like it should work with Karasuk subset, although fail with Sintashta. Dravidian seems a little below cline for a straightforward Dai+CHG mix, which no doubt reflects the slight divergences between the South Asian ancestors and the ENA outgroups.

Davidski said...

If you work out which of the outgroups are superfluous, and better used as test pops, I can edit the datasheet accordingly.

I'm hesitant to break up Caucasus_HG into Kotias and Satsurblia, because the latter is low coverage and has much less than 500K markers. The tests I ran gave me the impression that Satsurblia as an outgroup was a bit wobbly.

Davidski said...


Good fit for Iranians (probably shouldn't be taken too literally though).

[1] Target = 17% Anatolia_Neolithic + 6% Caucasus_HG + 25% BedouinB + 52% Kalash @ D = 0.0023


Best fit for East Sicilians that I've been able to find is this one:

[1] Target = 32% Anatolia_Neolithic + 40% Armenian + 8% Mozabite + 20% Poltavka @ D = 0.0047

Alberto said...

Thanks, these numbers look good. Though it's hard to say if they work equally well for all populations. For example, Spanish_Extremadura:

42% Anatolia_Neolithic + 40% Caucasus_HG + 15% Loschbour_WHG + 3% Yamnaya_Samara @ D = 0.0041

It looks like the minor SSA boosts the Caucasus_HG at the expense of Yamnaya. But no idea if that can be avoided with a different choice of outgroups.

I agree with Matt that having an EHG in the test pops would be nice too, or at least MA1.

I took a shot at what Karasuk_subset would look like:

7% Dai + 42% Okunevo + 44% Sintashta + 7% Caucasus_HG @ D = 0.0024

Or using only Okunevo + Sintashta:

43% Sintashta + 57% Okunevo @ D = 0.0107

Alberto said...

Probably what we need for that problem with SSA or ENA (in European populations) is to hack the script into a 5mix rather than choosing other outgroups. The script looks simple enough, but probably the one who wrote it would be the best option to make that change (or at least someone who does R scripting regularly).
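A 5mix, or a mix with any number of sources, mainly needs a way to enumerate every percentage split across the chosen populations. Here is a hedged sketch of that idea; it is not the 4mix script and not an edit of it, and the function names, distance measure and toy numbers are illustrative assumptions.

```python
import math


def compositions(total, parts, step):
    """Yield tuples of `parts` multiples of `step` that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(0, total + 1, step):
        for rest in compositions(total - first, parts - 1, step):
            yield (first,) + rest


def n_mix(target, sources, step=5):
    """Best-fitting percentage mix of any number of source populations."""
    if step <= 0 or 100 % step != 0:
        raise ValueError("step must be a positive divisor of 100")
    names = list(sources)
    best_weights, best_dist = None, float("inf")
    for weights in compositions(100, len(names), step):
        mixed = [
            sum(w / 100 * sources[n][i] for w, n in zip(weights, names))
            for i in range(len(target))
        ]
        dist = math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mixed)))
        if dist < best_dist:
            best_weights, best_dist = weights, dist
    return dict(zip(names, best_weights)), best_dist


# Hypothetical stats for four outgroups and five sources (NOT real data).
five_sources = {
    "S1": [0.40, 0.30, 0.30, 0.30],
    "S2": [0.30, 0.40, 0.30, 0.30],
    "S3": [0.30, 0.30, 0.40, 0.30],
    "S4": [0.30, 0.30, 0.30, 0.40],
    "S5": [0.30, 0.30, 0.30, 0.30],
}
five_target = [0.34, 0.31, 0.325, 0.305]  # 40/10/25/5/20 by construction

fit, fit_dist = n_mix(five_target, five_sources, step=5)
print(fit, f"@ D = {fit_dist:.4f}")
```

The number of splits grows quickly with each extra source, so a 1% step over five populations means several million candidates in pure Python. A coarser step, or a proper constrained least-squares solver, would be the practical choice.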

Davidski said...

I've put MA1 into the datasheet.

But I'm not sure how useful he'll be, because like I say, single low coverage genomes appear to produce wobbly results in this analysis.

I'll ask about turning 4mix into 5mix.

FrankN said...

@Dave: "Splitting Indo-European CHG from pre- and post-Indo-European CHG is essentially what this analysis is all about."

Not sure it's that easy. The Steppe provides a credible link between Balto-Slavic and Indo-Aryan. Tocharian can also be explained via that Steppe connection; its distinctiveness is potentially explainable by having taken up quite some East Asian substrate.
Germanic can be linked as well, via CWC and especially Unetice->Nordic BA.

So far, so good. But now come the problems:

- Anatolian languages, for their distinctiveness assumed to have been the first split out of PIE. Well, Hittite is known for substantial Hurrian influence, so its distinctiveness may possibly be explained by strong Hurro-Urartian substrate, rather than it being a very ancient IE branch. We have Central European domestic pigs, with genetic signature pointing to Germany, entering Armenia and Eastern Anatolia around 2000 BC, and can trace back intensive pig-breeding to Funnelbeakers and GAC. That's so far only pig aDNA - human aDNA from Kura-Araxes and Hittites would be a great help. Still, those BA pigs provide a tentative link between Balto-Slavic and Armenian/Anatolian. But, if Yamnaya is the culprit, i.e. the original source of PIE, why does your analysis have 0% Yamnaya with Armenians?

- Italic is commonly assumed to have already been spoken before the transalpine Urnfield expansion (12 cBC) that, as far as archeology tells, reached Northern Italy, but didn't cross the Apennines. Hence, Ital_Bergamo, with a reasonable Yamnaya admix, still fits the picture, and also provides a plausible explanation for the distinctiveness of Venetic. But further south, especially on Sicily, there is neither genetic trace of Yamnaya, nor AFAIK any archeological evidence of linkage to CWC, Unetice et al. Those Siculians and Messapians must have arrived over the Mediterranean, ultimately from Anatolia, the Levant or Cyprus, and heavily CHG loaded.

- Greek: Still waiting for BA/IA aDNA. But considering that modern Greeks probably received quite a bit of Slavic, possibly also Gothic genes, both of which would have been Yamnaya-loaded, your stats aren't really putting forward a strong case for them having become IEed from the Steppe. Same applies to Albanian, and thus possibly also Illyrian.

- Celtic: That great mystery. Early and well attested in NW Iberia, and in the Western Alps, but hardly a toponymic or epigraphic trace in-between, and the land connection blocked by non-IE (Proto-)Basques and Iberians. Seems they originally travelled by boat.
If available, I would love to see Ligurians, Swiss, Galicians and (North) Portuguese incorporated in your table as well.
So far, it seems that those Iberians with the highest Yamnaya to CHG ratio, i.e. Catalans and especially Basques, were the last to become IEed. That doesn't bode well for Vasco-Caucasian, but neither for the Steppe theory.

Let's put it like this: Your analysis seems to declare the Steppe theory as officially dead, and puts the focus back on Anatolia and/or Armenia. I don't believe in EEF having been PIE speaking (all those North African pastoralists happen to speak Afroasiatic), while Armenian stands a bit too isolated to make it a strong candidate for the IE Urheimat.

There is one idea I have, but before I elaborate on it, however, I would be grateful if you, or somebody else, could do an alternative run, where (a) Yamnaya is replaced by CWC or any of the CWC-like BA Steppe populations, and (b) the analysis includes BA Armenians, Cypriots, and Turkish subpopulations from areas where Anatolian languages were once spoken.

Alberto said...

Thank you David.

You are right that MA1 is not too useful. For example, in Norwegians trying to use MA1 in the place of Yamnaya (to act as EHG), the result doesn't make a lot of sense:

52% Anatolia_Neolithic + 0% Caucasus_HG + 30% MA1 + 18% Loschbour_WHG @ D = 0.0072

With S-C Asians it doesn't help much, though it doesn't hurt either. For example, Kalash take 2% when using MA1 in the place of Anatolia_Neolithic, but the result doesn't improve:

17% Dravidian_India + 46% Caucasus_HG + 2% MA1 + 35% Karasuk_subset @ D = 0.0026

I wanted to check the SSA in Iberia with this method, so I just went to model Iberians as a mix of Basque + North Italians + Yoruba:

68% Spanish_Pais_Vasco + 30% Italian_Bergamo + 2% Yoruba @ D = 0.0029

Spanish Aragon:
74% Spanish_Pais_Vasco + 25% Italian_Bergamo + 1% Yoruba @ D = 0.0025

Spanish Castilla - La Mancha:
42% Spanish_Pais_Vasco + 57% Italian_Bergamo + 1% Yoruba @ D = 0.0037

Spanish Catalonia:
66% Spanish_Pais_Vasco + 33% Italian_Bergamo + 1% Yoruba @ D = 0.0035

Spanish Extremadura:
53% Spanish_Pais_Vasco + 45% Italian_Bergamo + 2% Yoruba @ D = 0.003

Spanish Castilla y Leon:
69% Spanish_Pais_Vasco + 29% Italian_Bergamo + 2% Yoruba @ D = 0.0024

Spanish Valencia:
71% Spanish_Pais_Vasco + 28% Italian_Bergamo + 1% Yoruba @ D = 0.0028

Spanish Galicia:
28% Spanish_Pais_Vasco + 70% Italian_Bergamo + 2% Yoruba @ D = 0.0028

Krefter said...


With outgroups MA1, WHG, EEF, CHG, EHG, I modeled Europeans as MA1+HungaryEN+Bichon. ANE/MA1 scores were basically the same as in ANE K8, for modern and ancient samples. WHG scores were much lower and EEF much higher. Highest WHG was in Lithuanians at I think 33%. Everyone was better with some MA1 except Middle Neolithic samples.

Also, check out my spreadsheet. CHG fits well as Basal Eurasian+EHG. With a Basal Eurasian (a little more distant from Eurasians than Ust-Ishim), all West Eurasians are better off with WHG+EHG, not just WHG.

Alberto said...


Here are a few fits with CW, BB and Unetice for just a few populations. I'll try to run more as time permits, but if you are interested in some more specific ones, post them here. From this sheet provided by David:

Choosing any population from the left column as Target and then 4 other populations from that same left column as the admix populations.

33% Anatolia_Neolithic + 23% Caucasus_HG + 3% Loschbour_WHG + 41% Corded_Ware_Germany @ D = 0.0047
24% Anatolia_Neolithic + 24% Caucasus_HG + 2% Loschbour_WHG + 50% Unetice @ D = 0.0044

English Cornwall:
27% Anatolia_Neolithic + 3% Caucasus_HG + 5% Loschbour_WHG + 65% Corded_Ware_Germany @ D = 0.0058
12% Anatolia_Neolithic + 4% Caucasus_HG + 2% Loschbour_WHG + 82% Unetice @ D = 0.0045
6% Anatolia_Neolithic + 0% Caucasus_HG + 0% Loschbour_WHG + 94% Bell_Beaker_Germany @ D = 0.0062

33% Anatolia_Neolithic + 11% Caucasus_HG + 7% Loschbour_WHG + 49% Corded_Ware_Germany @ D = 0.0048
22% Anatolia_Neolithic + 12% Caucasus_HG + 5% Loschbour_WHG + 61% Unetice @ D = 0.0041
17% Anatolia_Neolithic + 9% Caucasus_HG + 4% Loschbour_WHG + 70% Bell_Beaker_Germany @ D = 0.0056

45% Anatolia_Neolithic + 55% Caucasus_HG + 0% Loschbour_WHG + 0% Corded_Ware_Germany @ D = 0.0048
45% Anatolia_Neolithic + 55% Caucasus_HG + 0% Loschbour_WHG + 0% Unetice @ D = 0.0048
45% Anatolia_Neolithic + 55% Caucasus_HG + 0% Loschbour_WHG + 0% Bell_Beaker_Germany @ D = 0.0048

25% Anatolia_Neolithic + 62% Caucasus_HG + 0% Loschbour_WHG + 13% Corded_Ware_Germany @ D = 0.0124
23% Anatolia_Neolithic + 63% Caucasus_HG + 0% Loschbour_WHG + 14% Unetice @ D = 0.0124
22% Anatolia_Neolithic + 63% Caucasus_HG + 0% Loschbour_WHG + 15% Bell_Beaker_Germany @ D = 0.0127

Alberto said...


But were those fits done with this latest results sheet provided by David? Here there is no EHG. MA1 has now been added by David, but the fits I tried above didn't prove too helpful.

Alberto said...

BTW, Cypriots don't take any Yoruba, but they do take some Dravidian_India (but no additional Kalash, Sindhi or Dai):

48% Anatolia_Neolithic + 47% Caucasus_HG + 4.99999999999999% Dravidian_India + 0% Kalash @ D = 0.0041

Simon_W said...


Hittite does have Hurrian loans, besides Indo-Iranian and Semitic ones, but I would presume the main substrate was Hattic, because that was the substrate language in the area where the Hittite empire was centered.

The lack of steppe-related admixture in modern Armenians has been noted many times before and been used as an argument against the steppe theory. However according to formal stats David presented on this blog Bronze Age Armenians did have some steppe admixture. The question remains if these spoke a language ancestral to Armenian...

As for Italic, there was the Protovillanovan culture that flooded Italy in the 12th century BC from southern Switzerland to the tip of Calabria and even entered northeasternmost Sicily. In pottery and funerary rites it had affinities to the central European Urnfield culture, especially to the eastern groups. I would say it's very likely that this was somehow connected to the spread of Italic. The question is if it just involved the Oscan-Umbrian group and if the Latino-Faliscan group must be explained otherwise, or if it included both these different groups. If Latino-Faliscan had a different origin it's still not quite clear which one. As for Venetic, some include it in Italic, others don't... but it surely belonged to the closer relatives of the other Italic groups. However, Venetic wasn't spoken in the surrounds of Bergamo. And note that according to David's table Tuscans (from central Italy) have just 2% less Yamnaya ancestry than Italians from Bergamo. Indeed there is this steppe admixture in the entirety of mainland Italy and Sicily, progressively getting weaker towards the south, but not a lot weaker, and nowhere as weak as on Sardinia. I'm not sure where you're getting the idea that there is no steppe admixture on Sicily. Sure there is strong West Asian and CHG in southern Italy and Sicily, but it's not clear why this must be from the Sicels and not from a more ancient substrate that spoke another language. The Messapians are said to be related to Illyrians and appear to have come from the southern Balkans.

As for Albanians, well the table says 22% steppe, that's not a negligible amount. This doesn't really suggest a non-steppic West Asian origin, although it's not completely favouring the steppe theory either. However, on David's Polishgenes blog there was recently posted a PCA with two ancient Montenegrin samples, one from the LBA, the other one from the Iron Age. While the former still had a quite southwest European-like position, low on steppe and low on CHG, the latter had a strong shift towards LNBA central Europe. And considering his date and location he's likely to have been an Illyrian.

I think the high Yamnaya to CHG ratio in non-IE Basques and late IE Catalans doesn't really speak against the steppe derivation of the Celtiberians – as long as the absolute amount of Yamnaya is lower in Basques. Only the absolute amount matters, as this can't be somehow negated by the simultaneous presence of moderate levels of additional CHG.

Rob said...

The conundrum is that the cremating proto-Villanovan culture is where Etruscan arises.
Rather, Italic appears in the less state tidied settlements around it, where inhumations persisted.
To me this suggests there were still some non-IE around Central Europe as late as LBA.

Krefter said...


I used outgroups: EHG, WHG, CHG, EEF, and MA1. I guessed the closeness someone from MA1's population would have with MA1. I guessed 0.47. I also tried 0.45, which got basically the same results. The 30% MA1 score for Norwegians doesn't make sense, I'll look into it. The scores

Krefter said...

The CHG scores are way off. IMO, no one south of the Caucasus is 30%+ CHG. IMO, CHG for the Caucasus is like WHG for Europe. Georgians are probably like 20-30% CHG. I also wouldn't be surprised if Yamnaya received CHG from a CHG+unknown Middle Eastern hybrid. Using only ancient outgroups, Yamnaya fits best with 10-20% EEF.

Davidski said...

Georgians are probably like 20-30% CHG.


Davidski said...

Krefter, have a look at this, and think about it carefully.

George Okromchedlishvili said...

That would make us plot near Sardinians which is not the case.
CHG would not be replaced to that large degree since it lived in relatively isolated areas.

Davidski said...

All of the analyses we've seen to date, including the PCA, suggest that Georgians are mostly CHG.

They're probably around 60% CHG overall.

Grey said...

"The Sea Peoples were conjectured groups of seafaring raiders,[1][2] usually thought to originate from either western Anatolia or from southern Europe, specifically from a region of the Aegean Sea."

The odd thing about the sea peoples is their attacks are recorded but the source cultures aren't which might imply the source cultures were trashed by somebody else as part of the same process.

If the farmers around the Black Sea had a maritime tradition and if they were being raided by IE could some of them have moved away by sea?

Matt said...

@ Alberto: MA1 is useful for Native Americans and Beringians. Long story short: he gives the right level of relatedness to EHG, without the excessive relatedness to other West Eurasians that using EHG would give. Conversely, it's less useful for West Eurasians for the opposite reason.

@ Davidski: Re: Satsurblia and using either of the CHG as an "ingroup" / "outgroup" on its own, due to the overlap, that makes a lot of sense. It's a shame though, as I do think it makes the modelling quite difficult for the Near East. It seems to me that when it comes to 4mix applying proportions, a lot of the modern Near Eastern populations who just don't really share that much with CHG (judging by the D(Chimp,Pop)(Mbuti,Kotias) stats) seem to end up picking up quite a lot of it. That's just because it's the least HG-admixed Near Eastern population, and it's relatively far away from the LBK_EN while still being recognisably Near Eastern, and there's nothing to tell the program that actually they're not as much like CHG.

Btw, I found just using LBK_EN instead of the full complement of EN+MN groups didn't change the fits for me, at least for Europe when I tested it.

Davidski said...

OK, hang on.

Kotias and Satsurblia are both real diploid genomes, so I can try and make them into pseudo diploids, and that way, at least in theory, I'll have two high coverage CHG sets.

Also, I'll run all of the MN/EN samples, except LBK_EN, as test pops and put Yoruba in as an outgroup.

Arch Hades said...

So it looks like the Southern Yamnaya sample has less EHG ancestry than the Samaran one.

Davidski said...

Yes, overall, but probably not ~20% more. That's probably due to deamination. The real figure is around ~10%.

In any case, they still form a clade and cluster together in PCA. So the PCA from Jones showing some of the Yamnaya clustering near North Caucasians was way off, probably due to a technical fault with projection.

No other PCA has shown anything like that.

Davidski said...

It seems to have worked.

I'll post a couple of new datasheets later today. Here are some interesting models:

Armenians: 20% Anatolia_Neolithic + 4% Caucasus_HG + 14% BedouinB + 62% Armenia_BA @ D = 0.0066

Georgians: 39% Anatolia_Neolithic + 38% Caucasus_HG + 13% Dravidian_India + 10% Andronovo_full @ D = 0.0032

Sardinians: 67% Anatolia_Neolithic + 18% Druze + 10% Loschbour_WHG + 4.99% Yamnaya_Samara @ D = 0.0069

Davidski said...

Here are the new datasheets with Caucasus_HG2 as an outgroup.

Simon_W said...

I'm not so convinced of that argument. This is a map of the main archaeological sites with Protovillanovan material:

Although there is quite a dense concentration in the later Etrurian heartland, it's not too obviously confined to that area. So we may also see here an IE, Italic substrate in Etruria.

And besides, Caere and Populonia had mixed inhumation/cremation rites already in the 9th century.

And at least for about 150 years during the Final Bronze Age, cremation used to be the predominant custom in almost all of central Italy, and is also well attested for southern Italy. The bounce back of inhumation customs occurred in southern and east-central Italy during the early Iron Age, in Latium vetus after ca. 830 BC.

Rob said...


Exactly - my point wasn't that cremation should = Etruscan, rather that cremation (influence from Urnfield) doesn't simply = IE, as a major chunk of it centres on Etruria.

Simon_W said...

I found it somewhat curious that in the original datasheet Azeris from Baku are the population with the largest CHG share, amounting to no less than 78%, far more than Georgians got in that analysis. If correct they might be the only still predominantly CHG population in existence.

Davidski said...

Can't reproduce that with the new sheet. I'm seeing stuff like this with the lowest distances (although not that low).

Azeri_Baku = 35% Anatolia_Neolithic + 25% Caucasus_HG + 24% Dravidian_India + 16% BedouinB @ D = 0.008

Might need a 5mix for some of these populations, otherwise it won't be possible to isolate the ancient components properly.

George Okromchedlishvili said...

My guess is that ENA admixture in Georgians and especially Azeris is inflating their assigned Dravidian scores.
On the other hand Azeris and Northern Iranians indeed look like a very good proxy for mostly CHG-derived folks.

Davidski said...

Yeah, now that we have an ancient Caucasian reference and outgroup, an ancient South Asian reference and outgroup would also be really useful to balance things out.

Gill said...

Speaking of which, whenever I include MA1/ANE as a component in various admixture experiments (in addition to Gedrosian/Caucasian/EHG/Arctic/etc), it only seems to draw significant admixture in South Asians.

I still think there's a high-ANE (possibly high ANE and "ASE") input population for South Asia we may not be accounting for, or it could just be that the ANE admixture in South Asia is different from other places.

Any thoughts?

truth said...

Seems like the CHG component correlates well with Y-DNA haplogroup J.

Krefter said...

BTW, Sindhi is listed in the spreadsheet twice. 4mix doesn't allow duplicate rows, so to run 4mix you have to delete one of the Sindhi rows.
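If deleting the row by hand gets tedious, a few lines of scripting can drop repeated population rows before the sheet goes into 4mix. This is a small sketch that assumes the population name is in the first column of a comma-separated file. The sample rows and values are made up, and the real datasheet layout may differ.

```python
import csv
import io


def drop_duplicate_rows(rows, key_index=0):
    """Keep the first row seen for each population name, in original order."""
    seen = set()
    kept = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[key_index]
        if key in seen:
            continue  # a later repeat of an already-kept population
        seen.add(key)
        kept.append(row)
    return kept


# Hypothetical three-column sheet with Sindhi listed twice (NOT real data).
sample = (
    "Population,Stat1,Stat2\n"
    "Sindhi,0.35,0.36\n"
    "Kalash,0.37,0.38\n"
    "Sindhi,0.35,0.36\n"
)

rows = list(csv.reader(io.StringIO(sample)))
clean = drop_duplicate_rows(rows)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(clean)
print(out.getvalue())
```

For a real file, the two `io.StringIO` objects would be replaced with files opened via `open(..., newline="")`.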

Davidski said...

There's a similar package to 4mix here that allows more than four populations in each test.

Matt said...

@ Davidski: Thanks for this. Don't have much time tonight, however, had a go using the new stats with the CHG_2 group. They generate pretty nice proportions that are as expected, with CHG, WHG, Yamnaya and Anatolia_Neolithic.

The proportions for the above look right - trying to graph them to a PCA is tougher though. Just putting those proportions above into a PCA generates -, because the PCA doesn't "know" to put close together the CHG and Yamnaya population, so instead puts the CHG much closer to the Anatolian_Neolithic (which as a proportion it is in correlation with), and populations like the Lezgins who model as 28 CHG, 0 WHG, 44 Yamnaya and 28 Anatolia_Neolithic get displaced far from populations like Georgian with 45 CHG, 0 WHG, 15 Yamnaya and 40 Anatolia Neolithic.

(One proportion that did seem to stand out was Sintashta seeming to have a bit more CHG than I would expect, although not hugely).

If I then take the proportions and manually transform the Yamnaya into 50:50 EHG:CHG, though, and add a 100% EHG group, it does look pretty nice -

(Proportions for both above -

(Btw, for the above PCAs I put a cutoff for the Druze's D statistic fit on the above, and anything with less fit I didn't include).
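
The manual transformation is just arithmetic: each population's Yamnaya share is split into EHG and CHG, and the CHG part is added to whatever CHG it already had. A small sketch using the Lezgin and Georgian proportions quoted above (the function name and the `ehg_share` parameter are made up for illustration):

```python
def split_yamnaya(props, ehg_share=0.5):
    """Re-express a Yamnaya share as EHG + CHG.

    props: dict with keys "CHG", "WHG", "Yamnaya", "Anatolia_Neolithic" (percent)
    ehg_share: fraction of Yamnaya treated as EHG; the rest goes to CHG
    """
    yam = props["Yamnaya"]
    return {
        "CHG": props["CHG"] + yam * (1 - ehg_share),
        "WHG": props["WHG"],
        "EHG": yam * ehg_share,
        "Anatolia_Neolithic": props["Anatolia_Neolithic"],
    }

lezgin = split_yamnaya({"CHG": 28, "WHG": 0, "Yamnaya": 44, "Anatolia_Neolithic": 28})
georgian = split_yamnaya({"CHG": 45, "WHG": 0, "Yamnaya": 15, "Anatolia_Neolithic": 40})
# lezgin   -> CHG 50.0, WHG 0, EHG 22.0, Anatolia_Neolithic 28
# georgian -> CHG 52.5, WHG 0, EHG 7.5,  Anatolia_Neolithic 40
```

After the split the two populations have nearly the same CHG and differ mainly in EHG and Anatolia_Neolithic, so a PCA on these columns no longer has to treat Yamnaya as an axis unrelated to CHG.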

With that in mind, I'm wondering if it would be worth trying a version of the datasheet which used EHG as one of the row populations, rather than as a column. I'm thinking it might be possible for Motala as a column contrasted against Iberian_Mesolithic, and on the other hand Karitiana as contrasted against Han, to between them implicitly serve as measures of EHG affinity.

Obviously not as good as having the actual population as a column itself but maybe worth a try. If you want to put up a version of the datasheet like that, I'd run through and try and see if it generates sensible proportions.

Another couple of things:

1. Looking at the EHG-WHG shift in these stats, I'd estimate there is probably around 10-14% WHG like ancestry in Anatolia_Neolithic. Although that could've gone to them well before the Neolithic, so the stats could be capturing pretty well the amount of WHG ancestry from different parts of Europe during the Neolithic.

2. For the Yoruba stats, it looks like this graphed against the Han stat:

The Mozabite (Northwest African) has a clear bump in relatedness to Yoruba, while the Middle East not so much.

Even still, the East Africans are low as Eurasians, and even the Esan_Nigeria (another group from Nigeria) at the high end has a far lower stat than e.g. West Eurasians have with Dai.

This could be because of something in how the stats D(Chimp,Reference_pop/Test_pop)(Mbuti,Yoruba) behaves and maybe D(ChimpA,Reference_pop/Test_pop)(ChimpB,Yoruba) would work differently if we had two sets of Chimpanzee to use.

Using 4mix and Caucasus_HG, Esan_Nigeria, Masai_Kinyawa and Anatolia_Neolithic as pops, to model the Near East (assuming no major EHG, WHG to perturb this, which is an assumption but 4mix only allows 4):

It seems like the Mozabites are the only ones to get a clear West African related signal (presumably West African -> Mozabite, but stranger things have happened), while the Southern Middle East folk prefer Masai, which could reflect Arabia -> East African or vice versa, or just be because there's no D-stat closely related to Masai in the analysis to stop them taking it to push them away from the ancient and modern Eurasians. At least in these stats, it seems like, e.g., BedouinB models better as 24 CHG, 13 Masai, 63 Anatolia Neolithic, than LBK_EN does as 100 Anatolia Neolithic.

Davidski said...


With Eastern_HG as a test pop.

Chad Rohlfsen said...

Here's a few, using my new Onge K9

Bell_Beaker_Germany = 23% Loschbour + 40% Anatolia_Neolithic + 25% Karelia_HG + 12% Kotias @ D = 0.0075
Corded_Ware_Germany = 20% Loschbour + 27% Anatolia_Neolithic + 34% Karelia_HG + 19% Kotias @ D = 0.0049
Yamnaya_Samara = 11% Loschbour + 10% Anatolia_Neolithic + 48% Karelia_HG + 31% Kotias @ D = 0.0094
Yamnaya_Kalmykia = 10% Loschbour + 11% Anatolia_Neolithic + 47% Karelia_HG + 32% Kotias @ D = 0.0051

Bell_Beaker_Germany = 19% Loschbour + 35% Anatolia_Neolithic + 7% Karelia_HG + 39% Yamnaya_Samara @ D = 0.0073
Corded_Ware_Germany = 13% Loschbour + 20% Anatolia_Neolithic + 5% Karelia_HG + 62% Yamnaya_Samara @ D = 0.005

Chad Rohlfsen said...

Thanks to Zeph, for the help.

Davidski said...

That's not bad, but the D-stats produce more precise results.

Corded Ware with the new sheet looks like this. Really impressive fits and correlation there.

Corded_Ware = 22% Anatolia_Neolithic + 0% Caucasus_HG + 7% Loschbour_WHG + 71% Yamnaya_Samara @ D = 0.0025

Corded_Ware = 26% Germany_MN + 1% Caucasus_HG + 3% Loschbour_WHG + 70% Yamnaya_Samara @ D = 0.0023

Chad Rohlfsen said...

Use these ones and see if the results are closer to mine. The other ones have a lot of SSA from deamination, which might be skewing them towards CHG and Yamnaya and away from EHG.

Corded_Ware_Germany I0104
Corded_Ware_Germany I0049
Corded_Ware_Germany I0103
Corded_Ware_Germany I0106
Corded_Ware_Germany I1532
Corded_Ware_Germany I1540

Davidski said...

They look basically the same, probably because most of the Corded Ware samples are UDG treated, so they don't have much deamination.

Corded_Ware = 22% Anatolia_Neolithic + 0% Caucasus_HG + 8% Loschbour_WHG + 70% Yamnaya_Samara @ D = 0.0032

Corded_Ware = 27% Germany_MN + 0% Caucasus_HG + 3% Loschbour_WHG + 70% Yamnaya_Samara @ D = 0.0032

Seinundzeit said...


I have to agree, this sort of analysis seems to be very robust.

For example, I wanted to see fits for South Central Asians using recent South Central Asian + South Asian populations. Sindhis fit quite well like this:

63% GujaratiA + 30% Brahui + 7% Dravidian_India, D=0.0026

Makes complete sense, in terms of what we know about the history of Sindh, and the general ethno-social dynamics at play in Sindhi culture.

Pashtuns fit very well as:

39% Tajik_Shugnan + 24% Brahui + 23% Kalash + 14% Dravidian_India, D=0.0014

I guess we can't take this too literally. But, it does account for the integration of Dardic peoples in the Pashtun ethnogenesis (Kalash), long term/intensive genetic links with Balochistan (Brahui), the close relationship between Pashto and Pamiri languages (Shugnan), and a history of gene-flow with groups from greater India like Doms, Gujars, and Kasabghar (Dravidian_India, I used them since the aforementioned groups are usually from the lower castes of Punjab and beyond).

Seinundzeit said...

Also, the importance of Dravidian_India for Central Asia, West Asia, the Caucasus, and southeastern Europe is very fascinating (but also surprising/puzzling).

Central Asia:

Pamiri Tajik (Shugnan)
58% Andronovo_subset + 31% Dravidian_India + 8% Caucasus_HG + 3% Anatolia_Neolithic @ D = 0.0072

West Asia:

37% Dravidian_India + 35% Anatolia_Neolithic + 22% Caucasus_HG + 6% Andronovo_subset @ D = 0.0094

52% Anatolia_Neolithic + 23% Dravidian_India + 21% Caucasus_HG + 4% Ulchi @ D = 0.0091


43% Anatolia_Neolithic + 27% Dravidian_India + 16% Andronovo_subset + 14% Caucasus_HG @ D = 0.0026


41% Andronovo_subset + 22% Anatolia_Neolithic + 21% Caucasus_HG + 16% Dravidian_India @ D = 0.0049

40% Andronovo_subset + 23% Anatolia_Neolithic + 21% Caucasus_HG + 16% Dravidian_India @ D = 0.0055

38% Caucasus_HG + 38% Anatolia_Neolithic + 13% Dravidian_India + 11% Andronovo_subset @ D = 0.0032


48% Anatolia_Neolithic + 41% Andronovo_subset + 8% Dravidian_India + 3% Caucasus_HG @ D = 0.004

Dravidian_India is a noticeable signal from the Balkans to Central Asia. And the pattern in the Caucasus is quite unique, with northern Caucasians having more than southern Caucasians.

By contrast, the South Asian signal disappears in the rest of Europe, including the southern portion.

For example:

83% Andronovo_subset + 17% Anatolia_Neolithic + 0% Dravidian_India + 0% Caucasus_HG @ D = 0.0129

77% Andronovo_subset + 23% Anatolia_Neolithic + 0% Dravidian_India + 0% Caucasus_HG @ D = 0.0152

66% Andronovo_subset + 34% Anatolia_Neolithic + 0% Dravidian_India + 0% Caucasus_HG @ D = 0.0095

72% Anatolia_Neolithic + 27% Andronovo_subset + 1% Dravidian_India + 0% Caucasus_HG @ D = 0.0106

This is something only South Asian aDNA can solve. Right now though, I think this is an indication that ASI isn’t really ENA. Rather, it might be a mix of a third Crown Eurasian lineage, Basal Eurasian, and ENA. This stream of ancestry obviously had a huge effect in West Asia and the Caucasus, as well as a noticeable effect on the portion of Europe which has recent genetic links with West Asia.

Onur said...


Dravidian_India absorbs the East Eurasian ancestry in some of the populations for which you provided stats. Case in point: the Turkish population shows 4% Ulchi despite having a few percent more total East Eurasian ancestry, so the rest of the East Eurasian ancestry of the Turkish population is absorbed by its Dravidian_India and this artificially inflates the level of Dravidian_India in the Turkish population (bear in mind that Dravidian_India includes both West Eurasian and East Eurasian genetic elements, so its excess includes some of the West Eurasian ancestry too).

Seinundzeit said...

Naturally, but the general pattern stands.

Looking at the stats, the effect you've described is rather negligible for all of these populations. Not to mention that the relative positioning of populations in relation to this component isn't explicable via ENA.

Also, Europeans with minor to substantial ENA still turn out 0% Dravidian_India. It only appears in Balkan populations, no other Europeans (with or without ENA). So, I think we are looking at a rather robust pattern.

Interestingly, Dravidian_India takes a lot away from CHG. This is something that can only be explained with the analysis of South Asian aDNA. I think that's what we really need right now, South Asian aDNA, and Upper Paleolithic/Mesolithic aDNA from the southern Near East.

Onur said...


Dravidian_India appears to absorb only certain kinds of West Eurasian and East Eurasian ancestries, the kinds that contributed significantly to the formation of South Asian genetics. So not all kinds of West or East Eurasian ancestries are absorbed by Dravidian_India. That is why East European populations with substantial East Eurasian ancestry can show no Dravidian_India.

I agree with you that more ancient DNA from West Asia and South Asia will help clarify the genetics of those regions better.

Chad Rohlfsen said...

Could take out Kotias and put in Karelia_HG? Kotias being 0% leads me to believe it'll get closer to mine.

Seinundzeit said...


For whatever it's worth, since this is based on d-stats, such confounds aren't much of an issue. If we were dealing with ADMIXTURE output and 4mix, this could be at play. But with this method, we are allowed much greater precision and accuracy.

Right now, I think Dravidian_India tracks a third Near Eastern component, distinct from both Anatolia_Neolithic and CHG, but much closer to CHG, with the possible inclusion of either ENA or an unidentified Crown Eurasian group (or maybe just more extra ANE/EHG in comparison to CHG).

But that's very speculative on my part. I guess we can only find out with some actual aDNA samples, which will be very exciting.

For the fun of it, Anatolia_Neolithic:

61% BedouinB + 39% Loschbour + 0% Dravidian_India + 0% CHG, D=0.0873

Not a great fit, but it makes one wonder about BedouinB.

Davidski said...


Corded_Ware = 22% Anatolia_Neolithic + 0% Eastern_HG + 7% Loschbour_WHG + 71% Yamnaya_Samara @ D = 0.0024

Corded_Ware = 27% Germany_MN + 0% Eastern_HG + 2% Loschbour_WHG + 71% Yamnaya_Samara @ D = 0.0021

Chad Rohlfsen said...

Hmm. Thanks!

Onur said...


I still think certain combinations of West and East Eurasian ancestries tend to inflate the Dravidian_India levels even without any contribution from South Asia. I will wait to see ancient DNA results from South Asia to see the real levels of South Asian-related ancestry in modern populations.

Krefter said...


To solve that issue we need South Asian outgroups. David can probably do that later. A series of D-stats in Africa, East Asia, America and Oceania will probably give us insights into those regions that no one currently has.

Davidski said...

Just posted this table in an update above...

Matt said...

@ Davidski: With Eastern_HG as a test pop.

Thanks. Here are quick PCA based on proportions, for the set of populations with a Han stat equal or lower than Karasuk_Subset and equal or higher than Mozabite:

Easy to read PCA:
PCA with full labels:

(The colour coded clusters were assigned by Past3's K-means function)

Proportions look OK; I'm not sure without an EHG stat it's quite finding exactly the right level of EHG-WHG -

I think you actually do need an EHG D-stat after all, unfortunately, to predict the EHG properly. To check, I ran a quick experiment with a regression equation predicted result for what the D(Chimp,EHG)(Mbuti,EHG) would be based on all the other stats (99% confidence), with a value of 0.4595. That allowed me to run EHG as a row and column at the same time. That gave fits that looked good again, but had the opposite result of what looked like too much WHG, probably because the regression prediction was too weak. I think there's no substitute for the direct stat.

Davidski said...

Thanks. I'm very happy for now that we have a method that actually works.

We'll probably soon see lots of ancient samples released, covering everything from the Upper Paleolithic to the Middle Ages, including some new EHG samples, so this will be easy.

I'll just keep adding test samples and outgroups as they appear online.

Alberto said...

I took a quick look at that other script that takes any number of populations. It goes through *all* the populations in the list to find the best match. But by just creating a source file with only the populations that you want to test as admixing ones, it seems to work. It's still quite a bit more compute intensive, though.

Here are sample files for Armenian (taken from Davidski's latest Dstats file). The source file contains 8 populations, and the target file is the same as 4mix but comma separated too (instead of using tabs).

So it can be run like:
getMonte('source_armenian.txt', 'target_armenian.txt')

The output gives the distance and the percentage of each population:

distance = 0.1653

Anatolia_Neolithic 51.95
Caucasus_HG 27.10
Yamnaya_Samara 6.75
Nganasan 5.35
Dravidian_India 4.90
Masai_Kinyawa 2.70
Loschbour_WHG 1.25
Dai 0.00
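
Building the restricted source file can be scripted. A sketch, assuming the datasheet is comma separated with a header line and the population name in the first column (the header line, the file names and the exact layout are assumptions; check them against the real sheet before relying on this):

```python
import csv

def write_source_subset(datasheet_path, source_path, wanted):
    """Copy the header plus only the rows whose first column is in `wanted`.

    Returns the sorted list of wanted names that were not found in the sheet.
    """
    wanted = set(wanted)
    found = set()
    with open(datasheet_path, newline="") as src, \
         open(source_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # header row with the outgroup names
        for row in reader:
            if row and row[0] in wanted:
                writer.writerow(row)
                found.add(row[0])
    return sorted(wanted - found)
```

Checking the returned list for misspelled or missing population names is worthwhile, since a silently absent source population would change the fit.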

huijbregts said...

@ Alberto
Some comments on my script 'nMonte':
1. It definitely is more time-consuming than 4mix.
The reason is that nMonte is a Monte Carlo simulation. The result has a statistical uncertainty.
If you terminate the simulation too soon, the result will have too much remaining uncertainty.
As nMonte in its present form is conceived as a general purpose replacement for 4mix,
it has got very ample run-time (better safe than sorry). Also the termination rule is hardly intelligent (stop after 1 million trials).
So when nMonte is used for special applications, it seems not impossible to improve on its run time.
2. In its present form nMonte expects that the datasheet is in percentages (57.0%), not in decimal format (0.57).
Therefore the output is in percentages and so is the 'distance'. So you should have read the distance as 0.1653% = 0.001653.
3. One of the advantages of nMonte is that it is possible to have a look at all the populations, this does not take more run-time.
Most of them will be zero. But don't forget to remove the target population from the datasheet!
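
To make the points above concrete, here is a generic Monte Carlo mixture fit in Python. It is not nMonte's actual algorithm: it is a plain random-swap hill climb written for illustration, and every name in it is hypothetical. It only shows why such a fit depends on how long it runs, why a fixed trial budget is a crude stopping rule, and how a distance computed on percentage-scaled data converts back to the decimal scale.

```python
import math
import random

def monte_carlo_mix(target, refs, slots=200, trials=20000, seed=0):
    """Fit `target` as a mixture of `refs` by random single-slot swaps.

    The mixture is a bag of `slots` population picks. Each trial replaces one
    pick at random and keeps the change only if the distance does not get worse.
    """
    rng = random.Random(seed)
    names = list(refs)
    ncols = len(target)
    bag = [rng.choice(names) for _ in range(slots)]
    # Running column sums of the bag, so each trial costs O(columns).
    sums = [sum(refs[n][i] for n in bag) for i in range(ncols)]

    def distance(col_sums):
        return math.dist([s / slots for s in col_sums], target)

    best = distance(sums)
    for _ in range(trials):  # fixed budget: simple, but not an intelligent stop rule
        k = rng.randrange(slots)
        new = rng.choice(names)
        old = bag[k]
        trial_sums = [s - refs[old][i] + refs[new][i] for i, s in enumerate(sums)]
        d = distance(trial_sums)
        if d <= best:
            bag[k], sums, best = new, trial_sums, d
    shares = {n: 100.0 * bag.count(n) / slots for n in names}
    return best, shares

def percent_distance_to_decimal(distance_pct):
    """Re-express a distance computed on percent-scaled data on the 0-1 scale."""
    return distance_pct / 100.0
```

With too few trials the bag has not settled and the shares still reflect the random starting point, which is the "remaining uncertainty" mentioned in point 1.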

Davidski said...

Here's what I got for the Kalash using huijbregts' program. It was really quick. I ran it on an 8GB RAM laptop.

Dravidian_India 42.20
Afanasievo 32.55
Caucasus_HG 15.40
Anatolia_Neolithic 9.50
Nganasan 0.20
Mezhovskaya 0.15

distance% = 0.001979

Davidski said...

And after taking out some of the pops to hone in more realistically on the Indo-Iranian expansions into the Hindu Kush.

Dravidian_India 38.7
Andronovo_full 27.6
Caucasus_HG 20.6
Karasuk_subset 9.4
Anatolia_Neolithic 3.7

distance% = 0.003453


Alberto said...


Thank you for writing this script. It's very useful indeed. Our only requirement was to be able to choose the source populations (rather than just finding the best possible match from a whole big list), so I wanted to warn the users of 4mix that they need to create a source list just with those specific populations. Once you know that, it becomes a great tool for us.

Yes, it's more compute intensive than 4mix, but certainly not something that any modern computer can't handle in less than a minute, so no worries there.

Alberto said...

Fascinating. I hope I'll get more time to play with it over the weekend. Now I just tested the effect of adding SSA and ENA to the basic 4 populations for Europeans. I used Spanish_Extremadura that has clear SSA admixture to test:

With 4 pops:
Distance: 0.017262
Anatolia_Neolithic 63.50
Loschbour_WHG 3.05
Caucasus_HG 15.05
Eastern_HG 18.40

With 5 pops:
Distance: 0.004354
Anatolia_Neolithic 59.60
Loschbour_WHG 10.10
Caucasus_HG 9.85
Eastern_HG 17.65
Esan_Nigeria 2.80

With 6 pops:
Distance: 0.000933
Anatolia_Neolithic 59.60
Loschbour_WHG 13.45
Caucasus_HG 10.10
Eastern_HG 11.10
Esan_Nigeria 2.60
Dai 3.15

So the score clearly improves and the proportions accommodate better. Dai is quite high, it would be interesting to know when and how it arrived there (if it's real). No extra CHG here (or few, since Yamnaya seems to score about 30% CHG and 15% Anatolia_Neolithic, while still 50% EHG).

Alberto said...

For the sake of experimenting, adding GujaratiD to the mix:

With 7 pops:
Distance: 0.0009
Anatolia_Neolithic 58.00
Loschbour_WHG 13.60
Caucasus_HG 7.25
Eastern_HG 9.00
Esan_Nigeria 2.00
Dai 0.00
GujaratiD 10.15

Slight improvement in score, and high GujaratiD at the expense mostly of "Yamnaya" (EHG + CHG) and Dai. So adding Yamnaya too:

With 8 pops:
Distance: 0.000855
Anatolia_Neolithic 55.70
Loschbour_WHG 13.10
Caucasus_HG 2.60
Eastern_HG 0.85
Esan_Nigeria 2.10
Dai 0.40
GujaratiD 8.15
Yamnaya 17.10

Another small improvement. Yamnaya takes most of the EHG + CHG, GujaratiD still stays quite high. I tried adding Nganasan as a 9th pop, but it got 0.00% and the rest remained about the same.

FrankN said...

@Matt et al.:

1. Would it be possible to replace Masai by Mota in the SSA calculations, in order to eliminate effects of CA and later contact along the western Indian Ocean coast?

2. Looking at your PCA (with the "artificial" EHG), Component 1 forms an almost perfect differentiation between Anat_Neol and Afanasievo, which may be interpreted as "East Med" vs. "Steppe" differentiation. From this cline, pops are either bent away "north-" or "southwards", with the northwards-bent pops essentially covering non-Med Western Eurasia, and the southward-bent pops covering the Mediterranean, Near East, Caucasus and SCA.
The northward-pulling pole is WHG (Loschbour), which sits reasonably close to the y-Axis (Comp. 2) of the PCA. The southern pole is approximated by CHG that, however, is located remote from the y-Axis with substantial "Steppe" pull. Looking at the "southern" pops close to the y-Axis (Azeri, Georgian, Iranian Jew etc.), I wonder whether UP aDNA from the southern Caspian refugium, or also the Persian Gulf, should it become available, wouldn't provide for better "southern" poles. I could well imagine such "Crown" (or basal?) Eurasian to be behind that Dravidian_India pattern described by Seinundzeit.

3. Alternatively, returning to our previous discussion, UI as possible proxy of the first, 100kya OOA migration is worthwhile further exploration. UI seems to provide basal genetic linkage between (South) East Asians and some Africans, including Mozabite, and you have previously shown it to be an opposite PCA pole to WHG (which, as Kostenki, may be understood as derived from the second, 50kya OOA migration).

Chad Rohlfsen said...


Could you provide a few runs of Beakers and Corded ware, including breaking steppe ancestry into EHG and CHG? Thanks!

Krefter said...


Mozabite and Moroccan are better African references than Nigeria. You should also put Cypriot in there. With David's spreadsheets, Spain needs Mozabite and Cypriot to get a good fit.

You can see fits I got for Spain here.


Once you added a CHG outgroup, CHG percentages went way down. So, I don't see how Georgians can be 50%+ CHG.

huijbregts said...

Could you provide a few runs of Beakers and Corded ware, including breaking steppe ancestry into EHG and CHG?
That is my question too.
As far as I understand, I can only do this if the datasheet has EHG and CHG as populations/rows and not as outgroups/columns.
So no, not with the present datasheet.

truth said...

The Mozabite or Cypriot levels seem inflated, because of the bidirectional effect, that is, mozabites themselves also have some farmer ancestry, which inflates the level. Same with the Selkup component, they seem to have a bit of European admixture, which inflates the levels of Selkup in many europeans. It's better to use more "basic" component that are not related to each other, ie. sub-saharans and near-easterns with as low SSA as possible.

FrankN said...

@Simon W:

Thanks for the corrections on Hittite and Hattic, and the extent of Proto-Villanovan influence in Italy. I am not too well informed on Mediterranean archeology, and wasn't yet aware of Proto-Villanovan settlement having been found as far south as Apulia and Sicily.

The Urnfield - Proto-Villanova link seems archeologically credible. The Urnfield phenomenon is certainly complex and draws from several roots, a.o. the CE Tumulus Culture (post-Unetice), and cremation cultures that during the MBA had spread from the Southern Balkans through the Carpathian Basin. However, an early fusion of both seems to have taken place with the Tyrolean Laugen-Melaun Culture, and that culture signifies the likely entry point into Italy (not necessarily one-way, Laugen-Melaun may equally have incorporated North Italian Terramare culture elements and transferred them north of the Alps into the Urnfield phenomenon; note also that Terramare was a late left-over of the pile-dwelling cultures that tended to dominate the whole Circum-Alpine region since the 5th mBC).
Strong Proto-Villanovan presence in Tuscany actually provides a plausible explanation for the substantial "Yamnaya" share there. Whatever the origin of Etrurians - early IE Anatolian immigrants, neolithic continuity, or some CA movements - none of these scenarios is likely to have included substantial Yamnaya-like DNA. Tuscany/ Etrurians, however, also demonstrates that even a quite substantial "Yamnaya-type" incursion didn't necessarily lead to adopting Indo-European.

The problem of the Italic languages is that they are in general too differentiated to support a common (Proto-Villanovan) linguistic base by 1150 BC, especially when considering the geographic proximity of Latin and Umbrian that should have promoted convergence rather than differentiation. We are only talking 600-900 years here (first attestations in the 6th cBC, most texts from the 3rd cBC), and we have quite a few modern references for how much IE languages can be expected to differentiate over such a short period (Low German vs. Dutch, Catalan vs. Langue d'Oc/ Provencal, Bulgarian-Serbian-Croatian, etc.).
Hence, my understanding is that most linguists accept a Proto-Villanovan overforming of some Italic languages, possibly including Osco-Umbrian, but tend to assume a more ancient origin. That original Indo-Europeanisation is thought to have originated from Illyria, possibly only reaching Southern and Central Italy, i.e. leaving out the Po plain and the Terramare culture there). Under such a scenario, EBA/ MBA Illyria must already have been speaking some kind of IE (proto-Illyrian), at a time when, as you said in reference to Dave's Polishgenes blog, their aDNA was still low on Steppe and CHG [btw, under such a scenario, i.e. Proto-Italic=Proto-Illyrian, the discussion to which extent Messapian is Illyrian-influenced becomes rather moot.]

FrankN said...

@Grey: Your reference to the Sea People is relevant here. First of all, it demonstrates that by 1200 BC, a number of major seabound migrations occurred. We only have a rudimentary idea of those migrations' extent, essentially restricted to the East Med, but AFAIK there is nothing to preclude similar movements also having occurred in the West Med. On the contrary, after 1200 BC, Sardinia and Galicia display a remarkable uptake in copper mining and long-range maritime copper export up to Scandinavia, where they replace EastMed (Cyprus, Attica) copper sources that still prevailed during the MBA.

Secondly, there has been tentative reconstruction of the Sea People's ethnonyms as follows:
- Denyen = Danaians (Greeks)
- Ekwesh = Achaeans (Greeks)
- Lukka = Lycians
- Peleset = Philistines (Palestine), originally possibly from Palaeste (Palase) on the Albanian Riviera
- Shekelesh = Siculi (Sicily, Italic)
- Sherden = Sardinians
- Teresh = Tyrrhenians (Proto-Illyrians, c.f. Tirana), and/or Etrurians
- Tjeker = Teucrians (Greeks, later settling the Troad)
- Weshesh = unclear, maybe Oscans (Italic)

To the extent that reconstruction can be trusted, we would be talking primarily, though not exclusively (e.g. Sardes), IE-speaking people. Shekelesh seems to represent an attestation of the ethnonym Siculi that pre-dates Proto-Villanova (though it is no proof that the Shekelesh already spoke Siculian, or any other form of IE).

Krefter said...


Cypriot is in the same family as EEF. I doubt it has a lot of EEF ancestry though. Cypriot has basically the same relationship to non-West Eurasians and European hunter gatherers as Anatolia_Neolithic did, although Cypriot lacks the close relationship Anatolia_Neolithic has to EEFs.

For many Europeans their affinity to Yamnaya is too low to explain their lack of affinity to EEF. They'd need 40% of Yamnaya to explain their lack of affinity to EEF and they'd need 70% EEF to explain their lack of affinity to Yamnaya and non-West Eurasians. The Yamnaya+EEF formula doesn't work. So, what they need is someone who is part of the same Near Eastern family as EEF but less related to EEF. Cypriot fits the bill perfectly, because it lacks the South Asian that Iranians have and the African that SW Asians have, and has a smaller amount of CHG than the Caucasus.

Then Mozabite is just there to explain minor North African ancestry in Iberia and Sicily. Mozabite isn't important to any other South Europeans.

The exception to this rule are Basque and Bronze age Hungary. They have around as much Yamnaya as most South Europeans, but also have a higher affinity to EEF and WHG. They don't need Cypriot to get a good fit.

Alberto said...


Distance: 0.000923
Eastern_HG 35.80
Germany_MN 28.10
Caucasus_HG 23.10
Iberia_MN 11.90
Dai 0.55
Loschbour_WHG 0.55

Distance: 0.003795
Eastern_HG 25.15
Germany_MN 58.40
Caucasus_HG 15.15
Iberia_MN 0.00
Dai 1.00
Loschbour_WHG 0.30

Adding Yamnaya_Samara didn't make any difference for CW, it just takes 1% and same score (which is a good fit). BB doesn't improve the score either (not so good fit), but it does take 49% at the expense of EHG and CHG and some Germany_MN.

Distance: 0.003561
Eastern_HG 1.05
Germany_MN 49.35
Caucasus_HG 0.00
Iberia_MN 0.00
Dai 0.25
Loschbour_WHG 0.50
Yamnaya_Samara 49.35


Yes, Iberians didn't take West African admixture directly from West Africans. We can find better fits with Moroccans, with Cypriots, etc... But it depends what you want to test. Sometimes other options are preferable.

Matt said...

@ Davidski, btw when I said this:

" I ran a quick experiment with a regression equation predicted result for what the D(Chimp,EHG)(Mbuti,EHG) would be based on all the other stats (99% confidence), with a value of 0.4595. That allowed me to run EHG as a row and column at the same time. That gave fits that looked good again, but had the opposite result of what looked like too much WHG, probably because the regression prediction was too weak. I think there's no substitute for the direct stat."

I've had a look at the data I used for that, and I have to admit that was wrong. Due to a copying error.

There was actually minimal change whether I used the set of stats you included with EHG as a row but not a column, or whether I included it as a row and a column and used a regression predicted value to substitute into the EHG. WHG was quite low relative to EHG in both cases and the changes were only very slight. So things might not change even with more EHG samples.

Food for thought, hopefully not overcomplicating things, or getting overenthusiastic and going down a blind alley, it looks (although I can't be sure) like WHG being low when EHG is allowed to vary freely (rather than be pegged to Yamnaya?) is a function of a little bit of WHG proportion increasing relatedness to WHG by a lot. The value of the D(Chimp,Loschbour_WHG)(Mbuti,Iberia_Mesolithic) is 0.5085, while D(Chimp,Anatolia_Neolithic)(Mbuti,LBK_EN) is only 0.4281. Or Iberia_MN has 0.4192 with Mesolithic and only 0.4241 with Anatolia_Neolithic, despite it seeming like it gets most of its ancestry from Anatolia_Neolithic, and is increased from Anatolia_Neolithic having only 0.3857 with Iberia_Mesolithic. Comparably, Yamnaya_Samara has 0.4212 with EHG, even though something like half their ancestry come from there, and the value of D(Chimp,Karelia_HG)(Mbuti,Samara_HG) in some other stats you ran for me was only 0.4676 (although that would still be a pretty huge value comparing present day people).

And maybe if that's true (WHG being very closely related to each other) that has something to do with very low population sizes and strong isolation during the late Upper Paleolithic...
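
For readers who want to see what goes into a single value like the ones above: a D-statistic of the form D(A,B)(C,D) can be computed from per-SNP allele frequencies as a ratio of sums. The sketch below uses the normalisation from the ADMIXTOOLS literature as I remember it. Sign and ordering conventions differ between tools, so treat it as illustrative rather than as the exact computation behind the datasheet. The function name is made up, and real pipelines also estimate standard errors, which this skips.

```python
def d_statistic(a, b, c, d):
    """D(A,B)(C,D) from four equal-length lists of per-SNP allele frequencies.

    Numerator:   sum over SNPs of (a - b) * (c - d)
    Denominator: sum over SNPs of (a + b - 2ab) * (c + d - 2cd)
    """
    num = 0.0
    den = 0.0
    for pa, pb, pc, pd in zip(a, b, c, d):
        num += (pa - pb) * (pc - pd)
        den += (pa + pb - 2 * pa * pb) * (pc + pd - 2 * pc * pd)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den
```

In the D(Chimp,X)(Mbuti,Y) setup used here, the value rises when X and Y share derived alleles that Mbuti lacks, so very high values between two hunter-gatherer samples are what you would expect if they share a lot of recent drift.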

huijbregts said...

@ Alberto
How did you get a line for Eastern_HG in your source files?
In the datasheets I have seen Eastern_HG only as outgroup/column, not as a row name.

Davidski said...


Once you added a CHG outgroup, CHG percentages went way down. So, I don't see how Georgians can be 50%+ CHG.

Well, how much CHG are you seeing in Georgians here? And keep in mind that Yamnaya has a lot of CHG, and Anatolia Neolithic might have some too.

Georgian = 41% Anatolia_Neolithic + 48% Caucasus_HG + 0% Loschbour_WHG + 11% Yamnaya_Samara @ D = 0.0139

Alberto said...


Davidski provided this link above with EHG as a row too:

huijbregts said...

That is Calvinball, but thanks.

Krefter said...

When Georgians are modeled as Cypriot+CHG+Yamnaya this is what they get.

Georgian= 65% Cypriot+ 24% CHG+ 10% Yamnaya @ D=0.0055

Cypriot probably has CHG and Yamnaya does, so maybe CHG reaches 40% at the max in this test.

I don't think we should model any West Asians as Anatolia_Neolithic+(), until we do lots of D-stats in West Asia. Anatolia_Neolithic might have been completely exterminated in West Asia. So, we don't know who's the reference to represent non-CHG ancestry in West Asia, which is needed to know CHG ancestry percentages.

Armenia_BA has the same non-EEF/CHG signal the Middle East is dominated by today. So, there were big migrations from the South to the Caucasus by 2000 BC that weren't from EEF-types.

Davidski said...

Of course Cypriots have CHG. They're a modern population with complex ancestry and can't be used to estimate ancient ancestry proportions in nearby groups.

FrankN said...

@Alberto, Dave: Thx for the new calculations.
Clearly, most of the European genetic structure north of the Pyrenees/ Alps/ Carpathians/ Balkans was established during the CA, with either CWC, or BB Germany, a MN-enhanced version of CWC, capturing some 70% or more of today's genetic pattern there. Moreover, CWC is nearly identical to Srubnaya, and quite close to the Andronovo subset (which seems to include some Afanasievo on top).

We may still trace some 13-15% of CWC/BB genes into Turkey (but shouldn't forget about IA Celtic migrations into Galatia and beyond), but at the latest with Cypriots, and, by extrapolation, most likely also Sicilians etc., that relation breaks. IOW, the Mediterranean demography is determined by very different patterns that are still quite obscure. The same applies to the spread of IE in the Mediterranean, which hardly can be linked to Yamnaya, CWC or BB_Germany DNA. I had been thinking about a CWC->Hittite->Anatolian->Greek/Illyrian/Italic scenario, but your calcs, Alberto, provide it with anything but genetic support.

As food for thought, have a look at the following article:

It discusses the spread of bronze-making out of the Aegean, ultimately the Levant, towards 3-4 "peripheries", namely (i) eastern and (ii) western Balkans, (iii) Italy, and (iv) Iberia (discussed only briefly). Periphery is also understood in a socio-political sense, i.e. as the elites taking over cultural (also linguistic?) patterns from the "centre". That spread is contrasted to the BB phenomenon emerging out of Iberia, including an interesting discussion about something he calls "Eastern Bell Beakers", a syncretistic culture developing between the W. Baltics and Moravia, with influence reaching as far as Finland, Moldova, Italy, and even back into the Aegean.
In this interplay of East Med bronze-making and BB, he sees Yamnaya, Corded Ware and Catacomb sidelined, either marginalised (CWC in Central Europe) or displaced eastwards (Yamnaya/ Catacomb).

I am not yet sure what to make of all this in terms of the IE question. There is a convincing Indo-Germanic trajectory through the steppe, starting somewhere between Lower Vistula and Middle Volga. And there are a lot of good genetic and archeological arguments for a CA/EBA Anatolian-Aegean-Italic-Iberian link. But I am having a hard time combining both into one convincing story about the genesis and spread of IE. For CHG being that link there are, for my taste, too many CHG-heavy populations speaking non-IE languages.

Davidski said...

By the way, Krefter, there's another issue that needs to be considered. Kotias and Satsurblia aren't identical. Kotias might actually have some Anatolia-related ancestry.

So a reference population made up of Kotias and Satsurblia might be underestimating Kotias-related, in other words late CHG, ancestry in modern Caucasians.

Davidski said...


Try this sheet with Karelia_HG as a test pop and Samara_HG as an outgroup. And try Hungary_HG instead of Loschbour. You should see more sensible results.

Rob said...

@ Frank N

Some interesting thoughts.
For Italic, have you thought about the Terramare or Apennine culture, which are considerably earlier, and are associated with a population drop in the Po regions, and possible movement south? Also, there are trans-Adriatic connections to consider, although the latter might be only pertinent to Messapic.

Ultimately, I see strong correlation between a form of more dispersed, decentralised hillfort society around the proto-Urban proto-Etruscan one, whose mobility and pastoralism could be linked with IE groups spreading through Italy. Who knows.

As for "I had been thinking about a CWC->Hittite->Anatolian->Greek/Illyrian/Italic scenario" I'd have found that impossible even without Alberto's calculations, as it makes no sense culturally. The problem is, there is no evidence for steppe invasions into Anatolia, to account for Hittite. Quite obviously, the genesis of Anatolian, and perhaps Balkan IE, cannot be accounted for by the Yamnaya model, unless aDNA shows this. But with the continued lack of aDNA from those regions, pertaining to a relevant period, and an obvious aversion to properly analysing the Kumtepe 6 sample, we will see no leads any time soon, but it will come eventually.

Rob said...

Oh, and there's no such entity as "Indo-Germanic"

Davidski said...

Okay, I'm gonna go ahead and edit this post so that the first set of results aren't taken as definitive.

The old input/output will still be available here...

FrankN said...

To add to the IE question, here is a rundown of what I have been able to find out about CWC over the last months:

-SE Baltic CWC concentrates on the coast, as sedentary farmers. HGs/ Aquatic foragers are displaced somewhat inland, with a quite distinct culture (Comb Ceramic) that emerged from the Pit-Comb Ware.

-Scandinavia/ W. Baltic: Battle Axe Culture as regional CWC variant. Similar to SE Baltics, Battle Axe sets forth earlier FB sedentary farming traditions, and contrasts with Pitted Ware (SHG revival, another Pit-Comb Ware offspring).

-Poland: While Kujawian lowlands remain in FB/post-Lengyel (MN) hands, CWC appears in the upland, with a typical grave good assembly including bows, i.e. HG equipment.

-Northern Saxony (Riesa County): Only sparse MN settlement, restricted to the Elbe valley north of Riesa (next settlement chamber around Dresden 45 km SE). CWC finds still comparatively sparse, but distributed more geographically widespread, including the uplands that lack traces of EN/MN occupation. Typical grave goods include axes and chisels. i.e. wood processing tools.

-Elbe-Havel (N of Magdeburg - Frankfurt/Oder) No significant CWC influence, continuity of local FB/ GAC groups.

-Elbe-Saale: CWC generally merging into the area's traditional multicultural pattern. Cemeteries, often in use since the EN (Karsdorf, Esperstedt, Quedlinburg) have CWC graves appearing next to FB, GAC, and BB burials. Some indication of new, separate CWC settlement in the Thuringian Basin.

-Franconia: CWC displays cultural continuity to MN - small hamlets, agricultural focus, community graveyards (e.g. Bergrheinfeld, RISE446, 35 graves)

-Netherlands/ Rhineland/ Lower Saxony: Strong continuity with preceding MN cultures (MC/ FB/ Wartberg), to the extent that finds could only be associated to MN or CWC based on AMS dating. Many Rhenish settlement finds - small houses (~20 m²), similar to contemporary dwellings in Franconia, and Wessex.

- Upper Rhine: Apparent settlement hiatus, so far unexplained.

-Switzerland: Continued use of lakeshore pile dwellings, but some break with preceding Horgen culture. Dense settlement indicates predominantly sedentary lifestyle.

What does that leave us with:
1. CWC in general seems to have merged into pre-existing cultural complexes, especially of the MC-FB tradition (including the “beaker” rite of communal drinking set forth by BB).
2. There is little evidence for CWC having promoted a nomadic-pastoralist lifestyle. Where HGs contrast to sedentary farmers (SE Baltic, Sweden), CWC represents the farming side. Yet, utilization of uplands (Poland, Saxony, German low mountain ranges) increases, as do cattle bones in Swiss CWC settlements.

3. It's hard to construct a CWC package. Kurgans? Not in Franconia, Switzerland, the Single Grave culture. Grave goods? Often (battle) axes, but not so in Poland. Flexed, gender differentiated burials? Not in Saxony. Even the eponymous Corded Ware is often lacking from Battle Axe burials.

It is tempting to qualify CWC as a phantom, but aDNA clearly attests immigration. There are signs of de-population, obvious on the Upper Rhine, but also very small "central settlements", e.g. in Franconia and on the Rhine in contrast to MC “mega-sites”. The newcomers seem to have rather benefitted from, than directly caused the depopulation, and tended to assimilate into pre-existing cultural contexts.
LBK and Rössen longhouses give way to small huts (20-30 m²), a tendency that already started with FB. An egalitarian society, with a lot of communal investment (stone-paved village roads, wells, wooden pathways through swampland), a “beaker” drinking rite designed to enhance that community spirit, and little social differentiation, except for a bit of amber ornaments here and there. This doesn’t look like a scenario from which to expect language shift, clearly not one of the “elite dominance” type. It rather feels very Puritan - though, those Puritans on the Mayflower…

Grey said...


"I had been thinking about a CWC->Hittite->Anatolian->Greek/Illyrian/Italic scenario"

Unless it has changed since, IIRC Davidski's stats linked Yamnaya with Afanasievo and Corded Ware with Sintashta, which implied (to me anyway) there was an initial Yamnaya stage which was the catalyst for, and eventually got overrun by, the later Corded Ware stage.

So just for the sake of this post labeling Yamnaya as PIE and Corded Ware as IE.

Then if the regions around the Black Sea were initially full of CHG farmers maybe the sequence was something like.

1) PIE develops adjacent to the CHG farmers and expands into the steppe displacing other steppe HGs but isn't yet strong enough to displace any of their southern farmer neighbors.

1a) maybe some peaceful expansion of PIE copper workers / horse traders along trade routes

2) PIE gets stronger and starts raiding / hassling the Black Sea farmers enough for some of them to decide to bug out -> wave of CHG farmers with CHG languages moving south including the eastern med looking for some nice safe mountains to hide in

3) PIE eventually overruns the weakened remnant of the Black Sea farmers creating a hybrid society of CHG farmers with a PIE elite (and languages)

4) Corded Ware IE displace the hybrid PIE/CHG populations who escape south into the east med like the previous wave


alternatively wave 3 doesn't happen as a wave - the Black Sea farmers just head for the nearest hills until the remainder on the flat land are weak enough to be taken over by the PIE and it's the resulting hybrid PIE/CHG culture that gets displaced by Corded Ware later (with the displaced PIE/CHG becoming the sea peoples?)


no doubt a bunch wrong with this but the idea of successive waves as the result of a domino effect is plausible imo

Rob said...


Nice summary (as always)

What about the subsequent period - the BA proper. Unetice, the Carpathian metallurgical centres, etc. Ie the rise of true chiefdoms, inequality and power relations ?

capra internetensis said...


If you had a scenario where a new group with a common origin and shared language and a compelling ritual/religious system settled over a wide area among Balkanized, quarreling previous groups*, even if a minority, it might be to the advantage of all concerned to assimilate to the newcomers rather than the other way around. Mere conjecture, I know little of the actual cultures under discussion.

*Assumed to be Balkanized and quarreling because that is the ethnographic norm for Neolithic farmers, to put it mildly.

Davidski said...


Is there a way to modify the nMonte script so that the results are arranged so that they can be copy pasted quickly into a spreadsheet directly from the screen?

Davidski said...

By the way, those of you running nMonte, try 10000 cycles. You can do this by opening the R file as a text file and changing the 1000 to 10000.

It produces much cleaner results for me now, and it's still pretty fast.

huijbregts said...

You mean setting a field separator to '\t'?
I will do it, but you know that reformatting an output is often more complicated than you expect.
As to the 10000 cycles, this indicates that the simulation has not sufficiently converged. So the runtime is not as ample as I thought. nMonte needs to burn a lot of computer cycles.

Davidski said...

Basically I'd like to get the output like this, or horizontally, but in the same way, if possible.

Anatolia_Neolithic 0.00
Caucasus_HG 8.60
Dai 0.00
Esan_Nigeria 0.05
Loschbour_WHG 0.00
Masai_Kinyawa 0.00
Ulchi 2.95
Yamnaya_Samara 88.40
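
For anyone who wants this layout without touching the R script, here is a minimal Python sketch of the requested formatting. It is not part of nMonte: the function names `format_mixture` and `format_mixture_row` are hypothetical, and the percentages are simply the ones listed above. It writes one population per line with a tab between name and value, or a two-row horizontal version, so the text pastes straight into spreadsheet cells.

```python
def format_mixture(results, sep="\t", decimals=2):
    """Return one 'population<sep>percentage' line per entry, sorted by name."""
    lines = []
    for pop in sorted(results):
        lines.append(f"{pop}{sep}{results[pop]:.{decimals}f}")
    return "\n".join(lines)


def format_mixture_row(results, sep="\t", decimals=2):
    """Horizontal variant: a row of population names, then a row of percentages."""
    pops = sorted(results)
    header = sep.join(pops)
    values = sep.join(f"{results[p]:.{decimals}f}" for p in pops)
    return header + "\n" + values


# Percentages copied from the example output in the comment above.
example = {
    "Anatolia_Neolithic": 0.00,
    "Caucasus_HG": 8.60,
    "Dai": 0.00,
    "Esan_Nigeria": 0.05,
    "Loschbour_WHG": 0.00,
    "Masai_Kinyawa": 0.00,
    "Ulchi": 2.95,
    "Yamnaya_Samara": 88.40,
}

print(format_mixture(example))
print(format_mixture_row(example))
```

Sorting by population name keeps the rows in the same order from run to run, which makes it easier to paste several runs side by side.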

Davidski said...

Actually, scratch that idea with the 10000 cycles; it doesn't improve the runs.

FrankN said...

@Capra: "If you had a scenario where a new group .."
You mean, like Afanasevo, with their package of sheep, wheat and arsenic bronze, who were instrumental to the emergence of the North Chinese Longshan culture after the preceding Yangshao culture's demographic collapse? Yup, they clearly made Chinese speaking IE! Whether they were also behind the appearance of cord-decorated ceramics in the Chinese plain is yet unclear, though.

OK, now seriously - the pattern isn't fully comparable. Afanasevo brought a number of innovations to China, Yamnaya not so. They came to an area that already possessed carts, domesticated horses, copper etc., and that had a quite well developed agriculture. Speculatively, Yamnaya could have had advanced cattle-breeding and dairying systems, but, with the oldest cheese-making evidence stemming from EN Poland, dairying in the CE plain shouldn't have been that underdeveloped either.

"Compelling ritual/ religious system" - like what? "We all drink from the same beaker" was popularised by Funnelbeakers, and taken over by GAC, then CWC, finally BB. There isn't any evidence for Yamnaya having propagated viticulture, nor hemp growing, so the brew would most likely have been from hazelnut, crab apple, or barley, possibly enriched with some poppy seed and wild-growing herbs. Nothing new, good old LBK and/or Ertebolle practice.

"Common origin and shared language", since Caucasians always lived next to Karelians? I start to believe that when you show me aDNA from, say, 4500 BC, which demonstrates a similar CHG-EHG mix as Yamnaya.
In fact, we don't have a solid pre-CWC baseline, and as such don't even know whether it was formed by Yamnaya people, or several, independent sources (e.g. East Baltic EHG, plus CHG from CT). We only have Bernburg-Culture mtDNA, with 30% WHG/SHG/EHG (U5a/b), and, for NW Germany, Blätterhöhle with more than 50% WHG, which both point at substantial HG shares in Funnelbeaker. The spread of Nordic Megalithism out of Denmark/ Holstein into N Germany and Poland indicates corresponding population movement between 3500 and 3300 BC. This was a movement of people with common origin, language and ideology that overformed existing MN cultures.

Whether those cultures were "balkanized and quarreling" is open to question. There was conflict, ending violently, between Baden groups expanding out of the Carpathian Basin, Michelsberg groups expanding out of the Paris Basin and the Rhineland, and GAC expanding out of Kujawia. The latter two joined forces with FB, and together drove back Baden, i.e. destroyed the Baden-derived Salzmünde Culture (Esperstedt_MN) around 3100 BC. Around the same time, GAC also absorbed CT. Thus, a hypothetical Yamnaya incursion would have had to deal with a coalition of three expansionist cultures, all possibly united by a much higher WHG/SHG role than in MN groups further south. In fact, it makes a lot more sense to interpret CWC as a continuation of that HG revival, or HG-turning-farming trend, than as a new phenomenon.

That isn't saying that these processes didn't also effect language change. They most likely did. But who contributed which element in this fusion of MN, WHG/SHG/EHG, and CHG, is difficult to determine.

huijbregts said...

I have saved a script temp_nMonte.R in my dropbox
Is this what you need? If so, I will rename temp_nMonte.R to nMonte.R

Rob said...


The changing arid / ameliorated cycles are well described in the steppe & Balkans between 5000-2000 BC

Have you come across anything for central and Northern Europe ?

Matt said...

@ Davidski, would it be possible to do a version of the "EHG as row not column datasheet" with BedouinB as a column stat, and not a row (or ideally, splitting the BedouinB into two subsets so you can have one of both)?

I'd be interested to see if this model can handle the recent Near Eastern / Middle Eastern admixture, and I think that's best tested with an explicit stat BedouinB.

(Also I want to see whether the models like this - - with BedouinB, ENA, WHG and EHG, get better or worse).

Davidski said...


Thanks, that's very useful.


Krefter said...


What Middle Eastern outgroups do you think would be best to express Middle Eastern diversity? Making all Middle Easterners 80%+ Cypriot isn't realistic. We know Middle Easterners have non-EEF and non-CHG, but heavily Basal Eurasian, ancestry, and so I'm asking what outgroups you think are best to decipher what this ancestry is.


Cypriot is a better ENF/Basal Eurasian proxy than Bedouin, because Cypriot has little or no African ancestry. When I took out EEF outgroups, all West Eurasians fit perfectly as Cypriot+HG/African/South Asian.

Davidski said...

Can't think of one.

You can try and make a proxy, until we get some ancient genomes from the Middle East.

Krefter said...

I'm just looking for trends. If we didn't have EEF genomes and did the same experiment for Europe, we'd find an EEF-trend via Sardinians, North Italians, and Iberians. ADMIXTURE did this before ancient DNA. I think D-stats of the form D(Chimp, West Asian)(Mbuti, West Asian), where West Asian=Bedouin, Lezgin, Cypriot, etc(not asking for you to do a test yet) can do this.

Learning more about Middle Eastern diversity can help learn what Basal Eurasian+UHG are, and therefore learn about Europe too. Whatever, I'm in no hurry. I might wait until I can do D-stats myself.

Davidski said...

You can work it out with 4mix/nMonte by finding a good three- or four-way fit for BedouinB including Anatolia Neolithic, Caucasus HG and one or two African populations.

Then use the Anatolia Neolithic/Caucasus HG proportions to guide you.

Whoever lived in the Middle East before the Sub-Saharan admixture happened here, they were very similar to Anatolia Neolithic, Caucasus HG and BedouinB, so this shouldn't be too difficult.
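
The fitting step described above can be sketched roughly as follows. This is not the 4mix or nMonte code itself, only a brute-force illustration of the general idea under stated assumptions: each population is a row of D-stats, a model is a set of percentages summing to 100, and the best model minimises the Euclidean distance between the target row and the weighted mix of reference rows. The function names, the 5% step and all the numbers in `refs` are hypothetical placeholders, not real D-stats.

```python
from itertools import product
import math


def mix_distance(target, refs, weights):
    """Euclidean distance between the target row and the weighted mix of reference rows."""
    total = 0.0
    for i, t in enumerate(target):
        mixed = sum(w * r[i] for w, r in zip(weights, refs))
        total += (t - mixed) ** 2
    return math.sqrt(total)


def best_fit(target, refs, step=5):
    """Try every percentage combination (in `step`% increments) that sums to 100."""
    best = None
    levels = range(0, 101, step)
    for combo in product(levels, repeat=len(refs) - 1):
        last = 100 - sum(combo)
        if last < 0:
            continue  # the percentages would exceed 100
        percents = combo + (last,)
        weights = [p / 100 for p in percents]
        d = mix_distance(target, refs, weights)
        if best is None or d < best[1]:
            best = (percents, d)
    return best


# Hypothetical rows of four D-stats for four reference populations.
refs = [
    [0.010, 0.020, 0.000, 0.005],
    [0.030, 0.000, 0.010, 0.000],
    [0.000, 0.040, 0.020, 0.010],
    [0.050, 0.010, 0.030, 0.020],
]

# Build a synthetic target as a known 50/30/15/5 mix, then try to recover it.
true_weights = [50 / 100, 30 / 100, 15 / 100, 5 / 100]
target = [sum(w * r[i] for w, r in zip(true_weights, refs)) for i in range(4)]

percents, distance = best_fit(target, refs)
print(percents, distance)
```

With real data the target would be the row for BedouinB and the references the rows for Anatolia Neolithic, Caucasus HG and the African populations. A real target will not be an exact mix, so the distance stays above zero, and that residual is the number to compare between competing models.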

Alberto said...

Yes, we need some ancient Near Eastern that is more basal than Anatolia Neolithic. For Asian populations it's not possible to model them well without using a modern Near Eastern.

Looking at the Kalash, the behaviour is interesting. Starting from David's model above:

"Dravidian_India" 39.1
"Andronovo_full" 29.7
"Caucasus_HG" 20.4
"Karasuk_subset" 7.7
"Anatolia_Neolithic" 3.1
distance = 0.003806

Adding Karelia_HG improves considerably, but takes all the Andronovo:

"Dravidian_India" 41.9
"Caucasus_HG" 26.6
"Karelia_HG" 14
"Anatolia_Neolithic" 13.2
"Karasuk_subset" 4.3
"Andronovo_full" 0
distance = 0.002514

So it's ANE that it needs. Changing Selkup for Andronovo helps a bit, but not much, it just takes the Karasuk away too:

"Dravidian_India" 41.7
"Caucasus_HG" 27.2
"Karelia_HG" 14.7
"Anatolia_Neolithic" 14.2
"Selkup" 2.2
"Karasuk_subset" 0
distance = 0.002407

To really get a significantly better fit, you need to add a modern Near Eastern population. Iranian_Jew works best, but just to avoid neighbours, here with Lebanese:

"Lebanese" 31.2
"Caucasus_HG" 26.2
"Selkup" 19.2
"Dravidian_India" 12.5
"Karelia_HG" 10.9
"Anatolia_Neolithic" 0
"Karasuk_subset" 0
distance = 0.001213

Big improvement, and it does away with Anatolia Neolithic (while Dravidian goes much down and Selkup quite up). Lebanese provides some amount of SSA that Dravidian was providing before too, but it still needs a bit more. Adding Esan_Nigeria rounds things up:

"Lebanese" 33.4
"Caucasus_HG" 28
"Selkup" 26.9
"Karelia_HG" 9.8
"Dravidian_India" 1.2
"Esan_Nigeria" 0.7
distance = 0.000914

Which basically does away with Dravidian, and Selkup continues to go up to provide enough ENA and ANE. You can even just remove Dravidian and get the same (a tiny bit better):

"Lebanese" 33.5
"Caucasus_HG" 28.2
"Selkup" 27.7
"Karelia_HG" 9.8
"Esan_Nigeria" 0.8
distance = 0.000904

Maybe this is because without any ASI outgroup there's no way to differentiate enough ASI from other ENA, so Selkup just provides the needed ENA while providing enough ANE. But still, that would mean that with an Onge-like ASI ancient sample and a better ANE sample than MA1, we could remove the Selkup, split in some 60/40 (that would put Kalash at some 17% ENA/ASI).
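
The 17% figure can be checked with simple arithmetic, sketched here in Python. The assumption, which is my reading of the sentence above rather than something stated outright, is that the 60% part of the 60/40 split is the ENA/ASI share of the 27.7% Selkup in the last model, with the remaining 40% being ANE.

```python
selkup_pct = 27.7  # Selkup share in the final Kalash model above
ena_share = 0.60   # assumed ENA/ASI fraction of Selkup (the "60" in 60/40)

ena_pct = selkup_pct * ena_share  # part of Kalash ancestry counted as ENA/ASI
ane_pct = selkup_pct - ena_pct    # remainder, counted as ANE

print(round(ena_pct, 1), round(ane_pct, 1))
```

That gives roughly 16.6% ENA/ASI, which rounds to the "some 17%" quoted, and roughly 11.1% ANE coming from Selkup on top of the Karelia_HG share.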

But the interesting thing is that they need this more basal Near Eastern, a good amount of ANE and a bit of SSA to have good fits, and once they have that they reject any WHG-rich population. Which seems to agree with ANE_K8.

capra internetensis said...


Thanks for the info!

Chad Rohlfsen said...

I have the Onge. I'll see about joining in on this soon.

Alberto said...

For completeness after seeing David's post, I tried with Anatolia Neolithic instead of Lebanese, but adding Esan too:

"Caucasus_HG" 35.3
"Selkup" 31.3
"Anatolia_Neolithic" 18.9
"Karelia_HG" 11.3
"Esan_Nigeria" 3.2
distance = 0.001404

So that works quite decently too. Maybe we just need something a bit more basal than Anatolia Neolithic and the rest is just SSA admixture.


Yes, having Onge for S-C Asians will clearly help (so Dravidian and Onge could be used one as column, other as row). Have you tried in qpAdm adding SSA to models of S-C Asians? Here it seems required to get good fits, but in qpAdm I've never seen it (maybe because it's not possible because of them being used in the outgroups?).

FrankN said...

@capra: In my previous reply to you, I gave in to my sarcastic mood. Hope you didn't misunderstand that. You asked thought-provoking, and as such valuable and legitimate questions. I appreciate that, but probably didn't signal such appreciation particularly well (if at all).

I also still owe you thanks for earlier correcting me on the issue of African rice (a discussion to be continued elsewhere, when African genetics/ population dynamics are more in the foreground).

capra internetensis said...


No problem, I didn't suppose you meant any offence by it, having come across stronger than intended many times myself.

I hope we do have African results to discuss soon, there have been some good recent papers on South Africa, but West and Central Africa are certainly due for more work (a truckload of Y chromosome full sequences would be a good start).

batman said...


What happened to your original post on the 4-way stat, as of March 1st? Has your blog been hacked?
