Monday, May 11, 2015

4mix: four-way mixture modeling in R

Thanks to Eurogenes project member DESEUK1. A zip file with the R script, instructions and a couple of data sheets is available here.

So let's model Poles as a bunch of ancient genomes from Central and Eastern Europe using output from my K8 analysis.

Copy & Paste: source('4mix.r')


Copy & Paste: getMix('K8avg.csv', 'target.txt', 'HungaryGamba_EN', 'HungaryGamba_HG', 'Karelia_HG', 'Corded_Ware_LN')


After a few seconds you should see the results...

Target = 19% HungaryGamba_EN + 14% HungaryGamba_HG + 2% Karelia_HG + 65% Corded_Ware_LN @ D = 0.0062
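
For readers curious what a fit like this involves, here is a minimal sketch in R. It is not the 4mix.r source: the function name fit_four_way, the percentage grid of weights and the Euclidean distance are all assumptions made for illustration, and the real script may well compute D differently. The small component sheet at the end is made up.

```r
# Hypothetical re-implementation of a four-way mixture fit; NOT the 4mix.r code.
# sources: 4 x K numeric matrix (one row per source population, K components).
# target:  numeric vector of length K.
fit_four_way <- function(target, sources, step = 1) {
  stopifnot(nrow(sources) == 4, ncol(sources) == length(target))
  # All weight triples on the grid (in percent); the fourth weight is the remainder.
  s <- seq(0, 100, by = step)
  g <- expand.grid(w1 = s, w2 = s, w3 = s)
  g <- g[g$w1 + g$w2 + g$w3 <= 100, ]
  g$w4 <- 100 - (g$w1 + g$w2 + g$w3)
  gm <- as.matrix(g)
  # Each row of `mixed` is the component profile of one candidate mixture.
  mixed <- (gm / 100) %*% sources
  # Assumed distance: Euclidean distance between each mixture and the target.
  d <- sqrt(rowSums(sweep(mixed, 2, target)^2))
  best <- which.min(d)
  list(weights = setNames(unname(gm[best, ]), rownames(sources)),
       distance = unname(d[best]))
}

# Made-up 4 x 3 component sheet, purely for illustration.
sources <- rbind(A = c(0.9, 0.1, 0.0),
                 B = c(0.1, 0.8, 0.1),
                 C = c(0.0, 0.2, 0.8),
                 D = c(0.3, 0.3, 0.4))
target <- c(0.40, 0.35, 0.25)
print(fit_four_way(target, sources, step = 5))
```

Whatever the exact metric, a lower distance only means the chosen sources can reproduce the target's component proportions; it says nothing about whether the sources are historically plausible.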

Obviously the script can use ancestry proportions and/or population averages from any test, provided they're formatted properly. The accuracy of the modeling will depend on the quality of the input.

Update 19/05/2015: A new version of the 4mix script that can run multiple targets is available here, courtesy of Open Genomes.


Gökhan said...

Thanks a lot, David.
Here are my scores.

54% HungaryGamba_EN + 46% Corded_Ware_LN @ D = 0.5223

Actually, HungaryGamba_EN makes sense, but Corded Ware didn't make any sense for me.

Davidski said...


Why are you using Corded Ware? Why not something more relevant?

Gökhan said...

I think I misunderstood the subject. Yeah, you are right. I wish we had some ancient DNA from the Near East.

Gökhan said...

Yeah, that makes some sense now.

Target = 60% Assyrian + 0% Cypriot + 40% Georgian + 0% Armenian @ D = 0.0575

Seinundzeit said...

This is actually pretty cool. I tried to imitate one of the successful fits from qpAdm, and found something quite similar, using my results:

Target = 34% Yemen + 33% MA1 + 23% Yamnaya + 10% Dai @ D = 0.3681

Terrible fit, but quite similar to what one finds with qpAdm. In fact, the combined MA1+Yamnaya score is almost identical to what one qpAdm fit showed with regard to my total Yamnaya-related ancestry. Also, the qpAdm fit had me at 9% Dai, and this has me at 10%, so it's basically the same result.

The best fit so far is this:

Target = 92% Pathan + 7% Kyrgyz + 1% Druze + 0% Lezgin @ D = 0.0785

This isn't too bad either:

Target = 64% Afghan_Pashtun + 13% Kalash + 23% Punjabi + 0% Pathan @ D = 0.0798

Finally, this fit is cool:

Target = 42% Tajik_Pomiri + 36% Punjabi + 19% Iranian + 3% Kyrgyz @ D = 0.1047

Helgenes50 said...

Thanks David for this new tool,

[1] Target = 43% HungaryGamba_EN + 19% HungaryGamba_HG + 0% Karelia_HG + 38% Corded_Ware_LN @ D = 0.0089

[1] Target = 21% HungaryGamba_EN + 0% HungaryGamba_HG + 0% Karelia_HG + 79% Bell_Beaker_LN @ D = 0.0102

[1] Target = 41% Stuttgart + 19% HungaryGamba_HG + 1% Karelia_HG + 39% Corded_Ware_LN @ D = 0.0091

[1] Target = 43% HungaryGamba_EN + 19% Loschbour + 0% Karelia_HG + 38% Corded_Ware_LN @ D = 0.01

Like in K13, the same percentage

[1] Target = 22% Basque + 0% English + 78% English + 0% Norwegian @ D = 0.0196


[1] Target = 25% French_South + 31% English + 44% English + 0% Norwegian @ D = 0.0146

Now, my Best fits as Norman

[1] Target = 77% French + 0% English + 0% English + 23% Norwegian @ D = 0.0192
[1] Target = 20% French + 52% French + 5% French + 23% Norwegian @ D = 0.0192
[1] Target = 78% French + 0% English + 0% English + 22% Icelandic @ D = 0.0187

I remember that last year you got (in Detective...) from 18% to 21% Scandinavian.
That's very close.

PersonaMan said...

I followed the instructions but I just get this message:

Error in `[.data.frame`(tVec, , i) : undefined columns selected

Anyone know how to solve that?

PersonaMan said...

Never mind, got it working. I'm a numpty and forgot to save it as a tab-delimited file. xD
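
A quick way to catch this kind of mistake before calling getMix is to check that the file actually parses into more than one column. The helper below is only a sketch: check_tab_delimited is a hypothetical name, it is not part of 4mix.r, and it reads the file without a header because the real layout of target.txt isn't shown in this thread.

```r
# Hypothetical helper, NOT part of 4mix.r: confirm a file parses as tab-delimited.
check_tab_delimited <- function(path) {
  # header = FALSE because the actual layout of target.txt is an unknown here.
  df <- read.delim(path, header = FALSE, stringsAsFactors = FALSE)
  if (ncol(df) < 2) {
    stop("Only one column found in '", path,
         "'; the file is probably not tab-delimited.")
  }
  ncol(df)
}
```

If the file was saved with spaces or commas instead of tabs, everything lands in a single column and the check stops with a readable message.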

It's interesting how it much, much prefers having Corded Ware in addition to Bell Beaker rather than the two alone. It doesn't like just having Bell Beaker and it really doesn't like just having Corded Ware. Also seems to favour Esperstedt_MN over HungaryGamba_EN as well. Don't really know how it all works as such so perhaps this is meaningless waffle, but the lowest score I've managed to get so far is 0.0064, which is either:

1) Target = 34% HungaryGamba_EN + 15% HungaryGamba_HG + 15% Bell_Beaker_LN + 36% Corded_Ware_LN @ D = 0.0064

2) Target = 27% Bell_Beaker_LN + 32% Corded_Ware_LN + 8.00000000000001% HungaryGamba_HG + 33% Esperstedt_MN @ D = 0.0064
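
The 8.00000000000001% in the second fit looks like ordinary floating-point residue rather than a meaningful digit (that is a guess, since the script's internals aren't shown here). Rounding before printing tidies it up; format_pct below is a hypothetical formatter, not something taken from 4mix.r.

```r
# Hypothetical formatter, NOT taken from 4mix.r: round weights before printing.
format_pct <- function(x) paste0(round(x), "%")

# Weights as they appear in the second fit above.
weights <- c(27, 32, 8.00000000000001, 33)
print(paste(format_pct(weights), collapse = " + "))
```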

Unknown said...

The closest I could get for Yamnaya.

Yamnaya = 19% Kalash + 49% Samara_HG + 32% Lezgin = 0.1362

Yamnaya = 15% Kalash + 85% Corded_Ware_LN = 0.1089

Gill said...


first try with HRP0393 Haryana Jatt:

31% Pulliyar + 43% Balochi + 26% Yamnaya + 0% Abkhasian @ D = 0.0529


35% Hakkipikki + 37% Balochi + 28% Yamnaya + 0% Abkhasian @ D = 0.0329

Closest so far:
0% Hakkipikki + 22% Balochi + 28% Yamnaya + 50% GujaratiD @ D = 0.0067

Gill said...

Could someone write a script that used a limited number of populations in an input file and went through all permutations to find the least genetic distance?
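
Something along these lines could be sketched in R as follows. This is not an existing script: best_four_from_pool is a hypothetical name, the Euclidean distance is an assumption about how D is computed, and the default 5% weight step is coarser than the 1% steps seen in the results above, just to keep the search quick. Because every split of the weights is tried for each set of four, unordered combinations are enough and the order of the populations doesn't matter.

```r
# Hypothetical sketch, NOT part of 4mix.r.
# pool:   N x K numeric matrix, one named row per candidate population.
# target: numeric vector of length K.
best_four_from_pool <- function(target, pool, step = 5) {
  stopifnot(nrow(pool) >= 4, ncol(pool) == length(target))
  # Weight grid in percent; the fourth weight is the remainder.
  s <- seq(0, 100, by = step)
  g <- expand.grid(w1 = s, w2 = s, w3 = s)
  g <- g[g$w1 + g$w2 + g$w3 <= 100, ]
  g$w4 <- 100 - (g$w1 + g$w2 + g$w3)
  gm <- as.matrix(g)
  best <- list(distance = Inf)
  # combn() enumerates every unordered set of four rows from the pool.
  for (idx in combn(nrow(pool), 4, simplify = FALSE)) {
    mixed <- (gm / 100) %*% pool[idx, , drop = FALSE]
    d <- sqrt(rowSums(sweep(mixed, 2, target)^2))  # assumed Euclidean distance
    i <- which.min(d)
    if (d[i] < best$distance) {
      best <- list(populations = rownames(pool)[idx],
                   weights = unname(gm[i, ]),
                   distance = unname(d[i]))
    }
  }
  best
}
```

The number of sets grows quickly (choose(20, 4) is already 4845), so a limited input pool, as suggested, keeps the run time reasonable.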

Unknown said...

Corded Ware = 6% Lezgin + 52% Yamnaya + 42% Lithuanian = 0.0116

Highest I could find (if anyone can find better). Fun. :P

Krefter said...

What type of program should the 4-mix file be opened with? I opened it with notepad and it didn't work.

Krefter said...

This is the same idea as Oracle, except you pick only 4 pops that the test is a mixture of, right?

Alberto said...


You need a program called R:

Alberto said...

Lowest I found so far for Lithuanian (without using any modern European population, obviously):

Target = 8% Spain_MN + 31% Georgian + 2% Turkmen + 59% HungaryGamba_HG @ D = 0.0035

Davidski said...

Yep, this looks like the closest for Yamnaya, when not using Corded Ware.

Yamnaya = 19% Kalash + 49% Samara_HG + 32% Lezgin = 0.1362

There seems to be a cline that runs from Corded Ware Germany via Samara Yamnaya to the modern Hindu Kush.

That's probably not a coincidence.

Alberto said...

It can be improved a bit by adding HungaryGamba_HG:

Yamnaya = 35% Lezgin + 11% HungaryGamba_HG + 20% Kalash + 34% Samara_HG @ D = 0.1349

It seems like the Georgian-like population had quite a bit of WHG (more than Lezgins).

Matt said...

So, if this works like I think it does, taking Alberto's example:

Lithuanian average is 63% European, 19% Central Asian, 16% EEF.

Spain MN is 33% European, 66% EEF.
Georgian 5% Euro, 59% CA, 35% EEF
Turkmen 11% European, 33% CA, 19% EEF
Hungary_Gamba_HG is 100% European.

So combine those like 8%+31%+2%+59% and you get:

European=(33*0.08+5*0.31+11*0.02+100*0.59)=2.64+1.55+0.22+59=63.41, pretty close to Lithuanian average level
and so on across the rest of the components.
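
The arithmetic above is just a weighted average, which takes a couple of lines of R to check. The numbers are the ones quoted in this comment; the variable names are only for illustration.

```r
# "European" component (in percent) for each source, as quoted in the comment above.
european <- c(Spain_MN = 33, Georgian = 5, Turkmen = 11, HungaryGamba_HG = 100)
# Mixture weights from the Lithuanian fit (8% + 31% + 2% + 59%).
weights <- c(Spain_MN = 0.08, Georgian = 0.31, Turkmen = 0.02, HungaryGamba_HG = 0.59)

# Weighted average of the component across the four sources.
mixed_european <- sum(european * weights)
print(mixed_european)
```

Repeating the same sum for each of the other components gives the full mixed profile that is compared against the Lithuanian average.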

Seems like combining three almost pure populations of EEF, European and Central Asian would pretty much work best for West Eurasians, as the script would basically just set the levels of each to what the population average is in the K9 spreadsheet (even if they're not historically plausible contributors). Georgian is the closest "pure" Central Asian component population in the list, but I'd guess it has too much EEF to work well as a contributor sometimes though which is why Kalash is better in some ways in the Yamnaya mixes. Some other populations can work well when they're in the right places though, I guess.

Alberto said...

I guess this might work better with K8 than with K9. For example:

Basque = 3% Pathan + 7% Loschbour + 75% Spain_MN + 15% HungaryGamba_HG @ D = 0.0034

Because Basque don't get any teal, they result in a mix that would only give them 1% ANE, which is too low.

Davidski said...

I'll put together a new K8 sheet later today with as many ancient samples as possible.

If it works as well as I think it will, then it'll be a lot of fun.

Unknown said...

MA1 = 22% Quechua + 34% Loschbour + 17% Kalash + 27% GujaratiA = 0.0239

Unknown said...

Interesting results chaos.
But for example , how are Basques 75% mid neolithic when they're ~ 85% R1b ?

Unknown said...

I've got a K12 where Basques are about 65% EN, 10% WHG, 25% Yamnaya. I'm trying to perfect it for all Eurasians. It may take until I get to 13-14 components, but it will look nice.

Unknown said...

(That was meant to say *chaps*, not chaos).

Davidski said...

Modeling modern samples as ancient genomes seems to work better with the K8 sheet.

But I've only run a few tests so far.

Krefter said...

"Interesting results chaos.
But for example , how are Basques 75% mid neolithic when they're ~ 85% R1b ?"

Founder effect. Two random R1 men (R1a-Z283, R1b-L11) who lived <~6,000 YBP represent almost 50% of west and northeast European paternal lineages. Most paternal lineages from that time period are extinct or very, very rare. There are other clear founder effects from that time period within I1a-DF29 and I2a2a-M223. Also, as R1b-L11 moved in west Europe it kept having regional founder effects (R1b-L21, R1b-DF27, R1b-U152, R1b-U106).

You don't see the same trend at all with mtDNA. One ~5,000YBP maternal lineage in a region of Europe will probably represent at most a few percent of the maternal lineages.

Krefter said...

Can someone run Tuscans, using Yamna, Esperstedt_MN, Hungary_HG, and Cypriot as ancestors?

I had earlier used the ANE K8 PCA to get all possible European (Yamna-MN or Stuttgart) + West Asian ancestral proportions. I picked a random North Italian, who with Spanish_MN got in every possibility >70% European (~55-60% MN, ~13-17% Yamna) + ~30% West Asian (always on the lower ANE side; Caucasus doesn't work).

Krefter said...

Actually, Spain_MN would work too. Just not one of the MNs who score over 45% WHG, because I doubt Italian ones did (Otzi, Sardinians don't). And you can try out a variety of West Asians.

Davidski said...

Using the K8 sheet.

Tuscan = 11% Yamnaya + 26% Esperstedt_MN + 8% HungaryGamba_HG + 55% Cypriot @ D = 0.0103

Tuscan = 18% Yamnaya + 49% Stuttgart + 5.99% HungaryGamba_HG + 27% Cypriot @ D = 0.0085

Krefter said...

Here's another one.

Test=German and Dutch
Ancestor1=Unetice or Urnfield.
Ancestor2=Unetice or Urnfield.
Ancestor3=Unetice or Urnfield.

NL4's ANE K8 results are in the spreadsheet.

We have signs of genetic continuation of a North sea-type pop in Germany during the Bronze age, and I think some modern Germans are still very North-sea-like and some admixed with a Near eastern-shifted pop, not just EEF-shifted like Tuscans(Romans?).

Davidski said...

I don't understand what model you want exactly based on what you wrote, but anyway this is a good fit for the Dutch average.

Dutch = 62% Unetice_EBA + 8% Halberstadt_LBA + 9% HungaryGamba_IA + 21% Tuscan @ D = 0.0016

How come you can't run the script?

Seinundzeit said...

I like the K8 sheet results, very cool.

Target = 64% Iranian + 26% Yamnaya + 10% Dai @ D = 0.1651


Target = 69% Burusho + 17% Iranian + 14% Lezgin @ D = 0.0307

Seinundzeit said...

Okay, this is the best I've gotten yet:

Target = 85% Pathan + 11% Afghan_Uzbek + 4% Iranian @ D = 0.0172

Davidski said...


Check out the new instructions above.

Krefter said...

Thanks a lot! I can do it myself now.

Alberto said...

Yes, with K8 I get a more reasonable result for Basques:

Spanish_Pais_Vasco = 16% Alberstedt_LN + 49% Spain_MN + 2% Karelia_HG + 33% HungaryGamba_BA @ D = 0.0046

Alberto said...

Or a bit better:

Spanish_Pais_Vasco = 3% Alberstedt_LN + 57% Spain_MN + 4% Armenian + 36% Unetice_EBA @ D = 0.0041

Krefter said...

Here's a pretty good fit for French.

French=10% Esperstedt_MN + 34% Tuscan + 50% Unetice_EBA + 5.99999999999999% Unetice_EBA @ D = 0.0041

It's a better fit than making them a mixture of an "Eastern" Bronze age pop and a Middle Neolithic pop and WHG. see..

French=9% Yamnaya + 2% HungaryGamba_HG + 66% Esperstedt_MN + 23% Yamnaya @ D = 0.0152

French=19% Corded_Ware_LN + 0% HungaryGamba_HG + 54% Esperstedt_MN + 27% Corded_Ware_LN @ D = 0.0099

French=34% Bell_Beaker_LN + 0% HungaryGamba_HG + 39% Esperstedt_MN + 27% Bell_Beaker_LN @ D = 0.0303

French=5% Unetice_EBA + 0% HungaryGamba_HG + 46% Esperstedt_MN + 49% Unetice_EBA @ D = 0.0529

Matt said...

Alberto: Because Basque don't get any teal, they result in a mix that would only give them 1% ANE, which is too low.

Good point, the EuroHG cluster is probably the main representative of mediation of "ANE" into Basques in K9, but since WHG populations get 100% EuroHG (it fails to distinguish well between SHG, WHG and EHG ancestry and you get a basically SHG cluster), you get this strange result where the mixture algorithm seems to imply low levels of ANE when populations are combined and the combination inevitably includes a fair amount of the WHG populations. Probably most extreme for the Basque population.

Krefter said...

Notice French especially don't fit as a mixture of a North Sea-type Bronze age pop(German Unetice, Bell Beaker) and Esperstedt_MN.

This is why I don't think southern-shifted North Euros like West Germans, French, and South Dutch are southern-shifted only because of excess EEF ancestry. There's probably East Mediterranean ancestry from Roman times. That's why I chose Tuscans as a southern source.

Davidski said...

I've added HungaryGamba_CA (the Copper Age sample) to the K8 sheet. It might be a useful addition for modeling Yamnaya-related incursions into East Central Europe.

Polish = 4% HungaryGamba_HG + 16% HungaryGamba_CA + 23% Yamnaya + 57% Unetice_EBA @ D = 0.0027

Unknown said...

Cool, I'll have a play too.
Krefter, which ancient pop are modern-day Tuscans meant to proxy for?

Krefter said...


Romans, who else would it be? I know the Roman empire was international, but I'm sure in the military and government people from Rome itself and surrounding areas were the majority, at least in its early days.

Germans, Dutch, and French quite clearly have some East Mediterranean ancestry. The same is even more obvious for Spanish, but their European side is more EEF.

Helgenes50 said...

@ Krefter

In France the history is not the same everywhere.
You are certainly right for southern France, or along the Rhône and the Rhine.

But the history in NW France is probably different.
Our East_Med certainly arrived with the LBKs.

The Tuscan signal could be of Neolithic origin, something we have in common with Italians, maybe via the LBKs.
But that was before the Roman Empire.

Unknown said...

Ah yep. Of course. But then I think that might be an overestimation. I don't think the Roman colonisation of Gaul was that extensive?

Interesting that Poles are so high in Uneticians.
What do you suspect it represents?

Davidski said...

"What do you suspect it represents?"

The beginning of the formation of the modern East Central European gene pool.

truth said...


It's not "Roman ancestry". The early Neolithic samples in Central Europe already have plenty of East-Med (the LBK samples average around 30%, the Hungarian NE1 also). But there was also another wave of farmers, not just EEF, coming from the Caucasus that also brought East-Med with them, besides West-Asian (which EEF lacked). And it's not the Yamnaya, cos they don't have East-Med.

Unknown said...

Could somebody do the calculations for the Irish? Thanks in advance.

Alberto said...

Testing now Lithuanians with K8 (again, without using modern European populations), some observations:

- If I use Alberstedt_LN I can get closer:

Lithuanian = 10% Corded_Ware_LN + 5% Yamnaya + 60% Alberstedt_LN + 25% Motala12 @ D = 0.0026

But if I restrict to using Yamnaya and CW or older, it prefers Caucasus pops:

Lithuanian = 36% Georgian_Imer + 0% Yamnaya + 2% Lezgin + 62% Motala12 @ D = 0.006

Lithuanian = 36% Georgian_Imer + 2% Corded_Ware_LN + 1% Lezgin + 61% Motala12 @ D = 0.0059

Taking a look at what is the best fit for Alberstedt_LN, it's not too different:

Alberstedt_LN = 43% Georgian_Imer + 10% LBK_EN + 1% Spain_MN + 46% Motala12 @ D = 0.0071

Alberstedt_LN = 38% Georgian_Imer + 13% LBK_EN + 7.00000000000001% Corded_Ware_LN + 42% Motala12 @ D = 0.0071

So if we could use more than 4 pops, Lithuanians would take small parts of CW and Yamnaya, but they're still mostly Motala + Georgian.

Helgenes50 said...



In Northern France the East_med is certainly related to LBKs and not to a Roman Ancestry

Unknown said...


If the high proportion of Unetice in Poles represents "the beginning of the formation of the modern East Central European gene pool", then does the higher presence of Unetice compared with CWC suggest that the Central European gene pool was more completely formed by the mid Bronze Age rather than the EN-EBA?

Alberto said...


This is the best fit I've found so far for French:

French = 29% HungaryGamba_BA + 16% LBK_EN + 46% Unetice_EBA + 9% Turkish @ D = 0.0018

I guess that 9% Turkish could be the extra NE you're referring to. But it works considerably worse with other populations, like Armenians, Georgians, Lebanese, etc. So it seems that the bit of East Eurasian helps there too.

Davidski said...


Modern populations generally produce lower scores because they're less homogeneous and also less unusual than ancient samples.

I already know Lithuanians are largely of Corded Ware origin via multiple lines of other evidence, so now I'm using this tool to see how much Corded Ware ancestry they might have.


I think the modern European gene pool was pretty much set by the Iron Age, but it's hard to be specific in regards to each ethnic group.

Alberto said...


"Modern populations generally produce lower scores because they're less homogeneous and also less unusual than ancient samples."

I don't use modern European samples, and actually with ancient samples we get the best results:

Lithuanian = 33% Alberstedt_LN + 21% Unetice_EBA + 25% Corded_Ware_LN + 21% Motala12 @ D = 0.0017

Polish = 83% Alberstedt_LN + 1.99999999999999% Corded_Ware_LN + 2% Yamnaya + 13% Motala12 @ D = 0.002

But for the Caucasus-like population we don't have aDNA, so Georgians seem to be a decent proxy.

Alberstedt_LN can be modelled equally well (though not too well) as low CW or high CW:

Alberstedt_LN = 38% Georgian_Imer + 13% LBK_EN + 7.00000000000001% Corded_Ware_LN + 42% Motala12 @ D = 0.0071

Alberstedt_LN = 2% Armenian + 28% LBK_EN + 59% Corded_Ware_LN + 11% Motala12 @ D = 0.0072

And this is just an oracle, so I don't take the results too literally. But all the results can be tested with qpAdm for example, or formal stats, to get a better understanding. It would be interesting, for example, to see if Unetice and Alberstedt still have high EHG affinity like Yamnaya and CW or if they have more WHG/SHG affinity.

EHG WHG Unetice_EBA Yoruba
SHG WHG Unetice_EBA Yoruba
EHG WHG Alberstedt_LN Yoruba
SHG WHG Alberstedt_LN Yoruba

We know that modern populations have higher WHG/SHG affinity, but we don't know exactly when and why this shift happened.

Unknown said...

You don't have a WHG source, so you get inflated Motala and decreased steppe ancestry. As far as those latter numbers go.

Alberto said...

If I choose the usual suspects I get high CW, but the result is worse:

Lithuanian = 0% Yamnaya + 18% Loschbour + 74% Corded_Ware_LN + 8% Esperstedt_MN @ D = 0.0077

Tobus said...


Karelia_HG Loschbour Unetice_EBA Yoruba -0.0023 -0.51
Samara_HG Loschbour Unetice_EBA Yoruba -0.0025 -0.548
Motala_HG Loschbour Unetice_EBA Yoruba 0.0115 3.417
Karelia_HG Loschbour Alberstedt_LN Yoruba 0.0065 0.983
Samara_HG Loschbour Alberstedt_LN Yoruba 0.0076 1.037
Motala_HG Loschbour Alberstedt_LN Yoruba 0.0243 4.921

Karelia_HG LaBrana1 Unetice_EBA Yoruba 0.0133 2.773
Samara_HG LaBrana1 Unetice_EBA Yoruba 0.0109 2.235
Motala_HG LaBrana1 Unetice_EBA Yoruba 0.0256 7.242
Karelia_HG LaBrana1 Alberstedt_LN Yoruba 0.0139 2.04
Samara_HG LaBrana1 Alberstedt_LN Yoruba 0.018 2.383
Motala_HG LaBrana1 Alberstedt_LN Yoruba 0.0301 5.565

Unknown said...

Hinxton4 = 50% Unetice_EBA + 36% HungaryGamba_IA + 8% HungaryGamba_BA + 6% West_Scottish = 0.0047

Ah, just noticed the K8. Hinxton4 works out well, on the lowest score I could find.

Davidski said...


Mixing populations from very different time periods, and including populations from more recent periods, produces the best fits, but isn't informative.

The most informative tests are those that include plausible ancient source populations which we know had little or no contact with each other before the admixture event that we're testing.

So yeah, if you get a low score with the "usual suspects" then you're onto something.

On the other hand, you can't really test for, say, Corded Ware and Unetice admixture in modern populations at the same time, because it's very likely that Unetice was in large part of Corded Ware origin.

Unknown said...

I'm really thinking there's something to that East Eurasian in myself and others on here. Over and over, on the K tests I'm running, I repeatedly get Turkic/Altaic stuff in Western Europe. Whereas Siberian is basically left alone. It's as if IR1 types didn't make an impact, but nomads incorporated in Roman ranks did. It's a possibility and I will keep digging. I'd say set up a test involving the Yakut and French. See if that is better than Nganasan. Maybe the four used could be French South, WHG, Yamnaya, Yakut.

Unknown said...

Or, use an EN sample with some steppe like stuff, from Hungary.

Krefter said...

"Lithuanian = 33% Alberstedt_LN + 21% Unetice_EBA + 25% Corded_Ware_LN + 21% Motala12 @ D = 0.0017

Polish = 83% Alberstedt_LN + 1.99999999999999% Corded_Ware_LN + 2% Yamnaya + 13% Motala12 @ D = 0.002"

"Lithuanian = 10% Corded_Ware_LN + 5% Yamnaya + 60% Alberstedt_LN + 25% Motala12 @ D = 0.0026"

Hunter-gatherer survival in East Europe, which was the first proposal for why WHG is high there, appears to be correct.

Now it makes sense that an originally West European-centered form of ancestry is highest in the East.

Also, looking at Iberian results I think there's a lot of actual Neolithic Iberian blood (not just mainland Euro leftovers), maybe over 50%.

Chad said...


Have you read this?

Alberto said...


Thanks for those tests. Very interesting that they agree with the high Motala affinity shown in these other less scientific tests I was making.


Sure, these are tests that can show some hints, no need to take them literally. They need to pass other tests and filters. But I'm not just using random combinations, anyway.

For example, the Motala guys seem not only to have been invited to the party, but they actually might have played important roles. From Yamnaya to CW and then to other LNBA it's mostly adding Motala-like admixture (and bits of MN). Whether from West Yamnaya or from the Baltic, or both.


Yes, some bits of East Eurasian might be also at play. For French I got the best score when adding Turkish to the mix. And while this other result is quite random (probably meaningless), I found it curious enough to share. Best match by far for Basques:

Spanish_Pais_Vasco = 43% HungaryGamba_IA + 10% Loschbour + 1% Bedouin + 46% LBK_EN @ D = 0.0013

Helgenes50 said...

The best way I found, as Norman, to use the 4Mix
I try to understand my differences with the French.

As Norman, I am less neolithic
That is confirmed in these results, with the modern populations and the ancient genomes

Norman = 15% French + 82% French + 3% French + 0% Bedouin @ D = 0.048
Norman = 15% French + 82% French + 3% French + 0% LBK_EN @ D = 0.048
Norman = 15% French + 82% French + 3% French + 0% Spain_EN @ D = 0.048
Norman = 15% French + 82% French + 3% French + 0% Tuscan @ D = 0.048
Norman = 15% French + 82% French + 3% French + 0% Spanish_Cantabria @ D = 0.048
Norman = 15% French + 82% French + 3% French + 0% Sardinian @ D = 0.048

And now my Nordic and my IE Ancestry via my German or Scandinavian Ancestors
Once more, the results speak for themselves
( I live in a region settled by the Danish, English, Batavian,Saxons...)

First, the ancient genomes

Norman = 8% French + 75% French + 2.99999999999999% French + 14% Corded_Ware_LN @ D = 0.0405
Norman = 26% French + 63% French + 5% French + 5.99999999999999% Yamnaya @ D = 0.0441

and now with the modern populations

Norman = 0% French + 58% French + 4.99999999999999% French + 37% Norwegian @ D = 0.0198
Norman = 3% French + 29% French + 35% French + 33% Swedish @ D = 0.0171
Norman = 0% French + 11% French + 29% French + 60% Dutch @ D = 0.0204
Norman = 0% French + 2% French + 34% French + 64% SW_English @ D = 0.0167
Norman = 0% French + 4% French + 34% French + 62% SE_English @ D = 0.0172

Krefter said...


Davidski already basically said what I mean in a previous post. In summary, the best fit isn't what we're going for. The best fit of believable mixtures is what we're after.

Alberto said...


And I agree. But being believable is not the same as having a fixed preconception and discarding whatever doesn't match it. Just a few years ago it was not believable that R1b could have come to WE in the Bronze Age; now it looks more than likely.

We just need to keep an open mind and look at possibilities that are plausible, especially when the data points in that direction.

I actually always try to go for the option that makes the most sense with the data available and argue my reasoning the best I can.

Unknown said...


If these stats, which (repeatedly) suggest Motala-like ancestry, are true, then my initial impression of CWC as a largely autonomous European Plain development, albeit with some steppe admixture, remains possible.

Unknown said...

Alberto and Mike,

The Motala isn't real. The program is compensating for him not having a Middle Neolithic pop. He doesn't have a WHG source, so Motala is the best fit. Trust me, when I make the Motalas a component, they're not really anywhere.

Simon_W said...

I don't have any K8 or K9 data for my grandparents and my father, so I made a K15 sheet. This should work nicely as well, the Eurogenes K15 really isn't a bad calculator.

My paternal grandmother is from the Low German speaking part of East Prussia, from the surrounds of modern-day Braniewo in Warmia, close to the Baltic sea. There is no ancient DNA of the local substrate population, the Old (Baltic) Prussians, but Lithuanians should work as a substitute, and I also use Poles, because undoubtedly there was some Polish admixture as well, recognizable in surnames. The more interesting question will be: Where did the medieval German settlers come from? The local dialect suggests a core area in northernmost Germany, from the mouth of the Weser into Holstein. But eastern Holstein was also colonial area, with Frisians, Westphalians and Dutch settlers being involved, besides the Holstens. Historical sources also suggest that some of the settlers in Warmia were from the Lower Rhine and from Holland.

The best approximation I could find for my grandmother uses the Hinxton Anglo-Saxons:

27% Hinxton2 + 42% Hinxton5 + 12% Polish + 19% Lithuanian @ D = 4.0359

Modern northern Germans, or any other Germans, don't fit well:

4% North_German + 69% North_Dutch + 0% West_German + 27% Lithuanian @ D = 5.6485
4% North_German + 69% North_Dutch + 0% South_Dutch + 27% Lithuanian @ D = 5.6485
4% North_German + 69% North_Dutch + 0% East_German + 27% Lithuanian @ D = 5.6485

Using only modern populations, the Northern Dutch and the Danes are a better fit:

0% North_German + 71% North_Dutch + 6% Polish + 23% Lithuanian @ D = 5.6414
0% North_Dutch + 74% Danish + 0% Polish + 26% Lithuanian @ D = 5.0612

But still, the best are clearly the Hinxton Anglo-Saxons. Which may suggest that the population of northern Germany has changed from Anglo-Saxon-like to what it's like now. An alternative explanation may be that northernmost Germans are still Anglo-Saxon like and the north German sample is just from a larger area of northern Germany.

Simon_W said...

This would suggest an Old Prussian and Polish admixture of 31%, taken together. Also interesting, that's considerable, but not as high as some have suggested (some thought it was closer to 50%). On the other hand there certainly were local differences, with northeastern East Prussia near present-day Kaliningrad (former Königsberg) having much stronger Old Prussian ancestry.

Davidski said...

The modeling showing minimal Corded Ware admixture across northern Europe is based on faulty methodology.

The only way it's possible to achieve these results is to use samples with lots of Corded Ware/Yamnaya admixture in the first place, like Alberstedt_LN, Unetice_EBA or Bell_Beaker_LN, and/or to mix and match with modern samples, which are more mixed and therefore produce better fits.

And even then the post-Neolithic shift in uniparental markers in Europe speaks for itself.

Come on guys, this is really sad to watch. It's already happened, so what's the point of all of this whining?

Simon_W said...

My maternal grandmother's ancestry is 50% northern Swabian near Stuttgart, the other 50% from southern Swabia south of the Danube and from northwestern Switzerland near Basel.

The best fit for her:
39% French + 13% Austrian + 29% Hinxton2 + 19% Italian_Abruzzo @ D = 7.426

I read this as follows:
The local Celts were somewhere in between the French and the Austrians. Hinxton2 represents the Germanic influence. And Italian Abruzzo is most likely ancient Roman input. Close to Basel there was the important Roman colony Augusta Raurica, also known as Colonia Augusta Rauricorum. And southern Swabia south of the Danube was also quite a firm part of the Roman empire, with the major city Augusta Vindelicorum nearby. This would suggest that the Roman influence on the whole was rather Abruzzo-like than Tuscan-like. I think this makes sense; the Tuscans are more light-pigmented than other central Italians, they are atypical.

I also tried to replace the Italian_Abruzzo with Ashkenazi, but it's a worse fit.

Simon_W said...

My father's paternal (phased) half is best approximated by quite a similar result:

50% Hinxton4 + 36% North_German + 14% Italian_Abruzzo @ D = 6.7756

My paternal grandfather's ancestry was from near Basel, Switzerland, from both sides of the Rhine, thus from southwesternmost Germany and from northwestern Switzerland near the Rhine. Hence it's no surprise that his result is similar to my maternal grandmother's result. He seems to be a little more Germanic and less Roman. What's puzzling is that the Celtic component is much better approximated with Hinxton4, the Celtic Briton, than with the French and Austrians. But this is in line with my father's FTDNA "MyOrigins" analysis, where he has lots of the Northwest European component and 0% of the central European component. In fact, my grandfather and his father indeed looked very Northwest European, the grandfather more the Atlantic Basque-like type, the great-grandfather more like an English gentleman.

Alberto said...

@Chad, David

Did you look at the D-Stats posted by Tobus? Did you look at Haak's own figures? Or David's own D-Stats the other day about Lithuanians?

Why do you find it so strange that some Motala-like HG have contributed genes to modern Europeans? It's not only reasonable to think so, there is data suggesting it quite clearly, and we have ancient samples that existed in Sweden!

Simon_W said...

I also tried to determine what my maternal grandfather is like by analyzing my own phased maternal half. According to Eurogenes K15 oracle, my maternal grandmother is best approximated as 87% South_Dutch + 13% Ossetic. Therefore I selected these two populations to represent my grandmother's influence.

Using only three populations (by selecting an outgroup that yields 0%), my maternal half is this:

34% South_Dutch + 2% Ossetian + 64% North_Italian @ D = 6.2957

Indeed, my maternal grandfather's parents were from Northern Italy (province of Forli-Cesena).

But interestingly I obtain the best results when adding Algerians or Mozabites to the blend:

45% South_Dutch + 2% Ossetian + 39% North_Italian + 14% Algerian @ D = 5.1584
48% South_Dutch + 2% Ossetian + 33% North_Italian + 17% Mozabite_Berber @ D = 4.572

To some extent this is because of noise. My maternal half (according to K15) has 3.76% Sub-Saharan. But my entire DNA just scores 0.59% Sub-Saharan. According to the new K6 I'm 0.97% Sub-Saharan. This is close to a significant amount of 1%. So I think this North African-like input is more than just noise, but it may be inflated. Judging from IBD sharing with my maternal grandmother, I have about 24.6% from her and 25.4% from my grandfather. This means my maternal half cannot be 64% North Italian as the first approximation suggested. But adding the Algerian or Mozabite input I can get my grandmother's influence closer to 50% on my maternal half, which is closer to the truth. For what it's worth, my grandfather had some resemblance with the Algerian footballer Samir Nasri, when he was young:

I also found the San Marinese footballer Manuel Battistini to have a strangely exotic look, like Near Eastern or North African admixed.

Unknown said...

Alberto, it's because all Europeans have EHG, but not enough to be closer to EHG than Motala. If you look at the PCA, Northern Europeans fall under Motala because of EHG. If it were Motala ancestry then we would cluster between Loschbour and Motala, with no one more ANE-like than Motala. This isn't the case. Most of us have about as much EHG/ANE as Motala, so they can't be a primary source. My K runs have Norwegians as the highest in Motala, at 3%, which makes sense.

Simon_W said...

The best approximation for German Bell Beakers I found, using K8 data:

Bell_Beaker_LN = 47% Corded_Ware_LN + 5% La_Brana-1 + 19% Spain_MN + 29% Alberstedt_LN @ D = 0.0056

Did anyone find a better approximation?

Simon_W said...

Using anything from the Carpathian Basin or from northern Europe makes it just worse.

Simon_W said...

It's possible to replace Alberstedt_LN with Yamnaya, this results in an equally good fit. The problem is just that it's not possible that there was a Samara Yamnaya-like wave into Germany which left no genetic trace in Bronze Age Hungary, and no archeological trace in Poland.

Simon_W said...

Using French Basques enables better fits. But Basques are modern, so it's dubious that they can be used to explain Bell Beaker people.

Alberto said...


No one is saying it's the primary or only source of ANE. But if your reasoning is based on K8 numbers, this is a simple algorithm based on K8:

Corded_Ware_LN = 0% Loschbour + 28% Esperstedt_MN + 65% Yamnaya + 6.99999999999999% Motala12 @ D = 0.0122

Unetice_EBA = 0% Loschbour + 46% Esperstedt_MN + 32% Yamnaya + 22% Motala12 @ D = 0.0193

Lithuanian = 1% Loschbour + 33% Esperstedt_MN + 38% Yamnaya + 28% Motala12 @ D = 0.0119
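For anyone wondering what kind of search sits behind lines like these, here is a minimal Python sketch of a four-way mixture fit. It is not the 4mix R script: the Euclidean distance, the 1% weight steps and the name `four_mix` are assumptions made for illustration, and the component values are made-up toy numbers rather than real K8 averages.

```python
import math

def four_mix(target, sources, step=1):
    """Try every split of 100% across four sources (in `step`% increments)
    and return the weights whose blend lies closest to the target."""
    s1, s2, s3, s4 = sources
    best_weights, best_dist = None, math.inf
    for a in range(0, 101, step):
        for b in range(0, 101 - a, step):
            for c in range(0, 101 - a - b, step):
                d = 100 - a - b - c  # the fourth weight is whatever is left
                blend = [(a * w + b * x + c * y + d * z) / 100
                         for w, x, y, z in zip(s1, s2, s3, s4)]
                dist = math.dist(blend, target)  # assumed metric: Euclidean
                if dist < best_dist:
                    best_weights, best_dist = (a, b, c, d), dist
    return best_weights, best_dist

# Toy ancestry proportions (hypothetical, NOT real K8 output); each row sums to 1.
toy_sources = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.70, 0.05, 0.10, 0.05],
    [0.05, 0.10, 0.70, 0.05, 0.10],
    [0.05, 0.05, 0.10, 0.70, 0.10],
]
# Build a target that is exactly a 20/30/10/40 blend of the toy sources.
toy_target = [0.2 * w + 0.3 * x + 0.1 * y + 0.4 * z
              for w, x, y, z in zip(*toy_sources)]

weights, dist = four_mix(toy_target, toy_sources)
print(weights, round(dist, 4))
```

The search only ever reports the closest blend of the four references it is handed, so a low distance says the numbers can be reproduced, not that the populations really mixed that way.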

If you go by D-Stats, they show the same. Stronger EHG in Yamnaya, quite decreased but still strong in CW, stronger to Motala in Unetice, equally close to Loschbour and Motala in Lithuanians.

You might want to try with qpAdm the best fits using those sources to see if they agree or not.

We lack aDNA from the Baltic/East Europe and from Western Yamnaya, so we can't know for sure. But there are good hints pointing to it. Why this denial of a real and reasonable possibility?

Alberto said...

Though to be honest, the fun part starts once you remove Loschbour, which stays unused, and add Georgian:

Corded_Ware_LN = 30% Georgian_Imer + 1% Esperstedt_MN + 36% Yamnaya + 33% Motala12 @ D = 0.0065

Unetice_EBA = 34% Georgian_Imer + 15% Esperstedt_MN + 0% Yamnaya + 51% Motala12 @ D = 0.0135

Lithuanian = 34% Georgian_Imer + 3% Esperstedt_MN + 4% Yamnaya + 59% Motala12 @ D = 0.0064

Big change and big improvement. Is it cheating? Yes, probably adding a modern population is cheating. But we don't have an ancient sample for that population, and then again, giving Yamnaya as the only option for Caucasus-like admixture in Europe is cheating too. The truth might be somewhere in between.

Krefter said...

Here's the ANE K8 spreadsheet for oracle from this January. It's easy to add results of new Haak genomes. You can even delete every single pop except for 1.

It can be more useful than 4mix, because you don't have to try a million possibilities to see which one makes most sense.

Admix4 is downloadable here.

I noticed in January Lithuanians fit best as Caucasus+Motala, but didn't think much of it.

Simon_W said...

@ Krefter

Well, the problem is just that this oracle works only with steps of 25%. I think an alternative to trying millions of possibilities is to proceed strategically, i.e. to think first what might make sense considering history and archeology, and then to try systematically.
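To put a number on the difference between 25% and 1% steps: for four fixed reference populations, the count of possible weight splits can be worked out with a standard stars-and-bars formula. The short Python sketch below does that; the function name `weight_combinations` is made up for illustration and is not part of the oracle or of 4mix.

```python
from math import comb

def weight_combinations(step_percent, n_sources=4):
    """Number of ways to split 100% across n_sources in step_percent increments."""
    units = 100 // step_percent  # e.g. 4 units of 25%, or 100 units of 1%
    return comb(units + n_sources - 1, n_sources - 1)  # stars-and-bars count

print(weight_combinations(25))  # 35
print(weight_combinations(1))   # 176851
```

So for any one set of four references, 25% steps allow only 35 distinct mixtures, against 176,851 with 1% steps. The "millions of possibilities" come on top of that, from the choice of which four populations to try.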

Simon_W said...

Here are a few K8 approximations to my own DNA:

15% Unetice_EBA + 8% Lithuanian + 62% HungaryGamba_BA + 15% Cypriot @ D = 0.0072

10% Corded_Ware_LN + 2% Samaritan + 80% HungaryGamba_BA + 8% Cypriot @ D = 0.0072

10% Corded_Ware_LN + 1% Ashkenazi + 79% HungaryGamba_BA + 10% Cypriot @ D = 0.0072

10% Corded_Ware_LN + 3% Lebanese_Christian + 80% HungaryGamba_BA + 7% Cypriot @ D = 0.0072

10% Corded_Ware_LN + 2% Sephardic_Jewish + 79% HungaryGamba_BA + 9% Cypriot @ D = 0.0072

9% Corded_Ware_LN + 2% Lithuanian + 78% HungaryGamba_BA + 11% Cypriot @ D = 0.0071

So, according to K8 I don't have any noteworthy North African ancestry. Just some slight Samaritan, Jewish or Lebanese Christian admixture, undoubtedly from my Italian grandfather.

But what strikes me most is the very strong influence from Bronze Age Hungary, about 80%! This is much more than I could have inherited from my Italian ancestors alone. As you might know, 50% of my ancestry is from southern Germany and the German speaking part of Switzerland. Add to this 25% of North Italian ancestry.

Very strong influence from Bronze Age Hungary in both southern Germany/Switzerland and Italy - isn't this a hint to the origin of the Italo-Celts?

Archeologically it makes sense, both Tumulus and Urnfield culture may have originated somewhere in the Carpathian Basin.

Krefter said...


"oracle works only with steps of 25%"

Yeah, I just realized that. But it's still a nice tool to have.

"I think an alternative to trying millions of possibilities is to proceed strategically, i.e. to think first what might make sense considering history and archeology"

I agree. Based on experiments like that...

It looks like there's a lot of native Neolithic blood in Iberia. My guess is the same is true for Italy and the Balkans. And I think there's a way to confirm with 4mix and Oracle that they have LN/BA ancestry. I'll post later with evidence.

Basques in particular may be something like 70% Neolithic SW European, with no Near Eastern or North African.

We can't assume that all of the Iberians' Euro_MN ancestry beyond what's in LN/BA and West Asians/North Africans is from Iberia, but it's safe to say it comes from far western Europe.

a said...

@36K many parts of Europe were under ice.

@36K is near the IJ M429 split; most likely in East of present day Europe.


Krefter said...


"Very strong influence from Bronze Age Hungary in both southern Germany/Switzerland and Italy - isn't this a hint to the origin of the Italo-Celts?"

All Central-South Euros score high in "HungaryGamba_BA" because he was more Neolithic farmer admixed than LN/BA Germans.

I would suggest using one LN/BA reference for each run. German Bell Beaker and Unetice are better references for your LN/BA ancestry.

Simon_W said...

@ Krefter

Yes, I tried it a lot with Bell Beaker and Unetice, but with HungaryGamba_BA I get lower distances. Also, strangely, with K8 input I get Bell Beaker scores of 0 - 4%, Unetice seems to work much better.

Simon_W said...

But then, with K9 data it's completely different, according to K9 I'm almost 3/4 Bell Beaker, with the rest being EEF and North African:

14% HungaryGamba_EN + 1% Cypriot + 13% Algerian + 72% Bell_Beaker_LN @ D = 0.1416

That's more in line with what I would have expected. EEF + North African input would mostly be from my Italian grandfather.

PersonaMan said...

The two best fits I've found for myself so far:

Target = 27% HungaryGamba_BA + 16% Esperstedt_MN + 38% Bell_Beaker_LN + 19% Corded_Ware_LN @ D = 0.006

Target = 19% Spain_MN + 24% HungaryGamba_BA + 36% Bell_Beaker_LN + 21% Corded_Ware_LN @ D = 0.006

Chad said...

The biggest issue to begin with, is your use of Esperstedt. Esperstedt is only about 17-19% more WHG than Stuttgart, where Gok2 is 33% WHG. That is in a place with solidly verifiable Neolithic presence. Lithuanian is a gray area. We aren't sure what they looked like, but there is a decent chance that they were even more WHG than that. That would almost totally eliminate any need for Motala. It works with what you give it, plain and simple. Without the right reference, you will be misled into something that is most likely incorrect. That is where logic needs to play in heavily.

Chad said...

Lithuanian = 34% Gokhem_MN + 17% Loschbour + 43% Yamnaya + 5.99999999999999% Karelia_HG @ D = 0.019

capra internetensis said...

@ a

Did you post in the wrong thread or something?

Anyway, IJ split is almost certainly much earlier than 36 kya (Karmin says 45 kya, Y Full says 43 kya). And why would it be in Eastern Europe anyway? I mean it could be, but why not West Asia?

Krefter said...

I'm about ready to nuke Murcia Spain. Based on the assumption they're Spain_MN+LN/BA+Middle East+maybe WHG I have gone through over 100 tests exhausting almost every Middle Eastern pop and every LN/BA. This is really frustrating.

The best fits I can get so far are below. Most other scores are 0.03 (very bad fit).

42% Saudi + 31% Unetice_EBA + 13% Spain_MN + 14% Loschbour @ D = 0.0038

41% Saudi + 29% Alberstedt_LN + 13% Spain_MN + 17% Loschbour @ D = 0.0046

40% Saudi + 28% Halberstadt_LBA + 18% Spain_MN + 14% Loschbour @ D = 0.0048

The same setup but with Bedouin or Palestinian instead of Saudi is around 0.006-0.007.

When Caucasus pops take the place of LN/BA pops the best fits are...

39% Saudi + 26% Georgian_Laz + 1% Spain_MN + 34% Loschbour @ D = 0.0089

39% Saudi + 16% Tabassaran + 21% Spain_MN + 24% Loschbour @ D = 0.0093

More with Saudi, and Bedouin in the same position.

Then the next best fits are with NorthWest Africans in the place of Saudi and Bedouin.

Loschbour is scoring too high in all of these; it's unrealistic (Spain had been colonized by farmers for thousands of years). I think this is because Loschbour tunes out the high ENF in Near Easterners.

Spain_Murcia scores 2% Sub-Saharan, so I think this is part of the reason Middle Easterners with 5% or more Sub-Saharan are included in its best fits.

Krefter said...

My goal here was to find who gave Spanish_Murcia most of their ANE, LN/BA or Caucasus pops.

It's pretty clear that it's probably LN/BA, even though few fits worked well. This is because when Caucasus pops take its place Spanish_Murcia is fitted as Loschbour+West Asian. There are some rare exceptions though, where Spain_MN takes the place of Loschbour to raise WHG.

Alberto said...


With K8 I was not finding good fits for Spanish. Best ones were something like Spain_EN + HungaryGamba_BA + Bedouin. Then using HungaryGamba_IA things improved considerably, but I think it doesn't make a lot of sense to use that sample.

With K9 the best I found is:

Spanish = 55% Spain_MN + 20% HungaryGamba_HG + 24% Syrian + 1% Kalash @ D = 0.0035

Combinations with R1b pops like Yamnaya or Bell Beaker are quite bad.

Alberto said...


"It works with what you give it, plain and simple."


"Without the right reference, you will be misled into something that is most likely incorrect."

Yes, but it gives a score of how correct or incorrect it is.

"The biggest issue to begin with, is your use of Esperstedt. Esperstedt is only about 17-19% more WHG than Stuttgart, where Gok2 is 33% WHG."

But I included Loschbour to compensate for any lack of WHG. It didn't take it. But let's try again:

Lithuanian = 0% Loschbour + 36% Gokhem_MN + 39% Yamnaya + 25% Motala12 @ D = 0.0144

That's again a better fit than the one you propose. The difference is forcing EHG instead of using Motala. Given both options it clearly prefers Motala.

D-Stats, which work directly on the genes and not just on the numbers, agree.

Davidski said...


How much do you know about Baltic ethnogenesis?

That was actually a rhetorical question, because, with all due respect, obviously not much.

I'll give you a brief outline; archeology and linguistics tell us that Balts are the direct descendants of the Fatyanovo-Balanovo people, who became the eastern Corded Ware or Battle-Axe people.

Now, unfortunately, we don't have any ancient genomes from the Battle-Axe Culture, but the German Corded Ware samples should be pretty close. So let's try them in your test instead of Yamnaya, which doesn't make much sense considering that we have something closer to Battle-Axe.

Lithuanian = 12% Loschbour + 12% Gokhem_MN + 69% Corded_Ware_LN + 6.99999999999999% Motala12 @ D = 0.008

It's a great fit, so we can say that genetics backs up archeology and linguistics in this case.

But could the fit be even better with Battle-Axe samples? Yes, almost certainly, because check out what Haak et al. say about their Corded Ware samples.

"Corded Ware can be modeled as 29.1% Esperstedt, 9.4% Samara_HG, and 61.5% Yamnaya, which suggests that the population of eastern migrants had a slightly higher proportion of EHG ancestry in its makeup than the Yamnaya sample from Samara. Such a conclusion might also be drawn from the f4-statistic presented in SI7 (Table S7.6) that shows f4(Corded_Ware_LN, Yamnaya; Karelia, Chimp) = -0.00001 (Z=0.0). If Corded_Ware_LN was a simple mixture of a population related to our Yamnaya sample and of Neolithic Europeans, this statistic should be negative. However, if Corded_Ware_LN is descended from a population that has a higher proportion of EHG ancestry than the Yamnaya population, then the dilution of EHG ancestry due to European Neolithic admixture (which would cause the statistic to be negative), would be counterbalanced by its increase due to this higher EHG ancestry (which would cause it to be positive). It is quite possible that the variable mixtures of EHG and farmer populations existed in the European steppe, and our Yamnaya population represents only a point in a continuum of such mixtures."

Page 116.

Unknown said...


"I'll give you a brief outline; archeology and linguistics tell us that Balts are the direct descendants of the Fatyanovo-Balanovo people, who became the eastern Corded Ware or Battle-Axe people."

With all due respect, Dave, there is no direct evidence for what you have stated. "Baltic ethnogenesis" occurred in the 13th century AD!

But I know what you're *trying* to say: the gene pool of Northern Europe was created by the Bronze Age, with more minor admixture and homogenization subsequently.

But the creation of a distinct Baltic ethnie falls in the Middle Ages, when the Slavicization and Christianization of Poland created the necessary socio-political conditions for Baltic chiefs to see their fellow "Balts" not as mere local competitors, but as a fellow "us" versus a Polish, Rus or Swedish "them".
It didn't have anything to do with when R1a1a arrived in the Baltic rim.

Alberto said...


This stat:

f4(Corded_Ware_LN, Yamnaya; Karelia, Chimp) = -0.00001 (Z=0.0)

Doesn't agree with the ones you posted showing quite stronger EHG affinity in Yamnaya than in CW.

And even then, if CW had the same EHG as Yamnaya, and Lithuanians are 70% CW, why do they show so much stronger affinity to WHG/SHG than to EHG?

If you know the ethnogenesis of Baltic people so well, maybe you have some good suggestions about it that can explain it.

Davidski said...


The quote I posted is referring to the steppe ancestors of the Corded Ware, not to the German Corded Ware samples.

The German Corded Ware samples are of mixed origin and so are Lithuanians. Both have admixture from areas of Europe where WHG-like foragers and farmers with their admixture lived.

Unknown said...

Ok so where are we with the current state of debate ?
Is the issue that modern Balts have too much WHG to be Yamnaya or even Corded derived ?

Helgenes50 said...


Lithuanian = 12% Loschbour + 12% Gokhem_MN + 69% Corded_Ware_LN + 6.99999999999999% Motala12 @ D = 0.008

If I take your example with the 4 same populations

The target is my mother ( 100% Norman)

In the first solution, I take Gokhem like in your example= 0 % Motala
But in the third solution which seems the best in her case, we get 2% of Motala ???

Norman = 0% Loschbour + 49% Gokhem_MN + 51% Corded_Ware_LN + 0% Motala12 @ D = 0.0202

Norman = 0% Loschbour + 48% Spain_MN + 52% Corded_Ware_LN + 0% Motala12 @ D = 0.0118

Norman = 0% Loschbour + 46% HungaryGamba_CA + 52% Corded_Ware_LN + 2% Motala12 @ D = 0.0099

a said...

@ capra internetensis said...

"@ a

Did you post in the wrong thread or something?

Anyway, IJ split is almost certainly much earlier than 36 kya (Karmin says 45 kya, Y Full says 43 kya). And why would it be in Eastern Europe anyway? I mean it could be, but why not West Asia?"

IMO, there are not too many regions where the ancient gene pool could be derived and/or combine with the spread of Indo-European languages.
Have a look at this video [17 mins]; it has some interesting ideas that you may agree or disagree with. The point about Finns/Hungarians is interesting, since they also occupy regions within the Baltic.

"The Finno-Ugric peoples are any of several peoples of Eurasia who speak languages of the Finno-Ugric group of the Uralic language family, such as the Khanty, Mansi, Hungarians, Maris, Mordvins, Sámi, Estonians, Karelians, Finns, Udmurts and Komis."

National Geographic Live! - Spencer Wells: The Human Journey

Alberto said...


I don't know. For me the debate is open. I think that figures pointing to over 50% Yamnaya or up to 80% CW are overestimated. But other people who know more than I do think they're right. So what can I say?

I see contradictory data, but mostly pointing to lower Yamnaya/CW than those estimates. I guess we'll have to wait for more DNA to know with certainty.

Davidski said...


I'm not really sure where the Normans came from, but it probably wasn't from Sweden.

We don't know what type of hunter-gatherers lived in Denmark and Norway, and if any of their genes survived.

Helgenes50 said...


"I'm not really sure where the Normans came from, but it probably wasn't from Sweden"

From what we know, most of them came from the British Isles, not directly from Scandinavia, mainly from the Danelaw, and they were probably a mixture, maybe with Celtic wives like in Iceland.
But what is sure is that the place names in Normandy are more Anglo-Danish.

Matt said...

Really, something like Motala plus Unetice (or other post-Corded populations of Germany) seems to work OK for Lithuanians via Haak's models. (Haak's f4 outgroup models have both upsides and downsides relative to this ADMIXTURE based modelling. The downside of ADMIXTURE based modelling is that it's only as good as your ADMIXTURE values are at explaining within and between population variation). I don't really see that that's necessarily that implausible?

You can also get there via combining a MN population with a *lot* of WHG admixture (simulated by the Gokhem+Loschbour combo in Chad/David's example) with a Yamnaya-like population with more *EHG* than the one we have (simulated by Yamnaya plus EHG, and essentially what David is talking about with Eastern Corded Ware).

Neither's particularly more compelling at the moment, based on the raw outgroup statistics (and ADMIXTURE it seems). But at least at the moment we know we have populations like SHG around the Baltic (and straight up WHG like populations probably weren't), we know populations like Unetice exist. And possibly some of the populations we call Corded Ware might have been more Unetice like than like the Corded Ware samples we have.
Of course, one convincing challenge to this, atm, would be that the derived EDAR variant may be a bit lower in Lithuanians than expected for a SHG mixed population.

Alberto and David: f4(Corded_Ware_LN, Yamnaya; Karelia, Chimp) = -0.00001 (Z=0.0)
Interesting stat. It suggests CW has less "ENF" admixture (or Neolithic at any rate) than Samara Yamnaya, given that CW is as related to Karelia as Yamnaya is despite inevitably having less EHG and more WHG as a proportion of its non-"ENF" ancestry.

Think of it this way if that doesn't make sense at first blush; if CW had logically exactly the same amount of HG vs ENF ancestry as Yamnaya, and all Yamnaya's was EHG while a bit more of CW's was WHG, then Yamnaya should be closer to Karelia.... which it isn't.

That's despite the K8 showing Yamnaya as having around 7% less ENF than CW.

This could be assessed further via f4 comparisons involving various Near Eastern populations and Yamnaya.

Davidski said...


Can you think of a way to compare the levels of Basal Eurasian admixture in Yamnaya and Corded Ware?


That would explain it. I don't think there was much Motala admixture among the Vikings moving west, and in fact Lithuanians might even have a little more Motala-like ancestry than most Swedes.

Chad said...

With only four pops, it's impossible to get every bit of admixture. There is surely more than Corded Ware involved. We have several groups from the LN,EBA that will be involved, plus Iron Age and possibly Uralic input.

As for Motala, it's without any good reason to assume that Motala people were in Lithuania. It is probably a mix of WHG and EHG. I think that Ajv52 would be a better candidate. They show EHG stuff, where Motala does not. The only thing I can get with Motala is Native American like stuff. Whether that was a first mix, prior to EHG stuff moving in, I'm not sure. I think it's possible that folks that are similar to Native Americans could've moved in first, with mtDNA C, and such. EHG could've then gone into Siberia after them. The same thing happened when reindeer herders went from the steppes, into the far north.

Unknown said...


Thanks for your insights.
They make perfect sense. Certainly "Motala plus Unetice (or other post-Corded populations of Germany) seems to work OK for Lithuanians" is more parsimonious than arguing that the ancestral Lithuanians were some ultra-WHG group which mixed with some Yamnaya group which was even more EHG than the currently sampled Yamnaya examples (from Samara, mind you, which is fairly eastern).

Davidski said...

Lithuanians as a mixture of EEF with unusually high WHG + Eastern Corded Ware is obviously the most parsimonious option based on all of the genetic data we have, as well as geography and archeology.

The other options mentioned aren't even plausible at this time.

You should at least wait until you have some evidence of a Motala-like population in the eastern Baltic.

Krefter said...

Davidski, can you put these pop averages in an ANE K8 PCA? I labeled what color and shape I want them to be.

Krefter said...

"Lithuanians as a mixture of EEF with unusually high WHG + Eastern Corded Ware..."

In ANE K8 you'd need an EEF pop with almost 70% WHG. Either way it's excess hunter gatherer ancestry.


"Spanish = 55% Spain_MN + 20% HungaryGamba_HG + 24% Syrian + 1% Kalash @ D = 0.0035"

That's about as good as my best fit for Spanish_Murcia. It probably isn't literal because what are the chances Spanish are a mixture of a native pop who was close to 60% WHG and a pop like Syrians?

A 100% West Asian source for ANE doesn't work for Iberia because West Asians have so little WHG.

Unknown said...


"Lithuanians as a mixture of EEF with unusually high WHG + Eastern Corded Ware is obviously the most parsimonious option based on all of the genetic data we have, as well as geography and archeology.

The other options mentioned aren't even plausible at this time.

You should at least wait until you have some evidence of a Motala-like population in the eastern Baltic."

Sure. I can only speculate, but if EHG admixed groups existed in Sweden, then they surely were so along the Baltic, just down the road from Karelia(?)

Chad said...


Motalas don't look like a straight EHG mix. They're different. They get the Amerindian part, in the EHG hunters, but none of the other component, or whatever it is. Not even supervised runs make the Motala as part EHG. The hunters on Gotland do however show EHG. The way it looks right now, Motala contributed next to nothing in modern Europeans.

Unknown said...

Thanks for your explanation
It makes sense
I understand that you're saying that Motala shows some older, perhaps Palaeo-North Eurasian component, and not the (perhaps more novel) EHG proper. But what is this based on? Surely not the original K15, which shows them slightly west of a midpoint between Loschbour and Karelia.

Now my point, and I think Alberto's, is not that modern Europeans have a large component of Motala-derived ancestry, but of Motala-like ancestry, i.e. from outside Scandinavia.

Alberto said...

Yes, exactly. When we're saying Motala-like, it doesn't mean they are direct descendants of the Motala guys. We're talking more generally about a population that was quite WHG-like, but with some amount of ANE.

It would be strange that there was a line separating 100% WHG from EHG (60% WHG - 40% ANE). There must have been a gradient. The exact places and proportions we don't know, but from the Baltic to maybe the north Pontic seems possible.

Alberto said...


"Spanish = 55% Spain_MN + 20% HungaryGamba_HG + 24% Syrian + 1% Kalash @ D = 0.0035

That's about as good as my best fit for Spanish_Murcia. It probably isn't literal because what are the chances Spanish are a mixture of a native pop who was close to 60% WHG and a pop like Syrians?"

Yes, obviously not literal. It only shows the best end result, but not the genesis that took place to get there.

The South and East Mediterranean admixture in Spaniards makes it complicated to find good matches by this method. Probably Basques would be a better starting point, and then add some Mediterranean (but non-European) to it.

Davidski said...


In regards to this...

Can you post the names of the colors you want?

But, they must be from the colors list available in this program...

- Download
- Unzip
- Double click on Past3.exe icon
- Tick the box "Row attributes" in the top left corner
- Double click on one of the "black" panels to activate the colors drop down menu
- Choose the colors you want from the drop down menu

Unknown said...


"It would be strange that there was a line separating 100% WHG from EHG (60% WHG - 40% ANE). There must have been a gradient. The exact places and proportions we don't know, but from the Baltic to maybe the north Pontic seems possible."

That's A Bingo .
If one had to guess, it'd be modern Poland, perhaps along the Vistula to the Dniester.

Davidski said...

You can cross the Vistula in winter in some places by walking across it. I doubt it was any kind of barrier, in winter or otherwise.

EHG seem to have been far Eastern European/West Siberian foragers who only moved west across the north, and perhaps expanded within Scandinavia from Lappland.

I see KO1 as a really big hint that EHG were never anywhere near East Central Europe. KO1 has 0% ANE.

Unknown said...

I agree that EHG admixture was probably skewed northward.

But Ko1 was from the west Balkans (Jugoslava Vincovici in Croatia). So Ko1 is probably not a good proxy for foragers from Poland or Moldavia.

Unknown said...

And I'd imagine that for foragers, drifting east across the flatlands of northern Europe was easier than drifting to the southeast, across the Carpathian massif.

This certainly dovetails with the evidence from the "Swiderian culture".

Alberto said...


What are your thoughts about the North Pontic steppe? The Dnieper is about halfway between Samara and Hungary. But it probably had stronger ties with the eastern steppe than with the Hungarian plains. So should we find pure EHG, pure WHG or something in between?

Unknown said...

I'm not David.
But we know there was mtDNA C there (per Alexey Nikitin et al.). So there must have been at least some EHG. So that's a terminus ante quem for EHG.

Davidski said...

The mtDNA C from the Neolithic sites on the Dnieper suggest that EHG was present there at that time, but if so, we don't know when it got there and how much further west it was able to expand before the Bronze Age. Probably not much.

None of the ancient Hungarian samples until BR1 carry any traces of ANE/EHG, and the steppe runs from southern Ukraine to the Hungarian Basin, so that speaks volumes IMO.

Unknown said...

Mike Thomas, May 16, 2015 at 4:14 AM
It certainly speaks volumes for Hungary, Croatia and Austria, but doesn't say much about the remaining 70% of Eastern Europe which lies outside the Carpathian Arc, from northern Poland to Lithuania, Moldova and Ukraine, IMO.

Alberto said...

But you see, if EHG were on the Dnieper, that makes things complicated.

Whenever they got there, did they find a WHG population and mix with them?

Or were they there since the LGM, and expanded north and east from there? (But in this case they should be all over EE.)

In any case, if they were there in the Neolithic, what about late CT? CT reaches the Dnieper and they had contact with the people there. And CT did have a big population that somehow dispersed. If this population had ANE already, where did they disperse? It doesn't seem they went west to Hungary, so maybe north? Or south?

Davidski said...

CT was a massive Neolithic horizon in terms of population. I doubt that the Neolithic foragers on the Dnieper had much of an effect on it.

The reason EHG/ANE made such a big impact on Europe during the Bronze Age was because of a rapid population expansion from an area that initially had very low population densities, so that EHG foragers weren't swamped by farmers.

Unknown said...


The current (archaeological) thinking is that CT "ended" because it dispersed (i.e. secondarily colonized), mostly east to the western steppe and north to Poland etc.

But as David has rightly pointed out, at least from an mtDNA perspective, it looks just like another Neolithic Central European pop.

BUT this data is low res and meagre. So we don't know what it looked like overall, in terms of autosomal and Y DNA.

Alberto said...


"CT was a massive Neolithic horizon in terms of population. I doubt that the Neolithic foragers on the Dnieper had much of an effect on it."

Yes, I can agree with that. But then you go on with:

"The reason EHG/ANE made such a big impact on Europe during the Bronze Age was because of a rapid population expansion from an area that initially had very low population densities"

Don't you see a basic contradiction there? With such low population densities they couldn't have much impact on a massive population like CT, but suddenly a rapid expansion made them have a big impact in half of the known world.

Simon_W said...

Physical anthropological data suggests that Balts are not simply the descendants of the local Battle Axe people (the local variant of the Corded Ware). The Battle Axe people were more massive and hunter-gatherer-like than the central European Corded people. And during the Bronze Age, that is: after the Corded Ware period, more gracile, smaller faced people entered the Baltic, probably from the south.

Of course you can doubt the value of such non-genetic data. But then again, why is there considerable EEF admixture in the Baltic? There were no Neolithic farming cultures in the Baltic prior to the Corded Ware. And if the origin of the Corded people was EHG + something from the Caucasus, then there should actually be no EEF admixture in the Baltic, if the story had ended with the Battle Axe culture and there were no Bronze Age migrations of LNBA people from the south.

Unknown said...

Simon W

Excellent point. Your evidence from physical anthro matches what is obviously seen in PCAs etc. Modern Balts are not simply Corded Ware people. There were ongoing admixtures in periods after the early Bronze Age. Thus, it is doubtful that the local Baltic EBA variant of CWC was actually "Baltic".

Krefter said...


I added colors from that program. Thanks.


This PCA should help with finding good mixtures for Spanish. Also, the "sink" program used here.

Matt said...

Davidski: Can you think of a way to compare the levels of Basal Eurasian admixture in Yamnaya and Corded Ware?

Really not sure. Only ideas I have are based on the outgroup relationship where Basal Eurasian is supposed to be the only thing decreasing affinity to Ust Ishim or ENA in the absence of recent ENA admixture.

Similar to the regression I did comparing ENF from K8 to D Test Yamnaya Ust Ishim Chimp based on the D stats you ran for me.

Of course, those run into problems if affinity to Ust Ishim or an ENA outgroup turns out to be affected by other factors than Basal Eurasian.

Running with the idea that they aren't for now.

So you could use the outgroup D stats:

Corded_Ware_LN Yamnaya Ust_Ishim Chimp
Corded_Ware_LN Yamnaya Papuan Chimp

to get a basic idea. The group with more basal should have less affinity to the UI / ENA, so those stats would be negative if Corded Ware was further from UI / Papuan, or positive if Yamnaya is further from UI / Papuan.

Last time you ran the Ust Ishim stat for me, it was

Corded_Ware_LN Yamnaya Ust_Ishim Chimp 0.007 Z=1.814

indicating Corded_Ware slightly closer to Ust_Ishim than Yamnaya (less Basal).
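A minimal Python sketch of how a stat like this can be read, following the sign convention described in this comment (positive means the first population is closer to Ust_Ishim). The function name `read_d_stat` is hypothetical, and the |Z| >= 3 cutoff is a commonly used convention that is assumed here, not something stated in the thread.

```python
def read_d_stat(pop1, pop2, d, z, z_threshold=3.0):
    """Return (population closer to the reference, whether |Z| clears the threshold).

    Convention assumed from the comment above: for D(pop1, pop2; Ust_Ishim, Chimp),
    a positive D means pop1 shares more with Ust_Ishim than pop2 does.
    """
    if d > 0:
        closer = pop1
    elif d < 0:
        closer = pop2
    else:
        closer = None  # exactly zero: no direction at all
    return closer, abs(z) >= z_threshold


# The stat quoted above: Corded_Ware_LN Yamnaya Ust_Ishim Chimp 0.007 Z=1.814
closer, significant = read_d_stat("Corded_Ware_LN", "Yamnaya", d=0.007, z=1.814)
print(closer, significant)
```

Under that assumed cutoff the direction points to Corded_Ware_LN, but the stat on its own would not count as significant, which fits the "slightly closer" wording.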

Other ancient populations with D stats of the same form gave: Alberstedt LN -0.0015, Bell_Beaker 0.0017, BenzigerodeHeimburg_LN -0.0044, Karsdorf_LN 0.0062, Halberstadt_LBA 0.0037, Unetice_EBA 0. So again, Karsdorf was less Basal (closer to Ust Ishim), and I think that sample was your reference for what the real new population moving into Europe may have been like. Bell Beaker and Alberstedt are more or less no different to Yamnaya in their Basal-ness, I would say based on the Z score.

(Using Dai as a control replacing Ust Ishim in a stat of the same form, which should work so long as there is no ENA admixture, gave Alberstedt -0.0042, Bell Beaker -0.0004, BenzigerodeHeimburg_LN -0.0078, Corded_Ware_LN 0.0048, Karsdorf_LN 0.0051, Halberstadt_LBA -0.0038, Unetice -0.0005. Some differences, still the same general pattern with the Dai based stat usually being -0.002 less than the equivalent Ust Ishim stat.)

That's just a couple stats though, another thing which might make it stronger maybe, I think, simply by the use of the law of averages would be to

a) compute sets of (Test) Yamnaya Ust Ishim Chimp (e.g and (Test) Corded Ware Ust Ishim Chimp (e.g.

b) then for each pair (Test) Yamnaya Ust Ishim Chimp - (Test) Corded Ware Ust Ishim Chimp

c) average these differences out

if the final averaged out difference is positive then the Samara Yamnaya group is less Ust Ishim like and thus more Basal Eurasian than the Corded Ware from Germany.
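Steps a) to c) can be sketched in a few lines of Python. This is only an illustration: the function name `mean_paired_difference` is hypothetical, and the numbers are made-up placeholders, not real D stats.

```python
from statistics import mean


def mean_paired_difference(d_vs_yamnaya, d_vs_corded_ware):
    """Average D(Test, Yamnaya; UI, Chimp) - D(Test, Corded Ware; UI, Chimp).

    Both arguments map a Test population name to its D value. Only Test
    populations present in both sets are used, so every difference is a true pair.
    """
    shared = sorted(set(d_vs_yamnaya) & set(d_vs_corded_ware))
    if not shared:
        raise ValueError("no Test populations in common")
    diffs = [d_vs_yamnaya[t] - d_vs_corded_ware[t] for t in shared]
    return mean(diffs), len(shared)


# Placeholder values for illustration only (not real results).
d_yam = {"TestA": 0.010, "TestB": -0.002, "TestC": 0.004}
d_cw = {"TestA": 0.004, "TestB": -0.010, "TestC": -0.003}

avg, n = mean_paired_difference(d_yam, d_cw)
print(f"mean difference over {n} Test pops: {avg:.4f}")
```

A positive average would point the same way as described above: Yamnaya less Ust Ishim-like than Corded Ware. Averaging many noisy stats reduces random error, but it does not remove a bias shared by all of them, such as low SNP overlap.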

Marnie said...


"That's just a couple stats though, another thing which might make it stronger maybe, I think, simply by the use of the law of averages would be to . . ."

Could you define mathematically, the term "law of averages", as used in the above statement.

Please discuss in terms of sample size, randomness, and bias.

Krefter said...

I've looked through the trends in the results below and now it's starting to make a lot of sense.

It looks like Spanish_Murcia is 75% or more LN/BA+Spain_MN, and 25% or less something North African+West Asian. Considering they have 35% WHG you need 75% something ~5,000YBP European (I think LN/BA is a part of that).

Here's what I learned.

Here are the Middle Eastern pops I used in the analysis for reference.

And the LN/BA pops I used are here.

I expect other Iberians to follow the same pattern but be more LN/BA+Spain_MN and less Middle Eastern.

Davidski said...


Here's that plot. Keep in mind that too much SSA skews the results in this analysis, so the position of the Tunisians is irrelevant.

Krefter said...


Can you save that map somewhere. Because I might want to add pops to it in the future.

Davidski said...


Here are those D-stats:


Sure, here's the Past3 dataset if you need it:

Krefter said...

I've been able to get very good fits for French and all Iberians using SouthWest Asians(sometimes also NW Africans), Spain_MN, and LN/BA in ANE K8. Looks like there is a lot of Neolithic survival in Iberia(Euro side is more EEF than LN/BA), especially in Basque who are probably something like 70%.

I haven't looked into Italy yet. SouthEast Europe looks complicated because of their high ANE.

For the rest of Europe.

North sea and NorthEast Euros look basically like LN/BA. Continental Germans(not all), Hungarians, CzechSlovakians, etc. look like a mix of something SouthEastern(Mid east or southeast Europe) and LN/BA.

Krefter said...

Check this out!

I learned how to use "sink" and it's incredible. You plug in 14 ancestor pops, and the machine gives you every possible 4-way mixture using those 14 ancestors for test.txt. It gives you hundreds of results in under 10 minutes.
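To give a feel for the size of that search, here is a small Python sketch that lists every 4-way set drawn from 14 populations. The names `Pop01` to `Pop14` are placeholders, and this is not the sink script itself, only the counting behind it.

```python
from itertools import combinations
from math import comb

# Placeholder names standing in for the 14 ancestor populations.
ancestor_pops = [f"Pop{i:02d}" for i in range(1, 15)]

# Every unordered choice of 4 populations out of the 14.
four_way_sets = list(combinations(ancestor_pops, 4))

print(len(four_way_sets))  # 1001
print(four_way_sets[0])    # ('Pop01', 'Pop02', 'Pop03', 'Pop04')
```

So a full run tests 1001 different 4-way models for the target, each of which then needs its own best-fitting percentages.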

If you already know how to use 4-mix, it'll be easy to use sink with this link.

Here are the top 10 results for Spanish_Murcia.

The ancestor pops I used are in the sink instructions link.

I didn't use Loschbour in this one, because it is unlikely people like him lived in Spain in the last 5,000 years. I think the results confirm significant LN/BA in Iberia and significant native Neolithic ancestry.

Other Spanish will probably score higher in Spain_MN and less in Middle Eastern.

Alexandros said...

Pretty good prediction! Guess the Target population..

Target = 45% Lebanese_Christian + 22% Georgian_Imer + 33% Stuttgart + 0% Yamnaya @ D = 0.0064

A suggestion, when trying to model ancient Near Eastern in your prediction model, use a combination of the following 4: Samaritan/Lebanese Christian (closest to ancient Levantines) and Armenian/Georgian (closest to ancient Anatolia-South Caucasus). All the Neolithic action was taking place in these regions.

Matt said...

A little more on the Basal Eurasian affinities of LNBA populations, on a tangent. I was looking at the Haak K16 ADMIXTURE graph to see if there were any obvious patterns there in the components that might suggest anything.

I was kind of annoyed at how the blue European HG (Motala / Loschbour) component was stacked underneath the teal component (which I guess we could call "Other Yamnaya ancestry" for now), making it harder to judge absolute amounts of blue component in the populations, so I flipped some of the parts of the bar graphs.

Kind of confirms visually why it would be hard to generate the LNBA populations in this K16 ADMIXTURE *just* from e.g. Esperstedt_MN and the Samara Yamnaya. The level of European blue is fairly identical in LNBA Germans and Samara Yamnaya, while being lower in MN Europeans, so introducing Yamnaya into MN Europeans to give them teal alone wouldn't work - as it would reduce the level of overall blue "European".

If you used Esperstedt_MN and Samara Yamnaya to model LNBA in Haak's admixture run, you'd need to "top up" the blue European via a HG contribution. At least on the basis of their K16 ADMIXTURE (Haak's outgroup modelling is a little different).

The mix that would work with minimal or no additional contribution for LNBA would only really be BR1 and BR2 (Hungary Bronze Age sample 1) and Yamnaya. Something like 66% Hungary Bronze Age plus 33% Yamnaya / Corded Ware would seem OK for Bell Beaker, then 50:50 Hungary Bronze Age and Yamnaya for other LNBA groups like Unetice, visually. But then we've no evidence that BR1 populations were widespread at all, at the moment.

Unknown said...

Yet again, interesting insights.
Well, I wouldn't be surprised if the Carpathian basin was central to it all. In fact, I'd go so far as calling it the 'capital' of Bronze Age Europe.
It was the centre of metallurgy throughout the period, from where secondary subsidiaries fed into / out of it, from the Nordic region and Central Germany to the Upper Dnieper.

Simon_W said...

Hard to tell which ADMIXTURE analysis is the right one. As I said, using David's K8 data, Bell_Beaker_LN seems to be best approximated by
47% Corded_Ware_LN + 29% Alberstedt_LN + 19% Spain_MN + 5% La_Brana-1.

Though it doesn't really matter which MN and WHG populations are used. This approximation makes sense because at least some admixture from the local Corded People has to be expected. And Alberstedt_LN may be just a late local Corded derived group with some farmer admixture. MN with slight extra WHG may be from local farmers popping up again, or, what's more likely, from a west-> east movement of farmers. At least with this K8 data, the addition of HungaryGamba_BA to the mix doesn't result in better approximations, as far as I have tested.

Simon_W said...

As for Alberto's suggestion that Unetice might be a mix of Motala-like people with Georgian-like ones: The earliest Proto-Unetice is from Moravia and Lower Austria. The epicenter was in southeastern Moravia. That's both far from Scandinavia and from the Caucasus. And moreover it is close to where BR1 has lived, who had 0% West Asian-teal-Georgian-like admixture.

Matt said...

@ Simon, yeah sure, if you use another LNBA population and a WHG like La Brana then I imagine those can together add enough extra WHG to allow a Corded Ware plus Spain MN model to work.

It's just if you try to use Corded Ware plus Spain MN in a 2 way mix, without anyone else, for Beakers or other LNBA it seems that you'd have a bit of a problem getting a fit. The combined population wouldn't be able to get enough WHG (or EuroHG at any rate, remember these ADMIXTURE can't really distinguish HGs very well) while having not too much teal and the right amount of EEF.

If you broke down Corded Ware, Alberstedt and an 80:20 Spain_MN:La_Brana mix, all three would have fairly close levels of the blue HG component from Haak (Corded a little more, the last 80:20 mix a little less).

HungaryGamba_BA and Corded Ware just looks like a relatively easy 2 way mix for most LNBA populations, like Beakers, certainly for the Haak admixture. Looks like the simplest 2 way pair. I don't doubt that combinations of other LNBA plus 3 other populations could get even closer proportions matches than that pair, particularly for how other ADMIXTURE vary from what was in Haak.

Unknown said...

My best fit, being Scottish.

75% Bell Beaker Late Neolithic + 10% Corded Ware Late Neolithic + 15% Baalberge Middle Neolithic = 0.001

Simon_W said...

Well, if German Bell Beakers were mainly a Corded-BR1-like mix, then they're in line with David Anthony and Marija Gimbutas on the origin of eastern Bell Beakers and Italo-Celtic language.

Krefter said...

Best 4mix results for Tuscan from Sink.

Here are the 14 ancestor pops I used.

There were over 100 0.00 results, so I'll have to look back for all the good fits for Tuscans.

All the fits seem to be telling the same story. Tuscans fit best as about 50% west Asian with significant ANE and maybe some WHG, and 50% something similar to Bronze age Hungarians.

So their non-West Asian-like side is about as Middle Neolithic as it is Bell Beaker or Unetice-like. So we could be looking at significant Neolithic survival in Italy, or massive replacement by immigrants similar to Bronze age Hungarians.

Cypriot having significant WHG and less ENF than Saudi makes Tuscan's Middle Eastern higher than Spain_Murcia.

Considering Tuscans score about 30% WHG they can fit as over 60% mainland 5,000YBP European. Otzi though had significantly less WHG than other Europeans from his time. So, Tuscans may have even more native Neolithic blood than what these fits give. Add to that we don't know exactly who lived in Italy 5,000 years ago, all we have are mainland proxies.

Anyways I think this confirms West Europeans are mostly LN/BA+MN.

Krefter said...

Here are my best fits using Sink. My 5% Amerindian and African makes it hard to get good fits.

33% Unetice_EBA + 42% Bell_Beaker_LN + 10% Spain_MN + 15% Syrian @ D = 0.0101

39% Unetice_EBA + 35% Bell_Beaker_LN + 10% Gokhem_MN + 16% Syrian @ D = 0.0102

When I added Yamnaya.

46% Spain_MN + 7% Loschbour + 33% Yamnaya + 14% Syrian @ D = 0.0138

My Yamnaya K6 results.

Yamnaya: 36.5151
Pre-Yamnaya: 46.883
Middle Eastern: 11.3951
WHG-extra: 2.436
East Asian: 1.7989
Sub Saharan: 0.9719

Alexandros said...

Krefter (or anyone else), can you please provide some more info on using the 'Sink' option? Should I be expecting all possible combinations of 4, for a given specific list of populations? I do not seem to get it right following your instructions.

Alexandros said...


Are you sure that by including 'Syrian' you get the best possible fit for you? I would be curious to see what happens if you replace Syrian with Lebanese_Christian and then with Cypriot. I suspect the model will fit slightly better.

Krefter said...


First you download this.

Then follow the instructions Davidski gave in this article(not in comment section).

Then you copy and paste what's in this link.

Feel free to ask if you don't understand.

"Are you sure that by including 'Syrian' you get the best possible fit for you?"

Sink created every 4-way combination of 14 pops to find the best fits for me. Most of the 14 I chose were West Asian, including Lebanese_Christian and Cypriot, but Syria always worked better.

Chad said...


Did you try the Druze?

Krefter said...


No I didn't.

But I did without Sink and they give a slightly worse result.

10% Spain_MN + 64% Bell_Beaker_LN + 13% Unetice_EBA + 13% Druze @ D = 0.0134

Anyways, SE_English though do better with Druze.

10% Spain_MN + 16% Bell_Beaker_LN + 67% Unetice_EBA + 6.99999999999999% Syrian @ D = 0.0054

9% Spain_MN + 18% Bell_Beaker_LN + 66% Unetice_EBA + 6.99999999999999% Druze @ D = 0.003

And even better with Tuscan.

6% Spain_MN + 25% Bell_Beaker_LN + 55% Unetice_EBA + 14% Tuscan @ D = 0.0021

7% Spain_MN + 64% Norwegian + 18% Hinxton4 + 11% Tuscan @ D = 0.0034
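The 6.99999999999999% entries in the fits above look like floating-point residue rather than a real fractional share. A hedged Python sketch of how such a value can arise, assuming (this is a guess, not taken from the 4mix script) that the last share is computed as whatever remains after subtracting the other three:

```python
# Assumption: the fourth share is the remainder of 1.0 after the other three.
shares = [0.10, 0.16, 0.67]

remainder = 1.0
for s in shares:
    remainder -= s  # each subtraction can add a tiny binary rounding error

raw_percent = remainder * 100
print(raw_percent)               # may print slightly under 7 instead of exactly 7
print(f"{round(raw_percent)}%")  # rounding for display gives 7%
```

Either way, those entries can be read as 7% (and 5.99999999999999% as 6%).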

Chad said...


How about Iraqi Jews?

Krefter said...


11% Spain_MN + 18% Bell_Beaker_LN + 65% Unetice_EBA + 5.99999999999999% Iraqi_Jewish @ D = 0.0023

Here's SE_English's best fits using Sink.

These are the pops I used.

Chad said...

You should do one with EN, WHG, EHG, and Iraqi Jews.

Krefter said...


23% HungaryGamba_EN + 24% Karelia_HG + 28% Iraqi_Jewish + 25% Loschbour @ D = 0.0227

21% Stuttgart + 24% Karelia_HG + 29% Iraqi_Jewish + 26% Loschbour @ D = 0.0226

56% HungaryGamba_EN + 34% Samara_HG + 0% Iraqi_Jewish + 10% Loschbour @ D = 0.0221

54% Stuttgart + 34% Samara_HG + 0% Iraqi_Jewish + 12% Loschbour @ D = 0.0222

I'll do a sink with ENs, HGs, and West Asians later.

Krefter said...

Keep in mind both EHGs are scoring above 1% in ASI, East Asian, and Oceania(only Samara_HG).

Garvan said...

Looking at the range of values for individuals it appears that modern Europeans are very close to the Bronze Age samples.

I am southern Irish and I get good fits using a few of the Bell Beaker samples, taking the values for the individuals from an earlier spreadsheet that I downloaded from this blog.
[1] Target = 64% BB_I0112 + 9% BB_I0806 + 0% BB_I0113 + 27% BB_I0108 @ D = 0.0058
[1] Target = 64% BB_I0112 + 9% BB_I0806 + 0% Hinxton4 + 27% BB_I0108 @ D = 0.0058


Krefter said...


SE_English can't get good fits using WHG+EHG+EN+West Asian. I've run through other Euros and they do get decent fits and the percentages resemble Yamnaya and Euro_MN.

These are SE_English's best fits.

These are the pops I used.


Krefter said...


Can you post your ANE K8 results? There's no Irish reference in the spreadsheet.

Chad said...

Use these..
Stuttgart, Spain_EN, Esperstedt_MN, Baalberge_MN, Spain_MN, Loschbour, Karelia_HG, Samara_HG, Iraqi_Jewish, Druze, Iranian_Jewish, Lebanese, Nganasan, Yakut

Garvan said...

Irish with 8 GG Parents from Co. Cork.

ANE 0.149019
South_Eurasian 0.000994
Near_Eastern 0.374092
East_Eurasian 0.004447
WHG 0.471154
Oceanian 1.00E-005
Pygmy 1.00E-005
Sub-Saharan 0.000274

Also using the Bell Beaker LN samples plus an EN sample you can get good result for SE_English

SE_English = 65% BB_I0112 + 21% BB_I0806 + 4% BB_I0108 + 10% HungaryGamba_EN @ D = 0.0024


Krefter said...


I'd rather use Unetice than Bell_Beaker in 4mix. There's a lot of variation in Late Neolithic Germans, probably because the admixture event between Middle Neolithic farmers and immigrants from East Europe had just occurred.

"Looking at the range of values for individuals it appears that modern Europeans are very close to the Bronze Age samples."

It's obvious now that Bell Beaker, Corded Ware, Unetice, Urnfield, Tumulus, etc. were the predecessors of most modern European ethnic groups. One way or another modern Euro countries are Bell Beaker, etc. folk who've changed a lot culturally.

Bell Beaker, Unetice, etc. genetic types and linguistic type (IE) had been running around for 2,000 years by the time they were first recorded in writing, and by that time were the Celts, Italics, Germans, etc.

Open Genomes said...

David and everyone, here is an update to 4mix called 4mix_multi which allows you to generate results in CSV file format for multiple targets in one run: Download

This will enable the simultaneous comparison of a large number of samples / populations, and sorting, filtering, and graphing the results in spreadsheets.

Simon_W said...

I think the K8 approximations with just slight Near Eastern and stronger Cypriot-like admixture describe my ancestry better than the K9 approximations with strong North African admixture.

Because it's against common sense to think that any region of Italy has up to 50% North African admixture. Slavery and free migration in Roman times probably resulted in rather random mixture than in the strong presence of one particular ethnic group.

On the other hand admixture with a Cypriot-like element seems to be present throughout mainland Italy and Sicily, in varying proportions. This must be related with the strong presence of y-haplogroup J2.

Modern y-haplogroup frequencies back up my opinion: The Romagna is dominated by R1b, J2a, E-V13 and G2a. Haplogroups typical for the southern Levant and for Northern Africa amount to 6.1% in Rimini (4.1% E-M123 + 2% E-V65) and to 10.3% in Bologna (6.9% J1 + 3.4% E-M81).

Alexandros said...

I have a simple technical question which may be obvious to most, but still unclear to me. Is the 4mix test utilizing data at the admixture components (i.e. K8) level or within the levels?

In other words, if 2 hypothetical populations have exactly the same K8 admixture components, will they give exactly the same results if you include them alternately in the model? Apparently, 2 populations that have identical K8 components may have different genetic diversity within these components. My understanding is that such genetic diversity is not captured by the 4mix analysis. Correct?

Garvan said...

This is how I see the variation in the samples, looking at a selection of 33 results.

1. Early Neolithic (EN) to Middle Neolithic (MN) to Late Neolithic on a WHG cline

Stuttgart (7500 ybp) to Loschbour (8000 ybp)

1. HungaryGamba_EN = 97% Stuttgart + 3% Loschbour @ D = 0.003
2. LBK_EN = 96% Stuttgart + 3.99999999999999% Loschbour @ D = 0.0042
3. Spain_EN = 90% Stuttgart + 10% Loschbour @ D = 0.0063
4. Sardinian (Modern) = 90% Stuttgart + 10% Loschbour @ D = 0.0067
5. Esperstedt_MN = 80% Stuttgart + 20% Loschbour @ D = 0.0097
6. Baalberge_MN = 77% Stuttgart + 23% Loschbour @ D = 0.0198
7. Spain_MN = 75% Stuttgart + 25% Loschbour @ D = 0.0106
8. Gokhem_MN = 73% Stuttgart + 27% Loschbour @ D = 0.0177

Should get roughly the same result with HungaryGamba_HG or La_Brana-1 instead of Loschbour

9. Gokhem_MN = 72% Stuttgart + 28% HungaryGamba_HG @ D = 0.0168
10. Baalberge_MN = 77% Stuttgart + 23% HungaryGamba_HG @ D = 0.0184

Yamnaya is outside current day Europeans variation
Corded_Ware_LN is also outside these samples, near the edge.

Most modern north-central European populations should be explained by Corded_Ware_LN + MN population mix. Looking for something that does not fit.

11. Spanish_Andalucia = 60% Stuttgart + 32% Corded_Ware_LN + 8% Loschbour @ D = 0.024
12. Spanish_Cataluna = 54% Stuttgart + 35% Corded_Ware_LN + 11% Loschbour @ D = 0.0174
13. Spanish_Pais_Vasco = 54% Stuttgart + 27% Corded_Ware_LN + 19% Loschbour @ D = 0.0087
14. Basque_French = 55% Stuttgart + 24% Corded_Ware_LN + 21% Loschbour @ D = 0.0067
15. French = 42% Stuttgart + 48% Corded_Ware_LN + 10% Loschbour @ D = 0.0081
16. German = 33% Stuttgart + 57% Corded_Ware_LN + 10% Loschbour @ D = 0.0058
17. Hungarian = 33% Stuttgart + 60% Corded_Ware_LN + 7% Loschbour @ D = 0.0088
18. North_Italian = 64% Stuttgart + 33% Corded_Ware_LN + 3% Loschbour @ D = 0.005
19. Bosnian = 38% Stuttgart + 59% Corded_Ware_LN + 3% Loschbour @ D = 0.0056
20. Bulgarian = 52% Stuttgart + 48% Corded_Ware_LN + 0% Loschbour @ D = 0.0225
21. Lithuanian = 7% Stuttgart + 73% Corded_Ware_LN + 20% Loschbour @ D = 0.0074
22. Polish = 17% Stuttgart + 69% Corded_Ware_LN + 14% Loschbour @ D = 0.0067
23. Estonian = 4% Stuttgart + 78% Corded_Ware_LN + 18% Loschbour @ D = 0.0194
24. SE_English = 31% Stuttgart + 55% Corded_Ware_LN + 14% Loschbour @ D = 0.0072
25. SW_English = 31% Stuttgart + 55% Corded_Ware_LN + 14% Loschbour @ D = 0.0069
26. West_Scottish = 25% Stuttgart + 62% Corded_Ware_LN + 13% Loschbour @ D = 0.0098
27. S_Irish (n=1)= 27% Stuttgart + 57% Corded_Ware_LN + 16% Loschbour @ D = 0.013
28. Ukrainian_Belgorod = 18% Stuttgart + 72% Corded_Ware_LN + 10% Loschbour @ D = 0.0071

Older samples that fall into the modern range.
29. Hinxton4 = 22% Stuttgart + 61% Corded_Ware_LN + 17% Loschbour @ D = 0.0152
30. Bell_Beaker_LN = 22% Stuttgart + 66% Corded_Ware_LN + 12% Loschbour @ D = 0.0068
31. Unetice = 20% Stuttgart + 62% Corded_Ware_LN + 18% Loschbour @ D = 0.0139

Bad matches. Require more aDNA samples.
32. Greek = 76% Stuttgart + 24% Corded_Ware_LN + 0% Loschbour @ D = 0.07
33. Tuscan = 71% Stuttgart + 29% Corded_Ware_LN + 0% Loschbour @ D = 0.0305


Simon_W said...

@ Alexandros

Since what is entered as the target's data is nothing but admixture components, the 4mix analysis cannot but work with this, and so logically targets and populations with exactly the same components will behave exactly the same and are thus interchangeable.
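A toy Python sketch of that point. It assumes a 4mix-style fit is a brute-force search over whole percentages that minimizes the Euclidean distance between the target's components and the weighted mix; that is an assumption for illustration, not the actual 4mix code, and the function name `fit_four_way` and all vectors are made up.

```python
from math import sqrt


def fit_four_way(target, sources, step=1):
    """Grid search over percentages (summing to 100) for four source populations.

    target  -- list of admixture components for the target
    sources -- dict of name -> component list, exactly four entries
    Returns ({name: percent}, distance). Only the component vectors are used.
    """
    names = list(sources)
    if len(names) != 4:
        raise ValueError("exactly four source populations are required")
    vecs = [sources[n] for n in names]
    best_dist, best_weights = None, None
    for a in range(0, 101, step):
        for b in range(0, 101 - a, step):
            for c in range(0, 101 - a - b, step):
                weights = (a, b, c, 100 - a - b - c)
                mix = [sum(w * v[i] for w, v in zip(weights, vecs)) / 100
                       for i in range(len(target))]
                dist = sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))
                if best_dist is None or dist < best_dist:
                    best_dist, best_weights = dist, weights
    return dict(zip(names, best_weights)), best_dist


# Made-up component vectors, for illustration only.
target = [0.4, 0.3, 0.2, 0.1]
sources = {
    "SrcA": [1.0, 0.0, 0.0, 0.0],
    "SrcB": [0.0, 1.0, 0.0, 0.0],
    "SrcC": [0.0, 0.0, 1.0, 0.0],
    "SrcD": [0.0, 0.0, 0.0, 1.0],
}
fit1, dist1 = fit_four_way(target, sources)

# Swap in a differently named population with the same components as SrcD.
twin_sources = dict(sources)
twin_sources["SrcD_twin"] = twin_sources.pop("SrcD")
fit2, dist2 = fit_four_way(target, twin_sources)

print(fit1, dist1)
print(fit2, dist2)
```

Because nothing but the component vectors enters the calculation, the twin gets the same percentage and the same D as the population it replaced. Any diversity inside a component is invisible to a fit of this kind.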

Alexandros said...


Thanks a lot for the clarification. I thought that this was the answer but I just wanted to confirm with people who have more experience in population genetics.

By the way, very interesting analysis you present above. Note that in order to 'explain' the Greeks better, you need to throw in an additional Caucasus-like component which is not included in the ancient genomes. A good proxy population for that is Georgians. Check below:

Target = 56% Stuttgart + 24% Corded_Ware_LN + 0% Loschbour + 20% Georgian_Imer @ D = 0.0075

If you throw in a typical Eastern Mediterranean population like Cypriots, the model fits even better.

Target = 43% Stuttgart + 27% Corded_Ware_LN + 21% Cypriot + 9% Georgian_Imer @ D = 0.0063

Simon_W said...

Yes, I've noticed before that in central Greece and towards the north there is a higher Caucasus : Near Eastern ratio than on Crete or the West Anatolian coast, presumably because the early Greeks had a higher Caucasus admixture from their IE roots.

Simon_W said...

Regarding my own ancestry, my K8 based approximations are not really credible either. I could only be 62% HungaryGamba_BA if southern Germans were 100% HungaryGamba_BA, i.e. with 0% Germanic admixture, which is hard to believe. It's more credible that I'm 72% Bell Beaker, like the K9 based analysis suggests. But then what about the considerable North African admixture that is also suggested by K9? Perhaps it's not really North African but from a hitherto unsampled early southeast European farming population that was more basal than any EEF population sampled so far. Perhaps this was associated with the E-V13 that's rather common in the Romagna.

Krefter said...

Middle Neolithic Euros should be used as an ancestor proxy for Central-North Euros and Iberians(we don't know what was in other parts of Europe). Stuttgart's high ENF hides recent West Asian ancestry.

alobrix said...


Please, can you get these percentages for the Spanish_Galicia?
more or less corded ware? I guess more than basques and andalusians.

The lower Corded_Ware percentage in Basques suggests a more limited Indo-European input in the formation of Iberian populations.

Maybe these differences explain this:

Spanish_Andalucia = 60% Stuttgart + 32% Corded_Ware_LN + 8% Loschbour @ D = 0.024
Spanish_Cataluna = 54% Stuttgart + 35% Corded_Ware_LN + 11% Loschbour @ D = 0.0174
Spanish_Pais_Vasco = 54% Stuttgart + 27% Corded_Ware_LN + 19% Loschbour @ D = 0.0087

Unknown said...

The Sink works well. Cheers, able to pinpoint it.

West Scot = 47% Hinxton4 + 1% Bell_Beaker_LN_1 + 43% Alberstedt_LN + 9% HungaryGamba_BA = 6e-04

West Scot = 46% Hinxton4 + 2% Halberstadt_LBA + 43% Alberstedt_LN + 9% HungaryGamba_BA = 5e-04

West Scot = 47% Hinxton4 + 1% Corded_Ware_LN1 + 42% Alberstedt_LN + 10% HungaryGamba_BA = 5e-04

Garvan said...


These are the results for Spanish_Galicia that you asked for:

Spanish_Galicia = 57% Stuttgart + 34% Corded_Ware_LN + 9% Loschbour @ D = 0.0302

Garvan said...

Stuttgart, Corded_Ware_LN & Loschbour don't produce good fits for many Spanish Provinces (Spanish_Galicia, D=0.0302). Adding a Bedouin in the mix gives better fits.

Spanish_Galicia = 33% Stuttgart + 23% Bedouin + 27% Corded_Ware_LN + 17% Loschbour @ D = 0.0089

capra internetensis said...

What the heck is up with those Yamnaya:Ust'-Ishim D stats?

Yamnaya is equally basal as Starcevo_EN, Spain_MN, and the Iceman, and only marginally less basal than Spain_EN and LBK_EN; significantly *more* basal than Germany_MN and Corded_Ware_LN. Yet in K8 Spain_MN comes out as 54% Near Eastern, Corded_Ware as 31% Near Eastern, and Yamnaya as only 24% Near Eastern.

Corded Ware is only marginally more basal than East Asians and European hunter-gatherers.

Krefter said...

The D-stats I've seen say Yamnaya is as basal as Corded ware and much less than Middle Neolithic Euros.

Matt said...

@ Capra, yeah, you're right. I missed those new stats with CW when David posted them up, looking at them now. How does Germany_MN end up with more of an excess of sharing with Ust Ishim than EHG / Motala for example, following the model where all ancient Eurasian HG except Basal Eurasian and ENA are all equally related to UI, and Germany_MN we know has a lot of farmer ancestry (and not just via the K8 but other ADMIXTURE)? Puzzling.

Still, comparing the D(Test, Corded Ware, UI, Chimp) stats to D(Test, Yamnaya, UI, Chimp) as I said I would, there is pretty much always this -0.007 offset where the Corded Ware stat is lower than the Yamnaya stat. Especially if you discard the samples with low SNP overlap in the 10s of thousands, and especially in the set with 300,000+ SNP overlap.

And that's consistent with D(Yamnaya, Corded Ware, UI, Chimp) and D(Corded Ware, Yamnaya, UI, Chimp) evaluating to -0.007 and 0.007 respectively.

Graphed - (basically perfectly correlated at 1 with an intercept of 0.007, meaning the Yamnaya stat is always the equivalent CW stat +0.007).
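A constant offset like that is what an ordinary least-squares line with slope 1 and intercept 0.007 describes. A minimal Python sketch, with a hypothetical helper `least_squares_line` and placeholder numbers built to have exactly that offset (they are not the real stats):

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder D(Test, Corded Ware; UI, Chimp) values, illustration only.
cw_stats = [-0.010, -0.004, 0.000, 0.006]
# Matching D(Test, Yamnaya; UI, Chimp) values built with a +0.007 offset.
yam_stats = [x + 0.007 for x in cw_stats]

slope, intercept = least_squares_line(cw_stats, yam_stats)
print(round(slope, 6), round(intercept, 6))
```

With real stats the slope and intercept would only be approximately 1 and 0.007, and the scatter around the line would show how consistent the offset is across Test populations.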

Also, Capra re: patterns, I didn't want to clutter up an already too complicated or long post any more, but there are some weird patterns in the D (Pop, Yamnaya; Ust Ishim Chimp) stats and when compared vs the D (Pop, Yamnaya; Dai Chimp) for ancient populations. - copy into a text file and save as CSV to look at as a spreadsheet.

When you compare the D (Pop, Yamnaya; Ust Ishim Chimp) stats with the D (Pop, Yamnaya; Dai Chimp), then what you find is that the Ust Ishim stat is most positive relative to the Dai stat in Kostenki14 and the Early Neolithic and Middle Neolithic relative to EHG influenced populations. All the ancient populations other than EHG are a little less related to Dai than you would expect from their Ust Ishim based stat, compared to Yamnaya (or Yamnaya is a little more related to Dai compared to its Ust Ishim affinity).

The two D stats graphed -

I don't know if these patterns are real, or are caused by some low SNP overlap issue once you get three ancient populations together to test, and that really squeezes the available overlap in some instances. Some of that could be potentially tested by using D (Ust Ishim, Dai, Ancient Pop, Chimp), D (Ust Ishim, San, Ancient Pop, Chimp), D (Dai, San, Ancient Pop, Chimp) stats to reduce the number of overlapping ancient populations. Dai San could provide similar information on Basal Eurasian in the presumed absence of ENA / African admixture. Or maybe rerun the stat set with some modern populations rather than Yamnaya as a check. Might try that if / when I can get my Linux running again.

But on the other hand, re: SNP overlap, many of the populations like Germany_MN and Iceman actually seem to have pretty good 300,000+ SNP overlap. Above two D stat graph pruned for only those above 300,000 SNP -

Unknown said...

I'm getting odd results in supervised too. It's like Samara EHG is already Yamnaya like. They come out 97-87% Samara and 3-13% Bedouin.

capra internetensis said...


Which D-stats?

If we use Dai instead of Ust'-Ishim, then Yamnaya is a lot less basal than Middle Neolithic:

Chimp Dai Yamnaya Spain_MN -0.008 -2.414

But Corded Ware is *still* less basal:

Chimp Dai Yamnaya Corded_Ware_LN 0.0048 1.449

alobrix said...




Do you know this paper?

capra internetensis said...


Thanks for the graphs, they really help to visualize the information.

So, using Ust' Ishim as our reference point for basalness, Yamnaya is about as basal as Spain_MN, Unetice, Bell Beaker, Norwegians, Basques, and Scottish. LBK_EN is about as basal as English, French, and Czechs. Most Southern Europeans are considerably more basal (but this includes African of course) than Neolithic farmers. (They had to get all that E and J Y-DNA from somewhere.)

Corded Ware comes up significantly less basal than Yamnaya.

I don't think this can be some kind of artifact of using ancient DNA, because it persists when you swap in modern populations, for instance Czechs and Basques.

Chimp, Ust_Ishim : LBK_EN, Czech 0.0004 0.165
Chimp, Ust_Ishim : Yamnaya, Czech -0.0023 -0.842
Chimp, Ust_Ishim : LBK_EN, Yamnaya 0.0028 0.813

Chimp, Ust_Ishim : Yamnaya, Basque -0.0004 -0.165
Chimp, Ust_Ishim : LBK_EN, Basque 0.0024 0.985
Chimp, Ust_Ishim : LBK_EN, Yamnaya 0.0028 0.813

This is very consistent across modern populations.

We already know that Yamnaya and EHG are closer to East Asians (couldn't find one with Dai):

Atayal, Ust_Ishim : Yamnaya, Loschbour 0.0114 2.169
Atayal, Ust_Ishim : EHG, Loschbour 0.0194 3.107

So either the ancient Europeans have some affinity to Ust'-Ishim that Yamnaya lacks, or Yamnaya is just closer to Dai, or both. I can't think of a really good test for it.

Chimp, Ust_Ishim : EHG, Loschbour 0.0037 0.548
MbutiPygmy, Ust_Ishim : Dai, Loschbour -0.0003 -0.045
Chimp, Ust_Ishim : Dai, Loschbour -0.0064 -1.103
Chimp, Ust_Ishim : Dai, EHG -0.0094 -1.726

Some weird results there.

Krefter said...


I wouldn't take those results too literally. ENF/Ancient Near Eastern ancestry is very real and we know how it is distributed today and in ancient genomes.

Davidski said...

Just because Yamnaya has more Basal Eurasian ancestry than Corded Ware doesn't mean it has more Near Eastern ancestry. In fact, it has less.

See that's the pitfall that the f4 stats are falling into; they go so far back in the phylogeny that they can't estimate Near Eastern ancestry correctly.

capra internetensis said...


I expect the D stats would work better for determining Near Eastern ancestry if I had actually included Near Eastern references. ;)

Alexandros said...

I have a (naive) question regarding modelling Near Eastern ancestry.

First of all I have been trying to model Stuttgart as a mix between a Near Eastern ancestral population and a known HG individual (i.e. Loschbour, La-Brana, etc.). I found that the best model was the following:

Stuttgart = 22% Loschbour + 78% Samaritan @ D = 0.0697

Now in my mind this means that whenever I am trying to model a modern population using ancient genomes and I want to throw in a Near Eastern component, I should be using Samaritans for that. When I actually tried to do this, it was not always working as the ideal model. For example, when I try to model Greeks, as a mix between Near Eastern, EEF, Caucasus and an early European IE population like Corded Ware, I get:

Target = 52% Stuttgart + 5% Samaritan + 17% Georgian_Imer + 26% Corded_Ware_LN @ D = 0.0069

This is pretty good, but when I include Cypriot instead of Samaritan, the fit improves:

Target = 43% Stuttgart + 21% Cypriot + 9% Georgian_Imer + 27% Corded_Ware_LN @ D = 0.0063

I know that the differences are small, but why should Cypriot improve this model compared to Samaritan? Samaritan are definitely more 'pure' in terms of Near Eastern ancestry. What the Cypriots add is more WHG and more ANE, but the model already had lots of these from Corded Ware and Georgian, respectively.

I guess my overall question is: am I right to stick with 'Samaritan' when modeling Near Eastern ancestry, or should I check what best fits each population/individual? I have seen people on the blog use all sorts of populations for modelling Near Eastern ancestry, like Bedouin, Druze, Syrians, Iraqi Jews, etc.
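
For anyone wondering what a two-way fit like the Stuttgart one above boils down to, here is a minimal Python sketch. It is not the 4mix R script. It assumes that D is a plain Euclidean distance between the target's component proportions and the weighted blend of the references, which may not be exactly how 4mix computes it. The reference names and all the numbers are made up for illustration and are not real K8 values.

```python
import math

# Hypothetical component proportions (NOT real K8 values); each row sums to 1.
REFS = {
    "HG_ref": [0.80, 0.15, 0.05],
    "NearEast_ref": [0.05, 0.10, 0.85],
}
TARGET = [0.20, 0.11, 0.69]


def blend(weights, rows):
    """Weighted average of the reference rows, component by component."""
    n_components = len(rows[0])
    return [sum(w * row[i] for w, row in zip(weights, rows))
            for i in range(n_components)]


def distance(a, b):
    """Euclidean distance between two proportion vectors (assumed meaning of D)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_two_way(target, ref_a, ref_b, step=1):
    """Try every split in `step`% increments and keep the closest blend.

    Returns (percent of ref_a, distance of the best blend).
    """
    best = None
    for pct in range(0, 101, step):
        w = pct / 100
        d = distance(target, blend([w, 1 - w], [ref_a, ref_b]))
        if best is None or d < best[1]:
            best = (pct, d)
    return best


pct_a, d = fit_two_way(TARGET, REFS["HG_ref"], REFS["NearEast_ref"])
print(f"Target = {pct_a}% HG_ref + {100 - pct_a}% NearEast_ref @ D = {d:.4f}")
```

The search only ever sees the numbers in the rows, so the best split depends entirely on which reference rows are supplied.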

Krefter said...


"First of all I have been trying to model Stuttgart as a mix between a Near Eastern ancestral population and a known HG individual (i.e. Loschbour, La-Brana, etc.). "

One of the components in ANE K8 is "Near Eastern". Davidski built it to approximate what many Neolithic Near Easterners were probably like. It's pretty much the same thing as EEF, minus about 27% WHG, which came from West Asia and Southeast Europe.

So, there's no reason to try to find Near Eastern ancestry percentages in Early European farmers. Modern Near Easterners have mixed with ANE-types, Africans, and South Asians since EEF's ancestors left over 8,000 years ago.

For WHG, I added my own row to my spreadsheet that scores 100% WHG, instead of using Loschbour, La Brana-1, etc., because all their scores have a little noise.

"I know that the differences are small, but why should Cypriot improve this model compared to Samaritan? Samaritan are definitely more 'pure' in terms of Near Eastern ancestry."

Whether Samaritan is more pure Near Eastern or not doesn't matter. This is all about numbers. Whether the mixture fits history or not doesn't matter to the calculator.

Cypriot is more similar to Greeks than other Near Easterners are, so that could be why.
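
To make the "all about numbers" point concrete, here is a rough Python sketch (again, not the 4mix R script) that fits the same target with two different reference sets and compares the distances. It assumes D is a Euclidean distance over component proportions and uses a coarse 5% grid. The population names and every number are hypothetical. The "purer" reference fits worse here simply because the mixed one carries a small fourth component that the target also has.

```python
import itertools
import math

# Hypothetical 4-component proportions (NOT real K8 values); rows sum to 1.
REFS = {
    "Farmer":   [0.25, 0.00, 0.75, 0.00],
    "Steppe":   [0.40, 0.40, 0.20, 0.00],
    "Pure_NE":  [0.00, 0.00, 1.00, 0.00],
    "Mixed_NE": [0.05, 0.10, 0.80, 0.05],
}
TARGET = [0.255, 0.14, 0.595, 0.01]


def best_fit(target, names, step=5):
    """Grid-search percentages (summing to 100) over the named references.

    Returns ({name: percent}, distance) for the closest blend found.
    """
    rows = [REFS[name] for name in names]
    best = None
    # Choose the first n-1 percentages freely; the last one takes the remainder.
    for pcts in itertools.product(range(0, 101, step), repeat=len(names) - 1):
        last = 100 - sum(pcts)
        if last < 0:
            continue  # percentages would exceed 100%
        weights = [p / 100 for p in pcts] + [last / 100]
        mix = [sum(w * row[i] for w, row in zip(weights, rows))
               for i in range(len(target))]
        d = math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))
        if best is None or d < best[1]:
            best = (dict(zip(names, list(pcts) + [last])), d)
    return best


for candidate in ("Pure_NE", "Mixed_NE"):
    weights, d = best_fit(TARGET, ["Farmer", "Steppe", candidate])
    print(candidate, weights, f"D = {d:.4f}")
```

Whichever set gives the lower D wins, regardless of which reference is historically more plausible.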

Davidski said...

The genetic landscape of the Near East has probably changed significantly since the Neolithic, so it depends which period you're focusing on and which populations you're modeling.

Despite some WHG-related admixture, Stuttgart is probably still the best proxy for populations from Neolithic Anatolia and maybe even the northern Levant.

Georgians might be useful proxies for the Neolithic Transcaucasian population that mixed with Eastern European hunter-gatherers on the steppe to form the pastoralists of the Bronze Age who then invaded the rest of Europe.

However, keep in mind that southern Europe, especially southeastern Europe, has seen gene flow from the post-Neolithic and even post-Islamic expansion Near East, so using various modern Near Eastern groups to model the Near Eastern ancestry of, say, Sicilians and Greeks, might work best.

Krefter said...

"Target = 52% Stuttgart + 5% Samaritan + 17% Georgian_Imer + 26% Corded_Ware_LN @ D = 0.0069"

"Target = 43% Stuttgart + 21% Cypriot + 9% Georgian_Imer + 27% Corded_Ware_LN @ D = 0.0063"

Those are good fits, thanks for sharing.

EEFs, I think, are supposed to have originated around Greece. Maybe WHGs weren't there in the Neolithic, so throughout the Neolithic Greeks would have stayed Stuttgart-like.

Not all of the Greeks' ANE can be from people similar to modern West Asians; some is definitely from Yamnaya-types north of the Caucasus. Whether or not the Yamnaya-type blood in Greece today is from the people who brought the Greek language to Greece is debatable.

It would make sense that a lot of Iron Age Greek blood is still there (majority?) and that Iron Age Greek blood is a mix of Yamnaya-types and others.
