
Saturday, June 25, 2016

D-stats/nMonte open thread #3

For the latest datasheets with D-stats of the form D(Chimp,Columns)(Mbuti.DG,Rows), featuring samples from Lazaridis et al. 2016, see here, here and here.

Datasheets with D-stats of the form D(Chimp,Rows)(Mbuti.DG,Columns) are available here, here and here. D-stats 1 and 1b include Iran_Chalcolithic in both the rows and columns, while D-stats 3 and 3b have Eastern_HG in both the rows and columns.
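
For readers new to these sheets, here is a rough sketch of what a single cell measures. It uses the textbook allele-frequency estimator for a D-statistic of the form D(A,B)(C,D). The function name and the toy frequencies are invented for illustration, and this is not the code used to produce the datasheets.

```python
def d_stat(a, b, c, d):
    """Estimate D(A,B)(C,D) from per-SNP allele frequencies (floats in [0, 1])."""
    num = 0.0
    den = 0.0
    for pa, pb, pc, pd in zip(a, b, c, d):
        # Numerator: correlation of the A-B and C-D frequency differences.
        num += (pa - pb) * (pc - pd)
        # Denominator: normalisation that keeps D between -1 and 1.
        den += (pa + pb - 2 * pa * pb) * (pc + pd - 2 * pc * pd)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Toy frequencies, invented for illustration only.
chimp = [0.0, 0.1, 0.0]
column_pop = [0.8, 0.5, 0.2]
mbuti = [0.1, 0.0, 0.3]
row_pop = [0.6, 0.9, 0.1]
value = d_stat(chimp, column_pop, mbuti, row_pop)
```

Swapping the last two populations only flips the sign of the statistic.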

The interesting question is, which of these sheets is the best for estimating admixture proportions, primarily in populations from West Eurasia?


Matt said...

Not using these for modelling at the moment, however, some quick observations:

- If you take Iran_Chalcolithic (column) - Kotias (column) then the populations most at the Kotias end are Steppe_EMBA / Steppe_LNBA, WHG, EHG, Caucasians, while populations at the Iran_Chalcolithic end are Levant_Neolithic, EEF, BedouinB, and to a lesser extent Iran_Neolithic. And these and some Mediterranean populations are the only populations who share more with Iran_Chalcolithic than Kotias, really.

So yeah, the Iran_Chalcolithic has too much Levant_Neolithic in it to form such a strong dimension of its own as Kotias, just as we'd expect from the PCA. Shame the Iran_Neolithic isn't in good enough shape to form a column as well as a row. Also CHG probably does have some extra EHG in it to attract it to the Steppe.

- Conversely to this, only BedouinB, Israel_Natufian, Somali, Moroccan, Masai, Tunisian, Loschbour, Esan and Yoruba share more with BedouinB than Iran_Chalcolithic. Seems BedouinB is very isolated and probably has some, quite low, African ancestry?

- Iran_Chalcolithic vs Georgian drift isolates West Europeans, some Steppe and WHG at one end, vs the ancient Middle East at the other end, with only ancient Iranians sharing more drift with Iran_Chalcolithic than with Georgians.

- Finally, comparing Anatolia_Neolithic to Iran_Chalcolithic, all populations are closer to Anatolia_Neolithic, except the Indians, Satsurblia, and the ancient Iranians themselves. Even present day Caucasians come out closer to Anatolia_Neolithic.

Euro HG admixture, of any sort, seems to have a very large effect on increasing drift sharing! I guess this reflects that Euro HG, whether east or west, were a relatively small population size, closely related grouping, and any ancestry from their clade draws a sample quite strongly to them and each other.
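
The column comparisons above boil down to subtracting one column from another and sorting the rows by the difference. A minimal sketch, assuming the sheet is loaded as a dict of rows; the function name and all values are made up for illustration, not taken from the datasheets:

```python
def rank_by_column_difference(sheet, col_a, col_b):
    """Return (row, sheet[row][col_a] - sheet[row][col_b]) pairs, largest first."""
    diffs = [(row, stats[col_a] - stats[col_b]) for row, stats in sheet.items()]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


# Toy values, invented for illustration only.
toy_sheet = {
    "Pop_A": {"Iran_Chalcolithic": 0.30, "Kotias": 0.35},
    "Pop_B": {"Iran_Chalcolithic": 0.32, "Kotias": 0.31},
    "Pop_C": {"Iran_Chalcolithic": 0.28, "Kotias": 0.28},
}
ranking = rank_by_column_difference(toy_sheet, "Iran_Chalcolithic", "Kotias")
```

Rows at the top of the ranking share relatively more with the first column, rows at the bottom with the second.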

Seinundzeit said...



Great stuff, this allows a more intensive exploration of possibilities, compared to PCA.

63.55% ancient Eastern Europe/Steppe
20.40% Iran_Neolithic
16.05% Munda

56.35% ancient Eastern Europe/Steppe
22.05% Iran_Neolithic
21.60% Munda

I'll try some other combinations.

Matt said...

To maybe inform models to test, here are some PCA of the data:

WHG are probably somewhat stretched away from other samples by use of Bichon (which they share more with than other Villabruna / WHG clade do) and EHG and ANE are probably closer due to lack of columns in this datasheet. Still kind of interesting.

Made me think that if the admixture into Yamnaya were direct from a very basal Iran_Neolithic clade, then the kind of sex biased explanations for their levels of R1 might not be required. If you only had 10% from a very basal farmer population into Steppe Eneolithic then 10% more into Yamnaya, it's much more possible for their Y lineages to simply go extinct, without any sex bias. But I guess that wouldn't explain Neolithic mtDNA.

- First PCA from above with a South Asian cline superimposed

It actually sort of shocks me how distant the earliest farmers appear on some of these PCA made from these stats (I used the first sheet, btw). On the last I posted up, the Levant Neolithic appears about as distant from Iberia_EN as Villabruna is (and the Natufians are more distant from Iberia_EN than Villabruna). Iran_Neolithic is far further from the modern Caucasus than Yamnaya is.

On the last note, Sein, if you have time, would you mind trying a model of Lezgin as Steppe_EMBA plus Iran_Neolithic and Levant_Neolithic?

Seinundzeit said...

Assuming a model in which ASI=ENA, these are interesting fits.

60.45% ENA (mostly Ami, with some substantial Papuan)
26.15% Iran_Neolithic
13.40% ANE

36.15% Iran_Neolithic
33.05% ANE
16.55% WHG
14.25% ENA (only Ami, doesn't receive any Papuan)

38.7% Iran_Neolithic
29.80% ANE
16.05% WHG
15.45% ENA (only Ami, doesn't receive any Papuan)

38.35% Iran_Neolithic
29.9% ANE
21.70% ENA (only Ami, doesn't receive any Papuan)
10.05% WHG

This seems very reasonable.

If we assume that GujaratiB are comparable to UP Brahmins, and ASI=ENA, then David is right that the Indians on his West Eurasian plot are probably less than 25% ASI.

But that is assuming that ASI is ENA, which might not be the case.


Sure thing.

I tried it, but the model seems odd:

72.1% EMBA Steppe
24.6% Levant_Neolithic
3.3% Ulchi
0% Iran_Neolithic

It's not a great fit, in terms of distance.

Just for comparison, the same setup, but with the Kalash. Oddly, this works slightly better for the Kalash than it does for Lezgins:

59% EMBA Steppe
20.3% Iran_Neolithic
15.35% Ulchi
5.35% Levant_Neolithic

This is a much better model for Lezgins, in terms of distance:

37.25% Armenian_Chalcolithic
25.60% Armenia_MLBA
20.70% EMBA Steppe
11.75% Anatolia_Chalcolithic
4.70% Ulchi
0% Armenian_EBA
0% Iran_Chalcolithic

Matt said...

Thanks Sein, re: the first fit for Lezgin, while it seems unintuitive, it would make a lot of sense given the relative positions of Levant_Neolithic and Steppe_EMBA on PCA 1+2 from these D-stats, to illustrate:

On that PCA (and therefore the underlying stats) Lezgin could be modelled as an admixture from Levant_Neolithic to Afanasievo and really is much closer along that line to Afanasievo, so no surprise it comes out in the proportions in nMonte that it does, when given those choices. (Although when looking at the higher dimensions of data it's probably not an ideal fit, as you describe).

By these methods "The World's First Farmers" seem extremely distant. Even from present day people of the Near / Middle East.

Seinundzeit said...


Good points.

For whatever it's worth, I tried to create models comparable to what we see in the preprint:

59.6% BA Steppe (mostly Andronovo, with substantial Afanasievo)
27% Iran_Neolithic
13.4% ENA


50.60% Iran_Late_Neolithic
37.65% Samara_Eneolithic
11.75% ENA


56.65% Iran_Late_Neolithic
31.15% Eastern_HG
12.2% ENA

Very similar to what we see in the paper.

The ENA levels here seem much more reasonable, the paper's estimates are somewhat higher than what is usually seen.

Also, Andronovo/Sintashta do well for Central/South Asians, with this method, even though the paper found a preference for EMBA steppe populations.

Matt said...

So, just trying a couple of models for Yamnaya_Samara and Iberia_MN.

First using as calc the Euro HGs, the early Neolithic Near East, Satsurblia, BedouinB and Ami as an East Asian extra:

Yamnaya_Samara: Eastern_HG 54.5, Satsurblia 36.7, Hungary_HG 6.65, BedouinB 2.15 - distance% = 2.438 %

(Biggest differences: looks like the model is more Kotias column, underfitted to India_South, Iran_Chalcolithic, Georgian, Anatolia_Neolithic, BedouinB)

Iberia_MN: Hungary_HG 44.2, Levant_Neolithic 35.85, Satsurblia 19.95 - distance% = 4.9998 %

(particularly underfitted to Iberia_EN2, BedouinB, Anatolia_Neolithic column, more slightly overfitted to Bichon).

Taking out Satsurblia and BedouinB as calc populations:

Iberia_MN: Levant_Neolithic 48.5, Hungary_HG 45.3, Eastern_HG 5.4, Ami 0.8 - distance% = 5.4208 %

Yamnaya_Samara: Eastern_HG 74.25, Levant_Neolithic 18.65, Iran_Neolithic 5.15, Hungary_HG 1.95 - distance% = 4.235 %

Also, modeling Satsurblia, Anatolia_Neolithic and Samara_Eneolithic under the same conditions:

Satsurblia: Iran_Neolithic 59.75, Eastern_HG 30.05, Hungary_HG 7.25, Levant_Neolithic 2.95 - distance% = 8.4268 %

Anatolia_Neolithic: Levant_Neolithic 73.9, Hungary_HG 21.55, Eastern_HG 4.55 - distance% = 5.1847 %

Samara_Eneolithic: Eastern_HG 85.45, Levant_Neolithic 7.7, Motala_HG 4.75, Ami 2.1 - distance% = 2.736 %

So some results sort of as expected there, some quite surprising (Yamnaya Samara having a preference for Levant_Neolithic, not Iran_Neolithic, but Satsurblia has a preference for Iran_Neolithic). Fits generally seem quite bad for using the first farmers only, indicating drifts really need later / more proximate farmers to work, with these columns which include later Neolithic and post-Neolithic West Eurasian pops.

I think the EHG may be slightly overfitted in these relative to if you had a Samara column and using Karelia in calc.

Couple moderns with the last set of ancestor populations:

Basque_Spanish: Levant_Neolithic 36.1, Hungary_HG 35.25, Eastern_HG 27.55, Ami 1.1 - distance% = 5.9668%

English_Cornwall: Eastern_HG 38.7, Levant_Neolithic 34.55, Hungary_HG 25.7, Ami 1.05 - distance% = 5.2278%

Finnish: Eastern_HG 47.9, Levant_Neolithic 22.8, Hungary_HG 22.65, Ami 6.65, distance% = 4.6266 %

Moderns don't seem to have any need for Iran_Neolithic, which would be expected via the PCA models where they can be fitted within a triangle of Levant_Neolithic, EHG and WHG. Thinking about it, Yamnaya's result makes sense from this perspective as well, since it also fits in the same triangle. Pretty distant fits though. The actual real populations varied slightly from those samples (Euro HGs plus First Farmers), and then experienced some drift history of their own, even if the proportions from the rough major groups may be roughly correct.

Matt said...

Modeling the recent Levant, Arabian Peninsula, North Africa with the populations:

AG3-MA1, Ami, Denisovan, Eastern_HG, Esan_Nigeria, Iran_Neolithic, Israel_Natufian, LaBrana1, Hungary_HG, Levant_Neolithic, Loschbour, Masai_Kinyawa, Motala_HG, Neandertal_Altai, Villabruna:

Palestinians: Levant_Neolithic 64.3, Iran_Neolithic 26.8, Eastern_HG 5.4, Ami 3.1, Loschbour 0.4 - distance% = 1.9323 %

BedouinB: Levant_Neolithic 87.3, Iran_Neolithic 7.95, Ami 4.1, Eastern_HG 0.65 - distance% = 4.0007 %

Cypriot: Levant_Neolithic 62, Eastern_HG 28.35, Hungary_HG 6.65, Ami 3 - distance% = 5.2898 %

Druze: Levant_Neolithic 68.75, Eastern_HG 27.75, Ami 3.3 - distance% = 4.4076 %

Tunisian: Levant_Neolithic 62.15, Masai_Kinyawa 11.8, Iran_Neolithic 11.75, Esan_Nigeria 4.2, Loschbour 3.55, Ami 3.1, Eastern_HG 2.7, Israel_Natufian 0.75 - distance% = 1.6545 %

Moroccan: Levant_Neolithic 55.35, Masai_Kinyawa 13.55, Iran_Neolithic 7.7, Esan_Nigeria 7.45, Israel_Natufian 6.9, Loschbour 5.15, Ami 2.6, Eastern_HG 1.3 - distance% = 1.5674 %

Slightly unexpected behavior by Cypriot and Druze with the EHG fractions. Bit suspicious of the Ami fractions. Had a go at including Munda as well, in case it was a South Asian effect, but nothing happened there.

Shaikorth said...

Levant_N also took all the SSA in Palestinians. Does it work for Jordanians and BedouinA too?

Ryukendo K said...
This comment has been removed by the author.
Davidski said...

Yeah, I saw that. I'll post a few models in the blog entry when I get around to it. Anatolia Chalcolithic is preferred by Yamnaya to all Iranian and other Near Eastern samples.

Davidski said...

I just discovered a couple of small errors in the datasheets; the same Karitiana and Kinh_Vietnam were in both the rows and columns, hence they each had an empty cell.

I've made the corrections, and also uploaded a third sheet with Karelia_HG in the rows and Eastern_HG in the columns.

Please let me know if there are any other problems with the sheets and I'll correct them.

Matt said...

Karelia_HG's stats seem a little disordered. There are entries in all columns but it has 0.4512 for Mota and only 0.1264 for Mixe?

Davidski said...

OK, hang on.

Davidski said...

Alright, second attempt. This sheet has Eastern_HG in the rows and Eastern_HG2 in the columns.

a said...

In light of the new clusters/ancient samples, are there any plans to update Eurogenes K7 for GEDmatch?

Davidski said...

I'm still getting to know these samples. Once I figure out what they're about I'll be able to use them accordingly in a new test. The Eurogenes selection at GEDmatch does need updating, so yeah.

Olympus Mons said...


"Anatolia_Chalcolithic" 24.45 ... I am telling everybody: the Chalcolithic/Bronze Age came after the population revolution made by the Ubaid expansion. It was a population "nuclear explosion".

It favors AC for the same reason that the highest variance of R1b is precisely where this new R1b sample (I1635) was found, with even bigger variance precisely in Anatolia... see this map.

Just read the abstract on Hovhannisyan et al.

Olympus Mons said...

Just to be even clearer...
R1b L23 was born in Western Anatolia by 4900 BC (ish) (where the highest variance exists) and then moved... I don't care.
R1b L51 was born in the Nile Delta by 4300 BC (ish) and by 3500 BC was in Iberia.

Davidski said...

"Just to be even clearer...
R1b L23 was born in Western Anatolia by 4900 BC (ish) (where the highest variance exists) and then moved... I don't care.
R1b L51 was born in the Nile Delta by 4300 BC (ish) and by 3500 BC was in Iberia."


Olympus Mons said...

@Davidski - Nope?

Oh, we will see my friend, we will see. Unfortunately, as with anything else these days, we seem to have to rely on Germans to clear the shit out (they are the ones running things in Merimde..).

Olympus Mons said...

Minor correction --- NOT western Anatolia... but really eastern Anatolia... really eastern Anatolia (as per map), because that is where buckloads of diverse R1b cluster by 4900 BC before moving.

Ariel said...

Davidski said...


"Is it possible to include Ari Blacksmith and Egyptians in the most recent Dstats sheet?"

Only Egyptians.

Actually, I forgot to add Balochs and Brahuis to these sheets. I'll fix that tomorrow.

Matt said...

Using new data sheet and only the in theory least admixed calc populations:

AG3-MA1, Ami, Eastern_HG, ElMiron, Hungary_HG, Iran_Neolithic, Israel_Natufian, LaBrana1, Levant_Neolithic, Loschbour, Motala_HG, Villabruna, Yoruba:

Yamnaya_Samara: Eastern_HG 41.85, Motala_HG 21.05, Iran_Neolithic 16.4, Levant_Neolithic 10.7, Hungary_HG 5.8, Ami 4.2 - distance% = 4.9913 %

For Yamnaya Samara, compared to the same models, adding EHG as a column somewhat reduces EHG in favour of Motala and Levant Neolithic in favour of Iran Neolithic.

Iberia_Chalcolithic: Levant_Neolithic 57.3, Hungary_HG 41.85, Ami 0.85 - distance% = 3.8422 %

Esperstedt_MN: Levant_Neolithic 52.8, Hungary_HG 45.45, Ami 1.75 - distance% = 6.1163 %

Cypriot: Levant_Neolithic 63.5, Eastern_HG 16.05, Hungary_HG 14.3, Ami 6.15 - distance% = 5.4246 %

Basque_Spanish: Hungary_HG 42.75, Levant_Neolithic 37.75, Eastern_HG 15.25, Ami 4.25 - distance% = 6.0891 %

English_Cornwall: Levant_Neolithic 35.9, Hungary_HG 33.35, Eastern_HG 23.25, Ami 4.8, Motala_HG 2.7 - distance% = 5.3971 %

EHG goes down for all.

(The "problem" (or not) of ENA going up with more remote ancestors gets stronger when using these more ancient populations).

Modifying the above set of calc pops to include Munda and testing South Asia:

GujaratiD: Munda 54.85, Eastern_HG 13.25, Levant_Neolithic 12.85, Iran_Neolithic 12.3, Motala_HG 6.75 - distance% = 3.6981 %

Kalash: Eastern_HG 26.6, Levant_Neolithic 24.05, Munda 22.95, Iran_Neolithic 10.3, Motala_HG 9.45, Ami 5.6, Hungary_HG 1.05 - distance% = 5.1863 %

Using instead Munda plus ANE plus various ME / Steppe cultures:

Kalash: Steppe LNBA-IA 40 (Scythian_IA 28.95, Poltavka 12.05), Near East CHL-MLBA 37(Iran_Chalcolithic 20.85, Armenia_MLBA 16.15), Munda 22 - distance% = 1.5259 %

GujaratiD: Munda 50.55, Near East LN-CHL 31(Iran_Late_Neolithic 17.05, Armenia_Chalcolithic 14.35) Steppe LNBA 18 (Poltavka 16.5, Andronovo 1.5) - distance% = 1.7413 %

GujaratiA: Near East CHL 36 (Armenia_Chalcolithic 22.1, Iran_Late_Neolithic 9.65, Iran_Chalcolithic 4.45), Munda 34.7, Steppe LNBA 29(Poltavka 21.6, Andronovo 7.5) - distance% = 1.2648 %

Assuming ASI % matches Munda then models for ANI and ASI by regression equation for above with latest datasheet, then D-stats for ANI and ASI are:



ANI models with set of least admixed populations as:

ANI - Eastern_HG 36.6, Levant_Neolithic 34.5, Motala_HG 10.15, Ami 8.95, Iran_Neolithic 6, Hungary_HG 3.8 - distance% = 6.3287 %

PCAd -

huijbregts said...

How do you guys use nMonte to calculate the admixture percentages from Dstat sheets?

The simplest way is just to use all the columns and apply nMonte in the same way as with a calculator sheet.
nMonte presupposes that the columns are orthogonal (independent).
This is guaranteed in David's datasheets, which are PCA scores of raw DNA composition.
It is also safe with calculator sheets, because orthogonality of the columns is highly valued by the authors.
But the columns of D-stats sheets are not orthogonal.
This is a problem because nMonte presupposes that the columns are orthogonal.
If they are not, the calculation of the distance is incorrect.
And as the distance is used to guide the Monte Carlo process, the resulting estimation of the mixture composition is also incorrect.
This is worse than a negligible estimation error, the result may be way off.
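
To make the role of the distance concrete, here is a minimal sketch of a distance-guided Monte Carlo mixture fit of the general kind described above. It is an illustration only, not the actual nMonte script: the function names, the slot count and the cycle count are assumptions. Each slot holds one reference row, the fitted row is the mean over all slots, and a random swap is kept only when the Euclidean distance to the target gets smaller.

```python
import math
import random


def euclidean(u, v):
    """Plain Euclidean distance; only meaningful if the columns are independent."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def monte_carlo_mix(target, refs, slots=200, cycles=20000, seed=0):
    """Fit target as a mixture of reference rows by random slot swaps.

    refs maps a population name to its row (list of floats).
    Proportions move in steps of 100/slots percent.
    """
    rng = random.Random(seed)
    names = sorted(refs)
    ncol = len(target)
    assign = [rng.choice(names) for _ in range(slots)]
    total = [sum(refs[n][j] for n in assign) for j in range(ncol)]
    best = euclidean([t / slots for t in total], target)
    for _ in range(cycles):
        i = rng.randrange(slots)
        new = rng.choice(names)
        old = assign[i]
        if new == old:
            continue
        trial = [total[j] - refs[old][j] + refs[new][j] for j in range(ncol)]
        dist = euclidean([t / slots for t in trial], target)
        if dist < best:  # keep the swap only if the fit improves
            assign[i], total, best = new, trial, dist
    props = {n: 100.0 * assign.count(n) / slots for n in names}
    return props, best
```

Because every accepted swap is judged by that one distance, a distance that is distorted by correlated columns steers the whole search towards the wrong mixture.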

Fortunately, using a PCA, the columns of the Dstat sheet can be transformed to (a smaller number of) orthogonal PCA scores.
So IMO the correct workflow is:
1. Choose a set of relevant rows, containing one target row and a number of reference rows. Use all the Dstat columns.
2. Calculate the PCA. I have worked with k=5 and no centering or scaling.
3. Collect the scores with k columns.
4. Use the scores as input for nMonte. The resulting distances are very small.
Realize that these are not distances between DNA percentages, but distances between Dstat values, which are far more homogeneous.
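
Steps 2 and 3 of the workflow can be sketched as follows, assuming NumPy is available and the selected rows are already in a numeric matrix (one row per population, one column per Dstat column). The function name is hypothetical. A PCA with no centering or scaling amounts to a singular value decomposition of the raw matrix, and the scores are the left singular vectors scaled by the singular values.

```python
import numpy as np


def uncentered_pca_scores(matrix, k=5):
    """Return the first k uncentered, unscaled PCA scores for each row."""
    x = np.asarray(matrix, dtype=float)
    # SVD of the raw matrix: x = u @ diag(s) @ vt.
    # The scores are u * s, which equals x @ vt.T.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    k = min(k, len(s))
    return u[:, :k] * s[:k]
```

The score columns are mutually orthogonal, so the rows of the returned array (target first, references after) can be passed to the mixture fit in place of the raw Dstat rows.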

Bell_Beaker_Germany -1.845426e+00 9.013090e-03 1.246191e-02 -3.611150e-03 -1.538591e-04
fitted -1.845420e+00 9.007781e-03 1.245808e-02 -3.609276e-03 -1.470620e-04
dif 5.866561e-06 -5.308716e-06 -3.831517e-06 1.873857e-06 6.797083e-06
[1] "distance%=0.0011 / distance=1.1e-05"

"Poltavka_outlier" 72.85
"Esperstedt_MN" 13.45
"Yamnaya_Samara" 4.55
"Iberia_EN" 4.25
"Anatolia_Chalcolithic" 2.95
"Motala_HG" 1.55
"Loschbour" 0.3
"Iberia_MN" 0.1
"Hungary_EN" 0
"Yamnaya_Kalmykia" 0

I did this exercise to see whether this workflow is successful, but I was surprised by the result.
I had expected a lot of Yamnaya, but not Poltavka_outlier.
Has anybody done a Dstat on Bell_Beaker/Poltavka_outlier?
Did I forget an important row? Or is an important column missing in the Dstat sheet?

Can I get a row Baalberge_MN?

Davidski said...


Bell Beaker Germany and Poltavka outlier are closely related samples with similar genetic structures, so using Poltavka outlier as a reference for Bell Beaker Germany eats up much of the ancient components that make up Bell Beaker Germany.

It's like using Irish as a reference for English, and seeing most of the ancient ancestry proportions disappear.

huijbregts said...

@ Davidski
Thanks. I did not know that they were this closely related.
It confirms my idea that this is the correct way to estimate mixture proportions with Dstats/nMonte even though the columns are not orthogonal.

Alberto said...

These sheets are with the outgroups switched, I guess? I see the same effects of underestimating SSA. Who knows why.

The Armenian_Chalcolithic samples don't really seem to have much, if any, EHG after all:

"Anatolia_Chalcolithic" 65.8
"Iran_Chalcolithic" 24.9
"Loschbour" 5.5
"Ami" 2.6
"Eastern_HG" 0.75
"Satsurblia" 0.45
"Anatolia_Neolithic" 0
"Yoruba" 0
"Iran_Late_Neolithic" 0
"Iran_Neolithic" 0
"Israel_Natufian" 0
"Jordan_EBA" 0
"Levant_Neolithic" 0

What Anatolia_ChL and Armenia Chalcolithic do is to eat up the Anatolia_Neolithic and CHG, probably expected since they're more modern samples. Though they take most of the Yamnaya from southern Europe too, even if Yamnaya is more modern than they are:

"Armenia_Chalcolithic" 45.1
"Anatolia_Chalcolithic" 31.65
"Anatolia_Neolithic" 9.1
"Loschbour" 8.9
"Eastern_HG" 3.5
"Ami" 1.75
"Satsurblia" 0
"Yoruba" 0
"Iran_Chalcolithic" 0
"Iran_Late_Neolithic" 0
"Iran_Neolithic" 0
"Israel_Natufian" 0
"Jordan_EBA" 0
"Levant_Neolithic" 0
"Yamnaya_Samara" 0

(Same source populations, but not showing those with 0 for cleanness):

"Anatolia_Chalcolithic" 41.55
"Armenia_Chalcolithic" 36.45
"Loschbour" 10.55
"Anatolia_Neolithic" 8.25
"Eastern_HG" 1.8
"Ami" 1.4

"Anatolia_Chalcolithic" 71.2
"Loschbour" 16.1
"Armenia_Chalcolithic" 8.9
"Yamnaya_Samara" 2
"Ami" 1.8

"Anatolia_Chalcolithic" 44.8
"Armenia_Chalcolithic" 31.95
"Loschbour" 12.2
"Yamnaya_Samara" 8.3
"Ami" 2.3
"Eastern_HG" 0.45

"Armenia_Chalcolithic" 38.25
"Anatolia_Neolithic" 25.25
"Loschbour" 13.55
"Anatolia_Chalcolithic" 11
"Yamnaya_Samara" 9
"Ami" 1.65
"Eastern_HG" 1.3

"Anatolia_Chalcolithic" 47.95
"Loschbour" 18.3
"Yamnaya_Samara" 17.55
"Armenia_Chalcolithic" 10.25
"Eastern_HG" 4.8
"Ami" 1.15

"Anatolia_Chalcolithic" 52.35
"Yamnaya_Samara" 24.4
"Loschbour" 18.75
"Armenia_Chalcolithic" 3.2
"Ami" 1.3

"Anatolia_Chalcolithic" 37.2
"Yamnaya_Samara" 35.95
"Loschbour" 21.75
"Eastern_HG" 3.5
"Ami" 1.6

"Anatolia_Chalcolithic" 40.4
"Yamnaya_Samara" 22.6
"Loschbour" 18.45
"Armenia_Chalcolithic" 13.55
"Eastern_HG" 2.6
"Ami" 2.4

Davidski said...


Your models for Northern Europe aren't realistic, because they suggest that it was populated by pure hunter-gatherers until the Late Neolithic.

The problem is that you don't have any Middle Neolithic/Copper Age European samples in your reference list. If you add them, you'll see essentially the same old results for Northern Europeans.

Yamnaya_Samara 47.1
Esperstedt_MN 35.95
Motala_HG 8.7
Loschbour 6.2
Ulchi 2.05
Anatolia_Chalcolithic 0

distance%=1.7212 / distance=0.017212

It's likely that there are also similar problems with your models for Southern Europe, although these might not be possible to correct yet due to a lack of sampling from the Balkans.

Kristiina said...

Matt, it would be interesting to know what is your admixture analysis for the Samara hunter-gatherer and the Karelian hunter-gatherer using AG3-MA1, Ami, Eastern_HG, ElMiron, Hungary_HG, Iran_Neolithic, Israel_Natufian, LaBrana1, Levant_Neolithic, Loschbour, Motala_HG, Villabruna and Yoruba?

Alberto said...

Yes, Esperstedt_MN got dropped while testing Bell Beaker and seeing disagreements with previous sheets, with a significant pull towards Yamnaya. In general I keep seeing inconsistencies in this other method of running the double outgroup D-stats. It's not just that Spanish doesn't get SSA admixture, which is a minor problem for Europe, but not for the Near East where we have the new samples to finally model those populations, but it also seems to pull populations towards Euro_HG and ENA (probably that's why the high levels of Ami).

For a sanity check, I compared with the paper's models, and I see the same bias. For example, a simple 2 way admixture of Levant_Neolithic as Natufian + Anatolia_Neolithic shouldn't show any big discrepancy, since Natufians are pretty close to Levant_Neolithic. The paper has them as 67% Natufian + 33% Anatolia_Neolithic. Cross checking with the PCA based sheet I get:

"Israel_Natufian:I1072" 69.2
"Anatolia_Neolithic:I0707" 30.8

So quite close. But with this D-stats sheet:

"Israel_Natufian" 54.6
"Anatolia_Neolithic" 45.4

Which again shows a significant pull towards the more northern population.
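
For a simple two-way check like this one, the best-fitting proportion has a closed form, so no Monte Carlo search is needed. A minimal sketch, assuming plain least squares on the rows; the function name is made up and this is not how nMonte computes it:

```python
def two_way_fit(target, source_a, source_b):
    """Least-squares share of source_a when target ~ w*source_a + (1-w)*source_b."""
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("sources are identical")
    # Project (target - source_b) onto the line from source_b to source_a.
    w = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
    return min(1.0, max(0.0, w))  # clamp to a valid proportion
```

Running the same two sources and target through both sheets with a function like this would isolate the effect of the sheet from the effect of the search.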

Anatolia_Chalcolithic in the paper is modelled as 67% Anatolia_Neolithic + 33% Iran_Chalcolithic. Here I get this:

"Anatolia_Neolithic" 48.5
"Iran_Chalcolithic" 37.3
"Eastern_HG" 14.2

But this is a sample from Barcin, 3800 BC. So that strong pull toward EHG looks again a bit strange.

So I'm wondering if there's any particular reason for using this other method for the stats? Did you see any specific advantage vs. the previous one?

Shaikorth said...

"Palestinians: Levant_Neolithic 64.3, Iran_Neolithic 26.8, Eastern_HG 5.4, Ami 3.1, Loschbour 0.4 - distance% = 1.9323 % "

Does this give any SSA to Jordanians, Palestinians and BedouinA or does it all go to Levantine_n (or Natufian)?

Davidski said...

OK, I updated the sheets. See links in the post above. Now they include Baalberge_MN, Balochi, Brahui, Egyptians and a few extra Asian pops. I don't have a huge choice in this dataset, but it's the one I gotta run because it has the right markers for double outgroup D-stats like this.

Alberto, I'm currently running D(Chimp,Columns)(Mbuti.DG,Rows), which is the usual way to run these stats. I can't remember, what else did I run?

Alberto said...


Yes, that's what I supposed from the results. The old way was D(Chimp,Rows)(Mbuti,Columns) and didn't have these problems. When you first tested this other method I was comparing side by side both sheets that had the same columns and rows, only the method changed. And I was getting worse models, with significantly higher distances. Remember the case of Spanish_Extremadura not getting any Yoruba or even Moroccan as compared to the other sheet getting it with the same pops? It seems the effect is a more generalised pull towards Euro_HG away from Basal and African, or something like that.


I get this for Palestinian:

"Iran_Chalcolithic" 49.8
"Levant_Neolithic" 27
"Israel_Natufian" 12.8
"Anatolia_Chalcolithic" 3.65
"Ami" 2.9
"Yoruba" 2.15
"Loschbour" 1.7
"Anatolia_Neolithic" 0
"Eastern_HG" 0
"Satsurblia" 0
"Iran_Late_Neolithic" 0
"Iran_Neolithic" 0

But because of the problem I mention above I think this is underestimating SSA. The biggest negative residual (underfitting) is in the Yoruba column, followed by BedouinB and Cypriot, while the biggest positive ones (overfitting) are for Iran_Chalcolithic and Kostenki14.

There's no BedouinA or Jordanian in this sheet. Here's BedouinB (same pops as above, only showing those that get some %):

"Levant_Neolithic" 37.8
"Iran_Chalcolithic" 32.3
"Israel_Natufian" 28.2
"Ami" 1.7

But similar pattern in the residuals: underfitting BedouinB and Yoruba, overfitting Anatolia_Neolithic, Iran_Chalcolithic, Kostenki14, EHG, Motala...
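
The residual check above can be sketched like this, taking the residual as fitted minus target per column, so that negative values mean underfitting and positive values overfitting. The function name and the numbers are invented for illustration:

```python
def residuals(target, fitted, columns):
    """Return (column, fitted - target) pairs, most underfitted column first."""
    pairs = [(c, f - t) for c, t, f in zip(columns, target, fitted)]
    return sorted(pairs, key=lambda p: p[1])


# Toy values, invented for illustration only.
cols = ["Yoruba", "Kostenki14", "BedouinB"]
toy_target = [0.10, 0.20, 0.30]
toy_fitted = [0.08, 0.23, 0.29]
ordered = residuals(toy_target, toy_fitted, cols)
```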

Davidski said...

OK, hang on.

Karl_K said...

Hanging... on...

huijbregts said...

@ Davidski
Thank you for adding Baalberge_MN to the sheet. It is obviously an important component of the Bell Beaker admixture.

Bell_Beaker_Germany without Baalberge_MN
"Poltavka_outlier" 73.05
"Esperstedt_MN" 15.7
"Anatolia_Chalcolithic" 6.15
"Yamnaya_Samara" 3.3
"Loschbour" 1.7
"Yamnaya_Kalmykia" 0.1
"Hungary_EN" 0

Bell_Beaker_Germany with Baalberge_MN
"Poltavka_outlier" 38.75
"Baalberge_MN" 28.1
"Yamnaya_Samara" 17.5
"Yamnaya_Kalmykia" 6.75
"Anatolia_Chalcolithic" 4.3
"Esperstedt_MN" 2.7
"Loschbour" 1.9
"Hungary_EN" 0

I feel good about the less extreme relationship of Poltavka_outlier to Bell_Beaker_Germany.
But the volatility of these estimations is disappointing.

P.S. This sheet still contains a duplicate of Georgian.

Matt said...

@ Kristiina, I can't fit either of those rows as they aren't available, but using those calc populations:

Eastern_HG: Motala_HG 50.8, AG3-MA1 49.2 - distance% = 7.1186 %

Underfitting / overfitting: (link to save space)

The combination basically underfits to columns Yamnaya, EHG, all recent West Eurasians, Native Americans and East Asians, and overfits to Ust Ishim, Papuan, Mota, WHG.

Modeling without Motala_HG:

Eastern_HG: AG3-MA1 57.7, Hungary_HG 42.3 - distance% = 7.9113 %

Underfitting / overfitting:

Same issues

Motala_HG itself modelled with the same groups:

Motala_HG: Hungary_HG 87.35, AG3-MA1 12.65 - distance% = 4.7928 %

If you had the ANE (AG3-MA1) as a column as well as a row, then it would probably go down, but we don't have enough samples for anything like that still, I don't think (there was a third ANE in the Fu paper, but low quality?).

Kristiina said...

Matt, thanks a lot! I am just thinking about this Mesolithic/Neolithic admixture between WHG and EHG in the EHG area. Am I right that there is not any Basal from Anatolia/Iran in Samara HG? How about CHG? Can we now determine if there is any CHG or rather Iran Neolithic in the Samara hunter-gatherer? Does Hungary_HG contain any CHG or Iran Neolithic? Am I right that in your fittings the West/"SHG" versus East/"ANE" ratio is nearly 1:1?

There is no straight route from Sweden to Volga, so Hungary HG is in a much better position to find its way to Russia.

There is no need to answer all my questions, but these are the questions that puzzle me.

Davidski said...


Thanks, I got rid of the Georgian duplicate rows.

Btw, it's usually best to explore data with unsupervised tests like PCA, TreeMix or Admixture. On the other hand, supervised tests like D-stats/nMonte are best suited to carefully crafted models based on multiple lines of evidence, including unsupervised tests, uniparental markers and even linguistics and archeology.

Onur Dincer said...


"Btw, it's usually best to explore data with unsupervised tests like PCA, TreeMix or Admixture."

But even in them choice of populations to analyze is a very important matter.

Davidski said...

Use a lot to start with, and these days, there's a lot to choose from.

Samuel Andrews said...

Just had my first look at the D-stat spreadsheets. I never expected the type of results we got from Iran_Neo and Natufian. It explains why it was so hard to get reasonable models for Middle Easterners with the genomes we had before Lazaridis 2016. Natufian is close to Anatolia_N and Iran_Neo is close to CHG, but exactly what relationship each grouping has with the other is a mystery to me.

Levant_N looking like a mixture of Natufian and Anatolia_N, and Caucasus/Aegean fitting better as Anatolia_N+something else than Levant_N or Natufian+something else, suggests Anatolia_N-like people existed in the Mesolithic/Paleolithic in Anatolia and surroundings.

Samuel Andrews said...

I did a pretty detailed analysis of ancient/modern Iran. Here's a summary of the results. There's a link to it below.

Iran_Late_Neolithic: 53% North(35% CHG, 15% Iran_Neo), 30% South(Anatolia_N or Levant_N), 17% South Indian(Would reduce if you take out their Iran_Neo ancestry).

Iran_Chalcolithic: 50% South(Best represented by Anatolia_N), 45% North(Best represented by CHG), 5% South Indian

Modern Iran: 70% Iran_Chal, 10% South Indian, 20% LNBA European, 3-5% Siberian(?)

I'm confident South Asian admixture was in Iran since the Late Neolithic. The EHG/Steppe admixture in modern Iran looks real. When I use Srubnaya the results are always 15-20%. There's also extra South Asian in modern Iranians, maybe Indo Iranian languages arrived with a South Asian/Steppe hybrid population.

Samuel Andrews said...

I remember some of you earlier predicting Burusho or Balochi will turn out mostly Iran_Neolithic with little Steppe admixture. It doesn't look that way. Most of my predictions about Iran_Neo were wrong too. Their affinities were impossible to predict.

I was confused by this idea because I had seen Burusho and Balochi mtDNA and they have some Steppe mtDNA like other South Asians. Using the spreadsheets provided by this thread they're scoring about 30% LNBA European. They have more Iran_Neo than other SC Asians, but they're definitely a similar mix as other SC Asians.

Rob said...

Interesting about the earliest steppe influence being found in Armenia_MLBA. This dates 2300 - 1500 BC ? It's too late for Anatolian languages, but I wonder if it explains the appearance of Mitanni.

Unknown said...

ryukendo, can we get some ashkenazi results?

Rob said...

@ Ryu

Thanks for clarifying. I read " also the first that shows direct European-like/Steppe-like contribution" as 'the first' ;)

If this wasn't the first, what period was ?

Rob said...

Surely this kind of EEF + steppe which keeps showing up suggests something from the west Black Sea (the Anatolian Chalcolithic is c4000 BC) so refugees from Varna-Karanovo, or something C-T like; or north Caucasus- Majkop, are what initially come to mind.

Rob said...

"Looks like its influenced by Armenia_Chalcolithic (is there signs of Kura-Araxes influence so far west? )"

Can you clarify this question, Ryu ?

Rob said...


"Since Armenia Chl is Kura-Araxes IIRC, what do you make of the large fraction of Armenia_Chl in Anatolia_Chl?"

It makes sense. The KA expanded into Anatolia from the South Caucasus Piedmont. It dates from 3300 BC, so contemporary to if not slightly anterior to Yamnaya; but both are 500 years after the beginning of Majkop; which I suspect is the source of what we're seeing

The KA phenomenon was culturally and economically diverse, this differs to CWC or Yamnaya, say. I think this would translate into multilinguality, but would one such idiom be some early IE? Why not

Rob said...

"Also, since Armenia_Chl already has so much ancestry from Europe, do you think it spoke Anatolian? Or was it Anatolia_Chl that was responsible, or even Armenia_EBA with its 10% Yamnaya Kalmykia?"

I wonder if this "European farmer" like ancestry comes from Black Sea steppe / Majkop, instead of directly from Europe

For language, we'd need to ponder more and get a few more data points.
As I've often said, there might not be a simple, linear picture like the arrows in books ;)

Rob said...

I think the Barcin samples were too early (? Pre 5000 BC). Kumtepe 6 was also late Neolithic.
Kum 4 is 3200 BC, so it's perfect, but Dave wasn't able to analyse it

Yep something was happening...
I'll say about this more in due course

Rob said...

At some point will you look at the reverse: European aDNA (BB, Yamnaya, BA Hungary, CWC) in light of the new , near eastern data ?

MfA said...

Kura-Araxes is Armenia EBA, not copper age.

Gökhan said...

David, could you add Turkish_trabzon D-stats to your sheet, or just write them down here? I would appreciate it if you do.

MfA said...

I don't know; they didn't mention which culture in the paper.

On the other hand EBA samples are from the first phases of Kura-Araxes.

Talin necropolis (Aragatsotn Province, Republic of Armenia)
The necropolis is located at the limits of the city of Talin, and is distributed on both sides of the Talin-Gyumri Highway. The Early Iron Age remains are found in the northwestern part of the necropolis (north-western limits of Talin), in a cemetery occupying about 3 square kilometers. The Early Bronze Age and Late Bronze Age cemeteries occupy around one square kilometer. Systematic archaeological excavations at the site have been conducted since 1984, and over one hundred tombs dating from the last quarter of the 4th millennium BC through Hellenistic period have been excavated. The Early Bronze Age is represented by a ritualistic enclosure and four tombs. These are dated to the first phase of the Kura-Araxes culture, which overspread the region in the second half of the fourth millennium BCE to the early part of the third millennium BCE. The tombs are earth and stone tumuli, 0.4-0.6 meters high, but differ in their construction, with some having been built within pits, and others at ground level. Burial 115 was excavated as part of a group of 12 tombs in 2014, during rescue archaeology prior to road construction during the North-South Corridor Highway project (refs. 34, 35).
• TA3/R8 (I1658): 3347-3092 calBCE (OxA-31874, 4492±29 bp). Early Bronze Age I, Burial 115, petrous bone from skull N1.

Kalavan-1 burial ground (Gegharkunik Province, Republic of Armenia)
Kalavan-1 is an open-air site 1,640 meters above sea level on the southwest slopes of the Aregunyats Range north of Lake Sevan, Northeast Armenia. Archaeological and geological investigations were conducted here between 2005 and 2009 as part of a collaborative Armenian and French project. The excavation revealed two main levels of occupation dated to the Terminal Palaeolithic, overlain by an Early Bronze Age Kura-Araxes burial ground. The total excavated area approaches 70 square meters. Five burial pits were uncovered, of which four, referred to as UF1, UF2, UF8 and UF9, contained single primary burials, while the fifth (UF5) is a multiple burial that held the remains of at least three individuals. Six consistent radiocarbon dates on human skeletal material from UF5, UF8 and UF9 span 2900-2400 BCE, during the later part of the Kura-Araxes cultural horizon, and this is the range we use for the undated sample. Stone heaps rising to approximately 0.7m in height marked the graves of the adults. These structures were oval-shaped with a major axis of 1 meter, reaching 1.7 meters above the multiple burial. The position of the body in the pits varied: sitting, tightly flexed, and flexed. Post-sepulchral recovery of skulls and long bones occurred. The adult burials were furnished with the same assemblage of black burnished pottery that has the strongest association with the Kura basin ceramics and UF9 also contained bronze ornaments: a ring and a bracelet found near the skull. The child burial was in flexed position on its right side and was adorned with a neck ornament composed of dog molars and two stone beads, one of which was made of carnelian (refs. 36, 37). The two human remains (petrous bones) used in ancient DNA analyses came from the Early Bronze Age III period burials UF1 and UF9:

Rob said...

The KA falls between late Chalcolithic to "EBA", but really, it's a copper age culture technologically, not bronze.

Whatever the case, our copper age samples are c 4200 BC, thus Alikemek & Sioni horizons.

Hhhhhm. That's very early; too early for Majkop (heck earlier than Majkop); and the steppe were still just foragers.
Changes things a bit (thanks MfA).

Alberto said...

I read several papers about the Kura-Araxes origins (sorry, no bookmarks right here), and it was more or less clear that the people who would become the Kura-Araxes culture started to arrive in the area around 4200 BCE, even if the Kura-Araxes culture itself takes over around 3700-3500 BC. The Areni-1 cave samples seem to support this.

For the Archaeological context of those Armenia_ChL samples:

Davidski said...

New datasheets with D-stats of the form D(Chimp,Rows)(Mbuti.DG,Columns) are now available at the links above.

So which of these sheets is the best?

Davidski said...


Yeah, I'll run them tomorrow.

Rob said...


Quickly: did you include Anatolia Neolithic in the analysis of Armenian Chalc & Anatolian Chalc? (Because there is no Anatolian_Neolithic in Anatolian-Chalc?!)

MfA said...

David, is it possible to add Kurdish samples as well?

Davidski said...


Which are the two Kurds in the Turkish set in the Human Origins?

Alberto said...


Thanks for those new sheets. I know that's a lot of work and time, so let's try to make it worth it.

On a very quick test I can confirm that there is a difference and not the strong pull towards Euro_HG that I was seeing before. And Spanish gets a bit of SSA now too.

I'll test and report whatever seems to stand out more. Let's see if we can figure this out.

MfA said...

Turkish Adana23113
Turkish Istanbul20040

David, If single sample is enough you can use the Adana, Istanbul one seems like a bit mixed.

Alberto said...


Yes, those Armenian_ChL samples are important to figure out what was going on around 4000 BC and what came after. I'll look at those with the new sheet too to cross check RK's findings.

Rob said...

Thanks all

Samuel Andrews said...

"massive influx of European Neolithic plus a few percents of Steppe/EHG in Anatolia Chalcolithic"

There was no influx of European Neolithic because Anatolia was its ancestral homeland.

Aram said...


The people who arrived in the South Caucasus circa 4500-4000 BC are known as Uruk migrants.
But as I've said many times here, I don't think they were from Uruk. This high level of EHG proves that I was correct.

Onur Dincer said...


Turkish Adana23113
Turkish Istanbul20040

David, If single sample is enough you can use the Adana, Istanbul one seems like a bit mixed.

There was also one person with Kurdish-like results in the Behar et al. Turkish sample set of Cappadocia, do you remember which one was that?

MfA said...


That was "tur182", Unfortunately Behar samples aren't available in Human Origins set.

Gökhan said...

David, I compared D-stats3 and D-stats1b for the Turkish sample.

It seems ds1b gives better results than ds3, in that ds1b picks up East Eurasian ancestry while ds3 does not.

Samuel Andrews said...


In D-stats Natufian is as distant from East Asians as modern Egyptians, and more distant from East Asians than Iran_N, even though Iran_N appears to have more Basal Eurasian. To me this means Natufians had African ancestry. Lazaridis et al. 2016 didn't find affinity between Natufian and a large collection of modern Africans though. Saying Natufians have descent from an ancient African population with little affinity to most modern Africans sounds crazy but is possible.

This is what I get for Natufian when I model them as Basal Eurasian+UP North Eurasian+Yoruba+Ulchi+Nganasan. I took out all columns with Middle Eastern ancestry and my Basal reference has a 0.3 score with all Eurasians.

@ A=0.015741

I did the same test with all pre-Metal age Middle Easterners. The only other one who scores in Yoruba was Levant_N with 2.3%.

Chad said...

The KA samples are 2900-2400, BA. That 4200 date on the CA Armenians is still 300 years after the start of Khvalynsk. It's possible there's no R1b south of the Caucasus before 2900BCE, as well. We'll have to wait and see.

Also, there is no SSA in Natufians. Stats show that. Iran also doesn't have more basal than Natufians. I believe that's on table 7.4 or 9.4. They're fairly equal. I'll check for ENA/Onge in Iran tonight.

Grey said...

"Maybe there was an intense interaction between the Yamnaya and the Balkan Neolithics that started creating such cultural dynamism all around the Black sea, as you said. A bit strange how such ancestry reached the caucasus without us noticing though, don't the Barcin and Kumtepe samples close off the historical window?"


Samuel Andrews said...

"Also, there is no SSA in Natufians. Stats show that."

If they have no SSA how do you explain D(Chimp, Natufian)(Mbuti, East Asia)=0.31 and D(Chimp, Modern Levant)(Mbuti, East Asia)=0.33?

Gökhan said...

Dstat1b nMonte results for Turkish sample

Armenia_MLBA 45.25
Iran_Chalcolithic 18.75
Anatolia_Neolithic 18.65
Baalberge_MN 4.15
Armenia_EBA 2.85
Han 2.70
Nganasan 2.55
Jordan_EBA 1.60
Munda 1.55
Satsurblia 1.10
Bougainville 0.45
Ulchi 0.40

Dstat3 nmonte results for Turkish sample

Armenia_Chalcolithic 52.80
Armenia_MLBA 21.15
Anatolia_Chalcolithic 12.30
Iberia_EN 4.65
Esperstedt_MN 1.55
Poltavka_outlier 1.00
Ami 0.95

Dstat1b makes much more sense. With Dstat1b, nMonte detected East Eurasian ancestry of around 6%, which is close to the Turkish average in several calculators. In my opinion you should discard Dstat3.
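
For readers wondering where percentages and distances like the ones above come from, here is a minimal sketch of the kind of fit involved. It is not the actual nMonte script: it assumes the target row of D-stats is approximated by an average of source rows, and that the fit is improved by random reassignment of equal shares. The function name `fit_mixture`, the population names and all the numbers are made up for illustration.

```python
import random


def distance(a, b):
    """Euclidean distance between two rows of D-stats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def fit_mixture(target, sources, slots=200, iterations=20000, seed=0):
    """Hill-climb over `slots` equal shares, each assigned to one source.

    Hypothetical helper, not the real nMonte code: a random share is
    reassigned to a random source, and the change is kept only if the
    averaged row moves closer to the target.
    """
    rng = random.Random(seed)
    names = list(sources)
    dims = range(len(target))
    assigned = [rng.choice(names) for _ in range(slots)]
    totals = [sum(sources[n][k] for n in assigned) for k in dims]
    best = distance([t / slots for t in totals], target)
    for _ in range(iterations):
        i = rng.randrange(slots)
        old, new = assigned[i], rng.choice(names)
        if new == old:
            continue
        trial = [totals[k] - sources[old][k] + sources[new][k] for k in dims]
        d = distance([t / slots for t in trial], target)
        if d < best:
            assigned[i], totals, best = new, trial, d
    weights = {n: 100.0 * assigned.count(n) / slots for n in names}
    return weights, best


# Made-up three-column rows, NOT values from any datasheet.
sources = {
    "Pop_A": [0.30, 0.10, 0.10],
    "Pop_B": [0.10, 0.30, 0.10],
    "Pop_C": [0.10, 0.10, 0.30],
}
target = [0.22, 0.18, 0.10]  # constructed as 60% Pop_A + 40% Pop_B

weights, dist = fit_mixture(target, sources)
for name, pct in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name} {pct:.2f}")
print(f"distance: {dist:.6f}")
```

The point of the sketch is that the percentages depend entirely on which columns are in the sheet, since the columns define the space in which the distance is measured. That is one reason the same sample can get different sources in Dstat1b and Dstat3.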

huijbregts said...

@ Davidski
So which of these sheets is the best?

In the sheets 1,2 and 3 the Denisovan and Neandertal rows are outliers in the first Principal Component.
As a consequence the higher dimensions get less of the variance and the PCA seems 'flatter'
Paste the next lines in a spreadsheet:

sheet PC1 PC2 PC3 PC4 PC5
1 0.960339 0.026818 0.004684 0.002693 0.002547
2 0.959983 0.026903 0.004821 0.002785 0.002509
3 0.960220 0.026428 0.005067 0.002703 0.002449
1b 0.862578 0.102225 0.012318 0.010410 0.004781
2b 0.861910 0.102503 0.012028 0.010730 0.004966
3b 0.863855 0.100077 0.012681 0.010254 0.005010

This spreadsheet gives the fraction of the variance explained by each of the first 5 Principal Components.
It is obvious that in the sheets 1, 2 and 3 the first PC steals variance from the higher dimensions.
This is undesirable.
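
To put a number on this, a small sketch that only does arithmetic on the figures in the table above. It assumes those figures are fractions of total variance, so that one minus the PC1 value is everything left over for all other dimensions; the variable names are arbitrary.

```python
# Fractions of total variance per principal component, copied from the table above.
explained = {
    "1":  [0.960339, 0.026818, 0.004684, 0.002693, 0.002547],
    "2":  [0.959983, 0.026903, 0.004821, 0.002785, 0.002509],
    "3":  [0.960220, 0.026428, 0.005067, 0.002703, 0.002449],
    "1b": [0.862578, 0.102225, 0.012318, 0.010410, 0.004781],
    "2b": [0.861910, 0.102503, 0.012028, 0.010730, 0.004966],
    "3b": [0.863855, 0.100077, 0.012681, 0.010254, 0.005010],
}

# Whatever PC1 does not take is all that remains for every other dimension.
left_after_pc1 = {sheet: 1.0 - pcs[0] for sheet, pcs in explained.items()}

for sheet, pcs in explained.items():
    print(f"{sheet:>2}  PC1 {pcs[0]:6.1%}  PC2 {pcs[1]:6.1%}  "
          f"left after PC1 {left_after_pc1[sheet]:6.1%}")
```

On these figures roughly 4% of the variance is left after PC1 in sheets 1, 2 and 3, against roughly 14% in the b sheets.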

Shaikorth said...

@Samuel Andrews also this from Lazaridis et al 2016's supplementary table 3:

Fst(Natufian-Mbuti)/Fst(Natufian-Papuan) = 0.9522

Fst(BedouinA-Mbuti)/Fst(BedouinA-Papuan) = 0.9778

Fst(BedouinB-Mbuti)/Fst(BedouinB-Papuan) = 0.9901

This ratio is 1 or more for non-Africans without African admixture, and even for some Near Eastern populations that have low amounts of it (Jordanians etc). 1.03 for both Anatolian and Iranian Neolithic. Natufian affinities seem unresolved. MA-1 is also Onge-shifted compared to EHG and WHG according to the paper's figures which is something that doesn't seem to have come up before.

For the king said...

Can you guys model the modern Iranian populations (Lor, Persian and Mazandarani) ?

Gökhan said...

I vote for Dstat2b. I got best fits from that datasheet.

Olympus Mons said...

@Aram and Alberto,
Yes. That is the problem.
We have DNA for the population that lived in the southern Caucasus by the 9th millennium BC (Kotias/CHG) and we have DNA for the guys arriving by 4500 BC (Kura-Araxes)... But not the ones in between: the Shulaveri-Shomu. They arrived by the 8th millennium BC and got kicked out by 5000 BC. No DNA, but they are the KEY. Let the record show.
Once we get them, you will see a match with Bell Beaker and the birth of M269.

Olympus Mons said...

... and also , Shulaveri gave (at least in part) the CHG and the levant to Yamnaya, diluting their EHG...

Olympus Mons said...

@Chad Rohlfsen,
" It's possible there's no R1b south of the Caucasus before 2900BCE..."

Would bet you there are bucketloads of R1b (and M269) by 5000 BCE in the southern Caucasus.
It's in the only population that has not been DNA sampled: the Shulaveri-Shomu.
And you know what ... as per latest papers, their cattle and sheep came from Anatolia and not Iran...

Matt said...

@ Davidski, while it's not possible to add the new ancients as columns in the double outgroup sheet, would the following sets be possible at all, to run?

D(Mbuti, Pop, Iran_Neolithic, Levant_Neolithic) -
D(Mbuti, Pop, Loschbour, Levant_Neolithic) -
D(Mbuti, Pop, Eastern_HG, Iran_Neolithic) -
D(Mbuti, Pop, Loschbour, Eastern_HG) -
D(Mbuti, Pop, Loschbour, Israel_Natufian) -
D(Mbuti, Pop, Levant_Neolithic, Israel_Natufian) -
D(Mbuti, Pop, Iran_Neolithic, Kotias) -
D(Mbuti, Pop, Levant_Neolithic, Anatolia_Neolithic) -
D(Mbuti, Pop, Loschbour, Anatolia_Neolithic) -

I'm interested particularly in whether recent, Bronze Age and later, populations tend to be closest to WHG, Levant_Neolithic, Iran_Neolithic, EHG and also whether they tend to be closer to Natufian or WHG. It's an interesting question to me whether present day people in Europe are closer to the earliest farmers in the Levant, or to European HG (models under nMonte suggest closer to European HG).

Gökhan said...

@for the king:

Here you go


distance: 0.6453

Iran_Chalcolithic 51.10
Armenia_MLBA 31.35
Baalberge_MN 5.80
Munda 5.00
Jordan_EBA 4.25
Nganasan 2.15
Denisovan 0.15
Yoruba 0.10
Bougainville 0.05
Masai_Kinyawa 0.05

Iranian Mazandarani

Armenia_MLBA 46.45
Iran_Chalcolithic 42.60
India_South 4.90
Munda 3.55
Nganasan 0.95
Satsurblia 0.65
Armenia_EBA 0.50
Han 0.25
Karitiana 0.15


Armenia_MLBA 42.35
Iran_Chalcolithic 33.15
Jordan_EBA 12.10
Munda 5.75
India_South 2.55
Ulchi 1.30
Baalberge_MN 1.05
Nganasan 0.85
Denisovan 0.30
Han 0.20
Israel_Natufian 0.15
Neandertal_Altai 0.15

Alberto said...

I've been testing D-stats3 vs. D-stats3b, because those 2 have the EHG in the columns and share the same columns overall. For now only with the basic models for Europe based on the "big 4" European ancestors + Yoruba and Ami for the extra bits.

My impression so far is that 3b gives better results, with much lower distances and better distributed residuals. It tends to favour Anatolia_Neolithic over Loschbour (I'll check about this with EEF and MN farmers from Europe), and does better with the SSA and ENA (IMO). Just 3 examples:

With D-stats3:

"Anatolia_Neolithic" 54.95
"Satsurblia" 14.75
"Loschbour" 14.25
"Eastern_HG" 12.45
"Ami" 3.6
"Yoruba" 0

With D-stats3b:

"Anatolia_Neolithic" 58.6
"Eastern_HG" 14.35
"Satsurblia" 13.95
"Loschbour" 10.15
"Ami" 2.1
"Yoruba" 0.85

With D-stats3:

"Anatolia_Neolithic" 39.95
"Eastern_HG" 22.15
"Loschbour" 18.8
"Satsurblia" 16.25
"Ami" 2.85
"Yoruba" 0

With D-stats3:

"Eastern_HG" 29.4
"Anatolia_Neolithic" 26.1
"Loschbour" 19.25
"Satsurblia" 15
"Ami" 10.25
"Yoruba" 0

With D-stats3b:

"Anatolia_Neolithic" 35.3
"Eastern_HG" 32
"Satsurblia" 14.2
"Loschbour" 10.5
"Ami" 8
"Yoruba" 0

I'll move to some ancients now and West Asia.

Alberto said...

Sorry, English Cornwall here with D-stats3b:

"Anatolia_Neolithic" 47.4
"Eastern_HG" 24.35
"Satsurblia" 15
"Loschbour" 12.1
"Ami" 1.15
"Yoruba" 0

Alberto said...


Yes, I agree. Those people could not have come from the south. The paper I linked above has detailed information about them, but I'm not great at the details about crops, pottery, etc. to have an informed opinion of where they could have come from. I'll look into the models soon.

Samuel Andrews said...

@For the King,

Here's a link with results for Iranians.

Modern Iranians fit well as Iran Chalcolithic+Steppe+South India. Could be pre-IE Iranians+proto-Iranian speakers.

For the king said...

@Gokhan and @Samuel Andrews

Awesome work guys! I wonder if the extra South Indian in Iranians came from post-BMAC Indo-Iranians? Or from undiscovered Iranian Neolithic/HG populations?

German Dziebel said...


I remember that Stuttgart was shown to be closer to Amerindians than it was to East Asians. Are these new aDNA samples still closer to Amerindians than they are to East Asians?

Alberto said...

Continuing with sheet 3 vs. 3b, trying to see why 3 favours Loschbour clearly over Anatolia Neolithic compared to 3b, some relatively easy 2 way admixtures with European Neolithic farmers:

With 3:

"Anatolia_Neolithic" 91.7
"Loschbour" 8.3

With 3b:

"Anatolia_Neolithic" 94.35
"Loschbour" 5.65

This one I think is clear: 3b seems like the better model. But with others things are less clear.

"Anatolia_Neolithic" 87.75
"Loschbour" 12.25

"Anatolia_Neolithic" 90.55
"Loschbour" 9.45


"Anatolia_Neolithic" 77.6
"Loschbour" 22.4

"Anatolia_Neolithic" 85
"Loschbour" 15

"Anatolia_Neolithic" 74.55
"Loschbour" 25.45

"Anatolia_Neolithic" 76.25
"Loschbour" 23.75

It seems that AN + Loschbour work fine for LBK, but not so well for others, so things become more blurry. Looking at the residual from Esperstedt_MN which has the biggest difference (also in distance), D-stats3 is underfitting AN and overfitting Bichon, and D-stats3b is overfitting AN while Bichon is almost spot on.

So not too clear. I'd still lean toward 3b as slightly preferable overall in these cases, but would need more complex models maybe to have a more definitive answer (adding other WHGs maybe, I'll try to check that).

Chad said...

There's an unsampled group in North Africa that has West Eurasian mtDNA: the Iberomaurusian. North Africa has an ancient West Asian population, which will likely be further from ANE than WHG. They will likely have BE too, as most of their mtDNA is like farmers.

Davidski said...


Kurd has just posted what looks like very strong evidence that ASI ancestry, whatever that is, existed in Iran N.

Nah, we ran stats using Onge with the Neolithic Iranians against Kotias and Neolithic Anatolians using 400-500K SNPs.

The Anatolians and Kotias were both (insignificantly) closer to Onge, probably because they're less basal.

So it doesn't look like the ancient western Iranians had any ASI. The difference between them and South Central Asians with only around 12% ASI in this respect in D-stats is huge.

Dilawer (Eurasian DNA) said...

@ David

"Nah, we ran stats using Onge with the Neolithic Iranians against Kotias and Neolithic Anatolians using 400-500K SNPs.

The Anatolians and Kotias were both (insignificantly) closer to Onge, probably because they're less basal."

You may want to re-check because I believe the Onge have <100K HO overlapping SNPs.

Also, I use only the highest-coverage Iran_N sample. With Iran_Chl I use the 2 highest-coverage samples.

The net Iranian shift is consistent for S Indians.

result: Kotias Iran_N Andamanese Chimp -0.0124 -1.046 2264 2320 34330
result: Kotias Iran_N Onge Chimp -0.0123 -1.008 2231 2286 34330
result: Kotias Iran_N Paniyas Chimp -0.0135 -1.132 2255 2317 34330
result: Kotias Iran_N Palliyar Chimp -0.0173 -1.474 2257 2337 34330
result: Anatolia_N Iran_LN Andamanese Chimp -0.0203 -1.816 1360 1416 20275
result: Anatolia_N Iran_LN Onge Chimp -0.0236 -1.959 1340 1405 20275
result: Anatolia_N Iran_LN Paniyas Chimp -0.0222 -1.902 1354 1415 20275
result: Anatolia_N Iran_LN Palliyar Chimp -0.0236 -2.045 1353 1419 20275
result: Kotias Iran_LN Andamanese Chimp -0.0307 -1.960 1306 1389 20275
result: Kotias Iran_LN Onge Chimp -0.0305 -1.873 1295 1377 20275
result: Kotias Iran_LN Paniyas Chimp -0.0282 -1.749 1304 1380 20275
result: Kotias Iran_LN Palliyar Chimp -0.0232 -1.493 1312 1375 20275
result: Anatolia_N Iran_Chl Andamanese Chimp -0.0125 -1.757 2471 2533 37427
result: Anatolia_N Iran_Chl Onge Chimp -0.0110 -1.487 2468 2524 37427
result: Anatolia_N Iran_Chl Paniyas Chimp -0.0132 -1.789 2459 2525 37427
result: Anatolia_N Iran_Chl Palliyar Chimp -0.0144 -1.964 2473 2545 37427
result: Kotias Iran_Chl Andamanese Chimp -0.0217 -2.019 2435 2543 37422
result: Kotias Iran_Chl Onge Chimp -0.0169 -1.534 2430 2513 37422
result: Kotias Iran_Chl Paniyas Chimp -0.0162 -1.452 2454 2535 37422
result: Kotias Iran_Chl Palliyar Chimp -0.0154 -1.419 2459 2536 37422
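
A quick way to sanity-check lines like these is sketched below. It assumes the usual qpDstat column order (four populations, D, Z, BABA count, ABBA count, number of SNPs) and the common convention of calling a result significant only at |Z| >= 3. Both are assumptions about this output rather than something stated in the thread, and the helper names are made up.

```python
from typing import NamedTuple


class DStat(NamedTuple):
    pops: tuple
    d: float
    z: float
    baba: int
    abba: int
    snps: int


def parse_result(line):
    """Parse one 'result:' line, assuming: 4 pops, D, Z, BABA, ABBA, nSNPs."""
    fields = line.split()
    if len(fields) != 10 or fields[0] != "result:":
        raise ValueError(f"unexpected line: {line!r}")
    return DStat(tuple(fields[1:5]), float(fields[5]), float(fields[6]),
                 int(fields[7]), int(fields[8]), int(fields[9]))


def d_from_counts(stat):
    """Recompute D from the two site-pattern counts."""
    return (stat.baba - stat.abba) / (stat.baba + stat.abba)


# Three lines copied from the output above.
lines = [
    "result: Kotias Iran_N Andamanese Chimp -0.0124 -1.046 2264 2320 34330",
    "result: Anatolia_N Iran_LN Palliyar Chimp -0.0236 -2.045 1353 1419 20275",
    "result: Kotias Iran_Chl Andamanese Chimp -0.0217 -2.019 2435 2543 37422",
]

stats = [parse_result(line) for line in lines]
for s in stats:
    verdict = "significant" if abs(s.z) >= 3 else "not significant"
    print(" ".join(s.pops), f"D={s.d:+.4f}",
          f"recomputed={d_from_counts(s):+.4f}", f"Z={s.z:+.3f}", verdict)
```

The recomputed values land close to the reported D, which supports the assumed column order. The reported counts look rounded, so small differences in the last digit are expected.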

Davidski said...

There's an Onge set that has almost 600K overlapping SNPs with the Human Origins.

Davidski said...


The following would still be interesting to run though:
Chimp Iran_N Onge Dai
Chimp Iran_N Onge Ami
Chimp Iran_N Onge Korean
Chimp Iran_N Onge Ulchi
Chimp Iran_N Onge Eskimo_Naukan
Chimp Iran_N Onge Karitiana

We looked at these sorts of stats as well using a lot of markers. Iran_N is closer by something like 4-5 Z scores to Eskimos and Amerindians relative to the Onge. East Eurasians that don't have much ANE show Z scores of around 2.

Davidski said...

Gokhan & MfA

Here are those Turkish stats.

Rob said...

So something simply WHG appears to have admixed in Anatolian farmers ?
What about Natufians (modelling with basal ghost and earlier European UP)?

MfA said...

Thank you Dave.

@Krefter, Ryu

Which Dstats file have you used?

Aram said...


"""Maybe there was an intense interaction between the Yamnaya and the Balkan Neolithics that started creating such cultural dynamism all around the Black sea, as you said. A bit strange how such ancestry reached the caucasus without us noticing though"""

Yes, it's strange that we didn't notice that until now. Recently I wrote on Anthrogenica about the Balkanic influence on Armenia. I noticed it when analysing the Y DNA data meticulously. And the most amazing thing is that the influence came straight from the north of the Black Sea, not via Anatolia.
Something happened when Steppe "touched" North Balkans/Carpathian region.

Aram said...

But I must say that archaeologists knew that.
Aegean connections of the Trialeti culture. Cyclopean masonry of fortifications in Armenia, Greece and Crimea, coloured ceramics, Balkanic deities in Hayasa and other stuff. It was not massive, but it is sufficient to explain some linguistic and archaeological issues.

Davidski said...

I'll take a look at the Bell Beakers we have with TreeMix using the new samples from the ancient Near East.

Rob said...

@ Ryu

Thanks again some interesting findings. Some things which caught my eye, in addition

1) For BB;
I guess no one is surprised by the Yamnaya input, but the thing which catches my eye is the near absence of input from MN Germany. Rather this has been replaced by Copper Age Iberia. This is perplexing.

Whilst it might signify that ‘out of Iberia’ component long talked about, it might be an artefact of sample choice, and the tricks of terminology, again. I.e. Copper Age Iberia is more contemporary to BB _Germany than MNE_Germany, with the former dating as late as 2200 BC, whilst the latter is as early as 5000 BC.

The Hungarian input is not surprising, given its geographic & cultural centrality, and the fact that we have an early R1b from Vucedol. Neither is the “Moroccan connection”.

Maybe Frank can comment on this if he’s around

(NB: terms can trick us, so we should always note absolute dates for historicity. Example: 3500 BC would be “Copper Age” in Hungary, “Eneolithic” in Austria, “terminal Neolithic” in Greece, and “proto-Bronze Age” in the steppe or Bulgaria).

2) Nothing too shocking in Copper Age Iberia (which dates to 3200 – 2200 BC, depending on def.) The latter half is contemporary with BB phase, and the current samples show continuity from the middle Neolithic western European milieu (although these Copper Age Iberian samples aren’t actually from Beaker contexts).

3) EBA Hungary

Did you include Starcevo or LBK here? If so, its preference for MN Germany is notable and somewhat surprising at first glance, but it does make sense: the Balkan Neolithic collapsed, and I suspect new Neolithic ancestry came from central Europe, where the earliest proto-Boleraz assemblages are found (Slovakia/northern Hungary). This is also corroborated by the increasing appearance of I2a2.

20% steppe influence is consistent with previous estimates.

What happens if you throw Co-1 from Baden into this also ?

4) The modelling of LBK & Hungary EN. I am confused by the use of Iberia EN. Isn’t this “ahistoric”? Iberia EN came after LBK and Hungary EN, chronologically & spatially.

For here, I think it would be more important to see what a mix of Barcin/Kumtepe 6 (i.e. inland NW Anatolia) vs Greek Neolithic ('seaborne route') does. Same for Iberia EN.

Alberto said...

Not to bore with models comparing D-stats3 vs. D-stats3b, just a summary. The SSA admixture I think is quite clearly better handled by 3b. For example, Palestinians with 3b get 3.6% Esan (with Natufian included in the pops), while with 3 they get 1.6% and an underfitted Yoruba column.

Then there is the tendency of a bias toward Euro_HG and ENA that I first noticed. Not always easy to say which is more correct, though in general the distances in 3b are quite lower and the residuals better balanced. For example, a model for Bell Beaker with Yamnaya and MN + WHG, with 3 they get 51% Yamnaya and a distance of 0.014524, while with the same populations with 3b they get 43.5% Yamnaya + 3.5% Satsurblia, which seems more in line with what we've seen before, and the distance goes down to 0.0043 (BTW, also they get 0.75 Yoruba with this last model, while in the other they don't, but Yoruba column stays underfitted).

So overall I didn't find any model that with D-stats3 looks clearly better, but I did find many that with 3b do look clearly better. So for now, unless someone else is seeing something different, I'll stay with 3b for the models I post. I have RK's first models posted above for some comparison, so I hope that will suffice.
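
For anyone unsure what "underfitted" and "overfitted" columns mean here, a minimal sketch. It assumes the residual for a column is the fitted value minus the target's value in that column, so a negative residual means the model underfits that column. The populations, columns and numbers are hypothetical, not taken from the sheets.

```python
def column_residuals(target, sources, weights, columns):
    """Fitted minus target for each column; weights are fractions summing to 1."""
    fitted = [sum(w * sources[name][k] for name, w in weights.items())
              for k in range(len(columns))]
    return {col: fitted[k] - target[k] for k, col in enumerate(columns)}


# Made-up example, NOT values from any datasheet.
columns = ["Col_X", "Col_Y", "Col_Z"]
sources = {"Pop_A": [0.30, 0.10, 0.10], "Pop_B": [0.10, 0.30, 0.10]}
target = [0.22, 0.18, 0.10]
weights = {"Pop_A": 0.5, "Pop_B": 0.5}  # a deliberately imperfect 50/50 model

res = column_residuals(target, sources, weights, columns)
for col, r in res.items():
    if r < -1e-9:
        label = "underfitted"
    elif r > 1e-9:
        label = "overfitted"
    else:
        label = "spot on"
    print(f"{col} {r:+.4f} {label}")
```

Two models can have similar overall distances while spreading their residuals very differently across columns, which is why looking at both is informative.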

Rob said...


Yep, what we're seeing with BB makes sense, and certainly has ramped up the former models. I suspect we'll see similar things with other periods, esp when we start getting later Bronze Age & Iron Age samples in the future.

About LBK, etc, I see; it wasn't your main aim to decode the individual components of the central European Neolithic. But I think this, too, would be interesting in the future: which it prefers out of Greek vs Anatolian, and the same with EN Iberia. I think the actual Greek Neolithic paper attempted a similar thing.

Davidski said...


huijbregts said...

@ Ryu
I agree with Rob that Iberia_Chalcolithic is an odd reference population for Bell_Beaker_Germany.
Especially since you yourself found 20% Baalberge_MN+Esperstedt_MN in Iberia_Chalcolithic; that is a heavy Bell Beaker smell.
By the way which Dstats sheet did you use? I hope it was not D-stats1.

Alberto said...

A first look at the Armenia_Chalcolithic samples. Starting with this model:

"Iran_Chalcolithic" 44.25
"Anatolia_Neolithic" 31.6
"Eastern_HG" 16.05
"Satsurblia" 6.65
"Israel_Natufian" 1.45
"Hungary_HG" 0
"Loschbour" 0
"Esan_Nigeria" 0
"Esperstedt_MN" 0
"Iran_Neolithic" 0
"Motala_HG" 0
"Ami" 0
"Levant_Neolithic" 0

So here the high EHG does appear. The paper has it best modelled as 52.5% Anatolia_N, 29.2% Iran_N and 18.3% EHG. Iran Chalcolithic is older than Armenia_Chalcolithic (going back to 4800 BC), but trying with it:

"Anatolia_Neolithic" 49.2
"Iran_Neolithic" 21.55
"Eastern_HG" 16.65
"Satsurblia" 12.6

This comes relatively close to what the paper shows, and still showing the Iranian input. But to check what is Iran_Chalcolithic:

"Iran_Neolithic" 44.35
"Anatolia_Neolithic" 38.8
"Satsurblia" 14.85
"Ami" 1.2
"Eastern_HG" 0.8
"Hungary_HG" 0
"Loschbour" 0
"Esan_Nigeria" 0
"Esperstedt_MN" 0
"Israel_Natufian" 0
"Motala_HG" 0
"Levant_Neolithic" 0

Like a mix of Iran_N and something more northern, from Anatolia/Caucasus area. The mystery is where did this high EHG come from.

And what I'm not seeing is the high "European" ancestry in the above RK's models. The 20% Esperstedt_MN doesn't show at all. Adding Iberia_EN and the ahistorical Bell_Beaker and Afanasievo:

"Iran_Chalcolithic" 40.65
"Anatolia_Neolithic" 31.55
"Afanasievo" 12.6
"Eastern_HG" 9.8
"Satsurblia" 3.5
"Israel_Natufian" 1.8
"Esan_Nigeria" 0.1
"Hungary_HG" 0
"Loschbour" 0
"Esperstedt_MN" 0
"Iran_Neolithic" 0
"Motala_HG" 0
"Ami" 0
"Levant_Neolithic" 0
"Bell_Beaker_Germany" 0
"Iberia_EN" 0

It does take Afanasievo, but still keeps a good part of EHG. No Bell Beaker or Iberia_EN. So basically these samples look like a 3-way mix of something Iranian, something Anatolia/Caucasus and something EHG.
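
For readers who want to see what this kind of fit is doing, here is a minimal sketch in Python. It is not the nMonte script itself, only a simplified hill-climb in the same spirit: the target and each source are rows of D-stats, the mixture is split into equal shares, and one share at a time is moved between sources whenever that reduces the Euclidean distance to the target. The function names, the number of shares and the number of rounds are all assumptions made for illustration.

```python
import math
import random


def distance(target, sources, weights):
    """Euclidean distance between the target row and the weighted source mix."""
    mix = [sum(w * row[k] for w, row in zip(weights, sources))
           for k in range(len(target))]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))


def fit_mixture(target, sources, slots=200, rounds=20000, seed=1):
    """Hill-climb over `slots` equal shares, moving one share per accepted step.

    Returns (percentages, distance). `slots`, `rounds` and `seed` are
    illustrative defaults, not values taken from any published script.
    """
    rng = random.Random(seed)
    n = len(sources)
    counts = [slots // n] * n
    counts[0] += slots - sum(counts)  # give any leftover shares to the first source
    best = distance(target, sources, [c / slots for c in counts])
    for _ in range(rounds):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j or counts[i] == 0:
            continue
        # Tentatively move one share from source i to source j.
        counts[i] -= 1
        counts[j] += 1
        d = distance(target, sources, [c / slots for c in counts])
        if d < best:
            best = d
        else:
            # No improvement: undo the move.
            counts[i] += 1
            counts[j] -= 1
    return [100 * c / slots for c in counts], best
```

Called as `fit_mixture(target_row, [source_row_1, source_row_2, ...])`, it returns percentages that sum to 100 plus the final distance, which is the same shape of output as the lists quoted in this thread. With 200 shares the resolution is 0.5%, so small components such as 0.1% cannot appear.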

Rob said...

Thanks Alberto

In Ryu's model yesterday, I was very surprised to see the absence of ANF in Anatol-Chalcolithic, but thought it within the realm of possibility, as the "Western Farmer" population did appear to have shifts & declines in population. The migration of Balkan-like farmers was perplexing too, as I'd not ever imagined that.

What does Anatolian Chalc look like?

With the EHG in Armenian Chalcolithic, does it prefer Afanasievo & "EHG" over groups like Khvalynsk or Yamnaya?

Kristiina said...

"No use on carrying coals to Newcastle"
In the end, it is somewhat amusing if it turns out that proto-IE was not spoken on the steppe or Caucasus or Near East but was spoken in a culture such as Vinča culture (c. 5700–4500 BC) which provides the earliest known example of copper metallurgy, or Globular Amphora Culture ca. 3400–2800 BC in the proximity with Vinča culture. Maybe it is in Globular Amphora Culture that we will see R1a1 and R1b together.

Of course, this is only one of the several possible Europe-centered models. In the future it will be rejected, modified or confirmed as other models.

Karl_K said...


"this analysis seems to suggest that there was a movement from Iberia"

I feel like many people here have been saying this for a very very very long time. This is expected. The Bell Beaker people have never fit well with a 2 population model.

I really like your analysis, but who is surprised?

Alberto said...


Yes, it seemed to me that the very big shift towards "Europe" was at least partially due to a technical problem. Things look more balanced with this other sheet.


Yes, if I add Samara_Eneolithic it takes a good part of EHG:

"Iran_Chalcolithic" 41.2
"Anatolia_Neolithic" 31.65
"Afanasievo" 11.15
"Samara_Eneolithic" 9
"Eastern_HG" 3.05
"Satsurblia" 2.65
"Israel_Natufian" 1.2
"Esan_Nigeria" 0.1

Not surprising since Samara_Eneolithic is very EHG (but a mix of 3 different samples, so I'm not usually including it as a source).

For Anatolia_Chalcolithic (pops that get 0% not shown):

"Anatolia_Neolithic" 55.25
"Satsurblia" 23.95
"Iran_Chalcolithic" 7.75
"Eastern_HG" 6.35
"Levant_Neolithic" 5.65
"Esan_Nigeria" 1.05

Judging by the 1% SSA and the not-so-great distance, it seems that the sample is a bit noisy (it is only one sample, and low coverage). But otherwise it looks considerably less "Iranian" and "EHG" than Armenia_ChL. If we had Kotias instead of Satsurblia it would probably be mostly Anatolia_Neolithic and CHG, but probably still with a bit of extra EHG. Adding Armenia_ChL to the source pops (historically correct, since they are a few centuries older):

"Anatolia_Neolithic" 45.55
"Armenia_Chalcolithic" 34.6
"Satsurblia" 18.4
"Esan_Nigeria" 1.1
"Eastern_HG" 0.25
"Levant_Neolithic" 0.1

Rob said...

Expected but it needed to be demonstrated.

Rob said...

Kudos Ryu


Thanks. It's curious that Armenian Chalcolithic prefers Afanasievo and older type EHG over the nearer Yamnaya_Kalmykia, but again chronology could be responsible.

More importantly, if there was steppe type input in Chalcolithic Armenia (4000 BC); we should bank on it being present in Late Neolithic Eastern Europe (eg C-T, post Varna groups); & the Baltic.

About Anatolian Chalcolithic: Hhhmm
That extra CHG again. What does that mean if it prefers archaic CHG over more contemporary choices?

But basically massive input from / via Armenia.

Alberto said...


I had only added Afanasievo as an ahistorical steppe population because that's what showed up in RK's model above. Yamnaya_Kalmykia does work better and takes more EHG when added:

"Iran_Chalcolithic" 40.1
"Anatolia_Neolithic" 29.75
"Yamnaya_Kalmykia" 20.7
"Eastern_HG" 5.9
"Israel_Natufian" 1.85
"Satsurblia" 1.7

But these samples are some 1500 years after the Armenia_ChL ones. And also in the paper's f3 stats for admixing populations, Armenia_ChL didn't show signs of recent admixture, which makes it more mysterious (though it could be a technical problem, but I have no reason for thinking it is).

In any case, yes, probably the EHG-CHG like admixture (or Yamnaya-like) was present along the Black Sea long before Yamnaya. It would be interesting to get samples from Varna, C-T and other cultures from the area to see if it had reached there already. Also awaiting those Globular Amphora ones.

Kristiina said...

Are you so interested in Armenia_Chalcolithic (4300–3400 BCE), because your idea is that the Anatolian IE branch was introduced by people who brought EHG to Armenia 4300 BC? As many of you connect yDNA R with proto-IE, do you think that it was R1b men who brought EHG to Armenia 4300 BC, but it just accidentally happened that the Chalcolithic samples were L1a and the later sample with more WHG and less EHG was R1b and not the yDNA that in reality decreased the steppe affinity? Of course this is not impossible.

In any case, if you look at this map: you see that proto-IEs may have been more Balkans-shifted than previously presumed.

Olympus Mons said...

Are you reading ryukendo kendow on BB? - Oh, yes, I am here, and I already have the popcorn.

Olympus Mons said...

@ryukendo kendow,
I don't know who you are, but if you ever drop by Lisbon, just send me an email. Lunch is on me!

Rob said...

Individual lineages are of secondary importance, but are still relevant. But really, what matters more is full analysis of all regions.
Armenia is highly interesting for 2 reasons; we have a darn good view of it now; from Neolithic to Iron Age. So that's an amazing overview to contribute toward a pan-Eurasian understanding

Secondly, South Caucasus position is a connection between Anatolia-Balkans, the steppe and Central Asia.

MfA said...

There is also U4a in Armenia_ChL, steppe marker.

Armenia_ChL could be a highly drifted tribe, which is probably why it doesn't show recent admixture. AFAIK no one has checked IBS between the ChL samples yet. All males are L1a, some of them could even be 1st degree relatives according to the carbon dates, and two are K1a8, so it doesn't seem like there was much diversity.

Olympus Mons said...


On the southern Caucasus, you do not have the most important "people". From the 8th millennium (when they arrived) to the end of the 6th millennium (when they were kicked out). Completely.
Prior to them was CHG; after them was a mix of lots of different people that formed the Chalcolithic and the Bronze Age you see here. But not "them".

Olympus Mons said...

So, BB did had some SSA? I thought it was a "defect" of sampling. :)

Not only did they have SSA, but they picked up with L3 women in Egypt (Merimde and el-Omari)

Recap "real" history from Shulaveri2BellBeaker.
1- 7th millennium in the southern Caucasus as Shulaveri-Shomu, where M269 was born, apparently coming from Anatolia (because of cattle and goat DNA).
2- By 4.900 “they” were completely kicked out – by then they were a mix of EHG, CHG and Anatolia Neolithic. All their settlements were abandoned and some have a layer of ash between them and the ones that replaced them (Sioni going to Kura-Araxes), with different pottery, different architecture, etc.
3- By 4.800 BC they were in Tel Tsaf, north Israel. So suppose the place they were kicked out to, by Ubaid or L1a from Iran, was to the west, and that is why the 2 places with higher variance of R1b are eastern Anatolia and … the place in Armenia where that R1b was just found, near Lake Sevan.
4- By 4.700 BC they were in Nalchik, north Caucasus, and so forth; that is why Yamnaya is so close to Bell Beaker. They, the Shulaveri, diluted the EHG in them and gave them CHG and Levant DNA.
5- By 4.700 BC they were settling heavily in Merimde and el-Omari in the Nile delta in Egypt, and having cattle binge parties in Fayum, near the lake. Was L51 born there?
6- 4.000 BC: again, the same as with Ubaid, predynastic pharaonic Egypt, with the crazy Badarian in southern Egypt moving north, kicked them out.
7- By 3.700 BC they were arriving in Iberia, kicked out by the 5.9 kiloyear climatic event that made the Sahara desert, alongside Berber (E1b1) guys.
8- By 3.300 they were amassing in large cities in Iberia, Porto Torrão in the lowlands of Iberia being as big as the city of Ur.
9- By 3,000 they were building the city of Zambujal, where the Bell Beaker actually arose.
10- By 2.700 BC they had crossed the Pyrenees. … And that is the Bell Beaker story.

Isn’t that what the DNA is telling us? – At least they told everyone. When Periplus, the 700 BC Greek mariner, met them there, they told him who they were.
“We are the people who have been living here for a long time, but were kicked out of our homeland (southern Caucasus) by an attack of serpents (Ubaid/Uruk). … We are the Oestrimni!”

See chapter – Those who fled the serpents.

Davidski said...

So, BB did had some SSA? I thought it was a "defect" of sampling.

Not of sampling, but of post-mortem deamination damage.

Some of the Bell Beaker samples aren't UDG treated, and this is often expressed as very minor Sub-Saharan admixture.

Olympus Mons said...


"...Since I was most weirded out by the Armenia MLBA (4%) and Moroccan (.5%) percentages, I then dropped these two..."

Hey, don't really drop them so quickly... they (Bell Beaker stock of people) were in Armenia up until 4.900 BC and went by Morocco by 3.500 BC... that is where the Strait of Gibraltar is. :) - So, do you really have to drop them?!

Kristiina said...

Rob, yes, I agree. There are so many languages spoken around Caucasus that it is not really at all easy to sort out the linguistic history in the area.

Maybe the first metallurgists in the Balkans spoke a Northwest Caucasian type language and the Caucasian substrate in proto-IE comes from there. In my model above, Globular Amphora Culture spoke proto-IE and replaced the earlier Corded Ware language in the North. Corded Ware area overlaps in the east with the Uralic area and that could explain the similarities between Uralic and IE languages.

In any case, I think that the carriers of the expansive languages must have had a technological/ political advantage with respect to groups speaking other languages.

Kristiina said...

Globular Amphora Culture is interesting from the IE point of view:

"A further highly interesting aspect is the connection between some human burials and cattle burials or deposits. In particular, regularly observed deposits consisting of two animals in antithetic crouched position, which are widely interpreted as a harnessed bovine team, seem to be characteristic of the time period for the GAC. These findings underline the extraordinary status enjoyed by domestic animals, which is often used to argue that the agricultural practices of the GAC were mainly based on cattle breeding."

"What were the reasons for the GAC’s integration with other local groups and its widespread expansion? One possibility could be the desire to access local raw materials, such as salt, amber, copper or flint. Or perhaps it was the complementary system of agriculture? The agricultural system in question permitted the opening up of previously unpopulated areas with less fertile soils. With the climatic decline, it offered the local cultural groups the acceptable alternative of subsistence agriculture, which then caused the further expansion of the GAC in those regions."

It seems that also wheel was present in Bohemia/Moravia at the same time:
The number of inhabitants started growing with the spread of new agricultural techniques c. 3500-2000 BCE (the wheel, the lister or sulky plough cattle breeding).
Central Europe in the High Middle Ages: Bohemia, Hungary and Poland, c.900–c p. 45

According to Wikipedia: "The first evidence of wheeled vehicles appears in the second half of the 4th millennium BCE, near-simultaneously in Mesopotamia (Sumerian civilization), the Northern Caucasus (Maykop culture) and Central Europe (Cucuteni-Trypillian culture), so the question of which culture originally invented the wheeled vehicle is still unsolved.

The earliest well-dated depiction of a wheeled vehicle (here a wagon — four wheels, two axles) is on the Bronocice pot, a c. 3500 – 3350 BCE clay pot excavated in a Funnelbeaker culture settlement in southern Poland."

"Cow" and "wagon" are maybe the two most important words reconstructed into the proto-IE, and they are attested early in this area.

Matt said...

@ Davidski: Thanks for these -

I see you've already noticed that:

D (Mbuti.DG Ami Iran_Neolithic Levant_Neolithic) = -0.0134 -3.216 19361 19888 394093

D (Mbuti.DG Munda Iran_Neolithic Levant_Neolithic) = -0.0185 -4.66 19200 19922 394093

implies a stronger connection to ENA and particularly Munda as ENA+South Indian for Iran_Neolithic than is present for Levant Neolithic.

The strongest stats for Iran_Neolithic vs Levant_Neolithic are:

D (Mbuti.DG Iran_Late_Neolithic Iran_Neolithic Levant_Neolithic) = -0.075 -11.972 10330 12005 220049

D (Mbuti.DG GujaratiC Iran_Neolithic Levant_Neolithic) = -0.0244 -6.694 19318 20283 394093

then the recent Mediterranean and Early European Farmers are at the other end.


D (Mbuti.DG Samara_Eneolithic Iran_Neolithic Levant_Neolithic) = -0.0022 -0.417 15359 15427 302806

D (Mbuti.DG Yamnaya_Samara Iran_Neolithic Levant_Neolithic) = -0.0083 -2.124 19878 20212 393394

Right direction, but not very significant or relatively weak, possibly because of the dominance of EHG ancestry.

Also interesting:

D (Mbuti.DG Masai_Kinyawa Levant_Neolithic Israel_Natufian) = 0.0018 0.577 9717 9682 238817
D (Mbuti.DG Somali Levant_Neolithic Israel_Natufian) = 0.0032 0.986 10119 10055 238817

So doesn't seem like there is significance to Levant_Neolithic being more related to modern East Africans than Natufians are.

Also on

D(Mbuti.DG Pop Levant_Neolithic Israel_Natufian) shows most extreme highest for EEF, not WHG, which suggests that Levant_Neolithic does not work as WHG+Israel_Natufian, and there is important extra shared drift beyond that mix that favours Anatolians.

Generally, the stats for D(Mbuti.DG Pop Ancient1 Ancient2) for The Four (Levant_Neolithic, Iran_Neolithic, EHG, WHG), plus also Anatolia_Neolithic, seem to show that drift sharing for moderns is greatest with Loschbour (even though this is small compared to the sharing of Loschbour with other members of the WHG clade).


D (Mbuti.DG Spanish Loschbour Levant_Neolithic) = -0.0404 -10.932 18815 20401 388452
D (Mbuti.DG Sardinian Loschbour Levant_Neolithic) = -0.0233 -6.238 19197 20112 388452

but then even

D (Mbuti.DG Spanish Loschbour Anatolia_Neolithic) = -0.0111 -3.467 24662 25217 501891

D (Mbuti.DG Sardinian Loschbour Anatolia_Neolithic) = 0.009 2.875 25221 24769 501891

Comparing D(Mbuti.DG Pop Anatolia_Neolithic Levant_Neolithic), the top populations, most related to Anatolia compared to Levant (judging by Z), were:

D (Mbuti.DG LBK_EN Levant_Neolithic Anatolia_Neolithic): 0.0351 14.885 23077 21510 453450
D (Mbuti.DG Lithuanian Levant_Neolithic Anatolia_Neolithic): 0.0333 14.902 22682 21220 454017
D (Mbuti.DG Sardinian Levant_Neolithic Anatolia_Neolithic): 0.0331 15.303 22814 21353 454017

Less significant for others.
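
A note for anyone re-checking these rows: the five numbers after each stat appear to be D, Z, two site-pattern counts and the number of SNPs. Reading the two counts as BABA and ABBA (my assumption about the column labels, based on the usual qpDstat output layout), D is (BABA - ABBA) / (BABA + ABBA), and that arithmetic does reproduce the quoted values. Below is a small Python sketch, with hypothetical helper names, that parses a row in either the "=" or ":" form used above, recomputes D from the counts, and backs out the implied standard error as D / Z.

```python
import re

# Matches rows such as:
#   D (Mbuti.DG Ami Iran_Neolithic Levant_Neolithic) = -0.0134 -3.216 19361 19888 394093
# The separator after the closing bracket may be "=" or ":".
ROW = re.compile(
    r"D\s*\(([^)]*)\)\s*[=:]\s*"
    r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+(\d+)\s+(\d+)\s+(\d+)"
)


def parse_dstat(line):
    """Split one quoted row into populations, D, Z, the two counts and SNP total."""
    m = ROW.search(line)
    if m is None:
        raise ValueError("not a D-stat row: " + line)
    return {
        "pops": m.group(1).split(),
        "d": float(m.group(2)),
        "z": float(m.group(3)),
        "baba": int(m.group(4)),   # assumed column label
        "abba": int(m.group(5)),   # assumed column label
        "nsnps": int(m.group(6)),
    }


def d_from_counts(baba, abba):
    """D recomputed from the two site-pattern counts."""
    return (baba - abba) / (baba + abba)


def implied_se(d, z):
    """Standard error implied by Z = D / SE; None when Z is zero."""
    return None if z == 0 else d / z
```

For the Ami row, `d_from_counts(19361, 19888)` is -527 / 39249, which rounds to the quoted -0.0134. Because the printed D is rounded to a few decimals, the implied standard error is only approximate, especially for rows where D is close to zero.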

Kristiina said...

I correct myself again!

"The initial period of the Bronze Age is represented by Yamna cultural and historic community. Comparison of radiocarbon dates of the two main areas of this community, the western (territory of Ukraine) and the eastern (the Volga River and Ural regions), confirms the hypothesis about the eastern origin of Yamna culture. The western area of Yamna cultural and historic community covers the period from 3000 to 2300 BC, while the eastern one covers the period from 3500 to 2900 BC. The eastern origin and the further expansion to the west of the bearers of Yamna culture is also confirmed by the data on funeral customs and inventory."

Yamna Samara yDNA is mainly R1b-Z2103 and one R1b-L23. R1b-Z2103 is not typical for IE speakers.

On the basis of the current evidence, Globular Amphora Culture starts in the Kujawy region of Poland 3400 BC (with a difference of only 100 years to Yamna Samara), and c. 2900 BC it transforms into Corded Ware "in a number of "centers" which subsequently formed their own local networks" (Wikipedia).

According to Woidich, 2014, "The replacement of the Globular Amphora culture by another supraregional cultural complex—the Corded Ware culture—is already indicated by the trend towards cord decorations in its younger stage of development. The transition thereby did not occur abruptly but rather within in a gradual process. This process is reflected both in culturally mixed inventories and in transformation phenomena. In the second quarter of the 4th millennium the Corded Ware complex spreads successively into all regions occupied by the Globular Amphora culture. The Corded Ware culture might have benefited from the established large-scale communication network between the Dnieper region in the east and eastern Holstein in the west during its expansion."
It may well turn out that R1a1-M417 and R1b-L51 are found in the Globular Amphora culture.

The question is: can we find a culture from which Yamna Samara and Globular Amphora culture could be derived? We have Khvalynsk and Majkop but I do not know if they are/will be genetically or culturally a good fit. In any case, considering the probable Ural region origin of Yamna, it is not a surprise that Volga Uralics are genetically more Yamna than IE speakers.

I am sorry! I put my previous post in a wrong thread.

Simon_W said...

Very interesting those analyses run by rk.

I first tried to connect the different language families of the Caucasus with the different genetic influences. Northwest Caucasians (Adygei, Abkhasians) seem to be predominantly an equal mix of (or intermediate between) Anatolian and Armenian Chalcolithic. While South Caucasians (Georgians) have much more Anatolian than Armenian Chalcolithic. They also have a lot of Armenia_MLBA, but this they have in common with the Northeast Caucasian Lezgins who have even more of this. So I would tentatively associate Armenia_MLBA with Northeast Caucasian, even though the other Northeast Caucasians, the Chechens have only 3.1% Armenia_MLBA (elite dominance influence?). Armenians resemble South Caucasians in their predominance of Anatolian Chalcolithic over Armenian Chalcolithic, but they differ most of all by the strong Iranian Chalcolithic impact. They don't seem to have significantly more steppe ancestry than the Abkhasians and not more EEF ancestry than Georgians.

What I find striking is the apparent correlation between Armenia_MLBA and R1b! As per Eupedia Lezgins have 21.5% R1b, Chechens have 2%, and Georgians 10% (but with some strong local pockets where they even top the Lezgins in R1b frequency, according to a study I've seen). Judging from rk's analysis Lezgins have 38.45% Armenia_MLBA, Chechens 3.1% and Georgians 32.9%.
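
To put a number on that apparent correlation, here is a quick Pearson calculation in Python over the three populations quoted above (Lezgins, Chechens, Georgians). The function is a plain textbook implementation and the variable names are just for illustration. With only three data points the coefficient says very little on its own, so treat it as arithmetic on the quoted figures rather than as evidence.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Figures as quoted in the comment: Lezgins, Chechens, Georgians.
armenia_mlba_pct = [38.45, 3.1, 32.9]
r1b_pct = [21.5, 2.0, 10.0]

r = pearson(armenia_mlba_pct, r1b_pct)
print(round(r, 2))
```

The result comes out close to 0.9, but the Georgian point (high Armenia_MLBA, middling R1b) shows how much a single population can move it.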

This gets even more fascinating when taking the German Bell Beakers into consideration. They seem to have 8.9% Armenia_MLBA that was neither in Yamnaya, nor in Iberia_Chalcolithic, nor in Hungary_BA. That's amazing! And 8.9% is surely too much to be a fluke or a result of deamination. So where did this come from? My guess: From Iberia, where it must have arrived shortly before.

Simon_W said...

I'm not sure what to think of the linguistic hypotheses trying to link Basque with Northeast Caucasian. But prima facie there seems to be some intriguing evidence, check this out:

Especially the comparison with the mystery languages is interesting:

Of course if there is a relationship it could hardly go back to the time of the Armenian_MLBA, that would seem much too recent.