Sunday, March 5, 2017

Scythians and Sarmatians in the Global 10

The Global 10 datasheet now includes the new Scythian and Sarmatian samples from Unterländer et al. 2017. They're freely available at the Reich lab datasets page. Here they are on the Global 10 genetic map.

This may have been pointed out in the paper, but what I find intriguing is that the Scythians from the Zevakino-Chilikta group look somewhat different from the rest, because instead of falling on the Europe-Siberia cline, they fall on the Europe-Central Asia cline. Not sure what that's about yet; might be worth investigating.

Nirjhar007 said...


The Scythian paper showed that Scythians cannot be modeled as LBK + Yamnaya, but as "Yamnaya" + East Asian.
If so, it means Z93 was already present further southeast.

Davidski said...

It might mean that Z93 moved out into Asia before the ancestors of Andronovo, Potapovka, Sintashta and Srubnya acquired EEF.

But those eastern Scythians are Z93(Z2124+), same as Sintashta, which suggests that either there were Sintashta groups identical to Yamnaya, like the early Baltic Corded Ware, or Z2124 was acquired by the eastern Scythians, who may have been mostly of Afanasievo origin, via a founder effect and little autosomal admixture.

Davidski said...

But I haven't tried modeling the eastern Scythians yet.

Nirjhar007 said...

Will be fascinating to see how Indians do, among others of course. I sense Z2124 is the Northern Branch of Z-94. Further aDNA should clear it up.

Nirjhar007 said...

But it's a shame that the Tarim genome is not available.

Ryukendo K said...

After running the Scythian genomes, I simply do not think the authors' conclusions are correct. I think the authors were misled by the overall autosomal similarity to Yamnaya and Afanasievo caused by disproportionate EHG/ANE ancestry in the Scythians, plus some Middle Eastern/W Asian/SC Asian ancestry, and a slice of 'old' MA-1 or outright ASI, in the Scythian samples, combining to give a very West Asian look to the genome. In run after run, the Scythians' IE steppe ancestry breaks down to Andronovo, Sintashta and Srubnaya, and Afanasievo and Yamnaya simply never appear despite always being present among the sources.

Using the following list as sources, with sequential exclusion:


Ryukendo K said...

The western Scythian immediately produces a very good model, so I did not go further:

[1] "distance%=0.108 / distance=0.00108"


Srubnaya 36.10
Samara_Eneolithic 23.25
Armenia_EBA 16.35
Ket 5.85
Mongola 5.20
Armenia_Chalcolithic 3.25
Karelia_HG 3.00
Iran_Hotu 2.50
Hungary_N 2.30
Chamar 1.00
Nganasan 0.65
Koryak 0.50
Poltavka_outlier 0.05
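
For readers unfamiliar with this kind of output: the percentages are mixture proportions over the source populations, and "distance" is the Euclidean distance between the target's PCA coordinates and the weighted average of the sources (distance% is just that value times 100). The sketch below shows the general idea with a simple random-swap search. It is only an illustration, not the actual nMonte script; the function names, the toy coordinates and the source labels SrcA/SrcB/SrcC are all made up here.

```python
import math
import random

def mix_point(sources, counts, total):
    """Weighted average of source coordinates, weights = counts / total."""
    dims = len(next(iter(sources.values())))
    point = [0.0] * dims
    for name, n in counts.items():
        for d in range(dims):
            point[d] += sources[name][d] * n / total
    return point

def distance(p, q):
    """Plain Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def fit_mixture(target, sources, slots=100, iterations=20000, seed=0):
    """Assumed scheme: fill `slots` with sources, then keep random swaps that shrink the distance."""
    rng = random.Random(seed)
    names = sorted(sources)
    assignment = [rng.choice(names) for _ in range(slots)]
    counts = {n: assignment.count(n) for n in names}
    best = distance(mix_point(sources, counts, slots), target)
    for _ in range(iterations):
        i = rng.randrange(slots)
        old, new = assignment[i], rng.choice(names)
        if new == old:
            continue
        counts[old] -= 1
        counts[new] += 1
        d = distance(mix_point(sources, counts, slots), target)
        if d < best:
            best = d
            assignment[i] = new
        else:
            # Undo the swap, it did not improve the fit.
            counts[old] += 1
            counts[new] -= 1
    return {n: counts[n] / slots for n in names}, best

# Toy 2-D example (hypothetical coordinates): the target is built as
# 50% SrcA + 30% SrcB + 20% SrcC, so a good fit should recover roughly that.
sources = {"SrcA": (0.0, 0.0), "SrcB": (1.0, 0.0), "SrcC": (0.5, 0.9)}
target = (0.40, 0.18)
fit, dist = fit_mixture(target, sources)
print(fit, "distance%%=%.4f / distance=%.6f" % (dist * 100, dist))
```

With real data the coordinates would be the Global 10 rows, and a low distance only says that the mixture lands near the target in PCA space, not that the sources are the true ancestors.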

Ryukendo K said...

The high levels of Samara Eneolithic are a common theme in all the Scythian samples; if I drop that source, Afontova Gora or Samara/Karelia_HG immediately appears. So is a fraction of Armenia_EBA, Armenia_MLBA or Iran Chalcolithic ancestry.

For Scythian AldyBel:
[1] "distance%=0.1645 / distance=0.001645"


Andronovo 21.45
Karelia_HG 21.45
Nganasan 16.20
Mongola 15.95
Armenia_EBA 11.20
Pima 7.20
Kotias 2.55
Chamar 2.20
Ket 1.35
Iran_Chalcolithic 0.40

No Pima:
[1] "distance%=0.1652 / distance=0.001652"


Andronovo 23.10
Karelia_HG 21.20
Mongola 18.15
Nganasan 17.00
Armenia_EBA 8.20
Ket 4.35
Kotias 4.30
Chamar 2.90
Iran_Chalcolithic 0.80

No Karelia_HG and Chamar:
[1] "distance%=0.2248 / distance=0.002248"


Andronovo 40.80
Ket 16.55
Mongola 14.45
Koryak 13.15
Samara_Eneolithic 6.40
Kotias 4.45
MA1 2.00
Paniya 1.70
Nganasan 0.40
Iran_Late_Neolithic 0.10

Aram said...

The IA Hungary from Mezocsat was probably the first wave of Scythians in Europe with Hg N.

Ryukendo K said...

The last model for Aldy Bel has a good fit with the history and archaeology I would think.

For the Pazyryk:
[1] "distance%=0.2756 / distance=0.002756"


Ket 30.75
Ulchi 22.25
Mongola 14.70
Karelia_HG 12.50
Iran_Hotu 10.35
Hungary_N 7.65
Pima 1.65
Iran_Chalcolithic 0.05
Saami 0.05
Samara_HG 0.05

Pushing out all the 'old' HG samples like Iran_Hotu and Eur HGs:
[1] "distance%=0.3099 / distance=0.003099"


Ket 32.85
Ulchi 22.75
Sintashta 16.40
Mongola 14.30
Armenia_MLBA 4.40
AfontovaGora3 4.20
Iran_Chalcolithic 2.85
Chamar 2.25

[1] "distance%=0.3017 / distance=0.003017"


Ulchi 29.10
Ket 28.45
Samara_Eneolithic 12.05
Mongola 10.65
Sintashta 6.30
Iran_Chalcolithic 6.25
Hungary_N 2.80
Chamar 2.20
Samara_HG 1.75
Armenia_MLBA 0.45

Chad Rohlfsen said...

Add Afanasievo. I'm checking all angles too.

Chad Rohlfsen said...

Afanasievo plus Ket and Nganasan would make more sense in the East. Add that to Andronovo.

Ryukendo K said...

The Central_Asian shifted sample is very interesting:
[1] "distance%=0.114 / distance=0.00114"


Ulchi 21.85
AfontovaGora3 18.15
Armenia_EBA 15.65
Srubnaya 11.55
Hungary_N 7.80
Mongola 7.25
Poltavka_outlier 5.80
Daur 3.85
Chamar 3.15
Ket 3.05
Sintashta 1.85
Karelia_HG 0.05

Pushing out the 'old' stuff again:
[1] "distance%=0.1833 / distance=0.001833"


Ulchi 32.85
Srubnaya 22.40
Samara_HG 11.40
Armenia_MLBA 6.90
Kotias 6.25
Chamar 5.35
Hungary_N 4.00
Iran_Late_Neolithic 4.00
Ket 3.05
Sintashta 2.95
Daur 0.85

Now running it with Karasuk and Okunevo. The results are quite pleasing:
[1] "distance%=0.1578 / distance=0.001578"


Ulchi 29.20
Armenia_EBA 21.30
Samara_HG 15.25
Okunevo 10.55
Srubnaya 7.90
Sintashta 7.15
Chamar 4.15
Daur 2.10
Hungary_N 1.90
Paniya 0.50

[1] "distance%=0.1704 / distance=0.001704"


Ulchi 30.35
Srubnaya 16.70
Armenia_EBA 15.45
Okunevo 12.35
Sintashta 11.50
Karelia_HG 6.65
Paniya 2.95
Iran_Late_Neolithic 2.10
Chamar 1.20
Kotias 0.55
Ket 0.10
MA1 0.10

[1] "distance%=0.2756 / distance=0.002756"


Ket 30.75
Ulchi 22.25
Mongola 14.70
Karelia_HG 12.50
Iran_Hotu 10.35
Hungary_N 7.65
Pima 1.65
Iran_Chalcolithic 0.05
Saami 0.05
Samara_HG 0.05

Ryukendo K said...

So, common themes: even though Afanasievo and Yamnaya were present in all runs, they were not represented in any of the Scythians. The Scythians show a pronounced tendency to favour one of five groups:

1) Sintashta, Srubnaya, Andronovo, Poltavka Outlier
2) Afontova Gora, Karelia/Samara_HG, Samara Eneolithic, Amerinds, Koryak
3) Ulchi, Ket, Nganasan, some get more Southerly populations like Daur and Tu (Mongolics at the Steppe-Sown frontier in China) but never Korean, She, Han_NChina etc.
4) Armenia_EBA, Iran_Chl, Iran_LN, Armenia_LNBA
5) Chamar and Paniya ~5% or ~.5% respectively.

I think the combination of 1, 2, 4 and 5 creates an overall profile that strongly resembles Yamnaya due to a strong 'Eastern' and West Asian shift, but does not in fact reflect reality. David, the only way to clearly solve this is with rare allele analysis and IBD-based methods; the former can be used on low quality genomes too. Could you try this?


Samuel Andrews said...

The results I'm getting for the Scythians are: Siberia+Andronovo/Sintashta+CHG/Iran Neo. Typical West and SC Asian mtDNAs (U7, U1 and HV2) are documented in Scythians.

Ryukendo K said...

@ Chad

Afanasievo is present in all the runs, but appears in none of the scythians.

I'm beginning to suspect that since these are complex 5-way mixes at high dimensions (PCA reveals they need a small % of ME and SC Asian ancestry to adjust their positions, etc), formal stats, which tend to be lower on resolution, may not work as well, and high resolution methods like IBD and rare alleles may work better...

Chad Rohlfsen said...

Oh snap!!!

left pops:

right pops:

best coefficients: 0.476 0.161 0.246 0.030 0.087

Jackknife mean: 0.469910717 0.163594517 0.245377933 0.029827001 0.091289831
std. errors: 0.150 0.130 0.029 0.020 0.123

error covariance (* 1000000)
22641 -13076 1851 -219 -11198
-13076 16797 -1751 689 -2658
1851 -1751 853 -286 -666
-219 689 -286 392 -576
-11198 -2658 -666 -576 15098

fixed pat  wt  dof   chisq  tail prob  coefficients
00000       0   13   9.668  0.720792    0.476  0.161  0.246  0.030  0.087
00001       1   14  10.268  0.742315    0.538  0.179  0.250  0.033  0.000
00010       1   14  11.924  0.612423    0.497  0.111  0.268  0.000  0.124
01000       1   14  11.346  0.658647    0.604  0.000  0.263  0.023  0.110
10000       1   14  22.100  0.0765857  -0.000  0.423  0.209  0.031  0.336
00011       2   15  13.127  0.592451    0.596  0.126  0.278 -0.000  0.000
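
A quick way to read the table above: the "tail prob" column is the upper-tail probability of a chi-square distribution with "dof" degrees of freedom at the given "chisq". The sketch below reproduces a few of the reported values in plain Python. The function name is made up for this illustration, and the nested-model comparison at the end (treating the increase in chisq from fixing one source at zero as chi-square with 1 degree of freedom) is a common rule of thumb assumed here, not something stated in the output itself.

```python
import math

def chi2_tail_prob(chisq, dof):
    """Upper tail of the chi-square distribution, via the series for the
    regularized lower incomplete gamma function P(dof/2, chisq/2)."""
    a = dof / 2.0
    x = chisq / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total) and n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
    p_lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return 1.0 - p_lower

# (pattern, dof, chisq, reported tail prob) copied from the table above.
rows = [
    ("00000", 13, 9.668, 0.720792),
    ("00001", 14, 10.268, 0.742315),
    ("10000", 14, 22.100, 0.0765857),
]
for pattern, dof, chisq, reported in rows:
    print(pattern, round(chi2_tail_prob(chisq, dof), 6), "reported:", reported)

# Assumed nested comparison: how much worse is the fit with one source dropped?
p_drop_fifth = chi2_tail_prob(10.268 - 9.668, 14 - 13)   # pattern 00001 vs 00000
p_drop_first = chi2_tail_prob(22.100 - 9.668, 14 - 13)   # pattern 10000 vs 00000
print("drop fifth source:", p_drop_fifth)
print("drop first source:", p_drop_first)
```

Under that assumption, dropping the fifth source barely changes the fit, while dropping the first source makes it clearly worse.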

Chad Rohlfsen said...

Second line down looks like it might be the winner. I'll have to do more tomorrow. I have to be up in 5 hours for work.

Davidski said...

I'm about to get stuck into this. I'll post some output in a couple of hours.

Ryukendo K said...

Nice Chad!

Gioiello said...

@ Nirjhar007
"Will be fascinating to see how Indians do, among others of course. I sense Z2124 is the Northern Branch of Z-94 . Further aDNA should clear it up".

Read again how I explained that the IE word *swesor entered Proto-Uralic, and everything will be clear: IE from Samara went eastward as far as Central Asia and mixed with Uralics and Altaians, afterwards they came back southward to India, and the Scythians are a later people of the Russian steppes.

Seinundzeit said...

For what it's worth, I've been looking into how Central and Southern Asians stack-up, and it seems that I've verified a notion I've held for quite some time; mainly, Indo-Aryans derive their steppe ancestry from Yamnaya-related populations, while South Central Asian peoples (who, with the exception of small Dardic and Nuristani populations, are all Iranian speakers) have an affinity towards Iranian steppe populations from the historical era (Scythians).

A few quick examples should suffice.

This is just my standard setup, but with all of these new samples thrown into the mix.



42.80% Iran_Neolithic + 11.05% AG3
28.40% ASI
17.75% Yamnaya_Samara



48.3% Iran_Neolithic + 4.6% AG3
35.7% Yamnaya_Samara
11.4% ASI



Pakistani Pashtun, Waziristan

33.05% Iran_Neolithic + 12.60% Iran_Chalcolithic
29.50% Sarmatian_Pokrovka + 15.90% Yamnaya_Samara + 1.95% Scythian_Zevakino-Chilikta
7.00% ASI



28.15% Iran_Neolithic + 19.20% Iran_Chalcolithic
21.90% Yamnaya_Samara + 12.40% Sarmatian_Pokrovka + 9.25% Scythian_AldyBel
9.10% ASI



48.85% Iran_Neolithic + 2.65% Iran_Chalcolithic
29.80% Yamnaya_Samara + 8.00% Scythian_Pazyryk
10.70% ASI


In terms of recent ancestry, Pashtuns like myself should be mixtures between people like the Waziristani Pashtuns/Afghan Pashtun highlanders and South Central Asian Dardic highlanders like the Kalasha. So, the greater importance of the older "Aryan" Yamnaya-related element (in my case) makes sense.

So, I think Indian L657 will be found in Yamnaya-like populations, while Pashtun/Tajik Z2124 (which also happens to be my subclade) simply descends from Iranic steppe peoples.

Amazing times we live in.

Matt said...

I'm a little unsure about using a wide set of recent Siberian references to fit these samples (Ket / Mongola), as they're probably a bit admixed.

I'd also say that these are extremely dispersed and, I think, difficult to treat as population averages rather than sample-by-sample. Except maybe the two Sarmatians.

Here's a couple of visualisations of the PC2 (main East-West Eurasian PC) vs other PCs for these samples plus averages of other groups: and

Re: Ze6b from ZevakinoChilikta, it looks kinda Central Asian (not Steppe-Siberian?) in PC2 vs PC1 and to an extent PC2 vs PC4, but then no more South Asian at all than the others in PC2 vs PC3 (Oceanian vs North Asian cline), PC6 (Iran_Neolithic vs EuroHG), PC7, PC9 and PC10 (South Asia specific against Kotias / EEF).

May have a look at models later.

Shaikorth said...


Tu, Daur and Mongola are probably the least steppe/EHG/Beringian admixed among the relevant groups in the region, as suggested by Lazaridis 2016 fits, this SpaceMix plot has similar implications:

Arza said...


Scythian_ZevakinoChilikta:IS2 62.2
Daur 37.8
distance%=0.823 / distance=0.00823

Seinundzeit said...

Oh, also, I'll have to echo Matt here; it seems that these people were extremely heterogeneous.

South Central Asians prefer the least genetically East Eurasian samples (with exception to myself, I always gravitate towards Siberian-rich samples), so just keep that in mind, when looking at the labels in my modelling.

Coldmountains said...


Modern DNA of Indo-Iranians will not tell us much about the exact genetic profile of early Indo-Aryans and Iranics. South Asians look closer to Yamnaya because they have much less EEF than Iranics, and maybe they have some extra EHG from pre-Indo-Europeans. There is probably significant direct Yamnaya/Afanasievo ancestry among Central Asian Turkics and some Tajiks. Here Yamnaya/Afanasievo predated Indo-Iranics and R1b is still frequent. Pashtuns, Kalash and especially South Asians probably have not so much of it, but we don't know with whom Indo-Aryans mixed before they arrived in South Asia. Indo-Aryans in Central Asia were likely steppe/BMAC/North Central Asian (EHG/Yamnaya/Siberian) hybrids before they arrived in South Asia.

The earliest Indo-Iranian samples from Sintashta, Potapovka or Poltavka_outlier were the least Yamnaya-like, and L657/M780 expanded earlier than Z2124, so I actually think that if we find very early L657/M780 in the steppe it will be even less Yamnaya-like than Sintashta, because this archaic L657 had less time to mix with local Afanasievo/Yamnaya and EHG people.

Davidski said...

Potapovka is more Yamnaya-like than Sintashta, and one of the Potapovka samples clusters with Yamnaya.

So there's no reason at this stage why L657 won't be found on the steppe in samples like this Potapovka outlier, and in fact also like the early Baltic Corded Ware with Z645.

Nirjhar007 said...

I think L-657 is exclusively Indian; it also has the most SNP variation there. Along with this, Z2124 is also present to a moderate degree.

Uh, can anyone give any data on L-657+ outside the subcontinent here?

I will not be surprised though, if it turns up in BMAC .

Seinundzeit said...


That model is pretty cool, but since Afanasievo is basically the same as Yamnaya_Samara, might it not be wiser to try that same sort of modelling, but without Yamnaya_Samara in the right pops? Everything else unaltered, if you think its worthwhile.

Thanks in advance.


Even once you account for extra ANE (over what Iran_Neolithic had), or just use Iran_Hotu, Indo-Aryan speakers just can't be modeled with Sintashta/Andronovo/Srubnaya, they only take Yamnaya/Afanasievo.

This can actually be seen in my models above.

UP Brahmins are 11.05% ANE + Iran_Neolithic, while Kalash are 4.6% ANE + Iran_Neolithic, yet they still only receive substantial Yamnaya-related admixture, nothing from the more West Eurasian of the Scythians/Sarmatians.

By contrast, all Pashtuns show a mix of Yamnaya and Scythian/Sarmatian, as do all Pamiri people, not to mention western Iranians.

It's something to chew over, but we'll only know for sure, once we have "Aryan" aDNA samples, perhaps from those Swat valley skeletons.

I'll bet they'll be rather similar to the Kalash, but with even more Yamnaya-related admixture.

Again though, who really knows. It's a matter of getting more aDNA.

Coldmountains said...

The earliest Z93 found yet was from the Poltavka_outlier, who was already EEF-shifted: around 66% Yamnaya-like and 33% EEF-like. He probably lived not long after Z93 and Z94 were born, so I expect the same among the earliest L657/M780. But on the steppe they could quickly change their autosomal DNA, so yeah, maybe they became very Yamnaya-like at some point in history.

Coldmountains said...

Maybe L657 is too young to be found in the steppe but M780 which is ancestral to L657 will be found at the steppe and there is actually one Ukrainian cossack who belongs to it so it is still present in East Europe

Coldmountains said...


Do you know how close Pashtun Z2124 is to the Scythian Z2124 found so far? Most Pashtun Z2124 is quite homogeneous and could be a founder effect from historical Saka, Hephthalites or Kushans. Scythian or Scythian-like ancestry makes sense among Pashtuns and is expected. I guess the Saka groups of South Central Asia, which have not been tested yet, had a significant genetic impact on South Central Asia among Tajiks and Pashtuns. I am more sceptical about Scythian/Saka ancestry among western Iranics like Kurds, who are rather descendants of early West Iranics which resembled Scythians but arrived much earlier.

Nirjhar007 said...

Nah M780 is also very Indian .

Gioiello said...

@ Nirjhar007
"Nah M780 is also very Indian"

Of course, from 4200 years ago, but find a founder sample from India upstream:
R-Z93 Z2479/M746/S4582/V3664 * Z93/F992/S202: formed 5000 ybp, TMRCA 4700 ybp (TMRCA CI 95% 5400-4000 ybp)
⦁ id:ERS256938ITA [IT-CA]
⦁ id:YF07986ITA [IT-SA]
⦁ id:YF03565
⦁ id:YF01991RUS

Nirjhar007 said...

And also surely they didn't come from Italia. Unless brought by Romaka people here ;).

Davidski said...

If you look hard enough you'll find markers upstream of Z93 in India. Same as in most places where R1a is very common, or even not that common, like Italy.

The trick is to find them in ancient DNA.

Gioiello said...

I didn't say that R1a-Z93 was born in Italy, but perhaps that it is old there and did not come recently. I think that R1a-M420* might be in the Italian Refugium, and we'll see if it will be found or not. Certainly R1a arrived in India recently from the Russian steppes, with IE languages.

Nirjhar007 said...

Yes, my dear Slav R1a brother ;) , that's the ultimate trick !...

Even Italian poetical dna analysis can't beat that!.

MfA said...

BTW There's a Saudi gentleman with Kurdi surname originally from what is now Turkey, belongs to Z2124* (Z2122-, Z2125-) YF08307.

Gioiello said...

"Yes, my dear Slav R1a brother ;) , that's the ultimate trick !...

Even Italian poetical dna analysis can't beat that!.
LOL... "

I am saying that from the beginning of my analysis. I don't agree with Davidski that R1b came from Samara to Western Europe and also the centum languages, but that happened the other way around, being the satem languages more recent than the centum ones, a mutation that didn't affect the "aree laterali", older.

Gioiello said...

@ MfA
"BTW There's a Saudi gentleman with Kurdi surname originally from what is now Turkey, belongs to Z2124* (Z2122-, Z2125-) YF08307".

Of course, Middle East has also the oldest subclades of R1a with Western Europe, but, as Davidski said, we'll see where they will be found in the aDNA.

Nirjhar007 said...

Dear Ratna ,

You were victorious about Mesolithic R1b In Italy . I always thought to ask , did you get the confirmation, that the R1b found is related to modern Europeans or is a dead end?.

Seinundzeit said...


Truth be told, I've never undertaken a proper exploration of Y-DNA data. So honestly, I know virtually nothing, with regard to how Scythian Z2124 relates to Pashtun Z2124.

Although, as you mentioned, I do know that Pashtun Z2124 is very homogeneous (we're all just "brothers from different mothers", lol).

It would be really nice, if someone more knowledgeable concerning Y-DNA could chime in.

I absolutely agree though, the notion of Pashtun linkages with Scythians/Hepthalites/Kushans/etc is very old.

In fact, it's been the standard opinion held by those handful of western scholars who've spilled ink on our historical roots.

Regardless, I don't think it's ever really been a question of debate that we're descended from a confederation of East Iranic tribes, because we still are a confederation of East Iranic tribes!

Anyway, I'll examine the West Iranian angle in more detail, and post what I find.

Although, I've tried the Mazandarani samples already, and they show a mix of Yamnaya + Scythian + Sarmatian.

Nirjhar007 said...

MfA ,

What is the M780, L-657 status among Kurds and other surrounding folks?

Chad Rohlfsen said...


Just keeping it historically accurate. Yamnaya is needed in the outgroups to bring down the standard errors. I can make South Asians with Sintashta, but each group can be quite different and a pain in the ass. There's definitely extra ANE and ENA giving a Yamnaya impression.

Nirjhar007 said...


Pashtuns are likely mentioned in the Rigveda as the Pakth people, so they are an older, more archaic people than the Scythians. I think I told you this before :).

MfA said...


I'm not saying Z2124 came from West Asia. It might be related to West Iranics, Mitanni, Cimmerians or other steppe groups settled specific to Kurdistan.

Gioiello said...

@ Nirjhar007
"Dear Ratna ,

You were victorious about Mesolithic R1b In Italy . I always thought to ask , did you get the confirmation, that the R1b found is related to modern Europeans or is a dead end?".

I'm sorry that you didn't read my posts in the past, but I think having explained all that many times.

Nirjhar007 said...

Please explain once more , as I will keep the note this time :) .

Gioiello said...

@ Mfa

Of course, it is possible.

MfA said...


L657 is extremely rare in Kurds. There's so far only 1 L657 case found in Kurds (IR5_20, M576 from Underhill et al.) who's from what's now Iran. It should be around a few percent in Iran I think. I'm not well versed about the clade.

Nirjhar007 said...

Thanks bro.

Gioiello said...

In my bad English: I don't think that Villabruna is our ancestor, but very likely he belonged to a tribe of hunter-gatherers of the Italian Refugium where there were many linked R1b only one of them survived as R-P297* and is our ancestor. Only stupid people may think that they test only one Y aDNA in Italy and find the only one R1b1 living there. Ask why these PhD-s test hundreds of aDNA elsewhere and only 5 in Italy...

Nirjhar007 said...

I see . Okay I have saved your opinion :) .

Gioiello said...

@ Nirjhar007
"I see . Okay I have saved your opinion :) ".
I thank you, but add what I replied to Ted Kandell, who is a clever person, but he has some problem in understanding too:

Evidence? Everyone has understood that Samara was composed of hg. R1a and a little of R1b-L23. That they migrated to Baltic carrying the Balto-Slav languages (no R1b has been found amongst them) and migrated to Andronovo and Sintashta as Indo-Iranian and gave birth to Scythians of Iranian languages. The tiny R1b subclade, only belonging to the R-L23-Z2105 subclade with perhaps some extinct line, was in those migrations (above all carrying hg. R1a) till Mongol/Chinese/Turk people and after also to the Indian subcontinent where a few of those haplotypes may have survived. But from these samples survived in Eastern Europe, Caucasus, Middle East only a few subclades different from the Western European ones which anyway didn't derive from them. I have explained in my previous letters which subclades may have been derived from these haplotypes there and which not. Evidence? You lack:
R-V88 and all subclades (not older in Africa and Middle East than 5000 years)
R-L389+ (except the haplotype with YCAII=23-23 found in Armenia, whereas Italy has all the 4 hts known so far)
R-Z2109-Y4512 only in Western Europe
R-Z2110 and subclades found in Western Europe and back migrated Eastward as CTS9219
R-P312 and all the Western European subclades... Evidence?

Seinundzeit said...


I do agree with you, there is definitely extra ANE.

For example, I have the UP_Brahmins at 11% ANE, in addition to their Iranian_Neolithic-related percentages.

On top of that, I have the UP_Brahmins at almost 30% ASI, so obviously there is a lot of ENA in their case.

But despite the extra ANE + ENA combo, no South Asian population prefers Steppe_MLBA, they always choose Steppe_EMBA, despite accounting for extra ANE and ENA ancestry, and despite giving them a choice between the Steppe_MLBA and Steppe_EMBA samples.

The Indo-Aryans of South Central Asia (the Kalasha) have the same preference as Indians (despite having way, way less ASI compared to Indians, and despite having less of a preference for pure ANE), while Iranian South Central Asians prefer Scythians (the ancient Iranians of the steppe).

So, the differences show a very striking/clean correlation between Iranian vs Indo-Aryan language and Scythian vs Yamnaya affinity, and don't correlate with West Eurasian/ENA levels.

We'll figure it all out, once we see the requisite aDNA.

And, I actually like the model you've tried (it's very interesting), but my impression has been that having virtually identical populations in both your left and right pops is somewhat problematic?

I mean, Yamnaya_Samara and Afanasievo are basically identical when it comes to autosomal ancestry, and we now know that they even have the same Y-DNA.

Nirjhar007 said...

Thanks Ratna .

EastPole said...

Journal of Indo-European Studies Vol. 33: 3-4, p. 339

The Yamna Culture and the Indo-European Homeland Problem

D. Ya. Telegin

“Excavations between the rivers Orel' and Samara have uncovered burials of a syncretic nature that attest contacts between the spheres of the Corded Ware and Yamna cultures. It is suggested that these may indicate early contacts between proto-Indo-Iranians and the prehistoric ancestors of the Balts and Slavs.”

So maybe Yamna culture was pre-Indo-Iranian and proto-Indo-Iranians originated there after mixing with Corded Ware Balto-Slavs who introduced some R1a-Z645 there.

Davidski said...

There might well be something in this claim about the supposed Yamnaya-like effect in formal stats for the Scythians and also South Asians.

It's at least certain that the Scythians aren't a straight two-way mix between Yamnaya and East Asians, because as Samuel points out, they have West/Central Asian mtDNA hgs like U7, U1 and HV2 that Yamnaya and other European steppe groups lack.

But keep in mind that we can argue about this for months on end, and then a couple of key ancient genomes might totally contradict all our conclusions.

I've done some tests, and the results are very interesting, but I'm going to mail off the output to Iosif for now, instead of making any bold statements on the topic here.

Nirjhar007 said...

Interesting in what sense?

Gioiello said...

@ EastPole

"So maybe Yamna culture was pre-Indo-Iranian and proto-Indo-Iranians originated there after mixing with Corded Ware Balto-Slavs who introduced some R1a-Z645 there".

I thought that happened the other way around: that people from Yamnaya who migrated to the Baltic Sea brought the satem Indo-European which became Balto-Slavic, whereas those who migrated eastward became the Indo-Iranians, but I should study genetics and languages together.

Samuel Andrews said...

How trustworthy is the analysis which gave West Asians(Iran, Iraq, Caucasus) the most Western Scythian ancestry among modern people? If it is trustworthy couldn't proto-Iranian speakers be responsible of such ancestry not Scythians?

Davidski said...

Interesting in what sense?

It does seem like if excess ANE is not accounted for in the eastern Scythians, then they prefer Yamnaya instead of Sintashta. But when it is, by adding AG3 to the models, then Sintashta does pretty well. In fact, minor Esperstedt_MN also works, in tandem with Yamnaya.

Esperstedt_MN 0.062±0.034
Nganasan 0.636±0.020
Yamnaya_Samara 0.301±0.045
chisq 4.808 tail_prob 0.683415
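
One simple way to read the "coefficient±standard error" lines above is to divide each coefficient by its standard error. The sketch below does that for the three sources in this model; the helper name and the cut-off of 2 standard errors are assumptions made for this illustration, not part of the output.

```python
def z_scores(estimates):
    """Map each source to coefficient / standard error."""
    return {name: coef / se for name, (coef, se) in estimates.items()}

# (coefficient, standard error) pairs copied from the model above.
model = {
    "Esperstedt_MN": (0.062, 0.034),
    "Nganasan": (0.636, 0.020),
    "Yamnaya_Samara": (0.301, 0.045),
}
z = z_scores(model)

# Assumed rule of thumb: a coefficient within 2 standard errors of zero
# is not clearly distinguishable from zero.
weak = [name for name, value in z.items() if abs(value) < 2.0]
for name, value in z.items():
    print(name, round(value, 2))
print("within 2 SE of zero:", weak)
```

On that reading the Esperstedt_MN share is small and only weakly supported, which fits the description of it as "minor", while the other two sources are far from zero.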

Nirjhar007 said...

Okay. But when will you also publish your tests on modern pops? :)

Davidski said...

Later this week probably.

Nirjhar007 said...

Okay, can't wait :) .

Seinundzeit said...


Also, I should be more clear on this: I actually do not disagree with your analysis.

On the contrary, I completely agree with you, these people were not simple mixtures of Yamnaya and ENA.

For example, looking at the Sarmatian sample, this is what I find:

46.0% Srubnaya
32.3% Yamnaya
9.8% Okunevo
7.1% Iran_Chalcolithic
4.8% Mongola


The other samples are even more interesting/unique.

Rather, what interests me is the relationship these samples have to Central and South Asians.

What I've found is that Iranian South Central Asians prefer these Scythians, while Indo-Aryan South Central Asians and Indians prefer Steppe_EMBA. And, this is after extra ANE and ENA is accounted for.

Also, the same occurs when using Steppe_MLBA, instead of the Scythians.

I'm just struck by the linguistic correlation, although there could be confounding factors.

Chad Rohlfsen said...

I can use Sintashta for everyone. It just depends on finding the rest of what's not related to them, Iran_N, ANE, ENA, and ASI.

I'll have another whack at them after I get home. I think I can get the chi square and probability much, much better.

Alberto said...

I'm seeing the same: a good amount of Andronovo, a smaller amount of Yamnaya and a smaller one of Iran_ChL. And then Siberian admixture (some 8-10% in western Scythians, closer to 50% in eastern ones). They're rather diverse, so to make a short version I had to make choices, but the fits remained very close, so I think it makes sense like this:

(Ah, this is using weighted values, so it might not be exactly reproducible with unmodified Global 10 datasheet).

By the euclidean distance (again, weighted values), the closest modern populations to Sarmatians:

Tatar Mishar
Tatar Kryashen
Tajik Shugnan
Tajik Yagnobi
Russian Kargopol
Tajik Ishkashim
Russian Kostroma
Tatar Lipka

The closest to Scythian_Pazyryk:

Yukagir Forest
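
For anyone who wants to try a similar ranking, here is a minimal sketch of a weighted Euclidean nearest-population lookup. The coordinates, population labels and weights are toy values invented for the example, and the idea that "weighted" means a per-dimension weight inside the distance is an assumption; the weighting used for the lists above is not described in the comment.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a weight applied to each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def closest(target, populations, weights, k=3):
    """Return the k population names nearest to the target."""
    return sorted(
        populations,
        key=lambda name: weighted_distance(target, populations[name], weights),
    )[:k]

# Hypothetical 2-D coordinates standing in for PCA dimensions.
target = (0.0, 0.0)
populations = {
    "PopA": (0.1, 0.0),
    "PopB": (0.0, 0.3),
    "PopC": (0.5, 0.5),
}

unweighted = closest(target, populations, weights=(1.0, 1.0))
weighted = closest(target, populations, weights=(1.0, 0.1))
print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

In this toy case PopA is nearest without weights, but PopB becomes nearest once the second dimension is down-weighted, which is why weighted and unweighted rankings are not directly comparable.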

For the king said...

Cool stuff. What do they get in Basal Eurasia K7?

For the king said...

@Alberto Any idea why do they prefer Iran ChL over Iran neolithic or CHG? IS2 is quite admixed with Iran ChL(20%). He/she also scores around 25% East Eurasian. I wonder if they had higher Iran ChL before mixing with east Eurasians.

Alberto said...

@For the king

The preference was not unanimous. They are diverse (not sure about their coverage/quality yet), so the preference was a bit divided between Iran_ChL, Armenia_EBA and Iran_Neolithic (hardly any Kotias). In the end I opted to just use Iran_ChL since the differences were quite small.

Initially they also didn't pick Andronovo. Instead it was more Karelia_HG + Anatolia_Neolithic. So these models are more "supervised" than "unsupervised", just based on what seemed to make enough sense for all the samples after trying many different combinations.

Coldmountains said...


After doing some research I found out that most Pashtuns seem to be Z2124<Z2125<YP413<M12280. I guess you also belong to it. It is also found among some Indians, Arabs and Armenians. There is also one Bulgarian belonging to it. It has not been found yet in ancient DNA, but Z2125 was found in Sintashta, Andronovo and Karasuk. My feeling is that it is related to Hephthalites in some way.

Coldmountains said...


Yeah but i am talking about basal M780. The Ukrainian is M780+, M634- and L657- so likely he got it not recently from Romani which are L657+. Was there any M780+ L657- found in India?
Figure S36

Matt said...

@ Shaikorth: Tu, Daur and Mongola are probably the least steppe/EHG/Beringian admixed among the relevant groups in the region,

Hmm, well, the HGDP group designated Mongola tended to get around 5% in West Eurasian components in old school calculators like Dienekes Globe13; I haven't looked to see what it gets in other newer stuff. Tu (Monguor) are about 10% West Eurasian in classic Eurogenes K13 and again around 5% in Dienekes Globe13.

If I PCA a limited subset of the samples in Globe13, like so -

The Tu and Mongola samples seem similar in position on the North vs South East Asian line to other locals like Korean, Japanese, but offset slightly towards West Eurasia.

Not to any huge degree like Ket, true (based on the above PCA, about similar to comparing Thai vs Dai, not Dai vs Cambodian or Burmese vs Han or Malay vs Atayal), so I may have misspoken by putting them in the same breath as Ket. But I don't see any superiority to using them compared to just using Oroqen, Ulchi, Nganasan, Han, which are less displaced towards West Eurasia. Daur seems OK and a bit more useful in this context.

(It's also true the samples designated Mongola are usually much less admixed than the ones designated Mongolian used in some analyses. But this does not mean Mongola are at 0.)

Btw, based on the above, a couple of nMonte minimal sets I may test to see if they can fit the Scythians / Sarmatians: 1 - Andronovo, Daur, Okunevo, Ulchi or 2 - Afanasievo, Andronovo, Daur, Sakha. See - or

Ryan said...

@Matt - "But I don't see any superiority to using them compared to just using Oroqen, Ulchi, Nganasan, Han, which are less displaced towards West Eurasia."

They have a tonne of ANE. If there was some extremely ANE-rich unsampled population that has since gone extinct, the Kets are the best proxy for it among modern samples.

Shaikorth said...


Remember that PCA positions are affected by sample homogeneity (endogamy). Ulchi is useful as a source because it shares the most drift with Neolithic Northeast Asians, but if we don't use Tu or Mongola we shouldn't use Nganasans either. They seem to be basically a more drifted mix of their neighbours rather than something that could have been ancestral to the Scythians, which is why in haplotype-based models they do not contribute to any South Siberian populations.

Mongolic-speaking populations may all have minor western steppe ancestry but that isn't much more significant than the ANE-shift detected in Ulchi compared to Devil's Gate, and they could plausibly represent Iron Age eastern steppe. In Lazaridis 2016 Onge-ANE models their fits are closer to Ulchi than Nganasans, Yakuts, Dolgans and comparable populations.

As Ryan says, Kets are an apparent ANE relic (see the difference between GoyetQ116 IBS and MA-1 IBS in Eurasia), but Okunevo is available to be used instead as an actual ANE-rich Bronze Age South Siberian population.

Kurti said...

Samuel Andrews said

"The results I'm getting for the Scythians are; Siberia+Andronovo/Sintashta+CHG/Iran Neo. Typical West and SC Asian mtDNAs; U7 and U1 and HV2 are documented in Scythians."

It's not only West and Central Asian mtDNA; there is also more typical "West Asian" yDNA among Steppe Iranics, such as J1, J2, G2a and, if we count Huns among them (since Huns were basically Scythians), L too.

Matt said...

@Shaikorth, I don't totally follow the sample homogeneity argument. In any case, for the Scythians and Sarmatians it looks to me like Tu and Mongola are too far "south" within NE Asia to be good proxies, and we may as well use Daur, who are fairly minimally admixed along east-west, are much more Siberian / northern than the Tu or Mongola, and are Mongolic speakers to boot.

(Unless they're in there as a counterbalance to the extreme "northness" of Nganasan or Ket and EHG, as it appears to me in some of Ryu's models).

@Ryan, hmm... Kets look a bit between Okunevo and Nganasan from what I can see. They may be a bit more ANE-rich than a simple mix of the two, but not enough, it seems to me, to justify using a recent admixed population to model ancestry for an ancient one.

Anyway, so, fits with the groups I mentioned in my above post (and the normal Globe10 sheet with no weighting or alterations):

Set 1:
Sample, Andronovo, Daur, Okunevo, Ulchi, Distance%
Sarmatian_Pokrovka:I0574, 86.4, 5.65, 7.95, 0, 1.7861
Sarmatian_Pokrovka:I0575, 92.6, 6.15, 1.25, 0, 1.3512
Scythian_AldyBel:I0576, 27.4, 0, 36.5, 36.1, 0.5866
Scythian_AldyBel:I0577, 56.6, 3.95, 27.35, 12.1, 0.8582
Scythian_Pazyryk:I0562, 31.9, 0, 22.05, 46.05, 0.7095
Scythian_Pazyryk:I0563, 19.35, 0, 24.8, 55.85, 0.8701
Scythian_Samara:I0247, 86.15, 9.2, 4.65, 0, 1.4456
Scythian_ZevakinoChilikta:IS2, 60.45, 10.6, 21.5, 7.45, 1.9766
Scythian_ZevakinoChilikta:Ze6b, 38.95, 45.65, 12.55, 2.85, 0.9841

Set 2:
Sample, Andronovo, Daur, Sakha, Yamnaya_Samara, Distance%
Sarmatian_Pokrovka:I0574, 39.4, 12.5, 0, 48.1, 1.2741
Sarmatian_Pokrovka:I0575, 55.85, 9.5, 0, 34.65, 0.9843
Scythian_AldyBel:I0576, 28.4, 20.45, 29.8, 21.35, 0.9366
Scythian_AldyBel:I0577, 36.6, 13.55, 14.65, 35.2, 0.5435
Scythian_Pazyryk:I0562, 27.75, 24.9, 30.15, 17.2, 0.703
Scythian_Pazyryk:I0563, 9.4, 31.8, 34.85, 23.95, 0.7312
Scythian_Samara:I0247, 51.9, 13.75, 0, 34.35, 1.1299
Scythian_ZevakinoChilikta:IS2, 28.15, 18.95, 9.95, 42.95, 1.6975
Scythian_ZevakinoChilikta:Ze6b, 28.35, 49.25, 4.9, 17.5, 0.9398

These aren't *great* fits, compared to many upthread. Does anyone have any better ones using 4pop and ancients + minimally admixed East Asians?
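
For readers wondering how fits like the two sets above are produced, here is a minimal sketch of an nMonte-style search. It is an illustration only, not the actual nMonte R script: the function name `fit_mixture`, the slot count, the iteration count and the toy coordinates are all assumptions. The idea is to fill a fixed number of slots with source populations, average their coordinates, and keep any single-slot swap that moves the average closer (in Euclidean distance) to the target sample.

```python
import math
import random


def fit_mixture(target, sources, slots=200, iters=20000, seed=0):
    """Hill-climb a mixture of source rows toward the target coordinates.

    target  : list of coordinates for the sample being modelled
    sources : dict mapping source name -> list of coordinates
    Returns (percentages per source, final Euclidean distance).
    """
    rng = random.Random(seed)
    names = list(sources)
    dims = len(target)
    # Start with every slot assigned to a randomly chosen source.
    assign = [rng.choice(names) for _ in range(slots)]
    total = [sum(sources[a][d] for a in assign) for d in range(dims)]

    def dist(tot):
        # Distance between the slot average and the target.
        return math.sqrt(sum((tot[d] / slots - target[d]) ** 2 for d in range(dims)))

    best = dist(total)
    for _ in range(iters):
        i = rng.randrange(slots)
        new = rng.choice(names)
        old = assign[i]
        if new == old:
            continue
        trial = [total[d] - sources[old][d] + sources[new][d] for d in range(dims)]
        d_trial = dist(trial)
        if d_trial < best:  # keep the swap only if the fit improves
            assign[i], total, best = new, trial, d_trial
    props = {n: 100.0 * assign.count(n) / slots for n in names}
    return props, best


# Toy coordinates (hypothetical, not taken from the datasheet).
toy_sources = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
toy_props, toy_dist = fit_mixture([0.3, 0.0], toy_sources)
```

With real data, `target` would be one sample's row from the datasheet and `sources` the averaged rows of the chosen reference populations. The distance returned here is the raw Euclidean distance; the outputs quoted in this thread also report it multiplied by 100 as distance%.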

Rob said...

If you use Okunevo, why would you also need Ulchi?

Matt said...

@Rob, it looks like Okunevo is displaced towards the "ANE" direction in PCA relative to what a combination of Ulchi+Andronovo or Ulchi+Afanasievo can produce. That would seem to make it useful for fitting the Aldybel+Pazyryk samples.

A combination of Yamnaya+Sakha could serve a similar purpose (as it does in Set2). At least this is how it looks to me from looking at the PCA dimensions; anyone can test whether using a set (Andronovo, Daur, Yamnaya, Ulchi) works better than Andronovo, Daur, Okunevo, Ulchi.

Matt said...

Giving up a bit on the nMonte modelling, to complement the post by Alberto, here is the Top 30 population euclidean distances for each sample, including each other and other ancients as well:

Scythians are always closest to another Scythian, and Sarmatians are always closest to a Scythian.

Hungary Iron Age and Altai also tend to be high on the list in closeness to Scythians.

Karasuk or Karasuk outlier is the closest pre-IA population (or in one case Okunevo). Per wiki, the Karasuk culture comprised "a group of Bronze Age societies who ranged from the Aral Sea to the upper Yenisei in the east and south to the Altai Mountains and the Tian Shan in ca. 1500–800 BC (with a distribution which) covers the eastern parts of the Andronovo culture, which it appears to replace."
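
A ranking like the Top 30 lists described above can be sketched in a few lines. This is only an illustration of the idea: the function name `nearest` and the toy coordinates are hypothetical, and with real data each entry would be a sample's or population's row of PCA coordinates from the datasheet.

```python
import math


def nearest(target, coords, top=30):
    """Rank all other entries by Euclidean distance to `target`.

    coords : dict mapping name -> tuple of coordinates
    Returns a list of (name, distance) pairs, closest first.
    """
    t = coords[target]
    ranked = sorted(
        (math.dist(t, v), name) for name, v in coords.items() if name != target
    )
    return [(name, d) for d, name in ranked[:top]]


# Toy coordinates (hypothetical, for illustration only).
toy = {"X": (0.0, 0.0), "Y": (3.0, 4.0), "Z": (1.0, 0.0)}
toy_ranking = nearest("X", toy)
```

Note that `math.dist` needs Python 3.8 or later. Every dimension is weighted equally here, which matches the "no weighting" setup mentioned earlier in the thread.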

Shaikorth said...


Daur should work, though they don't seem to be that different compared to Tu and Mongola.

The homogeneity argument basically is that Nganasans are a recently formed but bottlenecked population and not a plausible source for admixture in Scythians. They show all the patterns: elevated LD compared to other northern Siberians (Pugach 2016), and haplotype-based fits don't pick them as a donor but as a recipient (which happens with one-way gene flow: Hazara get Pathan donors but not vice versa). Falush et al 2016 states that population-specific drift can be incorporated into PCA dimensions, as in ADMIXTURE components.

Davidski said...

Nganasan seem to be one of the best, if not the best, proxy for East Eurasian admix in the Scythians when using qpAdm.

Matt said...

Ah, I see what you mean now in theory, though I have no idea how to tell how much it actually affects Globe10.

Re: comparing Mongola and Tu vs Daur, and whether they don't seem that different, in these dimensions, just calculating the simple euclidean distances in these dimensions gives the following:

- overall euclidean distance between Daur and Mongola is about the same as between HanNChina and Miao, or Han from Southern China and Korean.
- Daur and Tu is about the same distance as Han from Southern China vs South Vietnamese.
- Mongola and Tu are somewhat more distant to Daur than they are to HanNChina.
- Distance of Han to Daur is about double in these dimensions the distance of Han from South China to Tu.

Not 100% sure that is accurate to the real genetic distances, but that's how it is in these. Judging by these, I guess you could say they don't seem that different if the other population pairs don't seem that different.

Of course, what makes them distant may not be as relevant in using them as a reference for Scythians.

Shaikorth said...


I'm pretty sure that won't hold with haplotype analysis though.
If you run them as the sole source in qpAdm which doesn't necessarily care as much about genealogical descent Nganasans might work well because they have the right ANE/ENA balance, but have you tried adding multiple sources?


OTOH Daur and Mongola in Lazaridis 2016 fits can differ by as little as 0.1%. But I'm fine with using whichever works best in nMonte really.

Ryan said...

@David - I think that must be because of whatever Paleo-Siberian group the Nganasans absorbed. Otherwise you'd think other Uralic or at least Samoyedic groups would behave the same, no?

Ryan said...

Or maybe it actually is Samoyedic. There are Tocharian loan words in proto-Samoyedic apparently. Hmm.

Nirjhar007 said...

I think so, Coldmountains; a contemporary paper will clear it up. For India, with a population of 1.3 billion, you need a correspondingly big sample size.

Even in IVC times the population was huge.

Nirjhar007 said...

Ryan,

If I am not wrong, Proto-Chinese also has Tocharian-type words.

Karl_K said...

Are there genomes from people who speak Chukotko-Kamchatkan languages?

Aren't there a surprisingly high number of resemblances between Indo-European and Chukotko-Kamchatkan words?

Nirjhar007 said...

It depends on the words. I have found a few, but they are not very surprising; they mostly belong to the Eurasiatic-Nostratic type of inheritance.

Aram said...

Karl K

40% or even more of Chukchis belong to this 2500 year old branch.
The origin of this clade is probably somewhere in West Siberia.

So Chukchis having IE and maybe Uralic words is quite normal.

Seinundzeit said...

Using the Srubnaya_outlier leads to much better fits, for the Sarmatians and western Scythians.

They don't need any Yamnaya admixture, when that sample is in the mix.


54.15% Srubnaya + 7.35% Srubnaya_outlier
26.45% Okunevo
12.05% Iran_Chalcolithic



60.30% Srubnaya + 20.95% Srubnaya_outlier
7.20% Okunevo + 4.50% Mongola
7.05% Iran_Chalcolithic


And the eastern Scythians are variable mixtures between Okunevo-related, Srubnaya/Sintashta-related, and Iran_Chalcolithic-related ancestries (0% from Yamnaya).

But one wonders; what sort of population does the Srubnaya_outlier represent?

Her presence in the western results, her absence in the eastern results, and her lack of ENA, makes for an interesting picture.

Later, I'll try to examine Central and Southern Asians, with Srubnaya_outlier in the mix.

Seinundzeit said...


A Hephthalite connection makes sense.


Sorry about that, I almost missed your comment about the Pakth connection.

There is a paper out there, in which it is shown that the resemblance between Pakth and Pakhtun/Pakhto is a coincidence.

Apparently, it's an etymological impossibility. Let me find the paper, then I'll post the link.

Arza said...

Population,Andronovo:RISE505,Armenia_EBA:I1633,Okunevo:RISE516,Ulchi,D statistic

Arza said...

Population,Andronovo:RISE505,AfontovaGora3:I9050.damage,Iran_Neolithic:I1945,Ulchi,D statistic

Jaydeep said...

There is a new paper at bioRxiv

Samuel Andrews said...

Neolithic East Germany, Hungary, and Spain have been beaten to death with DNA testing. That's an amazing study but I wish they'd get Neolithic genomes from other locations like Ukraine and Romania and Serbia and Italy.

The Neolithic hunter-gatherers from Blätterhöhle have Y-DNA R1b1. One has yHG R1b1 and mHG U5b2a2, like me. Coincidence.

Ryukendo K said...

Upon the suggestion of Kristiina and the discovery that Pazyryk were N1b, I included some Samoyed groups into the set of sources as well, and produced quite interesting results for Pazyryk.

[1] "distance%=0.2692 / distance=0.002692"


Nenets_Forest 27.80
Ulchi 22.65
Mongola 15.95
Samara_Eneolithic 14.70
Iran_Chalcolithic 5.30
MA1 4.35
Hungary_N 3.75
Armenia_MLBA 2.70
Sintashta 1.85
Chamar 0.65
Tuvinian 0.25
Tu 0.05

With sequential pushing, the next few models:
[1] "distance%=0.2792 / distance=0.002792"


Nenets_Forest 27.50
Mongola 17.45
Ulchi 13.35
Sintashta 13.05
Tuvinian 10.05
Armenia_MLBA 7.35
AfontovaGora3 4.55
MA1 4.55
Iran_Chalcolithic 1.60
Chamar 0.55

[1] "distance%=0.2812 / distance=0.002812"


Nenets_Forest 28.90
Mongola 18.35
Tuvinian 12.35
Sintashta 11.00
MA1 10.45
Ulchi 9.40
Armenia_MLBA 9.30
Iran_Late_Neolithic 0.20
Armenia_EBA 0.05

[1] "distance%=0.2995 / distance=0.002995"


Nenets_Forest 23.00
Pima 22.35
Sintashta 19.65
Ulchi 15.05
Mongola 10.70
Iran_Late_Neolithic 3.70
Armenia_MLBA 3.35
Tuvinian 2.05
Chamar 0.15

Ryukendo K said...

Utilising a wider range of sources, it also seems like the 'Central-Asian-Shifted' 7th to 9th C BC genomes possess some ancestry that is similar to present-day Altaics.

[1] "distance%=0.1122 / distance=0.001122"


Tuvinian 21.45
Armenia_EBA 15.35
Sintashta 13.90
Poltavka_outlier 12.00
Ulchi 12.00
AfontovaGora3 10.00
Chamar 4.05
Karasuk 3.50
Tu 2.90
Mongola 1.75
Iran_Late_Neolithic 1.15
Loschbour 0.95
Daur 0.60
Armenia_MLBA 0.30
Paniya 0.10

[1] "distance%=0.1519 / distance=0.001519"


Tuvinian 31.80
Poltavka_outlier 20.70
Sintashta 11.75
Iran_Late_Neolithic 8.95
Karasuk 7.15
Ulchi 4.35
MA1 3.65
Chamar 3.50
Mongola 3.00
Kotias 2.50
Okunevo 1.90
Daur 0.50
Paniya 0.25

Pushing out Tuvinian:

[1] "distance%=0.1701 / distance=0.001701"


Ulchi 23.10
Poltavka_outlier 16.35
Altaian 13.25
Sintashta 11.30
Srubnaya 8.55
Okunevo 6.90
Iran_Late_Neolithic 5.40
Armenia_EBA 4.60
Chamar 4.25
MA1 3.20
Kotias 2.65
Ket 0.35
Armenia_MLBA 0.10

Pushing out Altaian:

[1] "distance%=0.1714 / distance=0.001714"


Ulchi 30.80
Srubnaya 17.55
Sintashta 11.35
Okunevo 9.90
Poltavka_outlier 8.80
Armenia_EBA 6.95
Chamar 4.45
Iran_Late_Neolithic 4.15
Kotias 2.25
MA1 1.90
Ket 1.55
Nenets_Forest 0.20
Armenia_MLBA 0.10
Buryat 0.05

Pushing out Ulchi:

[1] "distance%=0.1874 / distance=0.001874"


Buryat 23.70
Sintashta 14.20
Poltavka_outlier 11.65
Daur 10.80
Srubnaya 10.75
Iran_Late_Neolithic 7.80
Okunevo 5.25
MA1 4.20
Chamar 4.00
Ket 4.00
Armenia_EBA 3.15
Kotias 0.15
Paniya 0.15
Karasuk 0.10
Armenia_MLBA 0.05
Nenets_Forest 0.05

These genomes clearly demonstrate an attraction towards ENA ancestry in present-day Altai. Perhaps the ENA ancestry most characteristic of Altaic populations started to introgress at this time. It's also clear that there is no one ENA gene pool that supplied the source for ENA introgression for Eastern Scyths; there seem to be separate signals, one more Northerly and another more Southerly.

Samuel Andrews said...

On Blätterhöhle R1b1:
"R1b1a1a2 showed both derived and ancestral alleles of characteristic SNPs."

He's probably a relative of R1b1a1a2 M269.

Matt said...

@Jaydeep, thanks. Have to read this later. Findings that jump out:

- Looks like the Blätterhöhle Cave groups, who earlier were thought by researchers (Brandt?) to be an example of hunter-fisher HG living alongside farmers peacefully without sharing genes, look genetically like a mixed Neolithic-WHG group, with 40-50% HG. (May have been sex biased if they were looking at mtDNA?)

- They reproduce a HG structure of ElMiron+LaBrana at one end and EHG at the other, which correlates expectedly with latitude (Bichon+Losch closer to LaBrana, KO1 closer to EHG and Villabruna intermediate). Comparing this to early Neolithic farmers they find a mild correlation with this and regional farmer ancestry... and a result:

"We find that almost all ancient groups from Hungary have ancestry significantly closest to one of the more eastern WHG individuals (either KO1 or Villabruna); the samples from present-day Germany have greatest affinity to Loschbour; and all three Iberian groups contain LB1-related ancestry (Figure 2C; Extended Data Table 2). This pattern implies that admixture into European farmers occurred multiple times from local hunter-gatherer populations. Moreover, combining the proportions and sources of hunter-gatherer ancestry, populations from the three regions are distinguishable at all stages of the Neolithic. Thus, any further migrations that may have occurred after the initial spread of farming were not substantial enough within the studied regions to disrupt the observed heterogeneity."

(How much does / doesn't this contribute to present day structure?).

Looks really worth reading if we're interested in trying to quantify how much survival of HG in Europe tended to be *really* local in the Neolithic and Chalcolithic, as opposed to being spread all around Europe by continuing diffusion of farmers.

Matt said...

Also from "Parallel ancient genomic transects reveal complex population history of early European farmers": "We observed discrete signals of admixture in LB1 and KO1 via f3- and f4-statistics [29], and both fit best as admixed in the scaffold model, LB1 with ancestry from a deeper European hunter-gatherer lineage and KO1 with a small proportion of FEF admixture (Supplementary Information section 6)."

Nice to see some corroboration in the literature of an effect which various of us (at least Chad I think?) have modeled from time to time.

Makes me wonder whether this is just KO1, or the Balkans HGs may have more..

Alberto said...


From my testing, restricting to 4 pops it would probably be Andronovo, Yamnaya, Iran_ChL and Ulchi. Adding Itelmen as a 5th improves fits further, but it's not going to be much more informative.

BTW, for the nMonte-like testing you might find useful a script I use but never released publicly (because newer versions of 4mix were said to be coming, but haven't so far, and because I'm not much good at R scripting). But if you're interested, drop me a line: alberto6674 at (Ryu, idem. Or anyone else who does many runs).

Shaikorth said...


If you're searching for optimal sources for Scythian, you could try adding these one by one to the fit and checking if they improve it:

Proto-Yukaghirs apparently originated closer to Baikal, and the other two have absorbed Paleosiberian substrates even if they didn't exist in their modern form back then.

Matt said...

@ Arza, using your set of Andronovo, Armenia_EBA, Okunevo, Ulchi, and the general pop rather than individual:


The only ones where fits were comparable with my original sets were I0576 (Set 1) and I0577 (Set 2). But for the majority of the samples, it looks like the distinction between Andronovo and more Iran_Neolithic / CHG related populations is more important for explaining them than the Daur, Sakha vs Ulchi distinction (at least when Okunevo is in there).

@ Alberto, that set sounds plausible given above, but I may test whether adding Karasuk / Okunevo as a sub for one of those steppe related (Andronovo, Yamnaya, Iran_Chl) works better. Thanks for the offer of the nMonte script also.

Arza said...

@ Matt

Which software do you use for these runs?

Also can you check how CWC behaves? But only those two:


Population,Corded_Ware_Germany:I1536,Iran_Chalcolithic:I1661,Okunevo:RISE515,Ulchi,D statistic

Corded_Ware_Germany:I1536 65.4
Okunevo:RISE515 15.45
Iran_Chalcolithic:I1670 15.1
Okunevo:RISE516 2.35
Daur 1.1
Oroqen 0.6
Ulchi 0
Iran_Chalcolithic:I1661 0
Iran_Chalcolithic:I1662 0
Iran_Chalcolithic:I1665 0
Iran_Chalcolithic:I1674 0
Corded_Ware_Germany:I1538 0

distance%=0.3673 / distance=0.003673

Further zeroes dropped.

Corded_Ware_Germany:I1536 56.65
Corded_Ware_Germany:I1538 19.95
Iran_Chalcolithic:I1674 9.45
Okunevo:RISE515 7.4
Daur 6.55

distance%=0.3751 / distance=0.003751

Okunevo:RISE515 39.95
Ulchi 36.4
Corded_Ware_Germany:I1536 23.3
Iran_Chalcolithic:I1670 0.35

distance%=0.5802 / distance=0.005802

Corded_Ware_Germany:I1538 41.55
Ulchi 19.7
Okunevo:RISE515 15.6
Corded_Ware_Germany:I1536 8.8
Okunevo:RISE516 8.7
Iran_Chalcolithic:I1670 5.65

distance%=0.6438 / distance=0.006438

Ulchi 46.75
Corded_Ware_Germany:I1536 27.55
Okunevo:RISE515 14.35
Okunevo:RISE516 9.35
Iran_Chalcolithic:I1665 1.65
Iran_Chalcolithic:I1670 0.35

distance%=0.6802 / distance=0.006802

Ulchi 54.75
Okunevo:RISE516 26.65
Corded_Ware_Germany:I1536 13.4
Iran_Chalcolithic:I1670 5.2

distance%=0.661 / distance=0.00661

Corded_Ware_Germany:I1536 60.05
Iran_Chalcolithic:I1674 12.7
Corded_Ware_Germany:I1538 9.95
Okunevo:RISE515 9.1
Ulchi 8.2

distance%=0.248 / distance=0.00248

Okunevo:RISE515 36.45
Corded_Ware_Germany:I1536 34.2
Iran_Chalcolithic:I1670 20.05
Ulchi 9.3

distance%=0.5959 / distance=0.005959

Corded_Ware_Germany:I1536 26.9
Daur 24.75
Oroqen 20.05
Okunevo:RISE515 19.1
Iran_Chalcolithic:I1674 9.2

distance%=0.5441 / distance=0.005441

Sakha and Eskimo_Sireniki added:

Oroqen 36.85
Eskimo_Sireniki 28.35
Corded_Ware_Germany:I1536 21.3
Okunevo:RISE516 7.7
Iran_Chalcolithic:I1670 5.8
Sakha 0
Daur 0
Ulchi 0
Okunevo:RISE515 0
Iran_Chalcolithic:I1661 0
Iran_Chalcolithic:I1662 0
Iran_Chalcolithic:I1665 0
Iran_Chalcolithic:I1674 0
Corded_Ware_Germany:I1538 0

distance%=0.4226 / distance=0.004226 vs. 0.00661

Corded_Ware_Germany:I1536 39.15
Okunevo:RISE515 26.55
Iran_Chalcolithic:I1670 20.5
Eskimo_Sireniki 10.3
Oroqen 3.5

distance%=0.5178 / distance=0.005178 vs. 0.005959

ak2014b said...


"the notion of Pashtun linkages with Scythians/Hephthalites/Kushans/etc is very old. In fact, it's been the standard opinion held by those handful of western scholars who've spilled ink on our historical roots."

I'm more familiar with papers on modern Afghan DNA investigating the common claims of Greek ancestry and Jewish ancestry in Pathans. The Lacau et al 2012 paper could not find support for Greek admixture, but found indications of Khazarian admixture in Pathans:

"Furthermore, the high frequencies of R1a1a-M198 and the presence of G2c-M377 chromosomes in Pathans might represent phylogenetic signals from Khazars"

"Although Greeks and Jews have been proposed as ancestors to Pathans, their genetic origin remains ambiguous. The Lasithi Plateau isolate, in the highlands of eastern Crete, partitions relatively close to the Afghanistan populations in the CA graph (Figure 3a), which could be attributed to the elevated proportion of R1a1a chromosomes shared among them. However, the absence of the predominantly Greek E1b1b1a2-V13 lineage in Pathans does not argue for genetic contributions from Greece."

"We envision a plausible scenario in which the converted Khazars could have been absorbed by the early Pathans and that R1a1a-M198 drifted to high frequency in Afghanistan"

Underhill and his co-authors above don't clarify how much of the R1a1a in modern Pathans or Afghans they think is from Khazars, although they do imply that Khazarian R1a1a input into the "early Pathans" may be connected to R1a1a-M198 drifting "to high frequency in Afghanistan".

Both Khazarian aDNA samples were R1a-Z93, according to a recent comment by Nirjhar.

And Sein, over in the other Scythian thread, you said
"So now, I think I can say with some confidence/certainty that a good portion of my ancestry can be traced to Eastern Scythians."

Since Khazars were a Turkic Central Asian population, they would trace to Eastern Scythians. And perhaps they may have picked up some western Scythian ancestry too in their history.

I think it would be worthwhile for a study to investigate how much of the (Eastern) Scythian signal in modern Pathans and other Afghans has actually been mediated via the Turkic Khazars.

Coldmountains said...


Quite a few Pashtuns show signals of Siberian/East Asian ancestry. Some of it is probably from recent intermixing with Hazara or Uzbeks, but there is surely no connection with Khazars, who never lived close to Afghanistan. Underhill's conclusions were simply idiotic, to be honest. Applying the same logic, someone could say that Pashtuns got R1a from Slavs because both belong to R1a-M198. Nevertheless, there is a connection between the Khalaj Turks of South Central Asia and Ghilzai Pashtuns. So it would be interesting to see if Ghilzai Pashtuns show more signals of eastern Scythian/early Turkic ancestry. But as far as I know they don't really differ from other Afghan Pashtuns and tend to be very similar to Durrani Pashtuns.

ak2014b said...


The authors may have conflated the western Turkic Khazar Khaganate (Khazaria) with the eastern Turkic Khaganate. Or they might be suggesting backflow from west to east, to explain claims among some Pathans about having Jewish ancestry.

The genetic signatures of both the eastern and western Khagans may have started off similar. The Wikipedia map of the Turkic Khaganate looks like it includes northern Afghanistan.

A page about Y Hgs in Turkic populations contains another map demarcating the eastern Turkic Khaganate (which looks to overlap a part of Afghanistan) and the slightly younger and more western Khazar Khaganate.

So a study on how much of the (Eastern) Scythian signal in modern Pathans and other Afghans has been mediated via any Turkic Khagans would be useful to lay to rest or find support for the conclusions in the Lacau et al 2012 paper.

ak2014b said...


My count of the R1a subclades found in the 5 South Asian 1000 Genomes groups, taken from the R1a tree on page 20 of the Supplementary Information of Poznik et al 2016:

BEB Bengalis from Bangladesh: 2 F992 (Z93), 5 L657, 1 Y7, 1 Z2123 (9 R1a samples)
GIH Gujaratis in Houston: 1 F992, 6 L657, 7 Y7, 2 Z2123 (16 R1a)
ITU Indian Telugus in the UK: 0 F992, 6 L657, 3 Y7, 6 Z2123 (15 R1a)
PJL Punjabis of Lahore: 4 F992, 2 L657, 10 Y7, 1 Z2123 (17 R1a)
STU Sri Lankan Tamils in UK: 2 F992, 5 L657, 2 Y7, 5 Z2123 (14 R1a)

I can't guarantee I didn't miss any samples or miscount.
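
As a quick sanity check on the tally above, the per-subclade counts can be compared against the stated totals and turned into percentages. This is just a small sketch: the numbers are transcribed from the comment as written, and the names `counts`, `check_totals` and `shares` are made up for the illustration. It only checks internal consistency of the tally, not whether any samples were missed in the source tree.

```python
# Counts transcribed from the comment above; the second field is the stated total.
counts = {
    "BEB": ({"F992": 2, "L657": 5, "Y7": 1, "Z2123": 1}, 9),
    "GIH": ({"F992": 1, "L657": 6, "Y7": 7, "Z2123": 2}, 16),
    "ITU": ({"F992": 0, "L657": 6, "Y7": 3, "Z2123": 6}, 15),
    "PJL": ({"F992": 4, "L657": 2, "Y7": 10, "Z2123": 1}, 17),
    "STU": ({"F992": 2, "L657": 5, "Y7": 2, "Z2123": 5}, 14),
}


def check_totals(table):
    """Return the populations whose subclade counts do not add up to the stated total."""
    return [pop for pop, (sub, total) in table.items() if sum(sub.values()) != total]


def shares(table):
    """Percentage of each subclade among each population's R1a samples."""
    return {
        pop: {k: round(100.0 * v / sum(sub.values()), 1) for k, v in sub.items()}
        for pop, (sub, _) in table.items()
    }


mismatches = check_totals(counts)
percentages = shares(counts)
```

An empty `mismatches` list means every row adds up to the total given in brackets.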

Seinundzeit said...


I was actually referring to historians/anthropologists, not geneticists.

Speaking of historians/anthropologists (and taking a brief break from genetics), unfortunately, there is virtually no serious scholarly work on Pashtun history.

Mainly, since Pashto has never been a language of "high culture", in sharp contrast to Persian.

I mean, Pashto literature is exceedingly young, it basically starts (rather abruptly) with Pir Roshan.

So, we have to rely on brief, and often very unsavory, references made by non-Pashtun writers.

And even those are pretty rare.

Which isn't surprising, as Pashtun tribes have traditionally inhabited isolated/inaccessible eyries and tracts, and have been adept at avoiding co-option into state systems based in both greater Iran and greater India.

Al-Biruni mentions Pashtuns very briefly, and just notes that they're a bunch of rebellious/violent tribes that inhabit mountain ranges, near the western border of India (Indus river).

That was the 11th century. And, as is typical for these early sources, he refers to Pashtuns as "Afghan".

Apparently, Ibn Battuta wasn't a fan, and just says this:

"Their mountains are difficult of access, having narrow passes. These are a powerful and violent people; and the greater part of them are highway robbers..." — Ibn Battuta, 14th Century

That's almost everything we have on Pashtuns, before the first Pashto book, and before Babur's military campaigns.

In fact, Babur was the first person to describe Pashtun tribal geography in full detail, because he spent much of his time fighting tribesmen in what is now eastern Afghanistan/northwestern Pakistan.

Regardless, of those few western scholars who have made attempts to theorize on our "origins", most have tended towards the idea of Scythian, or Hephthalite, descent.

For some reason, Russian anthropologists have mainly pushed the Hephthalite connection.

And, when it comes to local Pashtun folklore, the story remains one of Israelite origins, which obviously can't be true.


I have data for some Ghilzai Pashtuns from Afghanistan, and I've found that they are somewhat different from the Durrani.

The Durrani have more Iran_Chalcolithic, less steppe ancestry, and less ASI (the Ghilzai I have show the same amount of ASI as myself, and I'm a northeastern Pashtun with no Ghilzai ancestry), when compared to the Ghilzai.

And, the Ghilzai I have don't show any excess of Turkic affinity.

On top of that, the Ghilzai Y-DNA profile is identical to other Pashtun populations (mostly R1a, followed by G, Q, L, etc).

I still find this to be quite surprising, because I was basically sold on the Khalaj-Ghilzai connection.

Then again, it could still be real, but the Turkic ancestry is now diluted?

Interestingly though, on the topic of the Durrani, I've found that they almost look genetically intermediate between Pamiri Tajiks and Balochistanis.

You could draw a line from Tajik_Shughan to Balochistanis, and the Durrani Pashtun are almost perfectly on it.

Davidski said...

Seven out of the eight new Scythian/Sarmatian samples are now in the Basal-rich K7 spreadsheet.

Project "Magnus Ducatus Lituaniae" said...

I still believe that the Ulchi-related type of ancestry in the Zevakino and Pazyryk samples is overestimated. If I were you I would try to add the DevilsGate genomes to the list of outgroup (right) populations in qpAdm.

Matt said...

@Davidski, thanks for Basal K7 outputs.

For anyone interested, one thing I tried recently since I discovered Principal Coordinates Analysis was running the component Fst from Basal K7 through PCoA, which produces a nice set of 7 dimensional distances between them -

Then you can project all the rows from the spreadsheet on to those dimensions, like so:

PC1v2, 3v4, 5v6, 7 (7v1 for convenience).

(Useful if you want to look at what the euclidean distances between populations should be based on the Basal K7).

Here's the output of the original PCoA on component Fsts, with the projection of the rows, in case anyone wants to try using it for nMonte, etc. (Including the Scythians).

(Tree based on projected dimensional spreadsheet vs ADMIXTURE output.)
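
For anyone wanting to try the same kind of thing, here is a rough sketch of the two steps described above: classical PCoA on a component-by-component distance matrix, then placing each spreadsheet row at the proportion-weighted average of the component coordinates. It is not the procedure actually used for the linked outputs. The function names, the power-iteration shortcut, the choice to treat the Fst values directly as distances (some would take the square root first), and the weighted-average projection are all assumptions, and the toy matrix is made up.

```python
import math


def pcoa(dist, n_axes=2, iters=500):
    """Classical PCoA via double-centring and power iteration with deflation."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    grand = sum(row) / n
    # Gower's matrix B = -1/2 * J * D^2 * J (J is the centring matrix).
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[] for _ in range(n)]
    for _axis in range(n_axes):
        v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract on this axis
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient (eigenvalue)
        # Negative eigenvalues (non-Euclidean input) give an all-zero axis.
        scale = math.sqrt(lam) if lam > 1e-12 else 0.0
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate so the next pass finds the next axis.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


def project(proportions, comp_coords):
    """Place a sample at the proportion-weighted average of component coordinates."""
    dims = len(comp_coords[0])
    return [sum(p * c[d] for p, c in zip(proportions, comp_coords))
            for d in range(dims)]


# Toy distance matrix for three collinear points at 0, 3 and 4 (made up).
toy_d = [[0.0, 3.0, 4.0], [3.0, 0.0, 1.0], [4.0, 1.0, 0.0]]
toy_coords = pcoa(toy_d, n_axes=2)
toy_mid = project([0.5, 0.5, 0.0], toy_coords)
```

With a real K7 run, `dist` would be the 7x7 component Fst matrix, and each row passed to `project` would be a sample's seven proportions expressed as fractions summing to 1. Euclidean distances between the projected rows can then be fed to nMonte-style fitting.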