Friday, March 10, 2017

Bring it on

AdmixTools 5 is now available at GitHub (see here). I'm messing around with the latest version of qpAdm as I await the expected flood of new ancient samples. Based on first impressions, I'd say it's sharper than previous versions. Here's an attempt to home in on Yamnaya's ancestral makeup; note that the best statistical fits are clearly those with the spatiotemporally closest genomes.


Caucasus_HG 0.534±0.022
Eastern_HG 0.466±0.022
chisq 42.494 taildiff 2.66849158e-06

Eastern_HG 0.569±0.016
Iran_Chalcolithic 0.431±0.016
chisq 31.790 taildiff 0.000216504253

Eastern_HG 0.572±0.018
Iran_Neolithic 0.233±0.027
Lengyel_LN 0.195±0.019
chisq 26.291 taildiff 0.0009363224

Caucasus_HG 0.361±0.036
Eastern_HG 0.518±0.021
Lengyel_LN 0.121±0.023
chisq 12.737 taildiff 0.121217144

Kotias_HG 0.367±0.047
Lengyel_LN 0.103±0.031
Samara_HG 0.530±0.027
chisq 9.531 taildiff 0.299484439
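
For anyone who wants to compare these five Yamnaya models mechanically, here's a minimal sketch. It only reuses the chisq and taildiff values listed above, and it assumes that taildiff is read as a tail probability where a larger value means a better fit, which is how I'm using it in this post. The dictionary layout and the function name are illustrative, not part of qpAdm.

```python
# chisq and taildiff values copied from the five Yamnaya models above.
yamnaya_models = {
    "Caucasus_HG + Eastern_HG": (42.494, 2.66849158e-06),
    "Eastern_HG + Iran_Chalcolithic": (31.790, 0.000216504253),
    "Eastern_HG + Iran_Neolithic + Lengyel_LN": (26.291, 0.0009363224),
    "Caucasus_HG + Eastern_HG + Lengyel_LN": (12.737, 0.121217144),
    "Kotias_HG + Lengyel_LN + Samara_HG": (9.531, 0.299484439),
}


def rank_by_taildiff(models):
    """Return model names sorted from largest taildiff to smallest."""
    return sorted(models, key=lambda name: models[name][1], reverse=True)


ranking = rank_by_taildiff(yamnaya_models)
for name in ranking:
    chisq, taildiff = yamnaya_models[name]
    print(f"{name:42s} chisq={chisq:7.3f} taildiff={taildiff:.3g}")
```

Sorting on taildiff rather than chisq is a deliberate choice here, because models with different numbers of references don't share the same degrees of freedom.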

I also had a quick look at South Asia. The likely Eastern Iranian-speaking early Sarmatians from Pokrovka, Russia, recently published along with Unterländer et al., look like a decent enough reference for modern-day Eastern Iranians, but not for Indo-Aryans like the Kalasha and North Indian Brahmins. The latter prefer Ulan IV, the late Yamnaya/early Catacomb sample from Allentoft et al. 2015. It's an intriguing question why.


Iran_Neolithic 0.302±0.038
Onge 0.168±0.015
Sarmatian_Pokrovka 0.529±0.035
chisq 10.424 taildiff 0.107885899

Han 0.056±0.020
Iran_Neolithic 0.309±0.047
Onge 0.135±0.030
Ulan_IV 0.500±0.039
chisq 15.316 taildiff 0.00909576308

Han 0.059±0.015
Iran_Neolithic 0.276±0.045
Sarmatian_Pokrovka 0.665±0.048
chisq 4.603 taildiff 0.595657656

Han 0.098±0.015
Iran_Neolithic 0.266±0.052
Ulan_IV 0.637±0.052
chisq 12.971 taildiff 0.0434993304

Han 0.062±0.023
Iran_Neolithic 0.202±0.052
Onge 0.257±0.036
Ulan_IV 0.479±0.043
chisq 5.475 taildiff 0.360663358

Han 0.024±0.027
Iran_Neolithic 0.205±0.057
Onge 0.274±0.038
Sarmatian_Pokrovka 0.497±0.050
chisq 12.517 taildiff 0.0283534925

Han 0.045±0.022
Iran_Neolithic 0.263±0.052
Onge 0.145±0.034
Ulan_IV 0.547±0.043
chisq 8.424 taildiff 0.134346014

Han 0.004±0.026
Iran_Neolithic 0.261±0.055
Onge 0.159±0.036
Sarmatian_Pokrovka 0.576±0.048
chisq 15.002 taildiff 0.0103520115

As far as I can tell right now, the eastern Scythians from Unterländer et al. aren't all that relevant for South Asians. I'll wind things up here with models for a few more populations from Pakistan and India.

Han 0.023±0.021
Iran_Neolithic 0.520±0.047
Onge 0.081±0.033
Ulan_IV 0.376±0.039
chisq 6.495 taildiff 0.261028178

Han 0.028±0.022
Iran_Neolithic 0.563±0.047
Onge 0.061±0.034
Ulan_IV 0.348±0.039
chisq 4.247 taildiff 0.514456854

Han 0.095±0.032
Iran_Neolithic 0.151±0.064
Onge 0.696±0.047
Ulan_IV 0.058±0.053
chisq 5.942 taildiff 0.311882057

Han 0.060±0.033
Iran_Neolithic 0.333±0.066
Onge 0.464±0.049
Ulan_IV 0.144±0.054
chisq 2.644 taildiff 0.754673075

Han 0.022±0.024
Iran_Neolithic 0.692±0.068
Onge 0.013±0.037
Ulan_IV 0.269±0.052
Yoruba 0.004±0.009
chisq 3.455 taildiff 0.484741716

Han 0.020±0.022
Iran_Neolithic 0.202±0.047
Onge 0.421±0.033
Ulan_IV 0.356±0.039
chisq 8.263 taildiff 0.142329696

Han 0.023±0.020
Iran_Neolithic 0.320±0.044
Onge 0.229±0.032
Ulan_IV 0.429±0.036
chisq 6.431 taildiff 0.266528271


Nirjhar007 said...

Thanks bud, R1a didn't arrive in the Subcontinent with Scythians either.

Rami said...

Indo-Aryans arrive in South Asia in the Bronze Age, mixed with Oxus/BMAC people.
Eastern Iranics arrive much later, in Antiquity, and are different from the Yaz Iranians, who are the forebears of the Plateau Iranian languages.
Those Steppe numbers seem too high and Iran_N numbers way too low, unless there was some massive population displacement and replacement which is unlikely.

jv said...

I see the tested Yamnaya individuals admix. Are any of these results: Yamnaya Russia Kutuluk I, Kutuluk River, Samara [I0444 / SVP 58] M 3300-2700 BC 551,461 R1b1a2a2 (Z2103) CTS1078/Z2103+, L150+, M415+ mtDNA H6a1b. Haak 2015; Mathieson 2015; Sergey Malychev.........He is the gentleman in D. Anthony's The Horse, the Wheel, and Language, page 333. (He was buried in the middle Volga region with the heaviest copper mace or club in the Yamnaya horizon) Thanks for your help, jv

Samuel Andrews said...

David, doesn't East Asian admixture in Sarmatians replace ASI in South Asians and therefore raise their Sarmatian score?

Gill said...

Davidski, do you suppose the pre-Neolithic inhabitants of India were 100% East Eurasian, perhaps very similar to the Onge?

Anthro Survey said...

Indeed, Rami, Iran_Neo seems quite low for all these groups based on previous estimates and the likelihood of mixing with BMAC (as you said) prior to crossing the Khyber Pass. ~50% are basically Udmurt levels.

Taymas said...

+1 on Iran_neo that low being unexpected, and I'm someone that considers the steppe hypothesis pretty well-suited to a variety of evidence. Totally prepared to update my expectations.

Davidski, what if there were HGs (with ANE?) in SC Asia that Iran_neo ended up mixing with, an eastern parallel to EEF/WHG? What kind of effect would that have on your tests? Or are your tests showing small enough residuals that you think this is unlikely? Sorry, I still have a lot to learn about the subtleties of these methods.

SC Asia and the subcontinent are climatically different enough I would expect the HGs to be quite different.

Thanks again for this blog, and to other commenters, endlessly fascinating.

Matt said...

High steppe numbers by about +20% and Iran_N numbers of -20% just seems like a generally unusual feature / question of the formal models at the moment. The same balance of Steppe vs Iran_N in SA was present in Lazaridis's 2016 paper.

Using Basal_K7 plus Fst distances together generates best fit with the following as:

Pathan_average: Iran_Neolithic_average 53.25, Scythian_Samara:I0247 37.4, Ami:NA13610 5.5, Andamanese_Onge:ONG-14 3.85

(which works as incrementally closer than Pathan_average: Iran_Neolithic_average 58, Afanasievo_Kuyum:average 29.35, Ami:NA13610 8.65, Andamanese_Onge_ONG-14 4).

Paniya_PNYD3: Iran_Neolithic_average 42.45, Ami_NA13610 32.65, Andamanese_Onge_ONG-14 18.35, Scythian_Samara_I0247 6.55

Matt said...

For models, I wouldn't mind seeing:

Outgroups: Chukchi, Han, Karitiana, Onge, Papuan, Kostenki14, Levant_Neolithic, Iran_Neolithic, Kotias, Villabruna, El_Miron, Karelia_HG, Paniya, Mbuti, Yoruba

"Ancestors": La_Brana, Bichon, Latvia_HG, KO1, Ukraine_HG, Barcin_Neolithic

Targets: Iberia_EN, LBK_EN, Hungary_Neolithic / Hungary_EN, Iberia_MN, Iceman_MN, Esperstedt_MN, Baalberge_MN, Sweden_MN, Greece_LN, Iberia_CA, Remedello_CA, Hungary_CA.

See if any of Lipson 2017 can be validated by updated qpAdm.

Iranocentrist said...

So is Iran Chalcolithic a good surrogate for the non EHG part of Yamnaya after all, David?

Davidski said...


Yes, one of the Yamnaya_Samara samples used above is I0444. He's a very typical example of this population.


David, doesn't East Asian admixture in Sarmatians replace ASI in South Asians and therefore raise their Sarmatian score?

It replaces their East Asian/Siberian. But they may have got it from their Sarmatian-like Indo-Iranian ancestors. If so, then the model is correct.


Davidski, do you suppose the pre-Neolithic inhabitants of India were 100% East Eurasian, perhaps very similar to the Onge?

I think they belonged entirely or almost entirely to a closely related sister clade to East Asians and Onge. Not sure if this will be classified as East Eurasian when we see their genomes, but maybe.

@Rami, Anthro Survey and Taymas

Just to reiterate what Matt said, the ratios of Iran Neolithic to Bronze Age Steppe (in this case Ulan IV) are very similar to those in recent scientific literature.

However, new ancient samples from Iran and Central Asia may well shift these estimates. Even using Iran_Hotu instead of Iran_Neolithic lowers the steppe input for the Kalash (though not for the Brahmins). But note that the standard errors also shoot up, probably because of the low quality of the Hotu genome.

Han 0.070±0.034
Iran_Hotu 0.401±0.164
Onge 0.102±0.065
Ulan_IV 0.428±0.129
chisq 5.374 taildiff 0.371973057

Onge 0.259±0.084
Han 0.080±0.038
Iran_Hotu 0.186±0.205
Ulan_IV 0.476±0.155
chisq 6.217 taildiff 0.285698915
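
To make the point about standard errors concrete, here is a small sketch that parses the two Iran_Hotu models above and divides each coefficient by its standard error. The parsing helper and its names are hypothetical, written only for the plain-text layout used in this post; they are not part of qpAdm. A ratio below about 2 suggests the coefficient is hard to distinguish from zero, though that threshold is a rule of thumb rather than something qpAdm reports.

```python
import re

# One coefficient line looks like "Iran_Hotu 0.401±0.164".
ROW = re.compile(r"^(\S+)\s+(\d*\.\d+)±(\d*\.\d+)$")


def parse_model(text):
    """Return (name, coefficient, standard_error) tuples; the chisq line is skipped."""
    rows = []
    for line in text.strip().splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append((match.group(1), float(match.group(2)), float(match.group(3))))
    return rows


def coefficient_over_se(rows):
    """Map each reference name to coefficient / standard error."""
    return {name: coef / se for name, coef, se in rows}


first_hotu_model = """
Han 0.070±0.034
Iran_Hotu 0.401±0.164
Onge 0.102±0.065
Ulan_IV 0.428±0.129
chisq 5.374 taildiff 0.371973057
"""

second_hotu_model = """
Onge 0.259±0.084
Han 0.080±0.038
Iran_Hotu 0.186±0.205
Ulan_IV 0.476±0.155
chisq 6.217 taildiff 0.285698915
"""

for label, text in (("first", first_hotu_model), ("second", second_hotu_model)):
    ratios = coefficient_over_se(parse_model(text))
    print(label, {name: round(value, 2) for name, value in ratios.items()})
```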

This is all, of course, part and parcel of the game, so I'm not sure why Rami keeps bleating like a wounded goat.


"Ancestors": La_Brana, Bichon, Latvia_HG, KO1, Ukraine_HG, Barcin_Neolithic

You mean at the same time? Not possible. The correlations are too high between the hunter-gatherers, because they're all rich in WHG.

I'd have to run separate tests and check the fits. That would be a fairly big job. Might do it, but not right away. Maybe Chad can do it this weekend?


So is Iran Chalcolithic a good surrogate for the non EHG part of Yamnaya after all, David?

No, it produces a poor fit in the model above.

The most successful references for Yamnaya_Samara that I can find at this time are Kotias_HG, Lengyel_LN and Samara_HG.

For the king said...

Do west Iranics, Brahuis, Balochis, Afghan Pashtuns get any different results with the latest version of qpAdm? I wonder if the Extra East Eurasian in Afghan Pashtuns (compared to Pakistani ones) came from some Scythian groups.

Matt said...

@ Davidski, yeah, I was thinking about separate tests then comparing the chi-square, as in your main post, but messed up my post! I see, it would take longer than I thought. OK, if you (or Chad if he's interested) ever get the time and are intrigued, but understand equally if it's too time consuming.

I was hoping the outgroups should be good for that test, as there are plenty of ancients with information that should hopefully discriminate WHG subclades and offer better chi-square for, e.g. Iberia_EN with La Brana, Iberia_MN with La Brana / Loschbour, etc. Though mainly how close El Miron should supposedly be to La Brana (based on f3 sharing), as many of the other ancients (Kotias) might be more of a clade to different WHG.

Balaji said...

These results for the people of the Indian Subcontinent make sense since they are not descended from people of the Steppe and certainly not from Scythians. Instead people of the Steppe such as Ulan_IV and Sarmatian_Pokrovka owe much of their ancestry to the Subcontinent. Sarmatian_Pokrovka are worse in modeling Brahmins than Ulan_IV because of the extra East Asian ancestry that Scythians have and of course the eastern Scythians have even more.

The recent paper on Scythians and the Tweet from Lazaridis make it clear that the Reich Lab people have given up on trying to model Scythians and people of the Indian Subcontinent as descended from Andronovo. My hypothesis on the origin of the Scythians is that they learned the art of making iron tools from India, spoke Indo-Iranian languages, expanded to eastern Central Asia and mixed with the people there. Then perhaps with horse herding techniques, learned in Central Asia, together with their knowledge of iron metallurgy, they replaced and extinguished the European-derived bronze-age cultures of the Eurasian steppe. Of course, they also mixed to some extent with these bronze-age people such as the Srubnaya.

Davidski said...


Install the latest Ubuntu (you can set up a dual boot with your Windows) and e-mail me. I can send you a dataset and get you running the latest qpAdm within an hour.


Instead people of the Steppe such as Ulan_IV and Sarmatian_Pokrovka owe much of their ancestry to the Subcontinent.

That's not true. It's a fantasy. Ulan_IV and Sarmatian_Pokrovka have zero ancestry from South Asia.

Eneolithic Harappan North Indians have significant Ancestral South Indian (ASI) admixture. I've seen their results. So what you're suggesting is impossible.

Nirjhar007 said...

Eneolithic Harappan North Indians have significant Ancestral South Indian (ASI) admixture. I've seen their results. So what you're suggesting is impossible.

Yes Balaji, he has seen it.

Davidski said...

Then why do he and Jaydeep keep talking about South Asian ancestry in European steppe populations?

Rami said...

Stop being a whiny bitch David, you're the one bleeding out of your ass.

There is no way North Indians have Steppe ancestry at levels comparable to NE Europeans. You must be smoking a crack pipe if you actually think that. The same modelling has Paniya at 40% Iran_N and 6% Steppe, when it's a known fact that they are 80-85% ASI. Clearly there is more to this. As far as Proto-Indo-Aryans go, they are descended from the same stock as those Andronovo Iranians, and their split occurred on the Northern Steppes, not in the Yamnaya region.

Can someone please post model results for PJL , Baloch, Sindhis

Nirjhar007 said...

I don't know, Dave, perhaps they don't have good sources.

Rami said...

Thanks for posting David :P, see my point? What do those Eneolithic Harappa samples look like?
Almost 40% steppe for Baloch and 35% steppe for Brahui. There is without a shadow of a doubt much more to this, and only ancient genomes from SC Asia will resolve it.

Rami said...

Those Onge scores for Sindhis are high same as those UP Brahmins but Sindhis are much more West Eurasian shifted than UP Brahmins. Their Onge scores should be slightly higher than Kalash.

Davidski said...


Those Onge scores for Sindhis are high same as those UP Brahmins but Sindhis are much more West Eurasian shifted than UP Brahmins. Their Onge scores should be slightly higher than Kalash.

Sindhis have a lot of ASI. You must be mixing them up with Balochis. The PJL have even more than Sindhis. I'll post results for them and a few more Indian pops in a few minutes.


a said...

Can you use any of these new tools to run a side by side comparison of R1a/b samples separated by 4/5 thousand years? Khvalynsk R1a/b [M459/L754]------Scythian R1a-2123 and Sarmatian R1b-2109 [CTS-1078], to see how they compare to each other in different time frames. They are relatively close in time and geography to each other.

Davidski said...

I've updated the post with a few more models. Check out Gond from South India. Paniya would have even less steppe.


On average R1a and R1b samples look the same when they're from the same populations, and on average different when they're from different populations. For example...

As I've said before, Y-chromosomes aren't linked in any way to genome-wide genetic structure. They only show correlations with it.

a said...

Did R1a/b evolve separately on the Steppe? Both Scythian and Sarmatian are similar in time frame, as are Khvalynsk R1a/b [similar elite burial style]. Scythian and Sarmatian >> R-Z2123 (Z2123; formed 4100 ybp, TMRCA 3800 ybp), R-Y21707 (A12360 * A12369 * A12364 +8 SNPs; formed 4700 ybp, TMRCA 2800 ybp; id:ERR1347675 RUS [RU-DA]) comparing >> both have AIM marker SLC45A2-rs16891982

Seinundzeit said...

Interestingly, I can replicate these models, using nMonte.

Personally, I think this demonstrates a robust link between these models and historical reality.

South Central Asian Iranics:

Tajik Shugnan

65.45% Sarmatian
28.55% Iran_Neolithic + 2.40% Iran_Hotu
3.60% ASI


Tajik Rushan

59.05% Sarmatian + 4.60% Scythian_Pazyryk
16.50% Iran_Chalcolithic + 14.25% Iran_Neolithic + 3.65% Iran_Hotu
1.95% ASI


Pashtun, Pakistan (Karlani)

48.5% Sarmatian
32.9% Iran_Neolithic + 6.3% Iran_Hotu + 5.8% Iran_Chalcolithic
6.5% ASI


Pashtun, Afghanistan (Ghilzai)

43.25% Sarmatian
38.70% Iran_Neolithic + 9.55% Iran_Hotu
8.50% ASI


So, nMonte and qpAdm agree.

Basically, the eastern Iranic peoples of Tajikistan, Afghanistan, and northwestern Pakistan can be construed as being substantially derived (genetically) from the historical Eastern Iranians of the ancient steppe.

Now, with the Kalash, I get this:

33.8% Iran_Neolithic + 17.5% Iran_Hotu
37.4% Yamnaya_Samara
11.3% ASI


Compared to Lithuanians:

42.95% Yamnaya_Samara
31.20% Loschbour
25.85% LBK_EN


So, again, there is a correspondence between qpAdm and nMonte. Both nMonte and qpAdm have Lithuanians and the Kalasha at around the same percentage amount of Yamnaya-related ancestry.

The exact estimates differ, because nMonte gives lower Yamnaya-related admixture to Europeans, in comparison to qpAdm.
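
For readers unfamiliar with this kind of distance-based fitting, here is a rough sketch of the general idea: find the weights of a mix of source populations whose combined coordinates sit closest to the target. This is not nMonte itself, which is an R script that uses Monte Carlo sampling; the greedy search, the function names and the toy two-dimensional coordinates below are all assumptions made for illustration.

```python
import math


def mixture_distance(target, sources, weights):
    """Euclidean distance between the target and the weighted mix of sources."""
    mix = [sum(w * src[i] for w, src in zip(weights, sources))
           for i in range(len(target))]
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))


def fit_mixture(target, sources, steps=100):
    """Greedy search over weights in units of 1/steps (1% by default)."""
    n = len(sources)
    units = [steps // n] * n
    units[0] += steps - sum(units)  # integer units always sum to `steps`
    best = mixture_distance(target, sources, [u / steps for u in units])
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(n):
                if i == j or units[i] == 0:
                    continue
                # Try moving one unit of ancestry from source i to source j.
                units[i] -= 1
                units[j] += 1
                d = mixture_distance(target, sources, [u / steps for u in units])
                if d < best - 1e-12:
                    best, improved = d, True
                else:
                    # No improvement, so undo the move.
                    units[i] += 1
                    units[j] -= 1
    return [u / steps for u in units], best


# Hypothetical 2-D coordinates; real runs use many more dimensions.
sources = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target = [0.3, 0.1]  # constructed as 60% / 30% / 10% of the three sources
weights, dist = fit_mixture(target, sources)
print(weights, dist)
```

A fit like this always returns some set of weights, so the final distance matters as much as the percentages: a large residual distance means none of the chosen sources describe the target well.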

Rami said...

Could you post results for the SA groups David posted?
Kalash have the highest steppe ancestry among South Asians, and if I recall the Lazaridis paper mentioned them having levels, but I do not see that with North Indians or populations east of the Indus, where populations are far more ASI shifted.

Eastern Iranics arrive 1200-1500 years later in Antiquity. Getting results from Dardic groups in Afghanistan would be helpful

Seinundzeit said...


I can arrange for that, just give me some time.

Regardless, I just wanted to make a few quick remarks.

Earlier, you stated that the Paniya are a Negrito-like people, noted that they are 80%-85% ASI, and claimed that they are modeled as 6% steppe with qpAdm or nMonte.

For starters, the Paniya are not a "Negrito" population. They have never been described as such, in anything I've read.

In terms of physical appearance, they don't resemble Andaman Islanders, or the "Negrito" populations of Southeast Asia.

Instead, they look rather similar to "scheduled caste" South Indians.

I mean, this documentary is about them, you can see for yourself:

Again, they look far more West Eurasian-influenced in terms of phenotype, when compared to Andaman Islanders or Southeast Asian "Negritos".

Regardless, with nMonte, they are pretty consistently 60% ASI + 40% West Eurasian. And, the West Eurasian element is always Iran_Neolithic-related, but with more ANE. I don't know why you think this is problematic?

Also, I've never seen the Paniya modeled as 6% steppe, so not sure where you got that number from?

Anyway, as a demonstration, here are the Paniya, using four ASI references:

With Onge

61.4% Onge
33.0% Iran_Neolithic + 5.6% MA1


With Jarawa

59.50% Jarawa
35.55% Iran_Neolithic + 4.95% MA1


With Austroasiatic_Bonda

66.35% Bonda
26.10% Iran_Neolithic + 7.55% MA1


With my ASI simulation

60.15% ASI
31.25% Iran_Neolithic + 8.15% MA1


Beautifully consistent. Always 60% ASI + 40% West Eurasian (and the West Eurasian ancestry is always Iran_Neolithic-related, but with more ANE), no matter what you use.
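
As a quick check on the arithmetic, the sketch below tabulates the ASI-reference share from the four Paniya models above and reports the mean and range. The numbers are copied from those models; the variable names are just illustrative.

```python
# ASI-reference share (percent) in each of the four Paniya models above.
paniya_asi_share = {
    "Onge": 61.4,
    "Jarawa": 59.50,
    "Austroasiatic_Bonda": 66.35,
    "ASI simulation": 60.15,
}

mean_share = sum(paniya_asi_share.values()) / len(paniya_asi_share)
lowest = min(paniya_asi_share.values())
highest = max(paniya_asi_share.values())

print(f"mean {mean_share:.2f}%, range {lowest:.2f}% to {highest:.2f}%")
```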

And yes, the Kalash aren't even technically South Asian.

They actually live in Central Asia. The Pakistani provinces of KPK and Balochistan are, objectively speaking, situated on the "Eurasian plate", while Punjab and Sindh lie on the Indo-Australian plate. So, only Punjabi and Sindhi populations are geographically "South Asian".

Most anthropologists tend to describe the Kalash and the other Dardic peoples of Afghanistan (not to mention Nuristanis) as "Central Asian isolates".

Basically, isolated representatives of ancient Central Asia, with a minimum of later Iranic and Turkic cultural influence. The only times I hear these populations construed as "South Asian" is when geneticists talk about them.

But, the Kalasha steppe element is much more strongly linked with Indians, even ASI-rich Indians, rather than neighboring Pashtuns. It's that shared "Aryan" connection.

Which shouldn't be surprising, as the Dardic languages are "Indian", in the linguistic sense of that term.

So, the fact that they prefer Yamnaya (just like IE Indians), rather than Sintashta/Andronovo/Srubnaya/Sarmatians/Scythians, tells us something about the Indo-Aryans.

Rami said...

Well, I don't want to get into the semantics of what should be South Asian and what should not be. I mean South Asian wrt this whole Yamnaya vs Sintashta thing. As the Indo-Aryan sphere is mainly located in Northern India and Pakistan, this does tie the Kalash to that, and the ethnogenesis follows a similar pattern, albeit with more ASI in the mix. Classically speaking, though, the populations west of the Indus are definitely in that SC Asian sphere. In contemporary Pakistan, groups do mix, and this will only increase with time.

Well, they still have a Negrito-like look if you look at many of Coon's plates. I guess till they find more archaic West Eurasian proxies from the region, they will have to use Iran_N. The 6% Steppe is from one of the models Matt used.

jv said...

Thank you DAVIDSKI,
It may be impossible, but I would like to find out just when my ancient H6a grandmothers arrived in the Steppe. Was she part of the Ukrainian Mesolithic in a bone hut (alongside R1a)? Did she migrate with the Elshanka Culture from Central Asia into Samara in 7000-6500 BCE? Did she come up through the Caucasus cultures? Was she part of the Repin, Khvalynsk, Samara or Seroglazovo Cultures?.....may never know but I sure love the research & recreating Yamnaya/Corded Ware/Srubnaya jewelry.

Chad Rohlfsen said...

I can take a stab at it. It can take quite some time finding the right outgroups. Barcin wouldn't be good to use as it isn't ancestral to the first farmers of Europe. A merge of the Koros and Starcevo samples we have should do fine. I'll report back anything interesting.

Seinundzeit said...


On my part, I just wanted to make a note of the fact that (objectively speaking) the Kalasha and other Dardic peoples in Afghanistan/northwestern Pakistan actually live in Central Asia.

They are geographically Central Asian, not geographically South Asian (but the Yamnaya-Sintashta difference does link them with Indians, rather than with neighboring Central Asians). The same goes for Pashtuns in both Afghanistan and northwestern Pakistan; they are, geographically speaking, Central Asians.

And, in the anthropological literature, Kalasha and other Dards are usually described as being isolated remnants of ancient Central Asian culture, barely influenced by the later Iranian influx, and with virtually no influence from the much later Turkic expansions.

Also, I should note that mixture in Pakistan isn't as pervasive as you seem to think.

For example, Pashtuns are viewed rather suspiciously by some Punjabis, who often tend to associate Pakistani Pashtuns with terrorism/violence/supposedly "primitive tribal custom", and also have stereotypes of Pashtuns as being "all brawn, but no brains".

For their part, some tribal Pakistani Pashtuns often associate Punjabis with traits like "effeminacy", "softness" (whatever the hell that means. Not even kidding, "soft" is the literal translation of a Pashto term I've heard used often, in regard to Punjabis), "arrogance", "decadence", etc.

It's all very stupid, and totally nonsensical. But, that's how things are IRL.

No doubt, there is mixture in cosmopolitan/urban settings. You'll find many people of mixed Pashtun-Punjabi heritage, just like how in urban Afghanistan you'll often find people of mixed Pashtun-Tajik heritage, or mixed Pashtun-Uzbek heritage.

But, there is also no doubt that rural Pashtuns, rural Balochis, rural Punjabis, rural Sindhi, etc, have a very strong preference for marrying people of their own ethnic background.

Basically, in the Pashtun tribal belt, and in the villages of Punjab and Sindh, inter-ethnic mixture is extremely rare.

And, with regard to the Paniya, all I'll say is that they physically just look like other South Indian populations.

In my modelling, South Indians tend to be 45% ASI, while the Paniya are around 60% ASI, so it isn't surprising that they resemble other South Indians when it comes to facial features (only a difference of 15% extra ASI).

Anyway, enough with this sort of discussion; here are the results you wanted.


46.80% Iran_Neolithic + 12.85% AG3
33.35% ASI
7.00% Yamnaya


42.80% Iran_Neolithic + 11.05% AG3
28.40% ASI
17.75% Yamnaya

With these, I do see some divergence between qpAdm and nMonte.

Although both qpAdm and nMonte agree in having the Kalasha at around the same levels of Steppe_EMBA as Northern/Eastern Europeans, qpAdm shows much higher steppe admixture for South Asians proper, when compared to nMonte.

Personally, I don't think we can truly know which output is more accurate. It's a matter of more aDNA.

The aDNA will be coming sooner, rather than later. So, I don't see why you guys get so heated over this topic, and immediately start throwing around the expletives.

Matt said...

@ Davidski, offer is definitely much appreciated. I've had a few problems in the past with dual booting Linux before. I might look into getting that fixed and get back to you on that in near future.

(For the tldr, essentially, had a good setup with dual booting Ubuntu just to run basic ADMIXTURE and D-stats for myself. This was a little before the aDNA autosomal revolutions of the last few years, just for curiosity. Then some system problems prompted system recovery and I've had some problems getting a dual boot of Linux booting since then, so being lazy gave up on it.)

@ Chad, sounds good if you have time. Lipson suggests Koros_EN and LBKT_EN might be good as they are described to have respectively 0.0 +/- 1.2% HG and 0.8 +/- 0.9% HG where Starcevo has 2.3 +/- 1.1%. Might help for LBK_EN which only has 4.2 +/- 0.6% in their estimates.

Rami said...

@sein What's becoming more and more apparent is that the local South Asian hunter-gatherer population was not just a monolith of ENA; your own tests show that. Based on what Lazaridis said, Paniya are 80-85% ASI, but clearly even that ASI has a good amount of a very archaic West Eurasian component. Contemporary South Indians do not look like Paniya. They may share a similar skin tone, but feature-wise they differ considerably, as Paniya facially are mainly a mix of Veddid, Paleo Mongolid and Negrito and lack the Indid/local Mediterranid element.

I agree on populations like Kalash having high steppe ancestry, ditto Pashtuns/Tajiks and to a lesser degree some groups originating in the Potohar.

Those models for PJL and the Brahmin make more sense. Fits for SC Asians also are much better as they have much less ENA ancestry.

Yes, classically Pashtuns view Punjabis or people in the plains as "Dal Khors"; ironically, because of poverty and war, many are eating it themselves. Urbanization and a common religion have allowed for much more mixing in cities. In villages, endogamous and tribal views remain, though I don't think it is as rigid as the caste system. Pashtuns do not have much issue with men marrying non-Pashtun women, but Pashtun women marrying non-Pashtuns is very rare, and prohibited, even in Afghanistan. Pashtun/Uzbek marriages are becoming more common in the North, and in cities Tajik/Pashtun marriages, especially among educated people, are fairly common.

Alberto said...

I seem to recall that the best fits for Yamnaya came when including WHG? With nMonte/PCA data it was always like that too: EHG + WHG/SHG + CHG. I wonder if including Latvia_HG in the last model (instead of Lengyel_LN) improves the fit as it does using Global 10.

Rob said...

@ Sein

Are you suggesting that sarmatians are actually the best fit for Iranians, or were you performing a hypothetical exercise ?

Rob said...

@ Alberto

Yes; with PCA the best fit for Yamnaya is simply

CHG, EHG & Latvia HG

Curiously, a paper by Shishlina has recently re-dated Khvalynsk to 4200-3700 BC.

The one CHG-shifted individual was haplogroup Q, and he even shows some Afontova Gora admix.

Olympus Mons said...

Ah, Ok. so 4000...plenty of time from 4900bc to admix with EHG people. so a great, great great child of a shulaveri. :=) - Makes sense. 5000bc not that much.

Rob said...

@ OM
Yes it fits chronologically; but the chances of L23 being from south of the Caucasus look small at present

Seinundzeit said...


I find that all Iranian peoples (but especially Eastern Iranians) have a preference for the Sarmatian samples, or show a mix of Sarmatian + Srubnaya/Andronovo + Yamnaya.

By way of contrast, Indo-Aryans only gravitate towards Yamnaya, even the ones that are very different from South Asians proper (for example, the Kalash). They never take any Sarmatian/Srubnaya/Andronovo, if Yamnaya is in the mix.

Although, if my memory serves me right, when one adds Poltavka, Indo-Aryans prefer those samples to Yamnaya. That could be significant, I guess.

Although, I also want to eventually examine the picture using the Srubnaya_outlier sample.

I mean, it's an interesting sample, and it seems that people like her had an important (genetic) role to play in the ethnogenesis of western Scythians (and the Sarmatians), at least in the tests I've tried.

The only thing that gives me pause (when it comes to adding her into the modelling) is the fact that we don't really have a solid handle as to what sort of population she represents.

At the moment, I think she might have been a mix between some ANE-related population (from anywhere between the Urals and Central Asia proper) and Steppe_MLBA (so, ANE + Sintashta/Andronovo/Srubnaya), but with some additional West Asian/Caucasus admixture.

We need to find more samples like her, in order to put her in proper context.

Rob said...

Yes, from my end the best match with PCA data is still Srubnaya outlier for both IA and Iranians, although much attenuated for the latter (with a peak of 15-18% in Zoroastrian). Sarmatians really don't seem to feature, even with Srubnaya Outlier left out ("regular" Srubnaya slots in).

Historically, I don't think Iranians deriving from Sarmatians sounds very sound.

Davidski said...

All Eastern Iranians, ancient and modern, seem to be very closely related. So even though Sarmatians aren't ancestral to Pathans and Tajiks, they're close in space and time to the ancestral group for all Eastern Iranians, and that's why they're a good reference pop for Pathans and Tajiks.

The Sarmatians don't produce very good models for Western Iranians, but that's to be expected. Western Iranians prefer Sintashta.

Rob said...

A good read , pp 47->

For the king said...

East Iranics like Pashtuns and Tajiks have much higher East Eurasian(Siberian-Mongol and possibly Saka related) ancestry than west Iranians, that's probably why west Iranics don't fit that well with east Eurasian admixed steppe groups.

Davidski said...

That's true, although keep in mind that since the Sarmatians produce good fits for modern Eastern Iranians, especially the least admixed, Pamir Tajiks, then not only is their West Eurasian ancestry a good match, but their East Eurasian ancestry is too.

So a fair whack of the East Eurasian admixture in Eastern Iranians is probably from their Eastern Iranian ancestors from the steppe.

Moreover, Western Iranians do have minor East Eurasian ancestry, but it's more East Asian shifted than what we see in the Sarmatians, probably because it's derived from post-Sarmatian Turkic population movements.

Rob said...


Which particular sarmatian is best quality to use in runs ?

Davidski said...

I0575 is higher coverage than I0574, with almost twice as many markers available. So it's probably better to run the former in G10/nMonte analyses, or use the average coordinates of both.
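
Averaging the coordinates of two samples is just a per-dimension mean. Here's a minimal sketch; the function name and the two-dimensional coordinates are hypothetical placeholders, not real values for I0575 or I0574.

```python
def average_coordinates(samples):
    """Average equal-length coordinate lists dimension by dimension."""
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    return [sum(values) / n for values in zip(*samples)]


# Hypothetical coordinates, for illustration only.
sample_a = [0.5, -0.25]
sample_b = [0.25, 0.75]
print(average_coordinates([sample_a, sample_b]))
```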

Seinundzeit said...


David pretty much hit the nail right on the head; I couldn't have said it better myself.

For the king,

Tajik and Pashtun East Eurasian admixture is usually much more genetically "northern", compared to the later Turkic admixture seen in West Asia (more Siberian/Native American-like, rather than like the Mongola/Altaians), so it probably represents steppe Iranian ancestry.

Rob said...

@ Sein

I'm happy Dave articulates it so well for you :)

But I thought it would be interesting to compare apples with apples (i.e. PCA nMonte). I here used the weighted approach also, and have included the (good) Sarmatian individual as a source. The results are also the same without the Sarmatian. The Sarmatians don't form a part of Iranian ancestry, east or west:

Iran_Chalcolithic:I1665 57.6 %
Srubnaya:I0431 26.9 %
Jordan_EBA:I1706 7.85 %
Iran_Neolithic:I1290 4.55 %
Paniya 2.75 %

Iran_Chalcolithic:I1665 62.25 %
Srubnaya:I0431 15.05 %
Jordan_EBA:I1706 9.05 %
Srubnaya_outlier:I0354 8.85 %
Paniya 3.1 %

Srubnaya_outlier:I0354 28 %
Armenia_EBA:I1635 23.3 %
Paniya 16.05 %
Iran_Neolithic:I1290 14 %
Iran_Chalcolithic:I1665 11.55 %
Srubnaya:I0431 5.4 %
Dai 1.7 %

Srubnaya_outlier:I0354 40.3 %
Iran_Chalcolithic:I1665 18.75 %
Armenia_EBA:I1635 16.8 %
Paniya 14.95 %
Kotias:KK1 6.55 %
Dai 1.75 %

Throwing in a couple of I-A:

Paniya 33.65 %
Iran_Neolithic:I1290 31.95 %
Srubnaya_outlier:I0354 21.15 %
Srubnaya:I0431 13.25 %
Yamnaya_Samara:I0231 0 %

Srubnaya_outlier:I0354 30.3 %
Iran_Neolithic:I1290 27.35 %
Paniya 18.65 %
Armenia_EBA:I1635 9.65 %
Kotias:KK1 9 %
Srubnaya:I0431 5.05 %

So East Iranians, like I-A, prefer Srubnaya Outlier, whilst Western Iranians prefer regular Srubnaya. Could this represent two migratory routes?

The Sarmatians:

Srubnaya:I0431 48 %
Srubnaya_outlier:I0354 14.85 %
Andronovo:RISE505 11 %
Kotias:KK1 8.35 %
Okunevo:RISE516 7.75 %
Barcin_Neolithic:I1099 4.2 %
Itelmen 3.25 %

Srubnaya:I0431 36.9 %
Yamnaya_Samara:I0231 30.45 %
Itelmen 8 %
Iran_Neolithic:I1290 6.9 %
Hungary_N:I1496 4.85 %
Srubnaya_outlier:I0354 4.7 %

So whilst it's clear that they share a large chunk of common steppe ancestry, the question is when the Sarmatians and other Iranians diverged from each other. Probably before the Sarmatian period, more like the MBA.

Rob said...


Armenia_EBA:I1635 54.55 %
Iran_Chalcolithic:I1665 30.1 %
Srubnaya_outlier:I0354 9.9 %
Hungary_N:I1496 5.35 %

Seinundzeit said...


Just to keep things in proper perspective: I always find myself rather stunned, when it comes to your unbridled brilliance, and intensive analytic depth.

Sometimes, it just gets plain overwhelming. I mean, how do you properly manage that high a level of archeological/historical/anthropological/genetic knowledge, which you so obviously have right in-between your fingertips?

With that much intellectual heft, all in the hands of a single man, one is only (quite naturally) forced to grow envious. ;-)

Anyway, I can't replicate your results.

So, here is a quick analysis of different Iranian peoples, with the addition of an isolated Central Asian Indo-Aryan population (the Kalash), all for your viewing pleasure.

I took some time to pick the right references, and I've begun to implement Huijbregts' suggestions. Please, do refer to his comments, at Anthrogenica.

Also, this should go without saying, but I’ll say it anyway: everyone was tested under the same conditions (same reference populations, same dimensions, etc). And yes, I used the higher quality Sarmatian sample.
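For readers unfamiliar with this kind of run, here is a rough Python sketch of the general idea: search for the mix of reference populations whose weighted-average coordinates sit closest to the target. This is not Huijbregts' actual nMonte script and it leaves out the weighting option. The function names, the slot-swapping search and the toy 2-dimensional coordinates are all assumptions made for illustration.

```python
import math
import random


def mix(sources, counts):
    """Coordinates of a mixture in which source k fills counts[k] slots."""
    total = sum(counts.values())
    dims = len(next(iter(sources.values())))
    return [
        sum(sources[name][d] * counts[name] for name in sources) / total
        for d in range(dims)
    ]


def fit_mixture(target, sources, slots=200, steps=20000, seed=0):
    """Hill-climb over slot allocations to minimise distance to the target."""
    rng = random.Random(seed)
    names = list(sources)
    # Start from a random allocation of slots to sources.
    counts = {name: 0 for name in names}
    for _ in range(slots):
        counts[rng.choice(names)] += 1
    best = math.dist(mix(sources, counts), target)
    for _ in range(steps):
        src, dst = rng.sample(names, 2)
        if counts[src] == 0:
            continue
        # Move one slot from src to dst and keep the move only if it helps.
        counts[src] -= 1
        counts[dst] += 1
        candidate = math.dist(mix(sources, counts), target)
        if candidate < best:
            best = candidate
        else:
            counts[src] += 1
            counts[dst] -= 1
    return {name: counts[name] / slots for name in names}, best


# Toy data: the target is built as 50% source_a, 30% source_b, 20% source_c.
sources = {
    "source_a": [0.0, 0.0],
    "source_b": [1.0, 0.0],
    "source_c": [0.0, 1.0],
}
target = [0.3, 0.2]

proportions, fit_distance = fit_mixture(target, sources)
print(proportions, fit_distance)
```

One thing a sketch like this makes obvious: the best achievable distance can't get worse when more references are added, so a small distance on its own doesn't show that a model is sensible. That's why using the same reference set for every target matters when comparing results.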

Please, see below.

Seinundzeit said...

Central Asian Iranians (all Eastern Iranian speakers):


37.5% Sarmatian + 11.5% Srubnaya_outlier

48.8% Iran_Chalcolithic
2.2% Altaian



20% Srubnaya_outlier + 14% Srubnaya + 13.3% Sarmatian + 6.65% Scythian_Pazyryk

41.5% Iran_Chalcolithic
2.9% ASI
1.55% Altaian

Distance=0.0303 (overfitting)


32% Sarmatian + 24.55% Srubnaya_outlier

35.7% Iran_Chalcolithic + 2.25% Iran_Neolithic
5.5% ASI

Distance=0.0767 (again, overfitting, but I was trying to replicate your results, via the use of multiple steppe references. So, I had no choice)


32% Srubnaya_outlier + 14.75% Sarmatian

38.95% Iran_Chalcolithic + 3.95% Iran_Neolithic
8.7% ASI
1.65% Altaian


Pashtun, Pakistan (Karlani tribal confederacy, speaks an archaic dialect of Pashto, one which has substrate influences from an older Eastern Iranian language. The older Eastern Iranian language is still spoken in the vicinity of his tribal territory)

24.7% Sarmatian + 14.2% Srubnaya_outlier + 4.35% Srubnaya

25.5% Iran_Chalcolithic + 23.4% Iran_Neolithic
7.85% ASI


Pashtun, Afghanistan (this individual is of the Durrani tribal confederacy)

22.5% Sarmatian + 13.65% Srubnaya_outlier

28.95% Iran_Neolithic + 26.4% Iran_Chalcolithic
5.9% ASI
2.6% Altaian


West Asian Iranians (all Western Iranian speakers)

Persians (no clue about the geographic origins. Anyone who knows should chime in)

76.1% Iran_Chalcolithic
15.7% Srubnaya
5.8% Altaian
2.4% Dinka



73.45% Iran_Chalcolithic
22.15% Srubnaya
2.9% Altaian
1.5% Dinka


The Aryan side of the picture (Kalash)

21.4% Srubnaya_outlier + 19.75% Poltavka

47.1% Iran_Neolithic
11.75% ASI


Seinundzeit said...


Lol Rob, jokes/sarcasm/ball busting aside, let's shift gears, and get serious for a moment (something which is exceedingly difficult for me. If you knew me IRL, you’d know that I got a tight lock on the world title for “sarcastic a-hole who doesn’t take anything too seriously”). This is the general picture I’m seeing.

Western Iranians have minor steppe admixture, and it is pretty much “Steppe_MLBA”-related, nothing else.

By contrast, Eastern Iranians have loads of steppe admixture. In most cases, they are largely steppe-derived, and they do have a preference for the Sarmatian/western Scythian samples, along with the Srubnaya-outlier.

This is quite interesting, because it seems that the Srubnaya_outlier already has some sort of relationship with the Sarmatians and western Scythians.

And, the closest modern populations we have to the ancient Indo-Aryans (the Kalasha of the Hindu Kush) are quite unique in this context. They have the same amount of “Steppe_EMBA” as Northern/Eastern Europeans, and it seems that their steppe ancestry is a combination of Yamnaya-like (Poltavka samples are carbon copies of Yamnaya_Samara, with the exception of one Steppe_MLBA-like outlier) and Srubnaya_outlier-like ancestral streams.

Basically, Eastern Iranians, Western Iranians, and the Indo-Aryans have different kinds of steppe ancestry.

And, the western Scythians/Sarmatians do show a heightened relationship to all Eastern Iranians, whether they be Pamiri speakers, or Pashto speakers, or speakers of the Ormuri/Parachi languages.

This should be of no surprise (at all), because Scythians and Sarmatians spoke languages very closely related to contemporary languages like Pashto, the Pamiri cluster, etc. This has been scholarly consensus for quite a while now.

Finally, with regard to this statement you made:

“The Sarmatians don't form a part of Iranian ancestry, east or west”

I do have to go back to what David said, because he really did hit the nail right on the head, and I can’t be as concise as he is:

“All Eastern Iranians, ancient and modern, seem to be very closely related. So even though Sarmatians aren't ancestral to Pathans and Tajiks, they're close in space and time to the ancestral group for all Eastern Iranians, and that's why they're a good reference pop for Pathans and Tajiks.

The Sarmatians don't produce very good models for Western Iranians, but that's to be expected. Western Iranians prefer Sintashta.”

In all seriousness, this sums things up pretty nicely.

Rob said...

Are you saying you were kidding about me being brilliant ? :(

"So even though Sarmatians aren't ancestral to Pathans and Tajiks, they're close in space and time to the ancestral group for all Eastern Iranians, "

I agree. It gels nicely, and I'll look up Huji's comments.

Did you know there is a 20 ky BP sample from Kyrgyzstan being analyzed?

Seinundzeit said...


Of course not. :)

Rather, the first part of my comment was couched in overly exaggerated terms, so that I could generate a vibe of facetiousness (lol).

But, that doesn't change the fact that you really are brilliant. Obviously, you must be a smart dude, or else you wouldn't be able to discuss/debate the things you discuss/debate.

And yeah, I've heard about this sample. Can't wait to see the results. Right now, I'm betting on it being ANE-related.