Wednesday, August 8, 2018

The South Asian cline that no longer exists

Before the Indo-Europeans and Austroasiatics got to South Asia, probably well within the last 4,000 years, it's likely that all of the genetic variation in the region basically sat along a genetic cline devoid of any Bronze Age steppe and Southeast Asian ancestry, like the one in the Principal Component Analysis (PCA) below running from the Paniya to the "Indus Periphery" ancient sample Shahr_I_Sokhta BA2.

Note that almost all of the South Asian populations, including the Iron Age (IA) Swat Valley groups, are clearly peeling away from this cline towards the Tajiks, in other words towards Central Asia. This is a reflection of the widespread presence of Sintashta-related steppe admixture among South Asians, especially those speaking Indo-European languages. Moreover, the Bengalis and Burushos are being pushed towards the top left of the plot as a result of East Asian-related ancestry. In the case of the former, this is largely due to gene flow from Austroasiatic groups.

It'll be interesting to see how ancient Harappans behave in this analysis. I'm betting that they'll be very similar to the Indus Periphery trio, although judging by the latest press report on the topic (see here), the Harappan samples from Rakhigarhi might be shifted much closer to the Paniya as a result of a higher ratio of indigenous South Asian ancestry.

The PCA is based on my Global25 test. If you're South Asian and in possession of Global25 coordinates, you can add yourself to this plot using the datasheet available here. Plug the datasheet into the PAST program (freely available here), select all of the columns, and go to Multivariate > Ordination > Principal Components (PCA).
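For anyone who'd rather script this than click through PAST, here's a minimal Python sketch of the same kind of analysis. It assumes a plain covariance-based PCA on the coordinate columns, which may not match PAST's settings exactly, and it runs on random placeholder numbers rather than the real datasheet. The function name `pca_scores` and the sample labels are made up for illustration.

```python
import numpy as np

def pca_scores(coords, n_components=2):
    """Project the rows of `coords` onto the top principal components."""
    centered = coords - coords.mean(axis=0)
    # SVD of the centered matrix; the rows of vt are the component axes.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]

# Placeholder stand-in for a Global25-style sheet: 6 samples x 25 columns.
# These are random numbers, NOT real coordinates.
rng = np.random.default_rng(0)
labels = [f"sample_{i}" for i in range(6)]
coords = rng.normal(size=(6, 25))

scores, explained = pca_scores(coords)
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label}\t{pc1:.4f}\t{pc2:.4f}")
```

To add yourself, you'd append your own row of 25 coordinates to the matrix before running the PCA, so that it's recomputed with you in it.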

Update 10/08/2018: I managed to almost reproduce my PCA with a graph based on D-stats of the form D(Mbuti,X)(Onge,Ganj_Dareh_N)/D(Mbuti,X)(Ganj_Dareh_N,Sintashta_MLBA). Admittedly, Gonur2_BA didn't want to cooperate by pushing slightly up and away from the ghost South Asian cline. But this may have been due to a lack of data or perhaps minor admixture (keep in mind that this sample is actually from Turkmenistan and not South Asia). However, combining all three of the Indus_Periphery individuals worked well enough. The relevant datasheet is available here.
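A rough sketch of how such a two-axis graph can be put together is below. It reads the slash in the formula as "plotted against", which is an assumption, and it uses the standard allele-frequency form of the D-statistic on random placeholder frequencies. Real runs would normally go through a tool like qpDstat with standard errors, so treat this only as an illustration of the arithmetic. The names `d_stat`, `pop_A` and `pop_B` are hypothetical.

```python
import numpy as np

def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (arrays in [0, 1])."""
    numerator = np.sum((w - x) * (y - z))
    denominator = np.sum((w + x - 2 * w * x) * (y + z - 2 * y * z))
    return numerator / denominator

# Random placeholder allele frequencies, NOT real genotype data.
rng = np.random.default_rng(1)
n_snps = 5000
names = ["Mbuti", "Onge", "Ganj_Dareh_N", "Sintashta_MLBA", "pop_A", "pop_B"]
freqs = {name: rng.uniform(0.05, 0.95, n_snps) for name in names}

# One point per test population X:
#   x axis: D(Mbuti,X)(Onge,Ganj_Dareh_N)
#   y axis: D(Mbuti,X)(Ganj_Dareh_N,Sintashta_MLBA)
points = {}
for pop in ["pop_A", "pop_B"]:
    x_axis = d_stat(freqs["Mbuti"], freqs[pop], freqs["Onge"], freqs["Ganj_Dareh_N"])
    y_axis = d_stat(freqs["Mbuti"], freqs[pop], freqs["Ganj_Dareh_N"], freqs["Sintashta_MLBA"])
    points[pop] = (x_axis, y_axis)
    print(f"{pop}\t{x_axis:.5f}\t{y_axis:.5f}")
```

The resulting pairs can then be fed to any scatter plot routine, with each population labelled at its (x, y) position.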

Davidski said...

How would we show the same thing with formal stats, say, using a linear model? Any ideas? It should be possible.

Unknown said...


I do find the position of Saidu Sharif interesting in this model.

That aside, I find it interesting that there doesn't seem to be much pre-IA substructure outside the Iran_Neolithic cline. Does that indicate a fairly homogenous, or mobile, AASI population?

Davidski said...

I think you might be referring to Saidu_Sharif_IA_o, which is the outlier with a lot of AASI ancestry.

And it's hard to say much about AASI for now. But it seems to have been a fairly homogeneous population, as far as I can see from looking at groups like Paniya with a lot of AASI ancestry.

Purple Yellow Floral said...

" In the case of the former, this is largely due to gene flow from Austroasiatic groups."

this is a good bet. but i'm not sure this is totally true. when i looked with treemix it seems that there is a lot of flow from northern groups into bengalis, and some of the mtDNA and Y also suggest this. basically a lot of it seems to be from *burmans*, who are a mix of older austroasiatic (who had mixed with the negrito-like substrate) and newcomers from the north who brought burman language.

and if it's austroasiatic, i think it's something like khasi, and not munda. the reason is i checked the Y and mtDNA, and the bengalis are balanced (at least 1000 genome south asian bengalis) for east eurasian lineages. the munda have almost no east eurasian mtDNA, and way way more Y.

Matt said...

@unknown, I'm kind of hopeful that some of what look like apparent huge "founder effects" present in some high AASI groups in South Asia will turn out to be long term, pre-2000BCE population structure in South Asia. It looks like there are a lot of groups with high local HG survival in South Asia relative to even the maximum in Europe today, so I am optimistic that the forces leading to very structured populations in South Asia today have preserved some of the old diversity.

But I think it's very unclear so far - if there was structure in South Asia, it looks like it was largely a clade relative to the rest of the world. So, I think, is most of the structure we see among European HG though (e.g. although groups like Ukraine_Meso and Iron_Gates_HG do have some ANE ancestry, there is also some Fst between Ukraine_Meso and WHG that can't be explained by ANE, and is hard / impossible to detect through outgroups).

Though looking at Fst from ancient samples from Europe, just as a tangent, it's hard to split apart recent NE and NW Europe using the wealth of ancient samples we do have, without using the Baltic_BA reference. Including Baltic_BA allows us to exploit NE Europeans' higher affinity to Baltic_BA relative to that expected based on (high) affinity to Yamnaya and WHG, while NW Europeans have higher affinity to WHG and Yamnaya than expected based on (still high) affinity to Baltic_BA.

Without the Baltic_BA it's hard to split using Fst scores from Euro HG, Yamnaya, Barcin, even including other Bronze Age European references. f3 is even more difficult, since there's more overlap there (with Brit Isles / Northern Germanic having higher f3 with Villabruna references than Ukraine / Slovaks do, unlike what is the case for Fst, and Barcin / EEF f3 also being more evenly distributed among Europeans.) So there is questionable contribution of pre-BA structure there (but like I say, the case may be stronger for some contribution of surviving structure in South Asia).

(Some random attempts to use Fst scores to split European / West Eurasian populations, and I find Baltic_BA is pretty essential to split NE Europe:

I think a lot of the attempts to work out Indian ancestry so far, although generally basically right (there *was* some expansion from the western steppe via Central Asia), show that attempting to work out ancestry with long distance aDNA is difficult (e.g. some of the early models, including the model from Lazaridis 2016, overshot with high levels of Yamnaya / Sintashta ancestry in South Asia).

Davidski said...


It's possible that the Harappan genomes will shift the estimates of steppe ancestry for at least some South Asian populations, although probably not in a huge way.

Mike the Jedi said...

Thank you for this post. This particular PCA is a very helpful visualization. You can imagine it as another "fateful triangle" with AASI-rich at the left point, Iran farmer-rich at the bottom right point, and steppe-rich at the top right point. Or better yet a quadrangle when you consider the unseen top left East Asian pole influencing some populations, which you of course noted.

I'm not expecting any surprises with the Harappans, but I am very curious about what the average Iran farmer-native HG proportion should ultimately turn out to be. Only one sample is disappointing but I guess it's better than nothing.

I know there are Mesolithic remains from Pakistan in a museum somewhere, and I really hope it is possible to test them one day. It's probably our best chance at getting a "pure" AASI genome.

Jaydeep said...

Regrettably the Rakhigarhi paper is going to be based on a single low coverage individual and it is not going to help shed light on anything.

I don't know why they are doing this. Obviously they will say that rest of the samples were contaminated but quite frankly I do not trust these people anymore.

Davidski said...


This Harappan sample isn't exactly crucial, certainly not to the big picture for Bronze Age Eurasia, which has been very clear for a while now.

All it's going to do is to add a little more precision to our models and expectations for the future.

You just have to accept reality now. The agenda that you were pushing had nothing to do with the truth, and you'll have to admit that sooner or later.

Jaydeep said...

Trust me, it is going to create more uncertainty. You are not going to get what you are looking for. The rest we can discuss when the paper comes out.

Cristy Ganesh said...

"you are not going to get what you are looking for"
Jaydeep ji, you have been telling us this for years, but every time Davidski seems to be winning. He is a born winner.

SGR Ram said...

I don't know why Davidski is so confident that Rakhigarhi would cluster with Paniya/Puliyar. I think Harappans (on the whole) would have significant Iranian ancestry with some 30-35% AASI. The fact that modern South Indians [middle caste] have significant Iranian ancestry [40-45%] proves it. This is my hypothesis. The results may show something different. Who knows? They may even show steppe ancestry.

Davidski said...

@SGR Ram

Here's what I said...

I'm betting that they'll be very similar to the Indus Periphery trio, although judging by the latest press report on the topic (see here), the Harappan samples from Rakhigarhi might be shifted much closer to the Paniya as a result of a higher ratio of indigenous South Asian ancestry.

Arza said...

Archaeologists found traces of submerged Stone Age settlement in Southeast Finland

A prehistoric settlement submerged under Lake Kuolimojarvi provides researchers with a clearer picture of the human occupation in South Karelia during the Mesolithic and Early Neolithic Stone Age (about 10,000—6,000 years ago) and opens up a new research path in Finnish archaeology.
This means that a huge and largely untapped archaeological resource is hidden in Finnish lakes. Moreover, extremely old organic materials may also have been preserved in these environments for thousands of years

Matt said...

@Davidski, yeah, I pretty much agree about the new paper.

Cool new datasheet btw, much appreciated the effort in expanding the sample set.

I had a go at removing the obviously East Eurasian influenced populations and adding in some ancient references, then PCA reprocessing:

Though this is just for curiosity's sake and not the sort of thing worth much discussion, it kind of does seem like some additional dimensionality exists in Global 25 that can begin to pick out some potential finer distinctions, where the Indus Periphery are pointing more specifically at Sarazm_Eneolithic than Ganj_Dareh, and some other populations (Kalash? Iron Age samples?) are pointing at Sarazm_Eneolithic+Steppe_MLBA, while Makrani+Balochi+Brahui are pointing to some degree more towards influence from Western Near East related samples.

Samuel Andrews said...

When using nMonte with G25, I get lots of Onge stuff for Austroasiatic Indians, more than what Southeast Asians get.

Aniasi said...

Can someone explain how Sarazm fits in? I thought they were just west Siberian hunter gatherers that were ANE?

Davidski said...


Sarazm_Eneolithic is a less Anatolian shifted version of Iran_N, but also with a decent chunk of Botai/West_Siberia_N ancestry. However, it doesn't appear to be relevant to most of South Asia, and won't be unless the Harappan sample(s) come back with a lot of this type of ancestry.

If they do, then that will probably impact negatively on the level of steppe ancestry in South Asian populations, because of that ANE-rich, EHG-like input in Sarazm_Eneolithic.

But this isn't likely to be the case, because Indus_Periphery doesn't show a close relationship to Sarazm_Eneolithic, except in the sense that it probably derives from the same Central Asian farmer population, because it lacks the significant Siberian-related input that Sarazm_Eneolithic has.

In any case, even Sarazm_Eneolithic can't explain the widespread presence of Sintashta-related/Steppe_MLBA ancestry in South Asia, as well as the high levels of R1a-Z93 there, so it's not a game changer no matter what.

Davidski said...


I managed to basically reproduce my PCA with a graph based on D-stats. Check out the update above. Woohoo!

Aniasi said...

Thanks! One more question.... I thought EHG was a mix of WHG and ANE?

Davidski said...


"I thought EHG was a mix of WHG and ANE?"

Something like that, but probably not a direct mix, just an intermediate population within what was a North Eurasian forager cline.

Garvan said...

I think Matt posted computed coordinates for “deeply diverged” South East Asian, but I have searched and can’t find these now. I would like to see where they sit relative to the South Asian cline above. Does anybody recall this post, or still have the coordinates? Or have I misremembered?

Arza said...

@ Garvan



Garvan said...

Thanks Arza.

This paper may also be of interest to some: “Prehistoric peopling in southeast Asia -- genomics of Jomon and other ancient skeletons”.

I have only read the authors' summary at

They have this to say:
“Group 1 contains Hoabinhians from Pha Faen, Laos, hunter-gatherers (~8000 years ago),
To our surprise, group 1 has higher genetic affinities with Ikawazu Jomon individual (Tahara, Aichi), a female adult, than other present-day Southeast Asians.”

Arza said...

So (Mbuti,X)(Onge,Ganj_Dareh_N) confirms that there is ~20-30% of Iran_N in Paniya as in my old G10-based model. Cool.

Paniya:PNYD3 - 20% Iran_N:


Quick test:

AASI_20_test 59.4%
Cambodian:HGDP00711 33.2%
Ganj_Dareh_N 7.4%

Distance 1.8027%


Paniya:PNYD3 66.5%
Cambodian:HGDP00711 33.5%

Distance 3.4594%
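For readers unfamiliar with this kind of output, the sketch below shows the general idea behind such a fit: find non-negative source weights that sum to 1 and minimise the Euclidean distance between the target and the weighted mix of sources. This is not nMonte's actual algorithm (a simple grid search is swapped in here for clarity), and the data are random placeholders, not real Global25 coordinates. The function name `fit_mixture` is hypothetical.

```python
import itertools
import numpy as np

def fit_mixture(target, sources, step=0.01):
    """Grid-search weights (non-negative, summing to 1) that minimise the
    Euclidean distance between `target` and the weighted mix of `sources`."""
    n = len(sources)
    ticks = round(1 / step)
    best_weights, best_dist = None, np.inf
    # Enumerate every weight vector on the simplex at the given resolution.
    for combo in itertools.product(range(ticks + 1), repeat=n - 1):
        if sum(combo) > ticks:
            continue
        weights = np.array(combo + (ticks - sum(combo),)) / ticks
        dist = np.linalg.norm(target - weights @ sources)
        if dist < best_dist:
            best_weights, best_dist = weights, dist
    return best_weights, best_dist

# Placeholder data: 3 made-up sources in 25 dimensions, and a target
# built as an exact 60/30/10 mix of them.
rng = np.random.default_rng(2)
sources = rng.normal(size=(3, 25))
true_weights = np.array([0.6, 0.3, 0.1])
target = true_weights @ sources

weights, dist = fit_mixture(target, sources)
for i, w in enumerate(weights):
    print(f"source_{i}\t{100 * w:.1f}%")
print(f"Distance\t{dist:.6f}")
```

A lower distance means the mix reproduces the target more closely, which is the sense in which the two fits above are being compared.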

Davidski said...


I added the "archaic" Indo-Aryan Kho_Singanali to the D-stats graphs. Their position is very similar to that in the PCA, and confirms their relatively high level of Sintashta-related ancestry.

Mike the Jedi said...

Is the Punjabi Jatt in your D-stat runs from the Global25? Most of the Lahore samples seem to have quite a bit of AASI ancestry and I'm not quite sure how representative they are of Punjabis as a whole, especially considering the results I've seen from Punjabi members on AG who score much lower AASI than the public samples. It would be interesting to know how this diversity among Punjabis is distributed. Is it purely geographic?

Davidski said...


I'll be putting these Punjabi Jats into the Global25 datasheets later today or tomorrow.

They are indeed very different from the Punjabi Lahore samples from the Human Origins dataset, but I don't know why. All I know is that the individuals from Lahore are rather unusual for Pakistanis in terms of their high levels of AASI ancestry.

SGR Ram said...

@arza by using (Mbuti,X)(Onge,Ganj_dareh) can you please give us Iran_N for Piramalai and Gujarat Brahmans?

Davidski said...


Brahmin_Uttar_Pradesh, Gupta and Punjabi_Jat are now in the Global25 datasheets.

Mike the Jedi said...

^ Thank you, Dave.