
Wednesday, July 8, 2015

Another look at the ancient mtDNA from Xiaohe, Tarim Basin

BMC Genetics has just published a new paper on the famous Tarim Basin mummies. It's a bit of a shame that it only deals with their mtDNA. Here's the abstract:

Background: The Tarim Basin in western China, known for its amazingly well-preserved mummies, has been for thousands of years an important crossroad between the eastern and western parts of Eurasia. Despite its key position in communications and migration, and highly diverse peoples, languages and cultures, its prehistory is poorly understood. To shed light on the origin of the populations of the Tarim Basin, we analysed mitochondrial DNA polymorphisms in human skeletal remains excavated from the Xiaohe cemetery, used by the local community between 4000 and 3500 years before present, and possibly representing some of the earliest settlers.

Results: Xiaohe people carried a wide variety of maternal lineages, including West Eurasian lineages H, K, U5, U7, U2e, T, R*, East Eurasian lineages B, C4, C5, D, G2a and Indian lineage M5.

Conclusion: Our results indicate that the people of the Tarim Basin had a diverse maternal ancestry, with origins in Europe, central/eastern Siberia and southern/western Asia. These findings, together with information on the cultural context of the Xiaohe cemetery, can be used to test contrasting hypotheses of route of settlement into the Tarim Basin.

Five years ago some of the same scientists published a paper on an older set of human remains from the same burial site, and found that all of the males belonged to Y-chromosome haplogroup R1a (see here). Last year one of them apparently left a comment under that paper saying this:

Our results show that Xiaohe settlers carried Hg R1a1 in paternal lineages, and Hgs H, K, C4, M* in maternal lineages. Though Hg R1a1a is found at highest frequency in both Europe and South Asia, Xiaohe R1a1a more likely originate from Europe because of it not belong to R1a1a-Z93 branch (our recently unpublished data) which mainly found in Asians.

So I'm pretty sure another paper is on the way. But hopefully the data will include much more than just broad Y-haplogroup classifications. A few full genomes from several layers of the Xiaohe cemetery would be really nice.


Chunxiang Li et al., Analysis of ancient human mitochondrial DNA from the Xiaohe cemetery: insights into prehistoric population movements in the Tarim Basin, China, BMC Genetics 2015, 16:78, doi:10.1186/s12863-015-0237-5

See also...

Lots of ancient Y-DNA from China

Bronze Age Tarim Basin Caucasoids belonged to Y-haplogroup R1a1a


Nirjhar007 said...

Exciting, David.

Anonymous said...

I think this is some evidence for my theory that 'Indian' populations used to range much further beyond the Himalayas. I personally find it ridiculous that they are calling U7 and U2 West Eurasian, since several deep-rooted clades are found in South Asia. The same thing applies to R.

Fanty said...

"find it ridiculous that they are calling U7 and U2 West Eurasian, since several deep-rooted clades are found in South Asia. The same thing applies to R."

Well, the term "Eurasia" is possibly problematic because it has two different meanings, and neither fits perfectly.

Eurasia meaning:

a) Region where Europe and Asia connect.

b) Europe is not a continent; it's a western peninsula of a continent called "Eurasia". This sometimes leads to the interpretation that the whole landmass of Europe and Asia combined can be called "Eurasia", but India as well as Arabia are not part of "Eurasia" and are separate continents.

But blabla....

I understand the "Western Eurasian" concept here as a synonym for Caucasoid (Europe + Middle East + India), as opposed to Eastern Eurasian (Northeast Asia + Southeast Asia).

Nirjhar007 said...

Goodness Gracious:-O

Alberto said...

Yes, it's interesting to see South Asian mtDNA there. Let's see if we get some autosomal data soon.

Anonymous said...


I generally think that West Eurasian excludes Indian components, unless they are specifically Central Asian or Middle Eastern elements (i.e. ANI & ANE).

Also, I think that West Eurasian is still a better term for Central Asian, Middle Eastern and European populations, which are more closely related genetically, than 'Caucasoid', which refers to a skull type and has nothing to do with ancestry (e.g. Kennewick Man or Kostenki).

Krefter said...


U2e is popular in Mesolithic Russia, Yamnaya, Andronovo, Corded Ware, etc. Most South Asian U2 falls under different subclades, but U2e does exist there. So, this U2e is probably from the same source.

Krefter said...

"Importantly, it contains the oldest and best-preserved mummies so far discovered in the Tarim Basin, possible those of the *earliest people to settle the region*."

What the heck? It doesn't take a genius to figure out there were people living there long before 3800BP.

Davidski said...

Are you saying there were settlements in the Tarim Basin before the ancestors of the earliest mummies arrived there from the steppe? I don't know of any. I don't think there were even any hunter-gatherers in the area before 4000 BP.

Anonymous said...


Didn't see that. Kept reading it as c for some reason. Nonetheless, while you are quite right about U2e, its deep divergence and the fact that most other clades are rooted in South Asia seem to point to U2 being of Subcontinental origin.

I personally think that U2e is a remnant of an Indian-like population that was displaced during the Mesolithic, especially since all of its sister clades are South Asian. It would explain the ASI-like component in Kostenki, and could explain the distribution of some Indian mtDNA clades.

Davidski said...

I think you're missing the point of what the authors are saying.

They're not saying that U2 and R have deep origins in Europe. What they're saying is that the U2e and R among the Tarim Basin mummies very likely comes from Europe.

The reason they're saying this is because the Bronze Age Tarim Basin population, especially in the early phases of the settlement there, looks like a typical population from Bronze Age Central and Eastern Europe, with lots of Y-DNA R1a and mtDNAs T2, U2e and R.

In other words, claiming that they're a native Asian population by invoking arguments about the deep phylogenies of these uniparental markers won't work, because we now have evidence that these Bronze Age Europeans were indeed European in terms of overall genetic structure, and so were their descendants in Central Asia belonging to the Andronovo and Afanasievo cultures.

And yes, I know we all come from Africa if we go back far enough. But that's also not relevant.

Anonymous said...


I understand that is their argument, but I am disagreeing with their certainty. Since M5 and M25 are both undoubtedly South Asian, U7 isn't found in other Bronze Age Steppe populations, and these are found in the oldest layers, I am a little bit annoyed at how quickly they assume that some of the other clades must have come from Europe. I also note that they said that the U2e found in Xiaohe is not found in any other population.

Of course, I found how they read the clades and markers very confusing, so I don't know how high the resolution is. I am very happy to be corrected on this matter.

Fanty said...

"'Caucasoid', which refers to a skull type and has nothing to do with ancestry "

Well, then make that "Caucasian" instead of Caucasoid. And it DOES refer to ancestry.
In America it's the official term for "WHITE", no matter what skull type he/she has.

Others use it as:

Europeans+Middle Easterners+North Africans+Indians = "Caucasian" populations.

Of course they are called that because 150 years ago people believed the white race originated in the Caucasus, and even if that's no longer considered true, the term is still used. ;)

andrew said...

I have never seen "Caucasian" used in reference to South Asian people, although it is generally used for Europeans, Middle Easterners, West Asians, North Africans and some Central Asians.

Fanty said...

Well, the Nazis included South Asians in the white race.
Though they ever used the "Caucasian" on ANYONE.

Fanty said...

Edit: never

Fanty said...

Nazi maps usually claim North India as territory of the White Race.

Davidski said...

And how are the Nazis or Neo-Nazis an authority on anything?

Nirjhar007 said...

Indians are Caucasians, but not all of them, as it's a diverse area.
//"Importantly, it contains the oldest and best-preserved mummies so far discovered in the Tarim Basin, possible those of the *earliest people to settle the region*."//
Probably related to Afanasevo?
If the R1a is European then it may have Z-283 or L-664....

Davidski said...

I think it'll turn out to be an extinct or extremely rare sister clade of M417 from Eneolithic Eastern Europe. Some people in places like Tibet and surrounds might still carry it.

Unknown said...

I'd agree, something like Z282 (xZ280, Z284) and xZ93.

Nirjhar007 said...

David, very interesting. BTW, unless we have Eneolithic aDNA from SC Asia etc., it's not scientific to classify M-417 as European.
Guys, for a general description, Wikipedia on "Caucasian race" has this to say:
"Caucasian race (also Caucasoid or Europid) has historically been used to describe the physical or biological type of some or all of the populations of Europe, North Africa, the Horn of Africa, Western Asia, Central Asia, and South Asia."

Krefter said...

Some of the C4s look like C4a1a. The C4 from Neolithic Ukraine and Yamnaya Ukraine was C4a2’3’4, so they share the C4a clade. Although I still doubt those results from Ukraine are legit. C4a may just be a really popular clade of C4 in Asia.

Krefter said...

Sure, the first time the Tarim basin or whatever was settled by humans was 4,000 years ago, but surrounding areas had been settled for over 40,000 years.

I was tired of bloggers and even professionals acting as if East Asia was originally inhabited by Europeans, by saying "The first people of the Tarim Basin were Caucasian", which to noobs means all of China or East Asia. A lot of BS has been said about the Tarim mummies.

Archaeologists have known for decades that people from Russia and South-Central Asia had immigrated to that area of Asia by 4,000 or even 5,000 years ago.

Krefter said...

I've added the new mtDNA data to my collection of ancient mtDNA. It's under "Central Asian Bronze age".

*It looks like in 2515-1829 BC people of Siberian/East Asian+European+South Asian+West Asian descent lived in the Tarim Basin.*

I'll copy and paste the Allentoft data tonight, and then add the new data from Neolithic France and Romania.

The U7 is absent from Ancient Europe, and so clearly points to West Asia. The M5 is obviously from South Asia, and the R(xHV, U, T, J, R1a) could very well be from South/Central or West Asia. There is no R(xHV, U, T, J, R1a) from Ancient Europe.

The M9a1a, M8a2'3, D, and C4(some C4a1a) are probably from East Asia or Siberia. The M9a1a specifically is probably from East Asia not Siberia.

The U5a, U5a1b1, and U2e are almost certainly from Europe. That said, 8,000-year-old U5a has been found in Siberia, and U2e contemporary with these samples has been found in Siberia (along with U5a and U4, in people with mostly East Asian mtDNA).

The U5a1b1, though, clearly points to Europe. U5a1b1 hasn't been found in Yamnaya/Catacomb, but it's been found in Corded Ware/Bell Beaker/Unetice. It also has a pretty strong presence in Northern Europe today.

All the U5a are probably U5a1; they're only labeled U5a because none of the U5a1 mutations were tested.

Unknown said...

I'm not sure I'd call U5a exclusively "European"; it probably existed all the way to Siberia from 16 to 5 ky BP.

Krefter said...


That's what I said. But U5a in the Tarim Basin is probably a recent arrival from Europe. No obviously European (xU) lineages were found, but U4+U2e+U5a are really popular in Andronovo/Sintashta.

It's most surprising to see West and South Asian lineages. So, the argument that R1a-M417 is from West Asia can still live, although the Tarim R1a died out. I guess some of the untested ones could have been Z93 or Z283.

Unknown said...

I'm not surprised at all that a mix of groups is found. But I don't think at all that M417 is from West Asia. I'd like to see finely resolved R1a groups. It will be an interesting and crucial piece of the puzzle.

John Thomas said...

As David said, I absolutely *hate* to cite Nazis, or neo-Nazis for that matter, as authorities for anything, but it is curious that prior to 1939 the Nazis did send anthropological missions to Tibet, of all places, in order to ascertain the earliest 'Aryan' roots.
I only say that because David mentioned ancient, apparently extinct R1a1a clades possibly surviving in Tibet.

Davidski said...

So Afanasievo and Andronovo were from Europe, as we now know without any shadow of a doubt, but the early Tarim Basin people were from West Asia.

But they all somehow ended up with similar archaeological cultures and mtDNA...?

Krefter said...


The Tarim Basin people were (probably) part West Asian, not pure West Asian. We're talking mtDNA, not Y-DNA.

The U7 doesn't have to come directly from "West Asia". The U7, M5, and unclassified R could all have passed through South-Central Asia at some point. U7 has a presence in South-Central Asia today.

Unknown said...

Yes I think the Tarim basin was a cosmopolitan area. People from far and wide would have settled there given its vital position en route to the exotica of the East, even if they were predominantly eastern Euros.

Kristiina said...

The oldest layer of Tarim Basin burials has almost exclusively haplogroup C4 + one H, K, and M*. This is not at all a European distribution of mtDNA. In this new paper which covers mtDNA from later layers (upper layers, fourth layer), haplogroups are much more mixed:
C4 x 17
C5 x 1
D x 4
M x 2
U5a x 2
U7 x 2
G2a x 1
U2e x 1
M5 x 1
R x 1
T x 1
H x 1
K x 1
B5 x 1
However, 24 out of 36 are East Asian/Northeast Asian haplogroups. M5, M and R, i.e. 4 out of 36, may be Central Asian/Indian. U7 could come from Iran. Only U5a and U2e, i.e. 3 out of 36, are clearly steppe/Siberian haplogroups. T, H and K could be European but also West Asian.

Krefter, when you say that "U4+U2e+U5a are really popular in Andronovo/Sintashta", I must say that they were very popular already in Mesolithic Siberia.

I would also emphasize that mtDNA is usually more local than yDNA. When men are moving they usually take a local wife.
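As a quick arithmetic check of Kristiina's tally above, the sketch below simply re-adds the counts from her list. The three groupings are taken from her comment, not from the paper, and the helper name `group_totals` is purely illustrative.

```python
# Haplogroup counts for the later Xiaohe layers, as listed in the comment above.
counts = {
    "C4": 17, "C5": 1, "D": 4, "M": 2, "U5a": 2, "U7": 2, "G2a": 1,
    "U2e": 1, "M5": 1, "R": 1, "T": 1, "H": 1, "K": 1, "B5": 1,
}

# Groupings follow the commenter's reasoning (an assumption, not the paper's classification).
groups = {
    "East/Northeast Asian": ["C4", "C5", "D", "G2a", "B5"],
    "possibly Central Asian/Indian": ["M5", "M", "R"],
    "steppe/Siberian": ["U5a", "U2e"],
}


def group_totals(counts, groups):
    """Return the summed sample count for each named group of haplogroups."""
    return {name: sum(counts[h] for h in members) for name, members in groups.items()}


total = sum(counts.values())
totals = group_totals(counts, groups)
for name, n in totals.items():
    # Prints e.g. "East/Northeast Asian: 24/36 (67%)"
    print(f"{name}: {n}/{total} ({n / total:.0%})")
```

The remaining 5 samples (U7, T, H, K) are the ones the comment leaves open between Europe and West Asia.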

Krefter said...

"Krefter, when you say that "U4+U2e+U5a are really popular in Andronovo/Sintashta", I must say that they were very popular already in Mesolithic Siberia."

The R1a1(xZ93) could also be of Mesolithic Siberian origin. We know there was an EHG-type presence mtDNA-wise, so there must have been one Y-DNA-wise too. Although I tend to think the R1a1(xZ93) and U5+U2e are of Bronze Age European origin.

Nirjhar007 said...

Possibilities and possibilities....

Karl_K said...

"The R1a1(xZ93) could also be of Mesolithic Siberian origin."

Definitely. That is extremely likely. Without any shadow of a doubt...

Nirjhar007 said...

We i think still need a bit more aDNA from some places before concluding whats ''likely'' or *not* unless you think we have enough or have all that matters...

Davidski said...

It's not possible because the Tarim Basin mummies show mtDNA and archeological links to Afanasievo, Andronovo and Early Bronze Age Eastern Europe.

Nirjhar007 said...

The mtDNA is diverse, and I don't think such exclusive links exist for such one-way "origins" in what we call reality.

Davidski said...

We now have archaeological and genetic trails for a couple of major expansions of R1a-M198 populations from Europe to Asia.

You think the archeology, R1a-M198 and western steppe mtDNA HGs like U4 and T2 somehow all lined up at multiple sites to create an illusion like this?

Get over it. This has already happened.

Nirjhar007 said...

I think what you are forgetting is that we don't have DNA from the areas that will decide this! And even for the areas that have been sampled, say Andronovo, the sampling isn't scientific because we don't have Eneolithic data! As for archaeology, please don't make arguments like those, which don't make sense in a proper and broader manner.

Karl_K said...


"We i think still need a bit more aDNA from some places before concluding whats ''likely'' or *not* unless you think we have enough or have all that matters..."

OK. So maybe you are right. Perhaps the R1a1(xZ93) came from Europe.

Thanks for clearing that up Nirjhar! I will start taking your comments much more seriously!

Nirjhar007 said...

Guys, IMHO what is shown in this paper can also be applied, for example, to the case of what David tries to force on Indian R1a, which is quite similar to the faulty notion of Copper/Bronze Age "Super Fathers"...

Davidski said...

That paper doesn't have any relevance here because it doesn't say that Y-hg founder effects can't happen.

Corded Ware, Andronovo, Sintashta, Tarim Basin are all almost fixed for R1a-M198 and share similar mtDNA lineages. They also share archaeological traits.

So they all come from the same place. It's not like a Siberian population with a 100% frequency of R1a-M198 learned to imitate Bronze Age Eastern European culture via the internet, and then decided to suddenly migrate to the Tarim Basin.

Use some common sense at least when coming up with arguments.

Davidski said...

Anyway, I'm pretty sure I have now nailed the ancestry of the Pathans with qpAdm. Here's the best fit using over 100K transversion SNPs and two of the better quality Andronovo genomes, both with low levels of East Eurasian ancestry. This is better than with Sintashta using transversion sites.

RISE_baAndrov 0.587
Georgian 0.323
Dai 0.090

chisq 0.263 tail prob 0.876876

Afanasievo is also a good fit, but not quite as good, possibly because Dai have to compensate for the non-ASI ENA admixture in Pathans.

RISE_baAfan 0.248
Georgian 0.636
Dai 0.116

chisq 0.523 tail prob 0.769891
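For readers new to qpAdm output, the two fits above can be restated as data and run through the basic sanity checks usually applied to such models: the source proportions should sum to 1, none should be negative, and the tail probability (the p-value of the model test) should not be too small. This is a minimal sketch; the 0.05 cutoff is a common convention assumed here, not something stated in the post, and `check_fit` is a hypothetical helper.

```python
from math import isclose

# The two qpAdm fits for Pathans as posted above.
fits = {
    "Andronovo": {
        "weights": {"RISE_baAndrov": 0.587, "Georgian": 0.323, "Dai": 0.090},
        "chisq": 0.263,
        "tail_prob": 0.876876,
    },
    "Afanasievo": {
        "weights": {"RISE_baAfan": 0.248, "Georgian": 0.636, "Dai": 0.116},
        "chisq": 0.523,
        "tail_prob": 0.769891,
    },
}

P_CUTOFF = 0.05  # assumed rejection threshold, not taken from the post


def check_fit(fit, cutoff=P_CUTOFF):
    """Return basic plausibility checks for one qpAdm-style fit."""
    weights = fit["weights"].values()
    return {
        # Mixture proportions should add up to 1 (allowing for rounding).
        "sums_to_one": isclose(sum(weights), 1.0, abs_tol=1e-3),
        # Negative proportions are usually treated as an infeasible model.
        "all_non_negative": all(w >= 0 for w in weights),
        # A tail probability at or above the cutoff means the model is not rejected.
        "not_rejected": fit["tail_prob"] >= cutoff,
    }


for name, fit in fits.items():
    print(name, check_fit(fit))

# Heuristic ranking: a higher tail probability means the data strain the model less.
ranked = sorted(fits, key=lambda k: fits[k]["tail_prob"], reverse=True)
print("ranked by tail prob:", ranked)
```

Both models pass these checks, so the choice between them rests on the tail probability and on outside evidence (archaeology, Y-DNA), which is what the discussion that follows is about.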

Nirjhar007 said...

You are seeing JUST what you want to see. YOU HAVE A VERY SELECTIVE OBSERVATION because of your default bias, so everything is clear and simple for you.
In reality we have many miles to cross.
'' doesn't say that Y-hg founder effects can't happen''
Of course it doesn't say that, but it also says or suggests that "Super Father"-like scenarios, the stuff you want to see in the case of Indian R1a, are very incorrect. And if we believe in archaeology, as you seem to when pressing the Steppe Hypothesis, it also becomes much clearer.
The fact that Corded Ware, Andronovo, Sintashta and Tarim Basin are all almost fixed for R1a-M198 is indicative of what I suggested: that Central Asia was mainly an R1a area from archaic times, with relations to NC Europe, nothing else. We just need some more ancient DNA to prove or disprove that, especially from sites like Afanasievo, Jeitun etc.

postneo said...

"It's not like a Siberian population with a 100% frequency of R1a-M198 learned to imitate Bronze Age Eastern European culture via the internet, and then decided to suddenly migrate to the Tarim Basin."

1) how many centuries separate xiaohe and CW?
2) What were the things copied. some specifics would help
3) Is Europe the western end of these supposedly shared traits?

Karl_K said...

As Nirjhar just hinted to:

There is an excellent scholar named German Dziebel who could clearly explain how all R1a clades are back migrations from the Americas. No 'super fathers' necessary.

Karl_K said...


"1) how many centuries separate xiaohe and CW?"
and was the internet even available at that time?

"2) What were the things copied. some specifics would help"
could you include which internet browser was most popular with those cultures?

"3) Is Europe the western end of these supposedly shared traits?"
or does it go much further west, like all the way to silicon valley?

postneo said...

So Xiaohe copied Eastern Europeans thousands of miles away, and only a few centuries apart, without the internet.

Alberto said...

I didn't want to discard those results for Pathans as 70% Sintashta as an oddity in the algorithm of qpAdm, mostly because of the other connections that we know (archaeological and the Y-DNA), but they did go against everything else that we know regarding admixture/genetics.

The result with Andronovo makes a bit more sense, but the one with Afanasievo a lot more sense (even if it's a bit worse, though still really good). It's like when we get f3 stats like:

baCw hunterW baArm -0.0127780596 -10.1662437033 243500
baCw neolC baYam -0.0057595766 -5.4321848684 365920

The first one is much better, but the second one makes more sense (historically, in this case).

Alberto said...

I have to agree with Nirjhar that the sampling bias in this case is too big to have a fair debate (or better, a fair view of reality).

These latest Sintashta/Andronovo samples have provided strong evidence for European migration to North-Central Asia (I just made up that term to refer to Kazakhstan). But there is still a world unknown south of that area. I'd wait to have some samples from BMAC, for example, to have a better idea of what was going on there at that time. Things might not be exactly as they look with the samples we have now.

Seinundzeit said...


This is pretty solid. Since it's based on high quality transversion SNPs, I guess these results are more relevant than the fits based on all the SNPs?

Matt said...

I'd still particularly doubt a model for Pathans (or South Central Asia generally) with those kinds of levels of Sintashta+Dai+Georgian.

You have that stat which everest59 from Anthrogenica astutely asked for, where

D(RISE_baSin Georgian Pathan Yoruba) -0.0042 -2.665
D(RISE_baSin Georgian Pathan Chimp) -0.0089 -3.743

So Georgian marginally closer to Pathan than Sintashta.

That shouldn't be the case if Pathans are a mix of a greater amount of Sintashta than Georgian (like 55% as much Georgian as Sintashta), and Dai-like admixture should not inflate similarity to Georgian. (If anything, Dai-like admixture should make Sintashta relatively closer, as ENA is supposed to share more drift with HG than the Near East does.)

Andronovo and Afanasievo seem more promising for admixture, as Pathans' direct f3 shared drift with Andronovo, Yamnaya and Afanasievo is higher than with Sintashta (which shares no more or less drift with Pathans than Bell Beaker does). D(RISE_baAfan Georgian Pathan Chimp) might actually come out positive. Remember, though, that the Andronovo samples come from around the same neighbourhood as Afanasievo, not the same area as the Sintashta sample.

Re: the two models in the comments here, the Afanasievo model seems more sensible to me in the K8, even if it's a slightly worse fit.

Also looking at the f3 stats from the paper, both models actually seem to work almost OK for relatedness to MA-1, but particularly the high Andronovo model seems to predict quite a bit more relatedness for Pathan to WHG than is correct (in the North Caucasus levels).

Weighing these other factors, I'd go for the Afanasievo based model for now, out of the two.

qpAdm is pretty cool; I'm just not totally convinced by its outcomes unless they're checked against both directed tests of relatedness to the putative populations and evidence, via fossils and aDNA, that approximately those populations actually existed in that area at that time. Still, I'm glad that you're testing rather than just waiting for more DNA (if we ever are able to get some).
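For anyone following the sign argument about the two D-stats above, here is the definition as used, to the best of my knowledge, by the ADMIXTOOLS qpDstat program (an assumption about which tool produced those numbers). For populations W, X, Y, Z with allele frequencies w_i, x_i, y_i, z_i at SNP i:

```latex
D(W, X; Y, Z) =
  \frac{\sum_i (w_i - x_i)(y_i - z_i)}
       {\sum_i (w_i + x_i - 2 w_i x_i)(y_i + z_i - 2 y_i z_i)}
```

The numerator is positive when W shares more alleles with Y (or X with Z), and negative when X shares more with Y. So with W = RISE_baSin, X = Georgian, Y = Pathan and Z = Yoruba or Chimp as the outgroup, a negative D means Georgians are closer to Pathans than the Sintashta sample is. Assuming the second number on each line is the Z-score, a common rule of thumb treats |Z| of about 3 or more as significant, so the Yoruba-outgroup result (-2.665) falls just short of that threshold and the Chimp-outgroup result (-3.743) passes it.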

Unknown said...

Can you use the Paniya on qpAdm?

Davidski said...

I can't use the Paniya because with them the marker overlap is too small.

The modeling based on over 100K transversion SNPs has to be taken very seriously, because of the high density/quality of the data. Those RISE samples produce very clean results and essentially behave like modern genomes when run with transversion sites.

We can still debate the levels of European ancestry among the Pathans, because qpAdm/f4-stats can only tell us what's possible, and not what actually is, but I have no doubt that Pathans have significant ancestry from the Eurasian steppe, and this is also the source of their R1a-Z93.

Krefter said...


Do you think it's possible before the arrival of R1a-Z93 South/Central Asians were mostly Basal Eurasian+ASI? Meaning their Near Eastern ancestry had hardly any WHG? So, when WHG-rich Sintashta is added their WHG level became as high as what West Asians today have?

Davidski said...

Yes, I'd say that's possible. Maybe some ANE as well.

Davidski said...

By the way, I'm working very hard here to find the best fits for a whole range of groups, also making sure that the results aren't biased by my choices of the right and left populations.

I'll post the final results early next week in part two of my Badasses of the Bronze Age series at my other blog. :p

Unknown said...

Okay. That's a bummer. At least 2 of them are the only Indians showing no Yamnaya ancestry. They would be better for ASI and Central Asian. Pathans typically get 40% Paniya, 30% Yamnaya, 30% mix of some Bedouin and EEF, in my runs.

Unknown said...


I think you might have misunderstood that paper, and my comment on Maju's blog. I was stating that I agree with the paper in that stochasticity might play a large part in the success of certain lineages, but this doesn't mean that such expansion didn't happen, or indeed, that there were certain manifest cultural reasons for it happening.

But I certainly agree with you that we need samples from India, Iran and BMAC territory, no matter what our current set of data shows, or how sophisticated our modelling appears to be. Nothing substitutes for actual evidence at the end of the day.

But I will reiterate that, at present, Z93 looks nested within an otherwise very European set of haplogroups. As an analogy, look at E-V13. It is exclusively European, but its close and more distant 'cousins' are all North African/SW Asian. So even if there were older, basal clades of R1a in Central Asia, and even India, prior to this, Z93 does appear at present to have an origin in EE. The fact that it's not found in Europe (apart from a few individuals of probably recent South Asian origin) doesn't change this. This could of course change if the entire R-M417 tree as it currently stands is proven wrong, if the dating we have been using is wholly off, and/or if new aDNA somehow alters our phylogeny.

Nirjhar007 said...

Mike, what is the word on the lack of R1a-Z93 samples from Europe again? That we should find them in cultures like Abashevo, right? Sintashta and Andronovo show very high European-like ancestry in ADMIXTURE, yes, but Z94+ is certainly exclusively Asian.
So the things that are vital are:
1. Samples carrying R1a-Z93 mutations have to be found in Eastern Europe, showing that those Z-94+ mutations came from European ancestors.
2. For the Steppe proposal, it has to be proven that R1a-M417+, R1a-Z645 and R1a-Z93+ mutations were absent in India and the Stans before 2000 BC; otherwise it's curtains.
3. We have to neglect the archaeological observations contradicting the practicality of a Late Bronze Age Aryan migration to SC Asia and India, while also undermining anthropological and, in some manner, ancient textual and cultural patterns, which of course for the people of genetics are not that **valuable**.
Yes, apparently the current observations are quite like you mentioned; probably that also has something to do with the amount of bias in sampling, prejudices etc., but the points I mentioned above are worth considering, unless I'm wrong.

Davidski said...

Before the Allentoft et al. paper came out I said here that there are Z93* lineages among modern Poles and Russians that don't look like anything in Asia (check the YFull R1a tree), and that there will be Z93 in Sintashta and Andronovo remains.

So everything is making sense so far.

Nirjhar007 said...

Clades like Z2124 and Z2123 are not Eastern European, and those Z93* lineages are, what's the word? Yes, I remember... bunk.

Davidski said...

The Polish and Russian Z93* are very real and very basal compared to anything in South Asia.

Davidski said...

And Z2124 is found in Bashkirs. Have a look on a map where they live.

Nirjhar007 said...

No, they aren't. BTW, it's interesting: where is R1a1a1b2a1?? I think pre-2000 BC samples from India and Pakistan etc. should have them.

Nirjhar007 said...

^ *have it*, and Z2124 is quite frequent among S Asians...

Seinundzeit said...


That sounds pretty exciting, can't wait to see the results.


I'd prefer the Andronovo model, for four reasons:

1) The archaeological connection simply isn't there, when it comes to Afanasievo and South Central Asia.

By contrast, most linguists and archaeologists consider Andronovo to be an early Indo-Iranian culture. Basically, Andronovo are the linguistic ancestors of modern South Central Asians, while Afanasievo aren't. That makes any fit with Andronovo more compelling (in fact, much more compelling).

2) Also, 50%-70% of Pashtun males are (without a shadow of a doubt) direct descendants of Sintashta/Andronovo (same kind of R1a1a. In fact, the same rather downstream clade of it). By contrast, Afanasievo might have been an R1b-dominated population (this is more speculative). Also, Pashtuns do display solid mtDNA links with Sintashta and Andronovo. Although I might be wrong about this, it is my understanding that the mtDNA data shows a considerably weaker link between Afanasievo and Pashtuns.

3) In addition, the model is better, which kinda speaks for itself.

I think the output produced via qpAdm takes precedence over f3 stats. As you know, f3 stats are somewhat tricky (in terms of how they behave). Anyway, qpAdm was designed to get past the issues we see with f3 and d-stats.

4) Finally, the model isn't unprecedented, it's been prefigured on a few occasions, but with similar-but-different modern/ancient populations, and with different methods. A year ago, Everest was able to model Pashtuns as 66% Lithuanian, using ALDER. At the time, we thought that this was quite strange, and obviously incorrect. But now, I think it ties in nicely with Pashtuns being anywhere from 50%-80% Sintashta/Andronovo-admixed (I know Everest disagrees with this, but we'll see).

On the ADMIXTURE front, Chad once had a supervised ADMIXTURE run with Corded Ware, Yamnaya, and quite a few other West Eurasian components. Interestingly, the Corded Ware component dominated South Asia, at the expense of Yamnaya, ANE, EEF, BedouinB, etc. The HGDP Sindhi population turned out to be around 60% Corded Ware. This also ties in rather nicely with qpAdm showing Pashtuns to be around 60% Sintashta/Andronovo.

With South Asian aDNA, I think we will find that Pashtuns are around 60% "steppe" (basically, 60% of something similar to Sintashta/Andronovo), 30% ancient South Central Asian agriculturalist (which would be a combination of Near Eastern ancestry, ANE ancestry, and ASI ancestry. I think there will be a genetic continuum between the IVC and BMAC, with the same broad ancestral components, but just different levels of those ancestral components. So, I wouldn't bother pegging any proportion, between the BMAC and IVC), and 10% recent West Asian.

Also, just a side note, but Andronovo and Sintashta constitute a "clade", they are pretty similar. Afanasievo and Yamnaya are very close to each other, but quite distinct from Andronovo and Sintashta, which are much closer to Corded Ware. So I wouldn't make much of any difference in the models, when transitioning from Sintashta to Andronovo.


We might hear some news about IVC aDNA, next month.

Nirjhar007 said...

Where did you get that news?

Nirjhar007 said...

BTW, the idea that // Andronovo are the linguistic ancestors of modern South Central Asians//
is utter contradictory nonsense, so please don't make statements like those...

Seinundzeit said...

I see, so that would mean that the IVC was Indo-Iranian?

I think that sounds more like nonsense.

Unknown said...

Well, I agree with Nirj on the archaeological front.
The "many archaeologists and linguists agree" stance ultimately rests on Kuzmina's reconstructions, which were wholly dismembered by Lamberg-Karlovsky. They're strained, constructed and wholly whimsical at times. So IMO the only evidence rests on genetics.

I think BMAC is the Indo-Iranian homeland, which also moved to occupy northern Iran and the Indus. Its culture is wholly original, and cannot be reduced to any steppe intrusions (here the "Andronovo culture", itself a clunky construct), nor to anything Mesopotamian or Harappan.

Only aDNA from here will be the proof we all need.

Nirjhar007 said...

Is this so hard to get? Those mutations are not ancestral to SC Asians but shared! And one mutation which is significantly present in S Asians, i.e. L-657, is so far not found at all! OTOH Z-2124 is also, interestingly, shared by Jews and Bashkirs, and the Bashkirs are of Scythian origin from Central Asia, if I'm not wrong.
//I see, so that would mean that the IVC was Indo-Iranian?//
In all likelihood! But this is not the place to discuss that; most likely, though many will disagree, the IVC/SSC had IE speakers...
BTW, you didn't say where or how you got the aDNA news.

Davidski said...

Sintashta people were fully European. Their Z2124 was native to where they lived.

Nirjhar007 said...

Z-2124 there is in all likelihood the result of an intrusion from Central Asia or thereabouts, unless we get Z-93 from earlier samples from the EE area.

John Thomas said...

If it is true, as claimed here, that modern day Poles, Russians, Ukrainians etc share a great deal of deep genetic ancestry with modern Afghans, Pakistanis and north Indians etc, wouldn't we expect modern east Euros to accumulate quite a few very distant matches with south Asians on 23andme, for example?

Seinundzeit said...

This is a very good angle to bring up. I can't speak to Eastern Europeans showing very distant matches, but I can look at my own 23andMe results.

If I exclude Afghanistan, Pakistan, and India, my top countries on 23andMe's "Countries of Ancestry" are Ukraine, Poland, and Hungary. If I relax the settings, I also get other European countries (not necessarily Eastern Europe though), like Russia, Belarus, Finland, Norway, and Sweden (there are others, like Bulgaria, Macedonia, Slovenia, etc).

Matt said...

Sein: By contrast, most linguists and archaeologists consider Andronovo to be an early Indo-Iranian culture. Basically, Andronovo are the linguistic ancestors of modern South Central Asians, while Afanasievo aren't.

The Andronovo and Afanasievo samples here are basically from the same area in Russia, AFAICT. I think there's maybe an open question about whether all the Andronovo were the same as this one, or others were more like Afanasievo and Yamnaya (less MN European related ancestry?).

Anyway, qpAdm was designed to get past the issues we see with f3 and d-stats.

I think they're pretty different functions. IMO qpAdm seems made for estimating proportions when you already have good formal evidence for the source populations via direct relatedness and samples. It's not really "getting around" problems, just using the f4 information to produce fits. But it seems to me there are potential pitfalls in doing this blind to any information other than relatedness to outgroups. Even with a rigorous choice of outgroups, you may have different populations that just aren't very differently related to those outgroups. I don't know if you can view one as a replacement for, or as superseding, the other. Certainly, I really don't think we can dismiss results like D(RISE_baSin Georgian Pathan Yoruba) -0.0042 -2.665 entirely just because of qpAdm.

Also, just a side note, but Andronovo and Sintashta constitute a "clade", they are pretty similar. Afanasievo and Yamnaya are very close to each other, but quite distinct from Andronovo and Sintashta, which are much closer to Corded Ware.

For the most part, Andronovo and Sintashta samples are similar. The most divergent outgroup D-stats from Allentoft (against other ancient samples) are:

D(Yoruba baKarasuk baAndrov baSin) = -0.014, Z = -5.9
D(Yoruba baYam baAndrov baSin) = -0.012, Z = -4.7

(also for comparison: D(Yoruba baAfan baAndrov baSin) = -0.009, Z = -3.3).

Not sure where the boundary for forming a clade should be seen as falling, in D-stats.
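For anyone less familiar with what these numbers measure, here's a minimal sketch of how a D(W, X; Y, Z) point estimate can be computed from per-SNP allele frequencies. It assumes the frequency-based formula of Patterson et al. (2012), which, as I understand it, is what ADMIXTOOLS' qpDstat implements; the Z-scores quoted in this thread come from a block-jackknife standard error, which this sketch leaves out. `d_statistic` is a hypothetical helper, not an ADMIXTOOLS function, and the toy frequencies are made up.

```python
def d_statistic(w, x, y, z):
    """Point estimate of D(W, X; Y, Z) from per-SNP allele frequencies.

    Each argument is a sequence of frequencies in [0, 1], aligned on the
    same SNPs. Negative D: X shares more with Y (or W with Z).
    Positive D: X shares more with Z (or W with Y).
    """
    numerator = 0.0
    denominator = 0.0
    for fw, fx, fy, fz in zip(w, x, y, z):
        numerator += (fw - fx) * (fy - fz)
        # Normalising term; per SNP it is >= |numerator term|, so |D| <= 1.
        denominator += (fw + fx - 2 * fw * fx) * (fy + fz - 2 * fy * fz)
    if denominator == 0.0:
        return 0.0  # no informative SNPs
    return numerator / denominator


# Toy frequencies (made up, not real data): X and Y are identical and Z
# resembles the outgroup W, so D(W, X; Y, Z) should come out negative.
outgroup = [0.1, 0.1, 0.1, 0.1]
pop_x = [0.9, 0.2, 0.8, 0.3]
pop_y = list(pop_x)
pop_z = [0.1, 0.1, 0.1, 0.1]
print(round(d_statistic(outgroup, pop_x, pop_y, pop_z), 3))
```

Read this way, D(Yoruba baKarasuk baAndrov baSin) = -0.014 says Karasuk shares more with Andronovo than with Sintashta; whether that is "enough" to break a clade is the judgment call above.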

From the selected significant statistics in the paper, their evidence that Bell Beaker is closer to Neolithic Central than Corded Ware is, for comparison:

D (Yoruba Neolithic Central Bell Beaker Corded Ware) = -0.009, Z= -3.4
or for Sintashta vs Bronze Age Hungary to Yamnaya

D (Yoruba Yamnaya Bronze Age Hungary Sintashta) = 0.011, Z= 4.8

which is quite similar in magnitude of D and Z.

(Comparably D (Yoruba neolC baCw baSin) = -0.003 Z= -1.15, D (Yoruba baYam baHu baCw) = 0.016, Z = 6.76)

Also, specifically on whether RISE Sintashta and RISE Andronovo are closer to Corded Ware than Yamnaya and Afanasievo are:

D (Yoruba baCw baYam baSin) = 0.0007140329 Z= 0.3053068916
D (Yoruba baCw baAfan baSin)= -0.007091106, Z=-2.5222587839
D (Yoruba baCw baYam baAndrov) = -0.00456324, Z=-1.94311511
D (Yoruba baCw baAfan baAndrov) = -0.0032209002, Z=-1.2478505608

These show lower D and Z than the stats above; the Yamnaya vs Sintashta comparison in particular is certainly not significant.
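"Not significant" here rests on the usual |Z| cutoff. A minimal sketch, assuming the common rule of thumb that |Z| ≥ 3 counts as significant (a convention, not something stated in this thread), sorting the D-stats quoted in this comment into the two bins; `is_significant` is a hypothetical helper.

```python
# (statistic, D, Z) as quoted earlier in this comment.
STATS = [
    ("D(Yoruba, baKarasuk; baAndrov, baSin)", -0.014, -5.9),
    ("D(Yoruba, baYam; baAndrov, baSin)", -0.012, -4.7),
    ("D(Yoruba, baAfan; baAndrov, baSin)", -0.009, -3.3),
    ("D(Yoruba, baCw; baYam, baSin)", 0.0007140329, 0.3053068916),
    ("D(Yoruba, baCw; baAfan, baSin)", -0.007091106, -2.5222587839),
    ("D(Yoruba, baCw; baYam, baAndrov)", -0.00456324, -1.94311511),
    ("D(Yoruba, baCw; baAfan, baAndrov)", -0.0032209002, -1.2478505608),
]

# Assumed rule of thumb: |Z| >= 3 is read as significant.
Z_THRESHOLD = 3.0


def is_significant(z, threshold=Z_THRESHOLD):
    """Return True if a D-statistic's Z-score clears the |Z| threshold."""
    return abs(z) >= threshold


for name, d, z in STATS:
    label = "significant" if is_significant(z) else "not significant"
    print(f"{name}: D = {d:+.4f}, Z = {z:+.2f} -> {label}")
```

Under that cutoff, the three Andronovo-vs-Sintashta stats clear the bar, while none of the four Corded Ware comparisons do.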

Davidski said...


Andronovo is archeologically an extension of Sintashta into Central Asia. These days they're treated as separate but closely related cultures, but in the past they were both called Andronovo.

Two of the higher quality Andronovo samples are very similar to the Sintashta samples and very European, apart from minor Siberian admixture, and it's these two that I'm using to model the Pathans and getting good fits.

Of the other two Andronovo samples, one is very similar to Afanasievo/Yamnaya, and the other has significant Siberian admixture. When I also use either of them to model the Pathans I get poorer fits.

By the way, as you probably know, the R1a carried by these Andronovo and Sintashta samples is Z93+. It's the same subclade that reaches ~70% among Pathans.

Seinundzeit said...


The only thing I can say about the Georgian-Sintashta d-stat issue: look at BA Armenians. Pashtuns are closer to Sintashta than to BA Armenians, according to the d-stat David ran. Assuming that the BA Armenians do resemble NE Caucasians, that should tell us that d-stats aren't so clear-cut (Pashtuns are closer to Lezgins than to Georgians in FST distances. In addition, Lezgins have ANE levels which almost match Pashtuns on the K8 ADMIXTURE model, and fairly substantial ASE levels in that same run, while Georgians are more distinct from Pashtuns in terms of both ANE and ASE on K8).

Anyway, you've even noted that Andronovo are probably closer to Pashtuns than Georgians are, when it comes to d-stats, so the problem has basically been resolved. And with Andronovo, we still have Pashtuns at 60% Bronze Age steppe-admixed (which is the same amount they get using Sintashta). On top of that, Andronovo are fairly similar to Sintashta (which makes sense, as they were descended from them).

Krefter said...


How do you know the high frequency of R1a-Z93 in Pashtuns isn't a founder effect? I want to start looking at their mtDNA. If they really are basically 50%+ Sintashta, we should see it in their mtDNA. It'd be exciting if some match steppe samples.

Davidski said...

It's a founder effect to some extent, but obviously someone had to turn up there with Z93 to start the founder effect.

Alberto said...


That idea of S-C Asians being very Basal Eurasian and, when mixing with WHG, becoming (or being "absorbed" by) ENF is interesting. It could explain why WHG does not show up in those populations even if they mixed with other populations high in it. I tried to find a similar explanation involving ANE, but failed. But your idea is more straightforward and could be the right one.

It would mean that S-C Asians were very rich in Basal Eurasian and ANE. And it could explain why modern Near Easterners "lost" their WHG ancestry with the arrival of ANE (if it came from S-C Asia at some point, carrying very Basal Eurasian admixture). And why NW Africans didn't lose it (Mozabites don't have much European ancestry, I'd guess, but still have some 10-15% WHG, clearly above modern Near Easterners, even though they have considerably more SSA admixture).

Unknown said...

Fascinating points, lads

Nirjhar007 said...

''the R1a carried by these Andronovo and Sintashta samples is Z93+''
Wrong, they carry Z-94+. And how much Euro_HG do the Pathans show? At best around 10%?

Davidski said...

The fact that it's Z94 and more derived doesn't change anything and doesn't help you.

All it means is that Z94 and Z2124 moved into South Asia from the steppe.

Nirjhar007 said...

Boring and Circular.

Alberto said...


Another problem with the qpAdm results for Pashtuns that we are ignoring is the fact that it's quite unlikely that the IVC people were Dai-like. So fitting Pashtuns as largely Sintashta + Dai might be as relevant as fitting South Europeans as Somali + Lithuanian. The fit might work, but it's unrealistic.

So if one wants to prove an Aryan invasion from the steppe, the base population should at least be Dravidian. Also while Georgians might help to give a good result, we don't really know what they stand for. Do they represent part of the local population (a mix of Georgian and Dai) or part of the steppe population (a mix of Sintashta and Georgian)? Trying to get good fits with, for example, Tamil as one side of the equation seems more realistic:

Punjabi baSin Tamil -0.0044927892 -9.5961796973 984919
Punjabi Tamil baAfan -0.0043542188 -8.5280938162 1066449
Punjabi Tamil baArm -0.0042693241 -10.4633613023 658310

All 3 seem to work, but I guess this is where qpAdm can help to determine the proportions of each and give a much better idea than f3 stats alone.

Seinundzeit said...


David has tried to use peninsular South Asian populations, in conjunction with Sintashta, and the models failed. That tells us something.

Interestingly, GujaratiD (the most "South Asian" subset of the Gujarati samples that David has) are best modeled as around 25%-30% Sintashta. The model was excellent (the chisq and tail probability were perfect). With transversion sites, I'm sure Andronovo will do better. Anyway, many Dravidian populations do display Z94.

Also, I doubt that qpAdm could model Southern Europeans as Somali + Lithuanian (the model would probably fail, or be a terrible fit, unlike the excellent fits produced for Pashtuns using Andronovo and Sintashta). Although it isn't too implausible a model a priori, since Southern Europeans do have minor African admixture (1%-5%, highest in Iberia) based on formal methods (there was a paper on this), and they do have higher "Basal Eurasian" admixture compared to Lithuanians.

Regardless, the non-steppe portion of Pashtun ancestry is represented by Georgians + Dai, not just Dai.

Finally, I think the discussion has grown somewhat stale, since the models are what they are, yet people keep bringing up the same points, which are either irrelevant or incorrect (usually both). So, to conclude, all I'll say is "let's wait for South Asian aDNA". I'm pretty confident that the qpAdm models are correct, as this is what this methodology is supposed to do, and it has worked great in previous cases. I'm willing to bet that South Asian aDNA will show Pashtuns to be around 60% Andronovo/Sintashta (not all of it must necessarily come from Andronovo, but all 60% must necessarily come from steppe populations that were very similar to Andronovo) + 30% IVC/BMAC (in my view, both cultures represent a continuum of sorts; I have my reasons for this) + 10% recent West Asian (to represent ongoing gene-flow from the Iranian plateau). But let's leave it at that for now, and see what IVC aDNA shows us.

Alberto said...


Yes, I agree that to know for sure we have to wait for aDNA. Till then, these are just theoretical models and debates. I'm sure we all know that.

But I'm interested in your model, and just to make sure if I get what you mean:

- IVC was a Caucasus-like population with some ASI? To me that sounds pretty much like the population that is 50% of Yamnaya which in turn is 73% of CW. Do you think that it all started in S-C Asia, from there to Afanasievo/Yamnaya and then to CW? And then a back migration to S-C Asia? (In this model, would Afanasievo be the source of R1a?) Or do you propose a different model?

- David has tried to use peninsular South Asian populations, in conjunction with Sintashta, and the models failed. That tells us something.

Tells us what? That S-C Asia was not Dravidian-like, but more Caucasus-like? (going back to the model proposed just above). Or something else? If the former, where do Dravidians come from? If the latter, what else does it tell us?

Seinundzeit said...


1) IVC was probably a combination of Near Eastern, ANE, and ASI ancestry. Although, I'm not sure where you get the notion that it all started in South Central Asia? Also, Dravidians are rather Caucasus-like, in the sense that their West Eurasian ancestry is closest to Caucasians, which needs to be noted.

2) It tells us that you can't get decent models with a South Asian population included, since most South Asians probably have IE ancestry. Moorjani et al. found very shallow admixture timings for all South Asian populations, which supports this. There has been pretty significant gene-flow from northern to southern India, across time, so southern Indian populations (that are a part of the caste system) probably have minor Andronovo-related admixture, via northern Indians.

Seinundzeit said...


Rereading what you wrote, IVC could basically be like the population that is 50% of what went into Yamnaya, but with the addition of substantial ASI ancestry.

Davidski said...

Alberto, some key points:

- Yamnaya doesn't show any ASI. Check out the K6 and K7 runs here, before the Kalash make their own cluster...

- Afanasievo people didn't migrate to Europe. They were European migrants in Asia.

- R1a didn't arrive in Europe with Afanasievo. It was already present among Mesolithic foragers in Eastern Europe.

- South Central Asians appear to be a complex mixture of West Asian Neolithic groups, Corded Ware-derived Bronze Age steppe pastoralists, and Dai-like native foragers.

Unknown said...

I'd heard it before, but where did we get the idea that Dravidians are (South) Caucasian-like? Is that evident in any of the stats in formally published appendices (e.g. Haak)?

Alberto said...


Rereading what you wrote, IVC could basically be like the population that is 50% of what went into Yamnaya, but with the addition of substantial ASI ancestry.

Yes, they must have been more like 45% South Asian, 50% ANE and 5% ENF to be a good match with Sintashta/Andronovo. We don't have such a population to test in qpAdm, but somehow 2/3 Georgian + 1/3 Dai do the trick.

Anyway if it's true we might get IVC DNA next month then it all will be much more clear.


Yamnaya doesn't show any ASI. Check out the K6 and K7 runs here, before the Kalash make their own cluster...

Hhmmm... I don't see any South Asian cluster in those runs, so how could they show any South Asian? From what else we've seen, I think they did have some small amount of South Asian, maybe 5-8%?

Afanasievo people didn't migrate to Europe. They were European migrants in Asia.

Well, assuming they did migrate from Yamnaya. But even in that case 50% of their DNA was Asian (Caucasus-like). That's the point I was making.

R1a didn't arrive in Europe with Afanasievo. It was already present among Mesolithic foragers in Eastern Europe.

Yes, it was in Mesolithic HGs, but we didn't find it in Yamnaya. So maybe Afanasievo was R1a, or else who knows where it came from to CW.

Davidski said...


I didn't say South Asian, I said ASI. In other words, Ancestral South Indian.

In the K6 and K7, ASI is the Southeast Asian ancestry present in South Asians. That's because the closest we have to pure ASI are Dai.

And I still don't understand why you're saying Afanasievo was the source of R1a in Corded Ware?

R1a is present in an EHG genome from Mesolithic Europe, and Corded Ware show significant EHG ancestry. So how does Afanasievo fit into this?

Alberto said...


Ah, ok, you meant the East_Eurasian cluster. Well, it's normal that Yamnaya didn't show any of it, it's basically East Asian (Han, She, Dai,... are highest in that component). Pathans do show a small amount, either due to ASI or to East Asian admixture, or both.

The Afanasievo is just speculative. CW is Yamnaya + MN. MN didn't have R1a, so it came with a Yamnaya-like population. But in Yamnaya we only found R1b (and we have samples from different times and places). Notice that founder effects take many generations to happen. Here we're talking about a supposedly mass migration/replacement that happened fast. A founder effect wouldn't work. And MN people taking Yamnaya wives wouldn't work either, because they were not R1a.

So there must have been some Yamnaya-like population rich in R1a. Maybe in the forest steppe north of Yamnaya (where EHGs were taking Yamnaya wives, lots of them, to become almost completely Yamnaya-like), or maybe it was Afanasievo moving northwest. Who knows.

Unknown said...

Or in Poland, the Baltic, the East Carpathians etc., where R1a already existed in an EHG-like population.

Alberto said...

Yes, but in that case we should be able to model CW as EHG + MN (plus only a small amount of Yamnaya). Which might be possible. I personally think that CW might not be 73% Yamnaya, but that's the "official" model for now.

Unknown said...

Maybe I'm being stubborn, but I have difficulty accepting that model. Apart from the obvious issue that Yamnaya R1b separated from CWC R1a 16k years ago, it doesn't consider that the "Teal" component entered CWC territory directly via the Dniester-Bug-Visla highway, and not from a peripheral Samara.

Davidski said...


The East Eurasian clusters in the K6 and K7 also represent native South Asian ancestry, which was Dai/Onge-like and part of the ENA node in the Eurasian phylogenetic tree.

Some Yamnaya do show small amounts of the East Eurasian at K6, but when this cluster is broken up into Siberian and a more specific East Eurasian at K7, Yamnaya only show the Siberian, which is obviously a signal of ANE from Siberia.

On the other hand, all South Asians show significant levels of the K7 East Eurasian, especially the GujaratiD and Punjabis from Lahore, who are the most South Asian populations in this run.

In other words, if you want to track South Asian ancestry (as opposed to noise created by admixed South Asian clusters), it's best to focus on ASI and components that represent it.

As for Corded Ware, the best model is Yamnaya + extra EHG + Middle Neolithic admixture. Hence the conclusion in Haak et al. that the Eastern European ancestors of the Corded Ware were basically Yamnaya with extra EHG, and thus well over 50% EHG. So the high frequency of R1a in the Corded Ware makes sense.


There's no evidence that EHG was present in Poland. Something similar to SHG maybe was, but I doubt it, and even if it was, then I doubt it carried R1a.

Unknown said...

OK Dave, I take your point, but the corollary of your statement is that R1a must have arrived even in western Russia relatively recently, as in the late Mesolithic, just prior to the Karelia sample.

Unknown said...

*as in*

Davidski said...

I have no idea when EHG and R1a (or maybe rather R1) arrived in western Russia.

It might have been just after the LGM from the Altai-Sayan refugium, or during the Mesolithic from western Siberia.

Krefter said...

We need ancient DNA from Iran to Bangladesh. Ancient DNA from Europe revealed a lot that modern DNA couldn't, especially about percentages of ancestry.

Nirjhar007 said...

I will rely mostly on Ancient SNP's :).

Alberto said...

David, yes, I tend to use South Asian and ASI without much care of their specific difference. Now I see that Seinundzeit also meant more specifically ASI, which does seem to be completely absent in Yamnaya.

As for CW, the origin of R1a there is still not clear. Yamnaya, as far as we know (and we have more than anecdotal data already), was R1b (either because the HGs from that area were R1b and took Caucasus wives, or because it came with the Armenian-like population). So I still find it difficult to justify a direct mass migration and population replacement from the Yamnaya people into the CW area. There had to be a Yamnaya-like population that was mostly R1a. Somewhere. Or alternatively, CW formed without Yamnaya-like population, directly from EHG and MN, though I'm not sure if this model works.

Matt said...

Alberto: Or alternatively, CW formed without Yamnaya-like population, directly from EHG and MN, though I'm not sure if this model works.

The simplest models of Corded Ware are as either 66% MN_Euro, 33% EHG or 20% MN_Euro, 80% Yamnaya. Split the difference and that's 43% MN_Euro, 17% EHG, 40% Yamnaya.
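The "split the difference" step is just a component-wise average of the two fits. A minimal sketch of that arithmetic (`average_models` is a hypothetical helper); averaging two acceptable qpAdm solutions here is only arithmetic, not a claim that the blended model would itself pass qpAdm.

```python
# The two simplest Corded Ware fits quoted above, as fractions.
# (66% + 33% is the rounded ~2/3 + ~1/3 from Haak et al.)
EHG_MN_FIT = {"MN_Euro": 0.66, "EHG": 0.33, "Yamnaya": 0.0}
YAMNAYA_MN_FIT = {"MN_Euro": 0.20, "EHG": 0.0, "Yamnaya": 0.80}


def average_models(*models):
    """Component-wise mean of several mixture models (missing sources count as 0)."""
    sources = sorted({src for model in models for src in model})
    return {
        src: sum(model.get(src, 0.0) for model in models) / len(models)
        for src in sources
    }


blend = average_models(EHG_MN_FIT, YAMNAYA_MN_FIT)
for source, share in blend.items():
    print(f"{source}: {share:.1%}")
```

This gives 43% MN_Euro, 16.5% EHG (the "17%" above) and 40% Yamnaya.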

p.96 Haak et al "For none of the LN/BA populations is ancestry from an EN/MN farmer group the best N=1 model, and in all populations a marked improvement is observed between N=1 and N=2, but not between N=2 and N=3. The N=2 models shown in panel (b) of Figs. S9.7-10 always show the best fit when a “western” and “eastern” population is paired. For the Corded_Ware_LN, the best models involve European farmers and Yamnaya in proportions of approximately ~1/5 and ~4/5 or European farmers and Karelia_HG in proportions of ~2/3 and ~1/3."

This is the model using only the World Foci 15 outgroups, without including any ancients as outgroups, and it can't really distinguish well between the two above. So for those outgroups, the EHG+MN model works just as well as Yamnaya+MN.

They lean towards a Yamnaya one, as qpAdm when including 16 Ancients (mainly Neolithic European farmers, not including Yamnaya or EHG themselves) as "right" populations, finds that fits better, or an N=3 of Esperstedt_MN 29.1, Samara 9.4, Yamnaya 61.5. See p117. I don't know if there is any question there though around adding these ancients in as "right" populations, or about adding in this many European ancients without corresponding Steppe / Near East ancients. It seems a little strange to have the closely related Spain_MN in the "right" and Germany_MN in the "left" without having any similar situation for the EHG or Yamnaya who are also in the "left", but then we are much more limited in what we have for steppe / Russia from that time.


Mike: I'd heard it before, but where did we get the idea that Dravidians are (South) Caucasian-like? Is that evident in any of the stats in formally published appendices (e.g. Haak)?

There's the ADMIXTURE clustering which suggests this and generally D-stats involving D(West Eurasian,Georgian;South_Asian,Outgroup) indicate that Georgians are closer. I don't have any to hand though.

It can also be tested via D(Outgroup,Georgian;South_Asian,Other) stats, although such stats will be affected by the fact that admixture from groups phylogenetically neutral to 1 and 2 will push the stat towards zero (e.g. ENA admixture is closer to Georgians than a Yoruba outgroup is, but it's still more neutral than, say, Sardinian admixture).

If you wanted to test the similarity to Georgian vs a "Pop B" in South Asians compared to other populations, net of an outgroup related population affecting similarity, I think maybe the "best" way to do it would be to do a ratio of D(Outgroup,Georgian;South_Asian,Other):D(Outgroup,PopB;South_Asian,Other). IDK for sure though, not sure anyone has ever done this.

Balaji said...


I have compared your qpAdm modeling of Pathans with your Steppe_K9 modeling.

In Steppe_K9, Pathan = 0.03 SE_Asian + 0.02 Siberian + 0.54 South_Central_Asian + 0.2 Steppe + 0.18 Middle_Eastern + 0.03 MN_European

According to your qpAdm modeling, Pathan = 0.587 RISE_baAndrov + 0.323 Georgian + 0.09 Dai. Substituting the appropriate components for RISE_baAndrov, Georgian and Dai, I get

Pathan = 0.09 SE_Asian + 0.03 Siberian + 0.03 South_Central_Asian + 0.54 Steppe + 0.21 Middle_Eastern + 0.1 MN_European

This is quite inconsistent with the direct Steppe_K9 analysis of Pathan. Therefore your qpAdm analysis or your Steppe_K9 analysis or both are incorrect. That is unless you have a way to transmute the Steppe_K9 components from one to another.
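To put a number on the mismatch Balaji describes, here's a minimal sketch comparing the two Steppe_K9 profiles listed in this comment: the direct run and the one predicted by substitution. Using total variation distance (half the summed absolute gaps) is my own choice of measure, and the helper names are hypothetical. Per Davidski's reply, the South_Central_Asian component is itself a mixture of Near Eastern, steppe and native South Asian elements, so a large gap on its own doesn't show which analysis is wrong.

```python
# Steppe_K9 proportions for Pathans from the direct run, as listed above.
DIRECT_K9 = {
    "SE_Asian": 0.03, "Siberian": 0.02, "South_Central_Asian": 0.54,
    "Steppe": 0.20, "Middle_Eastern": 0.18, "MN_European": 0.03,
}
# Profile predicted by substituting components into
# 0.587 RISE_baAndrov + 0.323 Georgian + 0.09 Dai, as listed above.
PREDICTED_K9 = {
    "SE_Asian": 0.09, "Siberian": 0.03, "South_Central_Asian": 0.03,
    "Steppe": 0.54, "Middle_Eastern": 0.21, "MN_European": 0.10,
}


def component_gaps(observed, predicted):
    """Predicted minus observed share for each ADMIXTURE component."""
    return {k: predicted[k] - observed[k] for k in observed}


def total_variation(observed, predicted):
    """Half the summed absolute gaps: 0 = identical profiles, 1 = disjoint."""
    return 0.5 * sum(abs(g) for g in component_gaps(observed, predicted).values())


gaps = component_gaps(DIRECT_K9, PREDICTED_K9)
worst = max(gaps, key=lambda k: abs(gaps[k]))
print(f"largest gap: {worst} ({gaps[worst]:+.2f})")
print(f"total variation distance: {total_variation(DIRECT_K9, PREDICTED_K9):.2f}")
```

Almost all of the distance comes from South_Central_Asian shrinking while Steppe grows, which is where Davidski's objection applies.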

Davidski said...

The South Central Asian component is a mixture of Near Eastern, steppe and native South Asian elements.

Matt said...

Balaji, yeah, a population that is more or less equivalent to some combination of other populations could get different ADMIX components.

I don't think you're totally barking up the wrong tree though. One way that you could look at this might be to look at the FST from the components that would be predicted based on the Andronovo+Georgian+Dai model vs the actual Steppe K9, as combinations.

So like -

Similarity to outgroups, net of drift, is about the same either way. And outgroups are what qpAdm uses, so no real surprise there.

But it does seem like the Andronovo+Georgian+Dai model would predict that Pathan should be further from the South_Central_Asian component, and closer to the Steppe and MN_European components, than the SteppeK9 predicts (unless this is some weird property of FSTs at work, always possible). Different relatedness of a test population to West Eurasian populations is used by the ADMIXTURE run and not by qpAdm with only non-West Eurasian outgroups, so, again, it's perhaps no surprise that that's where the results diverge.

Davidski said...

Just for fun, here's a run with all of the UP samples.

Balaji said...


Thanks for your analysis. I agree with it, and it illustrates again that qpAdm does not give unique and always correct solutions. Haak et al. were able to model Corded Ware as 35% Karelia_HG + 65% Esperstedt_MN or as 72% Yamnaya + 28% Esperstedt_MN. These are two quite different things. It was only by adding European_EN samples as outgroups that they could determine that the Yamnaya model was better.

Modeling Pathan as 0.587 RISE_baAndrov + 0.323 Georgian + 0.09 Dai is likely to be wrong.

Regarding the ancestry of Pathans, I think we should go back to the latest paper from Reich labs on this topic.

Table S2 and S3(a) and S3(b) of the Supplemental Data are of particular interest. These show that Indian populations including Pathans are most similar to Caucasian populations. Most are most similar to Georgians. Therefore any model which shows Pathans as more similar to a European-like population such as RISE_baAndrov cannot be correct.

Also noteworthy is that Indians are not really close to Iranians (see Table S3(b)). People talk of Indo-Iranians based on linguistic grounds. But genetically, the Caucasians seem closer.

Reich lab modeled Pathans as 70% ANI and 30% ASI.

Seinundzeit said...


This isn't correct. Chad ran a few stats with the Onge included, at Anthrogenica.

Source 1 Source 2 Target f_3 std. err Z SNPs
Onge Corded_Ware_LN Pathan -0.008277 0.000954 -8.677 143113
Onge Georgian Pathan -0.008108 0.000568 -14.267 144570
Onge Yamnaya Pathan -0.006741 0.000765 -8.810 142987
Onge Iranian Pathan -0.004181 0.000596 -7.009 144679

As you can see, Corded Ware + Onge provide the best signal of admixture (looking at the f3 score), better than Georgian + Onge. Naturally, the Z score is lower for the Corded Ware stat, since it involves ancient samples.
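A minimal sketch of reading the table above: rank the rows by f3 (more negative read as a stronger admixture signal, as in this comment) and recompute each Z as f3 divided by its standard error. I'm assuming qp3Pop's Z is that ratio with a block-jackknife SE; recomputing from the rounded printed values matches the reported Z to about two decimals. `approx_z` is a hypothetical helper.

```python
# Chad's rows from the table above:
# (source1, source2, target, f3, std_err, reported_Z, SNPs)
ROWS = [
    ("Onge", "Corded_Ware_LN", "Pathan", -0.008277, 0.000954, -8.677, 143113),
    ("Onge", "Georgian", "Pathan", -0.008108, 0.000568, -14.267, 144570),
    ("Onge", "Yamnaya", "Pathan", -0.006741, 0.000765, -8.810, 142987),
    ("Onge", "Iranian", "Pathan", -0.004181, 0.000596, -7.009, 144679),
]


def approx_z(f3, std_err):
    """Z-score as f3 divided by its standard error."""
    return f3 / std_err


# Most negative f3 first, i.e. strongest signal in this comment's reading.
ranked = sorted(ROWS, key=lambda row: row[3])
for src1, src2, target, f3, se, z_reported, _snps in ranked:
    print(f"f3({target}; {src1}, {src2}) = {f3:+.6f}, "
          f"Z ~ {approx_z(f3, se):+.2f} (reported {z_reported:+.3f})")
```

This also shows why Georgian + Onge has the largest |Z| despite a slightly less negative f3: its standard error is much smaller (0.000568 vs 0.000954).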

Now, look at this qpAdm model for Pashtuns, using the RISE Corded Ware samples (thanks go to David):

57.1% Corded Ware + 30.5% Georgian + 12.4% Dai


tail probability=0.438546

This model is much worse than what Pashtuns get with Sintashta and Andronovo, yet Corded Ware + Onge still performs slightly better than Georgian + Onge on f3 stats. I'm sure Sintashta + Onge or Andronovo + Onge would do even better than Georgian + Onge.

Also, take a look at these:

Source 1 Source 2 Target f_3 std. err Z SNPs
Mala SwedenSkoglund_NHG Pathan -0.008884 0.000746 -11.916 131987
Kharia SwedenSkoglund_NHG Pathan -0.006411 0.000926 -6.921 131992
Mala Karelia_HG Pathan -0.006220 0.000839 -7.414 137893
Mala Samara_HG Pathan -0.005400 0.000941 -5.737 83037

Mala + SHG provides the strongest signal of admixture, stronger than EHG + Mala (even Kharia + SHG is better than Mala + EHG)! As you know, SHG are predominantly WHG + EHG/ANE

Balaji said...


Thanks for the interesting information. It does suggest some WHG-like ancestry in Pathans.

However, we cannot ignore the Reich lab work. According to them, the West Eurasian populations most similar to Pathans are in decreasing order, Georgian, Armenian and Abhkasian. Also the qpAdm method seems to be a generalization of their f4 ratio method of estimating ANI. With this they have estimated ANI of Pathans to be 70%. If this is correct, then how can Pathan be 58.7% RISE_baAndrov + 32.3% Georgian which would be 91% ANI?

I think the Reich lab is overdue for revisiting this area.

Seinundzeit said...


Yet, in their first paper, they found that the closest West Eurasian population to Pashtuns was CEU (Americans of northern/northwestern European descent, from Utah). So it isn't really so clear-cut. And with these new stats, we can't ignore the fact that Corded Ware are a better proxy than Georgians, and that this shows that Sintashta and Andronovo will be an even better proxy than Georgians (when it comes to f3 stats). As Matt previously noted, Andronovo are probably closer to Pashtuns than Georgians are, when it comes to d-stats. Also, we can't ignore the fact that predominantly WHG samples from Scandinavia perform better than MA1 and EHG, and provide the strongest signal of admixture for Pashtuns.

Also, the modelling implemented via qpAdm is a huge improvement on, and expansion of, f4 ratios. This is the "cutting edge", so I think the output arrived at via qpAdm needs to be taken seriously. Besides, as demonstrated by the f3 stats shown above, the qpAdm models are clearly pointing us in the right direction.

On top of that, one can model Pashtuns as Georgian + MN_Germany + EHG + Dai, and the proportions are perfectly consistent with them being 60% Sintashta/Andronovo + 30% Georgian + 10% ASI/ENA (since Sintashta/Andronovo are, fundamentally speaking, a broad mixture between Georgian-like pop + MN_Germany + EHG, it's quite easy to gauge how Pashtuns "should" look under such a model, if they really are 60% steppe-admixed). In addition, the model is quite excellent. If it was a poor model, and if it was a model that had proportions which didn't make sense for a pop that is 60% Sintashta/Andronovo, this would tell us that something strange is at work with the models showing Pashtuns to be around 60% Bronze Age steppe-admixed. But again, on the contrary, the percentages of Georgian, Germany_MN, EHG, and Dai are perfectly consistent with 60% Andronovo + 30% Georgian + 10% Dai, and the model is excellent in terms of stats.

As for Pashtuns being only 70% ANI, this never did make much sense to me (my reason for this is quite vague and mushy. Mainly, it boils down to phenotype, since I can't really imagine Pashtuns having such a high amount of South Eurasian admixture, yet showing almost no influence from this heritage on their facial features and hair form). Since qpAdm consistently has Pashtuns at around 91% to 85% West Eurasian, I think that this range captures the actual amount of "ANI" they have (and thus they have somewhere between 9%-15% ENA, most of which is ASI, and some of which is East Asian admixture from Siberia). Whatever the case, the same people behind the Reich et al. paper are behind qpAdm, and qpAdm is supposed to supersede f4 ratio estimation.
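The 85%-91% figure is the sum of the non-Dai sources in the qpAdm fits quoted in this thread. A minimal sketch of that arithmetic, assuming (as above) that Dai stands in for all of the ENA/ASI-related ancestry and that every other source counts as fully West Eurasian; this ignores the minor Siberian admixture Davidski mentions in some Andronovo samples. `west_eurasian_share` is a hypothetical helper.

```python
# qpAdm fits for Pashtuns/Pathans quoted in this thread (fractions).
QPADM_FITS = {
    "Andronovo + Georgian + Dai": {"RISE_baAndrov": 0.587, "Georgian": 0.323, "Dai": 0.09},
    "Corded Ware + Georgian + Dai": {"Corded_Ware": 0.571, "Georgian": 0.305, "Dai": 0.124},
}
# Assumption: Dai is the only ENA/ASI proxy; everything else is West Eurasian.
ENA_PROXIES = {"Dai"}
REICH_ANI = 0.70  # Reich lab's ANI estimate for Pathans, cited above


def west_eurasian_share(fit):
    """Sum of all source proportions not treated as ENA/ASI proxies."""
    return sum(share for source, share in fit.items() if source not in ENA_PROXIES)


for name, fit in QPADM_FITS.items():
    share = west_eurasian_share(fit)
    print(f"{name}: {share:.1%} West Eurasian vs {REICH_ANI:.0%} ANI (Reich lab)")
```

The two fits give 91.0% and 87.6%, both inside the quoted range and well above the 70% ANI figure.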

At the end of the day, I see no need to attach any lasting significance to a paper which preceded all the aDNA samples that we now have, and which was technically deficient compared to what we can now accomplish.

Finally, David has some new TreeMix graphs. If you look at the tree with 6 migration edges, the position of the HGDP Pashtuns and Kalash is very consistent with a population that is 60% Andronovo/Sintashta + 30% Caucasus-like + 10% Dai (ASI + other minor ENA admixture). That tree has Andronovo/Sintashta, Georgians and Dai, and the Pashtun/Kalash position is congruent with them being intermediate between these populations, pulled in various directions according to those proportions.

We can certainly agree that the Reich lab needs to revisit this region.

Matt said...

Balaji: I agree with it, and it illustrates again that qpAdm does not give unique and always correct solutions. Haak were able to model Corded Ware as 35% Karelia_HG + 65% Esperstedt_MN or as 72% Yamnaya + 28% Esperstedt_MN. These are two quite different things. It was only by adding European_EN samples as outgroups that they could determine that modeling with Yamnaya was better.

Modeling Pathan as 0.587 RISE_baAndrov + 0.323 Georgian + 0.09 Dai is likely to be wrong.

Re: Outgroups, there was an interesting nugget on a thread on the Anthrogenica forums, where it was mentioned that this fit came from David using Biaka, Mbuti, Yoruba, Karitiana, Surui, Chukchi and Ulchi as the pright outgroups.

I can see why using Biaka, Mbuti, Yoruba, Karitiana, Surui, Chukchi and Ulchi helps models with Dai to not produce poor outcomes or fail.

You've basically got 3 outgroup axes here:

1. African (Yoruba, Mbuti, Biaka) -> Northeast Asian (Chukchi, Ulchi)
2. Northeast Asian->Native American (Karitiana, Surui)
3. African->Native American

- African->Asian is more or less a function of ENAness only, with slight confounds (as an axis on its own, this would not discriminate well between WHG ancestry and Basal+East Asian ancestry, for instance, since both would be intermediate in ENAness).

- Native American->Northeast Asian is purely a function of ANE, as that's essentially what should separate Native Americans from Northeast Asians.

- African->Native American is a function of ENAness+ANEness, and in a sense derivative of the previous two axes (it also provides a measure of Basal Eurasianness more independent from ENA).

If you added in the extras from the "magic set" used by the Haak paper in its fits, who are outside the African+Northeast Asian+Amerind groupings, the ones that would make a difference would be:

She (Southeast Asian), Kharia (ASI+East Asian only), Papuan (Oceanian), Bougainville (Oceanian), Onge (outgroup ENA).

Adding these in would suddenly introduce new axes like -

African-Oceanian, Oceanian-East Asian, Northeast Asian-Southeast Asian, ASI-East Asian and ASI-Oceanian (via Kharia and Onge stats), etc.

On these axes Dai would be quite different from the real input into South Asians (particularly on the Oceanian-East Asian axis and the ASI-other ENA axes), and the models including them would end up becoming a worse fit (whether by enough to fail is another matter).

At the same time, for all that it's interesting to see it done, are the qualities of ANE shift and ENA shift alone (more or less) sufficient to identify the ancestral populations, using samples from quite far away, only one of which actually represents something near the right time period?
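The point about outgroup axes can be illustrated with a deliberately crude toy model, which is an assumption rather than anything qpAdm does internally: treat each population as a point in a few made-up ancestry dimensions, so that f2 is squared distance and f4(A,B;C,D) reduces to the dot product (a-b)·(c-d). If every pright outgroup differs only along the first two axes (standing in here for ENA-ness and ANE-ness), two candidate sources that differ only along a third axis produce identical f4 profiles and cannot be told apart; adding an Oceanian-like outgroup that varies along that axis separates them. All coordinates and names are invented.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def f4_toy(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Toy f4(A,B;C,D) = (a-b)·(c-d); holds if f2 is squared Euclidean distance."""
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))


def f4_profile(pop: Vec, base: Vec, outgroups: Sequence[Vec]) -> List[float]:
    """f4(pop, base; O_0, O_j) for each outgroup O_j after the first: what the pright set 'sees'."""
    o0 = outgroups[0]
    return [f4_toy(pop, base, o0, oj) for oj in outgroups[1:]]


def same_profile(p: Vec, q: Vec, base: Vec, outgroups: Sequence[Vec]) -> bool:
    """True if the outgroup set cannot distinguish p from q in this toy model."""
    return all(math.isclose(x, y, abs_tol=1e-12)
               for x, y in zip(f4_profile(p, base, outgroups), f4_profile(q, base, outgroups)))


# Hypothetical coordinates on made-up axes: (ENA-ness, ANE-ness, Oceanian-vs-East-Asian)
african = (0.0, 0.0, 0.0)
ne_asian = (1.0, 0.2, 0.0)
native_american = (1.0, 0.8, 0.0)
oceanian = (1.0, 0.0, 1.0)
west_eurasian_ref = (0.1, 0.1, 0.0)
dai = (1.0, 0.0, 0.0)
asi_like = (1.0, 0.0, 0.5)  # differs from Dai only on the third axis

narrow = [african, ne_asian, native_american]  # varies only on axes 1 and 2
wide = narrow + [oceanian]                      # adds variation on axis 3

print(same_profile(dai, asi_like, west_eurasian_ref, narrow))  # True
print(same_profile(dai, asi_like, west_eurasian_ref, wide))    # False
```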

Seinundzeit said...


Your observations concerning outgroups are quite pertinent, so I'd just like to add an empirical note. David has used a larger set of outgroups before, to model the HGDP Pashtuns (this set included Oceanians). The best fit he found for Pashtuns:

47% Yamnaya + 43.2% Starcevo_EN + 9.8% Dai

He didn't mention the chisq and tail probability, but did say:

"This is the best so far and seems like a good fit..".

So, even when one includes Oceanians, Dai provide excellent fits.

Balaji said...


Thanks for your analysis suggesting the importance of the choice of outgroups. Even with an ideal set of outgroups, sometimes there is no unique solution. Haak found that they could use qpAdm to model Yamnaya as EHG + Bedouin or EHG + Armenian or EHG + Lezgin. Davidski found that he could model Yamnaya as EHG + Georgian or EHG + Iranian. Some extra information has to be used to decide what the best model is. In the case of Yamnaya, we don't know yet.


I agree with you that the Reich lab results should not be taken as sacred and etched in stone. No doubt, we will be seeing a paper from them modifying some of their earlier findings. However, their methodology seems to be sound and generally produces consistent results. In their 2009 paper, they used Adygei, CEU and Papuan as outgroups to Indian populations. In the 2013 paper, they used YRI, Basque and Georgian as outgroups. Nevertheless, the ANI results were similar (within two standard deviations).

I think looking at Table S3(b) in the Supplemental Data of the Moorjani paper will be instructive. This is for Kashmiri Pundits, who, at 65% ANI, are not too far from Pathans, who have 70% ANI. For Kashmiri Pundits as for Pathans, the top three closest populations (using D statistics) are from the Caucasus. The next two are Cypriot and Tuscan, who are Southern European. The first Eastern European population is Lithuanian, in thirteenth position. Andronovo and Sintashta are both closer to East European populations and thus not likely to be ancestral.

Here is another reason to doubt the modeling of Pathans as 91% ANI. I excerpted the following outgroup f3 statistics involving WHG from the Allentoft paper.

This is a good measure of “Europeanness”. Pathans are at number 89. Above them are all Europeans, all peoples of the Caucasus and some Middle Easterners. Among those who are more European than Pathans are the Selkup. Phenotypically, the Selkup do not look very European.

According to HarappaWorld, Selkup are 27% NE Euro and 52% Siberian.
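For reference, an outgroup f3 statistic of the kind excerpted above has the form f3(O; X, WHG) = mean over loci of (o-x)(o-w), with O an outgroup (typically African); larger values mean X shares more drift with WHG. The sketch below computes it from simulated allele frequencies for three hypothetical populations carrying different amounts of WHG-specific drift, and ranks them. It is only a toy: the populations, drift amounts and seed are assumptions, and it says nothing about where any real population would rank.

```python
import random
from typing import Dict, List, Sequence


def f3(c: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """f3(C; A, B) = mean over loci of (c-a)(c-b); with C an outgroup, larger means more drift shared by A and B."""
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)


rng = random.Random(7)
# Hypothetical populations and the fraction of WHG-specific drift each carries.
whg_share = {"pop_high": 0.8, "pop_mid": 0.5, "pop_low": 0.1}
outgroup: List[float] = []
whg: List[float] = []
pops: Dict[str, List[float]] = {name: [] for name in whg_share}

for _ in range(5000):
    p = rng.uniform(0.2, 0.8)           # ancestral allele frequency
    whg_drift = rng.uniform(-0.1, 0.1)  # drift specific to the WHG lineage
    outgroup.append(p + rng.uniform(-0.05, 0.05))
    whg.append(p + whg_drift + rng.uniform(-0.02, 0.02))
    for name, share in whg_share.items():
        pops[name].append(p + share * whg_drift + rng.uniform(-0.05, 0.05))

scores = {name: f3(outgroup, freqs, whg) for name, freqs in pops.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```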

Seinundzeit said...


The fact remains that qpAdm has been developed by the same people who worked on Reich et al. and Moorjani et al., and has been designed as a replacement for the much coarser methods utilized in those papers (but still using f4 ratios). Basically, it would be quite strange to stand by those results, which preceded all the aDNA samples that are now at our disposal, and which were based on a method that the same authors no longer use.

Also, all of your points are only relevant if one ignores the new stats that I posted (with Pashtuns having their best admixture signal as Scandinavian hunter gatherer + South Indian, and with Corded Ware + Onge providing a better signal of admixture than Georgian + Onge. These are some pretty important findings). In addition, based on d-stats, Pashtuns are only marginally closer to Georgians than to Sintashta, and as Matt noted, Pashtuns are probably closer to Andronovo than to Georgians. We just can't ignore these facts. Besides, the qpAdm models all point in the same direction (the best fits for Pashtuns always require Bronze Age steppe populations, and the percentages for BA steppe-related ancestry are always very high). As we have already mentioned, the qpAdm models are also backed by f3 stats + d-stats. On top of this, the uniparental data provides fairly robust support (there are many mtDNA links between the Bronze Age steppe and modern Pashtuns, and 50%-70% of the Pashtun y-DNA gene pool is directly derived from Sintashta/Andronovo).

Also, Matt's points concerning the pright outgroups are very interesting in terms of theory, but they don't seem to hold in actual empirical cases. With Oceanians included in the outgroups, Dai still provide excellent fits for Pashtuns, and the ENA percentages remain in the 9%-14% range.

At this point, the only real thing left to say is that the best model for Pashtun genetic ancestry (with the aDNA samples that we currently have) involves them being anywhere from 50% to 70% Sintashta/Andronovo, 40% to 20% Caucasus-like, and 10%-15% ENA, but most likely 60% Sintashta/Andronovo + 30% Caucasus-like + 10% ENA (the Caucasus-like + ENA represents their non-IE ancestry).

At the end of the day, that is all there really is to it, until we get aDNA from BMAC/IVC. The rest is pretty much just commentary.

For what it's worth, I think the discussion will only cease running in circles once we see South Central Asian aDNA, and I'm fairly sure that around 60% steppe admixture for Pashtuns (and company) will be verified when that occurs. But if not, I will certainly remember our discussion here, and I will definitely "go where the science takes us".

But again, for now, 60% Andronovo + 30% Georgian + 10% ENA is the best model we have (this is where the science is currently taking us), and it will probably remain so for quite a while.

Matt said...

S: 47% Yamnaya + 43.2% Starcevo_EN + 9.8% Dai

Hmmm. That case should have a bit more ENF and less WHG, but it's still fairly similar. It's hard to see why that should be, though, unless the ASI ancestry actually does form a clade with Dai to the exclusion of Papuan (i.e. Papuan is the outgroup), rather than ASI being an outgroup to Oceanian and East Asian. Which I guess is possible, given the Denisovan contribution to Oceanian (models seem to vary).

(Looking at that in K8 terms, though, although the combo is similar to the Andronovo+Georgian combination, it makes the difference between some of these fits and what we would expect from previous information really visually apparent: in K8, a pop with a 47:43 (renormalized, 52:48) ratio of Yamnaya to European_EN ought to be near Southeast Europe, and have a very different shift away from WHG and the Near East than the non-ENA part of Pashtuns.

But qpAdm isn't using any direct information about how related populations are to ENF or WHG, to beat a dead horse.

Compared with the Haak estimate of Yamnaya+WHG+LBK_EN ancestry, if both were true, that would be saying that the French or English are the same in ancestry as Pathans, except for swapping extra WHG for ASI at more or less 1:1. That just doesn't seem reconcilable with the prior modeling and direct results from ADMIXTURE.)
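The 52:48 figure is just the Yamnaya and Starcevo_EN shares of the 47% Yamnaya + 43.2% Starcevo_EN + 9.8% Dai fit, renormalized after dropping Dai. A quick check of that arithmetic, using only the numbers quoted in the thread (the helper name is hypothetical):

```python
from typing import Dict


def renormalize(parts: Dict[str, float]) -> Dict[str, float]:
    """Rescale components so they sum to 100% (hypothetical helper)."""
    total = sum(parts.values())
    return {name: 100.0 * value / total for name, value in parts.items()}


fit = {"Yamnaya": 47.0, "Starcevo_EN": 43.2, "Dai": 9.8}
west_eurasian = {k: v for k, v in fit.items() if k != "Dai"}
shares = renormalize(west_eurasian)
print({k: round(v, 1) for k, v in shares.items()})  # {'Yamnaya': 52.1, 'Starcevo_EN': 47.9}
```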

From his comments, I expect David is iterating to find out exactly which outgroups added to pright cause the model to lose fit, and that might tell us more information about what happens in South Asia.

(Even if I'm a bit dubious about inherent limits in qpAdm at the moment, simply from the pleft and pright setup forcing a population either to be used to estimate admixture or to be involved in it.)


Balaji: looking at Table S3(b)

Just looking at the S2 stats myself, where they are of the form "D(Onge, X; YRI, Y) where X is an Indian group shown above and Y is a West Eurasian group chosen from a panel of 43 groups including Europeans, Central Asians, Middle Easterners and Caucasian populations".

Basically, what that seems to me to do is find the West Eurasian group that shares the most drift with each of the Indian cline populations relative to Onge (Yoruba is just an outgroup, and it could be any African outgroup). So for instance, Tuscan shares the greatest excess of drift with Brahmin relative to Onge, while Georgian shares the greatest excess of drift with Pathan relative to Onge.

That's pretty interesting; however, there is one issue I can think of in light of what we have learned since publication. And that is that WHG, ANE and EHG share more drift with ENA than Near Eastern populations do, and so do populations which carry more WHG, ANE and EHG.

So a population richer in WHG/ANE/EHG might have less disproportionate relatedness to an Indian one relative to Onge simply because of higher relatedness to ENA generally.

Not 100% sure how that would wash out, perhaps worth considering though.
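The D(Onge, X; YRI, Y) reading above, and the caveat about Y populations that share extra drift with ENA, can be illustrated with a small simulation. The sketch computes D from allele frequencies with the usual estimator sum (w-x)(y-z) / sum (w+x-2wx)(y+z-2yz). Three hypothetical Y populations are simulated: two share the same West Eurasian drift with the Indian group X, but one also carries some Onge-side drift, which pulls its D down. All frequencies, drift amounts and names are invented for illustration; this is not a reanalysis of the paper's data.

```python
import random
from typing import Dict, List, Sequence


def d_stat(w: Sequence[float], x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """D(W, X; Y, Z) = sum (w-x)(y-z) / sum (w+x-2wx)(y+z-2yz), from allele frequencies."""
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in zip(w, x, y, z))
    return num / den


rng = random.Random(11)


def jitter() -> float:
    """Small population-specific drift."""
    return rng.uniform(-0.03, 0.03)


onge: List[float] = []
indian: List[float] = []
yri: List[float] = []
# Hypothetical panel: (share of the West drift seen in the Indian group, share of Onge-side drift)
panel_spec = {"Y_close": (0.8, 0.0), "Y_close_plus_ENA": (0.8, 0.3), "Y_distant": (0.1, 0.0)}
panel: Dict[str, List[float]] = {name: [] for name in panel_spec}

for _ in range(5000):
    p = rng.uniform(0.3, 0.7)      # ancestral allele frequency
    west = rng.uniform(-0.1, 0.1)  # drift shared by the Indian group and West Eurasians
    ena = rng.uniform(-0.1, 0.1)   # drift on the Onge side
    onge.append(p + ena + jitter())
    indian.append(p + west + jitter())
    yri.append(p + jitter())
    for name, (w_share, e_share) in panel_spec.items():
        panel[name].append(p + w_share * west + e_share * ena + jitter())

scores = {name: d_stat(onge, indian, yri, freqs) for name, freqs in panel.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy, Y_close_plus_ENA ranks below Y_close even though both share exactly the same West Eurasian drift with the Indian group, which is the kind of confound described above.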

Seinundzeit said...


You're comparing a model for Pashtun genetic ancestry produced via a direct comparison of genomes to what Pashtuns get on a PCA that is itself based on a supervised ADMIXTURE test. Surely that isn't a reasonable comparison (and I'm sure we know which analysis carries more weight).

This is a better model though (although the Yamnaya model can be seen as more "basal" in terms of basic components, since both Sintashta/Andronovo and Georgians have Yamnaya-related admixture, and since Starcevo_EN is an okay proxy for Neolithic ancestry in both Europe and West Asia, as far as formal stats are concerned):

46.3% Georgian + 28.5% MN_Germany + 14.5% EHG + 10.7% Dai


tail probability=0.941588

Just a fun side note: if one assumes that 28.5% MN_Germany + 14.5% EHG directly reflects Andronovo ancestry, and that Pashtuns are 32% Georgian-admixed once Georgian-like ancestry within Andronovo is set aside (32% Georgian is what they get in the best model yet), then this model has Pashtuns at around 57% Andronovo (basically, 28.5% MN_Germany + 14.5% EHG + 14% Georgian would then represent the Andronovo portion of Pashtun ancestry), which is very close to the best model David has produced (59% Andronovo + 32% Georgian + Dai). One could make the models identical by adding 2% Dai to the Andronovo portion, which is quite a nice detail, since Andronovo do have some Siberian ENA admixture.

Regardless, I think this model is a good sanity check on the Sintashta/Andronovo models, and demonstrates that the Sintashta/Andronovo models are pointing us in the right direction, since it shows that Pashtuns can be modeled as a combination of the ancestral populations that went into Sintashta/Andronovo (Georgian + EHG + MN_Germany) + Georgian + Dai. Also, it shows that such a model yields percentages which are perfectly in tune with the best fit produced with Andronovo, and that this more "basal" model ("basal" in terms of ancestral populations, since Andronovo is, broadly speaking, a combination of Germany_MN + EHG + Georgian-like pop) is itself excellent in terms of stats.
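The bookkeeping in the side note above can be checked directly. The sketch takes the 46.3% Georgian + 28.5% MN_Germany + 14.5% EHG + 10.7% Dai fit, subtracts the 32% standalone Georgian share from the Andronovo-based fit (quoted earlier in the thread as 0.587 RISE_baAndrov + 0.323 Georgian + 0.09 Dai), and treats the remainder as the Andronovo portion. It takes the comment's premise as given; the 2% Dai reallocation is the comment's own assumption.

```python
# Percentages quoted in the thread.
basal_fit = {"Georgian": 46.3, "MN_Germany": 28.5, "EHG": 14.5, "Dai": 10.7}
standalone_georgian = 32.0  # Georgian share in the best Andronovo-based fit (0.323, rounded)
dai_moved = 2.0             # the comment's assumed Siberian ENA share inside Andronovo

# Georgian-like ancestry left over once the standalone Georgian share is removed (~14.3)
georgian_in_andronovo = basal_fit["Georgian"] - standalone_georgian
andronovo_portion = basal_fit["MN_Germany"] + basal_fit["EHG"] + georgian_in_andronovo
print(round(andronovo_portion, 1))              # 57.3
print(round(andronovo_portion + dai_moved, 1))  # 59.3, vs 58.7 Andronovo in the best fit
print(round(basal_fit["Dai"] - dai_moved, 1))   # 8.7, vs 9 Dai in the best fit
```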

For what it's worth, it also depends on what prior ADMIXTURE modelling one has seen. For example, one of Chad's supervised ADMIXTURE runs had a Corded Ware component which dominated South Asia, even though he included Yamnaya, EEF, and other West Eurasian components (Sindhi were 60% Corded Ware).

At the end of the day, David has tried many different models in the past, with different sets of outgroups, and the best models are always along the same lines we've become used to (very heavy contribution from BA steppe, usually the predominant portion of ancestry + Caucasus-like + always 9%-14% ENA = Pashtuns).

Balaji said...


You wrote, “So a population richer in WHG/ANE/EHG might have less disproportionate relatedness to an Indian one relative to Onge simply because of higher relatedness to ENA generally”

You are right. And this does weaken my case for the lack of significant European ancestry in Pathans.

ADMIXTURE analyses generally show a close relationship between the populations of South Asia and the Caucasus. Dienekes calls this component West Asian/Gedrosian. Zack Ajmal calls it Baloch/Caucasian.

Moreover the low position of Pathans in the “European Index” also argues against significant European ancestry in them.

I am sure that better analytic techniques and/or aDNA will soon resolve this issue.

Matt said...

@ Balaji, I find it kind of weird to think of that as a European index, given that, e.g., Surui > Lebanese and Karitiana > Syrian. But yes, I really do think the relatedness to WHG and EEF, as you've shown with that stat for WHG, is important information for working out the ancestral populations of different West Eurasians, not just the relatedness to the outgroups. (That's essentially what the last post's discussion with Sein about the implied position of the qpAdm combinations on PCA relative to WHG versus the "real" position of Pashtuns/Pathans was about, although I am happy to let him have the last word there.)