
Saturday, August 29, 2015

Children of the Divine Twins

The Trundholm sun chariot was found in a peat bog on the island of Zealand, Denmark, in 1902. It's thought to be an Indo-European religious artefact dating back to the Nordic Bronze Age; a representation of a horse pulling the sun and perhaps also the moon in a spoked wheel chariot. So one way or another it appears to be a reference to the Divine Twins mythos.

The Divine Twins are a key part of Indo-European religion, and they appear in the Rigveda, the most archaic of the Indo-Aryan Vedic texts.

However, because the concept includes the spoked wheel chariot, it probably can't be much older than 2,000 BC. That's because the invention of the spoked wheel chariot is more often than not credited to the Sintashta Culture of the Trans-Urals, which is dated to 2100-1800 BC.

Considering these cultural and technological ties between Bronze Age Scandinavia and South Asia, it's an interesting question whether there were also strong genetic links between these two outposts of the early Indo-European world.

Unfortunately, we don't yet have any ancient genomes from South Asia to compare to the Late Neolithic/Bronze Age (LN/BA) Nordic genomes published recently in Allentoft et al. 2015. However, we do have the Kalash.

The Kalash people of the Hindu Kush are Indo-Aryans, but they're also an extreme cultural and genetic isolate. It's likely that they haven't mixed very much with any of their neighbors since the Bronze Age. About half of them also practice a unique Vedic religion that celebrates the sun and moon (see section 1.5.4. "Creation myths" in Witzel 2002).

In the TreeMix analysis below I used three random Kalash individuals from the Human Origins dataset (HGDP00311, HGDP00313 and HGDP00315). I didn't run the whole set of 18 because they seem to create a genetic monolith that is impossible to break down and analyze correctly with TreeMix.

Note that after their Central Asian admixture is accounted for with a migration edge of 33%, the Kalash sit on what seems to be an early Indo-European branch that also includes the LN/BA Scandinavians. The full output from this analysis is available for download here.

I also employed the qpAdm software to model all of the Kalash from the Human Origins as a mixture of LN/BA Scandinavians, various ancient and present-day West Asians and Dai from south China. The ancestry proportions are listed at the bottom of the sheets. To check the success of the models consult the chisq, tail prob and standard (std.) errors.

Nordic LN-BA/Armenia BA/Dai

Nordic LN-BA/Iranian/Dai

Nordic LN-BA/Iranian Jew/Dai

Nordic LN-BA/Georgian/Dai
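
For anyone unsure how the chisq and tail prob columns relate, here is a minimal Python sketch. It assumes that the tail prob is simply the upper-tail probability of a chi-square distribution for the reported chisq and degrees of freedom. The function name is made up for illustration, and the degrees of freedom should be taken from the qpAdm output itself rather than guessed.

```python
import math


def chi2_tail_prob(chisq, dof):
    """Upper-tail probability P(X >= chisq) for a chi-square variable.

    Hypothetical helper, standard library only; dof must be an integer >= 1.
    """
    if dof < 1 or chisq < 0:
        raise ValueError("need dof >= 1 and chisq >= 0")
    half = chisq / 2.0
    if dof % 2 == 0:
        # Even dof: start from Q(1, y) = exp(-y)
        a = 1.0
        q = math.exp(-half)
    else:
        # Odd dof: start from Q(1/2, y) = erfc(sqrt(y))
        a = 0.5
        q = math.erfc(math.sqrt(half))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < dof / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


# 3.8415 is the familiar 5% critical value for one degree of freedom.
p_example = chi2_tail_prob(3.841458820694124, 1)
```

Under this reading, a small tail prob means the model is a poor fit to the data, while a large one only means the model can't be rejected.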

Now, qpAdm is easy to run but very difficult to use correctly. However, even when fumbling around like a drunk with this software, it's easy to pick up some useful hints. Clearly, even if the ancestry proportions are way off, the Kalash show stronger affinity to the ancient Scandinavians than to West Asians. Also, the models more or less reflect the TreeMix analysis above.

Thus, the answer to my question is a resounding yes; there were indeed strong genetic ties between Scandinavia and South Asia during the Bronze Age.

See also...

The Poltavka outlier

The mystery of the Sintashta people

The real thing


Alberto said...

David, could you test with EHG/Armenia_BA/Dai? I'd like to see if this preference for north European populations is due to compensating for the excess Basal Eurasian in Armenia_BA or it's a real ancestry it's picking up (alternatively, if EHG is infeasible, maybe Motala might work?).

Also the other day looking at Haak et al. paper, I saw that the best model for Unetice_EBA was with Spain_MN + Motala. I wonder if you could reproduce that result (maybe including Yamnaya as a third population to see what's the best match).


Davidski said...

These work just fine, but of course as you know it's impossible for Unetice to be a mixture of just SHG and Spain_MN. Btw, the Unetice here is a mix of Rise/Haak samples in order to get as many transversion sites as possible.

So I think this is a good example where f4 stats find a model that works, but has little to do with reality. I think that's probably because they work on allele frequency correlations deep in the phylogeny.

To model Unetice properly I'd probably need a program that can fit in Corded Ware, SHG, Germany_MN, and maybe even excess WHG. That's too much for qpAdm.

At the end of the day, to prove something like this, you need multiple lines of evidence. Finding a good fit isn't enough. You need backing from something unsupervised like TreeMix, where the algorithm has an infinite number of choices but picks a very specific one that matches a good supervised fit, as well as linguistics and archeology.

Alberto said...

Thanks David. Yes, I agree. Looking at the models from the Haak et al. paper the other day I became even more sceptical about how reliable this method is. So I wanted to see if you could reproduce the results or if qpAdm had improved since then in some way.

It basically seems that many different combinations can give good results, even if many of them don't make any sense at all, and not only historical sense, but even genetic sense (for example, a model of CW as 97% Loschbour + 3% Spain_MN was not terrible at all).

The example with Unetice was not only a good result, it was the *best* one. And while I do think that Unetice had SHG-like admixture, it's almost impossible that it was 56% Motala + 44% Spain_MN.

About TreeMix I don't know well how it works, but I've also seen some very different variants depending on the input. So for now, I think that contrasting results with other methods like IBS sharing and admixture is the most reliable way to check how good those models are (from a strictly genetic point of view, not even considering the historical impossibilities).

Matt said...

Not totally sure anyone needs to hear more or less the same comment from me again on these qpAdm fits, but just for the sake of it: as I understand it, these basically mean that the Kalash have a pretty similar relatedness to the "right pops" (Karitiana, Surui, Chukchi, Ulchi, Mbuti, Ju_Hoan_North) as the blends of "left pops" that qpAdm finds (Nordic LN-BA, Dai, X-West Asian).

So I would interpret these as saying that the ancient Scandinavians better approximate the Kalash in terms of their relatedness patterns to Native Americans, Africans and East Asians, than do either the West Asians (Armenia BA, Iranian, Iranian Jew, Georgian) or Southeast Asians (Dai).

That's not so surprising from context, since we know that the relatedness to MA1 and to East Asians is pretty different among Kalash than West Asians. Both Nordic LN-BA and Kalash should have a raised affinity to MA1 relative to Africans, compared to the West Asians (mediated by a slightly different set of components), and adding some Dai admixture helps sort out relatedness to East Asians, and also damps down the affinity to MA1 a bit.

There could be different stories in the phylogeny about why this is the case. At the moment this makes sense, a population X may eventually show itself in adna which can explain the Native American (ANE) affinity (relative to East Asian and African affinity) which is a better fit than any of the steppe people we have, when checked against the direct shared drift with the Kalash (e.g. D(Chimp,Kalash)(Ju_Hoan_North,PopX) or something like that). Also where Nordic LN-BA would be quite different from Kalash though, would be their relatedness to WHG, which is not explicitly taken into account here.

Matt said...

Alberto: It basically seems that many different combinations can give good results, even if many of them don't make any sense at all, and not only historical sense, but even genetic sense (for example, a model of CW as 97% Loschbour + 3% Spain_MN was not terrible at all).

qpAdm is essentially a technique to use different relatedness to world populations to work out how much ancestry a population has from different sources. "The method uses the intuition that the reference populations are not identically related to a panel of “focal” or “outgroup” populations, but share different amounts of genetic drift with them as a result of their deep evolutionary history (which is, however, not explicitly modeled)."

As they say "This method works if the reference populations are not all identically related to the outgroups, but does not work if they are identically related to the outgroups.... (Otherwise) there will be no leverage to discern whether test samples are more closely related to Ref1 or Ref2."

So depending on the outgroups chosen, the software may not be able to distinguish very well between say, 97% Loschbour+3% Spain_MN, and what was likely actually the case for Corded Ware, as 97% Loschbour+3% Spain_MN might approximate CW's relationship to outgroups pretty well, even though they would seem totally wrong if we directly measured how close CW was to Loschbour.

At the same time "These “outgroups” must be devoid of recent gene flow with either the Test or the candidate reference population, as such gene flow introduces additional common genetic drift" so the method is constrained in the choice of outgroups. But, and this is a problem, they don't really define "recent", so whether LBK_EN or WHG, for instance, is too recent to function properly as an outgroup is not really defined.

Note that they also say "Two closely related populations (e.g., LBK_EN and Spain_EN) have a virtually identical relationship to the outgroups, while two other populations (e.g., LBK_EN and Karelia_HG) have a visibly differentiated relationship. This difference is not huge, as LBK_EN and Karelia_HG are West Eurasian populations that are mostly symmetrically related to the outgroups. However, it is sufficient to infer ancestry proportions for populations of mixed LBK_EN and Karelia_HG-related ancestry, as we will show." which gives an idea of how hard it would be with groups that are even more similar in their relationship to outgroups than LBK_EN and Karelia_HG.
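
The "leverage" point can be illustrated with a toy Python sketch. This is not how qpAdm itself is implemented (the real program is considerably more involved); it only assumes that each population is summarized by a vector of f4-like statistics against the same outgroup pairs, and that a mixed population's vector is a weighted average of its sources' vectors. The function name and all numbers are made up.

```python
def estimate_mixture(target, ref1, ref2, eps=1e-12):
    """Least-squares alpha in target ~ alpha * ref1 + (1 - alpha) * ref2.

    Each argument is a list of hypothetical f4-like statistics describing
    how a population relates to the same set of outgroup pairs.
    Returns None when the two references are identically related to the
    outgroups, because then nothing distinguishes one from the other.
    """
    diff = [a - b for a, b in zip(ref1, ref2)]
    denom = sum(d * d for d in diff)
    if denom < eps:
        return None  # no leverage
    numer = sum((t - b) * d for t, b, d in zip(target, ref2, diff))
    return numer / denom


# Made-up statistics, for illustration only.
ref1 = [0.010, 0.004, -0.002]
ref2 = [0.002, 0.001, 0.003]
target = [0.7 * a + 0.3 * b for a, b in zip(ref1, ref2)]

alpha = estimate_mixture(target, ref1, ref2)        # recovers about 0.7
no_leverage = estimate_mixture(target, ref1, ref1)  # None
```

The closer ref1 and ref2 are in their relationship to the outgroups, the smaller the denominator becomes, and the more any noise in the statistics gets amplified in the estimated proportion.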

Davidski said...

What are your thoughts on the TreeMix model Matt?

The position of the Kalash among the BA steppe groups and the ~33% migration edge from Central Asia aren't too hard to achieve with the right samples, despite the program having an infinite number of choices in how to model the Kalash.

Matt said...

Treemix looks more convincing to me. I think it makes sense that it places Kalash at the margins most close to CW and Yamnaya out of the West Eurasian populations, then gives it this extra Central Asian edge to sort out that this placing would make it too close to all the West Eurasians and not close enough to the East Eurasians. I don't know if I totally trust it to be putting all the pre-IE Kalash ancestry in the one branch and all the IE / steppe type ancestry in the other (esp. because of relatedness of the non-HG side of Yamnaya ancestry to South-Central Asia). So for proportions probably I'm not too sure.

Just to clear up the earlier post, I do think there was a pulse of ancestry from Sintashta / Andronovo to South-Central Asia because that seems to be the mainstream linguistic / archeological history, and makes good sense from y-dna markers. It's just hard to know the size of it. My comments really relate to the proportions, and to how good qpAdm is at distinguishing between populations and saying "for sure" that the population we are using is the one and not another unknown / unused population. That is, qpAdm can only check a population via the cumulative effect of differences in relatedness to outgroups, and doesn't have any extra / special ability beyond that, so it can't choose between models where two different sets of populations, combined, both approximate the same thing (so then looking at other stat measures is important).

Coldmountains said...

Davidski, thanks for posting this. Can you make a similar TreeMix model for the Burusho? They are non-IEs and quite low in R1a (~10%), so they should have much lower Sintashta affinity than Kalash, Pashtuns or Tajiks. It would be interesting to compare them with Kalash or Pashtuns because they are genetically similar to them.

Alberto said...

Thanks Matt, I surely want to hear your opinion about these things each time. And actually this time I think I understood much better how qpAdm works.

So yes, I guess it all comes to the choice of outgroups and how constrained we are about the choices. For example, in the 6 right populations in these tests, the difference in relatedness to each pair of similar pops (Karitiana and Surui, for example) would be mostly like splitting hairs. So in the end it's almost like having 3 outgroups instead of 6. And even then, since they have to be unrelated to the left populations, the difference in relatedness to them is very small in most cases.

I do think the concept is very interesting, but it does need more diversity in the outgroups, and to be able to choose some slightly more related to the left pops so that the differences are enough to be able to discern the good models from the not so good ones in a more clear way.

I guess it would take quite a bit of experimenting with outgroups to find a "sweet spot" for the kind of left populations being tested. But for this specific case (where it's not strictly West Eurasian groups, because Kalash has ENA and therefore Dai has to be in the left populations), it might be quite difficult to find the right ones.

Davidski said...


Burusho are very difficult to model with TreeMix because they have some sort of East Eurasian ancestry that is missing in the Kalash. So this tree is a bit of a mess...

But I did manage to split them into basically two parts: West Eurasian and Central Asian/East Eurasian, and it seems like the former doesn't quite land among the European BA groups. It appears to have a little too much Near Eastern input.

postneo said...

One cannot call the Kalash practices specifically Vedic. Sure, there is a loose resemblance, but the same is true of many ethnic groups and tribes in India.
Puja is non-Vedic and a very late tradition. The notion of calling or invoking a deity is likely a trait that Vedic shares with other Indic cults. We cannot definitively say Vedic is the source.
In fact, if the isolated Kalash have it, it means that it's pre-Vedic.

We have to tread through Witzel's gobbledygook carefully. Most Indians can see through such generic conflations.
Notions of ritual purity and isolation during menstruation are common in many cultures. The perceived impurity of Muslims by pagan/Brahmanical/Hindu groups is post-Islamic and cannot be called Vedic by any stretch of the imagination. It's 2000 years too late. Witzel likes to heap up superficial resemblances in the hunger to publish. It's a trait shared by most of academia.

postneo said...

Here's an excerpt from Witzel:

"Purity is very much stressed, just as in the Veda or in Hinduism. In Kalash religion it is centered around altars, goat stables, the space between the hearth and the back wall of houses (as modern Himalayan/Newar practice), and also in periods of festivals; the higher up in the valley, the more pure the location. By contrast, women (especially during menstruation and birth), as well as death and decomposition, and the outside (Muslim) world are impure, and, just as in the Veda (and Avesta), many cleansing ceremonies are required, even for the average householder, if purity was infringed upon. In Kalash ritual, the deities are seen, as in Vedic ritual (and in Hindu Pūjā), as temporary visitors. Other than Nuristani shrines, Kalash ones..."

Seinundzeit said...

I think the TreeMix results, and the qpAdm fits, are quite in line with the d-stats that we've seen. Kurd at Anthrogenica ran many d-stats of the form (Papuan, Kalash, testpop, Mbuti) and (Papuan, Pamiri Tajik, testpop, Mbuti). Here are the top six results for the Kalash, the six strongest signals of "gene-flow" for the Kalash versus other populations (many were compared, a lot of European, South Asian, and West Asian populations, the full output can be viewed at that forum):

(Papuan, Kalash, Yamnaya_WEST, Mbuti) d-stat = -0.0873

(Papuan, Kalash, Corded_Ware, Mbuti) d-stat = -0.0860

(Papuan, Kalash, Georgian, Mbuti) d-stat = -0.0854

(Papuan, Kalash, Abkhasian, Mbuti) d-stat = -0.0850

(Papuan, Kalash, Sintashta, Mbuti) d-stat = -0.0846

(Papuan, Kalash, Andronovo, Mbuti) d-stat = -0.0830

As we can see, the Kalash are closest to Yamnaya and Corded Ware. Again, perfectly in line with both TreeMix and qpAdm.
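
For readers who haven't worked with these statistics, here is a minimal Python sketch of how a D-statistic of the form D(W, X; Y, Z) can be computed from per-SNP allele frequencies. It assumes the commonly used allele-frequency definition; the exact software and settings behind the numbers above aren't stated here, and the function name and toy frequencies are made up for illustration. With this ordering, a negative value means that X (here the Kalash) and Y (the test population) share more alleles than expected if W and X were symmetrically related to Y and Z, which is how the list above is being read.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (floats in [0, 1])."""
    num = 0.0
    den = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        num += (pw - px) * (py - pz)
        den += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Toy frequencies at four SNPs: X tracks Y, while W tracks Z.
w = [0.1, 0.9, 0.2, 0.8]  # stands in for Papuan
x = [0.6, 0.4, 0.7, 0.3]  # stands in for Kalash
y = [0.7, 0.3, 0.8, 0.2]  # stands in for the test population
z = [0.1, 0.9, 0.1, 0.9]  # stands in for Mbuti

d = d_statistic(w, x, y, z)  # negative for this toy data
```

In practice a Z-score from a block jackknife is reported alongside D, since differences in the third decimal place, like those in the list above, may or may not be significant.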

Sintashta and Andronovo seem almost tied with Caucasus populations, and if we view the full output, are quite ahead of all West Asians and South Asians! But they are still slightly behind the Georgian/Abkhasian group. That is probably a reflection of the fact that d-stats are exceedingly sensitive, and these Sintashta/Andronovo aren't the exact steppe IE populations that provided around 60%-70% of the ancestry we see in the Kalash. I'm sure that more southern Andronovo samples are going to be closer to the Kalash (in the context of d-stats) than any population from the Caucasus, and will also probably beat Yamnaya and Corded Ware in this respect.

Interestingly, Yamnaya_EAST comes in at eleventh place, even though Yamnaya_WEST provides the strongest signal. That also supports the qpAdm and TreeMix models, because Sintashta/Andronovo are probably more closely related to western Yamnaya populations (this would include currently unsampled southern Andronovo, the ones that would meld with BMAC to form the Indo-Aryans).

Now, here are the top six results for Pamiri Tajiks:

(Papuan, Pamiri Tajik, Corded Ware, Mbuti) d-stat = -0.0933

(Papuan, Pamiri Tajik, Sintashta, Mbuti) d-stat = -0.0926

(Papuan, Pamiri Tajik, Yamnaya_West, Mbuti) d-stat = -0.0922

(Papuan, Pamiri Tajik, Lithuanian, Mbuti) d-stat = -0.0890

(Papuan, Pamiri Tajik, Afansievo, Mbuti) d-stat = -0.0888

(Papuan, Pamiri Tajik, Georgian, Mbuti) d-stat = -0.0885

The results couldn't be clearer.

Basically, I think around 60%-70% BA steppe-related ancestry among Indo-Iranian populations from the Hindu Kush and Pamirs is perfectly reasonable.

Seinundzeit said...

Continuing from where we left off...

To be clear, there is going to be a lot of complexity. The Kalash are a Dardic isolate, with cultural affinities to neighboring Nuristanis in Afghanistan. The Nuristanis speak languages that are neither Iranian nor Indian, yet are very closely related to both sides of the Indo-Iranian group. And the Dardic language spoken by the Kalash is quite distinct within the family of Indian/Indo-Aryan languages. By contrast, Pashtuns are a tribal Iranic people, ones without anything resembling a caste system (although the fact that men who cut hair, sell fruit/vegetables, build houses, or provide religious instruction are excluded from Pashtun ethnic identity, and the fact that Pashtuns conceptualize themselves in tribal contexts as solely being "da topak khalaq" ("people of the gun") probably reflects distant echoes from something like a caste system). Pamiri Tajiks are also an Iranic people, but without a tribal or caste-based societal organisation. So, all these groups must have received their LN/EBA European ancestry from different populations, at different times, across different spatio-temporal scales, with complex episodes of gene-flow between each other, and complex patterns of isolation/drift. The ways these populations relate to each other, and the ways they relate to ancient steppe populations, just aren't going to be simple.

But therein lies the beauty of f4 stats. They cut very deep into the phylogeny, so we can construct the broad picture, until we can add much greater detail with the addition of southern Andronovo samples, Scythian samples, Hephthalite samples, Dahae samples, etc.

And looking at the broad picture, it seems that Indo-Iranian South Central Asians are around 60%-70% LN/EBA European + 20%-30% ancient West Asian (with both high ANE and high BEA) + 10%-15% ENA.

As far as the Burusho are concerned, here is a good fit David once did for Pashtuns:

59.2% Burusho + 22.8% Sintashta + 18% Georgian


tail probability=0.875585

That should give us an idea concerning where the Burusho stand in terms of BA steppe-related admixture.

Also, for Matt, if one adds MA1 to the qpAdm model, this is how Pashtuns turn out:

62% Sintashta + 16.2% BedouinB + 15.8% Dai + 6.1% MA1

Adding MA1 doesn't change the LN/EBA European percentage for Pashtuns in the least, it just makes the stats terrible, compared to what we've become used to:


tail probability=0.174448

I've seen Europeans modeled with both Yamnaya and EHG together, yielding excellent fits, so the program can handle something like that.

Coldmountains said...


Thanks for the answer. If the same models that show high Sintashta affinity for Kalash/Pashtuns/Tajiks also show low Sintashta affinity for Burusho, then this indirectly proves that it is not just older pre-IE ANE/Teal that creates this Sintashta affinity. According to this model (if I understood it correctly) they indeed look much less Sintashta-shifted and more West Asian-shifted. Quite interesting results, and it is certainly not just older ANE ancestry that creates this high Sintashta affinity for some Indo-Iranians, but maybe the mix of older ANE/Teal + actual steppe Indo-Iranian ancestry.

Alberto said...


Don't you think that the K12 here:

Explains what those D-stats are showing in a more simple way? Basically the populations with more ANI (there called "Afanasievo").

Have you seen any IBS sharing of Yamnaya and Corded Ware? Here you can see from both, normalized in % with Masai as baseline (0%) and Estonian as 100%.
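
The normalization mentioned here is just a linear rescaling between two reference populations. A minimal Python sketch, with a hypothetical function name and made-up IBS values (the actual scores are in the linked output, not reproduced here):

```python
def normalize_ibs(value, baseline, ceiling):
    """Rescale an IBS score so that baseline maps to 0% and ceiling to 100%."""
    if ceiling == baseline:
        raise ValueError("baseline and ceiling must differ")
    return 100.0 * (value - baseline) / (ceiling - baseline)


# Made-up IBS scores of one ancient sample against three modern populations.
ibs_masai = 0.650     # baseline, maps to 0%
ibs_estonian = 0.710  # ceiling, maps to 100%
ibs_other = 0.695

pct_other = normalize_ibs(ibs_other, ibs_masai, ibs_estonian)  # about 75%
```

Because raw IBS scores between populations often differ only in the third decimal place, the rescaling makes the ordering easier to read but also makes small, possibly noisy differences look large.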

Matt said...

@ Alberto, cheers.

Sein: if one adds MA1 to the qpAdm model, this is how Pashtuns turn out:
62% Sintashta + 16.2% BedouinB + 15.8% Dai + 6.1% MA1

I think that really shows that if another population might be a better proxy than LNBA Europeans for the relatedness of Kalash / Pathans to East Asians, Americans and Africans, it certainly isn't a combination of BedouinB+MA1, which is quite interesting in itself.

We might want to think about what the properties of BedouinB+MA1 are in f4 stats with respect to Americans and African that make this so. How is this combination very distinct from what the Pathan / Kalash are, even net of differences in ENA ancestry?

It certainly isn't through any different relatedness to Europeans, as there are no Europeans in the "right pops" which affect the fit (in the Haak paper they visualize differences through graphs, which is a pretty painful process to go through).

Which is really, only the main point: that the software works as it does, is subject to all the advantages and limitations described in the Haak paper, and nothing more or less than any of that.

apostateimpressions said...

"Thus, the answer to my question is a resounding yes; there were indeed strong genetic ties between Scandinavia and South Asia during the Bronze Age."

I have been arguing this stuff for years.

The early Indian nationalist Bal Gangadhar Tilak presented an interesting theory. He argued in his book _The Arctic Home in the Vedas_ that astronomical descriptions in the Vedas indicate that the Indo-Aryans originated near the north pole.


In 1903, he wrote the book The Arctic Home in the Vedas. In it he argued that the Vedas could only have been composed in the Arctic, and the Aryan bards brought them south after the onset of the last ice age. He proposed a radically new way to determine the exact time of the Vedas. He tried to calculate the time of the Vedas by using the position of different Nakshatras. Positions of the Nakshatras were described in different Vedas.

- At the North Pole, one sees the heavenly dome above seems to revolve around one like a potter's wheel. The stars will not rise and set but move round and round in horizontal planes during the long night of six months. The Sun, when it is above the horizon for six months, would also appear to revolve in the same way but with some difference. The Northern celestial hemisphere will alone be visible spinning round and round and the Southern half remain invisible. The Sun going into the Northern hemisphere in his annual course will appear as coming up from the South. Living in the temperate and tropical zones, however, one sees all heavenly objects rise in the East and set in the West, some passing over the head, others traveling obliquely.
- The long dawn of two months is a special and important characteristic of the North Pole. As we descend southward, the splendor and the duration of the dawn will be witnessed on a less and less magnificent scale. But the dawn occurring at the end of the long night of two, three or more months will still be unusually long, often of several days duration.
- All these characteristics of an Arctic home are clearly recorded in several Vedic hymns and Avestic passages and they come to us sometimes as the description of the prevailing conditions or the day-to-day experience or stories told by the earlier generation and sometimes as myths.


Friday, February 01, 2013 8:54:00 p.m.

Seinundzeit said...


But that is an ADMIXTURE run. And anyway, the component in question is peaking in Afanasievo, an early Indo-European population from the steppes, who also happen to be genetically identical to PIE Yamnaya! Surely that isn't a counter demonstration against the TreeMix and qpAdm output.

Also, I wouldn't take IBS too seriously. Here are my top 30 IBS results (thanks to David), using 200K SNPs, and high quality modern samples:

Kumyk 0.705775
Chechen 0.705754
Brahmin_TN 0.705622
Brahmin_UP 0.705610
Georgian_Imer 0.705488
Pathan 0.705472
Kabardin 0.705442
Kshatriya 0.705338
Lezgin 0.705321
Afghan_Pashtun 0.705270
Armenian 0.705252
Punjabi_Jat 0.705205
Abkhasian 0.705137
North_Ossetian 0.705115
Georgian 0.704983
Azeri 0.704971
Balkar 0.704964
Gujarati 0.704944
Ossetian 0.704894
Burusho 0.704846
Assyrian 0.704557
Adygei 0.704489
Georgian_Laz 0.704390
Russian_Kursk 0.704368
Afghan_Tadjik 0.704367
Kurdish 0.704255
Iranian 0.704138
Greek_Thessaly 0.704131
Meena 0.704131
Tabassaran 0.704119

Let's reflect on this for a moment. I'm a Pashtun, and every genetic analysis I've seen places me squarely among other Pashtuns. But, based on IBS, I'm closer to Georgians than I am to "Pathans", closer to Lezgins than I am to "Afghan Pashtuns", and closer to Armenians than I am to Punjabi Jatts. And, my closest populations are Kumyks and Chechens. In addition, Lezgins-Tabassaran-Chechen are quite similar to each other, yet look how differently they relate to myself. Also, according to my "DNA Tribes" IBS analysis (based on perhaps 100K SNPs), my closest populations are Serbians and Croatians!


You can't add Europeans as right pops for Pashtuns and Kalash, because obviously they aren't in any way outgroups to Pashtuns and Kalash.

Anyway, my main point is that the d-stats provide strong support for the qpAdm models. And again, the TreeMix output provides excellent verification as well. The fact that d-stats, TreeMix, and qpAdm show the same patterns, makes for a very solid case. Not to mention the uniparental genetic data, and linguistic ties.

Seinundzeit said...


Basically, if IBS can yield output that unusual with high quality modern samples and an impressive number of SNPs, one can only imagine what we'd see with lower quality ancient samples and fewer SNPs.

A side note, but Kurd was kind enough to play with my raw-data. A good fit for myself (based on more than 100K SNPs, qpAdm) is:

61.6% Pamiri Tajik + 21.3% Punjabi + 17.1% Armenian


tail probability=0.94415

This is a good one as well:

51% GujaratiD + 49% Andronovo


tail probability=0.942919

GujaratiD also have Andronovo-related admixture, so it's interesting that I can basically be modeled as 50% GujaratiD and 50% Andronovo.

Davidski said...


Yes, based on this TreeMix output I'd say that the Burusho have less admixture from the EBA steppe and more from West Asia compared to the Kalash and Pathans, and perhaps even the Gujarati. But they do have some EBA steppe ancestry, because their branch isn't all that far from that of the LN-BA Nordics, and I'd say they acquired this from mixing with their Indo-Iranian neighbors.

Alberto said...


Yes, that's ADMIXTURE. Which is the right tool to work around the problem with Pathan/Kalash having ASI admixture that makes other methods (like D-stats) complicated to use.

The "Afanasievo" component there does peak in Afanasievo, but also in Kalash, BA_Armenian, Corded Ware, Abkhazians,... And if you look at it, it removes the European specific ancestry from Afanasievo/Yamnaya/Corded_Ware populations, and it removes the South Asian component in the Pathan/Kalash populations. And the excess Near Eastern in Abkhazians/Georgians. Which basically leaves us with a more or less "pure" ANI. And all those populations have a lot of it (this is what the D-stats are picking up), but mixed with different components each.

Admixture is the best tool that we have for this case. And the output is clear enough.

Regarding IBS sharing, it varies from one individual to another. And your claim of lower quality ancient samples is fair enough. But we have quite a few of them that show the same pattern, and that makes the results quite solid. If you test other modern Pashtuns like yourself, you'll get also a pretty good picture. And I don't think your own results are very striking. They do seem to make good sense, in general terms. They would be surprising if they showed Estonians in the top 5. Otherwise, a Pashtun being very close to Georgian or Armenian is not surprising (the difference is minimal between Georgian and Pathan, still slightly surprising, yes, but it can be true for your specific case).

Now, I didn't know you were an ethnic Pashtun yourself, so I'm not sure if your interest here is more personal than scientific. You find the IBS results not worthy of consideration because of that detail, but take the qpAdm results showing you as 49% Andronovo as solid evidence (in spite of all the caveats of qpAdm outlined above by Matt, and proved by simply looking at different outputs of the program).

So maybe at this point it's better not to debate this any further and just wait for more data to see what's really true and what's not. I personally don't care either way. I just look for what looks more reasonable to me based on all the data available (but I'm expecting to be surprised by ancient DNA each time, since I have been surprised before).

Seinundzeit said...


The notion that ADMIXTURE is somehow determinative when it comes to anything is highly strange. By repeating it, one doesn't change the fact that this simply isn't the case.

I think the biggest issue here is Platonism. You're conceiving of ADMIXTURE components as Platonic essences, which is very far from what STRUCTURE/ADMIXTURE is supposed to do. Regardless, those programs are dealing with allele frequencies (parsed as panmictic populations), while qpAdm involves a direct comparison of the genomes.

Anyway, Matt didn't present any serious caveats concerning qpAdm, he simply explained how it works. If you give it some deep thought, it is a brilliant concept. You don't have to worry about drift messing things up, which is the biggest problem with ADMIXTURE.

Also, this doesn't boil down to qpAdm. There are d-stats, and there is TreeMix. The Kalash d-stats don't make sense in light of the Afanasievo ADMIXTURE run, if you actually examine the populations that consecutively provide the strongest signals.

And the Pamiri Tajiks don't show any results which would match what we see with the Afanasievo run. They are closer to Lithuanians than they are Georgians, closer to Estonians than they are to all West Asians/almost all Caucasians, and closer to Sintashta than to Yamnaya!

In terms of my own views, I used to think that Indo-Iranian incursions from the steppe had left little (if any) genetic impact in South Asia, and I assumed that the Indo-Iranian populations of South Central Asia had wholly "local" genetic origins. Looking back now, I feel quite embarrassed by this, considering the movements of Indo-Aryans, Kushans, Hephthalites, Dahae, and other steppe groups into the region (and the fact that the Indo-Aryans brought about linguistic change, which considering the spatio-temporal context must have entailed gene-flow, and the fact that the later steppe groups all produced large polities stretching across South Central Asia and South Asia, which again must have entailed genetic assimilation of those populations), and considering the fact that those steppe groups are often implicated by scholars in the ethnogenesis of modern South Central Asian populations.

The genetic data is quite clear now. TreeMix, qpAdm, and d-stats are showing us the same pattern (massive amounts of genetic ancestry from Corded Ware-derived steppe populations in South Central Asia), a pattern that makes perfect sense of the uniparental genetic data, and of the linguistic situation in the region. In addition, it ties into the historical narrative produced by scholars.

But I know people are going to continue to make the same (quite weak) arguments, until we see aDNA from eastern West Asia or northern South Asia. So I guess we all need to wait.

Alberto said...


I agree that the Y-DNA is a good argument for European migration into South Asia. In my view, the strongest argument right now. But I'd like to see southern DNA for that too, just to check.

For the rest, I'd leave it as a matter of points of view, and it's certainly better to just agree to disagree about what each sees as more or less relevant or solid. Let's wait for ancient DNA (hopefully coming soon) and then I'd gladly continue the debate with a clearer picture in front of us.

Unknown said...

Honestly, it's fascinating watching this debate unfold. Most of the rest of us can only take in the points you guys make (Sein, Matt, Alberto, Dave) one by one, and wait for the aDNA evidence.

But if the Central Europe to Sintashta to South Asia model is correct, then it alleviates some of my major problems with the classic steppe model, namely the utter impossibility of it. I won't elaborate for now.

But for now, if Central Asians derive more from CWC than Yamnaya, then would it explain the D-stats thread posted by David a week or so back?

Alberto said...


You mean these ones that showed higher affinity to MA1 in S-C Asian populations than was expected from their Yamnaya affinity? If it's those, I'm not sure if being more CW than Yamnaya would explain them. Yamnaya has stronger MA1 affinity than CWC, so it should be the opposite. But with D-stats there are several factors at play, so maybe someone else does see a reason for it and can comment further.

Maybe these ones could be interesting too. David, any chance of running them?

Mbuti Pathan Georgian Corded_Ware_LN
Mbuti GujaratiA Georgian Corded_Ware_LN
Mbuti GujaratiD Georgian Corded_Ware_LN
Mbuti Dai Georgian Corded_Ware_LN
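
For readers less familiar with the notation, each line above names four populations (W, X, Y, Z) for one D-statistic. Below is a minimal sketch of how such a statistic is formed from per-SNP allele frequencies, using the numerator/denominator form I understand from Patterson et al. 2012. The function name and the allele frequencies are hypothetical, made up purely for illustration; real runs are done with ADMIXTOOLS on genome-wide data, which also supplies the z-score.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (equal-length lists)."""
    # Numerator: how W-X frequency differences covary with Y-Z differences.
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    # Denominator: normalisation that keeps D between -1 and 1.
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in zip(w, x, y, z))
    return num / den


# Hypothetical allele frequencies at four SNPs (illustration only).
outgroup = [0.0, 0.0, 1.0, 0.0]   # stands in for W, e.g. Mbuti
test_pop = [0.4, 0.6, 0.5, 0.3]   # stands in for X, e.g. Pathan
ref_a = [0.2, 0.5, 0.6, 0.1]      # stands in for Y, e.g. Georgian
ref_b = [0.5, 0.7, 0.4, 0.4]      # stands in for Z, e.g. Corded_Ware_LN

d = d_statistic(outgroup, test_pop, ref_a, ref_b)
print(round(d, 4))
```

Under this sign convention a positive value means X shares more alleles with Z (or W with Y), and a negative value the reverse. Swapping W and X flips the sign, which is worth keeping in mind when comparing stats posted with different population orders.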

Unknown said...

Yep, that's what I meant, but I see they should be the opposite.

Balaji said...

Davidski, this is old wine in a new bottle. You had already posted much the same results earlier.

Since then you had the following excellent post.

Any modeling of Kalash as 70% Nordic, 20% Armenian and 10% Dai is wrong.

Davidski said...

Nordic doesn't mean modern Scandinavian, but rather Bronze Age North-Central European.

That's what TreeMix basically shows as well, using a couple of different ways. This is what I got with a high quality dataset (note the relatively short branches even for the ancient samples). Bootstrapping the run confirmed the result.

Alberto said...


But thinking about your question again, it's not really relevant if S-C Asians are autosomally closer to Yamnaya or Corded Ware, in this case. I don't think anyone here proposes a migration from Yamnaya people to S-C Asia, because there is no reason for it. No archaeology, no Y-DNA match, and still Yamnaya is too European.

Sintashta/Andronovo are more European, indeed. But if someone from the steppe migrated to India, it had to be them. We have Y-DNA and archaeology that might be controversial, but it's certainly better than nothing.

So the answer on which I think we all agree here is that if Steppe to India happened, it was with Andronovo people of European origin, not with Yamnaya types. My problem with the Andronovo model mostly comes down to the amount of admixture proposed. I really can't see 60-70% admixture in S-C Asia, but I could see some 10% as possible (and that's all that's needed, in many cases).

Unknown said...


Yes, I see. I think 60% is realistic for Afghanistan and northern Iran, given the demographic fluxes those areas were subject to in prehistory.

Davidski said...

Only places in Asia where you're likely to see 60% admixture from the EBA steppe are parts of the Pamirs and Hindu Kush.

Shugnan Tajiks have the most, and I'm pretty sure that when all is said and done the estimate will be over 60%.

Krefter said...

" I really can't see 60-70% admixture in S-C Asia, but I could see some 10% as possible (and that's all that's needed, in many cases)."

I investigated Steppe-mtDNA at my blog, in case you haven't seen it. There definitely could be Steppe-mtDNA in S-C Asia.

Today I just remembered some mtDNA data for hundreds of samples from Afghanistan that I had saved. There's no raw data but there are haplogroup frequencies. There's 5%+ U5 in Hazara and Tajik, but it's under 2% in Pathan, Uzbek, and Turkmen.

In my data U4 is most popular in Volga/Ural and next in North Europe and Afghanistan. U5a/U4 are the main Steppe-mtDNA signals that can be identified with HVR1 coverage.

The main U4 clade in Afghanistan and Volga/Ural is U4a. There are quite a few U4a1s from Ancient Steppe-people and in my European data (U4a needs CR coverage to be identified, but U4a1 just needs HVR1). I don't know, that could be a connection. I'll discuss more about this in my next post at my blog.

Matt said...

Davidski: Nordic doesn't mean modern Scandinavian, but rather Bronze Age North-Central European.

Interesting line of comment. I think this does raise the question that, for all that they are not what was in existence then, what actually happens if modern Norwegian or Polish or Mordovian people are used in place of Sintashta or Nordic LN/BA? Since in theory these populations should be relatively close to Nordic LN/BA (LNBA populations may have more EHG sharing relative to WHG sharing, for at least one factor though). On a similar tip, what happens when using Basque Spanish or Hungary BA in place of Nordic LN/BA?

If these guys don't work, then that's interesting because we know there is some divergence between LNBA and modern Europeans that must cause that, while if they do work, that's info as well.

Davidski said...

Using the same transversion sites and outgroups, but this time with modern samples and Bronze Age Hungarians as references.

Seinundzeit said...

Very interesting, the Belarusian model works great in terms of stats:

83.1% Belarusian + 8.7% Georgian + 8.1% Dai


tail probability=0.973115

It seems all the northeastern and eastern European models work (although only the Belarusian model is excellent), but the models with Hungarians, Basques, Greeks, and Italians are all infeasible. That makes sense, in light of the Sintashta/Andronovo models.
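
As a rough aid to reading outputs like the one above, here is a small sketch of the two checks being applied informally in this thread. It assumes, as I understand qpAdm reports, that a model is "infeasible" when any mixture coefficient falls outside the 0 to 1 range, and that the tail probability is a p-value for the fit of the model. The 0.05 cutoff, the function name, and the second (infeasible) example are my own assumptions for illustration, not output from any run in this thread.

```python
def describe_model(coeffs, tail_prob, alpha=0.05):
    """Summarise a qpAdm-style result: feasibility, fit, and coefficient total."""
    # Infeasible (by the assumption stated above) if any proportion is < 0 or > 1.
    feasible = all(0.0 <= c <= 1.0 for c in coeffs.values())
    # A tail probability below alpha suggests the model is rejected.
    fits = tail_prob >= alpha
    return {"feasible": feasible, "fits": fits,
            "total": round(sum(coeffs.values()), 3)}


# Coefficients and tail probability as quoted in the comment above.
belarusian_model = {"Belarusian": 0.831, "Georgian": 0.087, "Dai": 0.081}
result = describe_model(belarusian_model, tail_prob=0.973115)

# Hypothetical infeasible model: one negative coefficient (made-up numbers).
made_up = describe_model({"SourceA": 1.2, "SourceB": -0.2}, tail_prob=0.40)

print(result)
print(made_up)
```

The quoted coefficients total 99.9% rather than 100%, which is just rounding in the posted figures. Note that a high tail probability only says the model is not rejected with the chosen outgroups; it doesn't by itself show that the sources are the true ancestral populations.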

Matt said...

Thanks for doing that.

Not totally sure I'm reading the statistical results right, but in terms of the prior comments, my interpretation would be it looks like the North Europeans can essentially whip up enough Native American and East Asian balance relative to one another and Africans, that only a slight level of East Asian and Georgian admixture is needed to fit the Kalash's relationship between NA (ANE), EA, African. Same as with the ancient samples.

It's interesting that the Lithuanians model with a lower percentage than the other North Europeans - perhaps their ANE sharing is too high? The LNBA Nordic sample seems between the Lithuanians and other modern North Europeans in terms of the coefficients.

At the same time, looks to me like from Hungary down south, there isn't enough raw ANE affinity relative to other components to be able to work, and more so further south, even for Basques who seem relatively rich in another sort of HG ancestry (ANE related WHG) and who have some ANE.
OTOH, Hungary BA seems like it did work. Best coefficients: 78.6% Hungary BA, 13.3% Georgian, 8.1% Dai. Although not with a very good fit.

The two Hungary BA samples fit under the Haak models as having less Yamnaya ancestry than modern Southern Europeans (except Sardinians out of the modelled populations) and more direct WHG ancestry than anyone but the Lithuanians (quite a bit more extra WHG than Basques have). In large quantities WHG also adds on ANE affinity through their shared "North/West Eurasia HG" drift, and this explains why Hungary_BA can work OK despite relatively low estimated Yamnaya ancestry?

Of course, while it seems like ANE relationships have to drive this, because that is what in theory differentiates Native Americans and East Asians, there is that finding that Sein cited for us where qpAdm runs which include MA1 and a theoretically pure ENF (BedouinB) in the left pops don't seem to find good contributions of these two together in Kalash ancestry. Which I don't know exactly what that means.

Unknown said...

Random question, but what does modelling it the opposite way give?

I.e., modelling Eastern Europeans as some mix of Pashtun, MNE, etc.

Davidski said...

It doesn't work, because there's practically no South Asian admixture in Eastern Europe, not counting some people in the Balkans with Roma admixture.

And South Asian ancestry is so unique that it can't pass off as anything else, even when using f4 stats, which is what qpAdm runs on.

Unknown said...

Gotcha, thanks

Kristiina said...

Krefter, we repeat the same arguments over and over again, but I remind you that West Siberia, North East Europe and probably also Volga-Ural (ancient/Mesolithic mtDNA is not available) have had U4 and U5a from a very early date. 5000 BC Karelian sites yielded U4 and U5a. A 4000 BC West Siberian Baraba Steppe hunter-gatherer site yielded U2e, U4 and U5a. U5a was even detected at a 5000 BC Baikal site. U4, U5a and U2e cannot be considered as any evidence of European origin unless the subtype is clearly of Eastern European origin.

Kristiina said...

Afanasievo: RISE507 U5a1a1, RISE508 U5a1a1, RISE509 T2c1a2, RISE510 J2a2a, RISE511 J2a2a

Sintashta: RISE386 J1c1b1a, RISE391 N1a1a1, RISE392 J2b1a2a, RISE394 U2e1e, RISE395 U2e1h

Andronovo: RISE500 U4d1, RISE503 U2e2, RISE505 U4a1b, RISE512 U2e1

Yamnaya: RISE546 U5a1d2b, RISE547 T2a1a, RISE548 U4, RISE550 U5a1i, RISE552 T2a1a

They do not share any mtDNA subgroups!

Moreover, Tianshan Beilu, Hami (Xinjiang) also yielded U2e, U5a and U4 (Y-DNA 5xN, 1xC), 4000-3300 YBP.

In the Allentoft paper, Afanasievo samples were dated c. 4400 - 4000 YBP, and Andronovo samples c. 3300 YBP.

I googled that Motala samples dated at 8000 YBP are said to be U2e1, as well as Yuzhny Oleni Ostrov samples dated at 7500 YBP.

In the Mesolithic, U2e, U4 and U5a1 were surely found in a wide area ranging from Fennoscandia and Steppe to the Urals.

Krefter said...


U5/U4 are certainly the best markers of Steppe-maternal ancestry. T2b/H1/J1c are great markers for EEF maternal ancestry but exist in West Asia too (at under 10%). There's a possibility U5a/U4 is of non-Steppe origin, but it's more likely that it is of Steppe origin. Especially for Iran, where more typical Siberian mtDNA is rare.

Kristiina said...

I do not disagree that U5 and U4 are typical markers of Steppe-maternal ancestry but you definitely cannot claim that Bronze Age proto-IE's have an exclusive right to U5 and U4.

IMO, U5 and U4 may ultimately have their origin in Europe but that happened before 10 kya and they spread to a wide area well before Sintashta and other Bronze Age cultures.

Kristiina said...

As for Iranians, they do not possess a significant amount of steppe haplogroups. Their U2e1a1 looks recent. Their U4a2a also seems very recent. Instead, U5a1g is older, and it is said that

"Notably, six of eight U5a haplotypes found in Iranians also belong to U5a1a’g and four of them belong to the very rare sub-cluster U5a1g. It has been recognized in three individuals of European (Slovakia, England) and the southern Caucasus region ancestry. Coalescence age estimates for U5a1g is about 9 kya, thus placing its origin to the Holocene.

Iranians do have A4, B4b, G2a3, C4a, C5c and D4, but, of course, they are recent.

The share of East Eurasian haplogroups introduced from Central Asia is 3.97%, and the share of U2, U4 and U5 is 7.39%, but it also includes South Asian U2 lineages, West Asian U4 lineages and Caucasian (?) U5a1g, so the share of Steppe haplogroups is not significant in Iranians.

Kristiina said...

Sintashta J1c1b1a; Iranians J1b1a, J1b1b, J1c2, J1d
Sintashta J2b1a2a; Iranians no J2
Sintashta N1a1a1; Iranians N1a1b
Sintashta U2e1e and U2e1h; Iranians U2c1, U2d, U2e1a, U2e2

Afanasievo U5a1a1; Iranians U5a1a1, U5a1g, U5a1d2, U5b2a1a2
Afanasievo T2c1a2; Iranians T2a2b, T2b, T2c1, T2d1, T2g1, T2i, T2m, T2n
Afanasievo J2a2a; Iranians no J2

Andronovo U4d1; Iranians U4a2a, U4a, U4b, U4b1, U4c1a
Andronovo U2e2; Iranians U2e2
Andronovo U2e1; Iranians U2e1a

Yamnaya U5a1d2b; Iranians U5a1d2b
Yamnaya T2a1a; Iranians no T2a1a
Yamnaya U5a1i; Iranians no U5a1i

The shared sub-haplotypes I could find were the following: between Afanasievo and Iranians (U5a1a1 + T2c1); between Andronovo and Iranians (U2e2); and between Yamnaya and Iranians U5a1d2b.
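
The comparison above can be sketched as a simple set operation. This is only an illustration using the haplogroup lists quoted in this comment: the function name is hypothetical, and treating a shorter label as the parent clade of a longer one that starts with it is an assumption based on the usual hierarchical mtDNA nomenclature.

```python
# Ancient haplogroups per culture, as listed in the comparison above.
ancient = {
    "Sintashta": {"J1c1b1a", "J2b1a2a", "N1a1a1", "U2e1e", "U2e1h"},
    "Afanasievo": {"U5a1a1", "T2c1a2", "J2a2a"},
    "Andronovo": {"U4d1", "U2e2", "U2e1"},
    "Yamnaya": {"U5a1d2b", "T2a1a", "U5a1i"},
}

# Iranian haplogroups mentioned in the same comparison.
iranian = {
    "J1b1a", "J1b1b", "J1c2", "J1d", "N1a1b", "U2c1", "U2d", "U2e1a",
    "U2e2", "U5a1a1", "U5a1g", "U5a1d2", "U5b2a1a2", "T2a2b", "T2b",
    "T2c1", "T2d1", "T2g1", "T2i", "T2m", "T2n", "U4a2a", "U4a", "U4b",
    "U4b1", "U4c1a", "U5a1d2b",
}


def shared_lineages(ancient_set, modern_set):
    """Return (exact matches, ancient clades whose parent clade is in modern_set)."""
    exact = ancient_set & modern_set
    # Assumption: label "T2c1" is the parent clade of "T2c1a2" (prefix match).
    via_parent = {a for a in ancient_set for m in modern_set
                  if a != m and a.startswith(m)}
    return exact, via_parent - exact


for culture, haplos in ancient.items():
    exact, via_parent = shared_lineages(haplos, iranian)
    print(culture, sorted(exact), sorted(via_parent))
```

With these lists the exact matches are the ones named above, and the only parent-clade match is Afanasievo T2c1a2 under Iranian T2c1. A prefix match is weaker evidence than an exact one, since the shared parent clade can be much older than the Bronze Age.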

Kristiina said...

I stick to my understanding that usually it is yDNA's that migrate to new areas / spread to the exclusion of original yDNA but mtDNAs are more local. However, I admit that there have been important migrations in particular as a result of climatic changes and new ways of making a living, e.g. arrival of Neolithic farmers to Europe when the climate became warmer and farming could support a much bigger population.

Krefter said...


We're in agreement about U5a/U4. Steppe is the best candidate source for U5a/U4 in Europe and S-C Asia because EHG was introduced into both regions (except in North and East Europe) by Steppe people.

"The shared sub-haplotypes I could find were the following: between Afanasievo and Iranians (U5a1a1 + T2c1); between Andronovo and Iranians (U2e2); and between Yamnaya and Iranians U5a1d2b."

No surprise there are some matches of full-sequenced Iranian and Steppe mtDNA. Although we have very few fully-sequenced ancient samples and many more matches could pop up if we had more.

"I stick to my understanding that usually it is yDNA's that migrate to new areas / spread to the exclusion of original yDNA but mtDNAs are more local."

Based on HVR1-coverage IMO there is certainly some Steppe-mtDNA in S-C Asia. It doesn't look like a lot, but I just started gathering mtDNA data from that region. Steppe-ancestry might be more represented in Male-lines in S-C Asia. Nothing is for sure though.

In Europe Steppe-Y DNA also looks more popular than Steppe-mtDNA. Pretty much all European Y DNA outside of Italy and the Balkans that isn't I1/I2a2/N1c is R1b and R1a.

Seinundzeit said...


I've just started going through some d-stats for myself (David was kind enough to run these with my raw-data), and since my ancestry is wholly from South Central Asia, what I'm getting in relation to ancient hunter gatherers might be of interest. Strongest signal of gene-flow to weakest signal of gene-flow:

(Papuan, Sein, EHG, Mbuti) d-stat = -0.0645, z-score = -12.534

(Papuan, Sein, Motala_HG, Mbuti) d-stat = -0.0606, z-score = -13.936

(Papuan, Sein, WHG, Mbuti) d-stat = -0.0549, z-score = -11.671

(Papuan, Sein, MA1, Mbuti) d-stat = -0.0538, z-score = -8.427

So it seems I'm closest to EHG, SHG, WHG, and ANE, in that order. I'm surprised that WHG is closer to myself than ANE. Also, it's interesting that the most significant z-score involves an SHG sample.

For whatever it's worth, based on the results I've looked at so far, these are the strongest signals of gene-flow:

(Papuan, Sein, Sintashta, Mbuti) d-stat = -0.0815, z-score = -18.712

(Papuan, Sein, Yamnaya, Mbuti) d-stat = -0.0803, z-score = -19.269
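
To make the two rankings explicit, here is a short sketch that sorts the six statistics quoted above by d-stat and by z-score. It also backs out an approximate standard error as d-stat divided by z-score, which is an assumption about how the z-scores were derived (the usual definition, as far as I know).

```python
# (reference population, d-stat, z-score), copied from the stats listed above.
stats = [
    ("EHG", -0.0645, -12.534),
    ("Motala_HG", -0.0606, -13.936),
    ("WHG", -0.0549, -11.671),
    ("MA1", -0.0538, -8.427),
    ("Sintashta", -0.0815, -18.712),
    ("Yamnaya", -0.0803, -19.269),
]

# Most negative first: strongest signal by effect size, then by significance.
by_d = [name for name, d, z in sorted(stats, key=lambda s: s[1])]
by_z = [name for name, d, z in sorted(stats, key=lambda s: s[2])]

# Approximate standard error, assuming z = d / SE.
std_err = {name: d / z for name, d, z in stats}

print("by d-stat :", by_d)
print("by z-score:", by_z)
for name in by_d:
    print(f"{name:10s} SE ~ {std_err[name]:.4f}")
```

The two orderings differ because the standard errors differ. MA1 comes out with the largest implied standard error of the six, and the gap between the WHG and MA1 d-stats (about 0.001) is small next to standard errors of roughly 0.005, so on these numbers alone the WHG versus ANE ordering looks too close to call.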

Kristiina said...

"Athough we have very few fully-sequenced ancient samples and many more matches could pop up if we had more."

I do not believe that the big picture will change. Steppe haplogroups will be U2e1, U2e2, U5a1a1, U5b2/U5b2a1a2, U4a/U4a2a, U4b, U4c and probably all of A4, B4b, G2a3, C4a, C5c and D4.

The share of U5, U4 and U2 may be high(er) in some ancient Indo-Iranian tribes in S-C Asia that we will discover, but it will not change the conclusion that the share of steppe haplogroups in modern Iranians is not significant.

It is possible that U4a and U2e are autochthonous haplogroups of Northern Central Asia and Western taiga.

Balaji said...

Let us take a look at Davidski's “Smarter Bear” plot.

All Europeans are above the red line. Georgians are below the line. Dai are not shown on this figure but since they are about equally related to WHG and MA1, they will be close to the line. But Kalash are way below the line.

How then can anyone believe that Kalash = 83.1% Belarusian + 8.7% Georgian + 8.1% Dai?

This is of course impossible. Kalash are not descended from Europeans. Shared ancestry is likely due to gene flow from South Asia to Europe (ASI is a problem but that can be explained).

Cultural influence was also from India to Europe, as shown by another artefact found in a Danish peat bog: the Iron Age Gundestrup Cauldron. Compare Plate A of the cauldron to the Pashupati seal from Mohenjodaro in the Indus Valley Civilization. Also compare Plate B to the Goddess Gajalakshmi.

But the qpAdm models do need an explanation. Why do Kalash seem closer to Northern Europeans than to Southern Europeans or West Asians? I will suggest an explanation in another post.

Coldmountains said...


I agree that hardly any Central or South Asian population is 80% Belarusian-like. This is too much (even for many Europeans) and some of this affinity is likely caused by something older predating IEs in South and Central Asia. But something around 30-50% for South Central Asians and 50+% for Pamiri Tajiks sounds reasonable to me. There was no direct gene flow from South Asia into (East) Europe at least in the last 10000 years if we ignore Romani people. This additional "North European" affinity is likely caused by something EHG-like, which arrived in South Central Asia from the north earlier than Steppe Indo-Iranians.

Unknown said...


And why is there an apparent lack of ASI in Europeans, and indeed of a "South Asian" component in admixture runs of modern, Mesolithic and Eneolithic genomes (e.g. Karelia, Yamnaya, etc.), one that is otherwise seen in Mal'ta, or even Oase?

My current opinion is, if there is South Asian, it would go back to the late Upper Palaeolithic or early Holocene; unless we propose that R1a is linked to the "Teal phenomenon", and its actual origins are in Central-South Asia.

Anonymous said...

This is confusing a lot of people. Let me help out.

IBS is not genetic distance. Someone earlier used the data like that. Which is false and shows misunderstanding of the method. IBS is used by thousands of researchers but someone online decides it is a 'problem' method? LoL, you read the funniest stuff from self declared experts online

South Central Asian sub structure is main reason for odd results with these new tools. Best off ignoring these until ancient data. Reich and his team will not 'stand by' these numbers the way some do here, for good reason. It is stats masturbation.

Also, no gene flow from South Asia to Europe. Balaji is wrong here. The gene flow was strictly Europe to South Asia. Closest thing to South Asian mixing is the strange West Asian - South West Asian signal in Yamna and Afanasiev.

Srkz from ABF posted IBS data for Ark-Sin and Andronovo weeks back. Everyone go check those out. End of the day, it is clear most non East Asians in Asia have some steppe / European admix but the amount will vary.

capra internetensis said...

Well, there were certainly trade links between South Asia, Central Asia, and West Asia (including the Caucasus) dating back to 4000 BC or so. There are Central and South Asian trade goods in the Maykop and Novosvobodnaya kurgans. At around the same period the Geoksyur oasis culture in SW Turkmenistan shows influence from Early Harappan cultures to the south, especially in ceramic manufacture, and later on the Geoksyur people reach to Zheravshan where they interact with Kel'teminar and also flow back to the Kopet Dag. Then in the Mature Harappan period there is renewed interaction, with the Harappan colony at Shortugai in Afghanistan, and the documented trade with Mesopotamia. However, this doesn't appear to reach north of the Central Asian deserts until early/proto-BMAC trade goods appear with Sintashta and Andronovo.

There was probably at least some gene flow associated with these interactions, though it would probably be very attenuated by the time it reached Europe or NC Asia. This period may pre-date the main phase of mixture between ASI and ANI, in which case any South Asian gene flow wouldn't necessarily carry significant ASI. South Asian uniparental markers are widespread at low frequencies in West and Central Asia, including the Indian mtDNA that was found in ancient Syrian remains, but it is impossible to say as yet when they arrived.

Anonymous said...


All well and true (no expertise on Asian prehistory) but trade goods do not mean gene mixture as you admit. Look at Ark-Sin remains and contrast with CW. There is barely any difference between them. If there was BMAC mixture into steppes it will be near meaningless or localized to the south.

Steppe groups mixed 'out' and not 'in'. The ancient remains confirm this. They were not localized forms held together with an identical ancestry strand. Yamna and Afanasiev are basically identical in spite of thousands of miles distance. Same applies with later CW and Ark-Sin.

A lot of people have gotten lost in either minute detail (BMAC into Ark-Sin is basically meaningless) or are living in wild stats fantasy worlds (South Central Asians 70% Nordic Europeans? just LoL). Smart people like you, davidski and Alberto keep things as real as they can be for now.

Unknown said...

Lol !
Thanks for your input 'mementas'

Seinundzeit said...


Very interesting, so "South Asian substructure" magically accounts for what we see with qpAdm, TreeMix, and d-stats. Your insights are very profound.

Also, it's interesting that the Reich team "won't stand by this sort of stats-masturbation", since these are the very same methods they've used to crack prehistorical European genetic dynamics, and they are being used by David in the exact same way that the Reich team has done.

A small side note, nobody was presenting IBS as a distance metric, it was a conversation about the determinative nature of IBS output when considering questions of genetic affinity, and when considering the relative relational positions of ancient samples vis–à–vis living populations. The example presented was an attempt to show that IBS couldn't be used for those purposes.

Davidski said...

OK, settle down with the insults and stuff.

It's not necessary. This will be settled sooner rather than later with ancient DNA.

Bhatt said...

There were obviously multiple migrations of steppe groups into South Asia. Kalash, while related, might be the descendants of a different group of steppe people than the Vedic Aryans that spread throughout the plains of northern South Asia. The latter were an R1a-heavy group, as can be deduced from the high percentages among higher castes and modern IA speakers. By contrast, the Kalash (and their ancestors, if they are unchanged since BA) have only a moderate presence of R1a (18.2%) according to Firasat 2006, with large amounts of L (25%) and H (20.5%). Since the ancestral groups would be related, they likely had similar customs but were still different enough that the ancient inhabitants of the "Aryavarta" considered the ancient inhabitants of the modern Kalash-Nuristani dominated regions as "Mlechhas" (outcastes).

Davidski said...

The Kalash number just a few thousand people and live in isolated valleys.

So the chances that their Y-haplogroups and their frequencies closely reflect those of their steppe ancestors are zero.

Too much isolation and founder effects.