Wednesday, March 16, 2016

Sintashta, BMAC and the Indo-Iranians

I'm perusing the online archives of Harvard Sanskrit Professor Michael Witzel. The links below are worth checking out for some background info on the prehistory of Eastern Europe and Central Asia. There's a very cool map on page 6 of the second PDF.

Sintashta, BMAC and the Indo-Iranians. A query.

Linguistic Evidence for Cultural Exchange in Prehistoric Western Central Asia.

The Home of the Aryans.

Autochthonous Aryans? The Evidence from Old Indian and Iranian Texts.

Looking back, these old school linguistics articles make a lot more sense than most of the supposedly cutting-edge population genetics papers coming out at around the same time dealing with the Indo-Aryan question.

Many population geneticists back then took the view that the ancestors of the Indo-Aryans could not have spread from the European steppes to India because Y-chromosome haplogroup R1a apparently showed the greatest haplotype diversity in the Indus Valley. Well, what a load of crock that turned out to be.

See also...

The Poltavka outlier


Very funny. You have shown your class no doubt...

Thank you very much.

It is amazing how ancient genomes proved most of Witzel's and others' theories about Proto-Indo-Iranians/Indo-Aryans. But we still know nothing about the Pre-Aryans of South Asia. They could be just a CHG/ASI mix or something much more exotic. I am also very curious about the Pre-Aryan EHG influence in Central Asia. I guess Central Asia had some phantom populations which went extinct. Anyway, BMAC and related cultures are probably the main source of South Central Asian DNA, but the steppe autosomal component among South Central Asians is significant and comparable to that of many South Europeans.

Then it just remains to explain the origins of the other branches of the Aryan (i.e. Caucasian) populations, west of the Suleiman mountain range.

If you say so.

Does the map show a little blob of Srubnaya between the Caspian Sea and the Garagum desert?

South Asians have EHG or EHG-related ancestry. There are no ifs, ands or buts about it. The best representatives for the West Eurasian ancestors of South Asians are Bronze Age North/East/Steppe Europeans.

Look at these results. I sucked out the West Eurasian side of two South Asian populations' DNA. The West Eurasian side of South Asians is basically Bronze Age European.

GujaratiA's West Eurasian side (represents 50-60% of ancestry): 53.5% Yamnaya, 24.75% Anatolia_Neolithic, 18.7% CHG, 3.7% WHG @ D=0.008472.
Tajik_Ishkashim's West Eurasian side (represents 60-80% of ancestry): 68.45% Yamnaya, 26.1% Anatolia_Neolithic, 5.45% CHG @ D=0.013808.

It can't be a coincidence that a group of people from Europe carrying Y DNA R1a Z93 settled in Asia and are the best representatives of the West Eurasian side of South Asians, who have a lot of Y DNA R1a Z93. It looks like South Asians are essentially European/Kharian hybrids.

The indo-aryan influx in India seems to be linked to y-dna J. Its brother-lines G,H and I seem to have populated the areas west of the Suleiman/Taurus mountains, including Europe.

Since these haplogroups share an immediate origin (F-), it's worth considering that they may explain the first spread of the IE proto-language - already as the Eurasian continent was repopulated and the modern versions of the hgs F-GHIJ spread to their respective regions.

The later spread of agriculture and the hgs R1a/b - from the Bactrian Massagetae to the Irish Picts - would obviously have reinforced this IE influence.

Late Bronze Age and later movements of Indo-Aryans, out of India, may explain the appearance of hg J1 - as well as the Dravidian influx - west towards the Levant and north to the Caspian area. The latter probably during the Achaemenid expansions.

At a crossing point and melting pot for these migrations one may - like JP Mallory - find the Vainakh people and their culture interesting:

Their mythology and art have links to the indo-aryan as well as the "celto-aryan":

Indo-European Dispersals across the Eurasian Steppe:

"The indo-aryan influx in India seems to be linked to y-dna J"
Good joke.

If hg J is a subgroup of hg F, and a brother clade to hgs GHI, from where did they originate - and why are they dominant in their respective areas of a common IE mainland?

The model with MA1 works really well for S-C Asia. A remarkable thing is that including Yamnaya doesn't change things at all, and it still picks Srubnaya.

Whether Sintashta/Andronovo people spoke Indo-Iranian or not I have no idea, but this model does provide support for the hypothesis (together with R1a-Z93). Time will tell, but as a theoretical model it looks solid and the numbers are credible.

"Caucasus_HG" 22
"MA1" 17.15
"Srubnaya" 16.4
"Anatolia_Neolithic" 15.1
"Dai" 14.3
"Dravidian_India" 7.15
"Onge" 5.2
"Esan_Nigeria" 2.7
"Yamnaya_Samara" 0

"Caucasus_HG" 24.3
"Dai" 21.4
"MA1" 17.4
"Anatolia_Neolithic" 12.6
"Onge" 8.15
"Dravidian_India" 6.85
"Srubnaya" 5.6
"Esan_Nigeria" 3.7
"Yamnaya_Samara" 0

Coldmountains said...


Not sure if you are serious or trolling. R1a-Z93 is by far the most frequent Y-DNA haplogroup among Indo-Aryan Brahmins and even among many Dravidian Brahmin communities in South India (they have origins in North India). Anyway, Pashtuns for example have around 5-10% J, which is less than South Europeans have. Haplogroup J was likely present in both BMAC and IVC and has Neolithic origins.

@ Krefter I don't think S/SC Asians are European/Kharian Hybrids.
Most of the ANE in South Asians does not come from CHG but from MA-1 type.
So even the prehistoric indigenous South Asians would not be 100% ASI but a mix of ANE/ASI and probably a bit of ENF. Also there seems to be 2 kinds of ASI, one which is Paniya like and the other which was brought by SE Asian rice farmers.

As for the whole R1a-Z93 question, it's hard to dispute its steppe origin; it is an Indo-Iranian marker.
The only paradoxes are why it is common among Baloch/Brahui, who have little or no steppe ancestry, as well as some isolated South Indian tribals who have none at all.

Davidski said...

Why is R1b so common in some native American tribes?

Have you attempted to model West Asian (as an inclusive category) IE speakers utilizing MA1/Okunevo & Srubnaya/Andronovo (in particular Iranians)? I'd be very curious to see how this plays out. Sorry in case you've already done this and I missed it.

Here are a few other populations:

"Anatolia_Neolithic" 39.65
"Caucasus_HG" 26.3
"MA1" 9.7
"Srubnaya" 8.4
"Dai" 6.85
"Dravidian_India" 4.5
"Esan_Nigeria" 3.1
"Onge" 1.5

"Caucasus_HG" 28.85
"Anatolia_Neolithic" 24.6
"MA1" 15.05
"Dai" 9.25
"Srubnaya" 8
"Dravidian_India" 7.7
"Esan_Nigeria" 3.7
"Onge" 2.85

"Caucasus_HG" 24.95
"MA1" 17.9
"Anatolia_Neolithic" 17.15
"Srubnaya" 14.05
"Dai" 11.4
"Dravidian_India" 9.35
"Onge" 3.1
"Esan_Nigeria" 2.1

"Caucasus_HG" 26.2
"MA1" 16.7
"Anatolia_Neolithic" 15.95
"Dai" 12.65
"Srubnaya" 10.75
"Dravidian_India" 10.5
"Onge" 4.35
"Esan_Nigeria" 2.9

"Dai" 21
"Caucasus_HG" 19.9
"Srubnaya" 16.2
"MA1" 12.15
"Anatolia_Neolithic" 9.4
"Onge" 9
"Dravidian_India" 8.55
"Esan_Nigeria" 3.8

"MA1" 22.25
"Caucasus_HG" 22.05
"Anatolia_Neolithic" 19.4
"Srubnaya" 17.8
"Dai" 11.95
"Dravidian_India" 2.95
"Onge" 2.9
"Esan_Nigeria" 0.7

"MA1" 23.05
"Srubnaya" 22.6
"Caucasus_HG" 22.45
"Anatolia_Neolithic" 18.35
"Dai" 9.75
"Dravidian_India" 1.9
"Onge" 1
"Esan_Nigeria" 0.9

With Okunevo instead of MA1 (without adding Eastern_HG), things don't change that much. Okunevo takes MA1's place, plus some good part of Dai, while Srubnaya goes a bit up:

"Okunevo" 22.6
"Caucasus_HG" 21.4
"Srubnaya" 19.45
"Anatolia_Neolithic" 15.3
"Dravidian_India" 10.6
"Dai" 4.25
"Onge" 3.8
"Esan_Nigeria" 2.6

"Anatolia_Neolithic" 38.4
"Caucasus_HG" 24.2
"Okunevo" 11.7
"Srubnaya" 11.5
"Dravidian_India" 6.4
"Esan_Nigeria" 3.3
"Dai" 2.8
"Onge" 1.7

For Armenian and Armenian_BA, with MA1 but changing Srubnaya for Yamnaya:

"Anatolia_Neolithic" 52.25
"Caucasus_HG" 25.6
"Yamnaya_Samara" 9.05
"Dravidian_India" 5.1
"Dai" 3.45
"MA1" 2.05
"Esan_Nigeria" 1.8
"Onge" 0.7

"Anatolia_Neolithic" 41.65
"Caucasus_HG" 37.1
"MA1" 13.45
"Yamnaya_Samara" 3.3
"Dai" 2.7
"Esan_Nigeria" 1.8
"Dravidian_India" 0
"Onge" 0

Keep in mind that the reason the older North Eurasian samples are reducing the admixture proportions donated by the Bronze Age steppe samples is because they're ancestral to them.

In other words, you're breaking up the steppe ancestry into more basic components and thus allowing the algorithm to create better fits by compensating for the problem that we don't yet have the perfect steppe references for all modern populations.

So the fact that Karelia_HG or MA1 are shown to donate higher admixture proportions to West and South Asians doesn't provide evidence that this type of ancestry existed in West and South Asia before the Bronze Age.

"Caucasus_HG" 23.6
"MA1" 21.95
"Dai" 21.55
"Anatolia_Neolithic" 17.3
"Srubnaya" 5.65
"Dravidian_India" 4.95
"Onge" 2.9
"Esan_Nigeria" 2.1

"Caucasus_HG" 29.45
"Anatolia_Neolithic" 24.55
"MA1" 15.7
"Dravidian_India" 10.85
"Dai" 8.15
"Srubnaya" 5.45
"Esan_Nigeria" 3.8
"Onge" 2.05

Certainly without ancient DNA from Asia it's really not possible to know anything for sure. I've settled with MA1 as the option that looks more realistic, but it's still speculative.

In any case, MA1's strong affinity to South Asians seems difficult to explain by Bronze Age steppe admixture. Sintashta/Srubnaya didn't have pure MA1, but EHG and WHG. If I add EHG and WHG instead of MA1, Srubnaya disappears, but WHG still stays at 0%, so EHG on its own is absorbing the WHG that Srubnaya was providing. And the fit becomes quite a bit worse:

"Dravidian_India" 32.7
"Caucasus_HG" 24
"Anatolia_Neolithic" 18.55
"Eastern_HG" 16.65
"Dai" 6.1
"Esan_Nigeria" 1.3
"Onge" 0.7
"Srubnaya" 0
"Loschbour_WHG" 0

What really goes up is Dravidian without MA1. And honestly, some 16% Srubnaya in GujaratiA or Punjabi_Lahore is as high as it can get in any theoretical model. Above that is unrealistic.

Davidski you must be seriously delusional if you think the tonne of ANE in South Asians was all brought by Sintashta people in the Bronze Age lmao. From Pashtuns all the way to South Indian tribals, all have pretty high ANE.

You do not have the genomes of South Asian hunter gatherers or the composite Indo Iranian populations which emerged from the Oxus/BMAC and moved into South Asia and Iran, or the genomes of the IVC people.

But I never said that all of the ANE-related ancestry in South Asia arrived there with Sintashta people. What I said was that it may have arrived there during the Bronze Age, and, come to think of it, also later.

Neither Indo-European nor Burushaski is native to South Asia. Dravidian probably isn't either. All of these language groups may have arrived in South Asia during the Bronze Age, along with people rich in EHG, CHG and other components with ANE-related input.

The Okunevo people have a lot of ANE and they may have accompanied the Indo-Iranians during their migration into South Asia.

I think it is pretty safe to use MA1, plus Sintashta. What Sintashta gives us is WHG, which is lacking before the Bronze Age. When taking out Sintashta and getting low WHG in all of these groups, I think that is a verification of Sintashta ancestry actually being in the 10-40% range, which seems much more logical.

I would try modeling SC Asians as Anatolian, Bedouin, Atayal, Nganasan, Onge, MA1, and Sintashta. That should really give us a good idea of what's there.

Also, include CHG. I forgot that one.

I highly doubt Okunevo people would accompany them lol. South Asia had elevated ANE long before the Bronze Age. Your problem is you're projecting Europe's peopling history onto a much more complicated region. It's not a simple case of CHG/ASI mixing then steppe admixture. The ANE aspect complicates it: the indigenous population harboured MA-1 like ANE long before CHG showed up on the scene. Another thing is ASI seems to be Paniya like or the Austro Asiatic type brought by Paleo Mongolid Rice Farmers from the North East of the region.

It's possible that the Indo-Aryans had an elevated level of ANE from Okunevo one way or another, since one of the Andronovo individuals does appear to have admixture of this type.

Also, Okunevo cultural and religious input into the Indo-Aryan culture that moved into South Asia has been proposed in at least one serious paper that I've read. So perhaps you shouldn't be so dismissive of the idea? Just sayin.

The Siberian-shifted Andronovans were from the most northeastern part of the Andronovo horizon and not directly related to Indo-Aryans. They were some kind of Proto-Saka. Indo-Aryans probably did not enter Siberia and moved from the Ural region (here they had contact with Proto-Uralics) directly southwards into BMAC areas. L657 is totally absent anywhere in Siberia as far as I know, and it was Iranics who colonized Siberia.

CHG is only part ANE (1/3); even in many South Asian groups that would account for just 1/5 or 1/6 of the ANE. Sintashta has even less ANE than those CHG samples. So who's bringing the rest of the ANE? Uralic peoples on reindeer?
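To make the arithmetic in that argument explicit, here is a minimal Python sketch. It takes the comment's own ratios at face value (CHG being one-third ANE, and CHG accounting for 1/5 or 1/6 of a group's ANE). The 24% CHG input is only an illustrative round number, not a measured value, and the function names are made up for this example.

```python
from fractions import Fraction

# The comment's figure, taken at face value: one third of CHG is ANE.
ANE_IN_CHG = Fraction(1, 3)


def ane_via_chg(chg_percent):
    """ANE (in % of total ancestry) that arrives through a CHG share."""
    return Fraction(chg_percent) * ANE_IN_CHG


def implied_total_ane(chg_percent, chg_share_of_ane):
    """Total ANE implied if CHG supplies only `chg_share_of_ane` of it."""
    return ane_via_chg(chg_percent) / chg_share_of_ane


# Illustrative input only: a group modelled with 24% CHG.
chg = 24
via_chg = ane_via_chg(chg)
total_if_fifth = implied_total_ane(chg, Fraction(1, 5))
total_if_sixth = implied_total_ane(chg, Fraction(1, 6))
print(via_chg, total_if_fifth, total_if_sixth)
```

Whatever input is used, the point of the calculation is the same: under those ratios, the ANE not explained by CHG is four to five times larger than the part that is.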

If Burusho are so steppe-derived, why do they have 0% WHG??
Your own tests don't support what you're saying at all lool.

I would consider Dravidian languages local; by your line of thinking they would have to be of OoA age to be considered local.

There is Siberian admixture in South Central Asia. It may have come with the eastern Iranians, although I do remember reading about Okunevo influences in Vedic culture.

But in any case, Okunevo or similar groups are the most likely source of Siberian admixture in South Central Asia, simply because they're in the Altai region not all that far away from the Hindu Kush and Pamirs. If that's the case, then we should expect that a relatively high ratio of ANE was also passed on, rather than something like 90/10 East Asian/ANE, as would be the case with, say, Ulchi or Nganasan admixture.

And nah, Dravidian languages aren't normally seen as native to South Asia. Some of the more serious linguists posit a tentative link to Mesopotamia.

There are indeed two distinct layers of West Eurasian ancestry in South Asia. This shows up in genome-wide structure (steppe vs just CHG-related), Y-DNA (R1a vs J2), and mtDNA. See here...

Definitely there is Siberian admixture in South Central Asia. Some of it is recent and of Altaic origin and some likely has a Saka source. But most of it among Tajiks, for example, is from assimilated Turks in my opinion. Can you link any source that mentions Okunevo influences among Vedic people? I have not read anything about it but would not be surprised if it is true. In my opinion Indo-Aryans did not settle in Siberia at all and Iranics arrived there first.

Well, there's this paper that someone posted in the comments here a few days ago. I think it overstates its case, and it assumes that Harappa was part Indo-Aryan, but it's food for thought anyway.

The Southern Migration of the Sayan Archaeological Complex.

The paper is very speculative and, to be honest, I don't see direct connections with Indo-Aryans. Okunevo, Karasuk and other Siberian cultures are in my opinion not connected to Vedic migrations. Anyway, Okunevo and other ANE-rich pre-Aryan cultures of Central Asia likely penetrated South Central Asia long before Indo-Iranians arrived there, and this is the best explanation for the extra ANE in South Central Asia and the archaeological links to ancient Altai. I remember reading that Shiva had Siberian pre-Aryan origins but this is again very speculative.

"And nah, Dravidian languages aren't normally seen as native to South Asia. Some of the more serious linguists posit a tentative link to Mesopotamia."

But that is 100% speculation. There is absolutely nothing definite about the languages of South-Asia before the Bronze Age. Dravidian languages could have derived from a hunter-gatherer language. There is no way to know because they have such a recent unique common-root or bottle-neck.

The Mesopotamia stuff is pure fantasy at this point.

If we actually understood the reason for the rapid spread of Dravidian languages, then we might be able to deduce more.

MA1 should obviously suck up most of yDNA R2, given that he possessed R*. Whether that's an ENA linkage at all is questionable - we don't yet have a clear idea where the R1-R2 split took place, and that may well have happened in South Asia. If that split took place near the Altai, it appears that R2 almost completely sought LGM refuge south of the Hindu Kush-Himalayas, so we would be dealing with an UP migration here. Seriously - I strongly recommend forgetting about MA1 when comparing West Eurasian with South Asian genetics; his inclusion is likely to create more confusion than clarity.

The paleolithic populations of north-east Eurasia died out during the LGM.

Thus Ko14 and MA1 represent extinct 'sidelines' of the gene pool that survived all three deep-freezes of the late paleolithic - with the Younger Dryas as the final bottleneck.

Consequently we have to consider the modern gene pool accordingly, as the result of the handful of survivors that managed to survive the extinction and start the repopulation of the arctic and semi-arctic climate zones, i.e. Europe and Northern Eurasia, some 11,800 years ago, according to the updated data from the paleobotanical, paleozoological and archaeological professions.

I'm not particularly familiar with your kind of humor. Nor do I have any time for trolling, in any shape or form.

Considering the origin of the first Aryans that arrived on the Indian subcontinent there are many theories, but only few answers.

One of them connects to the postglacial appearance of hg J, presumably J2.

During the Bronze Age a branch of these Indo-Aryans moves west, reaching the Levant and the Black Sea - from where some move to the Caspian area - as a result of the Achaemenid ("Persian") expansion.

Thus the conjunction between Dravidian, Elamite and Semitic languages.

The start of those movements - out of India - seems to coincide with the arrival of cows, corn and milk-drinking men with R1a/b on the Indian subcontinent.

A 'sparse wave' of hunter-gatherers migrating rapidly out of an arctic refugium, to various tropical populations, could have made a strong and 'disproportionate' contribution to the genetic and linguistic legacy of the various regions.

According to the data obtained by recent research.

This may explain part of the initial prehistoric dispersal pattern of the Indo-European languages, according to Otte and Adams.

Accordingly they consider this possibility on par with the hypotheses invoking the spread of the IE languages by early farmers or warlike cultures.

How can a tiny arctic hunter-gatherer population make a disproportionate impact on large southern populations? Logically they'd need a significant technological edge (in which case they wouldn't be hunter-gatherers) to make even a uniparental dent.

Thus, for instance, a paleolithic arctic spread of IE is much less likely than even the neolithic farmer theory.

@ Davidski, cheers for those stats.
"Try this one. Hopefully it doesn't have any errors."

Kharia is very helpful for South Asia:

Pathan: Kharia 24.05, Caucasus_HG 22.15, Karelia_HG 12.85, BedouinB 11.4, Armenia_BA 11.25, Anatolia_Neolithic 10, MA1 3.25, Nganasan 3.2, Onge 1.15, Papuan 0.7 (distance% = 0.3939 %)

(other test pops included Atayal 0, Dai 0, Esan_Nigeria 0, Itelmen 0, Masai_Kinyawa 0, Ulchi 0, Ust_Ishim 0, Western_HG 0, Yakut 0)

Kalash: Caucasus_HG 30.1, Kharia 19, Karelia_HG 14.45, BedouinB 10.6, Anatolia_Neolithic 9.6, Armenia_BA 5.9, Nganasan 4.95, MA1 3.25, Papuan 1.4, Onge 0.75 ( distance% = 0.3852 %)

Dravidian_India: Kharia 59.9, Caucasus_HG 14.35, MA1 7.05, BedouinB 6.7, Onge 3.7, Papuan 3.25, Anatolia_Neolithic 2.65, Armenia_BA 1.55, Masai_Kinyawa 0.6, Esan_Nigeria 0.25 (distance% = 0.6455 %)

(Without Armenia_BA:
Pathan: Caucasus_HG 26.25, Kharia 24.25, BedouinB 13.55, Anatolia_Neolithic 13.3, Karelia_HG 13.1, MA1 4.35, Nganasan 3.55, Onge 1.1, Papuan 0.55 (distance% = 0.4036 %)

Dravidian India: Kharia 59.95, Caucasus_HG 14.9, BedouinB 7.3, MA1 7.3, Onge 3.65, Papuan 3.25, Anatolia_Neolithic 2.9, Esan_Nigeria 0.4, Masai_Kinyawa 0.35 (distance% = 0.6454 %))

This set of stats also seems to have given a slightly different WHG percentage (just by 2% or so typically) for some of the West Eurasians, at least when I tried. Not sure quite what changed exactly:

English_Kent - Anatolia_Neolithic 43.2, Karelia_HG 20.8, Western_HG 14.6, Caucasus_HG 12.3, BedouinB 6.3, Nganasan 2.5

Yamnaya_Samara - Karelia_HG 52.05, Caucasus_HG 29.3, Anatolia_Neolithic 14.35, Western_HG 2.9, Onge 0.85, Papuan 0.55

Germany_MN - Anatolia_Neolithic 75.95, Western_HG 20.85, BedouinB 2.05, Onge 0.9, Papuan 0.25

Still kind of an Asian % there though.

Btw, re: BedouinB's stats, David, if at all possible would you mind running a similar set of stats to the BedouinB2 column for a few other Near Eastern populations, like Palestinian, Syrian, Jordanian? I'm quite interested to see if the pattern of strongest stats with the Early Neolithic and present day West Mediterranean is present there as well. Those Near Eastern pops are quite low down in relatedness to BedouinB, probably because of African relatedness.

Hmm... not ridiculously bad.

"Loschbour" 77.75
"Karelia_HG" 12.1
"Anatolia_Neolithic" 10.15
"Caucasus_HG" 0


@Matt, Alberto,

A good way to figure out what ANI is, is to suck out the non-ASI in South Asian populations, then do an nMonte/4mix/etc. of that non-ASI part of their ancestry. There's a way to do that I can show you.

I've done nMonte on zombie ANIs, and they all come out mostly Yamnaya with the rest being EEF and CHG/Caucasus, but I haven't included MA1 and EHG as possible ancestors.
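For readers wondering what "sucking out" a component could mean in practice, one simple possibility is linear subtraction. This is only a sketch under the assumption that a population's coordinates (for example a row of D-stats or PCA values) are a weighted average of its sources. It is not necessarily the procedure used for the results quoted in this thread, and `remove_component` and its parameters are hypothetical names.

```python
def remove_component(mixed, component, share):
    """Solve mixed = share * component + (1 - share) * residual for residual.

    `mixed` and `component` are equal-length coordinate vectors and `share`
    is the assumed fraction of ancestry coming from `component`.
    """
    if len(mixed) != len(component):
        raise ValueError("vectors must have the same length")
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    return [(m - share * c) / (1 - share) for m, c in zip(mixed, component)]


# Toy coordinates, invented for the example: a 50/50 blend of two sources.
asi_like = [2.0, 0.0]
blend = [1.0, 2.0]
residual = remove_component(blend, asi_like, 0.5)
print(residual)
```

The residual can then be handed to a mixture-fitting tool as if it were a sample. The result is only as good as the assumed share and the stand-in chosen for the removed component.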

As concerns Okunevo, there is indication that Andronovo took over Askaraly, a major tin mining area in East Kazakhstan, from them. The Askaraly tin's lead isotope signature corresponds to bronzes found in the Troas (Troy 2g, ca. 2250 BC). This is no positive proof - other, yet undiscovered mines may have had a similar signature, but for geological reasons such mines would most likely also have been located in Central Asia.
One Central Asian tin mine, Muhiston in W.Tajikistan, has evidence of two separate mining phases, the first one from 2400-1900 BC, the second one from the middle to the end of the 2nd mBC. Archeometallurgical investigation by a Franco-German cooperation project is on-going, and we should know soon whether Muhiston qualifies as tin supplier to the Troas, or other regions in the NE.
The second mining phase of Muhiston, as well as contemporary operation of tin mines in the Zerafshan valley between Buchara and Samarkand, has through ceramic and other finds in the mines and nearby settlements been clearly linked to Andronovo. This is an archeological confirmation of Witzel's linguistic argumentation, which considerably widens the rationale for takeover. But the first Muhiston mining phase pre-dates Andronovo.

The question of tin supply to Mesopotamia, the Aegean and Egypt has long puzzled research, as the region lacks any major tin mines. The problem even goes further - how could tin bronze develop at all, if there wasn't tin available for experimental alloying?
In contrast to copper-arsenic, copper-antimony ores etc., there are few places worldwide providing copper-tin ores, and so allowing for accidental discovery of tin bronze. Only one of these places so far has positive evidence of EBA mining - Muhiston in Tajikistan. [There are a handful of Vinca Culture early tin bronzes, but the technology apparently didn't gain traction. Copper-tin ores exist in the Panagyurishte district (BG), with mining evidence from Thracian times. WP furthermore reports a prehistoric tin-copper mine in Thailand ("citation needed")]
Thus, possibly, tin bronze was "(re-)invented" in Muhiston, which would push back that mine's exploitation somewhat further towards 3000 BC. The 2400 BC dating refers to underground galleries, while exploitation would have started as alluvial and/or open pit mining.

Who made that invention is unclear. A connection to the Caucasus and/or the Iranian plateau, with its long tradition in copper alloying, seems likely. But we don't know yet whether the first miners arrived from the West (NW Iran), or the North (Afanasievo).
While the Chinese Bronze Age only commences in earnest after 2000 BC, there are a few early, scattered NW Chinese tin bronze finds. The earliest, dated ca. 2800 BC, is from SW Gansu on the upper Yellow River, and may be plausibly connected to Afanasievo influence. This indicates Afanasievo knowledge of tin bronze, and access to tin resources, possibly from Muhiston. That knowledge apparently passed on to the subsequent Okunevo culture, which opened up additional mines in E. Kazakhstan and set in motion the Sejma-Turbino phenomenon, interpreted as having been spread by migrating metallurgists. A Sejma-Turbino metal workshop has been found on the Taymyr peninsula, today's home of the Nganasans.

The "old" BMAC settlers don't appear to have played an active role in tin mining. There is, e.g., little indication that the rich and easily accessible mines in the Zeravshan valley had been exploited prior to 1600 BC (Andronovo horizon). They seem, however, to have been actively involved in tin trade, building on established networks for precious stone export (Lapis Lazuli from Badakhshan, NE Afg., is well documented in Egypt from 2900 BC onwards).
An IVC colony in BMAC is archaeologically evidenced. Many BMAC seals reveal "foreign" motifs with Mesopotamian, Anatolian and Cretan analogies, suggesting cultural and trade contact, possibly also at least limited, trade-related migration.

This new datasheet gives the most sensible results so far, very solid.

Considering the fact that we don't have aDNA from Central/South/West Asia, considering the fact that we lack aDNA belonging to the exact vectors of Indo-Europeanization in those regions, and considering the fact that we are dealing with results that are arrived at via a Monte Carlo simulation, I think our fits need to involve only "basal" populations, ones that aren't mixtures of each other (at least in a shallow sense). Anyway, I've always wanted to try some models along those lines, and this input data finally allows me to do so.
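As a rough illustration of what such a fit does, here is a small random-search sketch in Python. It is written in the spirit of a Monte Carlo mixture fit and is not the actual nMonte code. The reference names and coordinates are toy values invented for the example, and `fit_mixture` is a hypothetical helper.

```python
import random


def fit_mixture(target, refs, slots=200, iters=20000, seed=0):
    """Random-search fit of `target` as a weighted mean of reference vectors.

    The mixture is a bag of `slots` entries, each naming one reference.
    A proposal reassigns one random slot and is kept only if the squared
    distance between the bag's mean and `target` does not grow.
    """
    rng = random.Random(seed)
    names = list(refs)
    dim = len(target)
    bag = [rng.choice(names) for _ in range(slots)]
    total = [sum(refs[n][i] for n in bag) for i in range(dim)]

    def sq_dist(tot):
        return sum((tot[i] / slots - target[i]) ** 2 for i in range(dim))

    best = sq_dist(total)
    for _ in range(iters):
        k = rng.randrange(slots)
        new, old = rng.choice(names), bag[k]
        if new == old:
            continue
        trial = [total[i] - refs[old][i] + refs[new][i] for i in range(dim)]
        d = sq_dist(trial)
        if d <= best:
            bag[k], total, best = new, trial, d
    props = {n: 100.0 * bag.count(n) / slots for n in names}
    return props, best ** 0.5


# Toy data: the target was built as 50% RefA + 30% RefB + 20% RefC.
refs = {"RefA": [0.0, 0.0], "RefB": [1.0, 0.0], "RefC": [0.0, 1.0]}
target = [0.3, 0.2]
props, distance = fit_mixture(target, refs)
print(props, distance)
```

With references that are not mixtures of each other, as in this toy case, the proportions are well determined. When one reference is itself close to a blend of the others, many different bags give nearly the same distance, which is the practical reason for preferring "basal" references.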

These fits should give us a very good idea of where populations stand (same setup as far as reference populations are concerned, for all of them):

Central/South Asia

Tajik (Shugnan)
25.95% Caucasus_HG
22.10% MA1
20.80% Anatolia_Neolithic
9.85% Karelia_HG
9.45% Atayal
8.65% BedouinB
2.05% Onge
1.15% Western_HG

32.40% Caucasus_HG
19.50% MA1
13.35% Anatolia_Neolithic
11% Atayal
9.30% BedouinB
6.10% Ust_Ishim
5.20% Karelia_HG
3.15% Onge

27.10% Caucasus_HG
17.10% MA1
15.35% Anatolia_Neolithic
11.45% Atayal
10.75% BedouinB
8.65% Ust_Ishim
5.25% Karelia_HG
3.95% Onge

24.10% Caucasus_HG
21.70% Ust_Ishim
16.60% MA1
15.35% Atayal
12.55% Anatolia_Neolithic
7.20% Onge
1.50% Karelia_HG
1% BedouinB

35.45% Ust_Ishim
21.35% Atayal
18.35% Caucasus_HG
11.05% Onge
8.75% MA1
5.05% Anatolia_Neolithic

39.40% Atayal
39.35% Ust_Ishim
12.15% Onge
5.10% MA1
4% Caucasus_HG

West Asia

25.55% Caucasus_HG
25.30% BedouinB
24.55% Anatolia_Neolithic
7.20% Ust_Ishim
6.75% Karelia_HG
5.40% Atayal
3.80% MA1
1.45% Onge

38.10% Anatolia_Neolithic
20.45% Caucasus_HG
19.10% BedouinB
8.30% Atayal
7.85% Karelia_HG
5.10% MA1
1% Onge
0.05% Western_HG


32.20% Caucasus_HG
29.10% Anatolia_Neolithic
19.55% Karelia_HG
11.30% BedouinB
5.65% Atayal
1.60% MA1
0.60% Onge

40.25% Caucasus_HG
35.90% Anatolia_Neolithic
12.15% BedouinB
7.40% Karelia_HG
3.50% Atayal
0.80% Onge


38.25% Anatolia_Neolithic
27.75% Karelia_HG
18.80% Western_HG
11.85% Caucasus_HG
3.20% Atayal
0.15% Onge

34.60% Anatolia_Neolithic
32.30% Karelia_HG
13.95% Western_HG
10.65% Caucasus_HG
8.25% Atayal
0.25% Onge

English (Kent)
43.35% Anatolia_Neolithic
22.60% Karelia_HG
13.65% Western_HG
12.15% Caucasus_HG
6.10% BedouinB
1.65% Atayal
0.50% Onge

69.55% Anatolia_Neolithic
12.35% Western_HG
8.20% BedouinB
5% Caucasus_HG
2.95% Atayal
1.95% Karelia_HG

Observations: It seems populations in South Central Asia have some very serious amounts of North Eurasian ancestry (combined MA1 + Karelia_HG score). In this respect, they are comparable to Northern Europeans, with the Shugnan population matching Finnish people in terms of North Eurasian admixture.
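The combined score is just the sum of the two North Eurasian references in each fit. A minimal Python sketch, using only the two fits above that still carry a population label; a reference that does not appear in a fit is assumed to be 0, and the helper name is made up for the example.

```python
# MA1 and Karelia_HG percentages copied from the labelled fits above.
# A reference missing from a fit is treated as 0 (an assumption).
fits = {
    "Tajik (Shugnan)": {"MA1": 22.10, "Karelia_HG": 9.85},
    "English (Kent)": {"Karelia_HG": 22.60},
}


def north_eurasian_score(fit):
    """Combined MA1 + Karelia_HG percentage, rounded to two decimals."""
    return round(fit.get("MA1", 0.0) + fit.get("Karelia_HG", 0.0), 2)


scores = {pop: north_eurasian_score(fit) for pop, fit in fits.items()}
for pop, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{pop}: {score:.2f}")
```

This prints 31.95 for Tajik (Shugnan) and 22.60 for English (Kent).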

Yet, this isn't the case within South Asia itself. Dravidian_India has only moderate levels, and the Kharia tribal people have even less. So, it has to be "intrusive" to the region. Ultimately, it must come from the steppe. Since these things are so finely balanced, the algorithm latches more onto MA1, but I'm sure this is the same kind of North Eurasian ancestry we see in Sintashta/Andronovo.

Also, Ust_Ishim is clearly acting as a proxy for an unsampled ancient Eurasian meta-population, one which has an important relationship with South Asia. It can't be ENA, since we have both the Onge and the Atayal here.

And as we've all noticed before, there seems to be low-to-modest levels of ENA admixture everywhere in West Eurasia.

I've sucked out the non-Dravidian-like side of Kalash and modeled it in several ways. The possible ancestors I used are: Steppe, CHG, EHG, Anatolia_Neolithic, Caucasus, Near East, Iran, Turkish. It fits as EHG+CHG+Anatolia_Neolithic+(sometimes)Caucasus, just as well as it fits as Andronovo+CHG+(sometimes)Caucasus.

When I take out Steppe, Anatolia_Neolithic, and EHG, but keep MA1 and add WHG, it doesn't score any MA1 but scores over 20% WHG. There's no doubt the non-Dravidian side of South Asians has WHG. Could WHG have ever lived in S/C Asia? Probably not. The WHG makes it likely their EHG signal is not local and is from Europe.

There's more CHG in the non-Dravidian side of Kalash than in anyone in the Caucasus today. So, IMO, Indus Civilization genomes will mostly be Dravidiantype+CHG.

Krefter, how do you "suck out" part of the ancestry?

As annex to the previous post, here some background reading:

1. Good general overview

2. BMAC area Andronovo tin mining

3. BMAC contacts w. Mesopotamia and EastMed

4. Early Chinese Bronze

Also interesting may be the following research on the Neolithic-Eneolithic transition (4-5 mBC) in (proto-)BMAC. Plant production is dominated by wheat and barley, i.e. typical Near Eastern Neolithic crops. Idols, however, show Zebu-type, i.e. South Asian, cattle. A significant South Asian connection is also suggested by frequent ivory finds.

In line with the thread opener, here a nice example from another Witzel paper that exemplifies the spread of words & technologies from SA to the East Med:
- Santali, Mundari (AuA) i-til "grease, fat", Sant. til-min "oilseed"
- Ved. Sanskrit: tila "Sesame", taila "Sesame oil" (loan from Proto-Munda)
- S.Drav. el(lu) "Sesamum Indicum"
- Sumerian ili "sesame"
- Akkad ellu/ulu "Sesame oil"
- OGr. elai-wa "Olive" (borrowing from a Pre-Greek Med. source)->oil

Also interesting Proto-Austroasiatic *ka vs. Elam./Sum./Hith. "Ka", all meaning "fish".

Finally some speculation on the BMAC/IVC collapses, which seem to be contemporary, and were possibly caused by the same factors. Climate and epidemics have been discussed before, but the "trade collapse" point deserves some deepening:

1. Around 1700 BC, Cyprus emerged as new and dominant copper supplier to the East Med, to the detriment of Oman that had previously dominated. This cut massively into Elam's trade, and surely also affected Mesopotamian traders further down the line.

2. During the early 2nd mBC, the EastMed diversified its tin sources. The isotopic signature of the Uluburun shipwreck tin ingots points towards the Bolkardağ (Taurus) mountains. Minor sources on Sardinia and the Tuscan coast, and the not so minor NW Iberian and Erzgebirge sources, need to be given consideration.
A look at the R1a-Z93 maps in Underhill 2014, which in general correlate well with tin mining locations, suggests that M780 may extend beyond India into SEA. This puts the coastal "tin belt" from Mandalay via Phuket and Malaysia into Sumatra, and the large tin mines in Yunnan, into focus. Theoretically, tin ore or ingots could have relatively easily been shipped from the coast of the Andaman Sea to the Persian Gulf. Sufficient nautical knowledge was available, as demonstrated by the early Lapita (Austronesian) expansion into Micronesia.
Muhiston (W. Taj.) shows an exploitation break between 1900 and 1600 BC that might eventually relate to increasing competition. The uptake of CA tin production after 1600 BC, during the Andronovo phase, however, demonstrates that a generally rising Eurasian tin demand also put CA back into business.

3. The Thera eruption/tsunami around 1600 BC had major effects on the EastMed. Subsequent turmoil in Egypt, and the Mycenaean takeover of Crete, are probably only the tip of the iceberg. Devastation of coastal areas, including destruction of many trade ships, should have been felt far into the Levant and Mesopotamia, and may have provided the final punch to an already crumbling EBA trade network that spanned from the Hindukush to the Balkans.

Perhaps you need to update or scrutinize your general comprehension of the term "hunter-gatherers"?

The boat-building and seafaring "hunter-gatherer" Caucasians that emerged out of the Younger Dryas were obviously amongst the pioneers of technology, architecture, agriculture, art and academia:

To explain the emergence of these traditions we obviously have to go back to the ancestors of the Mesolithic Europeans:

Göbekli Tepe was in what was the opposite of an arctic refugia. You really shouldn't bring the immediate pre-farming stage of Fertile Crescent up to demonstrate pre-Neolithic arctic hunter-gatherers' possible exploits. We also do not have a reason to believe Neolithic pre-IE "Old Europeans" (a migration of farmers into Europe) of Gimbutas represented a population continuity with the "Really Old Europeans" such as the makers of the paleolithic Haute Garonne figurine.

Yes, those numbers look good. I'll try this latest Dstats to see what difference they make from previous ones.

One thing I don't understand, though, is why you deem MA1 as intrusive to S-C Asia, where it peaks. Certainly not intrusive from Europe, where it's absent (though ANE is present through EHG and CHG, both late arrivals to Europe). ANE in absolute terms also peaks in the Hindu Kush in Eurasia.

One interesting thing about the possible origins of ANE:

We have stats with EHG showing this:

Loschbour EHG Kinh Chimp -0.0146 -2.722 341763
Loschbour EHG Atayal Chimp -0.014 -2.551 341763
Loschbour EHG Dai Chimp -0.0137 -2.506 341763

However, when placing MA1 in the position of EHG, the stats become insignificant. This has been interpreted as ANE and WHG being equally related to ENA, while EHG has some amount of ENA admixture. I have argued before that the stats with MA1 in direct comparisons are not accurate because MA1 shows reduced affinity to *all* Eurasians compared to EHG. So I was waiting for a better ANE reference to test this. However, with this other method, where the total numbers don't matter so much, I can work around that limitation. So for example:

"Eastern_HG" 100
"Ami" 0
"Onge" 0
"Dai" 0

As expected, since WHG has less affinity to ENA than EHG. And if I do the opposite:

"Loschbour_WHG" 64.85
"Onge" 21.75
"Ami" 13.25
"Dai" 0.15

Also as expected, since EHG has higher ENA affinity as per the Dstats above. But now with MA1:

"Loschbour_WHG" 41.65
"Onge" 38.15
"Ami" 20.2
"Dai" 0

The ENA is almost double that of EHG, and EHG appears, as expected, intermediate between WHG and ANE.

This means that ANE being equally related to ENA as WHG is not correct, and that ANE is therefore not a pure sister clade of WHG. It most likely has ENA ancestry, more related to ASI than to East Asian.

Now I wanted to test if WHG was more basal, as sometimes suggested, so I included Anatolia Neolithic:

"Eastern_HG" 100
"Ami" 0
"Onge" 0
"Anatolia_Neolithic" 0

So it doesn't really seem that WHG has any Basal Eurasian. The surprise comes when doing the opposite:

"Loschbour_WHG" 59.25
"Onge" 19.45
"Ami" 11.6
"Anatolia_Neolithic" 9.7

Interesting. Might EHG have traces of Basal Eurasian? If Anatolia Neolithic is about 40% BE, that would be some 4% BE in EHG. And it's normal that a small amount goes unnoticed when EHG also has some ENA that might have been hiding it. Now MA1:

"Onge" 34.35
"Loschbour_WHG" 32.8
"Ami" 17.5
"Anatolia_Neolithic" 15.35

Again, EHG appearing intermediate between ANE and WHG, as expected, and MA1 showing a theoretical 6% Basal Eurasian, again possibly hidden by the ENA.
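
The rough Basal Eurasian figures quoted above can be reproduced with a couple of multiplications. This is only a sketch of that arithmetic: it takes the comment's own assumption that Anatolia_Neolithic is about 40% Basal Eurasian, and the function name is made up for illustration.

```python
# Assumption taken from the comment: Anatolia_Neolithic is ~40% Basal Eurasian (BE).
BE_IN_ANATOLIA_NEOLITHIC = 0.40

def implied_basal_eurasian(anatolia_pct: float) -> float:
    """BE percentage implied by an Anatolia_Neolithic share in a fit."""
    return anatolia_pct * BE_IN_ANATOLIA_NEOLITHIC

# Anatolia_Neolithic shares reported in the two fits above.
ehg_be = implied_basal_eurasian(9.70)    # fit for EHG
ma1_be = implied_basal_eurasian(15.35)   # fit for MA1

print(f"EHG: {ehg_be:.2f}% BE, MA1: {ma1_be:.2f}% BE")
```

The two values round to the "some 4%" and "theoretical 6%" mentioned in the text.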

Of course this is no definitive proof, but it's an interesting hint. And putting things together:

- ANE seems to have a decent amount of ENA, more related to ASI than to East Asian (possibly related to the common ancestor of Onge and Dravidian)
- ANE could have a small amount of Basal Eurasian
- ANE peaks in the Hindu Kush among Eurasians (though it's even a bit higher in Native Americans)
- MA1 and EHG belong to haplogroup R

If I had to guess where is the origin of ANE, I'd say that it shouldn't be far away from the Hindu Kush.

For whatever it's worth, I'd be much more inclined to view EHG/ANE as local to South Central Asia, if actual South Asian populations also had huge amounts of it.

But based on the models I've tried, Indian tribals like the Kharia have much less EHG/ANE than West Asians/Caucasians, and Dravidian_India is at the lower end of the West Asian/Southern European range.

It's really among Hindu Kush/Pamir populations that this sort of ancestry becomes very noticeable. These populations equal, and sometimes even exceed, the most northern-shifted Europeans when it comes to this stream of ancestry.

Also, we have to remember that MA1 and EHG almost constitute a clade together, and EHG doesn't seem to have WHG admixture. Rather, it has much more WHG affinity in comparison to MA1, probably because it shares the same temporal space as WHG. Basically, I think MA1 is too old to be compared to WHG. Rather, a more reasonable comparison might be MA1 versus K14.

Honestly, I think we can't really ask where streams of genetic ancestry like "EHG/ANE" originate, since that might involve a category error of sorts. Anyway, human populations have always been confusingly entangled/admixed.

At the end of the day, we can only look at actual aDNA samples, and construe those samples as proximate vectors for different populational affinities. And right now, the only populations who seem to have spread this sort of ancestry are steppe populations with roots in Europe, or ancient indigenous populations on the steppe with affinities to Siberia/America (Okunevo, etc).

It's possible that aDNA from South Asia could change that, but I doubt this at the moment.

How do you then explain the presence of R2, Q and other rare North Eurasian lineages in South Asia and South Central Asia? They are unlikely to have arrived with or after Indo-Europeans in this region. Some of Q arrived later for sure, but some of the clades must be older in my opinion. Anyway, I am more inclined to see much of the ANE in South Central Asia as pre-Aryan, and the Indo-Iranian steppe DNA as between 20% and 40% among South Central Asians. This is still significant and more logical from a historical and anthropological point of view. I am open to other possibilities, but South Central Asia is not really that far away from Siberia, and zero ANE among Pre-Aryans would be a big surprise for me. Farmers replaced most Hunter-Gatherers but in most cases had some Hunter-Gatherer ancestry in the end. Pre-Aryans of South Central Asia were in my opinion mainly CHG with 20-30% ASI but some ANE. Probably not that different from Burusho, excluding their higher ENA ancestry.

batman said...

The sudden appearance of the proto-mesolithic monuments of Gobekli Tepe, right after the YD needs a paleolithic explanation.

Where do you find that paleolithic background, if not in the so-called "kulturpumpe-model" - from which we may explain Gobekli Tepe as well as the sea-borne proto-minoans and the pre-Vinca?

I agree with your priorities. But I don't understand your argument about nMonte being a Monte Carlo simulation.
Ultimately nMonte is just another method to minimize the distance, just like 4Mix.
The critical question should be: is minimizing the euclidean distance a proper method to estimate the components of a mixture?
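
To make "minimizing the Euclidean distance" concrete, here is a minimal sketch of the idea. It is not the nMonte or 4Mix code: the random pairwise-transfer search, the function names, the step size and the toy vectors are all assumptions chosen for illustration. The target is built as a known 70/30 mix of two made-up sources, so the recovered weights can be compared with the truth.

```python
import math
import random

def distance(target, sources, weights):
    """Euclidean distance between the target and the weighted mix of sources."""
    mix = [sum(w * s[k] for w, s in zip(weights, sources)) for k in range(len(target))]
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))

def fit_mixture(target, sources, steps=20000, step_size=0.01, seed=0):
    """Random search over non-negative weights that always sum to 1."""
    rng = random.Random(seed)
    n = len(sources)
    weights = [1.0 / n] * n
    best = distance(target, sources, weights)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)       # move some weight from source i to source j
        delta = min(step_size, weights[i])   # never push a weight below zero
        if delta == 0.0:
            continue
        trial = weights[:]
        trial[i] -= delta
        trial[j] += delta
        d = distance(target, sources, trial)
        if d < best:                         # keep the move only if the fit improves
            weights, best = trial, d
    return weights, best

# Hypothetical toy data, not real D-stats: three made-up source vectors and a
# target constructed as a known 70/30 mix of the first two.
sources = [[0.10, -0.20, 0.05], [-0.30, 0.10, 0.20], [0.25, 0.30, -0.15]]
target = [0.7 * a + 0.3 * b for a, b in zip(sources[0], sources[1])]

weights, dist = fit_mixture(target, sources)
print([round(w, 3) for w in weights], round(dist, 5))
```

On a clean synthetic target like this the search should land close to the true proportions. Whether the same holds for real, noisy data with imperfect reference populations is exactly the open question raised above.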

"We also do not have a reason to believe Neolithic pre-IE "Old Europeans" (a migration of farmers into Europe) of Gimbutas represented a population continuity with the "Really Old Europeans" such as the makers of the paleolithic Haute Garonne figurine."

Except for the results of the genetic analysis of K14 and MA1, of course - which prove that the genetic basis of the European genome, diversifying during the Mesolithic and the Neolithic, came out of the European Paleolithic.

Foragers, herders and farmers included.

Presuming that Willerslev's team is right, of course - and that my comprehension of his Danish tongue is correct.

To say that you have an extremely poor understanding of the ancient European DNA published to date would be a severe understatement.

You really need to pull your head out of your ass before you continue your commentary here.

"Krefter, how do you "suck out" part of the ancestry"

1=Assumed Ancestor's D-stat score.
2=Test population's D-stat score
3=Ghost Ancestor(part of ancestry that isn't from assumed ancestor)'s D-stat score.

X=assumed percentage of ancestry from 1(assumed ancestor)
Y=assumed percentage of ancestry from 3(ghost ancestor).


Here's an example with numbers filled in. It's for Georgians. I assume they're 40% CHG, and I'm trying to find what their 60% non-CHG side scores in D(Chimp, Test)(Mbuti, EEF).


You can run this a million different times depending on how much ancestry you think the test population has from the ghost or assumed ancestor. If there are good outgroups, only a few assumed ancestry percentages will give realistic D-stat results for 3 (ghost ancestor).

For example, when Andronovo is modeled as coming 50%+ from the assumed EEF ancestor, the D(Chimp, non-EEF)(Mbuti, EHG) score is off the charts (like 0.6). We know that isn't realistic, so we can ignore those results. The only results for the ghost ancestor that make sense are when Andronovo is modeled as 20% from the assumed EEF ancestor, and the 80% non-EEF side comes out exactly like Yamnaya.
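
The formula itself did not survive in the comment above, so the following is a guess at what is being described rather than Krefter's actual calculation. It assumes the test population's D-stat is a simple weighted average, test = X * assumed + Y * ghost with X + Y = 1, and solves for the ghost ancestor's score. The function name and the input scores are hypothetical.

```python
def ghost_dstat(test_d: float, assumed_d: float, assumed_share: float) -> float:
    """Solve test_d = X * assumed_d + Y * ghost_d for ghost_d, with Y = 1 - X.

    Assumes linear mixing of D-stat scores, which is an approximation.
    """
    if not 0.0 <= assumed_share < 1.0:
        raise ValueError("assumed_share must be in [0, 1)")
    ghost_share = 1.0 - assumed_share
    return (test_d - assumed_share * assumed_d) / ghost_share

# Hypothetical D-stat scores, NOT taken from any real run.
test_d = 0.0300      # 2 = test population's score
assumed_d = 0.0200   # 1 = assumed ancestor's score

# Try several assumed ancestry shares and see which ghost scores look realistic.
for share in (0.2, 0.4, 0.6):
    print(f"X={share:.1f}  ghost D = {ghost_dstat(test_d, assumed_d, share):.4f}")
```

Because the result is divided by Y, small ghost shares blow up any mismatch between the test and assumed scores, which is one way implausibly large ghost scores can arise when the assumed share is set too high.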

Apart from misquotations, outdated conceptions, blunt censorship and macho-styled bashing - isn't there any intelligent way for you to discuss the ramifications of Willerslev's work, conclusions and comments?

Forget what you think Willerslev said. Stick to the ancient DNA data we have. And make sure you understand its implications. Please believe me when I tell you that right now you don't.

@ Batman

"Given that this is the only documented refugia of the Caucasian lineages of the Human Genome - it gives a certain ramification to the further analysis of the implied haplotypes and groups."

Can you clarify? What is the only documented refugium of Caucasian genomes? Which genomes? Which "Caucasians", where, and from which period?

I also recommend reading Asko Parpola's works (Roots of Hinduism,..). It would be interesting to know how genetically similar BMAC and IVC were to each other, because I personally see a lot of similarities. The mythological conflict between Rigvedic Aryans and Dasas is in my opinion describing real historical events, but I don't think it was just a conflict between Aryans and "black" Pre-Aryans. I agree with Asko Parpola that this conflict is rather about an inner Indo-Iranian rivalry between Vedic Aryans and other BMACized Indo-Iranians who adopted much of the local religions. The Dasa had both Aryan and non-Aryan names in the Rig-Veda, and the Rigvedic description of their forts matches BMAC architecture quite well. Rigvedic Aryans were probably more conservative, and that is visible in their preservation of the Deva cult, which got demonized by Iranics.

"Important clues to an archaeological understanding of the Rgvedic invasion are provided by the references to the enemies of the Rgvedic Aryans. Indra and his protégés, namely the earliest Rgvedic kings, are said to have destroyed the strongholds of these enemies. When Sir Mortimer Wheeler unearthed the huge defensive walls of Harappa in 1946, he identified the Dasa forts as the fortified cities of the Indus Civilization (Wheeler 1947: 78-82). This hypothesis was widely accepted until 1976, when Rau published his study of relevant Vedic passages which showed that, unlike the rectangular layout of the Indus cities, the Dasa forts had circular, and often multiple concentric, walls. Moreover, the Dasa forts were not regularly inhabited cities but functioned as temporary shelters, particularly for the protection of cattle. I have argued that the Dasas, Dasyus, and Panis were actually Indo-Iranian speaking BMAC tribes, and that the battles against them described in the Rgveda took place in and around northern Bactria, before entrance to Gandhara on the eastern side of the Hindukush (Parpola 1988: 208-218)."

I just happen to know e-x-a-c-t-l-y what Willerslev said about K14 (link above). Which apparently differs from what you think he said.

Considering the ancient data so far revealed, I don't see any professional results or comments that contradict a paleolithic origin of the extant European and Eurasian genome.

Which is the only implication I have addressed, actually - based on genetic data only.

My aim has been to connect the updates from the genetic realm to the updates and historical timelines given by the other historical professions, as most of them seem to be a bit beside the major expertise of this blog.

Supplementing new data from the archaeological and related professions, my main point is that we now have a clear-cut archaeological timeline connecting the paleolithic boat-culture of Jersey, Hamburg, Bromme and Pertuna (12.700-15.000 bp) directly to the early mesolithic culture of Orkney, Ahrensburg, Lyngby and Swiderian (12.100 - 11.500 bp) via the Scanian refugium at 12.500 bp.

Add the ramification of K14 and you have a pinpointed chronology to explain the bottleneck described by Pinhasi and others - in both time and space.

Given that this is the only documented refugia of the Caucasian lineages of the Human Genome - it gives a certain ramification to the further analysis of the haplogroups related to the caucasian (aryan) populations.

Moreover it settles one question about the possible origins - of the modern sequences - of the haplotypes connected to these caucasian and semi-caucasian ethnicities.

Simultaneously, a known north-western refugium sheds new light on the old question about a possible continuity from the paleo-Eurasian language to the present language families of Eurasia. In that respect one may view Eske Willerslev's remark about the old Scandinavians, Finns and Russians as an interesting implication. The same may be said about the Orcadian and Scandinavian proximity to Samara and Yamna...

How these historical events, documented by old and new data alike, may show up in the genetic material is obviously a matter of interest to the readers of this and other blogs discussing anthropological subjects.

Since the establishment of said chronology this question can't be debunked or overlooked - by sheer ignorance or blunt arrogance.

However your stats are compiled, they have to fit with the historical data from a number of other professions to reach any kind of scientific value or validation.

The refugium I just described is the one that survived the entire decimation period that hit all of Eurasia and N-America, from 25.000 to 12.000 ybp, during which N-America, Europe and northern Asia became arctic desert, taiga and tundra. Consequently the larger land animals died out.

Especially critical were the last two chills, called the Older and Younger Dryas, when the very last mammoths died out in Britain, Denmark and Estonia - where they had roughed it until 13.000 ybp. The Younger Dryas is today set to the period 12.900 - 11.800.

The human settlement that made it through this last and most critical period, in the proximity of Scania, were obviously descendants of a family line related to K14 and MA1.

English (Kent)
12.15% Caucasus_HG
5.00% Caucasus_HG
10.65% Caucasus_HG

Two observations: there's too much CHG in Finland if they never spoke IE. And yes, Sardinia should be used as a proxy for late Neolithic farmers in western Europe.

@Ariele Iacopo Maggi

Anything from Volga-Kama/Volga-Oka region between 4000-5000 bp (the presumed Proto-Uralics) will probably come with significant CHG.

looking at the nMonte code was going to be something this weekend - just shows one should always start at the end and not the beginning :)

"Ultimately nMonte is just another method to minimize the distance, just like 4Mix."

Completely agree.

"The critical question should be: is minimizing the euclidean distance a proper method to estimate the components of a mixture?"

What do we have in the way of known mixtures we can test this on?

@ tchaz
Sorry tchaz, in the coming weeks I want to do some practical work with nMonte.
If you are interested in the theoretical question, why don't you ask a mathematician? Maybe the answer is known already.

Matt said...

Seinundzeit: I think our fits need to involve only "basal" populations, ones that aren't mixtures of each other (at least in a shallow sense). Anyway, I've always wanted to try some models along those lines, and this input data finally allows me to do so.

This is a good point. On the converse in favour of fitting using Kharia:

- Kharia is actually from South Asia, even if it isn't likely to be exactly what ASI is. For MA1 we don't have a strong reason to think it was ever there.

- When we look at South Asia in ADMIXTURE or PCA, we see a single cline there, more or less from South-Central Asia to South India (with slight varying levels of ADMIXTURE components associated to early Yamnaya, Southwest Asia, East Asia).

So fits where you have a more or less varying mixture between a Caucasus+Yamnaya-like base (with quite slight variations) and a single population (Kharia) should be preferable to ones where you have a different ancient population fading in and out (Ust-Ishim and MA1).

- Also Kharia is both a row and column; MA-1 and Ust-Ishim are only rows (by necessity since they are both single samples), so I'd think it's hard for any fitting method (4mix or nMonte) to distinguish between them and something with quite similar relationships to columns, but not actually the same thing.

- Finally Kharia actually gives better fits as well, as shown by what happens when you allow it to be a donating population, and the nMonte script gives it in preference to MA-1 and Ust Ishim (although there is *some* MA-1 still).

I'm fairly confident that when we do (if we ever do) have adna that's a good proxy for pre-Neolithic South Asia (and Onge is not a great proxy), then we'll find that none of the South and South-Central Asian groups have any extra MA-1 clade ancestry that isn't explained by what is either in their recent West Eurasian (Sintashta, CHG) ancestors or in a more or less single ASI population. It'll be a lot more like the fits with Kharia (except with a "proper" ASI instead of Kharia), and not much like the fits with only Ust-Ishim and MA-1.

Ariele, I really would not like to overemphasize the role of uniparental markers in the autosomal makeup of populations, but nevertheless I remind you that the Mesolithic Oleni Ostrov site in Karelia, where Karelian HG comes from, yielded yDNA J and mtDNA H2a2b, and the Bolshoy Oleni Ostrov site in the Kola Peninsula yielded T, all of which could easily be connected with CHG-rich populations. IMO, those people definitely did not speak any IE or Uralic languages.

I don't think using Kharia is a good fit, they are more closely related with Austro Asiatic Rice Farmers who spread from the North East of the subcontinent. I would think Paniya or Veddah are better fits as they are more indigenous.

I don't think there is nearly enough ANE in CHG/Baloch-like groups or Sintashta to account for the whopping amounts of ANE you see in some of the NW South Asian groups. Last time I checked, Sintashta did not have as much ANE as most South Asian groups.

I agree with Alberto's conclusions.

While I like nMonte, there are a few things that concern me. For instance, Dstats cannot tell us which direction the gene flow is, so there can be some mistakes depending on the admixing populations you pick. Also, certain populations and the admixing populations you pick vary in their relationship to other populations, which will affect how it selects admixing populations. Here are some examples.

Here, we see that Dai are closer to Onge than the Atayal are.
Chimp Onge Dai Atayal -0.0029 -1.157

Paniyas are a little closer to Dai than the Onge
Chimp Paniya Onge Dai 0.0044 1.229
Chimp Paniya Onge Atayal -0.0013 -0.335

All West Asians are closer to Dai than Onge, some quite significantly
Chimp BedouinB Onge Dai 0.0057 1.996
Chimp BedouinB Onge Atayal 0.0053 1.751
Chimp Anatolia_Neolithic Onge Dai 0.0058 2.058
Chimp Anatolia_Neolithic Onge Atayal 0.0058 1.969
Chimp CHG Onge Dai 0.0070 2.026
Chimp CHG Onge Atayal 0.0068 1.908
Chimp Armenian Onge Dai 0.0083 2.992
Chimp Armenian Onge Atayal 0.0066 2.242
Chimp Georgian Onge Dai 0.0091 3.260
Chimp Georgian Onge Atayal 0.0070 2.361
Chimp Iraqi_Jew Onge Dai 0.0091 3.190
Chimp Iraqi_Jew Onge Atayal 0.0076 2.559

This is where it gets interesting. Now, going from the West Asians to Paniya, Onge are very much favored as the admixing population, some are quite significant, and all would be if the SNP count were 500k, instead of 120k.

CHG Paniya Onge Dai -0.0056 -1.725
CHG Paniya Onge Atayal -0.0111 -3.207
Georgian Paniya Onge Dai -0.0064 -2.668
Georgian Paniya Onge Atayal -0.0095 -3.950
Armenian Paniya Onge Dai -0.0061 -2.510
Armenian Paniya Onge Atayal -0.0100 -4.059
Iraqi_Jew Paniya Onge Dai -0.0074 -2.809
Iraqi_Jew Paniya Onge Atayal -0.0119 -4.653
BedouinB Paniya Onge Dai -0.0035 -1.358
BedouinB Paniya Onge Atayal -0.0087 -3.405
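
As a rough check on the 500k versus 120k remark, the Z-scores above can be rescaled under a simple assumption that is mine, not the commenter's: the D value stays the same and its standard error shrinks as one over the square root of the SNP count. Real block-jackknife errors will not scale this cleanly, so treat the output as an order-of-magnitude guide. The function name is hypothetical.

```python
import math

def projected_z(z: float, snps_now: int, snps_target: int) -> float:
    """Rescale a Z-score, assuming the standard error shrinks as 1/sqrt(SNP count)
    while the D value itself stays the same (a rough approximation)."""
    return z * math.sqrt(snps_target / snps_now)

# (first population, last population, Z) from the "X Paniya Onge Y" rows above.
rows = [
    ("CHG", "Dai", -1.725), ("CHG", "Atayal", -3.207),
    ("Georgian", "Dai", -2.668), ("Georgian", "Atayal", -3.950),
    ("Armenian", "Dai", -2.510), ("Armenian", "Atayal", -4.059),
    ("Iraqi_Jew", "Dai", -2.809), ("Iraqi_Jew", "Atayal", -4.653),
    ("BedouinB", "Dai", -1.358), ("BedouinB", "Atayal", -3.405),
]

for pop, ref, z in rows:
    scaled = projected_z(z, 120_000, 500_000)
    print(f"{pop} Paniya Onge {ref}: Z {z:.3f} -> {scaled:.3f}")
```

Under this assumption every Z-score roughly doubles. Whether that makes all of them significant depends on the cutoff one uses.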

So, again, I would exercise some caution here. Especially, those throwing out the Onge simply on Dstats across a whole genome, rather than looking at admixing sources.

To predict ANE ancestry in moderns you need to know who their non-ANE ancestors are (correct me if I'm wrong, David). For example, WHG is closer to MA1 than pretty much all modern Eurasians are, even though WHG had no MA1. The non-ANE side of different Eurasians is differently related to MA1.

So, we shouldn't take older ADMIXTURE ANE scores as the ultimate truth about ANE. The only people we know for a fact have ANE, are Amerindians, Siberians, North Caucasus, Europeans, and S/C Asians. EHG was directly related to MA1, and it's quite clear Siberians, North Caucasus, Europeans, and S/C Asians have EHG ancestry.

I'm hesitant to say S/C Asians have more ANE than Sintashta.

A southwards spread of Okunev material culture is the only migration out of Bronze Age North-Central Asia that can be attested with reasonable certainty. Compared to this, the spread of Andronov/Srubna seems to have been a comparatively minor phenomenon. What appears to be most likely is that the genetic components associated with the Ponto-Caspian grasslands are something that got caught up in a pre-existing network spanning from the Minusinsk basin all the way down the Karakorum highway and - at a later point in time - west at least up to Tepe Hissar.

This doesn't preclude the possibility that something ANE-like existed in South Asia before, of course. If Bronze Age Armenians are any indication, there has been a gradual decline in much of Asia of this particular component caused by numerous migrations from various directions. For example, both the Ponto-Caspian component and the Dai component we see in northern Afghanistan & Pakistan are obviously inflated since this region was the centre of Indo-Scythian rule and subject to multiple incursions of easterners. For Armenians, it would have been a very similar dynamic (gradual genetic contribution from the north), except we have to substitute East Asian influence with southern influence resulting from Afro-Asiatic expansions.

Now all we need is ancient DNA from ancient South-Central Asia to be able to disentangle those migrations. It is obvious however that out of all those only Okunev left a lasting legacy of material culture, which after ample time of peaceful coexistence gradually merged with local traditions.

Look at that Srubnaya spot in Northeast Iran, located almost exactly where the Yaz culture is. What did I say? Sintashta is not Proto-Iranic. The Yaz culture is a much better fit archaeologically, and, at least for West Iranic groups, so is the Kura-Araxes culture, which had obvious heavy cultural connections to Srubnaya and Yamna, and whose collapse fits perfectly with the appearance of Mitanni/Medes and Persians.

Srubnaya, on the other hand, seems to be ancestral to an Iranic branch that isn't classified under East or West, namely Cimmerian, which is pretty old. Also take account of BMAC there.

It is good to see there are many knowledgeable and objective people out there.
Here's to the hope for more aDNA to solidify the proper details of Bronze Age Eurasia.

that is Yaz culture that is.

Sintashta/Andronovo/BMAC seems more likely to be ancestral to Indo-Aryans.

Based on the European results we're seeing, minimizing Euclidean distance does seem to be a proper method for estimating admixture proportions. Basically, your nMonte method is quite solid, as the European results match what we have come to expect from the literature, while the West Asian/Central Asian/South Asian results are the best that we can currently produce (with the aDNA that we have).


I think it's better not to use the Kharia, if our goal is to use "basic" putative mixing populations. Mainly, because they are a contemporary population, one which has been subject to all the genetic vagaries of their geographic/historical situation. For example, we know that the Kharia have had a substantial pulse of recent East Asian ancestry (EDAR frequencies, ADMIXTURE components, etc). Also, they still display minor levels of EHG and CHG in most models.

In an ideal world, we'd have a Mesolithic Indian sample/population in these models, which I hope isn't too far off in the future. In that case, it'll be very exciting to find out what ASI really was, and how far it extends across Eurasia.

But looking at the data right now, it is quite clear that ASI wasn't really ENA, or at the very least was quite far from being predominately ENA. The best model for the Kharia has them as a mix between Atayal + Ust-Ishim + ANE/EHG + CHG + Onge. Interestingly, the Ust-Ishim signal predominates in Dravidian_India, and it also appears in every population which displays South Asian admixture/affinity in other analyses (ADMIXTURE, PCA, etc).

Atayal + Onge + EHG/ANE + CHG is much worse than what they get with Ust-Ishim, but even their best model performs pretty badly! The fact that we can't model them with current references, coupled with the predominance of a true "Basal Eurasian" (Ust-Ishim) in their admixture percentages, is a good indication that we are dealing with a third Eurasian lineage (quite distinct from both ENA and K14/WHG/EHG/MA1). Then again, perhaps a lineage that is distinct from, yet closer to, West Eurasia. In the latter scenario, the West Eurasian admixture percentages we see in South Indian tribal populations could simply reflect a greater affinity between them and West Eurasia (in comparison to them and ENA), rather than actual admixture from West Eurasia.

Regardless, if we don't aim for models involving "basal" populations, using the Kharia is perfectly fine. Here are some models for South Central Asia, they look quite sensible:

39.35% Andronovo
30.05% Kharia
17.90% Caucasus_HG
11.35% BedouinB
1.30% Anatolia_Neolithic
0.05% Atayal

41.45% Andronovo
25.65% Kharia
23.15% Caucasus_HG
8.50% BedouinB
1.25% Atayal

Tajik (Shugnan)
60.25% Andronovo
14.35% Kharia
13.50% Caucasus_HG
6.10% BedouinB
3.60% Atayal
2.20% Anatolia_Neolithic

This brings us back to having South Central Asians as being predominately LN/EBA European, with very heavy admixture from ASI (whatever that turns out to really be), and West Asia. Honestly, I think this represents the truth of how these populations stack up in terms of ancient genetic ancestry. Having Pamiri peoples at around 60% LN/EBA European, and Pashtuns/Kalash at 40% LN/EBA European, seems entirely reasonable. Also, in terms of stats, these fits are pretty good. But, that is another conversation.

Sintashta is surely not Proto-Indo-Aryan. They belonged to R1a-Z2124, which makes them more likely ancestral to Iranics. Andronovo was mainly Iranic, I guess, and only in the south did tribes ancestral to Indo-Aryans exist. The Tazabagyab culture around the Aral Sea could be linked to Indo-Aryans.

We don't really know who was what, but generally speaking, Indo-Aryans are supposed to have come from Andronovo-Potapovka-Sintashta, while at least some Iranians, in particular Cimmerians, from Srubnaya. And the whole lot is supposed to derive from Catacomb.

In fact, I remember reading that Tazabagyab was in part of Andronovo origin, and formed when Andronovo pastoralists started moving south from the steppe on a seasonal basis, during summer I think.

Try this...

I'm not very confident in Dstats involving admixed populations (unless used in a specific way). While I'm not sure about the exact reason (I was discussing some hypothesis with Tobus that could explain it, but who knows), the thing is that while the results are quite probably technically correct, they are not telling us what we want to know from a population genetics point of view.

I think that you can see the effect I'm referring to quite clearly with the stats of populations like Jordanian or Palestinian. For example, Jordanians get the highest figure with Hungary_CA and Anatolia Neolithic. The first modern Near Eastern population that appears is Iraqi_Jew, position 50, just below Estonian. Saudi appears on 89, below Saami, BedouinB and Palestinian in 99 and 101, with Kalash in between. As technically true as this might be, it's a different truth from the one we want to know.

I haven't thought hard enough yet to know for sure what kind of effect this might have when using them as columns. In general it's clear that we want to have as many columns as possible, but maybe only as long as they are good references. With Dstats, good references would mean populations that are not admixed and in which drift is the main factor. I'm aware that this is a bit utopian, but at least we can try to get as close to it as possible (with what we know and with what we have available). Though I'm not sure; maybe you've thought about it harder and still think that these populations are good references as columns?

The reason for the distance West Asians have from each other could be related to why East Asians are closer to Kharia and Onge than South Asians (xKharia) are. Maybe heavily admixed populations are less close in D-stats to their own kind than others are.

A good way to test this theory is to do D-stats with Latinos and African Americans. David do you have any Latino and African American genomes?

Indo-Aryans are of course derived from Corded Ware <Abashevo(?)< Andronovo, but their exact location in the Late Bronze Age is hard to determine. I would definitely associate Indo-Aryans with R1a-Z93<R1a-L657, which is typical for modern-day Brahmins and frequent everywhere in South Asia and to a lesser extent also in South Central Asia. Srubnaya, Sintashta and the Siberian Andronovans were all R1a-Z2124, which makes them not ancestral to L657-carrying Indo-Aryans. L657 Indo-Aryans were likely a small clan, and finding ancient remains of them will be really hard. L657 is also younger than Z2124 and most likely entered West Asia and South Asia earlier, but without ancient DNA we don't know. Also, religious and tribal conflicts between Iranic and Indo-Aryan tribes ended with the extinction of Indo-Aryans everywhere west and north of the Hindukush. But I think early southern Andronovo subcultures in Tajikistan, Uzbekistan and Turkmenistan will belong to L657. Here L657 is still found today. Earlier than in Andronovo, L657 will probably be found in Potapovka and in the earliest Indo-Iranian remains from the Ural region.

Which definitely begs the question "why not?"

Matt said...

@ David, thanks for those. There's mostly the same pattern in those: the non-Mbuti side of the Palestinian, Jordanian and Syrian is, I guess, most closely related to Mediterranean Europeans and EEF, more so than to other populations of the Near East.

@ Sein: But looking at the data right now, it is quite clear that ASI wasn't really ENA, or at the very least was quite far from being predominately ENA.

This brings us back to having South Central Asians as being predominately LN/EBA European, with very heavy admixture from ASI (whatever that turns out to really be), and West Asia. Honestly, I think this represents the truth of how these populations stack up in terms of ancient genetic ancestry. Having Pamiri peoples at around 60% LN/EBA European, and Pashtuns/Kalash at 40% LN/EBA European, seems entirely reasonable.

Yeah, I think that's all pretty likely to be true. Although I would just say about the word "predominantly", it feels like it suggests at least a majority, while ASI seems likely to be majority ENA (and a large minority non-ENA), and the Pashtuns/Kalash at majority non-LNBA steppe (with a large minority, around 40% LNBA steppe). But that's just words, what you actually say about proportions, I think is basically reasonable and likely.

As a comparison, just for my own curiosity and to show that the fit with Andronovo is very consistent with the fits using just CHG, A_N, Karelia, and WHG separately: if I fit Andronovo, then

Andronovo: Karelia_HG 36.65, Anatolia_Neolithic 24.55, Caucasus_HG 23.35, Western_HG 7, BedouinB 4.8, Nganasan 3.35, Onge 0.3

So if you insert that into Sein's fit

Pashtuns: Andronovo 39.35, Kharia 30.05, Caucasus_HG 17.90, BedouinB 11.35, Anatolia_Neolithic 1.30, Atayal 0.05

to replace the Andronovo proportion then you get

Pashtuns: Kharia 30.05, Caucasus_HG 27.09, Karelia_HG 14.42, BedouinB 13.24, Anatolia_Neolithic 10.96, Western_HG 2.75, Nganasan 1.32, Onge 0.12, Atayal 0.05

which fits fairly nicely with

Pathan: Caucasus_HG 26.25, Kharia 24.25, BedouinB 13.55, Anatolia_Neolithic 13.3, Karelia_HG 13.1, MA1 4.35, Nganasan 3.55, Onge 1.1, Papuan 0.55

albeit with slightly more Kharia and no MA-1.
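The substitution above is simple arithmetic, and it can be checked with a small sketch like the one below. The two dictionaries are copied from the fits quoted above; the helper name `expand` is only an illustrative choice and is not part of nMonte or any other tool mentioned in this thread.

```python
# Proportions (in %) copied from the two fits quoted above.
andronovo = {
    "Karelia_HG": 36.65, "Anatolia_Neolithic": 24.55, "Caucasus_HG": 23.35,
    "Western_HG": 7.0, "BedouinB": 4.8, "Nganasan": 3.35, "Onge": 0.3,
}
pashtun = {
    "Andronovo": 39.35, "Kharia": 30.05, "Caucasus_HG": 17.90,
    "BedouinB": 11.35, "Anatolia_Neolithic": 1.30, "Atayal": 0.05,
}

def expand(fit, source, source_fit):
    """Replace `source` in `fit` by its own components, scaled by its share.

    Hypothetical helper for illustration only.
    """
    share = fit[source] / 100.0
    # Keep every component except the one being expanded.
    out = {name: pct for name, pct in fit.items() if name != source}
    # Add the scaled components of the expanded source.
    for name, pct in source_fit.items():
        out[name] = out.get(name, 0.0) + share * pct
    return out

expanded = expand(pashtun, "Andronovo", andronovo)
for name, pct in sorted(expanded.items(), key=lambda kv: -kv[1]):
    print(f"{name} {pct:.2f}")
```

For example, Caucasus_HG becomes 17.90 + 0.3935 × 23.35 ≈ 27.09, which matches the expanded Pashtun line above, and the expanded proportions still sum to 100.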

Table of fits SCA and India:

@ Krefter: You can run this a million different times depending on how much ancestry you think the test population has from the ghost or assumed ancestor. If there are good outgroups, only a few assumed ancestry percentages will give realistic D-stat results for 3(Ghost ancestor).

I was doing some similar things with the regression equations, rather than the ratio method you describe (the regression method might work better and it's very easy to do in Past3), and it is interesting, but the problem is that it's all pretty subjective as to how much of X ancestry you think a population has. I was using differences between D-stats to try and estimate %s, then using that to make the regression equation. Worth trying, just very subjective still though.

Parpola's works are truly impressive and rather important to our comprehension of the ancient relationships across Eurasia.

His reassessment of the relationship between Uralic and Sumerian (as well as Vedic) is quite intriguing, to say the least.

A similar relation has been proposed by Hungarian linguists.

Besides Sumerian, they find a series of cognates between Finno-Ugric and Etruscan.

All of the associations of Sumerian with modern linguistic groups (Basque, Uralic, Munda, Kartvelian, Dravidian and Sino-Tibetan, among others, have been connected to it) are fringe theories. As for Etruscan...yeah.

Yes, the effect is more visible when the admixing populations are more divergent from each other, like in the case of ENA and Basal Eurasian, or with SSA and any kind of Eurasian ancestry. Jordanian and Palestinian are good examples because they're neighbours and genetically close, so from a population genetics point of view we want to use a method that makes them cluster together. If we do something like:

D(Chimp, Jordanian; Icelandic, Palestinian)

And get a negative result, then the method is not working as we expect (I don't know for this stat exactly, but that's what the results above suggest and it's the kind of thing we've seen several times with other populations).
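To make the sign convention concrete, here is a minimal sketch of a D-statistic computed from per-SNP allele frequencies. It assumes the usual estimator form (numerator (w - x)(y - z), denominator (w + x - 2wx)(y + z - 2yz), summed over SNPs) and the convention used in this thread, where a positive D(W, X; Y, Z) means X shares more alleles with Z and a negative one means X shares more with Y. The toy genotypes are invented purely for illustration and are not real data.

```python
def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies in [0, 1].

    Assumed estimator: sum of (w - x)(y - z) over SNPs, divided by
    the sum of (w + x - 2wx)(y + z - 2yz) over the same SNPs.
    """
    sites = list(zip(w, x, y, z))
    num = sum((a - b) * (c - d) for a, b, c, d in sites)
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d) for a, b, c, d in sites)
    return num / den

# Invented toy data: 5 SNPs, derived allele coded as 1.
chimp       = [0, 0, 0, 0, 0]
jordanian   = [1, 1, 1, 1, 0]
icelandic   = [0, 1, 1, 1, 0]
palestinian = [1, 0, 0, 1, 1]

d = d_stat(chimp, jordanian, icelandic, palestinian)
print(f"D(Chimp, Jordanian; Icelandic, Palestinian) = {d:.3f}")
```

In this toy set, one site has Jordanian sharing the derived allele with Palestinian only and two sites have it shared with Icelandic only, so D comes out negative. Sites where Icelandic and Palestinian carry the same allele contribute nothing, which is why a single stat only sees the part of the ancestry that differs between the last two populations.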

That's why I'm a bit unsure if it's a good idea to include the populations that are more prone to showing this effect as reference populations.

According to pairwise similarity scores Jordanians are less related to every non-SSA population than Palestinians and more related to every SSA population than Palestinians.

The Palestinians aren't as close to them as several populations with less SSA as well. So this is not just a quirk of D-stats, but a real difference in their genetic makeup. Whether it's caused by differing types of Eurasian or African in them or biases (ascertainment/SNP set) in the dataset isn't absolutely clear unless full genome sequences show the same pattern though.

Yes, I'm not saying it's a quirk of D-stats. The difference is real and the result of the stat is technically correct. But it's not the kind of information that we want to know from a population genetics point of view (I think quite obviously, in this case).

To get that information and show that Jordanians and Palestinians are very similar as populations, we can use D-stats, but we need a different strategy. Like the one we're using now to compare populations, which is all based on D-stats. The question is if we want to use the kind of populations that give these unexpected results as reference ones or not.

For whatever it's worth, in all cases, Dravidian_India and Kharia get rather poor fits (when using "basal" populations). This, coupled with the fact that Dravidian_India and Kharia need a huge helping Ust-Ishim (Ust-Ishim is the largest component for Dravidian_India, and would be for Kharia as well, if they didn't have recent East Asian ancestry), despite the presence of both Atayal and Onge, and the fact that they score substantial West Eurasian percentages, is probably indicative of them having very little in the way of ENA ancestry. Instead, much of their ancestry probably doesn't fit either ENA (which includes Onge, most of what constitutes Australian/Papuan ancestry, and East Asians) or West Eurasia (K14/WHG/EHG/ANE). This also explains why South Central Asians seem admixed with something more basal than ENA (yet still distinct from West Eurasia), in other analyses.

It'll be very interesting to find out how Mesolithic Indian samples look, when analyzed with something like TreeMix or qpGraph.

Also, based on the models above, the LN/EBA European percentage constitutes the largest share of Kalash and Pashtun admixture proportions (around 40% for both, followed by 25%/30% Kharia). In addition, the Pamiri peoples are all around 60% Andronovo in those models. That's pretty much why I stated that South Central Asians are predominately LN/EBA European, with very heavy admixture from ASI (whatever that may turn out to be) and West Asia. But we agree on this already.

"Sintashta is surely not Proto-Indo-Aryan. They belonged to R1a-Z2124, which makes them more likely ancestral to Iranics. Andronovo was mainly Iranic, I guess, and only in the south did tribes ancestral to Indo-Aryans exist."

We have absolutely nothing at hand that would prove Sintashta/Andronovo to be ancestral to Iranics. Do we have linguistic evidence of this? No. All we have is archaeological data that shows similarities to Indo_Iranic tribes in general, which basically means we are dealing here with an Indo_Iranic culture.

It could represent just another reflection of an early Indo_Iranic culture that is now extinct and didn't leave much behind except possibly some genetic admixture. But if we believe that map, which also sees more of a connection between the Yaz culture and Srubnaya than with Andronovo, and given that the Yaz culture is regarded as a reflection of early East Iranic culture, then this can only mean that at least the East Iranic languages are tied to the Yaz-Srubnaya connection and not to Andronovo. There are also generally accepted strong similarities between Srubnaya and Andronovo, and I see strong similarities with the Kura-Araxes culture as well (a farming-herding culture that had flat graves like Srubnaya and kurgans like Andronovo, had horses and wagons around 3000 BC, and whose collapse fits perfectly with the appearance of the Umman Manda, the Mitanni/Medes).

Point is, Sintashta/Andronovo is an Indo_Iranic culture, but I doubt it represents the ancestors of all Iranic speakers, let alone being Proto-Indo-Iranic.

Point is, Sintashta/Andronovo is an Indo_Iranic culture, but I doubt it represents the ancestors of all Iranic speakers, let alone being Proto-Indo-Iranic.

Yes, ''an'', and we don't even know if it was actually of a bilingual type or not.

From where did the founders of Göbekli Tepe and Knossos arrive?

batman said...

Since your command of the Uralian LanguageS seems to be impeccable and your mastery of the Sumerian phonology complete, it seems clear that we can debunk Parpola as well as Horvath and Alinei.

As for Etruscan, do you think we can drop the entire idea of any such language existing, outside the sabatean/latino realm?

I'm glad you finally made us aware that these Uralian theories are as fringe as Maja's Kurgan hypothesis used to be. As you may know, I've long been proposing that her Kurgan culture had Baltic roots.

Just like the yet older Pitgrave/Single-grave-cultures.

Alberto: Yes, I'm not saying it's a quirk of D-stats. The difference is real and the result of the stat is technically correct. But it's not the kind of information that we want to know from a population genetics point of view (I think quite obviously, in this case).

To get that information and show that Jordanians and Palestinians are very similar as populations, we can use D-stats, but we need a different strategy. Like the one we're using now to compare populations, which is all based on D-stats. The question is if we want to use the kind of populations that give these unexpected results as reference ones or not.

Kind of reasoning this through myself (could be right or wrong), I think it's more that the stats here are giving us the information we are looking for. It's just that the information is actually: how related is the test population to the non-African side of present-day Levantine ancestry? And the populations who are closest to the non-African side of Levantine ancestry are *not* present-day Levantines; rather they are EEF / West Mediterranean.

Or, more precisely than non-African: non-Mbuti. For example:

D(Chimp,Esan_Nigeria)(Mbuti,Palestinian/Jordanian/Syrian) - D(Chimp,Esan_Nigeria)(Mbuti,Anatolia_Neolithic)

is positive for Esan_Nigeria only. Obvious implication? The non-Mbuti side of Palestinian/Jordanian/Syrian contains genetic information (ancestry) that makes it more closely related to West Africans than Anatolia_Neolithic is.

Now, if you ran the stat D(Mbuti,Jordanian)(Mbuti,Palestinian) that *may* place them closer together than, e.g. D(Mbuti,Anatolia_Neolithic)(Mbuti,Palestinian), because it's only comparing the non-Mbuti like side of both, where D(Chimp,Jordanian)(Mbuti,Palestinian) is comparing all Jordanians' drift (i.e. drift from Chimp, including what they get from the African part of their ancestry) to Palestinians' drift from Mbuti.

Does any of that seem to make sense? Or am I off base here?

I'm hesitant to say S/C Asians have more ANE than Sintashta.

Well, most South Asian groups do. After Karitiana Indians it's peaking in the Hindu Kush / NW South Asian groups (i.e. Kalash, Burusho and Pathan). Sintashta are not the same as Afanasievo.

South Asians look like they have a lot of ANE because most of their ancestry comes from the Caucasus and the steppe, and they carry relatively low levels of Anatolian Neolithic admixture, which is what really reduces the ANE signal in present-day Europe and even the Middle to Late Bronze Age steppe.

In fact, most of the ANE-related signal in South Asia is probably from Caucasus hunter-gatherer related populations.

South Asian hunter-gatherers might also be ANE related in some way, maybe forming a clade with a part of the ancestry carried by Caucasus hunter-gatherers?

So the seemingly high level of ANE in South Asia does not preclude the possibility of large-scale migrations from the steppe into South Asia.

I think everyone's efforts have been marvellous and (if/ when it comes) aDNA will only fine tune rather than invalidate predictions.

The Z93 speaks of a steppe founder effect. However, it would be interesting to see what this means in a specific archaeological South Asian context as well as within a global "IE" comparison (i.e. were they "conquerors", or simply integrated pastoralists who by virtue of their mobility facilitated the spread of culture & language?)

Yes, I realized after that those stats with Jordanians and Palestinians probably also include Mbuti as an outgroup, which is obviously amplifying the effect. So what you're saying does make sense. But even if we left out an African outgroup (and use Gorilla, for example), we'd still see the effect (though not so strongly).

We've seen for example stats with Balochi and Brahui where both are closer to Armenians or Kalash than to each other, in the form: D(Chimp, Balochi; Armenian/Kalash, Brahui) with a negative result. If you model those 4 populations involved with nMonte, it will become obvious that Balochi and Brahui are indeed very similar, while Kalash and Armenians are different enough from them. Or a similar real stat that Tobus ran for me:

Chimp Pathan1 Andronovo Pashtun_Afghanistan1 -0.0097 -1.711 9554 9742 133294

From a population genetics point of view, I think it's clear that such a stat makes little sense, when Pathan and Pashtun are two modern, neighbouring populations (basically the same ethnic group on both sides of the Afghan/Pakistan border) and Andronovo is a 3000+ y.o. population that no longer exists, and whose closest living relatives are NE Europeans.

So one question, going back to Jordanians and Palestinians, could be: Let's assume that both of these populations have some 10% admixture from a Yoruba-like population (just for the sake of simplicity). Now, if you get a Palestinian individual and a Jordanian individual, what are the chances that both of them have ended up having the exact same Yoruba-specific mutations at the same loci? Both have 10% of them, but are they the exact same 10%? Because if they aren't (and that's what it seems from the stat results), then the D-stats have no way to know that those specific mutations came from the same population (Yoruba-like). IOW, for the D-stat it's irrelevant if one has 10% Yoruba-like and the other one has 10% Onge-like, since they are mutations that simply don't match with each other or with the non-Yoruba part of the genome of both individuals.

So it might be a problem that we're simply not giving the stat enough information to tell us what we want to know. If we provided the stat with a Yoruba-like population (like for f4 ratios), then the stats can "get" that both Jordanian and Palestinian have a more or less equal amount of mutations that come from that population. But if we just compare one individual (Palestinian) to another (Jordanian) (or even a group of individuals to another group), the stat doesn't have that information. So it tells us that Jordanian has more matches with Anatolia_Neolithic than with Palestinian, which is correct in itself. But it's not the whole picture that we need to know for population genetics. We can get that whole picture using D-stats, but we can't do it with just one D-stat alone comparing both populations one in each side of the stat (this actually leads to confusion).

That's why I think that the most reliable D-stats (and so the ones to use for the columns) should be with populations that are as unadmixed as possible. Though that's not really possible, and I'm not sure how much uncertainty using mixed ones might introduce. Maybe it won't affect much in most of the cases.

Shaikorth said...

Balochi and Brahui show considerable differences in their relation to Mbuti and Kotias, so they may not be as similar as ADMIXTURE and other methods suggest. IBS with Kotias:

IBS with Mbuti:

Nor are Balochis likely to be more distant because they are a more admixed population than Brahuis, since they are most likely not as admixed as Bengalis, Makrani, Uzbeks and Hazaras.

Alberto said...


These results are based on D-stats, not ADMIXTURE:

"Dravidian_India" 34.05
"BedouinB" 25
"Afanasievo" 23.3
"Caucasus_HG" 15.05
"Anatolia_Neolithic" 2.6
"Dai" 0

"Dravidian_India" 33.35
"BedouinB" 28.15
"Afanasievo" 22.05
"Caucasus_HG" 16.45
"Dai" 0
"Anatolia_Neolithic" 0

"Anatolia_Neolithic" 39.4
"Caucasus_HG" 19.55
"BedouinB" 18.35
"Afanasievo" 13.25
"Dravidian_India" 7.7
"Dai" 1.75

"Afanasievo" 36.55
"Dravidian_India" 31.05
"Caucasus_HG" 14.75
"BedouinB" 7.7
"Anatolia_Neolithic" 5.95
"Dai" 4

The difference is that this is based on a lot of D-stats, using a strategy where they can get all the needed information, instead of a single D-stat that gives us a result which, while technically correct, is not the information that we want to know for population genetics.

Just imagine 2 siblings (non-twins) from the same 2 Japanese parents. If you do something like D(Chimp, Brother; Gorilla, Sister), it will show a very high shared drift. But imagine the same with 2 siblings of a Japanese father and a Yoruba mother. The shared drift will drop very significantly, because each one will inherit different Yoruba-specific mutations and different Japanese-specific mutations from each parent, even if both will be 50-50 Yoruba/Japanese mix.

So from one point of view, these latter 2 siblings will be much more genetically different than the former 2. That's true and correct. But from a population genetics point of view, we want to use a method that tells us that 2 individuals that share the exact same ancestors cluster together. It's a different kind of information, so we have to use a different method to get that information.

@ Alberto
"That's why I think that the most reliable D-stats (and so the ones to use for the columns) should be with populations that are as unadmixed as possible."

I understand your argument. But I would point to the fact that the argument does not apply if the Dstat is the average over a lot of samples.
Also I would suggest to give some pops a 'prominence status', for instance the European Bell Beakers.

@Alberto, in that example the siblings should be equally related to Yoruba and Japanese though. This is not what we see with Balochi and Brahui, who show quite differentiated relations to Kotias and Mbuti in IBS, which is not dependent on outgroups and, unlike nMonte, doesn't use the double outgroup (primate+African) method either. FWIW I expect there might be some changes again when whole-genome sequences are used.

Alberto said...


in that example the siblings should be equally related to Yoruba and Japanese though.

Exactly. And that's what we really want to know. That's why a single D-stat is not that helpful when comparing admixed populations.

This is not what we see with Balochi and Brahui, who show quite differentiated relation to Kotias and Mbuti, in IBS which is not dependant on outgroups and unlike nmonte doesn't use the double outgroup

It could be related to individual variation, I don't know. If we could run those IBS stats with 100 individuals from each population and average them, it would probably be more consistent. D-stats with double outgroup might not be ideal, and IBS is in general quite reliable, I agree with that. Whatever ideas to improve the methods are welcome.

(But in any case, this is not about Balochi and Brahui specifically. See the stat with Pathan and Pashtun too. Similar case, even more obvious as they are compared to Andronovo).


I understand your argument. But I would point to the fact that the argument does not apply if the Dstat is the average over a lot of samples.

I thought so too. But it was hard to test for the simple reason that we don't have genomes from 500 Pathan and 500 Pashtun individuals (or such numbers from any other population). Testing with only 1 individual versus using a small group (3 and 19 samples per group) made a small difference, but not quite enough. So for now I don't know of any workaround except trying to use the unadmixed populations when possible and, with those as references, making indirect comparisons as we are doing with these D-stats + nMonte.

As I mentioned earlier, there is not enough ANE in CHG to account for the amounts they have. CHG forms a part of their genome but not all of it.
Steppe peoples mixed up with larger Southern populations of the BMAC and we do not have their genomes. Just reconstructions, and they more or less look like Tajiks and Pashtuns you find today. Also as you move deeper into South Asia proper, save for some Upper Caste groups in the North West, steppe admixture drastically falls. So steppe admixture cannot account for that.

Alberto: But even if we left out an African outgroup (and use Gorilla, for example), we'd still see the effect (though not so strongly).

I'm not so sure that we would. I think we'd have to run something like these sets of D-stats:

Chimp Anatolia_Neolithic Chimp Jordanian
Chimp Hungary_CA Chimp Jordanian
Chimp Syrian Chimp Jordanian
Chimp Palestinian Chimp Jordanian

Mbuti Anatolia_Neolithic Mbuti Jordanian
Mbuti Hungary_CA Mbuti Jordanian
Mbuti Syrian Mbuti Jordanian
Mbuti Palestinian Mbuti Jordanian

Yoruba Anatolia_Neolithic Yoruba Jordanian
Yoruba Hungary_CA Yoruba Jordanian
Yoruba Syrian Yoruba Jordanian
Yoruba Palestinian Yoruba Jordanian

to be sure. If that's still producing closer stats with the EEF and Levantine populations than between one another, might need some more thought.

Although even then...

Now, if you get a Palestinian individual and a Jordanian individual, what are the chances that both of them have ended up having the exact same Yoruba-specific mutations at the same loci? Both have 10% of them, but are they the exact same 10%?

Like Huijbregts, I would think this should average out (over lots of loci, and across samples if not within individual samples), even with quite low sample sizes.

Unless there is very strong African substructure here, such that very slight variances in the exact African population are very important.

Shaikorth said...

Africans do have high variation and the African in Jordanians and Palestinians is not from a single African group. The effects of that complicate things further.

Re: those Balochi vs Brahui IBS numbers, I don't think individual variation in the HGDP samples would be enough to cause differences that high (higher than Georgian-Uzbek in relation to Mbuti for instance).

As I said, I did try to check whether testing a single individual or a group would make a difference. This stat is with one single Pashtun and one single Pathan (thanks to Tobus for testing it):

Chimp Pathan1 Andronovo Pashtun_Afghanistan1 -0.0097 -1.711 9554 9742 133294

This other one is with 19 Pathan and 3 Pashtun:

Chimp Pathan Andronovo Pashtun_Afghanistan -0.0065 -1.926 9679 9806 134262

There is a difference, but it's really not quite enough. Maybe if we had 50 of each, or 200 (who knows how many would be needed), we could get the stat to become significantly positive, but I'm not sure we can even test that.
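As a quick sanity check on the two stat lines above, the sketch below parses them and recomputes D from the two site-pattern counts. The column meanings (D, Z, BABA count, ABBA count, number of SNPs) are an assumption on my part; the recomputed values agree with the printed D, which supports that reading. The |Z| >= 3 cutoff is just the conventional rule of thumb, not something stated in this thread.

```python
rows = [
    "Chimp Pathan1 Andronovo Pashtun_Afghanistan1 -0.0097 -1.711 9554 9742 133294",
    "Chimp Pathan Andronovo Pashtun_Afghanistan -0.0065 -1.926 9679 9806 134262",
]

def parse(row):
    """Split one stat line into populations and numeric fields.

    Assumed column order: four populations, D, Z, BABA, ABBA, SNP count.
    """
    *pops, d, z, baba, abba, nsnps = row.split()
    return {
        "pops": pops, "D": float(d), "Z": float(z),
        "BABA": int(baba), "ABBA": int(abba), "SNPs": int(nsnps),
    }

def d_from_counts(baba, abba):
    """D as the normalised difference of the two site-pattern counts."""
    return (baba - abba) / (baba + abba)

Z_THRESHOLD = 3.0  # conventional cutoff, an assumption here

for row in rows:
    r = parse(row)
    recomputed = d_from_counts(r["BABA"], r["ABBA"])
    significant = abs(r["Z"]) >= Z_THRESHOLD
    print(" ".join(r["pops"]), f"D={recomputed:.4f}", f"Z={r['Z']}",
          "significant" if significant else "not significant")
```

On that reading, neither stat reaches |Z| >= 3, so both are negative but not significantly so, and going from 1 vs 1 individuals to 19 vs 3 only moves D from about -0.0097 to about -0.0065.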

As for the possible African variation, yes, that's a possibility. Though it doesn't look like the most likely thing that very different African groups mixed with Palestinians and with Jordanians. Even more unlikely is that very divergent ASI groups mixed with Pathan and Pashtun.

So I'll try to put it another way. Look at those D-stats comparing directly Pathan to Pashtun and Andronovo. They are giving us a result that is correct. But we can use a different strategy too. For example, if we ran stats in the form: D(Pathan, Pashtun; X, Mbuti), where X are a number of different populations (WHG, SHG, EHG, CHG, Anatolia_Neolithic, Selkup, Ulchi, Nganasan, Karitiana, Han, Ami, Papuan, Onge... We could even include modern West Eurasian pops, like Sardinian, Basque, Lithuanian, Lezgin, Georgian, Armenian, BedouinB,...) quite probably not even one of the stats would be significant. Then we run stats in the form: D(Pathan, Andronovo; X, Mbuti), where X are the same populations mentioned. In this second case, I think that ALL of the stats would be significant.

So we are in both cases using D-stats to check how similar Pathans are to Pashtun and to Andronovo. The first strategy tells us that Pathans are closer to Andronovo. The second one tells us that Pathans are very similar to Pashtun, and quite dissimilar to Andronovo.
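The second strategy boils down to running two families of stats over the same reference list and counting how many come out significant in each. Below is a minimal sketch of that bookkeeping; the function names and the |Z| >= 3 cutoff are illustrative assumptions, and no results are implied, since the Z-scores would have to come from actually running the stats.

```python
# Reference populations named in the comment above (ancient and modern non-West-Eurasian set).
refs = ["WHG", "SHG", "EHG", "CHG", "Anatolia_Neolithic", "Selkup", "Ulchi",
        "Nganasan", "Karitiana", "Han", "Ami", "Papuan", "Onge"]

def quadruples(a, b, references, outgroup="Mbuti"):
    """Build the stats D(a, b; X, outgroup) for every reference X."""
    return [(a, b, x, outgroup) for x in references]

def count_significant(z_scores, threshold=3.0):
    """Count stats whose |Z| reaches the (assumed) significance cutoff."""
    return sum(abs(z) >= threshold for z in z_scores)

same_group = quadruples("Pathan", "Pashtun", refs)      # expected: few or none significant
vs_andronovo = quadruples("Pathan", "Andronovo", refs)  # expected: most or all significant

# Print one whitespace-separated quadruple per line, like the stat lines in this thread.
for quad in same_group + vs_andronovo:
    print(" ".join(quad))
```

Once the Z-scores for each family are available, `count_significant` gives a single number per family, and comparing those two numbers is what tells us that Pathans are very similar to Pashtuns and quite dissimilar to Andronovo.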

All the stats are correct, obviously. But they are telling us different things. The information that is useful for our purposes is the one we get using the second strategy. The information we get from the first one is not very useful (and actually quite deceiving).

So this is what it's all about. Using the D-stats in a way that will help us to know what we want to know, and avoiding using them in a way that will tell us something that we don't want to know and is deceiving for our purposes.

(BTW, the effect is more obvious with populations that are admixed with very divergent branches, but it actually affects all admixed populations. This is probably why all modern Europeans share more drift -on direct comparison or f3 stat- with WHG than with Bell Beakers, even if Bell Beakers as a population are considerably more similar to modern Europeans than WHG are).

Hey, Alberto, just to comment on this because I didn't want you to get the impression I was totally ignoring that.

The thing is I don't really know how relevant that result is to the outgroup stat result, or whether there would be a large and systematic effect there, whether the numbers of markers etc could have an effect. This is why I'm focusing on what could affect the outgroup stat only and not looking so much at all D-stats between closely related populations.

Definitely there are some arguably phylogenetically non-intuitive results:

English Corded_Ware_LN Lithuanian Chimp, D: -0.0042, Z: -1.554, 353010
Belarusian Corded_Ware_LN Lithuanian Chimp, D: 0.0007, Z: 0.271, 353010
Ashkenazi_Jew Corded_Ware_LN Cypriot Chimp, D: -0.0056, Z: -2.012, 353010
Georgian Corded_Ware_LN Cypriot Chimp, D: 0.0006, Z: 0.214, 353010

It just seems simpler to take each set of stats on a case by case basis. I'm not keen to go back to using the IBS results so much.

Shaikorth said...

Alberto said...


Thanks. Yes, each stat might have its own reasons, and it's probably hard to say exactly what is happening in each case. But definitely there are some cases where the results are quite unexpected, and we have to be aware of those kinds of cases (and having other methods, even ones based on D-stats too, to get the kind of information that we really want to know seems like a decent alternative).

The variation within Africans is considerably higher than elsewhere, so even if the Jordanian and Palestinian African ancestry comes from the same regions, it should be much more heterogeneous than it would be if it came from the North Caucasus or some other Eurasian region.

Afghan Pashtuns have less Onge affinity or more Dai affinity than Pathans, this is probably a confounding factor.

result: Gorilla Pashtun_Afghanistan Onge Dai 0.0119
result: Gorilla Kalash Onge Dai 0.0117
result: Gorilla Tajik_Ishkashim Onge Dai 0.0108
result: Gorilla Pathan Onge Dai 0.0090

In IBS tests of Kurd, Pathans were the closest relative of Pashtuns. Brahui was the closest relative of Makrani and Balochi, but they weren't closest to Brahui (some Kurds and South Caucasians were by a small margin).

Pathans and Pashtuns show a difference in their IBS relation to Iron Age Scythian too.

39 Pathan 66.47%
40 Greek 66.47%
41 Tajik_Afghan 66.47%
43 Sindhi 66.46%
44 Sicilian 66.46%
46 Turkish 66.45%
47 Brahui 66.42%
48 Armenian 66.42%
49 Balochi 66.42%
50 Iranian 66.41%
51 LBK_EN 66.40%
53 RISE_irAltai 66.38%
55 Druze 66.38%
56 Uzbek 66.37%
58 Pashtun_Afghan 66.35%

Same with Kotias, though then Pashtun are closer. In comparisons to Kostenki and Ust-Ishim their results are identical to each other but also to Armenians in the former case and to Ulchi in the latter case so not too much can be inferred from that.

Shaikorth: Lithuanians and Belarusians should mostly have a shared population history to the exclusion of English (this is especially visible with TreeMix), I think those results do make sense.

IIUC, the stat D(English, Corded_Ware_LN, Lithuanian, Chimp) D: -0.0042, Z: -1.554, 353010 should imply Corded Ware closer to Lithuanian than English are to Lithuanian though, which was what seemed strange to me since CW should have a more different balance of ancient ancestries than the Lithuanian to English difference?

(e.g. nMonte with the d-stats

Corded Ware Germany: Karelia_HG 35.7, Anatolia_Neolithic 30.45, Caucasus_HG 22.8, WHG 10.25, Nganasan 0.8

Lithuania: Karelia_HG 24.7, Anatolia_Neolithic 38.15, Caucasus_HG 12.2, WHG 20.35, Nganasan 4.5

England Cornwall: Karelia_HG 22.25, Anatolia_Neolithic 47.45, Caucasus_HG 14.8, Western_HG 12.45, Nganasan 2.75

English_Kent: Karelia_HG 20.75, Anatolia_Neolithic 43.5, Caucasus_HG 12.5, Western_HG 14.55, BedouinB 5.8, Nganasan 2.5, Onge 0.4

Belarusian: Karelia_HG 25.45, Anatolia_Neolithic 38.55, Caucasus_HG 13.45, Western_HG 14.45, Nganasan 3.65, BedouinB 3.1, Onge 1, Dai 0.25)

On the other hand, qpAdm can easily fit Lithuanians as almost fully Corded Ware:

RISE_baCorded_Ware 0.974
EN_MN_LN_European 0.014
Nganasan 0.012

chisq 0.024 tail prob 0.987829

or something like 95% Sintashta 5% Nganasan

So maybe nMonte picks a combination that approximates that but just looks different for one reason or another?

I feel like the converse with qpAdm is that using only the outgroups, as it does, can give weak power to tell the difference between a population and a combination of populations that have approximately the same relationship to an outgroup (like the models of Pathans and Kalash as 70-80% Belarusian or Sintashta, while something like the 40% Andronovo Sein has found above probably makes more sense as consistent with the evidence).

And bear in mind, qpAdm with those populations does not have a lot of freedom, with only Corded Ware, MN_European and Nganasan, to vary the levels of CHG, WHG, EHG and Anatolia_Neolithic independently (Lithuanian seems closest in its level of "HG" overall to Corded Ware, rather than Germany_MN).

Few models for a few pops above with Corded Ware as an ancestor population in nMonte:

English_Cornwall: Corded_Ware_Germany 64.5, Anatolia_Neolithic 27.35, Western_HG 5.45, Nganasan 1.95 - distance% = 0.4131 %

Lithuanian: Corded_Ware_Germany 59.35, Anatolia_Neolithic 18.85, Western_HG 14.7, Nganasan 4, Karelia_HG 3.1 - distance% = 0.5814 %

Bell_Beaker_Germany: Corded_Ware_Germany 63.45, Anatolia_Neolithic 24.15, Western_HG 7.7, Karelia_HG 2.4, BedouinB 0.85, Masai_Kinyawa 0.55, Dai 0.4, Esan_Nigeria 0.2, Papuan 0.2, Onge 0.1 - distance% = 0.2715%

Or modeling with just Corded Ware, Iberia MN and non-West Eurasians (modern or ancient):

English Cornwall: Corded_Ware_Germany 67.15, Iberia_MN 30.2, Yakut 1.45, Masai_Kinyawa 0.5, Onge 0.45, Papuan 0.15, Ust_Ishim 0.1 - distance% = 0.7528 %

Lithuanian: Corded_Ware_Germany 76, Iberia_MN 21.1, Itelmen 2.9 - distance% = 1.126 %

Bell_Beaker_Germany: Corded_Ware_Germany 72.5, Iberia_MN 26.15, Masai_Kinyawa 1.05, Papuan 0.15, Esan_Nigeria 0.1, Onge 0.05 - distance% = 0.428 %

Hungary_BA and Corded_Ware_Germany:

English_Cornwall: Hungary_BA 59.35, Corded_Ware_Germany 38.45, Yakut 1.2, Onge 0.55, Masai_Kinyawa 0.45 - distance% = 0.8822 %

Bell_Beaker_Germany: Hungary_BA 53.95, Corded_Ware_Germany 44.85, Masai_Kinyawa 1.15, Onge 0.05 - distance% = 0.3628 %

Lithuanian: Corded_Ware_Germany 51.15, Hungary_BA 45.9, Itelmen 2.95 - distance% = 1.028 %

Might be true for South Asians, but in the case of something like Lithuanians, Corded Ware had already arisen in roughly the same zone much earlier. I don't think there was a large movement of EEFs or CHGs into the area after the Corded Ware period, and the pops there being primarily CW is feasible. So in that case qpAdm looks to provide the most plausible fit.

Matt said...

Primarily, sure. I don't think something like the 97% estimate exactly is likely, and something more like a 60% German Corded Ware, 40% unusually HG and farmer admixed culture sounds possible (perhaps archeology contradicts this though). The affinities to Yamnaya (EHG and CHG), EEF and WHG really seem too different between Corded Ware and Lithuanians for 97% or anything like it though (which is what directs the nMonte results to be what they are). If qpAdm is not measuring that directly, I'm not convinced it gets it right, but I think we can agree to disagree on that, and this could get kind of circular!

On the other hand, just getting back to the question of whether Lithuanian should be closer to Corded Ware or English as per the above D-stats, I was surprised to find that when I put the D-stat data used for nMonte into a PCA, the Corded Ware did actually tend to come out closer to the Lithuanian on dimensions 1, 2 and 3, which make up the lion's share of the variation - (only 1 and 2 shown). I'm not sure this was always the case in prior PCA runs, but that would certainly agree with the D-stat I posted above. Funnily enough, in this view, the Estonians actually are very close to the Corded Ware (and to Srubnaya). (And at the same time, the Ukrainians seem pretty much identical to Sintashta, the English to the German Bell Beakers, the Norwegians to Unetice...). Less true if you remove the Eskimo and Karitiana columns though, which make a good contribution to the above PCA.