
Thursday, February 22, 2024

Berkeley, we have a problem

A new preprint at bioRxiv by Kerdoncuff et al. makes the following, somewhat surprising, claim:

"One of the individuals, referred to Sarazm_EN_1 (I4290) described above that was discovered with shell bangles showing affiliation with South Asia, has significant amount AHG-related ancestry, while a model without AHG-related ancestry provides the best fit for Sarazm_EN_2 (I4210) (Table S4.5)."

First of all, the authors are actually referring to sample ID I4910 not I4210.

The aforementioned table, based on qpAdm output, shows that I4290 has 15.9% AHG-related ancestry and basically no Anatolian farmer-related ancestry. It also shows that I4910 has no AHG-related ancestry but 17.9% Anatolian farmer-related ancestry.

AHG stands for Andaman hunter-gatherer. The authors are using it as a proxy for South Asian hunter-gatherer ancestry.

However, I've looked at I4290 and I4910 in great detail over the years using ADMIXTURE, Principal Component Analysis (PCA), and qpAdm, and I'm quite certain that neither shows any obvious South Asian ancestry above noise level. Indeed, I'd say that if they do have some minor South Asian ancestry, then I4910 probably has more of it than I4290.

Kerdoncuff et al. used the following "right pops" or outgroups: Ethiopia_4500BP.SG, WEHG, EEHG, ESHG, Dai.DG, Russia_Ust_Ishim_HG.DG, Iran_Mesolithic_BeltCave and Israel_Natufian.

This means they mixed data that were generated in very different ways (DG, SG and capture) and included some poor-quality samples. For instance, the highest coverage version of Iran_Mesolithic_BeltCave offers just ~50K SNPs.

Mixing different types of data and relying on low-coverage samples, even in part, often produces unreliable results in qpAdm. So I suspect that the above-mentioned mixture results for I4290 are skewed by a poor choice of outgroups.
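To make the point about mixed data types concrete, here is a minimal Python sketch that groups the right pops listed above by label suffix. It assumes that labels ending in .DG and .SG mark those two data types and that labels without a suffix are capture data. That is a labeling convention assumed for illustration, and composite labels like WEHG, EEHG and ESHG may well hide a mix of data types. The function names are hypothetical.

```python
from collections import defaultdict

# Outgroup labels exactly as listed in the post.
RIGHT_POPS = [
    "Ethiopia_4500BP.SG", "WEHG", "EEHG", "ESHG", "Dai.DG",
    "Russia_Ust_Ishim_HG.DG", "Iran_Mesolithic_BeltCave", "Israel_Natufian",
]

def data_type(label):
    """Guess the data type from the label suffix (an assumed convention)."""
    if label.endswith(".DG"):
        return "DG"
    if label.endswith(".SG"):
        return "SG"
    return "no suffix (assumed capture)"

def group_by_data_type(labels):
    """Return {data type: [labels]} preserving the input order."""
    groups = defaultdict(list)
    for label in labels:
        groups[data_type(label)].append(label)
    return dict(groups)

groups = group_by_data_type(RIGHT_POPS)
for kind, labels in groups.items():
    print(f"{kind}: {', '.join(labels)}")
```

If the output shows more than one group, the outgroup set mixes data types, which is the situation described above.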

When I run qpAdm I try to stick to one type of data and avoid low-quality singletons in the outgroups. This is the best qpAdm model that I can find for Sarazm_EN:

right pops:

Sarazm_EN
Kazakhstan_Botai_Eneolithic 0.113±0.017
Turkmenistan_C_Geoksyur_subset 0.887±0.017
P-value 0.06392

Sarazm_EN_1 (I4290)
Kazakhstan_Botai_Eneolithic 0.129±0.021
Turkmenistan_C_Geoksyur_subset 0.871±0.021
P-value 0.11019

Sarazm_EN_2 (I4910)
Kazakhstan_Botai_Eneolithic 0.104±0.021
Turkmenistan_C_Geoksyur_subset 0.896±0.021
P-value 0.07427


Andaman_hunter-gatherer -0.018±0.020
Kazakhstan_Botai_Eneolithic 0.123±0.019
Turkmenistan_C_Geoksyur_subset 0.895±0.020
P-value 0.0298403
(Infeasible model)

Please note that Turkmenistan_C_Geoksyur_subset is made up of just three relatively high-quality individuals: I8504, I12483 and I12487. That's because it's not possible to model the ancestry of Sarazm_EN using the full Geoksyur set, probably due to subtle genetic substructure within the latter.
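For anyone who wants to sanity check output like the above, here is a rough Python sketch of the usual reading of a qpAdm result: the coefficients should sum to about 1, each should fall between 0 and 1 (otherwise the model is infeasible), and the P-value should clear a cutoff. The 0.05 cutoff is a common convention that I'm assuming here, not something stated in this post, and the `assess` helper is hypothetical. The numbers are copied from the Sarazm_EN_1 (I4290) model and from the model that includes Andaman_hunter-gatherer.

```python
# Conventional significance cutoff; an assumption, not a value from the post.
P_THRESHOLD = 0.05

def assess(coeffs, p_value, p_threshold=P_THRESHOLD):
    """Summarize a qpAdm-style result given {source: (coefficient, std_error)}."""
    total = sum(c for c, _ in coeffs.values())
    # A coefficient below 0 or above 1 makes the model infeasible.
    out_of_range = [s for s, (c, _) in coeffs.items() if c < 0 or c > 1]
    # Coefficient divided by its standard error: distance from zero in SE units.
    z_scores = {s: c / se for s, (c, se) in coeffs.items()}
    return {
        "sum": total,
        "feasible": not out_of_range,
        "out_of_range": out_of_range,
        "passes_p": p_value >= p_threshold,
        "z_scores": z_scores,
    }

two_way = {
    "Kazakhstan_Botai_Eneolithic": (0.129, 0.021),
    "Turkmenistan_C_Geoksyur_subset": (0.871, 0.021),
}
three_way = {
    "Andaman_hunter-gatherer": (-0.018, 0.020),
    "Kazakhstan_Botai_Eneolithic": (0.123, 0.019),
    "Turkmenistan_C_Geoksyur_subset": (0.895, 0.020),
}

for name, model, p in [("two-way", two_way, 0.11019),
                       ("with Andaman", three_way, 0.0298403)]:
    result = assess(model, p)
    print(name, "feasible:", result["feasible"], "passes P:", result["passes_p"])
```

In the model with Andaman_hunter-gatherer, the negative coefficient is less than one standard error from zero, which is what you'd expect if that source contributes nothing.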

Below is a PCA plot that, more or less, reflects my qpAdm model. I4290 and I4910 are sitting right next to each other in a cluster of ancient Central and Western Asians, and it's actually I4910 that is shifted slightly towards the South Asian pole of the PCA. Indeed, I can confidently say that there's no way to design a PCA in which I4290 is shifted significantly towards South Asia relative to I4910.


Kerdoncuff et al., 50,000 years of Evolutionary History of India: Insights from ∼2,700 Whole Genome Sequences, bioRxiv, posted February 20, 2024, doi:

See also...

The Nalchik surprise

A comedy of errors

Monday, February 12, 2024

The Nalchik surprise

If, like Iosif Lazaridis, you subscribe to the idea that the Yamnaya people carry early Anatolian farmer-related admixture that spread into Eastern Europe via the Caucasus, then I've got great news for you.

We now have a human sample from the Eneolithic site of Nalchik in the North Caucasus, labeled NL122, that packs well over a quarter of this type of ancestry (see here). Below is a quick G25/Vahaduo model to illustrate the point (please note that Turkey_N = early Anatolian farmers).

Target: Nalchik_Eneolithic:NL122
Distance: 2.1934% / 0.02193447
60.8 Russia_Steppe_Eneolithic
26.2 Turkey_N
13.0 Georgia_Kotias

On the other hand, if, again like Iosif Lazaridis, you subscribe to the idea that the Indo-European language spread into Eastern Europe via the Caucasus in association with this early Anatolian farmer-related admixture, then I've got terrible news for you.

That's because NL122 is apparently dated to a whopping 5197–4850 BCE (see here). This dating might be somewhat bloated, possibly due to what's known as the reservoir effect, because the Nalchik archeological site is generally carbon dated to 4840–4820 BCE.

However, even with the younger dating, this would still mean that early Anatolian farmer-related ancestry arrived in the North Caucasus, and thus in Eastern Europe, around 4,800 BCE at the latest. That's surprisingly early, and just too early to be relevant to any sort of Indo-European expansion from a necessarily even earlier Proto-Indo-Anatolian homeland somewhere south of the Caucasus.

This means that NL122 effectively debunks Iosif Lazaridis' Indo-Anatolian hypothesis. Unless, that is, Iosif can provide evidence for a more convoluted scenario, in which there are at least two early Anatolian farmer-related expansions into Eastern Europe via the Caucasus, and the expansion relevant to the arrival of Indo-European speech came well after 5,000 BCE.

I haven't done any detailed analyses of NL122 with formal stats and qpAdm. But my G25/Vahaduo runs suggest that it might be possible to model the ancestry of the Yamnaya people with around 10% admixture from a population similar to NL122.

Target: Russia_Samara_EBA_Yamnaya
Distance: 3.4123% / 0.03412328
72.6 Russia_Progress_Eneolithic
18.2 Ukraine_N
9.2 Nalchik_Eneolithic

However, I don't subscribe to the idea that the Yamnaya people carry early Anatolian farmer-related admixture that spread into Eastern Europe via the Caucasus (on top of what is already found in Progress Eneolithic). Based on basic logic and a wide range of my own analyses, I believe that they acquired this type of ancestry from early European farmers, probably associated with the Trypillia culture. For instance...

Target: Russia_Samara_EBA_Yamnaya
Distance: 3.2481% / 0.03248061
80.2 Russia_Progress_Eneolithic
13.6 Ukraine_Neolithic
6.2 Ukraine_VertebaCave_MLTrypillia
0.0 Nalchik_Eneolithic

Another way to show this is with a Principal Component Analysis (PCA) that highlights a Yamnaya cline made up of the Yamnaya, Steppe Eneolithic and Ukraine Neolithic samples. As you can see, dear reader, there's no special relationship between the Yamnaya cline and Nalchik_Eneolithic. The Yamnaya samples, which are sitting near the eastern end of the Yamnaya cline, instead seem to show a subtle shift towards the Trypillian farmers.

Indeed, I also don't exactly understand the recent infatuation among many academics, especially Iosif Lazaridis and his colleagues, with trying to put the Proto-Indo-Anatolian homeland somewhere south of the Caucasus. Considering all of the available multidisciplinary data, I'd say it still makes perfect sense to put it in the Sredny Stog culture of the North Pontic steppe, in what is now Ukraine.

Please note that all of the G25 coordinates used in my models and the PCA are available HERE.
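If you'd like a feel for how distance-based models like the ones above work, here is a minimal Python sketch. It looks for non-negative source weights that sum to 1 and minimize the Euclidean distance between the weighted source average and the target. To be clear about the assumptions: this is a brute-force grid search, not the optimizer that Vahaduo actually uses, the function names are hypothetical, and the coordinates are made-up three-dimensional toy vectors rather than real G25 data.

```python
import math
from itertools import product

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mix(sources, weights):
    """Weighted average of the source vectors."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]

def fit_mixture(target, sources, steps=100):
    """Try every weight combination k/steps that is non-negative and sums to 1."""
    n = len(sources)
    best_weights, best_dist = None, float("inf")
    for head in product(range(steps + 1), repeat=n - 1):
        last = steps - sum(head)
        if last < 0:
            continue  # weights would exceed 100% in total
        weights = [k / steps for k in head + (last,)]
        d = distance(target, mix(sources, weights))
        if d < best_dist:
            best_weights, best_dist = weights, d
    return best_weights, best_dist

# Made-up toy coordinates (hypothetical, for illustration only).
names = ["Source_A", "Source_B", "Source_C"]
sources = [[0.10, 0.02, -0.03], [0.12, -0.05, 0.01], [0.09, 0.04, 0.06]]
target = mix(sources, [0.6, 0.3, 0.1])  # a known 60/30/10 mixture

weights, dist = fit_mixture(target, sources)
print(f"Distance: {100 * dist:.4f}% / {dist:.8f}")
for name, w in zip(names, weights):
    print(f"{100 * w:.1f} {name}")
```

With 1% steps this is only practical for a handful of sources, but it shows why a source that doesn't bring the fit closer to the target ends up with a weight of 0.0.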

See also...

The Caucasus is a semipermeable barrier to gene flow