A new preprint at bioRxiv by Kerdoncuff et al. makes the following, somewhat surprising, claim:
One of the individuals, referred to Sarazm_EN_1 (I4290) described above that was discovered with shell bangles showing affiliation with South Asia, has significant amount AHG-related ancestry, while a model without AHG-related ancestry provides the best fit for Sarazm_EN_2 (I4210) (Table S4.5).
First of all, the authors are actually referring to sample ID I4910, not I4210.
The aforementioned table, based on qpAdm output, shows that I4290 has 15.9% AHG-related ancestry and basically no Anatolian farmer-related ancestry. It also shows that I4910 has no AHG-related ancestry but 17.9% Anatolian farmer-related ancestry.
AHG stands for Andaman hunter-gatherer. The authors are using it as a proxy for South Asian hunter-gatherer ancestry.
However, I've looked at I4290 and I4910 in great detail over the years using ADMIXTURE, Principal Component Analysis (PCA), and qpAdm, and I'm quite certain that they don't show any obvious South Asian ancestry above noise level. Indeed, I'd say that if they do have some minor South Asian ancestry, then I4910 probably has more of it than I4290.
Kerdoncuff et al. used the following "right pops" or outgroups: Ethiopia_4500BP.SG, WEHG, EEHG, ESHG, Dai.DG, Russia_Ust_Ishim_HG.DG, Iran_Mesolithic_BeltCave and Israel_Natufian.
This means they mixed data that were generated in very different ways (DG, SG and capture) and included some poor-quality samples. For instance, the highest coverage version of Iran_Mesolithic_BeltCave offers just ~50K SNPs.
Mixing different types of data and relying on low-coverage samples, even in part, often has negative consequences when using qpAdm. So I suspect that the above-mentioned mixture results for I4290 are skewed by a poor choice of outgroups.
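As a rough illustration of the point about mixed data types, here is a minimal Python sketch that groups the outgroup labels quoted above by their suffix. It assumes that a ".DG" or ".SG" suffix marks the data type and that a label without a suffix is either capture data or a composite group. That's a labeling convention, not something the preprint confirms for each population, and the function names are made up for this example.

```python
from collections import defaultdict

# Outgroup ("right pops") labels as quoted above from Kerdoncuff et al.
RIGHT_POPS = [
    "Ethiopia_4500BP.SG", "WEHG", "EEHG", "ESHG", "Dai.DG",
    "Russia_Ust_Ishim_HG.DG", "Iran_Mesolithic_BeltCave", "Israel_Natufian",
]

# Assumption: only these two suffixes are treated as data-type markers.
KNOWN_SUFFIXES = {"DG", "SG"}


def data_type(label):
    """Return the data-type suffix of a label, or a placeholder if it has none."""
    _, dot, suffix = label.rpartition(".")
    if dot and suffix in KNOWN_SUFFIXES:
        return suffix
    return "no suffix (capture or composite label)"


def group_by_type(labels):
    """Group population labels by their inferred data type."""
    groups = defaultdict(list)
    for label in labels:
        groups[data_type(label)].append(label)
    return dict(groups)


groups = group_by_type(RIGHT_POPS)
for kind, labels in groups.items():
    print(f"{kind}: {', '.join(labels)}")
```

If the labels fall into more than one group, the outgroup set mixes data types and is worth a second look before trusting the qpAdm output.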
When I run qpAdm, I try to stick to one type of data and avoid low-quality singletons in the outgroups. This is the best qpAdm model that I can find for Sarazm_EN:
right pops:
Cameroon_SMA
Morocco_Iberomaurusian
Israel_Natufian
Levant_N
Iran_GanjDareh_N
Turkey_N
Russia_Karelia_HG
Russia_WestSiberia_HG
Mongolia_North_N
Brazil_LapaDoSanto_9600BP

Sarazm_EN
Kazakhstan_Botai_Eneolithic 0.113±0.017
Turkmenistan_C_Geoksyur_subset 0.887±0.017
P-value 0.06392
Sarazm_EN_1 (I4290)
Kazakhstan_Botai_Eneolithic 0.129±0.021
Turkmenistan_C_Geoksyur_subset 0.871±0.021
P-value 0.11019
Sarazm_EN_2 (I4910)
Kazakhstan_Botai_Eneolithic 0.104±0.021
Turkmenistan_C_Geoksyur_subset 0.896±0.021
P-value 0.07427
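To put the two per-individual results side by side, here is a small Python sketch that checks whether the Botai-related coefficients for I4290 and I4910 differ by more than their errors. It assumes the ± values are standard errors and treats the two estimates as independent. That's a simplification, since both runs share the same reference populations, so take it as a rough guide only.

```python
import math


def z_difference(est_a, se_a, est_b, se_b):
    """z-score of (est_a - est_b), assuming independent standard errors."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Kazakhstan_Botai_Eneolithic coefficients from the two models above
i4290 = (0.129, 0.021)  # Sarazm_EN_1
i4910 = (0.104, 0.021)  # Sarazm_EN_2

z = z_difference(*i4290, *i4910)
print(f"difference = {i4290[0] - i4910[0]:.3f}, z = {z:.2f}")
```

A z-score well below 2 in absolute value means the two individuals can't be told apart by this coefficient.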
Also...
Sarazm_EN
Andaman_hunter-gatherer -0.018±0.020
Kazakhstan_Botai_Eneolithic 0.123±0.019
Turkmenistan_C_Geoksyur_subset 0.895±0.020
P-value 0.0298403
(Infeasible model)
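The "infeasible" label comes down to simple arithmetic: the coefficients sum to one, but one of them is negative. The following Python sketch makes that check explicit for the three-way model above. It assumes the ± values are standard errors, and the helper name is made up for this example.

```python
def check_model(coeffs):
    """Summarize a mixture model given {source: (estimate, standard_error)}."""
    total = sum(est for est, _ in coeffs.values())
    # A model is treated as feasible only if every coefficient lies in [0, 1].
    feasible = all(0.0 <= est <= 1.0 for est, _ in coeffs.values())
    # Estimate divided by its error: how far each coefficient is from zero.
    z_scores = {name: est / se for name, (est, se) in coeffs.items()}
    return {"sum": round(total, 3), "feasible": feasible, "z": z_scores}


# Three-way model for Sarazm_EN, values copied from the output above
three_way = {
    "Andaman_hunter-gatherer": (-0.018, 0.020),
    "Kazakhstan_Botai_Eneolithic": (0.123, 0.019),
    "Turkmenistan_C_Geoksyur_subset": (0.895, 0.020),
}

result = check_model(three_way)
print(result["sum"], result["feasible"])
for name, z in result["z"].items():
    print(f"{name}: z = {z:.1f}")
```

The AHG coefficient is negative and less than one standard error away from zero, so adding AHG as a source doesn't buy anything here.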
Please note that Turkmenistan_C_Geoksyur_subset is made up of just three relatively high-quality individuals: I8504, I12483 and I12487. That's because it's not possible to model the ancestry of Sarazm_EN using the full Geoksyur set, probably due to subtle genetic substructure within the latter.
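For anyone who wants to try a comparable run, here is a minimal Python sketch that writes the left and right population lists plus a parameter file for the ADMIXTOOLS qpAdm program. The dataset prefix, the file names and the "details: YES" option are assumptions made for this example, not the settings used for the results above. The population labels also have to match those in your own .ind file, including a custom label for the three-individual Geoksyur subset.

```python
import tempfile
from pathlib import Path

# First entry is the target, the rest are the sources.
LEFT = [
    "Sarazm_EN",
    "Kazakhstan_Botai_Eneolithic",
    "Turkmenistan_C_Geoksyur_subset",
]

# Outgroups ("right pops") as listed in the post.
RIGHT = [
    "Cameroon_SMA", "Morocco_Iberomaurusian", "Israel_Natufian", "Levant_N",
    "Iran_GanjDareh_N", "Turkey_N", "Russia_Karelia_HG",
    "Russia_WestSiberia_HG", "Mongolia_North_N", "Brazil_LapaDoSanto_9600BP",
]


def write_qpadm_inputs(out_dir, prefix="dataset"):
    """Write left.txt, right.txt and qpadm.par; 'prefix' is a hypothetical dataset name."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "left.txt").write_text("\n".join(LEFT) + "\n")
    (out / "right.txt").write_text("\n".join(RIGHT) + "\n")
    par_lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        "popleft: left.txt",
        "popright: right.txt",
        "details: YES",  # assumption: optional, prints more detailed output
    ]
    par_path = out / "qpadm.par"
    par_path.write_text("\n".join(par_lines) + "\n")
    return par_path


# Demo: write the files to a temporary directory and show the parameter file.
demo_dir = Path(tempfile.mkdtemp())
par_path = write_qpadm_inputs(demo_dir)
print(par_path.read_text())
```

With the genotype files in place, the run itself would be something like `qpAdm -p qpadm.par` from the directory holding these files.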
Below is a PCA plot that, more or less, reflects my qpAdm model. I4290 and I4910 are sitting right next to each other in a cluster of ancient Central and Western Asians, and it's actually I4910 that is shifted slightly towards the South Asian pole of the PCA. Indeed, I can confidently say that there's no way to design a PCA in which I4290 is shifted significantly towards South Asia relative to I4910.
Citation...
Kerdoncuff et al., 50,000 years of Evolutionary History of India: Insights from ∼2,700 Whole Genome Sequences, bioRxiv, posted February 20, 2024, doi: https://doi.org/10.1101/2024.02.15.580575
See also...
The Nalchik surprise
A comedy of errors