
Showing posts with label South Asia. Show all posts

Thursday, February 22, 2024

Berkeley, we have a problem


A new preprint at bioRxiv by Kerdoncuff et al. makes the following, somewhat surprising, claim:

One of the individuals, referred to Sarazm_EN_1 (I4290) described above that was discovered with shell bangles showing affiliation with South Asia, has significant amount AHG-related ancestry, while a model without AHG-related ancestry provides the best fit for Sarazm_EN_2 (I4210) (Table S4.5).

First of all, the authors are actually referring to sample ID I4910, not I4210.

The aforementioned table, based on qpAdm output, shows that I4290 has 15.9% AHG-related ancestry and basically no Anatolian farmer-related ancestry. It also shows that I4910 has no AHG-related ancestry but 17.9% Anatolian farmer-related ancestry.

AHG stands for Andaman hunter-gatherer. The authors are using it as a proxy for South Asian hunter-gatherer ancestry.

However, I've looked at I4290 and I4910 in great detail over the years using ADMIXTURE, Principal Component Analysis (PCA), and qpAdm. And I'm quite certain that they do not show any obvious, above noise level South Asian ancestry. Indeed, I'd say that if they do have some minor South Asian ancestry, then I4910 probably has more of it than I4290.

Kerdoncuff et al. used the following "right pops" or outgroups: Ethiopia_4500BP.SG, WEHG, EEHG, ESHG, Dai.DG, Russia_Ust_Ishim_HG.DG, Iran_Mesolithic_BeltCave and Israel_Natufian.

This means they mixed data that were generated in very different ways (DG, SG and capture) and included some poor quality samples. For instance, the highest coverage version of Iran_Mesolithic_BeltCave offers just ~50K SNPs.

Mixing different types of data and relying on low coverage samples, even in part, often has negative consequences when using qpAdm. So I suspect that the above mentioned mixture results for I4290 are skewed by a poor choice of outgroups.
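A quick sanity check before running qpAdm is to group the outgroups by data type. Here's a minimal sketch in Python, assuming the usual label suffix convention (.DG for diploid genotype data, .SG for shotgun data, no suffix for capture data). The helper name is hypothetical, and the no-suffix rule is an assumption, because pooled labels like WEHG don't say how their samples were generated.

```python
from collections import defaultdict

# Right pops listed by Kerdoncuff et al., as quoted above.
RIGHT_POPS = [
    "Ethiopia_4500BP.SG", "WEHG", "EEHG", "ESHG", "Dai.DG",
    "Russia_Ust_Ishim_HG.DG", "Iran_Mesolithic_BeltCave", "Israel_Natufian",
]

def group_by_data_type(pops):
    """Group population labels by suffix (hypothetical helper).

    Assumption: ".DG" = diploid genotypes, ".SG" = shotgun,
    no suffix = capture.
    """
    groups = defaultdict(list)
    for pop in pops:
        if pop.endswith(".DG"):
            groups["DG"].append(pop)
        elif pop.endswith(".SG"):
            groups["SG"].append(pop)
        else:
            groups["capture (assumed)"].append(pop)
    return dict(groups)

groups = group_by_data_type(RIGHT_POPS)
for data_type, pops in groups.items():
    print(f"{data_type}: {', '.join(pops)}")
if len(groups) > 1:
    print(f"Warning: outgroups mix {len(groups)} data types")
```

A check like this only flags the mix of data types. It says nothing about coverage, so low quality singletons still have to be weeded out by hand.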

When I run qpAdm I try to stick to one type of data and avoid low quality singletons in the outgroups. This is the best qpAdm model that I can find for Sarazm_EN:

right pops:
Cameroon_SMA
Morocco_Iberomaurusian
Israel_Natufian
Levant_N
Iran_GanjDareh_N
Turkey_N
Russia_Karelia_HG
Russia_WestSiberia_HG
Mongolia_North_N
Brazil_LapaDoSanto_9600BP

Sarazm_EN
Kazakhstan_Botai_Eneolithic 0.113±0.017
Turkmenistan_C_Geoksyur_subset 0.887±0.017
P-value 0.06392

Sarazm_EN_1 (I4290)
Kazakhstan_Botai_Eneolithic 0.129±0.021
Turkmenistan_C_Geoksyur_subset 0.871±0.021
P-value 0.11019

Sarazm_EN_2 (I4910)
Kazakhstan_Botai_Eneolithic 0.104±0.021
Turkmenistan_C_Geoksyur_subset 0.896±0.021
P-value 0.07427

Also...

Sarazm_EN
Andaman_hunter-gatherer -0.018±0.020
Kazakhstan_Botai_Eneolithic 0.123±0.019
Turkmenistan_C_Geoksyur_subset 0.895±0.020
P-value 0.0298403
(Infeasible model)
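
The model with Andaman_hunter-gatherer is infeasible because one of its mixture coefficients is negative. Here's a minimal sketch of that check in Python, using the numbers above. The 0.05 P-value cutoff is a common convention that I'm assuming here, and the function name is hypothetical, not part of qpAdm.

```python
def assess_model(coeffs, p_value, p_cutoff=0.05):
    """Classify a qpAdm-style result (hypothetical helper, not a qpAdm API).

    coeffs maps source name -> (estimate, standard error).
    A model is infeasible if any coefficient falls outside [0, 1].
    p_cutoff=0.05 is an assumed, conventional threshold.
    Returns (feasible, passes_p_cutoff).
    """
    feasible = all(0.0 <= est <= 1.0 for est, _ in coeffs.values())
    passes = p_value >= p_cutoff
    return feasible, passes

two_way = {
    "Kazakhstan_Botai_Eneolithic": (0.113, 0.017),
    "Turkmenistan_C_Geoksyur_subset": (0.887, 0.017),
}
three_way = {
    "Andaman_hunter-gatherer": (-0.018, 0.020),
    "Kazakhstan_Botai_Eneolithic": (0.123, 0.019),
    "Turkmenistan_C_Geoksyur_subset": (0.895, 0.020),
}

print(assess_model(two_way, 0.06392))      # (True, True)
print(assess_model(three_way, 0.0298403))  # (False, False)

# The AHG estimate is also less than one standard error away from zero.
est, se = three_way["Andaman_hunter-gatherer"]
print(f"AHG estimate / SE = {est / se:.1f}")
```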

Please note that Turkmenistan_C_Geoksyur_subset is made up of just three relatively high quality individuals: I8504, I12483 and I12487. That's because it's not possible to model the ancestry of Sarazm_EN using the full Geoksyur set, probably due to subtle genetic substructures within the latter.

Below is a PCA plot that, more or less, reflects my qpAdm model. I4290 and I4910 are sitting right next to each other in a cluster of ancient Central and Western Asians, and it's actually I4910 that is shifted slightly towards the South Asian pole of the PCA. Indeed, I can confidently say that there's no way to design a PCA in which I4290 is shifted significantly towards South Asia relative to I4910.

Citation...

Kerdoncuff et al., 50,000 years of Evolutionary History of India: Insights from ∼2,700 Whole Genome Sequences, bioRxiv, posted February 20, 2024, doi: https://doi.org/10.1101/2024.02.15.580575

See also...

The Nalchik surprise

A comedy of errors

Tuesday, July 21, 2020

The oldest R1a to date


My popular map of the oldest instances of Y-haplogroup R1a in the ancient DNA record has a new entry: PES001 from the recent Saag et al. preprint. PES001 comes from a burial site in what is now northwestern Russia and is dated to a whopping 10785–10626 calBCE.


Indeed, I'm not aware of any R1a samples older than PES001 among the treasure trove of thousands of ancient samples waiting to be published. So it's likely that this individual will remain the oldest member of our R1a clan for some years to come.

See also...

Y-haplogroup R1a and mental health

Like three peas in a pod

The mystery of the Sintashta people

Wednesday, September 11, 2019

Y-haplogroup R1a and mental health


I've updated my map of pre-Corded Ware culture R1a samples with a couple of new entries from Central and South Asia (the original is still here). However, before any of you get overly excited, please note that these samples aren't older than the Corded Ware culture. The reason I added them to my map is to counter the ongoing absurd claims online that South Asian R1a isn't derived from European R1a.


Just in case the map can't be viewed in all of its glory in some devices, here's what the fine print says:

The oldest example of R1a in ancient DNA from Central Asia is dated to 2132-1940 calBCE (ID I3770, Narasimhan 2019). Moreover, this sequence is closely related to much older R1a samples from Central, Eastern and Northern Europe, and phylogenetically nested within their diversity. Thus, it must surely represent a population expansion from Europe to Central Asia. Indeed, it's also associated with the Bronze Age Andronovo archeological culture, which is usually seen as an offshoot of the Corded Ware culture (CWC) of Late Neolithic Europe. The vast majority of present-day R1a lineages in Central Asia are closely related to that of I3770, and so must also ultimately derive from Europe.

The oldest instance of R1a in ancient DNA from South Asia is dated to just 1044-922 calBCE (ID I12457, Narasimhan 2019). This sequence, as well as the vast majority of present-day South Asian R1a lineages, are closely related to much older R1a samples from Central, Eastern and Northern Europe, and phylogenetically nested within their diversity. Thus, they must surely represent a population expansion from Europe to South Asia via Central Asia, in all likelihood during the Bronze Age. Even if R1a existed in South Asia before the Bronze Age, which is extremely unlikely, because it's found in samples from indigenous European hunter-gatherers, the vast majority of present-day R1a lineages in South Asia must be ultimately from Europe.

The idea that most, if not all, South Asian R1a is derived from European R1a seriously scares a lot of people. This is obvious in many online discussions on the topic. I suspect they're so frightened by it because, in their minds, it has the potential to encourage discrimination and even racism, perhaps by re-defining the colonization of much of the world by European nations in the recent past as the natural order of things?

In any case, clearly we're dealing with some sort of mass phobia here. I've got advice for those of you suffering from this problem: if you're honestly worried that the geographic provenance and expansion history of some Y-haplogroup is going to negatively impact your life in any meaningful way, then it's time to find yourself a quality mental health professional. All the best with that.

See also...

The mystery of the Sintashta people

The Poltavka outlier

Yamnaya isn't from Iran just like R1a isn't from India

Thursday, September 5, 2019

On the surprising genetic origins of the Harappan people (Shinde et al. 2019)


The long awaited paper with ancient DNA from the Indus Valley Civilization (IVC) site of Rakhigarhi has finally arrived. Courtesy of Shinde et al. at Current Biology:

An ancient Harappan genome lacks ancestry from Steppe pastoralists or Iranian farmers

The bad news is that the paper features just one low coverage IVC genome, and it belongs to a female, so there's no Y-haplogroup. However, importantly, this individual is very similar to genetic outliers from Bronze Age West and Central Asia known as Indus_Periphery. So much so, in fact, that they could easily be from the same gene pool.

This, of course, gives strong support to the idea that Indus_Periphery is a useful stand-in for the real IVC population (see here).

Surprisingly, despite being largely of West Eurasian origin, the IVC people possibly didn't harbor any ancestry from the Neolithic farmers of the Fertile Crescent or even the Iranian Plateau.

That's because, according to Shinde et al., their West Eurasian ancestors separated genetically from those of the early Holocene populations of what is now western and northern Iran around 12,000 BCE. In other words, well before the advent of agriculture.


This surely complicates matters for those arguing that Indo-European languages may have arrived in the Indian subcontinent with early farmers via the Iranian Plateau. The more widely accepted theory is that Indo-European languages spread into South Asia with Bronze Age pastoralists from the Eurasian steppes. See here...


Update 05/09/2019: I had a quick look at the ancient Rakhigarhi individual with qpAdm, just to confirm for myself that she was indeed largely of West Eurasian origin and practically indistinguishable from Indus_Periphery. The genotype data that I used are freely available here.

IND_Rakhigarhi_BA
IRN_Ganj_Dareh_N 0.711±0.065
Onge 0.232±0.067
RUS_Tyumen_HG 0.057±0.059
chisq 13.251
tail prob 0.0392147
Full output

Indus_Periphery
IRN_Ganj_Dareh_N 0.674±0.015
Onge 0.237±0.014
RUS_Tyumen_HG 0.090±0.012
chisq 14.877
tail prob 0.0212326
Full output

Indus_Periphery
IND_Rakhigarhi_BA 0.946±0.074
Onge 0.054±0.074
chisq 10.358
tail prob 0.169152
Full output

This does appear to be the case, although it's also obvious that my models are missing something important because their statistical fits are rather poor. I'm guessing the main problem is trying to use the Onge people of the Andaman Islands as a proxy for the indigenous foragers of the Indian subcontinent.
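
For anyone who wants to check the fits, the tail probability is just the upper tail of a chi-square distribution. Here's a minimal sketch in Python using the standard closed-form series for integer degrees of freedom. Note that the degrees of freedom (6 for the three-way models, 7 for the two-way model) aren't shown in the summaries above. They're an assumption, chosen because they appear to reproduce the reported tail probabilities.

```python
import math

def chi2_tail(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd half-powers of x
    total = 0.0
    term = math.sqrt(x)  # x^(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * density * total

# Degrees of freedom are assumed, not taken from the output above.
for label, chisq, df in [
    ("IND_Rakhigarhi_BA, three-way", 13.251, 6),
    ("Indus_Periphery, three-way", 14.877, 6),
    ("Indus_Periphery, two-way", 10.358, 7),
]:
    print(f"{label}: tail prob = {chi2_tail(chisq, df):.4f}")
```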

See also...

Y-haplogroup R1a and mental health

Friday, April 13, 2018

On the doorstep of India


One of the most remarkable discoveries in the recent Narasimhan et al. 2018 preprint has to be the presence of what are essentially Eastern European migrant populations within the Inner Asian Mountain Corridor (IAMC) during the Middle to Late Bronze Age (MLBA). Remarkable for so many reasons, but seemingly under-appreciated by a lot of people, judging by the online discussions that I've seen about the preprint, and even, I'd say, the authors themselves.

Narasimhan et al. labeled these groups as belonging to the "forest/steppe MLBA" complex (for instance, see the main figure from the preprint here). This is indeed what they are in terms of their genetic structure, but certainly not geography, because the IAMC is well south of the steppe. Thus, in my Principal Component Analysis (PCA) I'm going to label them as part of the "post-steppe herder expansion Turan" complex.

Strikingly, most of these people cluster with Bronze Age Eastern Europeans, and even some Bronze Age Central Europeans. They're also sitting very close to the more easterly present-day Slavic-speakers from Russia and Ukraine, and indeed closer to the bulk of the European cluster than some present-day Turkic and Uralic groups from the Volga-Ural region. Even I never predicted such an outcome. Sure, I was expecting to see ancient genomes from South Central Asia with some very heavy steppe influence, but not this. The relevant datasheet is available here.


Two of the MLBA IAMC individuals are from Kashkarchi in the Ferghana Valley, in what is now Uzbekistan, and basically on the doorstep of the Indian subcontinent. I've made special mention of them on the plot, and I've also highlighted a pair of individuals from the Bronze Age Central Asian sites of Gonur Tepe and Shahr-i Sokhta, who are, in all likelihood, unadmixed migrants from the Indus Valley (for more on that, see here).

It's surely not a coincidence that the ancient and present-day South Asians on the plot (including those from Pakistan's Swat Valley dated to the Iron Age) form an almost perfect cline between these two pairs of individuals. It's also surely not a coincidence that the MLBA IAMC groups are rich in Y-haplogroup R1a-M417, and in particular its R1a-Z93 subclade, which is today an especially frequent marker in Indo-European-speaking South Asians.

Forget about the pre-MLBA populations from the forests, steppe, or IAMC, like those represented by Dali_EBA; they're practically irrelevant to this story. How do I know? Because they have little to no impact on the above mentioned cline. And this can be easily verified with mixture models based on multiple Principal Components (PCs) and formal statistics (for instance, see here).


Clearly, many populations in South Asia, particularly those speaking Indo-European languages, derive the bulk of their steppe-related ancestry from the peoples of the MLBA IAMC, and/or their very close relatives. And if you do believe that this inference is just based on coincidences, then I'm sorry to say this, but obviously a new, much less mentally challenging, hobby or profession beckons. All the best with that.

Just to help put all of this in a geographic perspective, here's a topographical map of Eurasia. I've marked the location of the Ferghana Valley. The close relatives of Kashkarchi_BA most likely skirted their way around those winding high mountains and slipped into India via the Khyber Pass, which I've also marked on the map.


And the rest, as they say, is history, including the history described in the ancient Indo-Aryan Sanskrit texts known as the Vedas. I'm sure we'll soon be learning about these events in great detail when many more ancient samples from Pakistan and, hopefully, the first ancient samples from India, are published.

Citation...

Narasimhan et al., The Genomic Formation of South and Central Asia, bioRxiv, posted March 31, 2018, doi: https://doi.org/10.1101/292581

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Monday, January 11, 2016

The Poltavka outlier


Anyone who still thinks that Y-chromosome haplogroup R1a originated in South Asia should burn this map into their brains. It'll come in useful over the next few years as we learn from ancient DNA about the conquest of the Indian subcontinent, and indeed much of Asia, by pastoralists from the western Russian and Ukrainian steppes.


X marks the spot of the burial site of Poltavka sample I0432 from the Mathieson et al. 2015 dataset. This individual belongs to Y-chromosome haplogroup R1a-Z93(Z94+), which today accounts for well over 90% of the R1a lineages in Asia and peaks in frequency at over 60% in the northern parts of South Asia.

Moreover, the dating of his burial site, 2925-2536 calBCE, suggests that he lived not long after the Z93 and Z94 mutations came into existence. That's because Z93 doesn't appear to be much older than 5,000 years based on full Y-chromosome sequence data (see here and here, including the comments).
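
To put the burial date on the same scale as the age estimate for Z93, here's a quick sketch in Python that converts calBCE to cal BP (years before 1950 CE), assuming the standard convention with no year zero. The helper name is hypothetical.

```python
def calbce_to_calbp(year_bce):
    """Convert a calendar year BCE to cal BP (years before 1950 CE).

    There is no year zero, so 1 BCE is 1950 years before 1950 CE.
    """
    return year_bce + 1949

older, younger = 2925, 2536  # I0432 burial date range in calBCE
print(calbce_to_calbp(older), calbce_to_calbp(younger))  # 4874 4485
```

Both ends of the range come out at under 5,000 years before 1950.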

So I0432 could well turn out to be a crucial piece in the puzzle of the peopling of South Asia.

Interestingly, this individual was flagged as an outlier in the Poltavka sample set by Mathieson et al., hence his other moniker: the Poltavka outlier. However, this wasn't because of any ancestry from South or even Central Asia. In fact, it was because he was too western.

Principal Component Analyses (PCA) featuring a wide range of present-day and ancient samples from Europe and Asia, like the one below, show that Poltavka outlier clusters further west than most Corded Ware individuals from Germany. Right click and open in a new tab to view full size.


In the past, using qpAdm, I modeled Poltavka outlier as 63.7% Yamnaya Samara and 36.3% German Middle Neolithic. This is probably not very far from the truth, but qpAdm offers a supervised mixture test in which the results are heavily reliant on the choice of outgroups, so I thought I'd revisit the issue with TreeMix, which allows an unsupervised analysis.

In a dataset including seven relatively high coverage Copper Age (CA), Early Bronze Age and Middle Neolithic (MN) European genomes, TreeMix picked out Poltavka outlier as the most likely sample to be admixed, showing a mixture edge of 33% from the base of the branch leading to the Iberian MN individual to that of Poltavka outlier.



This outcome is very similar to my qpAdm model, but it suggests an even more western source of admixture in Poltavka outlier. Could this admixture actually be from Iberia? I wouldn't discount this possibility, considering the presence of Bell Beaker communities, possibly of Atlantic or even Iberian origin, as far east as present-day Poland. Indeed, according to Cassidy et al. 2015, German Beakers show high affinity to MN and CA Iberians (see page 51 in the supp info here).

I double checked my TreeMix result with D-stats, and yep, when placed in a clade with Poltavka or Samara Yamnaya, Poltavka outlier shows the strongest signal of admixture from the Iberia MN individual.

At the same time, however, the signal from the Early Neolithic (EN) Iberian fails to reach significance (|Z|<3). This suggests that TreeMix and D-stats might be seeing the Iberia MN sample as the most attractive mixture source because of her high level of Western European hunter-gatherer (WHG) ancestry, which Poltavka outlier also has plenty of, rather than anything specific to Iberia.
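
For context, the Z-score for a D-statistic is usually obtained with a block jackknife across the genome. Below is a minimal sketch in Python of that calculation from per-block ABBA and BABA site counts. The counts are made-up toy numbers, not my data. The function name is hypothetical, the sign convention is one of two in common use, and the unweighted jackknife is a simplification, so this isn't the ADMIXTOOLS implementation.

```python
import math

def d_stat_z(abba, baba):
    """D-statistic and Z-score from per-block ABBA/BABA counts.

    Uses D = (ABBA - BABA) / (ABBA + BABA) and an unweighted
    delete-one block jackknife (a simplification).
    """
    n = len(abba)
    if n != len(baba) or n < 2:
        raise ValueError("need at least two blocks of matching length")
    tot_abba, tot_baba = sum(abba), sum(baba)
    d_full = (tot_abba - tot_baba) / (tot_abba + tot_baba)

    # Recompute D with each block left out in turn.
    loo = []
    for a, b in zip(abba, baba):
        ra, rb = tot_abba - a, tot_baba - b
        loo.append((ra - rb) / (ra + rb))
    mean = sum(loo) / n
    var = (n - 1) / n * sum((d - mean) ** 2 for d in loo)
    se = math.sqrt(var)
    return d_full, d_full / se

# Toy counts for five blocks (invented for illustration only).
abba = [120, 135, 110, 128, 140]
baba = [100, 105, 98, 101, 112]
d, z = d_stat_z(abba, baba)
print(f"D = {d:.3f}, Z = {z:.1f}, significant: {abs(z) >= 3}")
```

In practice there are hundreds of blocks rather than five, and each block is weighted by the number of SNPs it contains.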



In any case, it's clear enough that Poltavka outlier was the result of mixture between Yamnaya-related western steppe pastoralists and the descendants of Middle Neolithic Europeans with a high ratio of WHG ancestry. Where this admixture actually took place and which archaeological cultures were involved will have to be resolved with further sampling of ancient remains from Central and Eastern Europe.

However, it's already impossible to place the origin of Poltavka outlier anywhere in Asia, which suggests that both Z93 and Z94 are also from well inside the generally accepted borders of Europe.

This obviously has implications for the origins of the Indo-Iranians, because the widespread presence of these mutations in Asia gels very nicely with the idea, and indeed academic consensus, that Indo-Iranian languages expanded rapidly from the Eurasian steppe into Asia during the Bronze Age.

Considering that Poltavka outlier came from a Kurgan burial, and was therefore an individual of some social standing, he might be the direct ancestor of many millions of present-day Asians. If so, this won't be very difficult to prove in the near future as ancient DNA research revs up a few notches.

On a related note, apparently there's a paper on the way with ancient DNA results from Rakhigarhi, a Harappan site in Haryana, northern India (see here). As far as I know, the results will include Y-chromosome haplogroups of three males, but I don't think we'll see any decent genome-wide data at this stage. However, hopefully I'm wrong and the paper will come out with full ancient genomes.

Feel free to post your predictions in the comments. I'm tentatively expecting a couple of instances of J2 and maybe an L or H. Razib made basically the same prediction recently so I'm not being original. What I do know is that we won't see any R1a-Z93. The only way that might happen is if, say, someone coughed or sneezed on the Harappan remains.

Data source and reference...

Mathieson et al., Genome-wide patterns of selection in 230 ancient Eurasians, Nature, 528, 499–503 (24 December 2015), doi:10.1038/nature16152

See also...

The beast among Y-haplogroups