Eurogenes Blog

Monday, April 30, 2018

Zoroastrian genetic origins revisited

About a year ago I found that the ancestry of present-day Iranians was best explained as largely a mixture between early Anatolian and Iranian farmers and Sarmatians from the Pontic-Caspian steppe (see here).

Things have now changed somewhat after the release of several hundred ancient samples from across Eurasia. Below are the best qpAdm models that I was able to find for various Iranian ethnic/regional populations based on my new dataset.

Iranian_Fars
Ganj_Dareh_N 0.363±0.031
Hajji_Firuz_ChL 0.481±0.029
Karagash_MLBA 0.156±0.019
taildiff: 0.753635
Full output

Iranian_Jew
Ganj_Dareh_N 0.056±0.042
Hajji_Firuz_ChL 0.883±0.039
Karagash_MLBA 0.061±0.027
taildiff: 0.862141
Full output

Iranian_Kerman
Ganj_Dareh_N 0.598±0.048
Hajji_Firuz_ChL 0.244±0.045
Karagash_MLBA 0.158±0.030
taildiff: 0.604908
Full output

Iranian_Lor
Dashti_Kozy_BA 0.143±0.025
Ganj_Dareh_N 0.286±0.034
Hajji_Firuz_ChL 0.571±0.029
taildiff: 0.994129
Full output

Iranian_Mazandarani
Ganj_Dareh_N 0.309±0.035
Hajji_Firuz_ChL 0.556±0.029
Yamnaya_Samara 0.134±0.019
taildiff: 0.383344
Full output

Iranian_Mazandarani(2)
Ganj_Dareh_N 0.279±0.045
Hajji_Firuz_ChL 0.600±0.048
Yamnaya_Samara 0.073±0.048
West_Siberia_N 0.048±0.033
taildiff: 0.413456
Full output

Iranian_Persian
Ganj_Dareh_N 0.417±0.033
Hajji_Firuz_ChL 0.464±0.031
Karagash_MLBA 0.120±0.020
taildiff: 0.777933
Full output

Iranian_Zoroastrian
Bustan_BA 0.352±0.053
Dashti_Kozy_BA 0.168±0.031
Hajji_Firuz_ChL 0.480±0.036
taildiff: 0.921955
Full output

However, all of the Iranian groups are still scoring a fair amount of ancient steppe ancestry, with the Zoroastrians ahead of the rest, which is potentially important, because they're basically a relict population from pre-Islamic Persia. Hence, this might be betraying their stronger ties to pre-Turkic, early Indo-Iranian Central Asia relative to the other Iranians. Also worth noting:

- As far as I can see, the Zoroastrians are the only Iranians in this analysis that really benefit from the addition of a Bactria Margiana Archaeological Complex (BMAC) reference population to their model, which might also be important, for the same reason outlined above

- There's no point modeling most of the Iranian groups as partly of Western Siberian forager (West_Siberia_N) origin, except perhaps the Mazandarani Iranians

- Indeed, Mazandarani Iranians are also the only group better modeled as part Yamnaya rather than Steppe_MLBA, which might be explained by Yamnaya-related incursions into what is now Northwestern Iran during the Early Bronze Age (see here)

- No matter what, I can't find a working model (taildiff >0.05) for the Bandari Iranians using the new set of right pops aka outgroups, probably because the Bandaris harbor recent admixture from outside of Iran, including from Africa
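For anyone who wants to check the steppe ranking and the taildiff cutoff mentioned above, here's a minimal Python sketch. The proportions and taildiff values are copied from the models listed in this post (first Mazandarani model only). Treating Karagash_MLBA, Dashti_Kozy_BA and Yamnaya_Samara as the steppe-related proxies is an assumption made for this sketch, and the names MODELS, STEPPE_SOURCES and rank_by_steppe are made up for illustration.

```python
# qpAdm results copied from the models listed in the post:
# population -> (ancestry proportions, taildiff)
MODELS = {
    "Iranian_Fars": ({"Ganj_Dareh_N": 0.363, "Hajji_Firuz_ChL": 0.481, "Karagash_MLBA": 0.156}, 0.753635),
    "Iranian_Jew": ({"Ganj_Dareh_N": 0.056, "Hajji_Firuz_ChL": 0.883, "Karagash_MLBA": 0.061}, 0.862141),
    "Iranian_Kerman": ({"Ganj_Dareh_N": 0.598, "Hajji_Firuz_ChL": 0.244, "Karagash_MLBA": 0.158}, 0.604908),
    "Iranian_Lor": ({"Dashti_Kozy_BA": 0.143, "Ganj_Dareh_N": 0.286, "Hajji_Firuz_ChL": 0.571}, 0.994129),
    "Iranian_Mazandarani": ({"Ganj_Dareh_N": 0.309, "Hajji_Firuz_ChL": 0.556, "Yamnaya_Samara": 0.134}, 0.383344),
    "Iranian_Persian": ({"Ganj_Dareh_N": 0.417, "Hajji_Firuz_ChL": 0.464, "Karagash_MLBA": 0.120}, 0.777933),
    "Iranian_Zoroastrian": ({"Bustan_BA": 0.352, "Dashti_Kozy_BA": 0.168, "Hajji_Firuz_ChL": 0.480}, 0.921955),
}

# Assumption for this sketch: these references act as the steppe-related proxies.
STEPPE_SOURCES = {"Karagash_MLBA", "Dashti_Kozy_BA", "Yamnaya_Samara"}

# The post treats taildiff > 0.05 as the mark of a working model.
TAILDIFF_CUTOFF = 0.05


def steppe_share(proportions):
    """Sum the proportions assigned to the assumed steppe proxies."""
    return sum(v for name, v in proportions.items() if name in STEPPE_SOURCES)


def rank_by_steppe(models):
    """Return (population, steppe share, passes cutoff) sorted by steppe share, highest first."""
    rows = [
        (pop, round(steppe_share(props), 3), taildiff > TAILDIFF_CUTOFF)
        for pop, (props, taildiff) in models.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for pop, steppe, passes in rank_by_steppe(MODELS):
    print(f"{pop:22s} steppe={steppe:.3f} passes_cutoff={passes}")
```

Keep in mind that the standard errors on these proportions are around 2-3%, so the gaps between the top few groups in this ranking are small relative to the uncertainty.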

On a related note, there's yet another feature in the Indian media about the impending publication of ancient DNA from the Harappan burial site at Rakhigarhi (see here). I've lost count of how many articles like this I've read over the last few years. But unlike the rest, this one actually reveals some specific information about the results: no Y-haplogroup R1a and no steppe ancestry in the Harappan sample or samples. So this time, I'd say that we're only days or weeks away from the publication of the relevant paper.

My final prediction in this context is that we'll see an ancient genome, or, hopefully, genomes, basically identical to the Indus_Periphery samples from Narasimhan et al. 2018 (see here). And then, apart from a few crazy people still shouting online that we need many more Harappan genomes because almost anything is yet possible, it'll be game over.

See also...

The mystery of the Sintashta people

On the doorstep of India

Indian smoke and mirrors

Friday, April 27, 2018

The mystery of the Sintashta people

During the Middle to Late Bronze Age, the steppes southeast of the Ural Mountains, in what is now Russia, were home to communities of metallurgists who buried their warriors with horses and the earliest examples of the spoked-wheel battle chariot.

We don't know what they called themselves, because they didn't leave any written texts, but their archaeological culture is commonly known as Sintashta. It was named after a river near one of their main settlements, an elaborate fortified town that has also been described as an ancient metallurgical industrial center. Another of their well-known settlements, very similar to Sintashta, is Arkaim, pictured below courtesy of Wikipedia.

Sintashta is arguably one of the coolest ancient cultures ever discovered by archaeologists. It's also generally accepted to be the Proto-Indo-Iranian culture, and thus linguistically ancestral to a myriad of present-day peoples of Asia, including Indo-Aryans and Persians. No wonder then, that its origin, and that of its population, have been hotly debated issues.

The leading hypothesis based on archaeological data is that Sintashta is largely derived from the more westerly and warlike Abashevo culture, which occupied much of the forest steppe north of the Black and Caspian Seas. In turn, Abashevo is usually described as an eastern offshoot of the Late Neolithic Corded Ware Culture (CWC), which is generally seen as the first Indo-European archaeological culture in Northern Europe (see here).

Below is a Principal Component Analysis (PCA) featuring 38 Sintashta individuals from the recent Narasimhan et al. 2018 preprint. Note that the main Sintashta cluster overlaps almost perfectly with the main CWC cluster. The relevant datasheet is available here.

Moreover, many ancient and present-day South and Central Asians, particularly those identified with or speaking Indo-Iranian languages, appear to be strongly attracted to the main Sintashta cluster, forming an almost perfect cline between this cluster and the likely Indus Valley diaspora individuals who show no evidence of steppe ancestry.

This is in line with mixture models based on formal statistics showing significant Sintashta-related ancestry in Indo-Iranian-speakers (for instance, see here), and high frequencies of Y-haplogroup R1a-Z93 in both the Sintashta and many Indo-Iranian-speaking populations.

Some of the Sintashta samples are outliers from the main Sintashta cluster, and that's because they harbor elevated levels of ancestry related to the Mesolithic and Neolithic foragers of Eastern Europe and/or Western Siberia. This is especially true of a pair of individuals who belong to Y-haplogroup Q. However, this doesn't contradict archaeological data, which suggest that the Sintashta community may have been multi-cultural and multi-lingual. Indeed, it's generally accepted based on historical linguistics data that there were fairly intense contacts in North Eurasia between the speakers of Proto-Indo-Iranian, Proto-Uralic and Yeniseian languages.

Thus, it appears that there's not much left to debate because ancient DNA has seemingly backed up the most widely accepted hypotheses about the origin of Sintashta and its people, and their identification mainly as Proto-Indo-Iranian-speakers.

However, a sample from a Sredny Stog II culture burial on the North Pontic steppe, in what is now eastern Ukraine, has complicated matters somewhat. This individual, known as Ukraine_Eneolithic I6561, not only clusters very strongly with the most typical Sintashta samples, but also belongs to Y-haplogroup R1a-Z93. On the other hand, none of the CWC remains sequenced to date belong to this particular subclade of R1a (although, obviously, they do belong to a host of near and far related R1a subclades).

I've never seen anyone worth reading propose that Sintashta might derive from Sredny Stog II instead of Abashevo. And no wonder, because Sredny Stog II was long gone when Sintashta appeared in the archaeological record.

But if CWC remains continue to fail to produce R1a-Z93, while, at the same time, the steppes of eastern Ukraine and surrounds are shown to be a hotbed of R1a-Z93 from the Sredny Stog to the Sintashta periods, which I think is possible, then ancient DNA might well force a serious re-examination of how the awesome Sintashta culture and people came to be.

Sunday, April 22, 2018

Likely Yamnaya incursion(s) into Northwestern Iran

Despite being stratigraphically dated to 5900-5500 BCE (i.e. the Chalcolithic period), ancient sample Hajji_Firuz_ChL I2327 from Narasimhan et al. 2018 belongs to Y-haplogroup R1b-Z2103 and shows minor, but unambiguous, Yamnaya-related ancestry on the autosomes. Why is this a problem? Because both R1b-Z2103 and the Yamnaya culture are dated to the Bronze Age, and Yamnaya samples from the Kalmykia and Samara regions of what is now western Russia are exceptionally rich in R1b-Z2103.

Thus, pending a successful radiocarbon (C14) dating analysis, it seems unlikely that Hajji_Firuz_ChL I2327 was alive during the Chalcolithic. Rather, it appears that he's partly of Yamnaya or closely related origin and has been wrongly dated. His remains are likely to be from a secondary burial from the Bronze Age that collapsed into the layer below, right into a Chalcolithic bin ossuary burial full of much older bones.

This scenario is strongly corroborated by data from two other ancient individuals from what is now Northwestern Iran:

- Hajji_Firuz_BA I4243 (also from Narasimhan et al. 2018 and from the same site as Hajji_Firuz_ChL I2327) was initially also stratigraphically dated to the Chalcolithic, but is now labeled as a Bronze Age sample after a radiocarbon (C14) analysis of the remains revealed a date of 2465-2286 calBCE. Moreover, this individual packs around 50% Yamnaya-related ancestry.

- Iran_IA F38 (from Broushaki et al. 2016) from an Iron Age burial at Tepe Hasanlu, which is just a few miles from Hajji Firuz, also belongs to Y-haplogroup R1b-Z2103 and harbors some sort of steppe ancestry on the autosomes (see here).

Below is a Principal Component Analysis (PCA) showing how this trio compare in terms of genome-wide ancestry to C14-dated Chalcolithic samples from Hajji Firuz and the nearby Seh Gabi. The relevant datasheet is available here.

Clearly, they're shifted "north" relative to the C14-dated Chalcolithic samples and thus closer to the ancient Eastern Europeans, suggesting that they carry ancestry from north of the Caucasus that was missing, or at least much less pronounced, in the region before the Bronze Age.

I used D-stats of the form D(Outgroup,Pop1)(Pop2,X), in which Pop1 are Eastern European Hunter-Gatherers (EHG) and Pop2 the C14-dated Chalcolithic samples, to test this more directly. And indeed, the D-stats showed that Hajji_Firuz_ChL I2327 and Hajji_Firuz_BA I4243 shared significantly more alleles with EHG than the Chalcolithic samples did (Z≥3).

Mbuti EHG Hajji_Firuz_ChL Hajji_Firuz_ChL_I2327 D 0.0151 Z 3.737
Mbuti EHG Hajji_Firuz_ChL Hajji_Firuz_BA_I4243 D 0.0496 Z 12.072

Mbuti EHG Seh_Gabi_ChL Hajji_Firuz_ChL_I2327 D 0.0188 Z 4.803
Mbuti EHG Seh_Gabi_ChL Hajji_Firuz_BA_I4243 D 0.0531 Z 12.832
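To make the logic of these D-stats a little more concrete, here's a minimal Python sketch of how a statistic of this form can be computed from per-SNP allele frequencies. It follows the usual ADMIXTOOLS-style definition as I understand it. The allele frequencies are toy numbers invented for illustration, not real data, and the function names are hypothetical. The Z scores reported above come from a block jackknife over the genome, which this sketch doesn't attempt.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (equal-length lists).

    With this sign convention, a positive value means X and Z share more
    alleles than X and Y do (or, equivalently for the statistic, W and Y).
    """
    numerator = 0.0
    denominator = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        numerator += (wi - xi) * (yi - zi)
        denominator += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return numerator / denominator


def implied_standard_error(d, z_score):
    """Back out the standard error from a reported D and Z (Z = D / SE)."""
    return d / z_score


# Toy allele frequencies at four SNPs (invented for illustration only).
outgroup = [0.1, 0.2, 0.9, 0.5]   # stands in for Mbuti
pop1 = [0.8, 0.3, 0.2, 0.5]       # stands in for EHG
pop2 = [0.2, 0.2, 0.8, 0.4]       # stands in for the C14-dated Chalcolithic samples
test_x = [0.6, 0.3, 0.4, 0.4]     # stands in for the test individual

d_toy = d_statistic(outgroup, pop1, pop2, test_x)
d_null = d_statistic(outgroup, pop1, pop2, pop2)  # X identical to Pop2

print("toy D:", round(d_toy, 4))
print("D when X equals Pop2:", d_null)
print("implied SE for D 0.0151, Z 3.737:", round(implied_standard_error(0.0151, 3.737), 5))
```

In the toy data the test individual is shifted towards Pop1 at the informative SNPs, so D comes out positive, which is the same direction as the results above.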

To further elucidate this EHG-related signal, I ran a series of mixture models with the qpAdm software. Below are among the statistically most successful models that I could find for Hajji_Firuz_ChL I2327, Hajji_Firuz_BA I4243 and Hajji_Firuz_ChL. Note that each of these models features Yamnaya from Samara as the proxy for the northern, EHG-related ancestry.

Hajji_Firuz_ChL_I2327
Barcin_N 0.087±0.045
Seh_Gabi_ChL 0.768±0.045
Yamnaya_Samara 0.145±0.033
chisq 8.802
tail prob 0.551007
Full output

Hajji_Firuz_ChL
Barcin_N 0.232±0.027
Seh_Gabi_ChL 0.736±0.029
Yamnaya_Samara 0.033±0.018
chisq 4.269
tail prob 0.93439
Full output

Hajji_Firuz_BA_I4243
Barcin_N 0.201±0.038
Seh_Gabi_ChL 0.244±0.042
Yamnaya_Samara 0.554±0.031
chisq 9.209
tail prob 0.512363
Full output
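As a sanity check on the chisq and tail prob lines above, here's a short Python sketch that recomputes the tail probability from the chi-square value. The degrees of freedom aren't given in this post. The value of 10 used below is my assumption, chosen because it reproduces the reported figures, so check it against the full output files. The function name is made up, and the closed-form sum only works for even degrees of freedom.

```python
import math


def chi2_tail_even_df(chisq, df):
    """Upper-tail probability of a chi-square variable, valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = chisq / 2.0
    term = 1.0
    total = 1.0
    # Sum of half**i / i! for i = 0 .. df/2 - 1
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Assumption: 10 degrees of freedom (not stated in the post).
ASSUMED_DF = 10

# (chisq, reported tail prob) copied from the three models above.
REPORTED = {
    "Hajji_Firuz_ChL_I2327": (8.802, 0.551007),
    "Hajji_Firuz_ChL": (4.269, 0.93439),
    "Hajji_Firuz_BA_I4243": (9.209, 0.512363),
}

for label, (chisq, reported) in REPORTED.items():
    recomputed = chi2_tail_even_df(chisq, ASSUMED_DF)
    print(f"{label:24s} reported={reported:.6f} recomputed={recomputed:.6f}")
```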

Yamnaya_Samara-related admixture appears in all of the models. But considering the standard errors, the Yamnaya_Samara-related ancestry proportion for Hajji_Firuz_ChL is very close to zero. Moreover, importantly, Hajji_Firuz_ChL can be modeled successfully without any such ancestry, while Hajji_Firuz_ChL I2327 and Hajji_Firuz_BA I4243 can't (refer to the full output files for the details).
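The point about the standard errors can be illustrated with a rough back-of-the-envelope check: divide each Yamnaya_Samara estimate by its standard error. This is only a heuristic, not what qpAdm itself tests, and the nested models in the full output files are the stronger evidence. The cutoff of 3 simply mirrors the Z≥3 used for the D-stats above and is my choice for this sketch, as are the names in the code.

```python
# (estimate, standard error) for Yamnaya_Samara, copied from the models above.
YAMNAYA_ESTIMATES = {
    "Hajji_Firuz_ChL_I2327": (0.145, 0.033),
    "Hajji_Firuz_ChL": (0.033, 0.018),
    "Hajji_Firuz_BA_I4243": (0.554, 0.031),
}

# Assumption: a ratio of 3 or more is treated as clearly non-zero.
RATIO_CUTOFF = 3.0


def estimate_to_se_ratio(estimate, standard_error):
    """How many standard errors the estimate sits above zero."""
    return estimate / standard_error


for label, (estimate, se) in YAMNAYA_ESTIMATES.items():
    ratio = estimate_to_se_ratio(estimate, se)
    verdict = "clearly above zero" if ratio >= RATIO_CUTOFF else "close to zero"
    print(f"{label:24s} {estimate:.3f}/{se:.3f} = {ratio:.2f} ({verdict})")
```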

I also tested whether ancient samples from what is now Armenia might make better proxies than Yamnaya for the northern ancestry harbored by Hajji_Firuz_ChL I2327. That's a negative (see here and here).

I don't have a clue who these people were who brought Yamnaya or, at least, Yamnaya-like ancestry to the South Caspian region during the Bronze Age and perhaps also the Chalcolithic. It's rather unlikely that they were the early Iranians, who probably arrived in the region from Central Asia during the Late Bronze Age or even Iron Age (for instance, see here). Perhaps they were the Hittites? Indeed, in his book In Search of the Indo-Europeans, archaeologist James Mallory suggested that the ancestors of the Hittites and other Anatolian-speakers entered the Near East via the Caucasus route:

Most arguments for an Indo-European invasion from the northeast concern the appearance of a new burial rite at the end of the fourth and through the third millennium BC. At that time, both north of the Black Sea and the Caucasus, burials on the Russian-Ukrainian steppe were typically placed in an underground shaft and covered with a mound (kurgan in Russian). Before 3000 BC there begin to appear in the territory of the indigenous Transcaucasian (Kuro-Araxes) culture somewhat similar burials such as the royal tomb of Uch-Tepe on the Milska steppe. As tumulus burials are previously unknown in this region, some would explain their appearance by an intrusion of steppe pastoralists who migrated through the Caucasus and subjugated the local Early Bronze Age culture. More importantly, a status burial inserted into a mound at the site of Korucu Tepe in eastern Anatolia has been compared with somewhat similar burials both in the Caucasus and the Russian steppe. The discovery of horse bones on several sites of east Anatolia such as Norsun Tepe and Tepecik are seen to confirm a steppe intrusion since, as mentioned earlier, the horse, long known in the Ukraine and south Russia, is not attested in Anatolia prior to the Bronze Age.

Another option, however, is that they belonged to some other extinct Indo-European group, such as the Gutians (see here). In any case, keep an eye out for more Bronze Age samples from this part of the world. I have a strong feeling that, unlike their Neolithic and Chalcolithic predecessors, they will be rich in steppe ancestry and R1b-Z2103.

See also...

The Hajji Firuz fiasco

Yamnaya: home-grown

Big deal of 2018: Yamnaya not related to Maykop

Ahead of the pack

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, April 18, 2018

Protohistoric Swat Valley peoples in qpGraph

If I were to add one thing to the Narasimhan et al. 2018 preprint, it'd be a series of uncomplicated qpGraph trees that back up, very simply and directly, the main conclusions in the manuscript. Such as this:

If some of you think that it's possible to show pretty much anything in these sorts of graphs, then you're wrong. For instance, it's not possible to swap West_Siberia_N for Sintashta, because the highest Z score usually blows out from almost nothing to well over five. And it's not possible to push Sintashta-related ancestry into Dravidian-speakers from South India. But if you think it is, then, by all means, have a go. The graph file is here.

See also...

Protohistoric Swat Valley peoples in qpGraph #2

Friday, April 13, 2018

On the doorstep of India

One of the most remarkable discoveries in the recent Narasimhan et al. 2018 preprint has to be the presence of what are essentially Eastern European migrant populations within the Inner Asian Mountain Corridor (IAMC) during the Middle to Late Bronze Age (MLBA). Remarkable for so many reasons, but seemingly under-appreciated by a lot of people, judging by the online discussions that I've seen about the preprint, and even, I'd say, the authors themselves.

Narasimhan et al. labeled these groups as belonging to the "forest/steppe MLBA" complex (for instance, see the main figure from the preprint here). This is indeed what they are in terms of their genetic structure, but certainly not geography, because the IAMC is well south of the steppe. Thus, in my Principal Component Analysis (PCA) I'm going to label them as part of the "post-steppe herder expansion Turan" complex.

Strikingly, most of these people cluster with Bronze Age Eastern Europeans, and even some Bronze Age Central Europeans. They're also sitting very close to the more easterly present-day Slavic-speakers from Russia and Ukraine, and indeed closer to the bulk of the European cluster than some present-day Turkic and Uralic groups from the Volga-Ural region. Even I never predicted such an outcome. Sure, I was expecting to see ancient genomes from South Central Asia with some very heavy steppe influence, but not this. The relevant datasheet is available here.

Two of the MLBA IAMC individuals are from Kashkarchi in the Ferghana Valley, in what is now Uzbekistan, and basically on the doorstep of the Indian subcontinent. I've made special mention of them on the plot, and I've also highlighted a pair of individuals from the Bronze Age Central Asian sites of Gonur Tepe and Shahr-i Sokhta, who are, in all likelihood, unadmixed migrants from the Indus Valley (for more on that, see here).

It's surely not a coincidence that the ancient and present-day South Asians on the plot (including those from Pakistan's Swat Valley dated to the Iron Age) form an almost perfect cline between these two pairs of individuals. It's also surely not a coincidence that the MLBA IAMC groups are rich in Y-haplogroup R1a-M417, and in particular its R1a-Z93 subclade, which is today an especially frequent marker in Indo-European-speaking South Asians.

Forget about the pre-MLBA populations from the forests, steppe, or IAMC, like those represented by Dali_EBA; they're practically irrelevant to this story. How do I know? Because they have little to no impact on the above mentioned cline. And this can be easily verified with mixture models based on multiple Principal Components (PCs) and formal statistics (for instance, see here).

Clearly, many populations in South Asia, particularly those speaking Indo-European languages, derive the bulk of their steppe-related ancestry from the peoples of the MLBA IAMC, and/or their very close relatives. And if you do believe that this inference is just based on coincidences, then I'm sorry to say this, but obviously a new, much less mentally challenging, hobby or profession beckons. All the best with that.

Just to help put all of this in a geographic perspective, here's a topographical map of Eurasia. I've marked the location of the Ferghana Valley. The close relatives of Kashkarchi_BA most likely skirted their way around those winding high mountains and slipped into India via the Khyber Pass, which I've also marked on the map.

And the rest, as they say, is history, including the history described in the ancient Indo-Aryan Sanskrit texts known as the Vedas. I'm sure we'll soon be learning about these events in great detail when many more ancient samples from Pakistan and, hopefully, the first ancient samples from India, are published.

Citation...

Narasimhan et al, The Genomic Formation of South and Central Asia, Posted March 31, 2018, doi: https://doi.org/10.1101/292581

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, April 11, 2018

Bronze Age Central Asia: terra incognita no longer

I've updated my Global25 datasheets with the samples from the Narasimhan et al. 2018 preprint (look for these labels). Feel free to use this output for anything you like, and please show us the results in the comments below.

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient
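If you're not sure where to start with these datasheets, one simple thing to try is ranking samples by Euclidean distance to a target in PC space. Here's a minimal Python sketch. It assumes a comma-separated layout with a label in the first column followed by the PC coordinates, so check that against the actual files. The three-column toy data, the sample labels and the function names are all invented for illustration.

```python
import csv
import io
import math

# Toy stand-in for a datasheet: label, then PC coordinates (invented values).
# The real files have many more columns; three are used here to keep it short.
SAMPLE_CSV = """\
,PC1,PC2,PC3
Target_A,0.10,0.02,-0.01
Ref_X,0.11,0.01,-0.02
Ref_Y,-0.05,0.08,0.03
"""


def load_coords(text):
    """Parse CSV text into {label: [coordinates]}, skipping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return {row[0]: [float(v) for v in row[1:]] for row in rows[1:] if row}


def nearest(target, coords):
    """Return (distance, label) pairs for every other sample, closest first."""
    target_vec = coords[target]
    distances = [
        (math.dist(target_vec, vec), label)
        for label, vec in coords.items()
        if label != target
    ]
    return sorted(distances)


coords = load_coords(SAMPLE_CSV)
for distance, label in nearest("Target_A", coords):
    print(f"{label:8s} {distance:.4f}")
```

To use it on a real datasheet, read the file contents into a string and pass that to load_coords instead of SAMPLE_CSV.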

Also, here's my Principal Component Analysis (PCA) of ancient West Eurasia featuring most of the new samples. Note the cline made up of ancient and present-day South Asians running from the likely Indus Valley diaspora individuals (from the Gonur Tepe and Shahr-i Sokhta archaeological sites, in present-day Turkmenistan and Iran, respectively) towards the Bronze Age steppe. The relevant datasheet is available here.

I have little doubt that these are indeed migrants from the Indus Valley Civilization (IVC). Their relatively unusual genetic structure - which includes ancestry from a West Eurasian ghost population that is inferred to have been exceedingly poor in Anatolian-related ancestry, as well as significant indigenous South Asian ancestry - leaves little scope for plausible alternatives. If you're wondering what they may have been doing so far north of the IVC, Frenez 2018 has a detailed discussion on the topic. From the paper:

An alternative and intriguing hypothesis is instead supported by significant archaeological and textual data from comparable socio-economic or geographical contexts, which suggest that the likely high commercial and ideological value of ivory and of the expertise required to carve it made also possible and economically profitable the presence in Central Asia of independent itinerant ivory carvers native to or trained in the Indus Valley. These itinerant artisans might have provided at the same time both the raw material and the unique skills to transform it into finished objects.

...

Moreover, the existence of itinerant ivory workers in ancient South Asia is also described in a few literary sources. The Guttila Jātaka mentions a group of ivory carvers who traveled from Benares to Ujjain to offer their products and skills to the local elites (Pal, 1978: 46), while a Buddhist Sanskrit Vinaya tells the story of an Indian master ivory carver who traveled “up to the land of the Yavanas”, most likely the Hellenistic Bactria, to put his superior expertise at the service of a renown local artist (Dwivedi, 1976: 19).

Citation: Frenez, D., Manufacturing and trade of Asian elephant ivory in Bronze Age Middle Asia. Evidence from Gonur Depe (Margiana, Turkmenistan), Archaeological Research in Asia (2017), http://dx.doi.org/10.1016/j.ara.2017.08.002

See also...

On the doorstep of India

Saturday, March 31, 2018

Andronovo pastoralists brought steppe ancestry to South Asia (Narasimhan et al. 2018 preprint)

Over at bioRxiv at this LINK. Note that the Andronovo samples that are shown to be the best fit for the steppe ancestry in South Asians are labeled Steppe_MLBA_East (i.e. Middle to Late Bronze Age eastern steppe). Below are the abstract and a couple of key quotes from the paper and its supp info PDF. Emphasis is mine:

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gathers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan. 
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia — consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC — and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identifies the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

...

Third, between 3100-2200 BCE we observe an outlier at the BMAC site of Gonur, as well as two outliers from the eastern Iranian site of Shahr-i-Sokhta, all with an ancestry profile similar to 41 ancient individuals from northern Pakistan who lived approximately a millennium later in the isolated Swat region of the northern Indus Valley (1200-800 BCE). These individuals had between 14-42% of their ancestry related to the AASI and the rest related to early Iranian agriculturalists and West_Siberian_HG. Like contemporary and earlier samples from Iran/Turan we find no evidence of Steppe-pastoralist-related ancestry in these samples. In contrast to all other Iran/Turan samples, we find that these individuals also had negligible Anatolian agriculturalist-related admixture, suggesting that they might be migrants from a population further east along the cline of decreasing Anatolian agriculturalist ancestry. While we do not have access to any DNA directly sampled from the Indus Valley Civilization (IVC), based on (a) archaeological evidence of material culture exchange between the IVC and both BMAC to its north and Shahr-i-Sokhta to its east (27), (b) the similarity of these outlier individuals to post-IVC Swat Valley individuals described in the next section (27), (c) the presence of substantial AASI admixture in these samples suggesting that they are migrants from South Asia, and (d) the fact that these individuals fit as ancestral populations for present-day Indian groups in qpAdm modeling, we hypothesize that these outliers were recent migrants from the IVC. Without ancient DNA from individuals buried in IVC cultural contexts, we cannot rule out the possibility that the group represented by these outlier individuals, which we call Indus_Periphery, was limited to the northern fringe and not representative of the ancestry of the entire Indus Valley Civilization population. 
In fact, it was certainly the case that the peoples of the Indus Valley were genetically heterogeneous as we observe one of the Indus_Periphery individuals having ~42% AASI ancestry and the other two individuals having ~14-18% AASI ancestry (but always mixes of the same two proximal sources of AASI and Iranian agriculturalist-related ancestry). Nevertheless, these results show that Indus_Periphery were part of an important ancestry cline in the wider Indus region in the 3rd millennium and early 2nd millennium BCE. As we show in what follows, peoples related to this group had a pivotal role in the formation of subsequent populations in South Asia.

...

These results—leveraging our rich data from ancient samples closer in time to the Bronze Age—show that the group(s) that contributed Iranian agriculturalist-related ancestry to South Asia shared more genetic drift with the Iranian agriculturalist-related groups in our dataset that are temporally and geographically closest, compared to Caucasus HGs (CHG) or early Zagros related agriculturalists previously shown to be related to source populations for South Asians (11, 81). We are not only able to exclude these early farming and hunter-gathering groups, but also Copper and Bronze Age groups in western Iran (Seh_Gabi_C and Hajji_Firuz_C), and even in eastern Iran and Turan (Tepe_Hissar_C, Gioksiur_EN, and BMAC). Our detailed analyses in Text S3 indicate that what is driving the failure of these models is an excess of Anatolian agriculturalist-related ancestry in all of these groups, suggesting that the Iranian agriculturalist-related population that mixed into South Asia had less Anatolian agriculturalist-related ancestry than all of these. However, we find that mixtures using the Indus_Periphery sample (a pool of three outlier individuals from the BMAC site of Gonur and from Shahr-i-Sokhta), provides an excellent source population for the Iranian agriculturalist-related ancestry in South Asia when combined with any individuals in the Steppe_MLBA cluster (Srubnaya, Sintashta_MLBA, Steppe_MLBA_West or Steppe_MLBA_East).

Narasimhan et al, The Genomic Formation of South and Central Asia, Posted March 31, 2018, doi: https://doi.org/10.1101/292581

Update 12/04/2018: The dataset from the preprint has been made available early at the Reich Lab website here. I've already started analyzing it. You can see the results in several new threads, for instance here, here and here.

Sunday, March 25, 2018

Central Asia as the PIE urheimat? Forget it

Right or wrong, the main contenders for the title of the Proto-Indo-European (PIE) homeland, or urheimat, are Eastern Europe, Anatolia and Transcaucasia, in that order. Central Asia is, at best, one of the also-rans in this tussle, much like India and the Arctic Circle.

However, if you've been following the discussions on the topic in the comments at this blog over the last couple of years, you might be excused for thinking that Central Asia was in fact a natural choice for the PIE homeland, and thanks to new insights from ancient DNA, on the cusp of being proven to be the only choice.

Well, it's already been a very busy year for insights from ancient DNA, including in regards to Central Asia.

For instance, back in February a paper in Science by Gaunitz et al. revealed that the Botai people of Eneolithic Central Asia kept a breed of horse that was ancestral to the Przewalski's horse (see here). This is potentially a crucial fact in the PIE homeland debate, because the horse is the most important animal in early Indo-European religion. However, the Przewalski's horse is a significantly different clade of horse from the modern domestic horse. Thus, even if the Botai people were the first humans to domesticate the horse, then so what, because they didn't domesticate the right type of horse.

It remains to be seen who domesticated the right type of horse, and apparently there's at least one major ancient DNA paper on the way that will try to solve this problem. But we already know that the Middle Bronze Age Sintashta people - who lived on the border between Eastern Europe, Central Asia and Western Siberia - did keep the right type of horse, and it was also phylogenetically somewhat more basal, and thus ancestral, to most modern-day horse breeds.

Interestingly, by far the most basal horse genome within the domestic horse clade is Duk2, from an Early Bronze Age archaeological site near the city of Dunaujvaros in Hungary. But it's not certain who this horse belonged to exactly or where it really came from, because the site in question was probably a major trading post, where livestock and crops were exchanged for bronze articles. In other words, Duk2 may have been imported from somewhere nearby or afar. My bet is that it came from the Pontic-Caspian steppe. Let's wait and see.

Moreover, earlier this week the New York Times ran a feature on the work that David Reich and his colleagues at Broad MIT/Harvard are doing with ancient DNA. The article included an image of Reich standing in front of a whiteboard, and this whiteboard just happened to have on it a migration and mixture model based on ancient human DNA for Central Asia focusing on the period 2200-1500 BCE (scroll down the page here).

I've already analyzed this model in as much detail as I could in an earlier blog entry (see here). However, in the context of this blog entry, it's important to note that the model clearly shows major population movements from Europe and West Asia into Central Asia, rather than the other way around (i.e. all of the really big arrows are pointing east). The paper with the final version of this model is apparently coming soon, and after it does come, we'll probably be having our last ever discussion here about Central Asia as a potential PIE homeland. I can't wait.

Update 01/04/2018: The preprint of the paper on ancient Central Asia that I mentioned above is now available at bioRxiv. See here.

Update 03/04/2019: In a surprising twist, Duk2 has significant ancestry from a native Iberian lineage that didn't contribute any direct ancestry to modern domesticates (see here).

See also...

Of horses and men

The mystery of the Sintashta people

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, March 22, 2018

Siberian ancestry and Y-haplogroup N1c spread across Northern Europe rather late in prehistory (Lamnidis et al. 2018 preprint)

A claim often made in popular culture is that the Saami people of Fennoscandia and Northern Russia are the last indigenous Europeans. I saw some guy blurt this out on a random cooking show the other day. But it's been obvious for a while now, thanks to analyses of modern-day DNA, that the Saami, and indeed almost all other Uralic-speaking groups in Europe, have a somewhat more complex population history than the majority of non-Uralic-speaking Europeans.

Now, ancient DNA is helping to cement these findings. The quotes and figure below are from a new preprint at bioRxiv by Lamnidis et al. focusing on the spread of Siberian ancestry across Northeastern Europe from the late stone age onwards. It's a phenomenon that had the biggest impact on the Uralic-speaking populations of Fennoscandia, and is, in all likelihood, related in a profound, albeit complex, way to the ethnogenesis and expansion of the proto-Uralic people. Emphasis is mine:

The six ancient individuals from Bolshoy show substantially higher proportions of the Siberian component, which comprises about half of their ancestry (49.4-65.3%), whereas the older Mesolithic individuals from Motala do not share this Siberian ancestry. The Siberian ancestry seen in EHG probably corresponds to a previously reported affinity towards Ancient North Eurasians (ANE) [2,24], which also comprises part of the ancestry of Nganasans. Interestingly, results from uniparentally-inherited markers (mtDNA and Y chromosome) as well as certain phenotypic SNPs also show Siberian signals in Bolshoy: mtDNA haplogroups Z1, C4 and D4, common in modern Siberia [18,25,26], in individuals BOO002, BOO004 and BOO006, respectively (confirming previous findings [18]), as well as Y-chromosomal haplotype N1c1a1a (N-L392) in individuals BOO002 and BOO004. Haplogroup N1c, to which this haplotype belongs, is the major Y chromosomal lineage in modern North-East Europe and European Russia, especially in Uralic speakers, for example comprising as much as 54% of Eastern Finnish male lineages today [27]. Notably, this is the earliest known occurrence of Y-haplogroup N1c in Fennoscandia.

...

We formally tested for admixture in north-eastern Europe by calculating f3(Test;Siberian source, European source) using Uralic-speaking populations - Estonians, Saami, Finnish, Mordovians and Hungarians - and Russians as Test populations. Significantly negative f3 values correspond to the Test population being admixed between populations related to the two source populations [34]. Additionally, the magnitude of the statistic is directly related to the ancestry composition of the tested source populations and how closely those ancestries are related to the actual source populations. We used multiple European and Siberian sources, to capture differences in ancestral composition among proxy populations. As proxies for the Siberian source we used Bolshoy, Mansi and Nganasan, and for the European source modern Icelandic, Norwegian, Lithuanian and French. Our results show that all of the test populations are indeed admixed, with the most negative values arising when Nganasan are used as the Siberian source (Supplementary Table 3).

...

Consistent with f3-statistics above, all the ancient individuals and modern Finns, Saami, Mordovians and Russians show excess allele sharing with Nganasan when used as Test populations. Of all Uralic speakers in Europe, Hungarians are the only population that shows no evidence of excess allele sharing with Nganasan, consistent with their distinct population history as evidenced by historical sources (see ref 35 and references therein).

...

While the Siberian genetic component described here was previously described in modern-day populations from the region [1,3,9,10], we gain further insights into its temporal depth. Our data suggest that this fourth genetic component found in modern-day north-eastern Europeans arrived in the area around 4,000 years ago at the latest, as illustrated by ALDER dating using the ancient genome-wide data from Bolshoy Oleni Ostrov. The upper bound for the introduction of this component is harder to estimate. The component is absent in the Karelian hunter-gatherers (EHG) [3] dated to 8,300-7,200 yBP as well as Mesolithic and Neolithic populations from the Baltics from 8,300 yBP and 7,100-5,000 yBP respectively [8]. While this suggests an upper bound of 5,000 yBP for the arrival of Siberian ancestry, we cannot exclude the possibility of its presence even earlier, yet restricted to more northern regions, as suggested by its absence in populations in the Baltic during the Bronze Age.

...

The large Siberian component in the Bolshoy individuals from the Kola Peninsula provides the earliest direct genetic evidence for an eastern migration into this region. Such contact is well documented in archaeology, with the introduction of asbestos-mixed Lovozero ceramics during the second millenium BC [47], and the spread of even-based arrowheads in Lapland from 1,900 BCE [48,49]. Additionally, the nearest counterparts of Vardøy ceramics, appearing in the area around 1,600-1,300 BCE, can be found on the Taymyr peninsula, much further to the east [48,49]. Finally, the Imiyakhtakhskaya culture from Yakutia spread to the Kola Peninsula during the same period [18,50]. Contacts between Siberia and Europe are also recognised in linguistics. The fact that the Siberian genetic component is consistently shared among Uralic-speaking populations, with the exceptions of Hungarians and the non-Uralic speaking Russians, would make it tempting to equate this component with the spread of Uralic languages in the area. However, such a model may be overly simplistic. First, the presence of the Siberian component on the Kola Peninsula at ca. 4000 yBP predates most linguistic estimates of the spread of Uralic languages to the area [51]. Second, as shown in our analyses, the admixture patterns found in historic and modern Uralic speakers are complex and in fact inconsistent with a single admixture event. Therefore, even if the Siberian genetic component partly spread alongside Uralic languages, it likely presented only an addition to populations carrying this component from earlier.
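To make the f3 test quoted above a bit more concrete, here's a minimal sketch in Python of why an admixed Test population produces a negative f3(Test; Source1, Source2). Everything in it is an assumption for illustration: the allele frequencies are randomly generated rather than real data, the function name f3_naive is made up, and the estimator is the plain textbook average without the sample-size correction or block jackknife that proper tools apply before calling a value "significantly" negative.

```python
import random


def f3_naive(test, source1, source2):
    """Naive f3(Test; Source1, Source2): the mean over SNPs of
    (t - s1) * (t - s2), using population allele frequencies.
    No finite-sample bias correction and no standard error."""
    if not (len(test) == len(source1) == len(source2)):
        raise ValueError("frequency lists must have the same length")
    if not test:
        raise ValueError("need at least one SNP")
    total = sum((t - a) * (t - b) for t, a, b in zip(test, source1, source2))
    return total / len(test)


# Hypothetical allele frequencies at 1000 SNPs (not real data).
rng = random.Random(0)
siberian = [rng.random() for _ in range(1000)]
european = [rng.random() for _ in range(1000)]

# A toy Test population that is exactly 30% "Siberian" and 70% "European".
alpha = 0.3
admixed = [alpha * s + (1 - alpha) * e for s, e in zip(siberian, european)]

# Admixed population as Test: the statistic comes out negative.
f3_admixed_as_test = f3_naive(admixed, siberian, european)

# One of the sources as Test instead: the statistic comes out positive.
f3_source_as_test = f3_naive(european, siberian, admixed)

print("f3(admixed; siberian, european) =", f3_admixed_as_test)
print("f3(european; siberian, admixed) =", f3_source_as_test)
```

In this idealized setup the first value equals -alpha * (1 - alpha) times the mean squared frequency difference between the two sources, so it's most negative when the sources are very different from each other and the mixture is close to 50/50. Real populations have also drifted since admixture, which pushes the statistic back towards positive, so a non-negative f3 doesn't rule out admixture.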

This generally looks like a very solid preprint, so I don't expect any major changes between now and formal publication. I have to be honest though, the qpAdm analysis looks like crap. Also, the authors are using the Russian sample set from the Human Origins dataset, which comes from the Kargopol district in Northern Russia. This was actually a Uralic-speaking region until not long ago. No wonder then, that they're inferring that Russians are very similar to Uralic-speaking populations.

But I know from my own analyses that there's quite a bit of genetic substructure within European Russia. For instance, Russians from southwest of Moscow are much less Uralic-like than the Kargopol Russians, and indeed very difficult to distinguish from other East Slavs, and even West Slavs. Hence, it might be useful to sample and run a couple more regional ethnic Russian groups for comparison. This might help to strengthen the argument that Siberian ancestry is somehow intimately intertwined with the expansion of Uralic languages in Europe.

Citation...

Lamnidis et al., Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe, bioRxiv, Posted March 22, 2018, doi: https://doi.org/10.1101/285437

See also...

The Uralic cline in the Global25

The whiteboard

David Reich's book, Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past, is coming out next Tuesday (see here). Chapter 6 has the potentially controversial title The Collision that Formed India, and indeed I know for a fact that Bronze Age steppe pastoralists, who seem to induce panic attacks amongst a lot of people, and especially Out-of-India proponents, get a big hat tip in this chapter.

But I can't really say more than that until after the book launch. So in the meantime, let's focus on this intriguing photo of a messy whiteboard that was published in the New York Times this week along with a feature on the Reich Lab's work with ancient DNA. The version below was edited by me to highlight and fill in a few details. The original can be viewed by scrolling down here.

Clearly, this is a mixture and migration model for Central Asia and surrounds covering the crucial period 2200-1500 BCE, when, according to a consensus amongst historical linguists, waves of Indo-European speakers moved into the region from the steppes. It's probably from a jam session about an upcoming ancient DNA paper. Here's my interpretation of the model:

- nodes 1, 2, 3 and 4 track the migration of Bronze Age pastoralists from the Pontic-Caspian steppe deep into Central Asia, while nodes B and C follow the expansion of Neolithic farmers from east of Anatolia (probably from somewhere in present-day Iran) into Central and South Asia (nodes 1 and B aren't actually visible in the original pic, but must be there, and more or less where I marked them)

- node 2 probably represents the formation of late Corded Ware Culture (CWC) populations across Northern Europe around 2900 BCE, via the mixture of Yamnaya or Yamnaya-related steppe pastoralists (node 1) with European farmers, who were themselves a mixture of Anatolian farmers and Western European Hunter-Gatherers (WHG)

- Sintashta and Andronovo_NW at node 3 derive directly from the mixture event at node 2, so either they're offshoots of late CWC or a closely related population

- intriguingly, and perhaps crucially, nodes 2 and 3 only take one pulse of admixture from node 1 (red X), while the branch leading to Andronovo_SE at node 4 takes two such pulses, with one apparently later than 1900 BCE, possibly suggesting that Andronovo_SE was more Yamnaya-like compared to late CWC, Sintashta and Andronovo_NW

- moreover, the branch leading to Andronovo_SE absorbs significant admixture from Western Siberian Hunter-Gatherers (West_Siberian_HG) and possibly a Central Asian ghost population, no doubt resulting in a further reduction of Anatolian farmer and WHG ancestry ratios in Andronovo_SE compared to Sintashta and Andronovo_NW

- thus, Andronovo_SE, unlike Sintashta, might be Yamnaya-like enough, statistically, to be the Yamnaya-related steppe pastoralists who "crashed" into India during the Bronze Age (see here), although, admittedly, this isn't actually shown on the whiteboard

- on the other hand, if, perhaps, the model includes a migration edge from node 1 to B, then this would suggest that Yamnaya-related ancestry arrived in South Asia with a very different population than Andronovo_SE, and possibly much earlier than 1500 BCE, but we don't know because David Reich is (strategically?) blocking that part of the whiteboard.
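The dilution argument in the list above is just arithmetic, and a rough Python sketch might help. Note that every number here is hypothetical: the whiteboard doesn't give admixture proportions, so the pulse sizes, the farmer/WHG split and the helper name mix are all made up for illustration, and Yamnaya is treated as a single "steppe" component to keep things simple.

```python
def mix(recipient, donor, donor_fraction):
    """Return the ancestry proportions of a population that takes
    donor_fraction of its ancestry from donor and the rest from recipient."""
    if not 0.0 <= donor_fraction <= 1.0:
        raise ValueError("donor_fraction must be between 0 and 1")
    components = set(recipient) | set(donor)
    return {
        c: (1 - donor_fraction) * recipient.get(c, 0.0)
        + donor_fraction * donor.get(c, 0.0)
        for c in components
    }


# Hypothetical source populations (proportions are NOT from the whiteboard).
yamnaya = {"steppe": 1.0}                                  # node 1
european_farmer = {"anatolian_farmer": 0.8, "whg": 0.2}
west_siberian_hg = {"west_siberian_hg": 1.0}

# Node 2: late CWC-like population, one pulse from node 1 (assumed 70%).
node2 = mix(european_farmer, yamnaya, 0.7)

# Node 3: Sintashta / Andronovo_NW, derived directly from node 2.
node3 = dict(node2)

# Node 4: Andronovo_SE, with a second pulse from node 1 (assumed 20%)
# followed by West Siberian HG admixture (assumed 10%).
node4 = mix(node2, yamnaya, 0.2)
node4 = mix(node4, west_siberian_hg, 0.1)

for name, pop in (("node3", node3), ("node4", node4)):
    print(name, {c: round(p, 3) for c, p in sorted(pop.items())})
```

Whatever values are plugged in, each extra pulse scales the existing Anatolian farmer and WHG proportions down by (1 - donor_fraction), which is all that the argument about Andronovo_SE relies on.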

Also worth noting is that there's actually nothing about India in the model. The most proximate region that gets a mention is "Turan/Northern South Asia". So should we be concerned that the supposedly imminent publication of ancient DNA from Rakhigarhi and other Indian prehistoric sites has been pushed back indefinitely, perhaps for political reasons? Normally I'd say no, but in recent weeks I've been hearing rumors that this is indeed the case.

Update 01/04/2018: The preprint of the paper on ancient Central Asia that I mentioned above is now available at bioRxiv. See here.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...
