Monday, September 19, 2022

Dear Iosif...Yamnaya


Even though the Yamnaya culture probably originated in what is now Ukraine, the earliest Yamnaya samples currently available are from the modern-day Samara region of Russia. They mostly date to around 3,000 BCE. I can analyze their ancestry using Principal Component Analysis (PCA) data.

Target: RUS_Yamnaya_Samara
Distance: 3.2816% / 0.03281581
81.0 RUS_Progress_En
14.4 UKR_N
4.6 HUN_Vinca_MN
0.0 ARM_Aknashen_N
0.0 ARM_Masis_Blur_N
0.0 AZE_Caucasus_lowlands_LN
0.0 BGR_C
0.0 BGR_Dzhulyunitsa_N
0.0 IRN_Ganj_Dareh_N
0.0 IRN_Hajji_Firuz_C
0.0 IRN_Seh_Gabi_C
0.0 IRN_Tepe_Abdul_Hosein_N
0.0 IRN_Wezmeh_N
0.0 RUS_Darkveti-Meshoko_En
0.0 RUS_Maykop
0.0 RUS_Maykop_Late
0.0 RUS_Maykop_Novosvobodnaya

The above results show exactly zero ancestry from West Asia. Admittedly, both RUS_Progress_En and HUN_Vinca_MN are European ancients with significant West Asian-related ancestry. However, this ancestry is very distantly West Asian-related, and, for instance, it almost certainly has no relevance to the Indo-Anatolian homeland debate.

The Afanasievo culture of Central Asia is regarded as an early offshoot of the Yamnaya culture. A good number of Afanasievo samples are available, so let's see whether their results match those of the Yamnaya folks. And indeed they do, since BGR_C is very similar to HUN_Vinca_MN.

Target: RUS_Afanasievo
Distance: 3.4055% / 0.03405499
84.0 RUS_Progress_En
11.4 UKR_N
4.6 BGR_C
0.0 ARM_Aknashen_N
0.0 ARM_Masis_Blur_N
0.0 AZE_Caucasus_lowlands_LN
0.0 BGR_Dzhulyunitsa_N
0.0 HUN_Vinca_MN
0.0 IRN_Ganj_Dareh_N
0.0 IRN_Hajji_Firuz_C
0.0 IRN_Seh_Gabi_C
0.0 IRN_Tepe_Abdul_Hosein_N
0.0 IRN_Wezmeh_N
0.0 RUS_Darkveti-Meshoko_En
0.0 RUS_Maykop
0.0 RUS_Maykop_Late
0.0 RUS_Maykop_Novosvobodnaya

To try this at home, stick the PCA data in the text file here into the relevant fields here and crank up the "Cycles" to 4X. You should see exactly zero ancestry from West Asia every time.
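For anyone curious about what a test like this does under the hood, here's a minimal Python sketch. It assumes that the goal is to find non-negative source weights that sum to 100% and minimize the Euclidean distance between the target and the weighted average of the sources in PCA space. That's an assumption about the general idea, not a copy of the actual algorithm behind the tool linked above. The coordinates and labels (src_A and so on) are invented for illustration, and `project_to_simplex` and `fit_mixture` are hypothetical helper names.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def fit_mixture(target, sources, iters=5000):
    """Projected gradient descent on 0.5 * ||w @ sources - target||^2."""
    n = sources.shape[0]
    w = np.full(n, 1.0 / n)                   # start from equal weights
    # Step size 1/L, where L is the largest eigenvalue of the Hessian.
    step = 1.0 / np.linalg.norm(sources @ sources.T, 2)
    for _ in range(iters):
        grad = sources @ (w @ sources - target)
        w = project_to_simplex(w - step * grad)
    distance = float(np.linalg.norm(w @ sources - target))
    return w, distance


# Toy PCA coordinates (invented for this sketch, NOT real data).
labels = ["src_A", "src_B", "src_C", "src_D"]
sources = 0.1 * np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])
# Toy target built as 80% src_A, 15% src_B, 5% src_C and 0% src_D.
target = np.array([0.80, 0.15, 0.05, 0.0]) @ sources

weights, distance = fit_mixture(target, sources)
print(f"Distance: {distance:.8f}")
for label, w in sorted(zip(labels, weights), key=lambda p: -p[1]):
    print(f"{100 * w:.1f} {label}")
```

The output is laid out like the results above: a distance line followed by one percentage per source, with sources that don't help the fit pushed to zero.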

I can, more or less, reproduce these results with tools that are routinely used in peer-reviewed papers. Below is a table of mixture models produced with the qpAdm software. I set the pass threshold to P ≥ 0.05, which is an arbitrary value, but the pattern is clear. The full output from each qpAdm run is available here.


Importantly, qpAdm needs to be fed the relevant "right pop" outgroups to be able to discriminate accurately between reference populations.

right pops:
CMR_Shum_Laka_8000BP
MAR_Taforalt
Levant_Natufian
IRN_Ganj_Dareh_N
Levant_PPNB
TUR_Marmara_Barcin_N
HUN_Starcevo_N
HUN_Koros_N
SRB_Iron_Gates_HG
Iberia_Southeast_Meso
RUS_Karelia_HG
RUS_West_Siberia_HG
RUS_Boisman_MN
MNG_North_N
TWN_Hanben
BRA_LapaDoSanto_9600BP

So, for instance, if one were to use in this role the modern-day Mbuti people, as opposed to, say, the ancient hunter-gatherers of Shum Laka, one might find that many models look statistically better than they should. And then one might also find that the Yamnaya samples carry significant West Asian ancestry.

Actually, I'm not opposed to the idea of some West Asian ancestry in Yamnaya. Indeed, considering the extraordinary mobility of the Yamnaya people and their Eneolithic predecessors on the Pontic-Caspian steppe, it would be unusual if they didn't come into close contact and mix, to some degree, with their neighbors from West Asia.

However, based on everything I've seen, from uniparental markers to different types of autosomal genetic tests, it's clear to me that there's no substantial West Asian ancestry in any Yamnaya samples, except for an outlier female from modern-day Ozera, Ukraine (see here).

Admittedly, ancient DNA does have a habit of throwing curveballs, so I'm eagerly awaiting new Eneolithic samples from the Pontic-Caspian steppe, particularly those associated with the Yamnaya-like Sredni Stog culture, to help finally settle this issue.

Believe it or not, a contact recently sent me a supposedly unpublished female sample from a ~4,200 BCE Sredni Stog burial in modern-day Igren, east central Ukraine. So what the hell, let's assume for the time being that this sample is genuine. This is how Miss Sredni Stog behaves in my PCA mixture test.

Target: UKR_Sredni_Stog
Distance: 4.0769% / 0.04076877
75.6 RUS_Progress_En
17.8 UKR_N
6.6 HUN_Vinca_MN
0.0 ARM_Aknashen_N
0.0 ARM_Masis_Blur_N
0.0 AZE_Caucasus_lowlands_LN
0.0 BGR_C
0.0 BGR_Dzhulyunitsa_N
0.0 IRN_Ganj_Dareh_N
0.0 IRN_Hajji_Firuz_C
0.0 IRN_Seh_Gabi_C
0.0 IRN_Tepe_Abdul_Hosein_N
0.0 IRN_Wezmeh_N
0.0 RUS_Darkveti-Meshoko_En
0.0 RUS_Maykop
0.0 RUS_Maykop_Late
0.0 RUS_Maykop_Novosvobodnaya

Wow, just wow. Have we actually found Miss Proto-Yamnaya? What does qpAdm have to say on the matter?

UKR_Sredni_Stog
HUN_Vinca_MN 0.034±0.028
RUS_Progress_En 0.796±0.045
UKR_N 0.170±0.034
P-value 0.41088
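One rough way to compare this output with the PCA-based percentages above is to express each difference in units of the qpAdm standard error. The Python sketch below does just that with the numbers already shown in this post. It's only a back-of-the-envelope check: it ignores any uncertainty in the PCA estimates and the fact that the qpAdm coefficients are correlated, and `se_gaps` is a hypothetical helper name. All three gaps come out within about 1.2 standard errors.

```python
# PCA-based percentages from the mixture test above, as fractions.
pca = {"RUS_Progress_En": 0.756, "UKR_N": 0.178, "HUN_Vinca_MN": 0.066}

# qpAdm estimates and standard errors for the same three sources.
qpadm = {
    "RUS_Progress_En": (0.796, 0.045),
    "UKR_N": (0.170, 0.034),
    "HUN_Vinca_MN": (0.034, 0.028),
}


def se_gaps(pca_est, qpadm_est):
    """Return (PCA estimate - qpAdm estimate) / qpAdm SE for each source."""
    return {
        pop: (pca_est[pop] - est) / se
        for pop, (est, se) in qpadm_est.items()
    }


gaps = se_gaps(pca, qpadm)
for pop, gap in gaps.items():
    print(f"{pop}: {gap:+.2f} SE")
```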

Again, this is an excellent match with the results from my PCA test, especially if we take into account the standard errors. However, with qpAdm it's also possible to model this individual's ancestry as part West Asian.

UKR_Sredni_Stog
AZE_Caucasus_lowlands_LN 0.056±0.039
RUS_Progress_En 0.761±0.061
UKR_N 0.183±0.036
P-value 0.465667

As I pointed out above, it's plausible for such people to harbor some West Asian ancestry, but I'm very sceptical that this is really the case here, despite the rather solid qpAdm statistical fit. That's because UKR_Sredni_Stog is not a high quality sample, and, in my experience, qpAdm often has problems analyzing fine scale ancestry in singletons or even small groups that show excess DNA damage and/or offer far fewer than a million markers.

See also...

Dear Iosif, about that ~2%

But Iosif, what about the Phrygians?

Friday, September 9, 2022

Dear Iosif, about that ~2%


The debate over the location of the so-called Indo-Anatolian homeland won't be decided by the persistence of any type of genetic ancestry in ancient Anatolia.

It'll be decided by a multidisciplinary study on the interactions between the ancient peoples of the North Pontic steppe, the eastern Balkans, and western Anatolia.

If such a study finds a pulse of steppe-related gene flow from the Balkans into Anatolia sometime during the early metal ages, it'll corroborate the linguistic hypothesis that a language ancestral to Hittite, Luwian and related tongues moved into Anatolia from Eastern Europe.

Why do we only need a pulse of gene flow, you might ask? Obviously, because:

- language and genetic ancestry can start with a strong association but, since they're not linked, they can eventually follow very different trajectories

- the dilution of genetic ancestry is an important factor, especially in ancient West Asia, and it must be taken into account in models of language spread, rather than ignored in favor of simple, elegant models that do not reflect reality.

Here's my favorite quote from the recent Lazaridis, Alpaslan-Roodenberg et al. paper, because, probably unbeknownst to the authors, it's exceptionally revealing about the spread of a wide range of Indo-European speakers into Anatolia.

However, in individuals from Gordion, a Central Anatolian city that was under the control of Hittites before becoming the Phrygian capital and then coming under the control of Persian and Hellenistic rulers, the proportion of Eastern hunter-gatherer ancestry is only ~2%, a tiny fraction for a region controlled by at least four different Indo-European–speaking groups.

Indeed, this is exactly what the Lazaridis, Alpaslan-Roodenberg et al. paper should've been about. That is, the authors should've given us a painstaking account of the spread of different ancient Indo-European speaking groups into Anatolia and explained how, overall, their DNA was rapidly diluted to a trace amount.

Instead, they treated us to a make-believe tale about a so-called Indo-Anatolian homeland in what is now Armenia.

See also...

Dear Iosif...Yamnaya

But Iosif, what about the Phrygians?

Dear Iosif...

Dear Iosif #2

Dear Iosif #3

Sunday, September 4, 2022

But Iosif, what about the Phrygians?


A paper in Science authored by around 200 scientists from some of the world's top academic institutions surely must mean something, right? Not necessarily.

In this short blog post I'll try to explain, as simply as I can, why the Lazaridis, Alpaslan-Roodenberg et al. paper doesn't get us any closer to solving the riddle of the so-called Indo-Anatolian homeland.

However, it must be said that the paper does include many interesting and valuable samples. I'll be using six of these samples, labeled TUR_C_Gordion_Anc, to argue my case.

The TUR_C_Gordion_Anc sample set is from Gordion, the capital of ancient Phrygia, and thus, in all likelihood, it represents Phrygian speakers.

Phrygian is an Indo-European language and the leading hypothesis is that it originated in the Balkans.

In terms of fine scale ancestry, TUR_C_Gordion_Anc can be reliably divided into two genetic clusters. In the Principal Component Analysis (PCA) below these clusters are labeled TUR_C_Gordion_Anc1 and TUR_C_Gordion_Anc2.

Note that TUR_C_Gordion_Anc1 is obviously pulling away from TUR_C_Gordion_Anc2 towards samples from the Balkans. I've used ancient samples from what is now North Macedonia, labeled MKD_Anc, to represent the Balkans. To see an interactive version of the plot, paste the PCA coordinates from here into the relevant field here.

Visually, this is not an especially dramatic outcome, but it's an incredible result nonetheless, because it shows that even a few ancient samples can help to solve an age-old mystery.

Across many dimensions of genetic variation, the shift in the PCA from TUR_C_Gordion_Anc2 to TUR_C_Gordion_Anc1 represents about 20% admixture from the Balkans, and about 8% from the Eastern European steppe. That's plenty enough to corroborate the linguistic hypothesis that the Phrygians originated in the Balkans, and that some of their ancestors came from the steppe. The mixture models below were done with the tools here.

Target: TUR_C_Gordion_Anc1
Distance: 1.6634% / 0.01663373
40.6 Kura-Araxes_ARM_Kaps
22.2 Anatolia_Barcin_N
21.8 MKD_Anc
13.6 Levant_PPNB
1.4 IRN_Ganj_Dareh_N
0.4 Han

Target: TUR_C_Gordion_Anc1
Distance: 1.7109% / 0.01710904
40.2 Kura-Araxes_ARM_Kaps
37.8 Anatolia_Barcin_N
12.4 Levant_PPNB
8.0 Yamnaya_RUS_Samara
1.2 IRN_Ganj_Dareh_N
0.4 Han

Target: TUR_C_Gordion_Anc2
Distance: 2.0293% / 0.02029339
51.0 Kura-Araxes_ARM_Kaps
26.8 Anatolia_Tepecik_Ciftlik_N
17.6 Anatolia_Barcin_N
4.6 Levant_PPNB

Surprisingly, Lazaridis, Alpaslan-Roodenberg et al. didn't have much to say about this topic. This quote basically sums it up:

However, in individuals from Gordion, a Central Anatolian city that was under the control of Hittites before becoming the Phrygian capital and then coming under the control of Persian and Hellenistic rulers, the proportion of Eastern hunter-gatherer ancestry is only ~2%, a tiny fraction for a region controlled by at least four different Indo-European–speaking groups.

I have no doubt that Lazaridis, Alpaslan-Roodenberg et al. can run a very decent PCA, and then blow it up to a size big enough to show that the Gordion samples represent two genetically somewhat distinct groups. I'm also sure that, if they really try, they can locate significant levels of proximate and relevant European ancestry in some of these samples.

They don't have to use my methods; they can use any methods they like. My point is that they won't find much if they're just looking for genetic signals from the Upper Paleolithic or Mesolithic.

Now, considering the way that the Phrygian question was treated by Lazaridis, Alpaslan-Roodenberg et al., despite the fact that they managed to sequence a few likely Phrygian speakers from none other than the Phrygian capital, let's not pretend that their paper brought us any closer to understanding the genetic origins of Anatolian speakers or pinpointing their ancestral homeland.

In order to even try to solve these problems with ancient DNA, we need a wide range of samples from Hittite, Luwian and other key sites where Anatolian languages were spoken. And then we must analyze them properly.

I'm guessing that Lazaridis, Alpaslan-Roodenberg et al. went out of their way to get such samples, but for one reason or another they failed. If so, that's OK, but I have a feeling that even if they got them, they wouldn't know what to do with them, because at best these samples would only show ~2% Eastern hunter-gatherer ancestry. Haha.

For what it's worth, I believe that the ancient data in the Lazaridis, Alpaslan-Roodenberg et al. paper point to the North Pontic steppe as the Indo-Anatolian homeland, and I'll lay out my arguments in an upcoming blog post.

See also...

Dear Iosif...Yamnaya

Dear Iosif, about that ~2%

Dear Iosif...

Dear Iosif #2

Dear Iosif #3

Thursday, September 1, 2022

Dear Iosif #3


Back in 2016 I made this prediction about the origins of the Yamnaya people (Steppe_EMBA):

But here's my prediction: Steppe_EMBA only has 10-15% admixture from the post-Mesolithic Near East not including the North Caucasus, and basically all of this comes via female mediated gene flow from farming communities in the Caucasus and perhaps present-day Ukraine.

The relevant blog post is still here. Looking back, my analysis is a bit sloppy and I didn't articulate my ideas too well. But that was a pretty good prediction for its time, and I believe it still has a chance of being confirmed, more or less.

On the other hand, the widely publicized hypothesis that the Yamnaya population is a ~50/50 mixture between indigenous Eastern European hunter-gatherers and Near Eastern or West Asian migrants never looked right to me. So I'm glad that it's now dead and buried.

For those of you not up to date with this topic, all you need to know is that the Yamnaya genotype existed in Eastern Europe at least a thousand years before Yamnaya, and, moreover, the Yamnaya people are largely derived from Eastern European foragers already rich in Near Eastern-related ancestry. The relevant ancient genomes are on the way (for instance, see here).

Nevertheless, the narrative that waves of Near Eastern migrants moved into prehistoric Eastern Europe, leading to the emergence of the Yamnaya culture and even the Proto-Indo-European language, is still being pushed by some notable scientists working with ancient DNA.

My hope is that, considering the latest revelations about the genetic origins of the Yamnaya people, these scientists can embrace a more nuanced view. How about something like this?

- people moved around, and they were especially mobile on the Eastern European steppe from the Eneolithic onwards

- when they made contact they sometimes mixed, so there was admixture between far flung steppe groups

- since population densities on the steppe were low until the Yamnaya period, minor admixture that entered the steppe during the Neolithic and Eneolithic wasn't diluted easily.

See also...

Dear Iosif #2

But Iosif, what about the Phrygians?

Dear Iosif, about that ~2%

Dear Iosif...Yamnaya