
Wednesday, December 20, 2023

Dear Harald #2


The ancIBD method paper from the David Reich Lab was just published in Nature Genetics (open access here). It's a very useful effort, but the authors are still somewhat confused about the origin of the Corded Ware culture (CWC) population. From the paper (emphasis is mine):

This direct evidence that most Corded Ware ancestry must have genealogical links to people associated with Yamnaya culture spanning on the order of at most a few hundred years is inconsistent with the hypothesis that the Steppe-like ancestry in the Corded Ware primarily reflects an origin in as-of-now unsampled cultures genetically similar to the Yamnaya but related to them only a millennium earlier.

This is basically a straw man argument, because it's easy to debunk. So why put it in the paper? Well, as far as I can see, to make the idea that the CWC is derived from Yamnaya look more plausible.

This idea, that the CWC is an offshoot of Yamnaya, seems to be the favorite explanation for the appearance of the CWC among the scientists at the David Reich Lab.

However, I'd say they're facing a major problem with that, because the CWC and Yamnaya populations have largely different paternal origins. That is, CWC males mostly belong to Y-haplogroups R1a-M417 and R1b-L51, while Yamnaya males almost exclusively belong to Y-haplogroup R1b-Z2103.

Indeed, as far as I know, there are no reliable instances of R1a-M417 or R1b-L51 in any published or yet to be published Yamnaya samples.

But it is possible to reconcile the Y-haplogroup data with the ancIBD results if we assume that the peoples associated with the Corded Ware, Yamnaya and also Afanasievo cultures expanded from a genetically more diverse ancestral gene pool, each taking a specific subset of the variation with them.

This gene pool would've existed somewhere in Eastern Europe, probably at the western end of the Pontic-Caspian steppe, at most a few hundred years before the appearance of the earliest CWC burials in what is now Poland.

Moreover, the split between the CWC and Yamnaya populations need not have been a clean one, with long-range contacts and largely female-mediated mixing maintained for generations, adding to the already close genealogical links between them.

Citation...

Ringbauer, H., Huang, Y., Akbari, A. et al. Accurate detection of identity-by-descent segments in human ancient DNA. Nat Genet (2023). https://doi.org/10.1038/s41588-023-01582-w

See also...

On the origin of the Corded Ware people

Sunday, November 19, 2023

Musaeum Scythia on the Seima-Turbino Phenomenon


A few weeks ago bioRxiv published two preprints on the Seima-Turbino Phenomenon (see here and here).

I can't say much about these manuscripts until I see the relevant ancient DNA samples, and that might take some time.

However, for now, I will say that both preprints really need to emphasize the profound impact that the Sintashta-related early Indo-Iranian speakers had on the Seima-Turbino Phenomenon. This, of course, would require Wolfgang Haak and friends to pull their heads out of their behinds and admit that the proto-Indo-Iranian homeland was in Eastern Europe, not in Iran.

At the same time, it's likely that the Seima-Turbino Phenomenon originated deep in Siberia, and its inception was probably most closely associated with the West Siberian Hunter-Gatherer (WSHG) genetic component. It's important that the preprints emphasize this too.

Moreover, I can't see any convincing arguments in either preprint that the Seima-Turbino Phenomenon was mainly associated with proto-Uralic speakers, or even that it was an important vector for the spread of proto-Uralic. So there's not much point in forcing the Uralic angle on studies focused on the Seima-Turbino Phenomenon. Indeed, what we also need is an archaeogenetics paper dealing specifically with the proto-Uralic expansion.

Apart from that, I'd like to direct your attention to the fact that Musaeum Scythia has already written a fine blog post about these preprints:

Genomic insights into the Seima-Turbino Phenomenon

See also...

Finally, a proto-Uralic genome

The Uralic cline with kra001 - no projection this time

Slavs have little, if any, Scytho-Sarmatian ancestry

Friday, November 10, 2023

Wielbark Goths were overwhelmingly of Scandinavian origin


When used properly, Principal Component Analysis (PCA) is an extraordinarily powerful tool and one of the best ways to study fine-scale genetic substructures within Europe.

The PCA plot below is based on Global25 data and focuses on the genetic relationship between Wielbark Goths and Medieval Poles, including those from the Viking Age, in the context of present-day European genetic variation.


I'd say that it's a wonderfully self-explanatory plot, but here are some key observations:

- the Wielbark Goths (Poland_Wielbark_IA) and Medieval Poles (Poland_Middle_Ages) are two distinct populations

- moreover, the Wielbark Goths form a relatively compact Scandinavian-related cluster and must surely represent a homogeneous population overwhelmingly of Scandinavian origin

- on the other hand, the Medieval Poles form a more extensive and heterogeneous cluster that overlaps with present-day groups all the way from Central Europe to the East Baltic, and that's because they are likely to be in large part of mixed origin

- I know for a fact that at least some of these early Poles harbor recent admixture, because their burials are similar to those of Vikings and their haplotypes have been shown to be partly of Scandinavian origin (see here)

- one of the Wielbark females is an obvious genetic outlier (Poland_Wielbark_IA_outlier), and basically looks like a first generation mixture between a Goth and a Balt.

Please note that the PCA is only based on relatively high quality genomes, so as not to confuse the picture with spurious results and noise. Also, all outliers with potentially significant ancestry from outside of Central, Eastern and Northern Europe were removed from the analysis. The relevant datasheet is available here.

However, sanity checks are always important when studying complex topics like fine-scale genetic ancestry. To that end I've prepared a graph based on f3-statistics of the form f3(X,Estonia_BA,Cameroon_SMA) and f3(X,Ireland_Megalithic,Cameroon_SMA), which reproduces the key features of my PCA. The relevant datasheet is available here.

Polish groups from the Middle Ages are marked with the MA suffix, while the Iron Age Wielbark Goths are marked with the IA suffix.

If you're wondering why I plotted the f3-statistics that I did, take a look at this (all groups largely of Scandinavian origin are emboldened):

f3(X,Estonia_BA,Cameroon_SMA)
Poland_Legowo_MA 0.226406
Poland_Ostrow_Lednicki_MA 0.225996
Poland_Plonsk_MA 0.225017
Poland_Trzciniec_Culture 0.224215
Poland_Lad_MA 0.224142
Poland_Viking 0.223838
Poland_Niemcza_MA 0.223659
Poland_Weklice_IA 0.223549
Poland_Kowalewko_IA 0.222584
Poland_Pruszcz_Gdanski_IA 0.222324
Sweden_Viking 0.222091
Russia_Viking 0.222042
Poland_Maslomecz_IA 0.221914
Norway_Viking 0.221825
Denmark_EarlyViking 0.221257
Denmark_Viking 0.221174
England_Viking 0.220979

f3(X,Ireland_Megalithic,Cameroon_SMA)
Poland_Maslomecz_IA 0.219816
Poland_Weklice_IA 0.219501
Denmark_Viking 0.2192
Poland_Kowalewko_IA 0.219176
Poland_Ostrow_Lednicki_MA 0.218916
Norway_Viking 0.218854
Poland_Pruszcz_Gdanski_IA 0.218684
Sweden_Viking 0.218626
Denmark_EarlyViking 0.218529
England_Viking 0.218308
Russia_Viking 0.217999
Poland_Viking 0.217914
Poland_Plonsk_MA 0.217756
Poland_Lad_MA 0.217719
Poland_Legowo_MA 0.21765
Poland_Niemcza_MA 0.217001
Poland_Trzciniec_Culture 0.216551
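
For anyone who wants to reproduce the gist of the graph from the two lists above, here's a minimal Python sketch. It pairs the two statistics for each group into one (x, y) point and also takes their difference. The variable names are my own, the plotting step is left out, and this isn't necessarily how the original graph was produced.

```python
# Outgroup f3 values copied from the two lists above.
f3_estonia = {
    "Poland_Legowo_MA": 0.226406,
    "Poland_Ostrow_Lednicki_MA": 0.225996,
    "Poland_Plonsk_MA": 0.225017,
    "Poland_Trzciniec_Culture": 0.224215,
    "Poland_Lad_MA": 0.224142,
    "Poland_Viking": 0.223838,
    "Poland_Niemcza_MA": 0.223659,
    "Poland_Weklice_IA": 0.223549,
    "Poland_Kowalewko_IA": 0.222584,
    "Poland_Pruszcz_Gdanski_IA": 0.222324,
    "Sweden_Viking": 0.222091,
    "Russia_Viking": 0.222042,
    "Poland_Maslomecz_IA": 0.221914,
    "Norway_Viking": 0.221825,
    "Denmark_EarlyViking": 0.221257,
    "Denmark_Viking": 0.221174,
    "England_Viking": 0.220979,
}
f3_ireland = {
    "Poland_Maslomecz_IA": 0.219816,
    "Poland_Weklice_IA": 0.219501,
    "Denmark_Viking": 0.2192,
    "Poland_Kowalewko_IA": 0.219176,
    "Poland_Ostrow_Lednicki_MA": 0.218916,
    "Norway_Viking": 0.218854,
    "Poland_Pruszcz_Gdanski_IA": 0.218684,
    "Sweden_Viking": 0.218626,
    "Denmark_EarlyViking": 0.218529,
    "England_Viking": 0.218308,
    "Russia_Viking": 0.217999,
    "Poland_Viking": 0.217914,
    "Poland_Plonsk_MA": 0.217756,
    "Poland_Lad_MA": 0.217719,
    "Poland_Legowo_MA": 0.21765,
    "Poland_Niemcza_MA": 0.217001,
    "Poland_Trzciniec_Culture": 0.216551,
}

# One (x, y) point per group, as in a two-axis scatter plot.
points = {group: (f3_estonia[group], f3_ireland[group]) for group in f3_estonia}

# Difference between the two statistics: larger values mean relatively more
# shared drift with Estonia_BA than with Ireland_Megalithic.
diff = {group: x - y for group, (x, y) in points.items()}

# List the groups from the most Estonia_BA-shifted to the least.
for group in sorted(diff, key=diff.get, reverse=True):
    print(f"{group:28s} {diff[group]:.6f}")
```

With these values, every group with the MA suffix ends up with a larger difference than every group with the IA suffix, which is the same separation that the PCA shows.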

Interestingly, the Middle Bronze Age samples associated with the Trzciniec Culture (Poland_Trzciniec_Culture) show a closer genetic relationship to Medieval Poles than to Wielbark Goths or Northwestern Europeans. This is indeed the case both in terms of genome-wide and uniparental markers, including some very specific lineages under Y-chromosome haplogroup R1a.

But that's a much more complex issue that I'll leave for another time. So please stay tuned.

See also...

Slavs have little, if any, Scytho-Sarmatian ancestry

Saturday, November 4, 2023

Slavs have little, if any, Scytho-Sarmatian ancestry


Here's an abstract of a new study from the David Reich Lab about ancient Slavs, titled "Genetic identification of Slavs in Migration Period Europe using an IBD sharing graph". Emphasis is mine:

Popular methods of genetic analysis relying on allele frequencies such as PCA, ADMIXTURE and qpAdm are not suitable for distinguishing many populations that were important historical actors in the Migration Period Europe. For instance, differentiating Slavic, Germanic, and Celtic people is very difficult relying on these methods, but very helpful for archaeologists given a large proportion of graves with no inventory and frequent adoption of a different culture. To overcome these problems, we applied a method based on autosomal haplotypes. Imputation of missing genotypes and phasing was performed according to a protocol by Rubinacci et al. (2021), and IBD inference was done for ancient Eurasian individuals with data available at >600,000 1240K sites. IBD links for a subset of these individuals were represented as a graph, visualized with a force-directed layout algorithm, and clusters in this graph are inferred with the Leiden algorithm. One of the clusters in the IBD graph emerged that includes nearly all individuals in the dataset annotated archaeologically as “Slavic”. According to PCA a hypothesis for the origin of this population can be proposed: it was formed by admixture of a Baltic-related group with East Germanic people and Sarmatians or Scythians. The individuals belonging to the “Slavic” IBD sharing cluster form a chronological gradient on the PCA plot, with the earliest samples close to the Baltic LBA/EIA group. Later “Slavic” individuals are shifted to the right, closer to Central and Southern Europeans and probably reflecting further admixture of Slavs with local populations during the Migration Period.

Apparently this abstract is causing a bit of confusion online because of the mention of possible Sarmatian or Scythian ancestry in Slavs.

However, it's important to understand that the authors are referring to certain Slavic or even just Slavic-related individuals, usually from culturally heterogeneous frontier settlements deep in what is now Russia.

So yes, it's possible that some of these individuals carry Sarmatian, Scythian or other exotic eastern ancestry. But even if this is true, then obviously we can't extend this inference to all ancient and modern-day Slavs.

Indeed, below is a G25/Vahaduo Principal Component Analysis (PCA) that shows why modern-day Slavic speakers can't be linked genetically to Sarmatians or Scythians. To experience a more detailed version of the PCA paste the data here into the relevant field here.

As you can see, dear reader, most of the Slavs (Belarusians, Poles, Ukrainians and many Russians) cluster with the Irish near the western end of the plot.

Some Russians are shifted significantly east of them along the "Uralic cline" and, as a result, they cluster with various Uralic speakers such as Mordovians. That's because when Slavs migrated deep into what is now northern Russia they mixed with Uralic speakers who were there before them.

Most of the Sarmatians and Scythians form a cluster southeast of the Slavs and Irish because they carry significant levels of East Asian ancestry. This type of eastern ancestry is basically missing in modern-day Slavs (see here).

Several of the Scythians cluster among the Slavs and Irish, but that's because they're genetic outliers, whose existence, if anything, suggests that some Scythians had significant Slavic-related and/or Irish-related ancestry.

Now, even though most of the Slavs do cluster with the Irish in the above PCA plot, I strongly disagree with the authors of the abstract when they claim that "differentiating Slavic, Germanic, and Celtic people is very difficult" with PCA. It's actually pretty damn easy and I've been doing it successfully for many years. For instance, see here.

See also...

Wielbark Goths were overwhelmingly of Scandinavian origin

The Caucasus is a semipermeable barrier to gene flow

Friday, September 22, 2023

The Caucasus is a semipermeable barrier to gene flow


The scientists at the David Reich Lab are a clever bunch. But they're not always on top of things. And this can be a problem.

For instance, they fail to understand that the Caucasus has effectively stymied human gene flow between Eastern Europe and West Asia through the ages. That is, the Caucasus is a semipermeable barrier to human gene flow.


Until they accept and understand this fact, they won't be able to accurately characterize the ancestry of the ancient human populations of the Pontic-Caspian (PC) steppe, including the Yamnaya people.

In turn, they also won't be able to locate the Indo-Anatolian homeland.

Now, the Caucasus isn't a barrier to gene flow because it's difficult to cross, and, indeed, many human populations have managed to cross it since the Upper Palaeolithic. As a result, the peoples of the North Caucasus are today genetically more similar to the populations of the Near East than to those of Europe.

In fact, the clear genetic gap between most West Asian and Eastern European populations through the ages is actually caused by the extreme differentiation between the mountain ecology of the Caucasus and the steppe ecology of the PC steppe.

That is, the Caucasus is ecologically so different from the PC steppe that it has been practically impossible for human populations to make the transition from one to the other.

Indeed, it's important to understand that there's no reliable record of any prehistoric human population successfully making the transition from the mountain ecology to the steppe ecology in this part of the world.

In other words, contrary to claims by people like David Reich and David Anthony, there's no solid evidence of any significant prehistoric human migration from the Caucasus, or from south of the Caucasus, to Eastern Europe by hunter-gatherers, farmers or pastoralists.


But, you might ask, how on earth did the Yamnaya people get their significant Caucasus/Iranian-related admixture if not via a mass migration from the Caucasus and/or the Iranian Plateau, as is often argued by the above mentioned scholars and their many colleagues?

Well, obviously, the diffusion of alleles from one population to another can happen without migration. All that is needed is a contact zone between them.

The ancient DNA and archeological data currently available from the Caucasus and the PC steppe suggest to me that there was at least one such contact zone in this area bringing together the peoples of the mountain and steppe ecoregions. This allowed them to mix, probably gradually and over a long period of time, by and large without leaving their ecoregions.

Once the Caucasus alleles entered the steppe, they were spread around by local hunter-gatherers and pastoralists who were highly mobile and well adapted to the steppe ecology.


Someone should write a paper about this.

See also...

The Nalchik surprise

Understanding the Eneolithic steppe

Matters of geography

Thursday, August 31, 2023

The story of the Khvalynsk people


I'm totally serious when I say that this video is more objective, informative and accurate than any peer-reviewed paper published to date when it comes to the genetic origins of the Khvalynsk people.


However, that's not to say it's perfect. I think it misses some important details. See here...

The Caucasus is a semipermeable barrier to gene flow

Dear David, Nick, Iosif...let's set the record straight

Understanding the Eneolithic steppe

Saturday, August 12, 2023

Frustrated comedians


I've now had the chance to read and digest the following two papers in Science about the origin of Indo-European languages:

Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages, Heggarty et al.

The genetic history of the Southern Arc: A bridge between West Asia and Europe, Lazaridis, Alpaslan-Roodenberg et al.

The Heggarty et al. paper is pure fluff. It offers nothing useful or even remotely interesting.

For instance, the authors derive some Indo-European languages in Europe from Anatolian farmers and others from Caucasus hunter-gatherers (see here). This is not just exceedingly far-fetched, but also obviously forced.

Wolfgang Haak and Johannes Krause, you should be deeply ashamed of yourselves.

I've already commented extensively about the Lazaridis, Alpaslan-Roodenberg et al. paper (for example, see here). But the one thing I need to add is that this paper is what it is due to the inherent bias of some of the lead authors to push the Indo-Anatolian homeland into West Asia. I won't even bother mentioning their names, because we all know who they are.

See also...

Crazy stuff

Dear David, Nick, Iosif...let's set the record straight

The story of R-V1636

Sunday, July 23, 2023

Dear Sandra, Wolfgang...we have a problem


In their recent paper, titled Early contact between late farming and pastoralist societies in southeastern Europe, Penske et al. make the following claim:

By contrast, Yamnaya Caucasus individuals from the southern steppe can be modelled as a two-way model of around 76% Steppe Eneolithic and 26% Caucasus Eneolithic/Maykop, confirming the findings of Lazaridis and colleagues 47. This two-way mix (40% + 60%, respectively) also provides a well-fit model (P = 0.09) for the Ozera outlier individual, consistent with the position in PCA and corroborating an influence from the Caucasus.

Err, nope.

The Ozera Yamnaya outlier, a female dated to 3096-2913 calBCE, is, in fact, a ~50/50 mix between standard Yamnaya and Late Maykop. It's a result that is totally unambiguous.

There are a number of ways to demonstrate this fact. For example, it can be done with the qpAdm software that was also used by Penske et al., except with different outgroups or right pops. Please note that in my dataset the Ozera outlier is labeled Ukraine_Ozera_EBA_Yamnaya_o.

right pops:
Cameroon_SMA
Levant_N
Iran_GanjDareh_N
Iran_C_SehGabi
Georgia_HG
Turkey_N
Serbia_IronGates_Mesolithic
Russia_WestSiberia_HG
Russia_Karelia_HG
Latvia_HG
Russia_Boisman_MN
Brazil_LapaDoSanto_9600BP

Ukraine_Ozera_EBA_Yamnaya_o
Russia_Caucasus_EneolithicMaykop 0.554±0.031
Russia_Steppe_Eneolithic 0.446±0.031
P-value 0.00109868 (FAIL)


Ukraine_Ozera_EBA_Yamnaya_o
Russia_LateMaykop 0.512±0.035
Russia_Samara_EBA_Yamnaya 0.488±0.035
P-value 0.462447 (PASS)
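
To make the comparison between the two models explicit, here's a minimal Python sketch that takes the qpAdm numbers above and applies a pass/fail rule. The 0.05 cut-off is an assumption on my part (it's a common convention, but the post doesn't state which threshold was used), and the dictionary layout and function name are purely illustrative.

```python
# qpAdm output copied from the two Ozera models above:
# (source, coefficient, standard error) per source, plus the P-value.
models = {
    "Eneolithic Maykop + Steppe Eneolithic": {
        "sources": [
            ("Russia_Caucasus_EneolithicMaykop", 0.554, 0.031),
            ("Russia_Steppe_Eneolithic", 0.446, 0.031),
        ],
        "p_value": 0.00109868,
    },
    "Late Maykop + Samara Yamnaya": {
        "sources": [
            ("Russia_LateMaykop", 0.512, 0.035),
            ("Russia_Samara_EBA_Yamnaya", 0.488, 0.035),
        ],
        "p_value": 0.462447,
    },
}

P_CUTOFF = 0.05  # assumed convention, not stated in the post


def verdict(p_value, cutoff=P_CUTOFF):
    """Label a model PASS if its P-value is at or above the cut-off."""
    return "PASS" if p_value >= cutoff else "FAIL"


for name, model in models.items():
    # Mixture coefficients should add up to 1.
    total = sum(coef for _, coef, _ in model["sources"])
    print(f"{name}: {verdict(model['p_value'])} (coefficients sum to {total:.3f})")
    for source, coef, se in model["sources"]:
        # How many standard errors the coefficient sits away from zero.
        print(f"  {source}: {coef:.3f}, {coef / se:.1f} standard errors from zero")
```

In both models the coefficients are many standard errors away from zero, so the difference between them lies in the overall fit, not in whether the sources are needed.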

I can also do it with the Global25/Vahaduo method. And you, dear reader, can too, by putting the Target and Source Global25 coords from the text file here into the relevant fields here.

Target: Ukraine_Ozera_EBA_Yamnaya_o
Distance: 2.9292% / 0.02929202
50.6 Russia_Samara_EBA_Yamnaya
49.4 Russia_Caucasus_LateMaykop
0.0 Russia_Caucasus_EneolithicMaykop
0.0 Russia_Steppe_Eneolithic
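
For readers curious about what a distance-based fit like this involves, here's a minimal Python sketch for the two-source case. It finds the mixing weight that puts the mix as close as possible to the target in coordinate space and reports the remaining Euclidean distance. This is not Vahaduo's actual algorithm, which isn't described in this post, and the coordinates below are made-up toy values, not real Global25 data.

```python
import math


def fit_two_way(target, source_a, source_b):
    """Return (weight_a, distance) for the best mix w*a + (1-w)*b with 0 <= w <= 1."""
    diff_ab = [a - b for a, b in zip(source_a, source_b)]
    diff_tb = [t - b for t, b in zip(target, source_b)]
    denom = sum(d * d for d in diff_ab)
    if denom == 0:
        raise ValueError("the two sources have identical coordinates")
    # Project the target onto the line through the two sources,
    # then clip so that neither weight is negative.
    w = sum(x * y for x, y in zip(diff_tb, diff_ab)) / denom
    w = min(1.0, max(0.0, w))
    fitted = [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]
    distance = math.sqrt(sum((t - f) ** 2 for t, f in zip(target, fitted)))
    return w, distance


# Hypothetical toy coordinates (three dimensions only, for illustration).
source_a = [0.125, 0.080, -0.010]
source_b = [0.095, 0.120, 0.030]
target = [0.111, 0.099, 0.011]

w, dist = fit_two_way(target, source_a, source_b)
print(f"source_a: {100 * w:.1f}%  source_b: {100 * (1 - w):.1f}%  distance: {dist:.6f}")
```

With more than two candidate sources the same idea needs a constrained optimizer, which is where sources that don't help the fit end up at 0.0, as in the output above.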

Moreover, here's a self-explanatory Principal Component Analysis (PCA) plot that illustrates why my Late Maykop/Samara Yamnaya combo is much better than the reference populations used by Penske and colleagues. It was done with the PCA tools here.

I'm pointing this out for two main reasons. First of all, this is a fairly obvious mistake that should've been avoided, especially considering the level of expertise and experience among the authors (such as Wolfgang Haak and Johannes Krause).

Secondly, it's important to understand that the Ozera outlier comes out almost exactly 50% Samara Yamnaya because the standard Yamnaya genotype already existed well before she was alive, and thus she cannot be used to corroborate any sort of influence from the Caucasus in the formation of the mainstream Yamnaya population.


As for the Yamnaya Caucasus individuals, I don't know why Penske et al. attempted to model their ancestry as a group, because they don't form a coherent genetic cluster. RK1001 and ZO2002 are fairly similar to standard Yamnaya samples, while RK1007 and SA6010 resemble Eneolithic steppe samples from the Progress burial site. This is what happens when I try to reproduce the Penske et al. model with my outgroups.

Russia_Caucasus_EBA_Yamnaya
Russia_Caucasus_EneolithicMaykop 0.187±0.019
Russia_Steppe_Eneolithic 0.813±0.019
P-value 4.15842e-06 (HARD FAIL)

Oh, and Penske et al. modeled the ancestry of mainstream Yamnaya as a three-way mixture with Steppe Eneolithic, Caucasus Eneolithic/Maykop and Ukraine Neolithic (or Ukraine N). They succeeded, but with my outgroups it's another hard fail.

Russia_Samara_EBA_Yamnaya
Russia_Caucasus_EneolithicMaykop 0.177±0.017
Russia_Steppe_Eneolithic 0.706±0.026
Ukraine_N 0.116±0.014
P-value 4.73919e-07 (HARD FAIL)

Admittedly, proximal models aren't easy to get right. And if you throw enough outgroups into a model, a large proportion of plausible models will fail. But I'm somewhat taken aback by these poor statistical fits.

In my opinion, mainstream Yamnaya doesn't harbor any Caucasus ancestry that wasn't already present on the Pontic-Caspian steppe during the Eneolithic or even much earlier (see here). But ultimately this problem can only be solved with direct evidence from ancient DNA, so let's now wait patiently for the right samples.

Citation...

Penske et al., Early contact between late farming and pastoralist societies in southeastern Europe, Nature, https://doi.org/10.1038/s41586-023-06334-8

See also...

Understanding the Eneolithic steppe

Wednesday, July 19, 2023

Early contact between farmers and pastoralists in ancient Europe (Penske et al. 2023)


I can't wait to get stuck into the data from the new Penske et al. paper. This is likely to be the main topic on this blog for the next few weeks, or perhaps even months.

Early contact between late farming and pastoralist societies in southeastern Europe

By the way, I think it's hilarious how the authors totally ignored the fact that the North Pontic region is located in Eastern Europe. Instead they used the term Eurasian steppes, suggesting that Western Steppe Herders (WSH) may have come not from Eastern Europe but from some part of Asia. Haha.

See also...

Dear Sandra, Wolfgang...we have a problem

Saturday, April 8, 2023

Dear Harald...


I've started analyzing the Identity-by-Descent (IBD) data from the recent Ringbauer et al. preprint (see here). Unfortunately, it'll take me a few weeks to do this properly, so I won't be able to write anything detailed on the topic for a while.

In the meantime, this is the comment that I left for the authors at bioRxiv (at this time it's still being approved, but it should appear there within a day or so, possibly along with a reply from the authors):

Hello authors,

Thanks for the interesting preprint and data. However, I'd like to see you address a couple of technical issues and perhaps one theoretical issue in the final manuscript:

- the output you posted shows some unusual results, which are potentially false positives that appear to be concentrated among the shotgun and noUDG samples. I'm guessing that this is due to the same types of ancient DNA damage creating IBD-like patterns in these samples. If so, isn't there a risk that many or even most of the individuals in your analysis are affected by this problem to some degree, which might be skewing your estimates of genealogical relatedness between them?

- many individuals from groups that have experienced founder effects, such as Ashkenazi Jews, appear to be close genetic cousins, even though they're not genealogical cousins. Basically, the reason for this is reduced haplotype diversity in such populations. Have you considered the possibility that at least some of the close relationships that you're seeing between individuals and populations might be exaggerated by founder effects?

- thanks to ancient DNA we've learned that the Yamnaya phenomenon isn't just an archeological horizon, but also a closely related and genetically very similar group of people. Indeed, in my mind, ancient DNA has helped to redefine the Yamnaya concept, with Y-chromosome haplogroup R1b-Z2103 now being one of the key traits of the Yamnaya identity. So considering that the Corded Ware people are not rich in R1b-Z2103, and even the earliest Corded Ware individuals are somewhat different from the Yamnaya people in terms of genome-wide genetic structure, it doesn't seem right to keep claiming that the Corded Ware population is derived from Yamnaya. I can't see anything in your IBD data that would preclude the idea that the Corded Ware and Yamnaya peoples were different populations derived from the same as yet unsampled pre-Yamnaya/post-Sredny Stog steppe group.

See also...

Dear Harald #2

On the origin of the Corded Ware people

Monday, February 13, 2023

Dear David, Nick, Iosif...let me tell you about Yamnaya


Lazaridis, Alpaslan-Roodenberg et al. recently claimed that the Yamnaya people of the Pontic-Caspian (PC) steppe carried "substantial" ancestry from what is now Armenia or surrounds.

However, this claim is essentially false.

Only one individual associated with the Yamnaya culture shows an unambiguous signal of such ancestry. This is a female usually labeled Ukraine_Yamnaya_Ozera_o:I1917. The "o" suffix indicates that she is an outlier from the main Yamnaya genetic cluster.

Unlike I1917, typical Yamnaya individuals carry a few per cent of ancient European farmer admixture. This ancestry is only very distantly Armenian-related via Neolithic Anatolia (see here).

It's difficult for me to understand how Lazaridis, Alpaslan-Roodenberg et al. missed this. I suspect that they relied too heavily on formal statistics and overinterpreted their results.

Formal statistics are a very useful tool in ancient DNA work. Unfortunately, they're also a relatively blunt tool that often has problems distinguishing between similar sources of gene flow.

There are arguably better methods for studying fine scale ancestry, such as Principal Component Analysis (PCA).

Below is a somewhat special PCA featuring a wide range of ancient populations that plausibly might be relevant to the genetic origins of the Yamnaya people. Unlike most PCA with ancient samples, this PCA doesn't rely on any sort of projection, so that all of the actors are interacting with each other and directly affecting the outcome.


Here's another version of the same plot with a less complicated labeling system. Note that I designed this PCA specifically to differentiate between European populations and those from the Armenian highlands, the Iranian plateau and surrounds.


And here's a close up of the part of the plot that shows the Yamnaya cluster. This cluster is made up of samples associated with the Afanasievo, Catacomb, Poltavka and Yamnaya cultures. All of the individuals in this part of the plot are closely related, which is why they're so tightly packed together. The differentiation between them is caused by admixture from different groups mostly from outside of the PC steppe.


The Yamnaya cluster can be broadly characterized as a population that formed along the genetic continuum between the Eneolithic groups of the Progress region and Neolithic foragers from the Dnieper River valley (Progress_Eneolithic and Ukraine_N, respectively). However, this cluster also shows a slight western shift that is increasingly more pronounced in the Corded Ware samples. This shift is due to the aforementioned admixture from early European farmers.

Indeed, the plot reveals two parallel clines extending west from the Progress samples. One of the clines is made up of the Yamnaya cluster and the Corded Ware samples, and pulls towards the ancient European farmers. The other cline includes Ukraine_Yamnaya_Ozera_o:I1917 and pulls towards samples from the Armenian highlands and surrounds.

Being aware of these two clines and knowing how they came about is important to understanding the genetic prehistory of the PC steppe and indeed of much of Eurasia.

At some point, probably during the late Eneolithic, a Progress-related group experienced gene flow from the west and became the Yamnaya and Corded Ware populations. Sporadically, admixture from the Armenian highlands and the Iranian plateau also entered the PC steppe, giving rise to people like the Steppe Maykop outliers and Ukraine_Yamnaya_Ozera_o:I1917.


Unfortunately, this sort of PCA doesn't offer output suitable for mixture modeling, basically because the recent genetic drift shared by many of the samples creates significant noise.

However, to check that my inferences based on the plot are correct I can create composites with specific ancestry proportions to see how they behave. In the plot below Mix1 is 80% Progress_Eneolithic and 20% Iran_Hajji_Firuz_N, Mix2 is 80% Progress_Eneolithic and 20% Armenia_EBA_Kura_Araxes, while Mix3 is 80% Progress_Eneolithic, 15% Ukraine_N and 5% Hungary_MN_Vinca (Middle Neolithic farmers from the Carpathian Basin).
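
One simple way to picture such composites is as weighted averages of per-marker allele frequencies, using the proportions given above. The Python sketch below does exactly that, but note the assumptions: the post doesn't say how the composites were actually built, the frequencies are made-up toy values for three markers, and the function name is my own.

```python
def make_composite(frequencies, weights):
    """Weighted average of per-marker allele frequencies for the listed populations."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    n_markers = len(next(iter(frequencies.values())))
    return [
        sum(weights[pop] * frequencies[pop][i] for pop in weights)
        for i in range(n_markers)
    ]


# Hypothetical allele frequencies at three markers (not real data).
toy_freqs = {
    "Progress_Eneolithic": [0.50, 0.20, 0.70],
    "Iran_Hajji_Firuz_N": [0.30, 0.80, 0.40],
    "Armenia_EBA_Kura_Araxes": [0.40, 0.70, 0.50],
    "Ukraine_N": [0.10, 0.60, 0.30],
    "Hungary_MN_Vinca": [0.90, 0.40, 0.10],
}

# Ancestry proportions as described in the text.
recipes = {
    "Mix1": {"Progress_Eneolithic": 0.80, "Iran_Hajji_Firuz_N": 0.20},
    "Mix2": {"Progress_Eneolithic": 0.80, "Armenia_EBA_Kura_Araxes": 0.20},
    "Mix3": {"Progress_Eneolithic": 0.80, "Ukraine_N": 0.15, "Hungary_MN_Vinca": 0.05},
}

composites = {name: make_composite(toy_freqs, w) for name, w in recipes.items()}
for name, freqs in composites.items():
    print(name, [round(f, 3) for f in freqs])
```

Each composite can then be treated like any other sample and checked against the cluster it's supposed to approximate.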


Obviously, we can't get Yamnaya by mixing Progress_Eneolithic with any ancients from the Armenian highlands or the Iranian plateau. On the other hand, Mix3 works quite well, at least in the first two dimensions. In some of the other dimensions genetic drift specific to Ukraine_N pulls it away from the Yamnaya cluster, but this is to be expected.

By the way, the plots were created with the excellent Vahaduo Custom PCA tool freely available here. It's well worth trying the interactive 3D option using my PCA data. The relevant datasheet is available here.

See also...

Dear David, Nick, Iosif...let's set the record straight

The Caucasus is a semipermeable barrier to gene flow

Friday, January 13, 2023

Dear David, Nick, Iosif...let's set the record straight


Almost a decade ago scientists at the David Reich Lab extracted DNA from the remains of three men from the Khvalynsk II cemetery at the northern end of the Pontic-Caspian (PC) steppe.

These Eneolithic Eastern Europeans showed significant genetic heterogeneity, with highly variable levels of Eastern Hunter-Gatherer (EHG) and Near Eastern-related ancestry components.

As a result, the people at the David Reich Lab concluded that the Eneolithic populations of the PC steppe formed from a relatively recent admixture between local hunter-gatherers and Near Eastern migrants.

Unfortunately, this view has since become the consensus among scientists working with ancient DNA.

I say unfortunately because there's a more straightforward and indeed obvious explanation for the genetic heterogeneity among the samples from Khvalynsk II. It's also the only correct explanation, and it doesn't involve any recent gene flow from the Near East.

Here it is, in point form, as simply as I can put it:

- EHG is best represented by samples from Karelia and Lebyazhinka, which are modern-day Russian localities in the forest zone and on the border between the steppe and the forest-steppe, respectively

- Khvalynsk II is also located on the boundary between the steppe and the forest-steppe, and very far from the Near East

- so the genetic structure of the people buried at Khvalynsk II does represent an admixture event

- however, this admixture event simply involved an EHG population from the forest-steppe and a very distantly Near Eastern-related group native to the steppe (that is, two different Eastern European populations).

I've written this blog post because I think David Reich, Nick Patterson, Iosif Lazaridis and colleagues should finally admit that they didn't quite get this right. And it'd be nice if they could put out a paper sometime soon in which they set the record straight.

See also...