
Saturday, April 8, 2023

Dear Harald...


I've started analyzing the Identity-by-Descent (IBD) data from the recent Ringbauer et al. preprint (see here). Unfortunately, it'll take me a few weeks to do this properly, so I won't be able to write anything detailed on the topic for a while.

In the meantime, this is the comment that I left for the authors at bioRxiv (it's still awaiting approval, but it should appear there within a day or so, possibly along with a reply from the authors):

Hello authors,

Thanks for the interesting preprint and data. However, I'd like to see you address a couple of technical issues and perhaps one theoretical issue in the final manuscript:

- the output you posted shows some unusual results, which are potentially false positives that appear to be concentrated among the shotgun and noUDG samples. I'm guessing that this is due to the same types of ancient DNA damage creating IBD-like patterns in these samples. If so, isn't there a risk that many or even most of the individuals in your analysis are affected by this problem to some degree, which might be skewing your estimates of genealogical relatedness between them?

- many individuals from groups that have experienced founder effects, such as Ashkenazi Jews, appear to be close genetic cousins, even though they're not genealogical cousins. Basically, the reason for this is reduced haplotype diversity in such populations. Have you considered the possibility that at least some of the close relationships that you're seeing between individuals and populations might be exaggerated by founder effects?

- thanks to ancient DNA we've learned that the Yamnaya phenomenon isn't just an archeological horizon, but also a closely related and genetically very similar group of people. Indeed, in my mind, ancient DNA has helped to redefine the Yamnaya concept, with Y-chromosome haplogroup R1b-Z2103 now being one of the key traits of the Yamnaya identity. So considering that the Corded Ware people are not rich in R1b-Z2103, and even the earliest Corded Ware individuals are somewhat different from the Yamnaya people in terms of genome-wide genetic structure, it doesn't seem right to keep claiming that the Corded Ware population is derived from Yamnaya. I can't see anything in your IBD data that would preclude the idea that the Corded Ware and Yamnaya peoples were different populations derived from the same as yet unsampled pre-Yamnaya/post-Sredny steppe group.

See also...

On the origin of the Corded Ware people

Monday, February 13, 2023

Dear David, Nick, Iosif...let me tell you about Yamnaya


Lazaridis, Alpaslan-Roodenberg et al. recently claimed that the Yamnaya people of the Pontic-Caspian (PC) steppe carried "substantial" ancestry from what is now Armenia or surrounds.

However, this claim is essentially false.

Only one individual associated with the Yamnaya culture shows an unambiguous signal of such ancestry. This is a female usually labeled Ukraine_Yamnaya_Ozera_o:I1917. The "o" suffix indicates that she is an outlier from the main Yamnaya genetic cluster.

Unlike I1917, typical Yamnaya individuals carry a few per cent of ancient European farmer admixture. This ancestry is only very distantly Armenian-related via Neolithic Anatolia (see here).

It's difficult for me to understand how Lazaridis, Alpaslan-Roodenberg et al. missed this. I suspect that they relied too heavily on formal statistics and overinterpreted their results.

Formal statistics are a very useful tool in ancient DNA work. Unfortunately, they're also a relatively blunt tool that often has problems distinguishing between similar sources of gene flow.

There are arguably better methods for studying fine scale ancestry, such as Principal Component Analysis (PCA).

Below is a somewhat special PCA featuring a wide range of ancient populations that plausibly might be relevant to the genetic origins of the Yamnaya people. Unlike most PCA with ancient samples, this PCA doesn't rely on any sort of projection, so that all of the actors are interacting with each other and directly affecting the outcome.
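
For readers unfamiliar with the distinction, here is a minimal sketch of the difference between a PCA with projection and one without. It uses random toy genotypes rather than real data, and the names `reference`, `extra` and `pca_axes` are made up for the illustration. It isn't the pipeline behind the plots in this post, just the general idea: with projection only the reference samples define the axes, while without it every sample does.

```python
import numpy as np

def pca_axes(X, k=2):
    """Return the column means and the top-k principal axes of X (samples x markers)."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix; the rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

rng = np.random.default_rng(0)
# Toy allele-count matrices (0/1/2), NOT real genotypes
reference = rng.integers(0, 3, size=(20, 50)).astype(float)
extra = rng.integers(0, 3, size=(5, 50)).astype(float)

# (a) With projection: the axes come from the reference samples only,
#     and the extra samples are placed on those axes afterwards.
mean_ref, axes_ref = pca_axes(reference)
projected = (extra - mean_ref) @ axes_ref.T

# (b) Without projection: every sample helps to define the axes.
everyone = np.vstack([reference, extra])
mean_all, axes_all = pca_axes(everyone)
joint = (everyone - mean_all) @ axes_all.T

print(projected.shape, joint.shape)
```

In case (a) the extra samples have no influence on where the axes point, whereas in case (b) they are part of the calculation from the start.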


Here's another version of the same plot with a less complicated labeling system. Note that I designed this PCA specifically to differentiate between European populations and those from the Armenian highlands, the Iranian plateau and surrounds.


And here's a close up of the part of the plot that shows the Yamnaya cluster. This cluster is made up of samples associated with the Afanasievo, Catacomb, Poltavka and Yamnaya cultures. All of the individuals in this part of the plot are closely related, which is why they're so tightly packed together. The differentiation between them is caused by admixture from different groups mostly from outside of the PC steppe.


The Yamnaya cluster can be broadly characterized as a population that formed along the genetic continuum between the Eneolithic groups of the Progress region and Neolithic foragers from the Dnieper River valley (Progress_Eneolithic and Ukraine_N, respectively). However, this cluster also shows a slight western shift that is increasingly pronounced in the Corded Ware samples. This shift is due to the aforementioned admixture from early European farmers.

Indeed, the plot reveals two parallel clines extending west from the Progress samples. One of the clines is made up of the Yamnaya cluster and the Corded Ware samples, and pulls towards the ancient European farmers. The other cline includes Ukraine_Yamnaya_Ozera_o:I1917 and pulls towards samples from the Armenian highlands and surrounds.

Being aware of these two clines and knowing how they came about is important to understanding the genetic prehistory of the PC steppe and indeed of much of Eurasia.

At some point, probably during the late Eneolithic, a Progress-related group experienced gene flow from the west and became the Yamnaya and Corded Ware populations. Sporadically, admixture from the Armenian highlands and the Iranian plateau also entered the PC steppe, giving rise to people like the Steppe Maykop outliers and Ukraine_Yamnaya_Ozera_o:I1917.


Unfortunately, this sort of PCA doesn't offer output suitable for mixture modeling, basically because the recent genetic drift shared by many of the samples creates significant noise.

However, to check that my inferences based on the plot are correct I can create composites with specific ancestry proportions to see how they behave. In the plot below Mix1 is 80% Progress_Eneolithic and 20% Iran_Hajji_Firuz_N, Mix2 is 80% Progress_Eneolithic and 20% Armenia_EBA_Kura_Araxes, while Mix3 is 80% Progress_Eneolithic, 15% Ukraine_N and 5% Hungary_MN_Vinca (Middle Neolithic farmers from the Carpathian Basin).


Obviously, we can't get Yamnaya by mixing Progress_Eneolithic with any ancients from the Armenian highlands or the Iranian plateau. On the other hand, Mix3 works quite well, at least in the first two dimensions. In some of the other dimensions genetic drift specific to Ukraine_N pulls it away from the Yamnaya cluster, but this is to be expected.
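
For anyone who wants to experiment with the same idea, a composite can be roughly sketched as a weighted average of population coordinates. This is only an assumption about one simple way to do it, not a description of how the composites in the plot were built. The two-dimensional coordinates below are invented placeholders, and `make_composite` is a hypothetical helper. Only the mixture proportions come from the text above.

```python
import numpy as np

# Placeholder 2-D coordinates, invented for the example (NOT real PCA output)
coords = {
    "Progress_Eneolithic": [0.10, 0.02],
    "Iran_Hajji_Firuz_N": [0.02, -0.08],
    "Armenia_EBA_Kura_Araxes": [0.04, -0.06],
    "Ukraine_N": [0.14, 0.06],
    "Hungary_MN_Vinca": [0.12, -0.10],
}

# Mixture proportions as given in the post
recipes = {
    "Mix1": {"Progress_Eneolithic": 0.80, "Iran_Hajji_Firuz_N": 0.20},
    "Mix2": {"Progress_Eneolithic": 0.80, "Armenia_EBA_Kura_Araxes": 0.20},
    "Mix3": {"Progress_Eneolithic": 0.80, "Ukraine_N": 0.15, "Hungary_MN_Vinca": 0.05},
}

def make_composite(coords, weights):
    """Weighted average of population coordinates; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(w * np.asarray(coords[pop], dtype=float) for pop, w in weights.items())

mixes = {name: make_composite(coords, weights) for name, weights in recipes.items()}
for name, position in mixes.items():
    print(name, position)
```

With real coordinates, the composites can then be plotted alongside the Yamnaya cluster to see which recipe lands closest to it.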

By the way, the plots were created with the excellent Vahaduo Custom PCA tool freely available here. It's well worth trying the interactive 3D option using my PCA data. The relevant datasheet is available here.

See also...

Dear David, Nick, Iosif...let's set the record straight

Friday, January 13, 2023

Dear David, Nick, Iosif...let's set the record straight


Almost a decade ago scientists at the David Reich Lab extracted DNA from the remains of three men from the Khvalynsk II cemetery at the northern end of the Pontic-Caspian (PC) steppe.

These Eneolithic Eastern Europeans showed significant genetic heterogeneity, with highly variable levels of Eastern Hunter-Gatherer (EHG) and Near Eastern-related ancestry components.

As a result, the people at the David Reich Lab concluded that the Eneolithic populations of the PC steppe formed from a relatively recent admixture between local hunter-gatherers and Near Eastern migrants.

Unfortunately, this view has since become the consensus among scientists working with ancient DNA.

I say unfortunately because there's a more straightforward and indeed obvious explanation for the genetic heterogeneity among the samples from Khvalynsk II. It's also the only correct explanation, and it doesn't involve any recent gene flow from the Near East.

Here it is, in point form, as simply as I can put it:

- EHG is best represented by samples from Karelia and Lebyazhinka, which are modern-day Russian localities in the forest zone and on the border between the steppe and the forest-steppe, respectively

- Khvalynsk II is also located on the boundary between the steppe and the forest-steppe, and very far from the Near East

- so the genetic structure of the people buried at Khvalynsk II does represent an admixture event

- however, this admixture event simply involved an EHG population from the forest-steppe and a very distantly Near Eastern-related group native to the steppe (that is, two different Eastern European populations).

I've written this blog post because I think David Reich, Nick Patterson, Iosif Lazaridis and colleagues should finally admit that they didn't quite get this right. And it'd be nice if they could put out a paper sometime soon in which they set the record straight.



Monday, January 2, 2023

Trying to catch up


For starters, I need population labels for many of these G25 coords:

Koptekin et al. 2022

Peltola et al. 2022

Skourtanioti et al. 2023

Varela et al. 2023

Wang et al. 2023

Yu et al. 2023

I'll be running more samples later this week, and I'll need help in organizing them for the G25 datasheets. See comments below for more details.

Sunday, November 13, 2022

A reappraisal of Ashkenazic maternal ancestry


Kevin Brook, who occasionally comments on this blog, has published a peer-reviewed book titled The Maternal Genetic Lineages of Ashkenazic Jews.

The book focuses on 129 mitochondrial (mtDNA) haplogroups that are found in present-day Ashkenazic Jews, and reveals that these lineages can be traced back to a wide range of places, such as Israel, Italy, Poland, Germany, North Africa, and China.

Ergo, it argues that both Israelites and converts to Judaism from a variety of gentile groups made lasting contributions to the Ashkenazic maternal gene pool. In Kevin's own words, the book also:

- shows that all Ashkenazim remain genetically linked to a significant degree to other types of Jewish populations, not only paternally but maternally as well

- disproves the myth that Cossack rapists were responsible for any of the non-Israelite DNA in Ashkenazim

- presents new DNA evidence in favor of a small contribution of Khazarian and Alan converts to Judaism to the Ashkenazic gene pool.

That makes good sense based on what I've learned over the years from studying modern and ancient genome-wide Ashkenazic DNA. More information about Kevin's book is available at the Khazaria.com website HERE.

See also...

My take on the Erfurt Jews

Tuesday, November 1, 2022

The story of R-V1636


Who wants to bet against this map? Keep in mind that ART038 (~3000 calBCE) remains the oldest sample with the V1636 and R1b Y-chromosome mutations in the West Asian ancient DNA record. Ergo, there's nothing to suggest that V1636 or R1b entered Eastern Europe from West Asia.

See also...

A tantalizing link

How relevant is Arslantepe to the PIE homeland debate?

Thursday, October 27, 2022

The Yassitepe challenge


This is about the only successful qpAdm model that I can find for the pair of Early Bronze Age (EBA) females from Yassitepe, Turkey, using a decent set of outgroups and markers. I wouldn't take it too literally, but it does suggest a potentially significant level of European ancestry, including some steppe ancestry, in these Yassitepe individuals.

TUR_Aegean_Yassitepe_EBA
AZE_Caucasus_lowlands_LN 0.565±0.054
ROU_N 0.387±0.041
RUS_Progress_En 0.048±0.022

P-value 0.103248
Full output

If anyone reading this can find a better, more convincing solution then I'd love to see it. Feel free to share it in the comments below.

Obviously, both of the Yassitepe samples are from the recent Lazaridis, Alpaslan-Roodenberg et al. paper. Their EBA dating suggests that they might be relevant to the debate over the origins of Anatolian speakers, such as the Hittites and Luwians.

See also...

Dear Iosif, about that ~2%

The precursor of the Trojans

Thursday, October 13, 2022

The Kura-Araxes people deserve better


When discussing the Kura-Araxes culture and its people it's important to understand these key points:

- there is Eastern European steppe ancestry in Kura-Araxes samples, and if you're not seeing it then you're not looking hard enough

- Armenian Kura-Araxes samples are mainly a mixture between three different groups currently best represented in the ancient DNA record by ARM_Areni_C, IRN_Hajji_Firuz_C and RUS_Darkveti-Meshoko_En

- ergo, most of the steppe ancestry in the Kura-Araxes population of what is now Armenia must have been mediated via local Chalcolithic groups like ARM_Areni_C

- Kura-Araxes samples show Mesopotamian-related ancestry, and this mustn't be ignored.

Oh, you don't believe it because you just read a big paper in Science claiming otherwise?

Well, the authors of that paper, Lazaridis, Alpaslan-Roodenberg et al., used distal mixture models to study the ancestry of their Kura-Araxes samples, and such models can miss important details.

Consider these three proximate mixture models for a relatively high quality and very homogeneous Kura-Araxes sample set from the aforementioned paper. They were done with the qpAdm software:

ARM_Kura-Araxes_Berkaber
ARM_Areni_C 0.239±0.068
IRN_Hajji_Firuz_C 0.379±0.068
RUS_Darkveti-Meshoko_En 0.382±0.054
P-value 0.285122 (Pass)
Full output

ARM_Kura-Araxes_Berkaber
IRN_Hajji_Firuz_C 0.569±0.051
RUS_Darkveti-Meshoko_En 0.363±0.058
RUS_Progress_En 0.068±0.020
P-value 0.20306 (Pass)
Full output

ARM_Kura-Araxes_Berkaber
IRN_Hajji_Firuz_C 0.531±0.060
RUS_Darkveti-Meshoko_En 0.469±0.060
P-value 0.0132579 (Fail)
Full output

Some caveats apply. For instance, the pass threshold (P-value ≥0.05) is arbitrary. But the point is that the models look much better with steppe-related and steppe reference populations (ARM_Areni_C and RUS_Progress_En, respectively).

Moreover, the unique and vital Darkveti-Meshoko population is represented by just one individual. I also have the genotypes of his brother and sister, but relatives aren't allowed in these sorts of tests.

Including a singleton in the analysis means that I can't use the inbreed: YES option, which apparently can be a bad thing. Nevertheless, these models do look very solid.
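
The three models above can be compared side by side with a short script. This is just a sketch: the coefficients, standard errors and P-values are copied from the output above, the 0.05 cut-off is the arbitrary threshold already mentioned, and dividing each coefficient by its standard error is only a rough signal-to-noise figure that I've assumed for illustration. The function `summarize` is a made-up helper, not part of qpAdm.

```python
# (source, coefficient, standard error) and P-value, copied from the models above
MODELS = [
    {"sources": [("ARM_Areni_C", 0.239, 0.068),
                 ("IRN_Hajji_Firuz_C", 0.379, 0.068),
                 ("RUS_Darkveti-Meshoko_En", 0.382, 0.054)],
     "p": 0.285122},
    {"sources": [("IRN_Hajji_Firuz_C", 0.569, 0.051),
                 ("RUS_Darkveti-Meshoko_En", 0.363, 0.058),
                 ("RUS_Progress_En", 0.068, 0.020)],
     "p": 0.20306},
    {"sources": [("IRN_Hajji_Firuz_C", 0.531, 0.060),
                 ("RUS_Darkveti-Meshoko_En", 0.469, 0.060)],
     "p": 0.0132579},
]

THRESHOLD = 0.05  # arbitrary pass threshold, as noted in the post

def summarize(model, threshold=THRESHOLD):
    """Return pass/fail, the coefficient total and a coefficient/SE ratio per source."""
    ratios = {name: coef / se for name, coef, se in model["sources"]}
    total = sum(coef for _, coef, _ in model["sources"])
    return {"passes": model["p"] >= threshold, "total": total, "ratios": ratios}

for model in MODELS:
    summary = summarize(model)
    verdict = "Pass" if summary["passes"] else "Fail"
    print(f"P-value {model['p']} ({verdict}), coefficients sum to {summary['total']:.3f}")
    for name, ratio in summary["ratios"].items():
        print(f"  {name}: coefficient/SE = {ratio:.1f}")
```

Run like this, the small RUS_Progress_En coefficient in the second model comes out at more than three standard errors from zero, and the only model that fails is the one without a steppe-related or steppe source.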

Indeed, I can also model ARM_Kura-Araxes_Berkaber as practically 100% RUS_Maykop_Novosvobodnaya, perhaps with some excess ARM_Areni_C-related input.

ARM_Kura-Araxes_Berkaber
ARM_Areni_C 0.094±0.087
RUS_Maykop_Novosvobodnaya 0.906±0.087
P-value 0.284259 (Pass)
Full output

This makes good sense, because RUS_Maykop_Novosvobodnaya can also be modeled solidly as a mixture between IRN_Hajji_Firuz_C, RUS_Darkveti-Meshoko_En and RUS_Progress_En.

RUS_Maykop_Novosvobodnaya
IRN_Hajji_Firuz_C 0.614±0.056
RUS_Darkveti-Meshoko_En 0.307±0.064
RUS_Progress_En 0.080±0.022
P-value 0.141468 (Pass)
Full output

I don't know whether the genetic relationship between ARM_Kura-Araxes_Berkaber and RUS_Maykop_Novosvobodnaya shown in my model is due to Maykop ancestry in the former. It might just be a coincidence in the sense that the same or similar processes led to the formation of both groups. Feel free to let me know your thoughts about that in the comments.

The fact that the Kura-Araxes people harbored steppe ancestry might be very important in the debate over the location of the so-called Indo-Anatolian homeland. For instance, it's possible that the proto-Anatolian language spread from the North Caucasus into Anatolia via the Kura-Araxes culture.

But, admittedly, such a solution doesn't have strong support from historical linguistics data, which suggest that the Indo-Anatolian homeland was located in what is now Ukraine and that Anatolian speakers entered West Asia via the Balkans:

Indo-European cereal terminology suggests a Northwest Pontic homeland for the core Indo-European languages

See also...

R-V1636: Eneolithic steppe > Kura-Araxes?

Dear Iosif...Yamnaya

But Iosif, what about the Phrygians?

Thursday, October 6, 2022

Balto-Slavs and Sarmatians in the Battle of Himera


G25 coordinates for most of the samples from the recent Reitsema et al. paper are available in a text file here. They're also in the G25 datasheets at the usual link here.

A basic distance analysis with the G25 data at Vahaduo shows that the two samples labeled Himera_480BCE_3 are either early Balts or Slavs. I suspect that they're Slavs, because I believe that early Slavs had this type of Baltic-like genetic structure before mixing with their non-Slavic-speaking neighbors. Well, that's my pet theory for now, so take it or leave it.

Distance to: ITA_Sicily_Himera_480BCE_3:I10943
0.03393838 HUN_IA_La_Tene_o:I18226
0.03572886 DEU_MA_Krakauer_Berg:KRA001
0.03618075 RUS_Pskov_VA:VK159
0.03899963 SWE_Gotland_VA:VK463
0.03915018 Baltic_EST_IA:s19_V12_1

Distance to: ITA_Sicily_Himera_480BCE_3:I10949
0.03573636 HUN_IA_La_Tene_o3:I25524
0.03698768 HUN_IA_La_Tene_o:I18226
0.03732752 SWE_Skara_VA:VK397
0.03767022 Baltic_EST_IA:s19_V12_1
0.03772687 DEU_MA_Krakauer_Berg:KRA001

On the other hand, I'm almost certain that the two Himera_480BCE_4 samples are Sarmatians. The good old G25 does it again!

Distance to: ITA_Sicily_Himera_480BCE_4:I10944
0.03100861 KAZ_Segizsay_Sarmatian:SGZ002
0.03548059 MDA_Sarmatian:I11925
0.03619219 RUS_Urals_Sarmatian:MJ56
0.03626538 RUS_Urals_Sarmatian:chy001
0.03904260 RUS_Urals_Sarmatian:MJ41

Distance to: ITA_Sicily_Himera_480BCE_4:I10947
0.02989458 RUS_Urals_Sarmatian:MJ43
0.03052790 RUS_Urals_Sarmatian:chy002
0.03170622 KAZ_Kangju:DA226
0.03288789 TUR_BlackSea_Samsun_Anc_C:I4529
0.03310149 KAZ_Aigyrly_Sarmatian:AIG003
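
A distance list like the ones above can be reproduced in a few lines, assuming that the distances are plain Euclidean distances between coordinate vectors. The three-dimensional coordinates and the sample names below are invented for the example (real G25 rows have many more columns), and `closest` is a hypothetical helper rather than part of any tool.

```python
import numpy as np

def closest(target, reference, n=5):
    """Return the n reference samples nearest to target by Euclidean distance."""
    names = list(reference)
    matrix = np.array([reference[name] for name in names], dtype=float)
    dists = np.linalg.norm(matrix - np.asarray(target, dtype=float), axis=1)
    order = np.argsort(dists)[:n]
    return [(names[i], float(dists[i])) for i in order]

# Invented toy coordinates, NOT real G25 data
target = [0.00, 0.00, 0.00]
reference = {
    "Sample_A": [0.03, 0.04, 0.00],
    "Sample_B": [0.00, 0.00, 0.01],
    "Sample_C": [0.10, 0.00, 0.00],
}

print("Distance to: Target")
for name, dist in closest(target, reference):
    print(f"{dist:.8f} {name}")
```

The smaller the distance, the more similar the two samples are across the dimensions used.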

See also...

Slavic-like Medieval Germans