
Thursday, January 14, 2021

David Anthony on Y-haplogroup R1a

Archeologist David Anthony has a new theory which attempts to explain why Y-haplogroup R1a hasn't yet been found in any Yamnaya graves. Basically, he thinks that it was carried by Yamnaya men who weren't buried in kurgans, because they were part of a social underclass, and so their remains are now difficult to locate. See here.

This is an interesting attempt to find a socio-archeological solution to a genetic question, but it's unnecessarily complicated and, in fact, unnecessary.

The important thing to understand about R1a is that it's rarely seen in the ancient DNA record before the rise of the Corded Ware culture (CWC). Moreover, the vast majority of the R1a lineages in the world today belong to the R1a-M417 subclade, which is a relatively young (Eneolithic era?) marker closely associated with the CWC population and its rapid expansion.

Indeed, modern R1a lineages show a very strong star-like phylogeny indicative of a series of rapid and massive expansions starting from a handful of lineages only a few thousand years ago.

So if R1a was actually present in the Yamnaya population, then the obvious reason why it hasn't yet been found in any Yamnaya remains is that it was only carried by a very small group of Yamnaya men. Simple as that.

Its expansions from the Pontic-Caspian (PC) steppe, predominantly via the highly successful R1a-M417, may have coincidentally and rather ironically started in a socially disenfranchised Yamnaya clan.

But my view is that R1a-M417 just happened to be present in a small group of early Yamnaya or Yamnaya-related males who came up with an economic package that allowed them to expand out of the PC steppe like no one else before them, and so they did just that.

Anthony is currently collaborating on a new paper about the Eneolithic era on the PC steppe with scientists from Harvard's David Reich Lab (see here). I'm really hoping that they get this right.

See also...

Fatyanovo as part of the wider Corded Ware family

Tuesday, December 29, 2020

Fully automated graph exploration

Scientists at the Broad Institute of MIT and Harvard are working on a new feature-packed and "lightning fast" version of Admixtools that runs in R. It's already available via this link...

I don't have access to a Linux machine right now, but since this thing runs in R, it also runs on Windows, and I do have a Windows computer here.

One of the most interesting and useful features in the new R package is arguably the find_graphs function, which automatically searches for admixture graphs that reflect the observed f-statistics. That is, once the user chooses the samples and settings, find_graphs runs an unsupervised admixture graph analysis.
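For anyone wondering what "the observed f-statistics" actually are, here's a minimal Python sketch of the basic building block, f4(A, B; C, D), computed as the mean over SNPs of (a − b)(c − d) from allele frequencies. It's only an illustration of the statistic, not the Admixtools2 code: the real package also deals with sample-size bias and standard errors, both of which are skipped here, and the four populations and their allele frequencies below are hypothetical, made-up numbers. The f4_from_f2 helper shows the exact algebraic identity linking f4 to pairwise f2 values.

```python
from typing import Sequence


def f2(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared allele-frequency difference between two populations."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def f4(a, b, c, d) -> float:
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Expected to be close to zero if A and B form a clade relative to C and D.
    """
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def f4_from_f2(a, b, c, d) -> float:
    """Same quantity rebuilt from f2 values: [f2(A,D) + f2(B,C) - f2(A,C) - f2(B,D)] / 2."""
    return (f2(a, d) + f2(b, c) - f2(a, c) - f2(b, d)) / 2


# Hypothetical allele frequencies at four SNPs (illustrative only, not real data).
A = [0.10, 0.50, 0.80, 0.30]
B = [0.20, 0.40, 0.70, 0.35]
C = [0.60, 0.20, 0.30, 0.90]
D = [0.50, 0.30, 0.40, 0.80]

print(f4(A, B, C, D))
print(f4_from_f2(A, B, C, D))
```

An admixture graph implies an expected value for every such statistic, so a graph search is essentially hunting for topologies whose expected f-statistics match the observed ones. In practice, whether a value is "far from zero" has to be judged against a standard error, which this sketch doesn't compute.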

Here are a couple of graphs that I knocked out with find_graphs in about five minutes each. The commands and settings that I used are listed in a text file here.

The two topologies above were among the most commonly seen in a series of about 50 runs with the same sample set. A couple of basic inferences based on the output:

- RUS_Progress-Vonyuchka_En harbors GEO_Kotias-Satsurblia_HG-related ancestry, not IRN_Ganj_Dareh_N-related ancestry.

- IRN_Ganj_Dareh_N and TKM_Geoksyur_En form a clade to the exclusion of GEO_Kotias-Satsurblia_HG.

The results are certainly in line with those from other types of analyses that I've done on this blog (for instance, see here and here).

Update 05/01/21: Robert Maier, one of the creators of Admixtools2, has left this message in the comments below.

I'm glad to see that there is so much interest in Admixtools2! I very much appreciate any comments and suggestions on how to improve it and how to make it more user friendly.

Because it's still under active development, some things are likely to change in the future. For example, there is a faster successor to "find_graphs", called "find_graphs2", but in the future they will probably be merged into one.

I'm in David Reich’s group at Harvard and Broad and we are hoping to publish a paper describing Admixtools2 where we illustrate its value by using it to test how robust several previously published results are by exploring a large number of alternative models for each of them. If any of you use Admixtools2 to find graphs that are significantly better fits than published graphs and are also historically plausible - or if you find families of graphs that are equally good fits to the published ones but provide qualitatively different conclusions about population relationships - please contact us. That would be a meaningful contribution to the paper we write about this and we’d be open to including someone as a co-author based on identifying case studies like this.

Sunday, December 6, 2020

Looking forward to a post-Covid world

I was hoping that the Covid-19 pandemic wouldn't have an immediate impact on the publication of ancient DNA papers and new data, but considering how much things have slowed down in this respect, it seems that I was fooling myself.

So let's take a break until early next year, and then see what happens.

Trust me, we've got a lot to look forward to in the post-Covid-19 world. Based on what I've heard from various sources, here are some predictions about what we might see:

- the search for the Proto-Indo-European homeland will shift west to the North Pontic steppe

- on the other hand, the search for the Proto-Uralic homeland will move deep into Siberia

- the key role of the Single Grave (westernmost Corded Ware) culture in the population history of Western Europe will finally get some attention

- following on from the above, Y-haplogroup R1b-L51 will be revealed as a Single Grave marker

- the idea that the Pontic-Caspian steppe was colonized by migrants from Mesopotamia during the Bronze Age will be forgotten, and, ironically, we'll instead learn that there was a significant influx of steppe ancestry into ancient Mesopotamia

- Old Kingdom Egyptians will come out less Sub-Saharan African than present-day Egyptians.

I probably shouldn't blab everything out, so that's all you're getting from me for now. You'll just have to wait for the rest until next year, or perhaps even the year after that.


Friday, November 13, 2020

Fatyanovo as part of the wider Corded Ware family (Nordqvist and Heyd 2020)

There's a new archeological paper about the Fatyanovo culture at the Proceedings of the Prehistoric Society [LINK]. It includes this quote on page 18:

In the traditional narrative, the Fatyanovo people – like the CWC populations in general – are regarded as Indo-European, representing the pre-Balto-Slavic (-Germanic) stage (Carpelan & Parpola 2001, 88; Anthony 2007, 380; also Gimbutas 1956, 163; Tretyakov 1966, 109) in the spread of Indo-European languages.

That's correct, but considering the latest ancient DNA research on the Fatyanovo people, the traditional narrative is probably wrong. Fatyanovo males were rich in Y-haplogroup R1a-Z93, which is found at very low frequencies in Balto-Slavic populations (see here). It's actually much more common nowadays in Central and South Asia, where it often reaches frequencies of over 50% in Indo-Iranian speaking groups.

Balts and Slavs are rich in R1a-Z282, which is a sister clade of R1a-Z93 and has been found in Corded Ware and Corded Ware-related samples from west of the Fatyanovo sites, in present-day Poland and the Baltic states.

Therefore, the origins of the Balto-Slavs should be sought somewhere west of the Fatyanovo culture, probably in the Corded Ware derived populations from what is now the border zone between Poland, Belarus and Ukraine.

Indeed, in my view the Fatyanovo people are more likely to have spoken Proto-Indo-Iranian rather than anything ancestral to Baltic or Slavic (see here).

Nordqvist and Heyd, The Forgotten Child of the Wider Corded Ware Family: Russian Fatyanovo Culture in Context, Proceedings of the Prehistoric Society, online 12 November 2020, DOI:

See also...

The oldest R1a to date

Saturday, November 7, 2020

Slavic-like Medieval Germans

The samples labeled DEU_Krakauer_Berg_MA in the Principal Component Analysis (PCA) plot below are from a recent paper by Parker et al. at Scientific Reports. Their remains were excavated from a Medieval cemetery in the now abandoned village of Krakauer Berg in eastern Germany.

Krakauer sounds sort of like Kraków, doesn't it? That's probably not a coincidence, especially considering how these people behave in my analysis. To see an interactive version of the plot, paste the coordinates from the text file here into the relevant field here.

See also...

Yamnaya-related ancestry proportions in present-day Poles

Warriors from at least two different populations fought in the Tollense Valley battle

Viking world open analysis and discussion thread

Wednesday, October 14, 2020

A new model for the genomic formation of First American ancestors in Asia (Ning et al. 2020 preprint)

Over at bioRxiv at this LINK. The main topic of the preprint is largely outside the scope of this blog. However, the manuscript includes a detailed discussion about how to get the most out of the qpAdm mixture modeling program. I've used qpAdm regularly over the years, and I plan to use it more often in the future, so I'll be looking very carefully at the qpAdm methodology that Ning et al. are recommending. Here's the preprint abstract:

Upward Sun River 1, an individual from a unique burial of the Denali tradition in Alaska (11500 calBP), is considered a type representative of Ancient Beringians who split from other First Americans 22000-18000 calBP in Beringia. Using a new admixture graph model-comparison approach resistant to overfitting, we show that Ancient Beringians do not form the deepest American lineage, but instead harbor ancestry from a lineage more closely related to northern North Americans than to southern North Americans. Ancient Beringians also harbor substantial admixture from a lineage that did not contribute to other Native Americans: Amur River Basin populations represented by a newly reported site in northeastern China. Relying on these results, we propose a new model for the genomic formation of First American ancestors in Asia.

Ning et al., The genomic formation of First American ancestors in East and Northeast Asia, bioRxiv, posted October 12, 2020, doi:

See also...

Ancient ancestry proportions in present-day Europeans

Major updates to ADMIXTOOLS

Yamnaya-related ancestry proportions in present-day Poles

Tuesday, September 29, 2020

Viking world open analysis and discussion thread

Global25 and Celtic vs Germanic coordinates for most of the samples from the recent Margaryan et al. Viking paper are now available HERE and HERE, respectively. Look for the VK2020 prefix.

Feel free to put them through their paces and let me know what you find. Below are a couple of examples of what can be done with these coordinates using Vahaduo Global25 Views.

See also...

Viking invasion at bioRxiv

Commoner or elite?

Who were the people of the Nordic Bronze Age?

Wednesday, September 16, 2020

Domestic horses were introduced into Anatolia and Transcaucasia during the Bronze Age (Guimaraes et al. 2020)

Over at Science Advances at this LINK. This is a very important paper because it basically eliminates West Asia as the source of the modern domestic horse lineage, which leaves the Pontic-Caspian steppe in Eastern Europe as the only viable option.

It also corroborates the linguistic theory that the Proto-Indo-European homeland was located on the Pontic-Caspian steppe. That's because the horse is a key animal in the Proto-Indo-European pantheon, and it appears in Indo-European mythology in intricate roles. This suggests that the speakers of Proto-Indo-European weren't just familiar with the horse but also managed to domesticate it. From the paper:

Abstract: Despite the important roles that horses have played in human history, particularly in the spread of languages and cultures, and correspondingly intensive research on this topic, the origin of domestic horses remains elusive. Several domestication centers have been hypothesized, but most of these have been invalidated through recent paleogenetic studies. Anatolia is a region with an extended history of horse exploitation that has been considered a candidate for the origins of domestic horses but has never been subject to detailed investigation. Our paleogenetic study of pre- and protohistoric horses in Anatolia and the Caucasus, based on a diachronic sample from the early Neolithic to the Iron Age (~8000 to ~1000 BCE) that encompasses the presumed transition from wild to domestic horses (4000 to 3000 BCE), shows the rapid and large-scale introduction of domestic horses at the end of the third millennium BCE. Thus, our results argue strongly against autochthonous independent domestication of horses in Anatolia.

Guimaraes et al., Ancient DNA shows domestic horses were introduced in the southern Caucasus and Anatolia during the Bronze Age, Science Advances 16 Sep 2020: Vol. 6, no. 38, eabb0030, DOI: 10.1126/sciadv.abb0030


Tuesday, September 8, 2020

Warriors from at least two different populations fought in the Tollense Valley battle

I can't get the genotype data from the Burger et al. paper. The lead authors, Joachim Burger and Daniel Wegmann, aren't replying to my emails.

But they were gracious enough to release the BAM files for each of their samples, and these files can be converted to genotype data. So I've included ten of the Tollense Valley warriors (DEU_Tollense_BA) in the Global25 datasheets (see here).

The claim in the paper that these warriors "represent an unstructured population" is absolutely false and extremely naive.

Below are a couple of Principal Component Analysis (PCA) plots produced with Vahaduo Global25 views. The samples are labeled according to their Y-chromosome haplogroups. To see interactive versions of the same plots, paste the Global25 coordinates from the text file here into the relevant fields here.

These warriors are not a single unstructured population, because they cover too much ground in the above plots for that to be possible. It's clear to me that they represent at least two different groups from Central Europe and surrounds.
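One rough way to put a number on how much ground a set of samples covers is the mean pairwise Euclidean distance between their coordinates. Below is a minimal Python sketch, assuming the usual comma-separated layout of a sample label followed by numeric coordinates (real Global25 rows have 25 values per sample). The sample labels and the three-dimensional coordinates are hypothetical placeholders, not actual Global25 values for the Tollense Valley warriors.

```python
import csv
import io
import itertools
import math


def parse_coordinates(text: str) -> dict[str, list[float]]:
    """Parse 'label,x1,x2,...' rows into {label: [floats]} (layout assumed)."""
    rows = csv.reader(io.StringIO(text.strip()))
    return {row[0]: [float(v) for v in row[1:]] for row in rows if row}


def mean_pairwise_distance(coords: dict[str, list[float]]) -> float:
    """Average Euclidean distance over all unordered pairs of samples."""
    pairs = list(itertools.combinations(coords.values(), 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical coordinates (3 dimensions for brevity), NOT real Global25 data.
tight = parse_coordinates("""
X:s1,0.10,0.02,0.01
X:s2,0.11,0.02,0.00
X:s3,0.10,0.03,0.01
""")
spread = parse_coordinates("""
Y:s1,0.10,0.02,0.01
Y:s2,0.16,-0.04,0.03
Y:s3,0.02,0.08,-0.02
""")

print(mean_pairwise_distance(tight), mean_pairwise_distance(spread))
```

On its own, a bigger number only says that a group is more dispersed. To argue for structure, it has to be compared against a baseline, such as the same statistic for a present-day or ancient reference population with a similar number of samples.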

Of course, this would be a lot easier to work out if Burger et al. cared to supply more information about each of the warriors, such as their attire, weapons, circumstances of death, and so on. It's a complete mystery to me why this wasn't included in the paper, and the authors are refusing to talk to me, so it's unlikely that I'll ever be able to get it from them.

In the absence of such crucial archeological and anthropological data, I don't want to speculate too much or get overly creative, but here are a couple of possible scenarios to explain the ancient DNA results:

- this may have been a battle between two Central European armies, one rich in Y-haplogroup R1b and the other rich in Y-haplogroup I2a, as well as their allies or hired help, including warriors from Eastern Europe belonging to Y-haplogroup R1a

- or perhaps it was an invasion from the east by warriors rich in Y-haplogroup R1a, and it was a success, with the local armies, rich in Y-haplogroups R1b and I2a, losing the battle and suffering most of the casualties.

I'm sure that one day someone will attempt to undertake a decent multidisciplinary study of this epic battle, and we'll at least have a rough idea about what happened. Or not.


Burger et al., Low Prevalence of Lactase Persistence in Bronze Age Europe Indicates Ongoing Strong Selection over the Last 3,000 Years, Current Biology, Available online 3 September 2020,

See also...

Genetic and linguistic structure across space and time in Northern Europe

Sunday, September 6, 2020

Low prevalence of lactase persistence in Bronze Age Europe (Burger et al. 2020)

Over at Current Biology at this LINK. Unfortunately, this is the long-awaited Tollense Valley battle paper. Despite the obvious presence of some very interesting genetic substructures among the Tollense Valley warriors (see here), the authors have the audacity to claim that these individuals represent a "single unstructured Central/Northern European population".

One of the warriors, labeled WEZ56, belongs to Y-haplogroup R1a and shows an exceedingly Balto-Slavic-like genome-wide genetic structure. But none of this is even mentioned in passing in the paper. Indeed, according to Burger et al., WEZ56 is best classified as belonging to R1, even though the R1a classification is quite secure based on the raw data that the authors posted online.

Be extremely wary of what you read in this paper, and anything else that these scientists have published in the past and will publish in the future. Below is the paper summary:

Lactase persistence (LP), the continued expression of lactase into adulthood, is the most strongly selected single gene trait over the last 10,000 years in multiple human populations. It has been posited that the primary allele causing LP among Eurasians, rs4988235-A [1], only rose to appreciable frequencies during the Bronze and Iron Ages [2, 3], long after humans started consuming milk from domesticated animals. This rapid rise has been attributed to an influx of people from the Pontic-Caspian steppe that began around 5,000 years ago [4, 5]. We investigate the spatiotemporal spread of LP through an analysis of 14 warriors from the Tollense Bronze Age battlefield in northern Germany (∼3,200 before present, BP), the oldest large-scale conflict site north of the Alps. Genetic data indicate that these individuals represent a single unstructured Central/Northern European population. We complemented these data with genotypes of 18 individuals from the Bronze Age site Mokrin in Serbia (∼4,100 to ∼3,700 BP) and 37 individuals from Eastern Europe and the Pontic-Caspian Steppe region, predating both Bronze Age sites (∼5,980 to ∼3,980 BP). We infer low LP in all three regions, i.e., in northern Germany and South-eastern and Eastern Europe, suggesting that the surge of rs4988235 in Central and Northern Europe was unlikely caused by Steppe expansions. We estimate a selection coefficient of 0.06 and conclude that the selection was ongoing in various parts of Europe over the last 3,000 years.

Burger et al., Low Prevalence of Lactase Persistence in Bronze Age Europe Indicates Ongoing Strong Selection over the Last 3,000 Years, Current Biology, Available online 3 September 2020,
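To get a feel for what a selection coefficient of 0.06 implies over 3,000 years, here's a minimal Python sketch of deterministic genic selection, where each generation the favored allele's frequency p becomes p(1 + s)/(1 + sp), so its odds p/(1 − p) grow by a factor of (1 + s) per generation. This is not the inference method used in the paper. The 28-year generation time and the 5% starting frequency are my own hypothetical assumptions, and drift, dominance and migration are all ignored.

```python
def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Allele frequency under deterministic genic selection.

    Each generation p -> p * (1 + s) / (1 + s * p), which multiplies the
    odds p / (1 - p) by (1 + s). Drift, dominance and migration are ignored.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        freqs.append(p)
    return freqs


GENERATION_YEARS = 28   # assumption, not taken from the paper
YEARS = 3000            # time span mentioned in the paper's title and abstract
S = 0.06                # selection coefficient quoted in the abstract
P0 = 0.05               # hypothetical starting frequency

gens = YEARS // GENERATION_YEARS
traj = trajectory(P0, S, gens)
print(gens, round(traj[-1], 3))
```

Under these assumptions, a 5% allele climbs above 90% within the window, but a different generation time or starting frequency shifts the outcome considerably, so this is only a rough sense of scale.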

See also...

Warriors from at least two different populations fought in the Tollense Valley battle