
Wednesday, September 16, 2020

Domestic horses were introduced into Anatolia and Transcaucasia during the Bronze Age (Guimaraes et al. 2020)

Over at Science Advances at this LINK. This is a very important paper because it basically eliminates West Asia as the source of the modern domestic horse lineage, which leaves the Pontic-Caspian steppe in Eastern Europe as the only viable option.

It also corroborates the linguistic theory that the Proto-Indo-European homeland was located on the Pontic-Caspian steppe. That's because the horse is a key animal in the Proto-Indo-European pantheon, and it appears in Indo-European mythology in intricate roles. This suggests that the speakers of Proto-Indo-European weren't just familiar with the horse but also managed to domesticate it. From the paper:

Abstract: Despite the important roles that horses have played in human history, particularly in the spread of languages and cultures, and correspondingly intensive research on this topic, the origin of domestic horses remains elusive. Several domestication centers have been hypothesized, but most of these have been invalidated through recent paleogenetic studies. Anatolia is a region with an extended history of horse exploitation that has been considered a candidate for the origins of domestic horses but has never been subject to detailed investigation. Our paleogenetic study of pre- and protohistoric horses in Anatolia and the Caucasus, based on a diachronic sample from the early Neolithic to the Iron Age (~8000 to ~1000 BCE) that encompasses the presumed transition from wild to domestic horses (4000 to 3000 BCE), shows the rapid and large-scale introduction of domestic horses at the end of the third millennium BCE. Thus, our results argue strongly against autochthonous independent domestication of horses in Anatolia.
Guimaraes et al., Ancient DNA shows domestic horses were introduced in the southern Caucasus and Anatolia during the Bronze Age, Science Advances 16 Sep 2020: Vol. 6, no. 38, eabb0030, DOI: 10.1126/sciadv.abb0030

See also...

Tuesday, September 8, 2020

Warriors from at least two different populations fought in the Tollense Valley battle

I can't get the genotype data from the Burger et al. paper. The lead authors, Joachim Burger and Daniel Wegmann, aren't replying to my emails.

But they were gracious enough to release the BAM files for each of their samples, and these files can be converted to genotype data. So I've included ten of the Tollense Valley warriors (DEU_Tollense_BA) in the Global25 datasheets (see here).

The claim in the paper that these warriors "represent an unstructured population" is absolutely false and extremely naive.

Below are a couple of Principal Component Analysis (PCA) plots produced with Vahaduo Global25 views. The samples are labeled according to their Y-chromosome haplogroups. To see interactive versions of the same plots, paste the Global25 coordinates from the text file here into the relevant fields here.

These warriors are not a single unstructured population, because they cover too much ground in the above plots for that to be possible. It's clear to me that they represent at least two different groups from Central Europe and surrounds.

Of course, this would be a lot easier to work out if Burger et al. cared to supply more information about each of the warriors, such as their attire, weapons, circumstances of death, and so on. It's a complete mystery to me why this wasn't included in the paper, and the authors are refusing to talk to me, so it's unlikely that I'll ever be able to get it from them.

In the absence of such crucial archeological and anthropological data, I don't want to speculate too much, and get overly creative, but here are a couple of possible scenarios to explain the ancient DNA results:
- this may have been a battle between two Central European armies, one rich in Y-haplogroup R1b and the other rich in Y-haplogroup I2a, as well as their allies or hired help, including warriors from Eastern Europe belonging to Y-haplogroup R1a

- or perhaps it was an invasion from the east by warriors rich in Y-haplogroup R1a, and it was a success, with the local armies, rich in Y-haplogroups R1b and I2a, losing the battle and suffering most of the casualties.

I'm sure that one day someone will attempt to undertake a decent multidisciplinary study of this epic battle, and we'll at least have a rough idea about what happened. Or not.


Burger et al., Low Prevalence of Lactase Persistence in Bronze Age Europe Indicates Ongoing Strong Selection over the Last 3,000 Years, Current Biology, Available online 3 September 2020,

See also...

Genetic and linguistic structure across space and time in Northern Europe

Sunday, September 6, 2020

Low prevalence of lactase persistence in Bronze Age Europe (Burger et al. 2020)

Over at Current Biology at this LINK. Unfortunately, this is the long-awaited Tollense Valley battle paper. Despite the obvious presence of some very interesting genetic substructures among the Tollense Valley warriors (see here), the authors have the audacity to claim that these individuals represent a "single unstructured Central/Northern European population".

One of the warriors, labeled WEZ56, belongs to Y-haplogroup R1a and shows an exceedingly Balto-Slavic-like genome-wide genetic structure. But none of this is even mentioned in passing in the paper. Indeed, according to Burger et al., WEZ56 is best classified as belonging to R1, even though the R1a classification is quite secure based on the raw data that the authors posted online.

Be extremely wary of what you read in this paper, and anything else that these scientists have published in the past and will publish in the future. Below is the paper summary:

Lactase persistence (LP), the continued expression of lactase into adulthood, is the most strongly selected single gene trait over the last 10,000 years in multiple human populations. It has been posited that the primary allele causing LP among Eurasians, rs4988235-A [1], only rose to appreciable frequencies during the Bronze and Iron Ages [2, 3], long after humans started consuming milk from domesticated animals. This rapid rise has been attributed to an influx of people from the Pontic-Caspian steppe that began around 5,000 years ago [4, 5]. We investigate the spatiotemporal spread of LP through an analysis of 14 warriors from the Tollense Bronze Age battlefield in northern Germany (∼3,200 before present, BP), the oldest large-scale conflict site north of the Alps. Genetic data indicate that these individuals represent a single unstructured Central/Northern European population. We complemented these data with genotypes of 18 individuals from the Bronze Age site Mokrin in Serbia (∼4,100 to ∼3,700 BP) and 37 individuals from Eastern Europe and the Pontic-Caspian Steppe region, predating both Bronze Age sites (∼5,980 to ∼3,980 BP). We infer low LP in all three regions, i.e., in northern Germany and South-eastern and Eastern Europe, suggesting that the surge of rs4988235 in Central and Northern Europe was unlikely caused by Steppe expansions. We estimate a selection coefficient of 0.06 and conclude that the selection was ongoing in various parts of Europe over the last 3,000 years.

Burger et al., Low Prevalence of Lactase Persistence in Bronze Age Europe Indicates Ongoing Strong Selection over the Last 3,000 Years, Current Biology, Available online 3 September 2020,

See also...

Warriors from at least two different populations fought in the Tollense Valley battle

Sunday, August 23, 2020

Fascinating stuff

Coming soon I guess:

But we have results from the Ezero culture, from Southeastern Bulgaria, which is from the early Bronze Age and which seems to connect the people of this culture with the future Hittites and Trojans. This has been confirmed by archeology many times and has been known for at least half a century. But now we see the genetic parallels between the two. Some of these ancient groups from the Bronze Age in one way or another have survived to this day in our country Bulgarians, as we also carry a certain amount of blood and genes from these same people, perhaps in the range of between 5 and 10%, which connects us with the Hittites, ancient Anatolia and the Trojans. There is a huge processing of the results before they are published, but among them there are huge curiosities from now on. One of them is from the necropolis in Merichleri from the Early Bronze Age and in another necropolis in Tsaribrod (the older of the two), these are mound necropolises from the Yamna culture in the Caucasus, of people who migrated here in Bulgaria and connected between you are. They came from the haplogroup R1a, namely Z93, which is the haplogroup again of the Scythian, but more of the Indo-Aryan tribes, the future Indo-Aryans, who later conquered India. But one of the tribes of the Yamna culture seems to have strayed and arrived in the Balkans instead of going to India. And so by chance, because archaeologists and geneticists have chosen between 260 burial mounds from this period, they have chosen only 3-4 and have come across exactly this extremely ancient group, which is from the time before the Indo-European group was divided into Iranians, Indians and Slavs, they were still one people at the time with the same genomes. And yes, one of these groups is among what we call Thracian tribes, but these are not Thracians. 
We have results from both the Early Iron Age and the Late Bronze Age, which are possibly Thracian, but I will keep them a secret at this stage, as I do not want to provoke speculation.

See also...

The precursor of the Trojans

Steppe invaders in the Bronze Age Balkans

Wednesday, August 19, 2020

Yamnaya-related ancestry proportions in present-day Poles

Modeling ancient ancestry proportions in present-day Europeans with the qpAdm software is now a lot more difficult. The reasons for this are updates to qpAdm as well as the availability of more useful outgroups or right pops.

This isn't necessarily a bad thing, because users are forced to work harder to find successful models, which is likely to lead to some interesting discoveries. But it can be very frustrating.

I don't think that settling for poor statistical fits or using a small number of outgroups are acceptable shortcuts. Perhaps sequencing modern-day samples in exactly the same way as the ancient samples, and thus increasing the compatibility between them, might help?

Limiting qpAdm runs to higher quality SNPs from transversion sites does help, but perhaps largely because of the significant reduction in markers?

In any case, I've now given up on running such analyses, at least until I see some serious pointers on the topic from Harvard's qpAdm experts. But before I put this project to bed for the time being, I'd like to share some new results for Poles from eastern and western Poland, respectively.

right pops:


left pops:

CWC_Baltic_early 0.572±0.024
SWE_TRB 0.428±0.024
chisq 11.776
tail prob 0.300296
Full output

CWC_Baltic_early 0.587±0.021
SWE_TRB 0.413±0.021
chisq 11.165
tail prob 0.34478
Full output

Even using transversion sites, this is one of the very few combinations of ancient reference samples that works for the Poles with these right pops. That is, the combination of early Corded Ware samples from the East Baltic (CWC_Baltic_early) and Funnel Beaker samples from Scandinavia (SWE_TRB). The former are obviously the proxy here for Yamnaya-related ancestry.
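For readers who want to sanity-check the tail prob figures, here's a minimal Python sketch that recomputes them from the chisq values. The degrees of freedom aren't shown in the excerpts above, so the value of 10 used here is an assumption (it's the even value that appears to reproduce the reported numbers), and the function name is made up for illustration. The closed form only works for an even number of degrees of freedom.

```python
import math

def chi2_tail_even_df(chisq: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = chisq / 2.0
    term = 1.0   # current term (half**i / i!)
    total = 1.0  # running sum, starting with the i = 0 term
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# ASSUMPTION: df = 10 is not stated in the post, it is inferred.
ASSUMED_DF = 10

# (chisq, reported tail prob) pairs copied from the two runs above
reported = [(11.776, 0.300296), (11.165, 0.34478)]

for chisq, tail_prob in reported:
    recomputed = chi2_tail_even_df(chisq, ASSUMED_DF)
    print(f"chisq={chisq}: recomputed={recomputed:.4f} reported={tail_prob}")
```

Under that assumption the recomputed values should land within about 0.001 of the reported ones. Any small gap is expected because the chisq values are rounded to three decimals.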

Adding any sort of hunter-gatherer population to this model doesn't help or even makes things worse (for instance, see here and here). It is possible to add Baltic hunter-gatherers to a similar model after dropping CWC_Baltic_early in favor of closely related samples from the Early to Middle Bronze Age Pontic-Caspian steppe. Note, however, that the statistical fits are somewhat poorer.

Baltic_LTU_Narva 0.032±0.014
PC_steppe_EMBA 0.483±0.019
SWE_TRB 0.485±0.019
chisq 17.143
tail prob 0.0465198
Full output

Baltic_LTU_Narva 0.031±0.011
PC_steppe_EMBA 0.491±0.015
SWE_TRB 0.477±0.016
chisq 22.444
tail prob 0.00757421
Full output
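A rough way to judge whether the small Baltic_LTU_Narva coefficient is doing anything in these models is to divide each weight by its standard error. The sketch below does that for the two three-way runs above. The "run 1" and "run 2" labels are mine (they just follow the order of the output above), and treating the ± figures as one standard error is an assumption about how the output is formatted.

```python
# Weights and standard errors copied from the two three-way runs above.
# ASSUMPTION: the "±" figures are one standard error each.
runs = {
    "run 1": {
        "Baltic_LTU_Narva": (0.032, 0.014),
        "PC_steppe_EMBA": (0.483, 0.019),
        "SWE_TRB": (0.485, 0.019),
    },
    "run 2": {
        "Baltic_LTU_Narva": (0.031, 0.011),
        "PC_steppe_EMBA": (0.491, 0.015),
        "SWE_TRB": (0.477, 0.016),
    },
}

def z_score(weight: float, std_error: float) -> float:
    """Number of standard errors separating a weight from zero."""
    return weight / std_error

for run_name, weights in runs.items():
    for pop, (weight, std_error) in weights.items():
        print(f"{run_name} {pop}: weight={weight:.3f} z={z_score(weight, std_error):.2f}")
```

The Narva weight sits only two to three standard errors from zero in both runs, while the other two sources are dozens of standard errors away, which fits with the hunter-gatherer source adding little to the fit.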

Interestingly, but not surprisingly, the ancestry of many present-day Northwestern European populations can be modeled in basically the same way. That's because ancient ancestry proportions are more closely correlated with latitude than longitude across much of the European continent.

CWC_Baltic_early 0.527±0.024
SWE_TRB 0.473±0.024
chisq 13.042
tail prob 0.221357
Full output

CWC_Baltic_early 0.586±0.023
SWE_TRB 0.414±0.023
chisq 16.517
tail prob 0.085751
Full output

CWC_Baltic_early 0.583±0.021
SWE_TRB 0.417±0.021
chisq 12.144
tail prob 0.275536
Full output

A zip file with the qpAdm output from this analysis and a list of the most relevant ancients is available here. I might try to run a few more populations over the next few days, but probably only from the northern half of Europe, so please check the zip file in a week or so to see what else is in there.

If anyone wants to challenge my results, note that these and very similar samples are freely available to the public via Harvard University here and here.

Update 22/08/2020: From Nick Patterson (Broad) in the comments:
My general advice for qpAdm is 1) Work on the right hand set. Don't include irrelevant population (except for one population as an outgroup); picking the best RHS can dramatically reduce s. errors on the admixture weights. 2) If qpAdm gives a very low p-value try and understand why, sometimes it is telling you that the target is not a mixture of the sources but sometimes the assumptions are violated, for example recent gene-flow from left pops -> right.

See also...

Ancient ancestry proportions in present-day Europeans

Tuesday, August 18, 2020

Housekeeping stuff

I'm about to phase out the use of the Global25 datasheets with modern-day samples. In large part, this move is due to the uncertainty about the populations that these individuals represent and the resulting (often idiotic) discussions here and elsewhere about their usefulness.

This uncertainty exists because many, perhaps most, of these people are classified based on their self identity, which may or may not reflect their genetic origins.

Thus, I'll no longer be updating these datasheets and, from next week, I'll also stop linking to them at this blog (like here). The links will remain live for the next few months, so that users can adjust to the change.

However, modern-day samples sequenced from archeological remains, and thus, as a rule, painstakingly classified by experts based on their burial contexts and genetic characteristics, will continue to be featured in the Global25 datasheets.

In other words, as far as the Global25 is concerned, all of the modern-day samples from the living are out, but all of the modern-day samples from the dead will remain, and indeed I'll be adding more of the latter as they become available.

I'm planning to eventually create several sets of Global25 datasheets based on individuals and populations from different periods, including the modern era. But I'll probably need some help with that.

Also, please note that comment moderation will now be the rule here rather than the exception. And I'll be cracking down hard on trolling, insults and any sort of potentially defamatory material, so no more crazy stuff, or else.

See also...

New rules for comments

Friday, August 14, 2020

Awesome new toys from Vahaduo

Vahaduo now offers a 3D PCA experience. Check it out HERE and HERE. Below are a couple of screen caps of me messing around with the new tools.

Vahaduo says:

Hi everyone!

New tool - PCA 3D Viewer.

Global 25 version:

West Eurasia version:


Dots - ancients, circles - moderns.

Click X, Y, Z or COLOR tab and then click one of the PCx buttons to switch dimensions.

Click already active X, Y, Z or COLOR tab to temporarily reverse selected dimension. It will be restored to a default state when any of the dimensions will be switched to another one.

ADD CUSTOM POINTS - self-explanatory. Points will be added as "+". IMPORTANT - G25 version takes NON-SCALED coordinates. This will be true for any new tool dedicated to G25 and coordinates will be scaled automatically when needed or desired.

"Type parts of names." + TAG button - type parts of names to tag certain samples (try for example "KK1 Afon Pinar"). Search is Case Sensitive. Points will be redrawn as "x".

Next row - Labels and Annotations. Click the right button to cycle through:


CLICK - click to add/delete labels/annotations.
AUTO - same as CLICK plus labels/annotations will be automatically added to newly plotted or tagged samples. Unfortunately adding new labels and annotations becomes very slow when there is too much of them, so there is a limit for the AUTO setting - 250 labels or 20 annotations at once.

Annotations are editable. They can be dragged to another place and text can be changed. Text can be also wrapped into an HTML SPAN element and some styles can be used, like "font-size" or "color". BR element (new line) works too.

HIGHLIGHT CLICK/OFF/HOVER - highlight all samples that belong to a single population. Set to HOVER to ignore clicks. Set to OFF to disable this feature and to remove highlight triggered by hover (highlights triggered by clicks will stay until they will be cleared or removed by a click). Click white dot to cycle through available highlight colors.

Plotly buttons:

Default download is set to 1600x1200px PNG.

Custom Plotly buttons:
"Toggle projection: orthographic / perspective" - self-explanatory.
"Toggle background color" - cycles through dark grey, black, white and light grey. Text color and white highlight will be switched to black when background will be set to white or light grey.
"Toggle color scheme" - cycles through several gradients.
"Reverse color scheme" - reverses all gradients permanently.
"Download plot as png (custom size)" - default size is the size of the currently displayed plot.

See also...

New Global25 interpretation tools

Tuesday, August 11, 2020

Villabruna people existed in Europe at least 17,000 years ago (Bortolini et al. 2020 preprint)

Over at bioRxiv at this LINK. So, like I said here a few years back, there was no migration into Europe from the Near East ~14,000 years ago. I don't think there was even such a migration ~17,000 years ago. My view is that the so-called Villabruna cluster formed somewhere in Europe at least 20,000 years ago. Below is the Bortolini et al. abstract, emphasis is mine:

The end of the Last Glacial Maximum (LGM) in Europe (~16.5 ka ago) set in motion major changes in human culture and population structure. In Southern Europe, Early Epigravettian material culture was replaced by Late Epigravettian art and technology about 18-17 ka ago at the beginning of southern Alpine deglaciation, although available genetic evidence from individuals who lived ~14 ka ago opened up questions on the impact of migrations on this cultural transition only after that date. Here we generate new genomic data from a human mandible uncovered at the Late Epigravettian site of Riparo Tagliente (Veneto, Italy), that we directly dated to 16,980-16,510 cal BP (2σ). This individual, affected by a low-prevalence dental pathology named focal osseous dysplasia, attests that the very emergence of Late Epigravettian material culture in Italy was already associated with migration and genetic replacement of the Gravettian-related ancestry. In doing so, we push back by at least 3,000 years the date of the diffusion in Southern Europe of a genetic component linked to Balkan/Anatolian refugia, previously believed to have spread during the later Bolling/Allerod warming event (~14 ka ago). Our results suggest that demic diffusion from a genetically diverse population may have substantially contributed to cultural changes in LGM and post-LGM Southern Europe, independently from abrupt shifts to warmer and more favourable conditions.

Bortolini et al., Early Alpine human occupation backdates westward human migration in Late Glacial Europe, bioRxiv, posted August 10, 2020, doi:

See also...

Villabruna cluster =/= Near Eastern migrants

Monday, July 27, 2020

Ancient ancestry proportions in present-day Europeans (to be continued)

This year has already been massive in all sorts of ways, including for new data and software releases. So I'm thinking it might be time to update many of the analyses that were featured at this blog a while ago.

Let's start with the classic hunter vs farmer vs herder mixture model for present-day European populations. The rules of the game are as follows:

- run the latest version of qpAdm using qpfstats output

- use transversion sites and 1240K capture data

- pick a set of diverse and chronologically sound outgroups

- for a model to be successful the p-value must reach 0.01

- tweak the left pops in models that are clearly underperforming

- follow high end scientific literature, logic and common sense

Obviously, the reason that I decided to limit my analysis to markers from transversion sites is to mitigate problems associated with modeling the ancestry of modern, high quality samples with relatively low quality ancients. One of these problems appears to be qpAdm assigning faux East Asian/Siberian admixture to present-day Europeans (for instance, see figure 4 here).

My starting reference populations and outgroups are listed below. In qpAdm terminology the former are known as the "left pops", while the latter as the "right pops". Most of these samples are freely available at the David Reich Lab website here.

left pops:

right pops:

As you can see, I picked a wide variety of right pops. But I chose most of them specifically to be able to differentiate the three streams of ancestry - from ancient hunters, farmers and herders - that are the focus of my analysis. I also intentionally avoided using samples in the right pops that may have experienced gene flow, including cryptic gene flow, from the populations in the left pops.

I somewhat speculatively earmarked HUN_Koros_N_HG, from the Early Neolithic Carpathian Basin, and UKR_Yamnaya, from the Early Bronze Age North Pontic steppe in what is now Ukraine, to represent the hunter-gatherer and pastoralist streams of ancestry, respectively.

That's because I expected HUN_Koros_N_HG to be the best proxy for the hunter-gatherer ancestry that was initially absorbed by the early farmers who fanned out from the Aegean region across much of the European continent, and of course it made sense to choose a steppe pastoralist population that was located close to Central Europe where such groups first made the biggest impact outside of the steppe.

Interestingly, HUN_Koros_N_HG and UKR_Yamnaya did prove to be among the most effective choices for the types of ancestries that they represented. For instance, UKR_Yamnaya generally produced much stronger statistical fits than a very similar set of Yamnaya samples from the Caspian steppe (more precisely, from the Samara region in Russia). However, this might well be an artifact, due to very specific characteristics of these few ancient individuals. Larger sample sets would be welcome, especially from Yamnaya sites in Ukraine.

Below, dear audience, is a spreadsheet featuring the preliminary results. Click on the image to view and/or download the spreadsheet. The general rule is that the higher the tail prob, or p-value, the more likely it is that the ancestry proportions are close to the truth (a tail prob of well below 0.05 is usually a strong indication that something isn't right). For a detailed look at each of the qpAdm runs, feel free to consult the zip file here.

Note, however, that many of the European groups in my burgeoning genotype dataset are yet to make an appearance in the spreadsheet. That's because their models with the standard left pops showed p-values well under 0.01, which essentially meant that they failed, and I'm still trying to make them work.

But round one has certainly revealed some fascinating stuff. For instance, except for Hungarians and Estonians, none of the Uralic-speaking groups can be modeled successfully in the standard three-way model.

However, I managed to significantly improve the statistical fits in their models by adding a Siberian population, RUS_Baikal_BA, to the left pops. This is unlikely to be a coincidence, because the Proto-Uralic homeland was almost certainly located in or very near Siberia. Iain Mathieson please take note.

HUN_Koros_N_HG 0.134±0.043
RUS_Baikal_BA 0.270±0.015
TUR_Barcin_N 0.081±0.026
UKR_Yamnaya 0.515±0.058
chisq 19.865
tail prob 0.0108571
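
The pass mark from the rules of the game can be applied to the four-way result above with a couple of lines of Python. This is just a sketch of that rule, not anything qpAdm does itself. The function names are made up for illustration, and the check that the weights add up to one is an extra sanity test of mine.

```python
P_VALUE_CUTOFF = 0.01  # pass mark taken from the rules listed in this post

# Figures copied from the four-way result above
model = {
    "weights": {
        "HUN_Koros_N_HG": 0.134,
        "RUS_Baikal_BA": 0.270,
        "TUR_Barcin_N": 0.081,
        "UKR_Yamnaya": 0.515,
    },
    "chisq": 19.865,
    "tail_prob": 0.0108571,
}

def passes(tail_prob: float, cutoff: float = P_VALUE_CUTOFF) -> bool:
    """True when the tail prob reaches the chosen cutoff."""
    return tail_prob >= cutoff

def weights_total(weights: dict) -> float:
    """Sum of the admixture weights (should be very close to 1)."""
    return sum(weights.values())

print("passes cutoff:", passes(model["tail_prob"]))
print("weights total:", round(weights_total(model["weights"]), 3))
```

The model clears the 0.01 cutoff, but only just, so it wouldn't survive the stricter 0.05 rule of thumb mentioned earlier.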

See also...