

Friday, October 31, 2014

Genetic continuity and shifts across the metal ages in the Carpathian Basin: analysis of ancient Hungarian genomes CO1, BR1 and IR1

The recent Gamba et al. paper on the genetic prehistory of the Great Hungarian Plain was an excellent piece of paleogenomic detective work. However, I feel that the authors could have done a little better with characterizing the genetic origins of their samples.

For instance, the Principal Component Analysis (PCA) appears to suffer from subtle projection bias, which is a common problem in ancient DNA studies (see here). Also, the model-based analyses, like the ADMIXTURE run, leave me wanting a lot more.

However, all of the samples are freely available online, including in user-friendly genotype format at Genetic Genealogy Tools. So I thought it might be useful to take a closer look at three of the genomes, spanning a 2,000-year period from the Copper Age to the Iron Age: CO1, BR1 and IR1.

The metal ages are a critical period of prehistory and early history in the making of modern Europe. It's a time of profound cultural changes, and as we now know, large-scale genetic shifts across the continent (see here). Indeed, the three aforementioned genomes clearly show that major genetic shifts took place on the Great Hungarian Plain from the Copper Age to the Iron Age. However, they also suggest strong genetic continuity in the region throughout this period.

CO1, the Copper Age genome from a Baden Culture burial, appears ridiculously Western European, and could easily pass for a present-day Sardinian in most analyses, even though it's most likely of Balkan and Near Eastern origin. It's very similar in that respect to another Copper Age sample, Oetzi the Iceman from the Tyrolean Alps.

One of the main reasons for this Sardinian-like genetic character is certainly its very low level of Ancient North Eurasian (ANE) admixture, probably less than five per cent. Almost everyone in West Eurasia has more these days, so they appear a lot more eastern.

Shared drift stats in the form f3(Mbuti;CO1,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;CO1,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results
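For anyone unfamiliar with these shared drift stats, the basic calculation behind an outgroup f3-statistic such as f3(Mbuti;CO1,Test) is simple enough to sketch. The Python below is a minimal illustration, assuming per-SNP allele frequencies are already available for Mbuti, the ancient genome and each test population. It skips the finite-sample correction and block jackknife that tools like ADMIXTOOLS add, and the population names and numbers are made up.

```python
from statistics import fmean


def outgroup_f3(p_out, p_a, p_b):
    """Outgroup f3(O; A, B): the mean over SNPs of (pO - pA) * (pO - pB).

    Each argument is a list of allele frequencies at the same SNPs. Higher
    values mean A and B share more drift since they split from outgroup O.
    No finite-sample correction or block jackknife is applied here.
    """
    if not (len(p_out) == len(p_a) == len(p_b)):
        raise ValueError("frequency lists must cover the same SNPs")
    return fmean((o - a) * (o - b) for o, a, b in zip(p_out, p_a, p_b))


def rank_by_shared_drift(p_out, p_sample, references):
    """Sort reference populations by f3(O; sample, reference), highest first."""
    scores = {name: outgroup_f3(p_out, p_sample, freqs)
              for name, freqs in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy frequencies at three SNPs (hypothetical, not real data); a pseudo-haploid
# ancient genome is coded as 0 or 1 at each site.
mbuti = [0.1, 0.9, 0.5]
ancient = [0.0, 1.0, 1.0]
refs = {"PopA": [0.2, 0.8, 0.9], "PopB": [0.1, 0.9, 0.4]}
ranking = rank_by_shared_drift(mbuti, ancient, refs)
```

Real runs use hundreds of thousands of SNPs, which is why even small differences in f3 between reference populations can be meaningful.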

BR1 represents the Early Bronze Age (EBA) Mako Culture. It looks roughly like a cross between CO1 and someone from northeastern Europe with an unusually high level of hunter-gatherer ancestry, and also a fair whack of ANE. Indeed, after running a variety of tests, I'd say that BR1 has around 12% of ANE (in other words, more than Basques but less than British, which fits with its position on the West Eurasian PCA).
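One standard way of putting a number on ANE admixture is an f4-ratio estimate. The Python below is a generic sketch of that technique, not a reconstruction of the tests behind the ~12% figure for BR1. Which populations stand in for A, B, C and O (for instance, an ANE proxy such as Mal'ta boy on the B side) is a modelling assumption, and the frequencies are invented so that the right answer is known in advance.

```python
from statistics import fmean


def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    return fmean((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))


def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """Fraction of X's ancestry from the B side, estimated as
    f4(A, O; X, C) / f4(A, O; B, C).

    Assumes X is a mixture of a B-like and a C-like source, that A is closer
    to B than to C, and that O is an outgroup to all of them.
    """
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)


# Hypothetical frequencies; X is built as exactly 12% B and 88% C, so the
# estimator should recover 0.12.
p_a = [0.8, 0.2, 0.6, 0.3]
p_o = [0.1, 0.5, 0.4, 0.9]
p_b = [0.9, 0.1, 0.7, 0.2]
p_c = [0.2, 0.6, 0.3, 0.8]
p_x = [0.12 * b + 0.88 * c for b, c in zip(p_b, p_c)]
alpha = f4_ratio(p_a, p_o, p_x, p_b, p_c)
```

The estimator works because mixing is linear in allele frequencies: X minus C equals 0.12 times (B minus C) at every SNP, so the numerator is exactly 12% of the denominator.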

So as far as I can see, the most parsimonious explanation for this result is a population movement into present-day Hungary from the northeast during the EBA, perhaps associated with the early Indo-Europeans and the not-so-pleasant effects of the 4.2 kiloyear event (see here).

Interestingly, the 4A Oracle suggests that BR1 might in large part be a mixture of CO1 and KO1, which is another sample from Gamba et al., assigned to the Koros Culture of early Neolithic Balkan farmers, but with typically hunter-gatherer genetic structure. This opens up the possibility that people with unusually high levels of hunter-gatherer ancestry survived on the Great Hungarian Plain well into the metal ages, and the sampling by Gamba et al. was too patchy to find them.
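The 4 Ancestors Oracle idea can be sketched roughly as follows, assuming it averages four reference populations with equal weights (like four grandparents) and ranks the combinations by how close they land to the target's admixture proportions. The Euclidean distance, the function name and the three-component toy profiles are assumptions for illustration; the real tool's details may differ.

```python
from itertools import combinations_with_replacement
from math import dist


def four_ancestors_oracle(target, references, top=3):
    """Rank every multiset of four reference populations by the Euclidean
    distance between their equal-weight average and the target.

    target: list of admixture proportions (e.g. K15 components) for one genome.
    references: dict of population name -> list of proportions, same order.
    """
    results = []
    for combo in combinations_with_replacement(sorted(references), 4):
        # Each of the four "grandparents" contributes exactly 25%
        average = [sum(references[name][i] for name in combo) / 4
                   for i in range(len(target))]
        results.append((dist(target, average), combo))
    results.sort()
    return results[:top]


# Three-component toy profiles, not real K15 output
refs = {"CO1": [0.8, 0.2, 0.0], "KO1": [0.2, 0.8, 0.0], "Other": [0.0, 0.0, 1.0]}
best = four_ancestors_oracle([0.5, 0.5, 0.0], refs)
```

Because every combination is forced into 25% steps, a result like "CO1 + CO1 + KO1 + KO1" should be read as "roughly half and half", not as an exact mixing ratio.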

However, it's not possible to get a genome like BR1 simply by mixing CO1 with KO1, because the hunter-gatherer-like sample is not eastern enough. In other words, it lacks ANE. I know this just by eyeballing a couple of PCA plots featuring KO1 and Motala12, a Scandinavian sample estimated by Lazaridis et al. to carry ~19% ANE (see here and here).
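Put more formally, and assuming ancestry proportions combine linearly under admixture (a simplification), any two-way mixture of CO1 and KO1 with weight w on CO1 is capped by the more ANE-rich of the two sources:

```latex
\mathrm{ANE}_{\mathrm{BR1}} = w\,\mathrm{ANE}_{\mathrm{CO1}} + (1-w)\,\mathrm{ANE}_{\mathrm{KO1}}
\le \max\!\left(\mathrm{ANE}_{\mathrm{CO1}},\ \mathrm{ANE}_{\mathrm{KO1}}\right),
\qquad 0 \le w \le 1
```

With CO1 probably under 5% ANE and KO1 lacking it, no value of w gets anywhere near the ~12% estimated for BR1, so a third, more ANE-rich source is needed.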

So there might well have been a resurgence in local hunter-gatherer DNA on the Great Hungarian Plain, and perhaps throughout much of Central Europe, after the Neolithic. Nevertheless, in my opinion this alone cannot explain the results in this case.

Shared drift stats in the form f3(Mbuti;BR1,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;BR1,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

IR1, the Iron Age genome, is clearly mixed. In some ways, much like CO1 and BR1, it's also deceptively similar to present-day Western Europeans, which suggests that it's in large part of local origin. However, its uniparental markers (Y-haplogroup N-M231 and mitochondrial haplogroup G2a1) actually fit better in Siberia than anywhere in Europe, and its genome-wide DNA shows influences from the North Caucasus and Volga-Ural regions (refer to the 4A Oracle results below).

Because of its complex ancestry, I can't accurately estimate the level of ANE admixture in this genome. Nevertheless, the PCA and Eurogenes K15 suggest that it easily surpasses BR1 in this respect. Note, for instance, its position among the Kargopol Russians and North Ossetians on the global PCA plot, as well as its high Eastern Euro score in the Eurogenes K15.

What I think this hints at is that the present levels of ANE across Europe aren't the result of a single early Indo-European migration, but of multiple population movements around the continent throughout the metal ages, usually involving Indo-European groups, combined with the effects of isolation-by-distance.

By the way, IR1 comes from a burial site of the Mezocsat Culture, which is generally accepted to be of Cimmerian origin. The Cimmerians are usually described as a nomadic Indo-European people from the Kuban steppe, just north of the Caucasus, who were pushed west by the expanding Scythians. Apparently, they founded a variety of cultures in the Carpathian Basin and Balkans by imposing themselves as the ruling elite over the locals. It's remarkable how closely IR1's genetic structure fits this narrative.

Shared drift stats in the form f3(Mbuti;IR1,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;IR1,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

Also, here's a really cool map of Identity-by-Descent (IBD) hits of over 3 cM shared between IR1 and a wide range of present-day populations. It comes from a recent post at Vadim's blog (see here). The shared IBD peaks are found in East Central Europe and the Volga-Ural region, which makes sense.
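I don't know exactly how Vadim's map was generated, but the basic tally behind an IBD hit map is easy to sketch. The Python below assumes the IBD segments have already been called (by whatever tool) as (population, length in cM) pairs, keeps those over 3 cM, and divides by the number of individuals sampled per population. The normalisation step, the function names and the numbers are all hypothetical.

```python
from collections import Counter


def count_ibd_hits(segments, min_cm=3.0):
    """Count IBD segments longer than min_cm centimorgans for each population.

    segments: iterable of (population, length_cM) pairs, one pair per segment
    shared between the ancient genome and a present-day individual.
    """
    return Counter(pop for pop, length_cm in segments if length_cm > min_cm)


def hits_per_individual(segments, sample_sizes, min_cm=3.0):
    """Normalise hit counts by the number of individuals sampled per
    population, so that large reference panels don't dominate the map."""
    counts = count_ibd_hits(segments, min_cm)
    return {pop: counts[pop] / n for pop, n in sample_sizes.items()}


# Hypothetical segments and sample sizes, for illustration only
segments = [("PopA", 4.1), ("PopA", 2.5), ("PopB", 3.6),
            ("PopB", 5.0), ("PopC", 1.2)]
sizes = {"PopA": 2, "PopB": 4, "PopC": 3}
rates = hits_per_individual(segments, sizes)
```

Without some kind of normalisation, a population with many sampled individuals would show more raw hits simply because more people were tested.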

Sunday, October 26, 2014

Hinxton ancient genomes roundup

Most visitors here are probably aware by now that the Iron Age genomes from Hinxton are the two male samples 1 and 4 (ERS389795 and ERS389798, respectively). You can find confirmation of this at the link below.

Anglo-Saxons left language, but maybe not genes to modern Britons

With regard to the main thrust of the article above, I'm not sure there's much point discussing whether the British today are mostly of Celtic or Anglo-Saxon stock based on just five ancient genomes from a single location in England. However, if I were told that Hinxton4, the only high-coverage genome in this collection, was a modern sample, I'd say it belonged to an Irishman from western Ireland rather than an Englishman from eastern England.

Thus, unless Hinxton4 was an ancient migrant from Ireland, it does seem to me that there was a fairly significant admixture event in England between the indigenous Irish-like Celts and newcomers from the east, which eventually resulted in the present-day English population.

In any case, there are indeed some noticeable differences between the two sets of samples, and these can be visualized by plotting their f3 shared drift statistics on graphs.

For instance, if we plot the f3-statistics of Hinxton2, which actually looks like a genome that might belong to someone straight off a boat from the Jutland Peninsula, against those of Hinxtons 1 and 4, we see that Hinxton2 shares the most drift with the Danes. Moreover, the Danes, Swedes and Germans, all Germanic speakers of course, deviate strongly on both graphs from the trend lines that run from the Erzya to the Irish. They deviate from these lines because they don't share enough drift with Hinxtons 1 and 4 compared to the other reference populations from Northwestern Europe, especially the Irish.

A similar pattern can be seen when plotting the average results of Hinxtons 1 and 4 against those of 2, 3 and 5. However, the effect isn't nearly as pronounced, possibly because Hinxtons 3 and 5 are of mixed Celtic/Germanic origin. In fact, I suspect that Hinxton1 is also mixed, and probably has some ancestry from western Scandinavia, but I'll leave that for another time.
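The idea behind these graphs can be sketched in a few lines of Python: average the f3 values of Hinxtons 1 and 4, draw a straight line through two anchor populations (here the Erzya and the Irish), and measure how far each population sits above or below it. This is a hypothetical reconstruction of the approach with made-up values, assuming Hinxton2 is on the y-axis; a least-squares fit through all of the reference populations would be an equally reasonable choice of line.

```python
from statistics import fmean


def average_f3(*tables):
    """Average several f3 tables (population -> value) over their shared
    populations, e.g. to combine the Hinxton1 and Hinxton4 results."""
    shared = set.intersection(*(set(t) for t in tables))
    return {pop: fmean(t[pop] for t in tables) for pop in shared}


def residuals_from_anchor_line(x_f3, y_f3, start, end):
    """Vertical distance of each population from the straight line through
    two anchor populations on an f3-versus-f3 scatter plot.

    A positive residual means the population shares more drift with the
    y-axis sample than the anchor line predicts from its x-axis value.
    """
    x0, y0 = x_f3[start], y_f3[start]
    x1, y1 = x_f3[end], y_f3[end]
    slope = (y1 - y0) / (x1 - x0)
    return {pop: y_f3[pop] - (y0 + slope * (x_f3[pop] - x0))
            for pop in x_f3.keys() & y_f3.keys()}


# Hypothetical f3 values, not the real Hinxton results
hinxton1 = {"Erzya": 0.09, "Irish": 0.21, "Danes": 0.14}
hinxton4 = {"Erzya": 0.11, "Irish": 0.19, "Danes": 0.16}
hinxton2 = {"Erzya": 0.20, "Irish": 0.30, "Danes": 0.28}
x_axis = average_f3(hinxton1, hinxton4)
res = residuals_from_anchor_line(x_axis, hinxton2, "Erzya", "Irish")
```

In this toy example the Danes end up 0.03 above the Erzya-Irish line, which is the kind of deviation described above: more drift with Hinxton2 than their drift with Hinxtons 1 and 4 would predict.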

See also...

Analysis of an ancient genome from Hinxton

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton3 - ERS389797

Analysis of Hinxton4 - ERS389798

Analysis of Hinxton5 - ERS389799

Friday, October 24, 2014

Analysis of Hinxton5 - ERS389799

Hinxton5, or ERS389799, is one of five ancient English genomes stored at the Sequence Read Archive under accession number ERP003900. However, this analysis is based on the latest genotype file of Hinxton5 available at Genetic Genealogy Tools. For more information and some speculation about these genomes see my earlier blog post here.

Despite its relatively low North Sea score in the Eurogenes K15, and pronounced western shift on the Principal Component Analysis (PCA) plots, this genome appears mostly Germanic. In my opinion, the shared drift stats and also oracle results are quite convincing in this regard. If this were a modern sample it could probably pass for 3/4 north Dutch and 1/4 Irish. By the way, the Sub-Saharan admixture just looks like noise; this is, after all, a low coverage genome.

Shared drift stats in the form f3(Mbuti;Hinxton5,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;Hinxton5,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

See also...

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton3 - ERS389797

Analysis of Hinxton4 - ERS389798

Hinxton ancient genomes roundup

Wednesday, October 22, 2014

Ust-Ishim belongs to K-M526

Not long ago I predicted that Ust-Ishim belonged to a basal clade of Y-chromosome haplogroup P (see here). As it turns out, the 45,000-year-old western Siberian genome belongs to K(xLT) or K-M526, which is actually pretty close to my guess. The Ust-Ishim paper was published today and is behind a paywall here, but the extensive supplementary info is free. Here's a map to help visualize the information.

The genome was sequenced from a femur found near the village of Ust-Ishim, on the banks of the Irtysh River. This area is very close to the Urals, and almost in the middle of the former Mammoth steppe that once stretched across North Eurasia from Iberia to Alaska. Interestingly, M526 is upstream of the mutations that define Y-chromosome haplogroups N, Q and R, which today dominate North Eurasia and the Americas.

In fact, R1a and R1b are the most frequent haplogroups in Europe. It's therefore plausible that most European males derive their paternal ancestry from North Eurasian hunter-gatherers whose ancestors spread out across Eurasia from the Middle East over 45,000 years ago.
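The point that M526 sits upstream of N, Q and R, and hence of R1a and R1b, is easy to check with a parent-pointer tree. The Python below uses a simplified, hand-entered subset of the Y-chromosome phylogeny as an assumption: intermediate subclades are omitted, and the names follow the marker-based shorthand used in this post. It's a sketch, not a reference tree.

```python
# Simplified, hand-entered subset of the Y-chromosome tree (child -> parent).
# Intermediate subclades such as K2a and K2b are omitted; check names and
# markers against a current phylogeny such as ISOGG before relying on them.
PARENT = {
    "NO-M214": "K-M526",
    "N-M231": "NO-M214",
    "O-M175": "NO-M214",
    "P-M45": "K-M526",
    "Q-M242": "P-M45",
    "R-M207": "P-M45",
    "R1-M173": "R-M207",
    "R1a-M420": "R1-M173",
    "R1b-M343": "R1-M173",
}


def lineage(haplogroup):
    """Walk from a haplogroup up to the root of this toy tree."""
    path = [haplogroup]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_downstream(haplogroup, ancestor):
    """True if `ancestor` lies strictly upstream of `haplogroup`."""
    return ancestor in lineage(haplogroup)[1:]
```

A basal K-M526 lineage like Ust-Ishim's sits at the root of this toy tree and is not downstream of any of the other nodes, while every N, Q and R lineage traces back to it.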

I know that a lot of people have been arguing recently that K-M526 and the derived P-M45 originated and diversified in Southeast Asia, and then migrated north well within the last 45,000 years (for instance, see here). However, considering that K-M526 was already in reindeer country 45,000 years ago, as well as the Denisovan (ancient Siberian hominin) admixture among Southeast Asians, that might well turn out to be the equivalent of arguing that up is down and down is up.

By the way, Ust-Ishim also belongs to pan-Eurasian mitochondrial (mtDNA) haplogroup R*, and in terms of genome-wide genetic structure appears roughly intermediate between West and East Eurasians. These outcomes fit very nicely with its Y-haplogroup.

However, it's slightly closer to Mesolithic Iberian genome La Brana-1, Upper Paleolithic Siberian MA-1 (or Mal'ta boy), and present-day East Asians, than to present-day West Eurasians, including Europeans. That's because it lacks "ancestry from a population that did not participate in the initial dispersals of modern humans into Europe and Asia". This is obviously the so called Basal Eurasian admixture discussed in Lazaridis et al. (see here), which is probably associated with early Neolithic farmers.

Also worth mentioning is that Ust-Ishim harbors longer stretches of Neanderthal chromosomal segments than present-day Eurasians, which suggests that admixture between modern humans and Neanderthals took place in the Middle East not long before the ancestors of Ust-Ishim moved into Siberia (50,000-60,000 years ago). But this was already covered months ago, and you'll find lots of links on the topic on Google.
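The logic behind the segment-length argument can be sketched with a standard single-pulse approximation: after t generations of recombination, introgressed tracts are roughly exponentially distributed with a mean of about 1/t Morgans, so longer tracts mean fewer generations since admixture. The Python below is a back-of-the-envelope illustration under that assumption, with hypothetical tract lengths and an assumed generation time; it is not a reconstruction of the dating in Fu et al.

```python
def generations_since_admixture(mean_tract_cm):
    """Single-pulse approximation: introgressed tracts are roughly
    exponentially distributed with mean 1/t Morgans after t generations,
    so t is about 100 divided by the mean tract length in cM."""
    return 100.0 / mean_tract_cm


def years_since_admixture(mean_tract_cm, generation_years=29.0):
    """Convert to years using an assumed (hypothetical) generation time."""
    return generations_since_admixture(mean_tract_cm) * generation_years


# Hypothetical tract lengths: an individual living closer in time to the
# admixture event carries longer tracts, so fewer generations have elapsed.
early = generations_since_admixture(0.5)
late = generations_since_admixture(0.05)
```

The same reasoning explains the comparison with present-day Eurasians: tens of thousands of extra years of recombination have chopped their Neanderthal segments into much shorter pieces.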


Qiaomei Fu et al., Genome sequence of a 45,000-year-old modern human from western Siberia, Nature 514, 445–449 (23 October 2014) doi:10.1038/nature13810

Tuesday, October 21, 2014

Ancient genomes from the Great Hungarian Plain

This open access paper on the genetic prehistory of the Great Hungarian Plain is full of surprises. Here are a few of my observations:

- Four of the genomes from a Neolithic farming context produced two Y-haplogroups previously identified in Mesolithic European hunter-gatherers (I2a and C6), and one of the samples (KO1) could probably pass for a Mesolithic hunter-gatherer overall, suggesting that males of hunter-gatherer origin played a major role in early European Neolithic societies. But what's happened to the C6 since then?

- The two Bronze Age genomes, BR1 and BR2, look very present-day French, and probably western French at that, in both the Principal Component and Admixture analyses. Indeed, they clearly show a northern influence relative to all of the Neolithic farmers and the Iron Age IR1. And yet, BR2 belongs to Y-haplogroup J2a1, which is generally seen as a Near Eastern marker.

- IR1 is described as a pre-Scythian genome with both East Eurasian and North Caucasian affinities (it's not clear in the paper whether it belongs to Y-haplogroup N and mtDNA G2a1, or vice versa, although either way works in this context). However, it also shows significant Northern European-like ancestry, and is even inferred to have fair hair, which makes me think that its eastern shift might be in large part due to Eastern Hunter-Gatherer (EHG) or Yamnaya-related admixture, which is now pervasive across Northern Europe (see here).

- Many people, including myself nowadays, see the Carpathian Basin as potentially a major staging point for the expansion of Y-chromosome haplogroup R1b into Central and Western Europe during the Bronze Age. And yet, it's again missing from the line-up.

- The T allele at SNP rs4988235, associated with lactase persistence into adulthood in Europeans, is only present among the two most recent genomes: BR2 and IR1. This suggests that selection for this allele, which now reaches frequencies of well over 50% in much of Europe, postdates not only the Neolithic but also the early Indo-European period, and was possibly most intense during the metal ages.

- Some of the Neolithic samples are clearly shifted towards the Bedouins (Bed) in Figure 2, relative to Oetzi the Iceman, a Copper Age genome from the Tyrolean Alps, which is generally considered to be typical of European Neolithic farmers (see below). So perhaps further sampling of Neolithic remains from southern Europe, in particular the southern Balkans, might reveal early farmers who actually cluster with Near Eastern populations, rather than Europeans?

- The authors found a sweet spot for extracting ancient DNA from humans: "the petrous portion of the temporal bone, the densest bone in the mammalian body". The amount of endogenous DNA salvaged from this part of the skull exceeds that from other bones by up to 183-fold. This is obviously great news, and probably means we can expect many more ancient genomes to be published in the near future.
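To get a feel for how quickly the rs4988235 T allele could have spread after the early Indo-European period, here's a rough Python sketch using the standard deterministic approximation for genic (additive) selection, where the log-odds of an allele's frequency grows by about s per generation. The selection coefficients and frequencies are hypothetical, and the model ignores drift, migration and the fact that lactase persistence behaves as a dominant trait.

```python
from math import log


def logit(p):
    """Log-odds of an allele frequency p (0 < p < 1)."""
    return log(p / (1 - p))


def generations_to_rise(p0, p1, s):
    """Approximate generations for an allele to rise from frequency p0 to p1
    under constant genic (additive) selection with coefficient s.

    Deterministic: ignores drift, dominance and migration. Under this model
    the log-odds of the allele grows by roughly s per generation.
    """
    return (logit(p1) - logit(p0)) / s


# Hypothetical scenario: from 5% to 50% with a 2% selective advantage
rise = generations_to_rise(0.05, 0.5, 0.02)
```

In this scenario the rise takes roughly 147 generations, which suggests that even modest selection could move an allele a long way within a few thousand years.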


Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5:5257 doi:10.1038/ncomms6257 (2014).

See also...

First I1-M253 from prehistoric Europe

Genetic continuity and shifts across the metal ages in the Carpathian Basin: analysis of ancient Hungarian genomes CO1, BR1 and IR1

Monday, October 20, 2014

PIE homeland update: paleogenomics supports the steppe hypothesis

Several people tweeted from Iosif Lazaridis' talk at the ASHG earlier today, which focused on ancient DNA from 65 Neolithic and Bronze Age Europeans. Here are a couple of the tweets that caught my eye:
There was an influx from north Eurasian steppe into Europe after advent of farming. Consistent w linguistic evidence.

Admixture shows multiway admixture among late Neolithic ancient samples. Yamnaya good source as 3rd ancestral reference.

So it seems that latest paleogenomics data support the linguists and archeologists who see the Proto-Indo-European (PIE) homeland on the Eastern European steppe. For some background on that, check out the videos here.

Razib also tweeted a few times from the talk, and as far as I can tell, his main point was that the Yamnaya samples showed affinity to the Ancient North Eurasian (ANE) proxy Mal'ta boy, but were also partly of Near Eastern origin, and indeed could be modeled as a 50/50 mixture between present-day Armenians and ancient Karelian hunter-gatherers. He also said that the ancient Karelians were classified as eastern hunter-gatherers (let's call them EHG for now), along with the hunter-gatherers from the Samara Valley, which probably means they carried a lot of ANE admixture.

Moreover, he added that Corded Ware genomes from late Neolithic Germany were estimated at 75% Yamnaya, while another source from the talk revealed to me that they carried a surprisingly "large chunk" of EHG.
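The tweeted figures fit together with some simple arithmetic. Treating the Karelian hunter-gatherers as the EHG proxy, and assuming as a lower bound that the non-Yamnaya 25% of the Corded Ware genomes carries no EHG, nesting the two estimates gives the minimum EHG share. The function name is hypothetical.

```python
def nested_share(outer_weight, inner_share, remainder_share=0.0):
    """Share of an ancestry component in a population that is `outer_weight`
    from a source carrying `inner_share` of that component, with the rest of
    its ancestry carrying `remainder_share`."""
    return outer_weight * inner_share + (1 - outer_weight) * remainder_share


# Tweeted figures: Corded Ware ~75% Yamnaya, Yamnaya ~50% Karelian hunter-gatherer.
# Assuming the other 25% of Corded Ware carries no EHG gives a lower bound.
ehg_floor = nested_share(0.75, 0.5)
```

That comes to 37.5% EHG at minimum, which would indeed be a large chunk, and any hunter-gatherer ancestry in the remaining quarter would only push it higher.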

All of this makes sense, considering that during the Neolithic much of present-day Ukraine west of the Dnieper was home to the Cucuteni-Trypillian farmers, probably of Near Eastern origin, while at the same time large groups of indigenous hunter-gatherers still foraged east of the Dnieper. Based on archeological data, it seems these two groups mixed at some point, becoming mobile pastoralists associated with the Yamnaya culture, and then expanded in all directions during the late Neolithic/early Bronze Age, potentially spreading Indo-European culture and languages as they went.

The Cucuteni-Trypillian farmers might well have been very similar to present-day Armenians, although probably without the 10-15% of ANE carried by them, which likely arrived in eastern Anatolia with the early Indo-Europeans from the steppe.

By the way, it's possible that the Karelian hunter-gatherers are the same samples as those featured in Der Sarkissian et al. 2013, where they were reported to carry mitochondrial (mtDNA) haplogroups C1 (3 instances), U2e (x2), U4 (x2), U5a and H.

Here's a spatial map from that study showing genetic distances between the ancient Karelian mtDNA and that of modern populations.

Der Sarkissian C, Balanovsky O, Brandt G, Khartanovich V, Buzhilova A, et al. (2013) Ancient DNA Reveals Prehistoric Gene-Flow from Siberia in the Complex Human Population History of North East Europe. PLoS Genet 9(2): e1003296. doi:10.1371/journal.pgen.1003296

See also...

Corded Ware Culture linked to the spread of ANE across Europe

Guessing game

Coming soon: genome-wide data from more than forty 3-9K year-old humans from the ancient Russian steppe

Analysis of Hinxton4 - ERS389798

Hinxton4, or ERS389798, is one of five ancient English genomes stored at the Sequence Read Archive under accession number ERP003900. However, this analysis is based on the latest genotype file of Hinxton4 available at Genetic Genealogy Tools. For more information and some speculation about these genomes see my earlier blog post here.

I still don't know who these samples represent exactly, but in all likelihood, this is one of the two Iron Age sequences from the collection, and probably belongs to a Briton of Celtic stock. Note, for instance, its high affinity to the present-day Irish, relatively low North Sea score in the Eurogenes K15, and pronounced western shift on the second Principal Component Analysis (PCA) plot below.

Interestingly, Lithuanians top its shared drift list based on the Human Origins dataset and more than 360K SNPs. I'm not entirely sure what this means, but it's probably related in some way to the unusually high level (>45%) of indigenous European hunter-gatherer ancestry carried by Lithuanians.

Shared drift stats in the form f3(Mbuti;Hinxton4,Test) - Eurogenes dataset

Shared drift stats in the form f3(Mbuti;Hinxton4,Test) - Human Origins dataset

Eurogenes K15 4 Ancestors Oracle results

See also...

Analysis of Hinxton2 - ERS389796

Analysis of Hinxton3 - ERS389797

Analysis of Hinxton5 - ERS389799

Hinxton ancient genomes roundup