Sunday, March 29, 2015

European foragers were almost wiped out by the ice age

I think what this article is really saying is that the effective population size of Europeans might have dropped to as low as 30 after the peak of the Last Glacial Maximum (LGM). If so, that's pretty close to a genetic precipice for most animals. In any case, it looks like there are more hunter-gatherer genomes on the way, including from Denmark and Switzerland, courtesy of Ron Pinhasi's team, which brought us the ancient Hungarian genomes last year (see here).
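To put an effective population size of 30 in perspective, here's a rough back-of-the-envelope sketch. It uses the textbook expectation that, in an idealized population of constant size, heterozygosity shrinks by a factor of (1 - 1/(2Ne)) each generation. The 25-year generation time and the 2,500-year duration are illustrative assumptions of mine, not figures from the article.

```python
def heterozygosity_retained(ne: float, generations: int) -> float:
    """Expected fraction of heterozygosity left after genetic drift in an
    idealized (Wright-Fisher) diploid population of constant size ne."""
    return (1.0 - 1.0 / (2.0 * ne)) ** generations


# Illustrative assumptions only: 25 years per generation, 2,500 years of drift.
GENERATION_YEARS = 25
YEARS = 2500
GENERATIONS = YEARS // GENERATION_YEARS

for ne in (30, 300, 3000):
    kept = heterozygosity_retained(ne, GENERATIONS)
    print(f"Ne={ne:>5}: {kept:.1%} of heterozygosity retained "
          f"after {GENERATIONS} generations")
```

Under these assumptions a population with Ne = 30 keeps less than a fifth of its starting heterozygosity, while one with Ne = 3000 keeps almost all of it.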

‘As an archaeologist and anthropologist, I was quite shocked to see how limited, how small the population numbers were. You know, shockingly small,’ said Prof. Pinhasi, based at University College Dublin, Ireland.

‘I think that what happened, it’s on a catastrophic level of demography for a long time in human evolution,’ he said.

The impacts of this are significant for understanding the origins of many Europeans today, as it is forcing researchers to reconsider models of human expansion and colonisation of the continent, as well as our genetic ancestry.

By analysing the genomes of human remains, the researchers are able to gather demographic data and clues to potential population sizes.

Prof. Pinhasi’s team has found that the genomes sequenced from hunter-gatherers from Hungary and Switzerland between 14 000 to 7500 years ago are very close to specimens from Denmark or Sweden from the same period.

These findings suggest that genetic diversity between inhabitants of most of western and central Europe after the ice age was very limited, indicating a major demographic bottleneck triggered by human isolation and extinction during the ice age.

‘We’re starting to be able to reconstruct the actual dynamics of migrations and colonisation of the continent by modern humans and that’s never been done before the genomic era,’ explained Prof. Pinhasi.

He believes that early humans crossed the continent in small groups that were cut off while the ice was at its peak, then successively dispersed and regrouped over thousands of years, with dwindling northern populations invigorated by humans arriving from the south, where the climate was better.

Source: Francesca Jenner, Ice-age Europeans roamed in small bands of fewer than 30, on brink of extinction, 26 March 2015, Horizon Magazine

Saturday, March 28, 2015

Population genetics of Copper and Bronze Age inhabitants of the Eastern European steppe

I'm hoping like hell that the samples from this thesis eventually get the same treatment as those from Haak et al. 2015.

Summary: This dissertation presents the first genetic study of prehistoric populations in the Pontic-Caspian steppe from the Upper Thracian Plain to the Volga. Hypervariable region I (HVR I) and 30 short sections of the coding region containing 32 clade-determining polymorphisms on the mitochondrial DNA, as well as 20 putatively naturally selected autosomal SNPs and a sex-determining locus were analysed using a combination of multiplex PCR and 454 sequencing. Data analysis was performed on the HVR I of 65 of the 180 Eneolithic and Bronze Age samples. (Partial) genotypes were generated from 61 individuals. Published ancient DNA data from Central and Eastern Europe and Central Asia, as well as modern DNA sequences were consulted for comparison.

The genetic data support the inference that early Neolithic farmers from Southeast Europe were involved in establishing pastoralism in the steppes by demic diffusion. The consistently low values of the FST-statistic (the range includes zero) between the Yamnaya Culture of the steppe and a succession of Neolithic cultures in Central Europe indicate continuous or recurrent contacts between the two regions. Between the Yamnaya Culture and its successor, the Catacomb Culture, the incidence of haplogroup U4, which is at high frequency in hunter-gatherer populations of Neolithic Scandinavia and Mesolithic Northwest Russia, rises from approximately 5 % to above 30 %. It is possible that immigrants from Eastern Baltic hunter-gatherer refugia were involved in the genesis of the Catacomb Culture.

The low FST values between the prehistoric steppe populations and the modern populations of Central and Eastern Europe indicate genetic continuity. This is supported by the nuclear genotype frequencies. According to current knowledge the modern European gene pool can be explained by three roots: indigenous Mesolithic hunter-gatherers, early farmers from the Near East, and an ancient North Eurasian component with an Upper Palaeolithic origin. Maybe the third ancestry component was introduced into the late Neolithic European genome by the North Pontic population.

Source: Wilde, Sandra, Populationsgenetik kupfer- und bronzezeitlicher Bevölkerungen der osteuropäischen Steppe, 2014, Dissertation

Tuesday, March 24, 2015

Live reports from AAPA 2015

Chad Rohlfsen is heading off to St. Louis tomorrow for the annual American Association of Physical Anthropologists (AAPA) conference, and will be posting updates from the big event in the comments below. Most of you will know Chad from the comments section on this blog. He's yet to finalize his program, but I know he'll be at this talk on the population history of the Aegean.

The origins of the Aegean palatial civilizations from a population genetic perspective

MARTINA UNTERLÄNDER (1,2), SUSANNE KREUTZER (2) and CHRISTINA PAPAGEORGOPOULOU (1). (1) Department of History and Ethnology, Demokritus University of Thrace; (2) Palaeogenetics Group, Institute of Anthropology, Johannes Gutenberg-University of Mainz.

The present paper investigates the origins of the Aegean pre-palatial civilizations (5th-3rd millennium BC) by applying cutting-edge methods of molecular biology and population genetics. The term Aegean Civilizations refers to the novel human lifeway (agriculture and craft specialization, redistribution systems, intensive trade) that appeared during the end of the Neolithic and the beginning of the Bronze Age in the Aegean. Although many studies exist on archaeological constructions of ethnic and cultural identity on mainland Greece, the Cyclades and Crete, not enough efforts have been made to explore this direction on a population history basis. We have investigated Late, Final Neolithic and Early Bronze Age human skeletons (n=127) from the Aegean using ancient DNA methods, next generation sequencing (NGS) technology and statistical population genetic inferences to i) gather information on diversity, population size, and origin of the pre-palatial Aegean Cultures, ii) to compare them on a genetic basis, in terms of their cultural division (Helladic, Cycladic, Minoan) and iii) to investigate their ancestral/non-ancestral status to the Early and Middle Neolithic farmers from Greece. In addition to mitochondrial DNA genomes, by applying a capture-NGS approach we collected information on functional traits of the early Aegean communities in southeastern Europe. Considering the International Spirit that overwhelms the Aegean during the 3rd millennium BC, seen by the wide distribution of artifacts, this palaeogenetic approach provides valuable new insights on population structure of the groups involved in the Neolithic-Bronze Age transition and the spread of specific alleles in this part of Europe.

Feel free to help Chad plan the rest of his itinerary. The AAPA 2015 website is here. You can download a PDF book with all of the abstracts here.

By the way, Chad is paying for the trip himself. If anyone wants to help him cover the costs, please send contributions via PayPal to c_rohlfsen [at] hotmail [dot] com.

Saturday, March 21, 2015

Mitogenomes reveal post-Neolithic gene flow from the Near East to Tuscany

Europeans probably received their Ancient North Eurasian (ANE) admixture from at least a couple of different sources. Most of it no doubt came from the Eurasian steppe during the late Neolithic/early Bronze Age, very likely with the early Indo-Europeans. But I'd say that a significant amount of the ANE in southern Europe arrived there from the Near East during and after the late Bronze Age with a wide variety of groups, possibly including the proto-Etruscans. Here's a new paper from PLoS One focusing on Tuscan mitogenomes that adds weight to my argument.

Background: Genetic analyses have recently been carried out on present-day Tuscans (Central Italy) in order to investigate their presumable recent Near East ancestry in connection with the longstanding debate on the origins of the Etruscan civilization. We retrieved mitogenomes and genome-wide SNP data from 110 Tuscans analyzed within the context of The 1000 Genome Project. For phylogeographic and evolutionary analysis we made use of a large worldwide database of entire mitogenomes (>26,000) and partial control region sequences (>180,000).

Results: Different analyses reveal the presence of typical Near East haplotypes in Tuscans representing isolated members of various mtDNA phylogenetic branches. As a whole, the Near East component in Tuscan mitogenomes can be estimated at about 8%; a proportion that is comparable to previous estimates but significantly lower than admixture estimates obtained from autosomal SNP data (21%). Phylogeographic and evolutionary inter-population comparisons indicate that the main signal of Near Eastern Tuscan mitogenomes comes from Iran.

Conclusions: Mitogenomes of recent Near East origin in present-day Tuscans do not show local or regional variation. This points to a demographic scenario that is compatible with a recent arrival of Near Easterners to this region in Italy with no founder events or bottlenecks.


Gómez-Carballa A, Pardo-Seco J, Amigo J, Martinón-Torres F, Salas A (2015) Mitogenomes from The 1000 Genome Project Reveal New Near Eastern Features in Present-Day Tuscans. PLoS ONE 10(3): e0119242. doi:10.1371/journal.pone.0119242

Sunday, March 15, 2015

Modeling Yamnaya with qpAdm

I've been playing around with the new qpAdm program and the Haak et al. dataset over the past few days and managed to come up with what I think are some very promising results. For instance, the Yamnaya genomes from the Samara Valley and surrounds fit rather well as 0.514 Samara hunter-gatherer + 0.486 Georgian (std. errors 0.032, chisq 3.890).

This is an interesting outcome, mainly because Georgian is a Kartvelian language, and linguistic data suggest that the early Indo-Europeans - presumably the Yamnaya nomads or their ancestors - were in close contact with Proto-Kartvelian speakers. Moreover, the Yamnaya males tested to date all belong to Y-chromosome haplogroup R1b, which they probably inherited from their hunter-gatherer ancestors, because the Samara forager also belonged to this haplogroup. Even so, some of their mtDNA lineages appear to be derived from the Caucasus and/or nearby areas of the Near East.

However, the main problem with this analysis is that it's attempting to model an ancient population using a modern one as a mixture source. Indeed, my estimate is that present-day Georgians harbor around 20% of the so-called Ancient North Eurasian (ANE) component, which probably arrived in the Caucasus from the Eurasian steppe (see here). If so, then the qpAdm run might be overestimating the non-steppe admixture in the Yamnaya genomes by at least 10%. Nevertheless, I'm quite happy with this result as I await ancient DNA from the Caucasus and Near East.
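A rough way to see where a figure of about 10% could come from is to multiply the two numbers quoted above. This is only a sketch under a strong assumption that isn't tested here: that all of the ANE in present-day Georgians is steppe-derived, so that this slice of the Georgian coefficient is really steppe ancestry counted on the wrong side. The function name is just for illustration.

```python
def misattributed_share(source_weight: float, steppe_fraction_in_source: float) -> float:
    """Portion of the target's ancestry assigned to the modern source that
    would actually be steppe-derived, under the assumption stated above."""
    return source_weight * steppe_fraction_in_source


georgian_weight = 0.486    # qpAdm coefficient for Georgian, as quoted in the post
ane_in_georgians = 0.20    # rough ANE estimate for Georgians, as quoted in the post

extra = misattributed_share(georgian_weight, ane_in_georgians)
print(f"possibly misattributed share: {extra:.3f}")
print(f"non-steppe share after removing it: {georgian_weight - extra:.3f}")
```

The product is a little under 0.1, which is in the same ballpark as the 10% mentioned above.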

By the way, I also pretty much nailed the Corded Ware sample: 0.73 Yamnaya + 0.27 Esperstedt_MN (std. errors 0.060, chisq 2.621). Admittedly, an identical result for the same genomes was reported months ago at the ASHG 2014 conference (see here), but that's OK, because it means I'm on the right track.

qpAdm is easy to run, but the quality of its output is heavily reliant on the outgroup or "right set" of populations picked by the user. As far as I can see, the following ten populations (a subset of the "magic set" of 15 from Haak et al.) produce the most robust outcomes when analyses are limited to West Eurasian groups.


Why do they work so well? I really have no idea, but through simple trial and error I found that some of the others from the "magic set", in particular the Ami, produced much poorer results.
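One way to get some intuition for why the right set matters is a toy version of the idea behind f4-based mixture fitting. The sketch below is not qpAdm's actual algorithm, and all of the allele frequencies are simulated. It assumes a target that is an exact two-way mix of two sources, and solves for the weight that makes the weighted f4(target, source; R0, Rj) statistics cancel out. The function names and the 0.6/0.4 mix are made up for illustration.

```python
import numpy as np


def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return float(np.mean((pa - pb) * (pc - pd)))


def two_source_weight(target, src1, src2, right):
    """Least-squares weight of src1 in a two-way model of target.

    right is a list of allele-frequency arrays; right[0] is the reference
    outgroup that is paired with every other right population.
    """
    base = right[0]
    a = np.array([f4(target, src1, base, r) for r in right[1:]])
    b = np.array([f4(target, src2, base, r) for r in right[1:]])
    d = b - a
    # Minimizes |w * a + (1 - w) * b|^2 over w.
    return float(np.dot(b, d) / np.dot(d, d))


# Simulated toy data (not real populations).
rng = np.random.default_rng(0)
n_snps = 5000


def drift(p, sd=0.05):
    """Add random drift to a vector of allele frequencies."""
    return np.clip(p + rng.normal(0.0, sd, p.size), 0.0, 1.0)


ancestral = rng.uniform(0.1, 0.9, n_snps)
src1, src2 = drift(ancestral), drift(ancestral)
# One distant outgroup, plus one right population related to each source.
right = [drift(ancestral), drift(src1), drift(src2)]
target = 0.6 * src1 + 0.4 * src2

w = two_source_weight(target, src1, src2, right)
print(f"estimated weight of source 1: {w:.3f}")
```

Because the toy target is an exact mixture, the estimate should come back at 0.6. The denominator is the part to watch: if the right populations aren't differentially related to the two sources, the vector d shrinks towards zero and the estimate becomes unstable, which may be part of why some outgroups produce much poorer results than others.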

I'll probably end up posting a whole catalog of qpAdm output in the comments section below over the next couple of weeks. I'm open to suggestions about the models to test and how to improve my runs.


Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317

See also...

qpAdm tour of Iran

Yamnaya's exotic ancestry: The Kartvelian connection

Eight thousand years of natural selection in Europe

Update 11/10/2015: Eight thousand years of natural selection in Europe - take 2


A new preprint at bioRxiv reports on the first genome-wide scan for selection using ancient DNA, with a couple of unexpected outcomes:

The SNP (rs4988235) responsible for lactase persistence in Europe gives the strongest signal in our analysis. We estimated the selection coefficient on the derived allele to be 0.015 (95% confidence interval; CI=0.010-0.034) using a method that fits a hidden Markov model to the population allele frequencies as they change over time. Our data strengthens previous reports of the late appearance of lactase persistence in Europe, with the earliest appearance of the allele in a central European Bell Beaker sample (individual I0112) who lived approximately 4,300 years ago. We detect no evidence of lactase persistence in Early Neolithic farming populations like the Linearbandkeramik (LBK), or in the steppe pastoralist Yamnaya, despite their use of domesticated cattle (Figure 2).

We find a surprise in seven Scandinavian hunter-gatherers from the Motala site in southern Sweden who lived around 7,700 years before present. While the western hunter-gatherers of central and southern Europe largely have the ancestral allele at the two major European skin pigmentation loci, the closely related Scandinavian hunter-gatherers have both the derived alleles contributing to light skin pigmentation at high frequency (Figure 2B). Thus, the derived allele of SLC24A5 was common in both the Scandinavian hunter-gatherers and Early European farmers, but not in the geographically intermediate western hunter-gatherers. Further, in four out of seven Motala samples, we observe the derived allele of rs3827760 in the EDAR gene, which has effects on tooth morphology and hair thickness. This allele has been the subject of a selective sweep in East Asia, and today it is at high frequency in East Asians and Native Americans.


The derived allele in the Motala samples lies on the same haplotype as in modern East Asians (Extended Data Figure 4) implying a shared origin. The statistic f4(Yoruba, Scandinavian hunter-gatherers, Han, Onge Andaman Islanders) is significantly negative (Z=-3.9) implying gene flow between the ancestors of Scandinavian hunter-gatherers and Han so this shared haplotype is likely the result of ancient gene flow between groups ancestral to these two populations.
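For readers unfamiliar with the statistic quoted above, f4(A, B; C, D) is the average across SNPs of (a - b) * (c - d), where the lowercase letters are allele frequencies in each population. The sketch below uses simulated frequencies, not real data, and a simplified delete-one block jackknife rather than the weighted version in ADMIXTOOLS. It shows that f4 comes out negative when B and C share some drift that A and D lack.

```python
import numpy as np


def f4_with_z(a, b, c, d, n_blocks=50):
    """Return (f4, Z) for f4(A, B; C, D) from per-SNP allele frequencies.

    Z comes from a simple delete-one block jackknife over contiguous
    blocks of SNPs (a simplification of what ADMIXTOOLS does).
    """
    vals = (a - b) * (c - d)
    est = vals.mean()
    total, n = vals.sum(), vals.size
    blocks = np.array_split(vals, n_blocks)
    # Estimate recomputed with each block left out in turn.
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return float(est), float(est / se)


# Simulated toy frequencies: B and C share a drift component, A and D don't.
rng = np.random.default_rng(1)
n_snps = 20000
base = rng.uniform(0.1, 0.9, n_snps)
shared = rng.normal(0.0, 0.05, n_snps)


def pop(extra=0.0):
    """Population frequencies: base + optional shared drift + private noise."""
    return np.clip(base + extra + rng.normal(0.0, 0.03, n_snps), 0.0, 1.0)


pop_a, pop_b, pop_c, pop_d = pop(), pop(shared), pop(shared), pop()

f4_value, z = f4_with_z(pop_a, pop_b, pop_c, pop_d)
print(f"f4 = {f4_value:.5f}, Z = {z:.1f}")
```

In the quoted statistic, A is Yoruba, B is the Scandinavian hunter-gatherers, C is Han and D is Onge, so a significantly negative value points to extra shared drift between B and C.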

The high frequency of the East Asian-specific EDAR allele among the Motala foragers is even more surprising for me than their inferred light skin. But it does at least gel with the earlier finding that Scandinavian hunter-gatherers did not contribute significant ancestry to modern Europeans (see here).


Mathieson et al., Eight thousand years of natural selection in Europe, bioRxiv preprint first posted online March 14, 2015; doi:

Friday, March 13, 2015

Yamnaya-related ancestry proportions in Europe and west Asia

Here's a quick and dirty attempt to flush out a Yamnaya-specific ancestral component with the ADMIXTURE software and a few Yamnaya genomes from the recent Haak et al. paper: K6 spreadsheet.

Obviously, we'll need many more ancient samples from the vast Yamnaya horizon to be able to estimate direct Yamnaya ancestry in modern populations with any great confidence. But I'd say this looks like a very reasonable attempt, with more or less comparable results to those published by Haak et al. (for instance, see Figure 3 from the study here).

Please note that this wasn't a supervised run. In other words, I didn't mark the Yamnaya genomes as reference samples with the aim of creating a cluster from them.

However, I initially excluded all individuals from northeastern Europe, the north Caucasus and South Asia from the analysis. I did this because samples from these regions have a peculiar habit of creating very robust clusters in ADMIXTURE, which is useful when looking at recent variation and wanting low cross validation errors, but not so great when trying to resurrect genetic components from the depths of prehistory.

Once I had a dataset that was forcing the algorithm to focus its attention on the ancient genomes and producing consistent results, I tested the problem samples in batches of 5-10, thus making sure they didn't skew the analysis.
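The batching step described above can be sketched in a few lines. This only builds the sample lists, one per run, and doesn't call ADMIXTURE itself. The sample IDs and the batch size of 8 are hypothetical placeholders (any size between 5 and 10 would match the description).

```python
from typing import Iterator, List, Sequence


def batches(samples: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` samples."""
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])


def run_plan(core: Sequence[str], problem: Sequence[str],
             batch_size: int = 8) -> List[List[str]]:
    """One sample list per run: the stable core set plus one problem batch."""
    return [list(core) + batch for batch in batches(problem, batch_size)]


# Hypothetical sample IDs, for illustration only.
core_samples = [f"core_{i}" for i in range(100)]
problem_samples = [f"problem_{i}" for i in range(20)]

plan = run_plan(core_samples, problem_samples)
for i, run in enumerate(plan, start=1):
    print(f"run {i}: {len(run)} samples, "
          f"{len(run) - len(core_samples)} from the problem regions")
```

Each list would then be used to subset the dataset for a separate run at the same K, so that no single run contains enough of the problem samples to form its own cluster.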

Interestingly, the Yamnaya-specific component peaks in Udmurts, who live close to where the Yamnaya samples were collected. This can hardly be a coincidence.

In any case, I'm hoping to look at this issue in more detail soon with the help of qpAdm, a new program released recently with the updated ADMIXTOOLS package (see here). Based on f4 statistics, qpAdm is specifically designed for analyzing ancient admixture events.


Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317

Wednesday, March 11, 2015

New release of ADMIXTOOLS with two additional programs

ADMIXTOOLS 3.0 is now available on GitHub via the Reich Lab site. The updated package includes minor bugfixes and improvements, plus two new programs for studying migration and admixture: qpWave and qpAdm. Scroll down the page at the link below.

Reich Lab software

Documentation is minimal, but I'm told that users of the old ADMIXTOOLS should be able to get things running. I haven't had a chance to check it out yet, but I'm looking forward to trying qpWave and qpAdm, hopefully this weekend.

Update 16/03/2015: Modeling Yamnaya with qpAdm

Friday, March 6, 2015

Bell Beaker, Corded Ware, EHG and Yamnaya genomes in the fateful triangle

The Principal Component Analyses (PCA) below are based on my K8 model (aka the fateful triangle) and ancient genomes from Haak et al. 2015. A spreadsheet with the K8 ancestry proportions for the ancient samples is available here. Some of the results are a bit noisy, and there are good reasons for that (like low coverage calls), but overall I think they look quite solid.

Note the total lack of the Ancient North Eurasian (ANE) component in the Middle Neolithic (MN) genomes, and its sudden appearance at levels of around 24% in the Late Neolithic (LN) Corded Ware genomes. Keep in mind that these samples are from the same region of Germany and only separated by a couple thousand years at most. Clearly, what we're seeing here is a major migration to Central Europe from the east.

So how do I know that these K8 ancestry proportions are correct? Because when I analyze several of the highest quality genomes with very different methodology, like genotype-based PCA, I get basically the same outcomes.

Bell_Beaker I0112 PCA

Corded_Ware I0103 PCA

Karelia_HG I0061 PCA

Yamnaya I0231 PCA

Yamnaya I0443 PCA
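For anyone who wants to try a genotype-based PCA as a cross-check, here's a minimal sketch. It isn't the pipeline used for the plots above. It assumes a complete matrix of 0/1/2 allele counts with no missing data, centers each SNP, scales it by sqrt(p(1-p)) in the spirit of the normalization commonly used for this kind of analysis, and takes the top components via SVD. The two populations are simulated.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PCA scores from an (n_samples, n_snps) array of 0/1/2 allele counts.

    Assumes no missing genotypes. Monomorphic SNPs are dropped.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # allele frequency per SNP
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(p * (1 - p))   # center and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated toy data: two populations of 10 individuals each.
rng = np.random.default_rng(2)
n_snps = 5000
freq1 = rng.uniform(0.1, 0.9, n_snps)
freq2 = np.clip(freq1 + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq1, size=(10, n_snps)),
    rng.binomial(2, freq2, size=(10, n_snps)),
])

pcs = genotype_pca(geno)
print(np.round(pcs[:, 0], 1))  # PC1 scores; the two groups should split by sign
```

With real ancient genomes, low coverage and missing data complicate things considerably, so this is only a starting point.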

However, I have to admit that I'm now more befuddled than ever as to why anyone would want to model the Yamnaya as 50/50 Karelia_HG/present-day Armenian. I do realize that Haak et al. showed this to be a solid statistical fit, but I just don't see it as a very practical solution considering the surprisingly high ANE and low Near Eastern ancestry in the Yamnaya genomes.

I suspect that inflated Basal Eurasian admixture in the Yamnaya and/or East Eurasian, perhaps ancient Arctic, admixture in Karelia_HG might have skewed the Haak et al. model.


Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317

See also...

4mix: four-way mixture modeling in R

Modeling Yamnaya with qpAdm

K8 results for selected Allentoft et al. genomes

Tuesday, March 3, 2015

First look at Bell Beaker, Corded Ware and Yamnaya genomes

It's usually not a good idea to try and force people who've been dead for thousands of years into analyses based on modern genetic variation. However, that's what I've done here by running 20 of what I consider the most interesting samples from the freshly published Haak et al. 2015 paper with the Eurogenes K15 and 4A Oracle.

K15 ancestry proportions + other data

K15 4A Oracle results

My experience is that the K15 is an excellent tool for exploring ancient genomes, and I think it's done a great job here. Below are a few of my observations based on the output:

- the best two-way mixture model for the Yamnaya genomes, from the Samara region near the Russo-Kazakh border, is Samara_HG/Tabassaran, rather than Karelia_HG/Armenian as per Haak et al. (see discussion below)

- far Eastern Europeans like Volga Tatars and Finns are the most similar modern populations to these Yamnaya samples, which makes good sense considering uniparental marker data and geography (for instance, see this map posted by Richard recently in the comments)

- the unusually high Amerindian and South Asian ancestry proportions among the Yamnaya genomes are very likely the result of their extreme levels of Ancient North Eurasian (ANE) ancestry, estimated by me with the West Eurasia K8 to be around 35%

- the German Bell Beaker sample appears to be a complex mixture of populations from several different parts of Europe, including the Yamnaya horizon, so based on this data it's impossible to pinpoint the main geographic source of the Bell Beaker population expansion, if indeed there was such a source

- three out of the four German Corded Ware genomes are obviously of mixed origin, presumably between Corded Ware migrants from Eastern Europe and earlier middle Neolithic inhabitants of North-Central Europe, but still largely of Yamnaya or very similar ancestry

- Eastern European foragers Karelia_HG and Samara_HG don't show any hints of Near Eastern admixture
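The idea of a "best two-way mixture model" can be illustrated with a small brute-force search. This is a sketch of the general approach, not the actual Oracle code. It assumes each population is described by a vector of ancestry proportions, and looks for the pair of references and the mixing weight that land closest to the target. The reference names and proportions below are invented.

```python
from itertools import combinations

import numpy as np


def best_two_way(target, refs, step=0.01):
    """Brute-force the closest two-way mix of reference populations.

    refs maps a population name to its vector of component proportions.
    Returns (distance, name_a, name_b, weight_a).
    """
    t = np.asarray(target, dtype=float)
    weights = np.arange(0.0, 1.0 + step / 2, step)
    best = None
    for (name_a, a), (name_b, b) in combinations(refs.items(), 2):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        for w in weights:
            dist = float(np.linalg.norm(t - (w * a + (1 - w) * b)))
            if best is None or dist < best[0]:
                best = (dist, name_a, name_b, float(w))
    return best


# Invented three-component proportions, for illustration only.
references = {
    "RefA": [0.8, 0.1, 0.1],
    "RefB": [0.1, 0.8, 0.1],
    "RefC": [0.1, 0.1, 0.8],
}
toy_target = [0.45, 0.10, 0.45]   # an even mix of RefA and RefC

distance, pop_a, pop_b, weight_a = best_two_way(toy_target, references)
print(f"{weight_a:.0%} {pop_a} + {1 - weight_a:.0%} {pop_b} "
      f"(distance {distance:.4f})")
```

A small distance only says the proportions can be matched, not that the two references are the real ancestral sources, which is worth keeping in mind when reading this kind of output.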

I'll post the K8 ancestry proportions for the same 20 ancient genomes in a couple of days. A lot of people will probably be surprised by the results of the Yamnaya samples. Not only do they show unusually high levels of ANE, but also only around 25% of Near Eastern or Early Neolithic Farmer (ENF) ancestry.

Admittedly, these results are somewhat at odds with the findings of Haak et al., who were able to fit the Yamnaya as 50/50 Karelia_HG/present-day Armenian or Iraqi Jewish. Well, this might be a statistically valid fit, but I'm simply not seeing any obvious connection between Armenians or Iraqi Jews and the Yamnaya samples.

As per above, a more sensible solution appears to be Samara_HG/Tabassaran, but based on the K8 output I'd say an even better solution would be to model the Yamnaya as a three-way mixture between Eastern European foragers, early Neolithic farmers straight from the Near East, and perhaps some sort of Central Asian population very similar to the main ANE-proxy MA-1 or Mal'ta boy. But more on that later.

Update 04/03/2015: I've also now analyzed most of the early and middle Neolithic samples from Haak et al. (see here). The results clearly suggest that a profound genetic shift took place in Germany from the middle to the late Neolithic.


Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317

See also...

Fitting the Yamnaya with qpAdm