
Sunday, March 15, 2015

Eight thousand years of natural selection in Europe


Update 11/10/2015: Eight thousand years of natural selection in Europe - take 2

...

A new preprint at bioRxiv reports on the first genome-wide scan for selection using ancient DNA, with a couple of unexpected outcomes:

The SNP (rs4988235) responsible for lactase persistence in Europe gives the strongest signal in our analysis. We estimated the selection coefficient on the derived allele to be 0.015 (95% confidence interval; CI=0.010-0.034) using a method that fits a hidden Markov model to the population allele frequencies as they change over time. Our data strengthens previous reports of the late appearance of lactase persistence in Europe, with the earliest appearance of the allele in a central European Bell Beaker sample (individual I0112) who lived approximately 4,300 years ago. We detect no evidence of lactase persistence in Early Neolithic farming populations like the Linearbandkeramik (LBK), or in the steppe pastoralist Yamnaya, despite their use of domesticated cattle (Figure 2).
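For a rough feel of what a selection coefficient of 0.015 implies, here's a minimal sketch that iterates a deterministic allele frequency trajectory. It assumes simple genic selection, in which the derived allele has relative fitness 1 + s, and it ignores drift. It is not the hidden Markov model used in the paper, and the 1% starting frequency is a hypothetical value chosen only for illustration.

```python
import math


def next_freq(p: float, s: float) -> float:
    """One generation of genic selection: derived allele fitness 1 + s, ancestral 1."""
    return p * (1 + s) / (1 + s * p)


def generations_to_reach(p0: float, target: float, s: float) -> int:
    """Count generations until the derived allele frequency first reaches `target`."""
    p, t = p0, 0
    while p < target:
        p = next_freq(p, s)
        t += 1
    return t


def generations_closed_form(p0: float, target: float, s: float) -> float:
    """Same model in closed form: the log-odds of p grows by ln(1 + s) per generation."""
    def logit(x: float) -> float:
        return math.log(x / (1 - x))
    return (logit(target) - logit(p0)) / math.log(1 + s)


if __name__ == "__main__":
    s = 0.015   # point estimate quoted above for rs4988235
    p0 = 0.01   # hypothetical starting frequency
    print(generations_to_reach(p0, 0.5, s), round(generations_closed_form(p0, 0.5, s), 1))
    # prints: 309 308.6
```

Under this toy model the allele needs roughly 300 generations to climb from 1% to 50%; the generation time you assume converts that into years.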
...

We find a surprise in seven Scandinavian hunter-gatherers from the Motala site in southern Sweden who lived around 7,700 years before present. While the western hunter-gatherers of central and southern Europe largely have the ancestral allele at the two major European skin pigmentation loci, the closely related Scandinavian hunter-gatherers have both the derived alleles contributing to light skin pigmentation at high frequency (Figure 2B). Thus, the derived allele of SLC24A5 was common in both the Scandinavian hunter-gatherers and Early European farmers, but not in the geographically intermediate western hunter-gatherers. Further, in four out of seven Motala samples, we observe the derived allele of rs3827760 in the EDAR gene, which has effects on tooth morphology and hair thickness. This allele has been the subject of a selective sweep in East Asia, and today it is at high frequency in East Asians and Native Americans.

...

The derived allele in the Motala samples lies on the same haplotype as in modern East Asians (Extended Data Figure 4) implying a shared origin. The statistic f4(Yoruba, Scandinavian hunter-gatherers, Han, Onge Andaman Islanders) is significantly negative (Z=-3.9) implying gene flow between the ancestors of Scandinavian hunter-gatherers and Han so this shared haplotype is likely the result of ancient gene flow between groups ancestral to these two populations.
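For readers new to f4 statistics, here's a minimal sketch of how a statistic like f4(Yoruba, Scandinavian hunter-gatherers; Han, Onge) is computed from population allele frequencies, with a Z-score from a delete-one-block jackknife. It's a simplification: it treats allele frequencies as known (no sample-size bias correction), uses unweighted contiguous SNP blocks, and runs on simulated data in which population B received 30% of its ancestry from C. Real analyses would use the ADMIXTOOLS programs (e.g. qpDstat) on genotype data.

```python
import random


def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def block_jackknife_z(pa, pb, pc, pd, n_blocks=20):
    """Z = f4 / SE, with SE from a delete-one-block jackknife over contiguous SNP blocks."""
    n = len(pa)
    bounds = [n * b // n_blocks for b in range(n_blocks + 1)]
    estimates = []
    for b in range(n_blocks):
        lo, hi = bounds[b], bounds[b + 1]

        def drop(x):
            # Remove block b, keep every other SNP.
            return x[:lo] + x[hi:]

        estimates.append(f4(drop(pa), drop(pb), drop(pc), drop(pd)))
    mean = sum(estimates) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return f4(pa, pb, pc, pd) / var ** 0.5


def simulate_gene_flow(n_snps=5000, share=0.3, seed=42):
    """Toy data: A, C, D independent; B takes `share` of its ancestry from C."""
    rng = random.Random(seed)
    pa = [rng.random() for _ in range(n_snps)]  # outgroup, e.g. "Yoruba"
    pc = [rng.random() for _ in range(n_snps)]  # e.g. "Han"
    pd = [rng.random() for _ in range(n_snps)]  # e.g. "Onge"
    pb = [(1 - share) * rng.random() + share * c for c in pc]  # e.g. "SHG"
    return f4(pa, pb, pc, pd), block_jackknife_z(pa, pb, pc, pd)


if __name__ == "__main__":
    value, z = simulate_gene_flow()
    print(f"f4 = {value:.4f}, Z = {z:.1f}")
```

A significantly negative value means B and C (or, equally, A and D) share more alleles than the unrooted tree ((A, B), (C, D)) predicts, which is the logic behind reading Z = -3.9 as gene flow.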

The high frequency of the East Asian-specific EDAR allele among the Motala foragers is even more surprising to me than their inferred light skin. But it does at least gel with the earlier finding that Scandinavian hunter-gatherers did not contribute significant ancestry to modern Europeans (see here).

Citation...

Mathieson et al., Eight thousand years of natural selection in Europe, bioRxiv preprint first posted online March 14, 2015; doi: https://dx.doi.org/10.1101/016477

Friday, March 13, 2015

Yamnaya-related ancestry proportions in Europe and west Asia


Here's a quick and dirty attempt to flush out a Yamnaya-specific ancestral component with the ADMIXTURE software and a few Yamnaya genomes from the recent Haak et al. paper: K6 spreadsheet.


Obviously, we'll need many more ancient samples from the vast Yamnaya horizon to be able to estimate direct Yamnaya ancestry in modern populations with any great confidence. But I'd say this looks like a very reasonable attempt, with more or less comparable results to those published by Haak et al. (for instance, see Figure 3 from the study here).

Please note that this wasn't a supervised run. In other words, I didn't mark the Yamnaya genomes as reference samples with the aim of creating a cluster from them.

However, I initially excluded all individuals from northeastern Europe, the north Caucasus and South Asia from the analysis. I did this because samples from these regions have a peculiar habit of creating very robust clusters in ADMIXTURE, which is useful when looking at recent variation and aiming for low cross-validation errors, but not so great when trying to resurrect genetic components from the depths of prehistory.

Once I had a dataset that was forcing the algorithm to focus its attention on the ancient genomes and producing consistent results, I tested the problem samples in batches of 5-10, thus making sure they didn't skew the analysis.

Interestingly, the Yamnaya-specific component peaks in Udmurts, who live close to where the Yamnaya samples were collected. This can hardly be a coincidence.

In any case, I'm hoping to look at this issue in more detail soon with the help of qpAdm, a new program released recently with the updated ADMIXTOOLS package (see here). Based on f4 statistics, qpAdm is specifically designed for analyzing ancient admixture events.
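qpAdm models a target as a mixture of source populations using f4 statistics computed against a set of outgroups. As a much simpler illustration of how f4 statistics can yield a mixture proportion, here's a sketch of the older f4-ratio estimator, alpha = f4(A, O; X, C) / f4(A, O; B, C), on simulated allele frequencies. This is not qpAdm; the toy tree, the Gaussian-noise stand-in for drift and the 30% mixture proportion are all assumptions made up for the example.

```python
import random


def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)) / len(pa)


def drift(freqs, sd, rng):
    """Crude stand-in for genetic drift: add Gaussian noise and clip to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sd))) for p in freqs]


def f4_ratio(pa, po, px, pb, pc):
    """alpha = f4(A, O; X, C) / f4(A, O; B, C), the B-side share of X's ancestry."""
    return f4(pa, po, px, pc) / f4(pa, po, pb, pc)


def simulate(alpha=0.3, n_snps=20000, seed=1):
    """Toy tree (((A, B), C), O); X mixes unsampled relatives of B and C."""
    rng = random.Random(seed)
    root = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    ab = drift(root, 0.1, rng)                       # ancestor shared by A and B
    pa, pb, b_src = (drift(ab, 0.05, rng) for _ in range(3))
    c_anc = drift(root, 0.1, rng)
    pc, c_src = (drift(c_anc, 0.05, rng) for _ in range(2))
    po = drift(root, 0.1, rng)                       # outgroup O
    px = [alpha * b + (1 - alpha) * c for b, c in zip(b_src, c_src)]
    return f4_ratio(pa, po, px, pb, pc)


if __name__ == "__main__":
    print(f"estimated alpha: {simulate():.3f} (true value 0.3)")
```

The estimate lands near the true value because only the B-side source shares drift with A; qpAdm generalizes this idea to several sources and many outgroups at once.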

Citation...

Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe, Nature, Advance online publication, doi:10.1038/nature14317

Wednesday, March 11, 2015

New release of ADMIXTOOLS with two additional programs


ADMIXTOOLS 3.0 is now available on GitHub via the Reich Lab site. The updated package includes minor bugfixes and improvements and two new programs: qpWave and qpAdm for studying migration and admixture. Scroll down the page at the link below.

Reich Lab software

Documentation is minimal, but I'm told that users of the old ADMIXTOOLS should be able to get things running. I haven't had a chance to check it out yet, but I'm looking forward to trying qpWave and qpAdm, hopefully this weekend.

Update 28/06/2020: Major updates to ADMIXTOOLS

Thursday, February 19, 2015

The Near East ain't what it used to be


Up for public comment at bioRxiv this week is this paper on the population history of the Near East, with a special focus on Armenians. Here's the abstract:

The Armenians are a culturally isolated population who historically inhabited a region in the Near East bounded by the Mediterranean and Black seas and the Caucasus, but remain underrepresented in genetic studies and have a complex history including a major geographic displacement during World War One. Here, we analyse genome-wide variation in 173 Armenians and compare them to 78 other worldwide populations. We find that Armenians form a distinctive cluster linking the Near East, Europe, and the Caucasus. We show that Armenian diversity can be explained by several mixtures of Eurasian populations that occurred between ~3,000 and ~2,000 BCE, a period characterized by major population migrations after the domestication of the horse, appearance of chariots, and the rise of advanced civilizations in the Near East. However, genetic signals of population mixture cease after ~1,200 BCE when Bronze Age civilizations in the Eastern Mediterranean world suddenly and violently collapsed. Armenians have since remained isolated and genetic structure within the population developed ~500 years ago when Armenia was divided between the Ottomans and the Safavid Empire in Iran. Finally, we show that Armenians have higher genetic affinity to Neolithic Europeans than other present-day Near Easterners, and that 29% of the Armenian ancestry may originate from an ancestral population best represented by Neolithic Europeans.

Unfortunately, the authors failed to even mention the main cause of what they're seeing: the massive influx of Ancient North Eurasian (ANE) admixture into the Near East. They included the ancient genomes of Oetzi the Iceman and La Brana-1 in their analysis, but not that of MA-1, or Mal'ta boy, the main ANE proxy.

MA-1 is a low coverage genome, and not easy to work with, but until better ANE reference genomes are sequenced, it simply can't be ignored in studies on the population history of West Eurasia. Here's why:


Above is my Fateful Triangle PCA. Note the eastern shift of the Islamic Near Eastern groups relative to their non-Islamic neighbors. Here are the relevant ANE ancestry proportions:

Anatolian Turks ~16.54%
Armenians ~15.48%

Iranians ~19.61%
Iranian Jews ~14.01%

Lebanese Muslims ~9.82%
Lebanese Christians ~7.14%

The differences aren't very dramatic, but they're consistent and, as per the PCA, hard to overlook. Indeed, the contrast would be even more obvious if we were to add to the list other exotic admixtures, such as East Asian, South Asian and/or Sub-Saharan.
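To make the pairwise comparison explicit, here's a trivial sketch that takes the ANE estimates listed above and computes the difference between each Muslim-majority group and its non-Muslim neighbor. The pairings follow the grouping of the list; nothing else is assumed.

```python
# ANE proportions (%) from the list above.
ane = {
    "Anatolian Turks": 16.54,
    "Armenians": 15.48,
    "Iranians": 19.61,
    "Iranian Jews": 14.01,
    "Lebanese Muslims": 9.82,
    "Lebanese Christians": 7.14,
}

# (Muslim-majority group, non-Muslim neighbor), as grouped in the list.
pairs = [
    ("Anatolian Turks", "Armenians"),
    ("Iranians", "Iranian Jews"),
    ("Lebanese Muslims", "Lebanese Christians"),
]

differences = {f"{a} - {b}": round(ane[a] - ane[b], 2) for a, b in pairs}
for label, diff in differences.items():
    print(f"{label}: {diff:+.2f} percentage points")
```

All three differences point the same way, ranging from about 1 to 5.6 percentage points.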

If you're wondering why it is that Muslims generally carry more ANE than their non-Muslim neighbors, it's probably because the Islamic expansion had a homogenizing effect on the Near East, and it didn't have as much of an impact on the religious minorities in the region.

How and when ANE arrived in the Near East is still a mystery which can only be solved with ancient DNA. However, my bet is that most of it came after the Neolithic from the Eurasian steppe, the northeast Caucasus and the Altai, with the Indo-Europeans, Kura-Araxes people and Turks, respectively.

Citation...

Marc Haber et al., Genetic evidence for an origin of the Armenians from Bronze Age mixing of multiple populations, bioRxiv, Posted February 18, 2015. doi: https://dx.doi.org/10.1101/015396

See also...

First look at an ancient genome from Neolithic Anatolia

Tuesday, February 17, 2015

Latest linguistics research backs the Indo-European steppe hypothesis


Most of the action on this blog in recent weeks has revolved around the Indo-European urheimat question. So it's probably not a coincidence that I just got this press release in the mail:

Linguists have long agreed that languages from English to Greek to Hindi, known as 'Indo-European languages', are part of a language family which first emerged from a common ancestor spoken thousands of years ago. Now, a new study gives us more information on when and where it was most likely used. Using data from over 150 languages, linguists at the University of California, Berkeley provide evidence that this ancestor language originated 5,500 - 6,500 years ago, on the Pontic-Caspian steppe stretching from Moldova to Russia and western Kazakhstan.

"Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis", by Will Chang, Chundra Cathcart, David Hall and Andrew Garrett, will appear in the March issue of the academic journal Language. A pre-print version of the article is available on the LSA website [see HERE].

This article provides new support for the "steppe hypothesis" or "Kurgan hypothesis", which proposes that Indo-European languages first spread with cultural developments in animal husbandry around 4500 - 3500 BCE. (An alternate theory proposes that they spread much earlier, around 7500 - 6000 BCE, in Anatolia in modern-day Turkey.)

Chang et al. examined over 200 sets of words from living and historical Indo-European languages; after determining how quickly these words changed over time through statistical modeling, they concluded that the rate of change indicated that the languages which first used these words began to diverge approximately 6,500 years ago, in accordance with the steppe hypothesis.

This is one of the first quantitatively-based academic papers in support of the steppe hypothesis, and the first to use a model with "ancestry constraints" which more directly incorporate previously discovered relationships between languages. Discussion of prior studies in favor of and against the steppe hypothesis can be found in the paper.

I'm reading the paper now, and it'll probably take me a while to get my head around it. Admittedly, linguistics is not my strong point, but I might post some observations in the comments if I feel up to it.

In any case, here's one of the phylogenetic trees from the paper. It'd be interesting to see how it lines up with thousands of complete Y-chromosome sequences from these language groups, particularly from Y-haplogroup R1; I have a feeling we'd see some very nice correlations.


Citation...

Chang et al., Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Manuscript to be published in Language, (Vol. 91, No. 1) March 2015.

See also...

Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint)

Eastern Europe as a bifurcation hotspot for Y-hg R1

Thursday, February 12, 2015

Eastern Europe as a bifurcation hotspot for Y-hg R1


The main angle of the recently released epic manuscript Haak et al. 2015 is that ancient DNA supports the steppe origin of at least some of Europe's Indo-European languages. That's certainly a move in the right direction, so that we can eventually do away with the Anatolian hypothesis, which was always a failed proposition.

But it's clear that the authors are holding back. They've obviously decided to be very cautious until they've looked at more ancient DNA, particularly from the Near East, Central Asia and India, before fully backing any one Proto-Indo-European (PIE) urheimat model.

That's understandable, considering how much opposition there is still to the steppe hypothesis, even though it does by and large have the support of historical linguists, which is what really counts. Nevertheless, my feeling is that Haak et al. are underselling their data, particularly the stuff from Eastern Europe.

I'm of the opinion that the steppe or Kurgan PIE model works just fine, and I'm also not surprised by the ancient DNA evidence pointing to a massive expansion of people from the western steppe during the Late Neolithic/Early Bronze Age. So for me, the really big news in this paper is that the only two Eastern European forager samples belong to basal lineages of Y-chromosome haplogroups R1a and R1b. What this suggests, I'd say, is that ancient Eastern Europe was a key bifurcation region for R1.

Remarkably, it's possible to basically lay out the history and phylogeny of R1a in Europe using just three R1a samples from the paper. This can't be a coincidence.

- Mesolithic Hunter-Gatherer from Karelia: R1a (xM198)

- Late Neolithic Corded Ware pastoralist from Germany: R1a (M198, M417, xZ282)

- Late Bronze Age Urnfielder from Germany: R1a (M198, M417, Z282, Z280)

What we can see there is the progression from a basal R1a in pre-Neolithic Northeastern Europe to a derived R1a in late prehistoric Central Europe. The derived R1a is actually R1a1a1b1a2, which is by far the most common subclade of R1a in Europe today, and closely related to the Asian and Indo-Iranian-specific R1a1a1b2.
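The progression above can be expressed as a walk down a simplified R1a tree. The sketch below assumes a single chain M198 > M417 > Z282 > Z280 with all intermediate branches omitted, and uses a hypothetical `place` helper that reports the deepest derived marker plus the next marker tested ancestral, in a style close to the R1a (xM198) notation used in the list.

```python
# Simplified R1a chain: each marker lies downstream of the previous one.
# Intermediate branches between these markers are deliberately left out.
CHAIN = ["M198", "M417", "Z282", "Z280"]


def place(derived: set, ancestral: set) -> str:
    """Label a sample by its deepest derived marker on CHAIN, plus the first ancestral one."""
    depth = 0
    while depth < len(CHAIN) and CHAIN[depth] in derived:
        depth += 1
    # A derived call below an untested or ancestral marker would contradict the tree.
    if any(m in derived for m in CHAIN[depth:]):
        raise ValueError("derived marker below a non-derived one: inconsistent calls")
    label = "R1a" if depth == 0 else f"R1a-{CHAIN[depth - 1]}"
    if depth < len(CHAIN) and CHAIN[depth] in ancestral:
        label += f" (x{CHAIN[depth]})"
    return label


samples = {
    "Mesolithic Karelia": (set(), {"M198"}),
    "Corded Ware, Germany": ({"M198", "M417"}, {"Z282"}),
    "Urnfield, Germany": ({"M198", "M417", "Z282", "Z280"}, set()),
}
for name, (derived, ancestral) in samples.items():
    print(name, "->", place(derived, ancestral))
```

Each later sample sits strictly downstream of the earlier one on the chain, which is the progression described above.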

Interestingly, all seven of the Yamnaya males sampled by Haak et al., mostly from the Samara Valley, belong to R1b-M269, the most common subclade of R1b today. However, five belong to the West Asian-specific R1b-Z2103, and none to the West European-specific R1b-M412. Also, all nine Yamnaya samples show Near Eastern admixture, described in the paper as Armenian-like.

Does this perhaps mean that the Proto-Indo-Europeans (and thus Yamnaya) originated in the Near East, as per the Armenian Plateau hypothesis?

I doubt it. The aforementioned Eastern European R1b forager is also from the Samara Valley, and he clearly lacks Near Eastern admixture. So what are the chances that a Near Eastern population with a frequency of R1b-M269 of around 100% moved into an area of Eastern Europe where a more basal R1b was already present, and in fact in a population with no Near Eastern ancestry? Very slim, I'd say.

So how did the Yamnaya herders acquire their Near Eastern admixture? The answer is obvious if we look at their mtDNA haplogroups. These include H, T and W, all of which might have come to Eastern Europe from the Near East.

Of course this doesn't mean that the Eastern European steppe was overrun by Near Eastern Amazons. It's generally accepted that during the Neolithic the steppe was settled by farmers from the Near East, just like much of the rest of Europe, and I'd say that it was mostly the women from these groups who were incorporated into the later pastoralist societies of the steppe. The men, who probably belonged to Near Eastern haplogroups like G or T, might have been killed or marginalized in some way, so that their reproductive success was seriously hampered.

This is not a far-fetched scenario. Typical hunter-gatherer Y-haplogroups like I2 and C6 have already been recorded alongside Near Eastern-specific mtDNA lineages at several Neolithic sites in Western and Central Europe. The social mechanisms for this might have been different there than on the steppe, but in any case, it seems that European hunter-gatherer males shacking up with farm girls of largely Near Eastern ancestry was not an unusual occurrence back in the day.

Now, if Eastern Europe was indeed a bifurcation hotspot for R1, then a large proportion, or even the majority of R1a and R1b in Eurasia today, might well be of Eastern European origin. If so, there should be some support for this in genome-wide DNA of present-day Asians, and indeed I think there is.

Below are a couple of principal component analyses (PCA). The first is from Haak et al. and the second from my own West Eurasia K8 analysis (see here). Unfortunately, I don't yet have access to the Yamnaya genomes, but I think it's pretty easy to guesstimate where they will land on my plot when I run them in the K8. I marked this spot with an X.



Note that most of the Near Eastern and Caucasian populations are clearly shifted east towards ANE, and also up towards Europe. Moreover, I'd say many of these groups are specifically pushing up towards the Volga-Ural samples and thus the Yamnaya herders.

There's really no other way to explain this outcome. Quite simply, the vast majority of West Asians have relatively recent (post-Neolithic?) ancestry from the Ural or Kazakh steppe, which manifests itself as a west to east cline on PCA, running from the southern Levant to the north Caucasus. This result is easily reproduced on any decent PCA with West Eurasian populations, and can be seen on the Haak et al. plot.
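Why does admixture between two sources show up as a cline? Here's a minimal simulation, assuming NumPy, two hypothetical source populations with random allele frequencies, and individuals carrying a uniform spread of mixture proportions. PCA on the centered genotype matrix orders them along PC1 by their mixture proportion. It's a toy model, not a reanalysis of any real dataset.

```python
import numpy as np


def simulate_cline(n_ind=100, n_snps=2000, seed=1):
    """Simulate individuals admixed between two sources; return (alpha, PC1 scores)."""
    rng = np.random.default_rng(seed)
    p_src1 = rng.uniform(0.05, 0.95, n_snps)   # hypothetical source 1 allele frequencies
    p_src2 = rng.uniform(0.05, 0.95, n_snps)   # hypothetical source 2 allele frequencies
    alpha = rng.uniform(0.0, 1.0, n_ind)       # each individual's share of source 1
    freqs = np.outer(alpha, p_src1) + np.outer(1.0 - alpha, p_src2)
    genotypes = rng.binomial(2, freqs)         # diploid allele counts, shape (n_ind, n_snps)
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return alpha, u[:, 0] * s[0]               # PC1 score per individual


if __name__ == "__main__":
    alpha, pc1 = simulate_cline()
    r = np.corrcoef(alpha, pc1)[0, 1]
    # The sign of a principal component is arbitrary, so look at |r|.
    print(f"|corr(alpha, PC1)| = {abs(r):.3f}")
```

Intermediate mixture proportions fall between the two sources on PC1 rather than forming a separate cluster, which is why a spread of admixture levels reads as a cline.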

I have yet to find solid evidence that Indo-European speakers from the Near East, like Armenians, Kurds and Iranians, don't harbor fairly significant ancestry from this northeastern source.

For instance, unlike many people, I don't find unsupervised ADMIXTURE analyses very convincing when they show these groups to be entirely of Near Eastern ancestry. That's because when ADMIXTURE creates a modern Near Eastern/West Asian cluster, it usually lumps within it all of the ancient ancestral components that are today ubiquitous in the Near East. In other words, the steppe admixture which shows up amongst most West Asians on the PCA above is classified as native to the Near East, even though this is unlikely to be true.

See also...

High female mobility in Bronze Age Europe

Ust'-Ishim belongs to K-M526

Tuesday, February 10, 2015

Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint)


I'll probably end up writing a whole series of posts on this paper. But for now, here's the abstract and a PCA.

We generated genome-wide data from 69 Europeans who lived between 8,000-3,000 years ago by enriching ancient DNA libraries for a target set of almost four hundred thousand polymorphisms. Enrichment of these positions decreases the sequencing required for genome-wide ancient DNA analysis by a median of around 250-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that the populations of western and far eastern Europe followed opposite trajectories between 8,000-5,000 years ago. At the beginning of the Neolithic period in Europe, ~8,000-7,000 years ago, closely related groups of early farmers appeared in Germany, Hungary, and Spain, different from indigenous hunter-gatherers, whereas Russia was inhabited by a distinctive population of hunter-gatherers with high affinity to a ~24,000 year old Siberian. By ~6,000-5,000 years ago, a resurgence of hunter-gatherer ancestry had occurred throughout much of Europe, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~3/4 of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and is ubiquitous in present-day Europeans. These results provide support for the theory of a steppe origin of at least some of the Indo-European languages of Europe.


Haak et al., Massive migration from the steppe is a source for Indo-European languages in Europe, bioRxiv, Posted February 10, 2015, doi: https://dx.doi.org/10.1101/013433

Friday, January 30, 2015

Half of our ancestry comes from the Pontic-Caspian steppe


Here's the latest teaser for the new David Reich et al. paper on the ethnogenesis of present-day Europeans. It's part of an abstract for a seminar to be held by Professor Reich at Jesus College, Oxford, on February 9. Interestingly, it argues that migrations from the steppe resulted in a ~50% population turnover across northern Europe from the late Neolithic onwards, which is very much in agreement with recent discussions on the topic at Eurogenes (for instance, see here).

By ~6,000-5,000 years ago, a resurgence of hunter-gatherer ancestry had occurred throughout much of Europe, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~3/4 of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and comprises about half the ancestry of today’s northern Europeans. These results support the theory of a steppe origin of at least some of the Indo-European languages of Europe, and show the power of genome-wide ancient DNA studies to document human migrations.

Source: Ancient DNA documents three ancestral populations for present-day Europeans


Update 11/02/2015: Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint).


Haak et al., Massive migration from the steppe is a source for Indo-European languages in Europe, bioRxiv, Posted February 10, 2015, doi: https://dx.doi.org/10.1101/013433

Friday, January 23, 2015

Yamnaya genomes are a 50/50 mix of eastern Euro foragers and something else ANE-rich


I'm posting a new entry about the upcoming Corded Ware/Yamnaya paper because the last entry (see here) now has over 400 comments which aren't easy to load for many people.

One of the authors of this eagerly awaited paper, Nick Patterson of the Broad Institute, briefly joined our discussion. Nick's contribution is much appreciated. He wasn't able to reveal a great deal, because the manuscript is in submission, but he did make a couple of interesting points:

- the paper will feature Y-haplogroup results from the Yamnaya culture, represented by nine samples in all, including seven males

- the population with Near Eastern ancestry that mixed with the Eastern Hunter-Gatherers (EHG) on the Russian steppe to form the Yamnaya pastoralists by 5,000 YBP was also "rich" in ANE

- ancient DNA from the Caucasus, Iran and India is probably necessary to work out how the Indo-Europeans got to India, but the paper won't feature such data

It's nice to hear that Y-haplogroups aren't being ignored. My opinion is that they're at least as important as genome-wide data when tracking the movements across vast space and time of highly patriarchal and patrilineal groups like the ancient Indo-Europeans.

Indeed, we already know that the Slavic, Baltic and Norse-specific R1a1a1b1, defined by the Z282 mutation, is the sister clade of the Indo-Iranian-specific R1a1a1b2, defined by Z93. Thus, if the Yamnaya males were found to belong to these or upstream markers, this would suggest that they were the paternal ancestors of many Balts, Scandinavians, Slavs and Indo-Iranians, and correlate very nicely with the linguistic and archeological "steppe hypothesis" of Indo-European origins.

In fact, even if analyses based on high density genome-wide data suggest that Indians don't harbor any genome-wide European ancestry, we'd still have to accept the likelihood of gene flow - albeit perhaps very indirect gene flow - from the European steppe to India because many Indians belong to R1a1a1b2.

The second point made by Nick is perhaps surprising, but at least for me not totally unexpected. That's because we've already known for a while that the Yamnaya genomes can be successfully modeled as half Karelian EHG and half present-day Armenian (see here), and according to my own estimates Armenians carry an average of 15.5% ANE.

The fact that these Armenian-like, ANE-rich newcomers dampened the genome-wide affinity to ANE-proxy MA-1 on the Russian steppe might look like a contradiction, but not if we remember that the higher the Near Eastern ancestry, the lower the genome-wide affinity to MA-1, and also consider that the steppe foragers probably carried a lot more ANE than the newcomers.
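The apparent contradiction comes down to simple mixing arithmetic: the ANE share of an admixed group is a weighted average of its sources, so adding a source with less ANE lowers the total even if that source is ANE-rich by Near Eastern standards. A sketch, assuming the 50/50 mix mentioned above, the 15.5% Armenian ANE figure as a stand-in for the newcomers, and a purely hypothetical 50% ANE for the steppe foragers (no estimate for them is given here).

```python
def admixed_ane(w_forager: float, ane_forager: float, ane_newcomer: float) -> float:
    """ANE fraction of a two-way mixture: the weighted average of the two sources."""
    return w_forager * ane_forager + (1 - w_forager) * ane_newcomer


ane_forager = 0.50    # hypothetical value for the steppe foragers (EHG)
ane_newcomer = 0.155  # Armenian-like newcomers, per the estimate above
yamnaya = admixed_ane(0.5, ane_forager, ane_newcomer)
print(f"{yamnaya:.4f}")  # prints: 0.3275
```

Whatever value is assumed for the foragers, as long as it exceeds the newcomers' share, the mixture ends up with less ANE than the foragers had.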

Actually, as far as I know, all of the Yamnaya samples in this study come from the Samara Valley, which is some distance north of the Caspian Sea near the southern Urals. So it makes sense that the pseudo Armenians who turned up there more than 5,000 years ago were not like the Neolithic farmers of Western and Central Europe, who lacked ANE.

I'd say that this as yet unidentified group (wild guess: immediate ancestors of the Repin culture people?) was the result of an admixture event, or perhaps a series of admixture events, with ANE-rich foragers somewhere on the steppe south of the Samara. If so, I won't be surprised if it turns out that R1a only appeared in the Samara Valley after their arrival.

In any case, it looks like even after this paper comes out, we'll still need a lot more ancient DNA from across Eurasia to help map out the early Indo-European dispersals with any confidence.

Update 11/02/2015: Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint).

Monday, January 19, 2015

Ancient DNA points to the Eurasian steppe as a proximate source for Indo-European migrations into Europe


This is yet another teaser for the upcoming Corded Ware/Yamnaya paper from the Reich lab. Sadly, it doesn't mention Y-chromosome haplogroups, so perhaps the authors are going to tackle this issue later. However, check out what they say about the German and Spanish farmers being of the same stock, and the resurgence of hunter-gatherer ancestry in Western Europe after the early Neolithic. Fascinating stuff.

Ancient DNA points to the Eurasian steppe as a proximate source for Indo-European migrations into Europe

David Reich and Nick Patterson

Abstract: We generated genome-wide data from 65 Europeans who lived between 8,000-3,000 years ago by enriching ancient DNA libraries for a target set of about 390,000 single nucleotide polymorphisms. This strategy decreases the sequencing required to obtain genome-wide data from ancient DNA samples by around 1000-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that in western Europe, the farmers of both Germany and Spain >7,000 years ago were descended from a common ancestral stock. These farmers did not replace the earlier hunter-gatherers, but continued to mix with them, leading to a resurgence of hunter-gatherer ancestry in both Germany and Spain ~1,000-2,000 years later. In eastern Europe, the hunter-gatherers of Russia >7,000 years ago were distinct from those of the west, having an increased affinity to a ~24,000 year old individual from Siberia, but this affinity was reduced by ~5,000 years ago in the Yamnaya steppe pastoralists because of admixture with a population of Near Eastern ancestry. Western and Eastern Europe collided ~4,500 years ago with the appearance of the Corded Ware people in Central Europe, who derived at least two thirds of their ancestry from an eastern population closely related to the Yamnaya. The evidence for mass migration into Europe thousands of years after the arrival of agriculture, in combination with linguistic and archaeological data, makes a compelling case for the steppe as a proximate source for the spread of Indo-European languages into Europe.

Source: INA Kolloquium Ws 2014/15


Update 11/02/2015: Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint).