Friday, January 30, 2015

Half of our ancestry comes from the Pontic-Caspian steppe

Here's the latest teaser for the new David Reich et al. paper on the ethnogenesis of present-day Europeans. It's part of an abstract for a seminar to be held by Professor Reich at Jesus College, Oxford, on February 9. Interestingly, it argues that migrations from the steppe resulted in a ~50% population turnover across northern Europe from the late Neolithic onwards, which is very much in agreement with recent discussions on the topic at Eurogenes (for instance, see here).

By ~6,000-5,000 years ago, a resurgence of hunter-gatherer ancestry had occurred throughout much of Europe, but in Russia, the Yamnaya steppe herders of this time were descended not only from the preceding eastern European hunter-gatherers, but also from a population of Near Eastern ancestry. Western and Eastern Europe came into contact ~4,500 years ago, as the Late Neolithic Corded Ware people from Germany traced ~3/4 of their ancestry to the Yamnaya, documenting a massive migration into the heartland of Europe from its eastern periphery. This steppe ancestry persisted in all sampled central Europeans until at least ~3,000 years ago, and comprises about half the ancestry of today’s northern Europeans. These results support the theory of a steppe origin of at least some of the Indo-European languages of Europe, and show the power of genome-wide ancient DNA studies to document human migrations.

Source: Ancient DNA documents three ancestral populations for present-day Europeans

Update 11/02/2015: Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint).

Wolfgang Haak et al., Massive migration from the steppe is a source for Indo-European languages in Europe, bioRxiv, Posted February 10, 2015, doi:

Friday, January 23, 2015

Yamnaya genomes are a 50/50 mix of eastern Euro foragers and something else ANE-rich

I'm posting a new entry about the upcoming Corded Ware/Yamnaya paper because the last entry (see here) now has over 400 comments which aren't easy to load for many people.

One of the authors of this eagerly awaited paper, Nick Patterson of the Broad Institute, briefly joined our discussion. Nick's contribution is much appreciated. He wasn't able to reveal a great deal, because the manuscript is in submission, but he did make a couple of interesting points:

- the paper will feature Y-haplogroup results from the Yamnaya culture, represented by nine samples in all, including seven males

- the population with Near Eastern ancestry that mixed with the Eastern Hunter-Gatherers (EHG) on the Russian steppe to form the Yamnaya pastoralists by 5,000 YBP was also "rich" in ANE

- ancient DNA from the Caucasus, Iran and India is probably necessary to work out how the Indo-Europeans got to India, but the paper won't feature such data

It's nice to hear that Y-haplogroups aren't being ignored. My opinion is that they're at least as important as genome-wide data when tracking the movements across vast space and time of highly patriarchal and patrilineal groups like the ancient Indo-Europeans.

Indeed, we already know that the Slavic, Baltic and Norse-specific R1a1a1b1, defined by the Z282 mutation, is the sister clade of the Indo-Iranian-specific R1a1a1b2, defined by Z93. Thus, if the Yamnaya males were found to belong to these or upstream markers, this would suggest that they were the paternal ancestors of many Balts, Scandinavians, Slavs and Indo-Iranians, and correlate very nicely with the linguistic and archeological "steppe hypothesis" of Indo-European origins.

In fact, even if analyses based on high density genome-wide data suggest that Indians don't harbor any genome-wide European ancestry, we'd still have to accept the likelihood of gene flow - albeit perhaps very indirect gene flow - from the European steppe to India because many Indians belong to R1a1a1b2.

The second point made by Nick is perhaps surprising, but at least for me not totally unexpected. That's because we've already known for a while that the Yamnaya genomes can be successfully modeled as half Karelian EHG and half present-day Armenian (see here), and according to my own estimates Armenians carry an average of 15.5% ANE.

The fact that these Armenian-like, ANE-rich newcomers dampened the genome-wide affinity to ANE-proxy MA-1 on the Russian steppe might look like a contradiction, but not if we remember that the higher the Near Eastern ancestry the lower the genome-wide affinity to MA-1, and also consider that the steppe foragers probably carried a lot more ANE than the newcomers.
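To see why this isn't really a contradiction, here's a rough back-of-the-envelope sketch that treats the Yamnaya as a simple two-way mixture. The 50/50 split and the 15.5% ANE figure for Armenians come from the discussion above. The 40% ANE value for the steppe foragers is purely a placeholder that I've made up for illustration, and the function name is hypothetical. It's not how the paper models anything.

```python
def mixed_ane(forager_ane, newcomer_ane, newcomer_share=0.5):
    """Expected ANE fraction after a two-way admixture, assuming simple linear mixing."""
    if not 0.0 <= newcomer_share <= 1.0:
        raise ValueError("newcomer_share must be between 0 and 1")
    return (1.0 - newcomer_share) * forager_ane + newcomer_share * newcomer_ane


NEWCOMER_ANE = 0.155  # Armenian-like proxy, from the estimate quoted above
FORAGER_ANE = 0.40    # placeholder only, NOT a measured value

result = mixed_ane(FORAGER_ANE, NEWCOMER_ANE)
print(f"ANE before mixing: {FORAGER_ANE:.4f}")
print(f"ANE after mixing:  {result:.4f}")
```

Under these assumptions, the mixed population ends up with less ANE than the foragers had whenever the foragers started out above 15.5%, even though the newcomers were themselves "rich" in ANE compared to Neolithic farmers.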

Actually, as far as I know, all of the Yamnaya samples in this study come from the Samara Valley, which is some distance north of the Caspian Sea near the southern Urals. So it makes sense that the pseudo Armenians who turned up there more than 5,000 years ago were not like the Neolithic farmers of Western and Central Europe, who lacked ANE.

I'd say that this as yet unidentified group (wild guess: immediate ancestors of the Repin culture people?) was the result of an admixture event, or perhaps a series of admixture events, with ANE-rich foragers somewhere on the steppe south of the Samara. If so, I won't be surprised if it turns out that R1a only appeared in the Samara Valley after their arrival.

In any case, it looks like even after this paper comes out, we'll still need a lot more ancient DNA from across Eurasia to help map out the early Indo-European dispersals with any confidence.

Update 11/02/2015: Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint).

Monday, January 19, 2015

Ancient DNA points to the Eurasian steppe as a proximate source for Indo-European migrations into Europe

This is yet another teaser for the upcoming Corded Ware/Yamnaya paper from the Reich lab. Sadly, it doesn't mention Y-chromosome haplogroups, so perhaps the authors are going to tackle this issue later. However, check out what they say about the German and Spanish farmers being of the same stock, and the resurgence of hunter-gatherer ancestry in Western Europe after the early Neolithic. Fascinating stuff.

Ancient DNA points to the Eurasian steppe as a proximate source for Indo-European migrations into Europe

David Reich and Nick Patterson

Abstract: We generated genome-wide data from 65 Europeans who lived between 8,000-3,000 years ago by enriching ancient DNA libraries for a target set of about 390,000 single nucleotide polymorphisms. This strategy decreases the sequencing required to obtain genome-wide data from ancient DNA samples by around 1000-fold, allowing us to study an order of magnitude more individuals than previous studies and to obtain new insights about the past. We show that in western Europe, the farmers of both Germany and Spain >7,000 years ago were descended from a common ancestral stock. These farmers did not replace the earlier hunter-gatherers, but continued to mix with them, leading to a resurgence of hunter-gatherer ancestry in both Germany and Spain ~1,000-2,000 years later. In eastern Europe, the hunter-gatherers of Russia >7,000 years ago were distinct from those of the west, having an increased affinity to a ~24,000 year old individual from Siberia, but this affinity was reduced by ~5,000 years ago in the Yamnaya steppe pastoralists because of admixture with a population of Near Eastern ancestry. Western and Eastern Europe collided ~4,500 years ago with the appearance of the Corded Ware people in Central Europe, who derived at least two thirds of their ancestry from an eastern population closely related to the Yamnaya. The evidence for mass migration into Europe thousands of years after the arrival of agriculture, in combination with linguistic and archaeological data, makes a compelling case for the steppe as a proximate source for the spread of Indo-European languages into Europe.

Source: INA Kolloquium Ws 2014/15

Update 11/02/2015: Massive migration from the steppe is a source for Indo-European languages in Europe (Haak et al. 2015 preprint).

Saturday, January 17, 2015

Ancient Jomon people not like present-day East Asians

Here's an abstract about a couple of ancient Jomon genomes from the recent OIST Ancient DNA Symposium in Japan.

Hideaki Kanzawa-Kiriyama, Nuclear Genome Analysis of Ancient Japanese Archipelago Humans

The Jomon period, characterized by cord-marked potteries, lasted from ~16,000 to <3,000 years before present (YBP), and abundant human skeletal remains have been excavated from shell mounds and other sites throughout the Japanese Archipelago. However, their genetic origin and the relationships with modern populations are largely unknown. Here we determined 10% and 80% of the genomic DNA sequences from two Jomon individuals, excavated at Yugura cave site, Nagano, and Shitsukariabe cave site, Aomori, respectively, and compared their genome sequences with worldwide populations. We found a unique genetic position of the Jomon people who had diverged before the diversification of most of present-day East Eurasian populations including East Eurasian Islanders. This indicates that Jomon people were a basal population in East Eurasia and genetically isolated from other East Eurasians for a long time. However, their genetic affinities to modern East Eurasians are uneven. The heterogeneity might be a hint to clarify human migration and gene flow in East Eurasia after the divergence of Jomon ancestors.

Hopefully the full paper and genomes are published soon. It'll be interesting to see how these Jomon individuals compare to Western European Hunter-Gatherers (WHG) like Loschbour and Ancient North Eurasians (ANE) like MA-1.

Update 17/01/2015: In fact, the author of this abstract analyzed a variety of Jomon samples, including the two mentioned above, as well as an Upper Paleolithic individual from Ryukyu, for his doctoral thesis back in 2013. The thesis is freely available here.

Update 09/02/2016: The paper is now available and open access at Human Genetics. See here: A partial nuclear genome of the Jomons who lived 3000 years ago in Fukushima, Japan

Wednesday, January 14, 2015

Eleven Y-chromosome descent clusters in Asia

Unfortunately, this new Balaresque et al. paper is behind a paywall, but the figures, tables and supplementary info are freely available.

Out of the 11 descent clusters (DCs) identified among Asian males, DC2 shows the strongest correlation with Indo-European languages. This cluster is based on STR haplotypes within Y-haplogroup R1a1, and is inferred to have expanded from Central Asia around 1300 BCE. In fact, to me DC2 looks like a signal of the Indo-Iranian dispersals, and associated with R1a1a1b2 (R1a-Z93) rather than R1a1 as a whole.

Here's a spatial map of DC2. The KYK marker in south Siberia represents European-like Kurgan samples from the Bronze Age. Four out of six of these individuals belonged to DC2. You can read more about them here.

Interestingly, Supplementary Figure 1 shows the presence of R*, R1*, R1b* and R1b1b2 lineages among Tajik groups (see here). Any ideas what these might really be?

High-frequency microsatellite haplotypes of the male-specific Y-chromosome can signal past episodes of high reproductive success of particular men and their patrilineal descendants. Previously, two examples of such successful Y-lineages have been described in Asia, both associated with Altaic-speaking pastoral nomadic societies, and putatively linked to dynasties descending, respectively, from Genghis Khan and Giocangga. Here we surveyed a total of 5321 Y-chromosomes from 127 Asian populations, including novel Y-SNP and microsatellite data on 461 Central Asian males, to ask whether additional lineage expansions could be identified. Based on the most frequent eight-microsatellite haplotypes, we objectively defined 11 descent clusters (DCs), each within a specific haplogroup, that represent likely past instances of high male reproductive success, including the two previously identified cases. Analysis of the geographical patterns and ages of these DCs and their associated cultural characteristics showed that the most successful lineages are found both among sedentary agriculturalists and pastoral nomads, and expanded between 2100 BCE and 1100 CE. However, those with recent origins in the historical period are almost exclusively found in Altaic-speaking pastoral nomadic populations, which may reflect a shift in political organisation in pastoralist economies and a greater ease of transmission of Y-chromosomes through time and space facilitated by the use of horses.

Balaresque et al., Y-chromosome descent clusters and male differential reproductive success: young lineage expansions dominate Asian pastoral nomadic populations, European Journal of Human Genetics advance online publication 14 January 2015; doi: 10.1038/ejhg.2014.285
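The abstract says the descent clusters were built around the most frequent eight-microsatellite haplotypes. As a minimal sketch of just that first step, here's one way to rank haplotypes by frequency. The function name and the toy repeat counts are my own inventions for illustration, and this doesn't attempt to reproduce how the authors actually delimited or dated each cluster.

```python
from collections import Counter


def rank_haplotypes(samples, top_n=3):
    """Rank eight-locus STR haplotypes by how many males carry them.

    samples: iterable of 8-element sequences of repeat counts, one per male.
    Returns a list of (haplotype, count) pairs, most frequent first.
    """
    counts = Counter()
    for hap in samples:
        hap = tuple(hap)
        if len(hap) != 8:
            raise ValueError("expected eight microsatellite loci per sample")
        counts[hap] += 1
    return counts.most_common(top_n)


# Toy data only: these repeat counts are invented, not taken from the paper.
toy = [
    (13, 25, 16, 11, 11, 14, 12, 12),
    (13, 25, 16, 11, 11, 14, 12, 12),
    (13, 25, 16, 11, 11, 14, 12, 12),
    (14, 24, 15, 10, 11, 13, 12, 13),
    (14, 24, 15, 10, 11, 13, 12, 13),
    (12, 23, 14, 10, 13, 16, 11, 12),
]

for hap, n in rank_haplotypes(toy):
    print(n, hap)
```

A haplotype that turns up far more often than expected is the sort of candidate that might seed a descent cluster, but deciding where a cluster begins and ends is a separate problem that this sketch leaves alone.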

Friday, January 9, 2015

Very rare SNPs reveal distant genealogical ties between the UK and China

As far as I can see, the paper doesn't say anything about the direction of gene flow. It might have been either way, I suppose, considering how active Great Britain was in the Far East during colonial times. Two of the British individuals mentioned in the abstract below are from Cornwall and one is from Kent.

Abstract: Nucleotide sequence differences on the whole-genome scale have been computed for 1092 people from 14 populations publicly available by the 1000 Genomes Project. Total number of differences in genetic variants between 96,464 human pairs has been calculated. The distributions of these differences for individuals within European, Asian or African origin were characterized by narrow unimodal peaks with mean values of 3.8, 3.5, and 5.1 million respectively and standard deviations of 0.1-0.03 million. The total numbers of genomic differences between pairs of all known relatives were found to be significantly lower than their respective population means and in reverse proportion to the distance of their consanguinity. By counting the total number of genomic differences it is possible to infer familial relations for people that share down to 6% of common loci identical-by-descent. Detection of familial relations can be radically improved when only very rare genetic variants are taken into account. Counting of total number of shared very rare SNPs from whole-genome sequences allows establishing distant familial relations for persons with 8th and 9th degree of relationship. Using this analysis we predicted 271 distant familial pair-wise relations among 1092 individuals that have not been declared by 1000 Genomes Project. Particularly, among 89 British and 97 Chinese individuals we found three British-Chinese pairs with distant genetic relationships. Individuals from these pairs share identical by descent DNA fragments that represent 0.001%, 0.004%, and 0.01% of their genomes. With affordable whole-genome sequencing techniques, very rare SNPs should become important genetic markers for familial relationships and population stratification.


Al-Khudhair et al., Inference Of Distant Genetic Relations In Humans Using “1000 Genomes”, Genome Biol Evol (2015), doi: 10.1093/gbe/evv003
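The basic idea in the abstract, counting very rare SNPs shared by a pair of individuals, can be sketched as below. This is only a toy illustration under my own assumptions: the function name, the sample and variant IDs, and the "carried by at most two individuals" cut-off for rarity are all hypothetical, and it's not the authors' actual pipeline.

```python
from collections import Counter
from itertools import combinations


def shared_rare_counts(carriers, max_carriers=2):
    """Count rare variants shared by each pair of individuals.

    carriers: dict mapping individual ID -> set of variant IDs carried.
    A variant counts as rare here if at most max_carriers individuals carry it.
    Returns a dict mapping (id_a, id_b) -> number of shared rare variants,
    listing only pairs that share at least one.
    """
    # How many individuals carry each variant
    freq = Counter(v for variants in carriers.values() for v in variants)
    rare = {v for v, n in freq.items() if n <= max_carriers}

    result = {}
    for a, b in combinations(sorted(carriers), 2):
        n_shared = len(carriers[a] & carriers[b] & rare)
        if n_shared:
            result[(a, b)] = n_shared
    return result


# Toy data only: sample and variant names are invented.
toy = {
    "GBR_1": {"rs_a", "rs_b", "rs_common"},
    "CHS_1": {"rs_a", "rs_b", "rs_common"},
    "GBR_2": {"rs_c", "rs_common"},
}
print(shared_rare_counts(toy))
```

In the toy data, only the first two individuals share variants that nobody else carries, so they're the only pair flagged. The variant carried by all three tells us nothing about a recent common ancestor.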

Thursday, January 8, 2015

SpaceMix: A Spatial Framework for Understanding Population Structure and Admixture

Analyzing admixture isn't easy, especially among spatially more or less continuous populations that exchange DNA gradually by mixing with their immediate neighbors. A new preprint at bioRxiv explains this problem in detail and provides a possible solution: SpaceMix (available here as an R script). Below are a few excerpts from the paper. I highlighted the Polish sample in the two figures for my own use.

Of the European samples, the Spanish and the East and West Sicilian samples all draw small amounts of admixture from close to the Ethiopian samples, presumably reflecting a North African ancestry component [Moorjani et al., 2011, Botigué et al., 2013].


The Chuvash move close to Russian and Lithuanian samples, drawing admixture from close to the Yakut; the Turkish sample also draws a smaller amount of admixture from there. There are several other East-West connections: the Russian and Adygei samples have admixture from a location "north" of the East Asian samples, and the Cambodia sample draws admixture from close to the Egyptian sample [Pickrell and Pritchard, 2012, Hellenthal et al., 2014].

There are also a number of samples that draw admixture from locations that are not immediately interpretable. For example, the Hadza and Bantu Kenyan samples draw admixture from somewhat close to India, and the Xibo and Yakut from close to "northwest" of Europe. The Pathan samples draw admixture from a location far from any other samples' locations, but close to where the India samples also draw admixture from.


There are a number of possible explanations for these results. As we only allow a single admixture arrow for each sample, populations with multiple, geographically distinct sources of admixture may be choosing admixture locations that average over those sources. This may be the case for the Hadza and Bantu Kenyan samples [Hellenthal et al., 2014]. A second possibility is that the relatively harsh prior on admixture proportion forces samples to choose lower proportions of admixture from locations that overshoot their true sources; this may explain the Xibo and Yakut admixture locations. A final explanation is that good proxies for the sources of admixture may not be included in our sampling, either because of the limited geographic sampling of current day populations, or because of old admixture events from populations that are no longer extant.


Bradburd et al., A Spatial Framework for Understanding Population Structure and Admixture, bioRxiv, Posted January 7, 2015. doi: