
Friday, December 26, 2014

The fateful triangle

Not long ago Lazaridis et al. proposed that most present-day Europeans were derived from three distinct ancestral populations: Ancient North Eurasians (ANE), Early European Farmers (EEF) and Western European Hunter-Gatherers (WHG).

However, this is essentially a stop-gap model, which will in all likelihood be replaced by a partly revised and more robust model once someone manages to sequence a genome or two from the Neolithic Near East. That's because EEF is clearly a hybrid component, largely made up of ancient Near Eastern ancestry and something very WHG-like, sometimes in very different proportions depending on the location and archeological context of the EEF genomes being analyzed.
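To make the proposed re-parameterization concrete, here's a minimal Python sketch. It assumes EEF can be treated as a simple two-way mix of Near Eastern (ENF-like) and WHG ancestry, with a single Near Eastern share that you supply yourself. The function name and input numbers are hypothetical. The 0.72 share in the usage line only echoes the K8 Near Eastern estimate listed for the Stuttgart farmer further down, and real EEF samples vary, as noted above.

```python
def eef_to_enf(ane, eef, whg, ne_share_of_eef):
    """Re-express an ANE/EEF/WHG estimate as ANE/ENF/WHG.

    Hypothetical assumption: EEF = ne_share_of_eef * ENF
    + (1 - ne_share_of_eef) * WHG, with one fixed share for the sample.
    """
    if not 0.0 <= ne_share_of_eef <= 1.0:
        raise ValueError("ne_share_of_eef must be between 0 and 1")
    enf = eef * ne_share_of_eef
    # The non-Near Eastern part of EEF is folded back into WHG.
    whg_total = whg + eef * (1.0 - ne_share_of_eef)
    return ane, enf, whg_total


# Hypothetical three-way estimate in percent, not a real sample.
ane, enf, whg = eef_to_enf(ane=15.0, eef=50.0, whg=35.0, ne_share_of_eef=0.72)
print(round(ane, 2), round(enf, 2), round(whg, 2))  # 15.0 36.0 49.0
```

The total stays at 100%; only the split between the two non-ANE components changes, which is why the choice of Near Eastern share matters so much.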

So what will this new model look like, you might ask? Probably like this, where EEF is replaced by an Early Neolithic Farmer (ENF) component from the ancient Near East, or something very similar:

The diagram above is basically a Principal Component Analysis (PCA) based on output from my new West Eurasia K8 test (see here), in which the Near Eastern component is synonymous with ENF.

I'm quite certain that these results are very close to the truth. However, just in case the Near Eastern ancestry proportions are a little too high (and we won't know until we see those ancient genomes from the Near East), I've got another version that offers lower-bound Near Eastern estimates.

It might be useful to keep in mind that I rotated the plots to fit geography. As a result, Component 1, which accounts for around 85% of the variance on both plots, appears smaller than Component 2, which carries only around 10%.
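Rotating a PCA plot moves variance between the on-screen axes without changing the distances between samples or the total variance, so a component that carries most of the variance can end up looking short. The sketch below checks this with a handful of hypothetical 2D scores and an arbitrary 90-degree rotation; it is not the rotation actually applied to the plots.

```python
import math


def axis_variances(points):
    """Population variance along x and along y for a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / n
    vy = sum((p[1] - my) ** 2 for p in points) / n
    return vx, vy


def rotate(points, degrees):
    """Rotate (x, y) points counterclockwise about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical PC1/PC2 scores: most of the spread is on PC1.
scores = [(-3.0, 0.4), (-1.0, -0.5), (0.5, 0.6), (1.5, -0.2), (2.0, -0.3)]
before = axis_variances(scores)
after = axis_variances(rotate(scores, 90))

print([round(v, 3) for v in before])
print([round(v, 3) for v in after])  # PC1's spread now sits on the vertical axis
```

The sum of the two variances is the same before and after, and so is every pairwise distance; only the axis the spread is drawn along changes.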

A spreadsheet with West Eurasia K8 results for a wide variety of populations is available here. Please note that there are two sheets, with the second showing the lower-bound Near Eastern ancestry proportions.

We'll probably learn of more ancient European meta-populations as many more genomes are sequenced from across Eurasia. Nevertheless, I doubt this will affect the model outlined above. That's because I'm expecting all such meta-populations to be mixtures of ANE, ENF and/or WHG, as well as, in some cases, extra-West Eurasian components.
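Because the three components of an ANE/ENF/WHG model sum to 100%, every such mixture is a point inside a triangle. The sketch below maps those proportions to 2D ternary-plot coordinates, assuming an equilateral triangle with WHG, ENF and ANE at corner positions chosen purely for illustration. This is a ternary plot, not the PCA behind the diagram in this post.

```python
import math

# Hypothetical corner placement for an equilateral ternary plot.
CORNERS = {
    "WHG": (0.0, 0.0),
    "ENF": (1.0, 0.0),
    "ANE": (0.5, math.sqrt(3) / 2),
}


def ternary_xy(props):
    """Map {'ANE': a, 'ENF': e, 'WHG': w} to 2D ternary-plot coordinates.

    Inputs are normalized first, so percentages or fractions both work.
    """
    total = sum(props[k] for k in CORNERS)
    if total <= 0:
        raise ValueError("proportions must sum to a positive number")
    x = sum(props[k] / total * CORNERS[k][0] for k in CORNERS)
    y = sum(props[k] / total * CORNERS[k][1] for k in CORNERS)
    return x, y


print(ternary_xy({"ANE": 0, "ENF": 0, "WHG": 100}))  # (0.0, 0.0)
print(ternary_xy({"ANE": 20, "ENF": 40, "WHG": 40}))
```

Each sample's position is just the ancestry-weighted average of the three corners, so a pure source sits on a corner and any mixture falls inside the triangle.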

However, I suspect that West Eurasia will have to be modeled differently from Europe, with, amongst other things, the so-called Basal Eurasian component replacing ENF. But for this to happen we'll need at least one ancient genome that is in large part of Basal Eurasian origin. In any case, that's a whole different subject.

See also...

4mix: four-way mixture modeling in R

Sunday, December 21, 2014

Gokhem2 + Motala12 =/= present-day Swedes

I've seen quite a few comments on this blog suggesting that most of the Ancient North Eurasian (ANE) admixture found in Northern Europe today might come from Scandinavian hunter-gatherers like Motala12 and Ajvide58. It's probably obvious to most that this is not realistic, because the Scandinavian forager genomes sequenced to date show very high ratios of Western European Hunter-Gatherer (WHG) ancestry (>80%), so basically the math doesn't add up.

Nevertheless, I thought it might be useful to drive the point home using this Principal Component Analysis (PCA) based on my new West Eurasia K8 test. The datasheet is available here. You can view a spreadsheet of the results with extra samples here.

Please note that neither Motala12 nor Gokhem2, a late Neolithic farmer from south Sweden belonging to the Funnelbeaker culture, can pass for present-day Swedes. Moreover, mixing Gokhem2 with Motala12, in any proportions, will not produce a result even vaguely similar to present-day Swedes (i.e., the outcome will fall somewhere along the dotted line).
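Any Gokhem2/Motala12 blend falls on that line because this PCA is computed from ancestry proportions and PCA is a linear projection: a mixture with weight w of one source and 1 - w of the other lands at the same weighted average of the two sources' PC coordinates. A minimal sketch, using made-up coordinates rather than the actual positions of Gokhem2, Motala12 or the Swedish samples:

```python
def mix(a, b, w):
    """PC coordinates of a w:(1 - w) mixture of sources a and b (linear projection assumed)."""
    return tuple(w * ai + (1 - w) * bi for ai, bi in zip(a, b))


def distance_to_segment(p, a, b):
    """Shortest Euclidean distance from point p to the segment a-b in 2D."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    # Projection parameter, clamped so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5


# Hypothetical PC coordinates.
forager = (-0.08, 0.02)
farmer = (0.03, -0.01)
target = (-0.01, 0.04)

for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(w, mix(forager, farmer, w))
print(round(distance_to_segment(target, forager, farmer), 4))
```

Every mixture sits at distance zero from the segment, so a target lying off it cannot be reached by any blend of these two sources alone.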

I'd say one of the most obvious ways to get the right result would be to blend the Scandinavian forager and farmer with at least one other sample from somewhere below the Swedish cluster (i.e., geographically speaking, to its east or southeast).

It might be possible to come up with a more precise plot location, and thus perhaps geographic origin, for this putative third source of Swedish ancestry by running some complex tests with the PCA datasheet. If anyone wants to have a go at that and manages to come up with a coherent outcome, feel free to post your findings in the comments below.
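For anyone who does want to try, one simple approach (not the author's method, and not the 4mix script) is to treat three sources as the corners of a triangle in PC space and solve for the barycentric weights that reproduce the target. A negative weight means the target lies outside the triangle, so those three sources can't produce it. All coordinates below, including the putative third source, are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Weights (wa, wb, wc), summing to 1, such that p = wa*a + wb*b + wc*c in 2D."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if abs(det) < 1e-12:
        raise ValueError("sources are collinear; weights are not unique")
    wa = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    wb = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return wa, wb, 1.0 - wa - wb


# Hypothetical PC coordinates.
forager = (-0.08, 0.02)
farmer = (0.03, -0.01)
third_source = (0.02, 0.09)
target = (-0.01, 0.04)

weights = barycentric_weights(target, forager, farmer, third_source)
print([round(w, 3) for w in weights])
print("feasible" if min(weights) >= 0 else "outside the triangle")
```

Many different third-source positions give non-negative weights for the same target, so a single target in two dimensions can't pin the third source down.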

I've decided not to bother, because as far as I can see, the options are infinite. What we really need are more genomes from the Swedish late Neolithic/early Bronze Age (LN/EBA), preferably from one of the local offshoots of the Corded Ware culture (thought to have originated in Eastern Europe), to provide more datapoints and help narrow down the options.

On a related note, I'm catching up on some reading this holiday season, and currently going through this book chapter which discusses the upheavals during the LN/EBA in south Scandinavia as seen through its archeology.

Rune Iversen, Beyond the Neolithic transition - the "de-Neolithisation" of south Scandinavia

See also...

Bell Beaker, Corded Ware, EHG and Yamnaya genomes in the fateful triangle

Monday, December 15, 2014

ANE is the primary cause of west to east genetic differentiation within West Eurasia

Here's a Principal Component Analysis (PCA) and an accompanying biplot based on output from an improved version of my ANE K7 ancestry test. Let's call it the West Eurasia K8. This one gives more accurate estimates of Western European Hunter-Gatherer (WHG) and Near Eastern admixture proportions, thanks to the use of new ancient samples.

When rotated accordingly (like here), the results are basically indistinguishable from those I get with genotype data (for instance, see here and here), which suggests that they're correct and based on ancestry proportions that are close to the truth. The Past3 data sheet used to create the PCA is available here. You can view a spreadsheet of the results with extra samples here.

Clearly, ANE is the main agent causing the west to east differentiation in dimension 2. Note that even a small rise in ANE, say, 4-5%, creates significant distance between samples on the PCA plot.

East and South Eurasian admixture has a similar effect, but must be more considerable to make an impact on a West Eurasian-specific PCA like this (and it does with the obvious Volga-Ural outliers, who come from Chuvashia and Tatarstan).

On the other hand, Near Eastern admixture without ANE creates almost the opposite effect. Note, for instance, that Neolithic genomes Stuttgart and NE1 show much higher levels of Near Eastern ancestry than most Europeans, and yet they're amongst the most western samples on the plot.

This suggests that the Near East, and in particular the Caucasus, experienced a significant influx of ANE admixture after early Neolithic farmers left the region for Europe. Alternatively, Caucasus populations may have carried even higher levels of ANE than they do today before newcomers from the Near East mixed with them. Either way, a lot of ANE arrived in the Near East at some point.

It also suggests that, overall, the populations that moved west across northern Europe after the Neolithic, and shifted northern European genetic structure to the east, did not carry high ratios of Near Eastern ancestry. Instead, they harbored high ratios of ANE and WHG. What these ratios were exactly I haven't a clue, but ancient DNA should tell us that soon.

Below are the ancestry proportions for the five ancient genomes in this analysis, in chronological order. It's interesting to note (yet again) the rising and falling Near Eastern admixture, from the Mesolithic to Neolithic and then from the Neolithic to Bronze Age, respectively, as well as the steady rise of ANE from the Bronze Age to the Iron Age.

Loschbour (Mesolithic)

South_Eurasian 0
Near_Eastern 0
East_Eurasian 0
WHG 99.5
Oceanian 0.5
Pygmy 0
Sub-Saharan 0

Stuttgart (Neolithic)

South_Eurasian 0
Near_Eastern 72.19
East_Eurasian 0
WHG 27.8
Oceanian 0
Pygmy 0
Sub-Saharan 0

NE1 (Neolithic)

South_Eurasian 0
Near_Eastern 69.82
East_Eurasian 0
WHG 30.17
Oceanian 0
Pygmy 0
Sub-Saharan 0

BR2 (Bronze Age)

ANE 9.62
South_Eurasian 0.08
Near_Eastern 43.96
East_Eurasian 0
WHG 45.44
Oceanian 0.48
Pygmy 0.23
Sub-Saharan 0.19

Hinxton4 (Iron Age)

ANE 15.08
South_Eurasian 0.06
Near_Eastern 35.44
East_Eurasian 0.46
WHG 48.5
Oceanian 0
Pygmy 0
Sub-Saharan 0.46
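As a quick check on the trends described above, the sketch below copies the K8 percentages listed for the five ancient genomes, prints the change in Near Eastern and ANE ancestry between consecutive samples, and confirms that each sample totals about 100%. ANE isn't listed for Loschbour, Stuttgart and NE1; the sketch assumes it is 0 for those three, which is consistent with their other values already summing to about 100.

```python
# K8 percentages as listed in the post; zero-valued components are omitted.
# Assumption: ANE is 0 for the first three samples, where it isn't listed.
samples = [
    ("Loschbour", "Mesolithic", {"ANE": 0, "Near_Eastern": 0, "WHG": 99.5, "Oceanian": 0.5}),
    ("Stuttgart", "Neolithic", {"ANE": 0, "Near_Eastern": 72.19, "WHG": 27.8}),
    ("NE1", "Neolithic", {"ANE": 0, "Near_Eastern": 69.82, "WHG": 30.17}),
    ("BR2", "Bronze Age", {"ANE": 9.62, "South_Eurasian": 0.08, "Near_Eastern": 43.96,
                           "WHG": 45.44, "Oceanian": 0.48, "Pygmy": 0.23, "Sub-Saharan": 0.19}),
    ("Hinxton4", "Iron Age", {"ANE": 15.08, "South_Eurasian": 0.06, "Near_Eastern": 35.44,
                              "East_Eurasian": 0.46, "WHG": 48.5, "Sub-Saharan": 0.46}),
]


def changes(component):
    """Change in one component between consecutive samples, in percentage points."""
    return [
        (prev[0], cur[0], round(cur[2][component] - prev[2][component], 2))
        for prev, cur in zip(samples, samples[1:])
    ]


for name, period, props in samples:
    # Each sample should total ~100%; small gaps come from rounding in the listed values.
    print(f"{name} ({period}): total {sum(props.values()):.2f}%")

print(changes("Near_Eastern"))
print(changes("ANE"))
```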

See also...

The fateful triangle

Bell Beaker, Corded Ware, EHG and Yamnaya genomes in the fateful triangle

Sunday, December 7, 2014

Milk consumption in late Neolithic/Bronze Age West Eurasia

The map below is based on data from Warinner et al. 2014. It shows the consumption of milk, or lack thereof, among Late Neolithic/Bronze Age (LN/BA) individuals from across West Eurasia. Admittedly, the sampling is very sparse, but like I've said before on these blogs, the LN/BA was a time of profound changes in Europe, so every scrap of data from this period is very valuable.

Note the lack of milk consumption among the samples from north of the Alps, where today the vast majority of people consume milk as adults, and can do so because they carry the lactase persistence allele (T-13910). This doesn't look like a coincidence, considering the mounting evidence of a major population turnover across much of Europe during the LN/BA, mostly as a result of migrations from the east.


Warinner, C. et al. Direct evidence of milk consumption from ancient human dental calculus. Sci. Rep. 4, 7104; DOI:10.1038/srep07104 (2014).

See also...

Lactase persistence and ancient DNA

Ancient genomes from the Great Hungarian Plain

Friday, December 5, 2014

The Y-chromosome tree bursts into leaf

Update 20/05/2015: Large-scale recent expansion of European patrilineages


I wonder what the hardcore Y-DNA genetic genealogists will say about this effort. I know that many of them have been working with full Y-chromosome sequences for a while now. The paper is open access, with lots of supplementary info.

Abstract: Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51x, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analysing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of non-synonymous variants in 15 MSY single-copy genes.
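The abstract's point that time-proportional branch lengths let TMRCAs be "estimated directly" comes down to simple arithmetic: average the number of SNPs separating a node from its descendant chromosomes, then divide by the expected number of mutations per year across the sequenced region. The sketch below uses the 3.7 Mb region size from the abstract, but assumes a mutation rate of 1e-9 per bp per year purely for illustration and feeds in made-up SNP counts. It ignores the confidence intervals and rate uncertainty a real analysis has to deal with.

```python
def tmrca_years(snps_to_tips, region_bp=3.7e6, rate_per_bp_per_year=1e-9):
    """Rough TMRCA (years) from the mean number of SNPs between a node and its tips.

    Assumes an illustrative constant mutation rate and the same callable length
    for every chromosome (a rho-style average over descendant lineages).
    """
    if not snps_to_tips:
        raise ValueError("need at least one descendant lineage")
    mean_snps = sum(snps_to_tips) / len(snps_to_tips)
    mutations_per_year = region_bp * rate_per_bp_per_year
    return mean_snps / mutations_per_year


# Hypothetical SNP counts from one node to each of its descendant tips.
counts = [68, 72, 75, 70]
print(round(tmrca_years(counts)))
```

Because the estimate scales inversely with the assumed rate, doubling the rate halves the date, which is why the choice of mutation rate dominates discussions of Y-chromosome TMRCAs.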

Here are a couple of interesting quotes. You can see the samples they're talking about on the tree below. As per the second paragraph, it seems a paper on European Y-chromosome haplogroups, based on some heavy resequencing data, is about to be published in Nature Communications (see Batini et al. in the references list). Can't wait for that.

(viii) Rare deep-rooting hg Q lineages in NW Europe: Hg Q has been most widely investigated in terms of the peopling of the Americas from NE Asia (Karafet et al. 1999). Here, as well as an example of the common native American Q-M3 lineage, we included examples of rare European hg Q chromosomes. One of the English chromosomes belongs to the deepest-rooting lineage within Q (Q-M378) and may reflect the Jewish diaspora (Hammer et al. 2009); the other is distantly related, shares a deep node with the Mexican Q-M3 chromosome, and has an STR-haplotype closely related to those of scarce Scandinavian hg Q chromosomes (unpublished data).

(ix) Structure within the west Eurasian hg R: The TMRCA of hg R is 19 KYA, and within it both hgs R1a and R1b comprise young, star-like expansions discussed extensively elsewhere (Batini et al. submitted). The addition of Central Asian chromosomes here contributes a sequence to the deepest subclade of R1b-M269, while another, in a Bhutanese individual, forms an outgroup almost as old as the R1a/R1b split.


Hallast et al., The Y-chromosome tree bursts into leaf: 13,000 high-confidence SNPs covering the majority of known clades, Molecular Biology & Evolution, published online December 2, 2014, doi: 10.1093/molbev/msu327