
Saturday, April 30, 2016

Y-hg J2 cannot be a Proto-Indo-European marker

The claim that the Proto-Indo-Europeans came from West Asia and largely belonged to Y-haplogroup J2 seems to be popular online nowadays. I won't discuss the reasons in detail here, but suffice it to say it has a lot to do with aggressive lobbying on several online forums and blogs by a few people of Southern European extraction, like Dienekes Pontikos.

It was always a shaky proposition, but difficult to debunk thoroughly. Until now.

Thanks to recent advances in both modern and ancient DNA research, we can now safely say that Y-haplogroup J2 was not involved in any rapid, large-scale population expansions during the Late Neolithic/Early Bronze Age (LN/EBA), the generally accepted Proto-Indo-European time frame.

It thus fails to meet even the most basic criteria of a Proto-Indo-European diagnostic marker. The Proto-Indo-Europeans, after all, were surely highly patriarchal and patrilineal, and therefore expected to have left a clear signal of their migrations in the Y-chromosomes of many present-day Indo-European speakers.

For instance, an analysis of data from the deep sequencing of human Y-chromosomes as part of the 1000 Genomes Project suggests that not a single major subclade of J2 began expanding even roughly close to the LN/EBA. See here.

In the plot above three lineages jump out at you. E1b, R1a, and R1b. The first is associated with the Bantu expansion, that occurred over the last 4,000 years. The second two are likely associated with Indo-Europeans in both Asia and Europe, respectively. The timescale is on the order of 4 to 5,000 years in the past. The association between culture and genes, or the genetic lineages of males, is rather clear, in these cases. In other instances the growth was more gradual. For example, the lineages likely associated with the first Neolithic pulses, J and G.

Moreover, not a single instance of J2 has been reported from remains classified as belonging to the Andronovo, Battle-Axe, Corded Ware, Khvalynsk, Poltavka, Potapovka, Sintashta, Srubnaya and Yamnaya archaeological cultures. In other words, these are Kurgan and Kurgan-derived groups generally accepted to be early Indo-European, a view that now has very strong support from ancient genomics. See here and here.

To date, most of these samples have probably come from elite burials. So at some point, when many more non-elite samples are sequenced, we are likely to see J2 among a few supposedly early Indo-European individuals. But so what?

There might be a couple of ways to salvage the Proto-Indo-Europeans = J2 theory. We'd have to argue that...

- the Proto-Indo-European time frame was actually the early Neolithic


- the Proto-Indo-Europeans were a small group that Indo-Europeanized the steppe Kurgan people, perhaps mainly via female migrations, and then did not partake in the main early Indo-European expansions

But the former is not particularly clever when viewed in the context of historical linguistics data. See here.

For instance, almost all IE language branches testify to a word designating ‘wool’. Since archaeological evidence suggests that wool sheep did not exist until the beginning of the fourth millennium BCE, the existence of the word in PIE would indicate that the disintegration of the proto-language could not have taken place before this date. Similarly, words for concepts such as ‘wheel’, ‘yoke’, ‘honey bee’ and ‘horse’ may be correlated directly with concrete, datable archaeological evidence.

And the latter isn't very parsimonious, and to me looks like special pleading. Why even bother?

Monday, April 25, 2016

Signals of ancient population explosions in our Y-chromosomes

Nature Genetics has a massive new paper on human Y-chromosomes based on the latest 1000 Genomes data. I'm still getting my head around the details, but at first glance it looks like a very capable effort. This part basically reads like some of my blog entries in recent years. The emphasis is mine.

In South Asia, we detected eight lineage expansions dating to ~4.0–7.3 kya and involving haplogroups H1-M52, L-M11, and R1a-Z93 (Supplementary Fig. 14b,d,e). The most striking were expansions within R1a-Z93, occurring 4.0–4.5 kya. This time predates by a few centuries the collapse of the Indus Valley Civilization, associated by some with the historical migration of Indo-European speakers from the Western Steppe into the Indian subcontinent 27. There is a notable parallel with events in Europe, and future aDNA evidence may prove to be as informative as it has been in Europe.

Poznik et al., Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences, Nature Genetics, Published online 25 April 2016; doi:10.1038/ng.3559

See also...

The Poltavka outlier

Sunday, April 24, 2016

The calm before the storm

It's been another slow month in the world of ancient DNA. I can promise you that next month is going to be awesome. In the meantime, here's some reading to keep you awake until the dry spell breaks. Please note, some of these aren't meant to be taken seriously.

Possible ritual cranial surgery on the Eneolithic/Bronze Age Russian steppes: Gresky et al. 2016

Beating the long dead Khazar horse to death again: New DNA tech traces origins of Yiddish to...Turkey [comic relief]

Greg Cochran on human goklus or races + discussion: Such a thing

Oetzi the Iceman and his stomach bug came from South Asia: Was the Indian Sub-Continent the original genetic homeland of the Europeans? [comic relief]

Oetzi the Iceman and his stomach bug didn't come from South Asia: 5300 year old Iceman's bacteria does not support out of India theory

Bear preprint at bioRxiv: Genome-wide evidence for a hybrid origin of modern polar bears

Sunday, April 17, 2016

Estimating Basal Eurasian ancestry?

Basal Eurasians (BE) are a hypothetical ghost population that apparently split from other Eurasians no later than 45,000 years ago. If they actually existed, they had a significant impact on the ancestry of early Neolithic farmers, and thus all present-day West Eurasians.

Testing ancestry proportions from ghost populations isn't easy. However, Haak et al. 2015 made use of an f4 equation that seemingly gave an accurate estimate of BE admixture in LBK farmer Stuttgart: f4(Stuttgart,Loschbour;Onge,MA1)/f4(Mbuti,MA1;Onge,Loschbour) = 44%. The other LBK farmers scored an average of 40% BE, which also made sense.
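For anyone who hasn't worked with f4-ratios, a minimal sketch of the arithmetic behind the equation above might look like this. It assumes the usual definition of f4(A,B;C,D) as the average across SNPs of (pA - pB) x (pC - pD), where p is an allele frequency. The frequencies, the `toy_freqs` table and the function names are all invented for illustration. This is not the pipeline used by Haak et al. or for any result in this post, and real runs use hundreds of thousands of SNPs with standard errors attached.

```python
# Toy f4-ratio sketch. Every allele frequency below is made up;
# only the arrangement of populations follows the equation in the text.

def f4(freqs, a, b, c, d):
    """f4(A,B;C,D): mean over SNPs of (pA - pB) * (pC - pD)."""
    products = [
        (pa - pb) * (pc - pd)
        for pa, pb, pc, pd in zip(freqs[a], freqs[b], freqs[c], freqs[d])
    ]
    return sum(products) / len(products)


def f4_ratio(freqs, numerator, denominator):
    """Ratio of two f4 statistics, each given as a 4-tuple of population names."""
    return f4(freqs, *numerator) / f4(freqs, *denominator)


# Hypothetical allele frequencies at four SNPs (illustration only).
toy_freqs = {
    "Stuttgart": [0.5, 0.4, 0.7, 0.2],
    "Loschbour": [0.7, 0.2, 0.9, 0.2],
    "Onge":      [0.2, 0.6, 0.5, 0.4],
    "MA1":       [0.6, 0.2, 0.7, 0.2],
    "Mbuti":     [0.1, 0.7, 0.3, 0.2],
}

# f4(Stuttgart,Loschbour;Onge,MA1) / f4(Mbuti,MA1;Onge,Loschbour)
be_estimate = f4_ratio(
    toy_freqs,
    ("Stuttgart", "Loschbour", "Onge", "MA1"),
    ("Mbuti", "MA1", "Onge", "Loschbour"),
)
print(f"toy BE estimate: {be_estimate:.1%}")
```

The number printed here means nothing in itself because the inputs are invented. The point is only that the estimate is a ratio of two covariance-like terms, so a negative or implausibly small value can signal that the relationships assumed between the reference populations don't hold for the test sample.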

Unfortunately, this equation doesn't appear to work too well for Caucasus Hunter-Gatherers (CHG) Kotias and Satsurblia. They both score around 25% BE, which, as far as I can see, seems way too low. Perhaps using MA1 in the equation is messing things up because CHG harbor significant MA1-related ancestry?

I tinkered around with Haak's equation and came up with this: f4(X,Iberia_Mesolithic;Dai,Karelia_HG)/f4(Mbuti,Karelia_HG;Dai,Iberia_Mesolithic). The results look solid, at least in relative terms (see image below). But is the equation actually valid?

My main worry is using both Iberia Mesolithic and Karelia HG. They share a lot of drift, much more than Loschbour and MA1. Also, even though both Dai and Onge belong to the so-called Eastern non-African (ENA) clade, they're quite distinct, with Dai a lot less basal in the context of ENA diversity. Any thoughts? Suggestions?

Update 04/18/2016: Interestingly, my f4 equation essentially fails for most post-Neolithic Europeans, particularly those with relatively high ratios of Karelia HG-related ancestry. For instance, Yamnaya Kalmykia scores just 2.9% BE, which can't be right. Yamnaya Samara shows -2.2%, which is obviously wrong.

But I tried several combinations of reference samples and found that by replacing Karelia HG with Hungary HG and Dai with Ust-Ishim I was able to obtain coherent results for a wider range of groups, including Yamnaya.

To be honest, I still don't know what the hell I'm testing here exactly. The results appear to reflect the existence of two components within West Eurasia: one representing ancient hunter-gatherers from Europe and probably surrounding areas of the Near East, and another closely related to present-day Near Eastern populations. The latter might well be a signal of the so-called Basal Eurasians, or perhaps a number of as yet unsampled metapopulations from the ancient Near East?

Thursday, April 7, 2016

Y-chromosome DNA from an Iberian Neandertal

Open access at the AJHG:

Summary: Sequencing the genomes of extinct hominids has reshaped our understanding of modern human origins. Here, we analyze ∼120 kb of exome-captured Y-chromosome DNA from a Neandertal individual from El Sidrón, Spain. We investigate its divergence from orthologous chimpanzee and modern human sequences and find strong support for a model that places the Neandertal lineage as an outgroup to modern human Y chromosomes—including A00, the highly divergent basal haplogroup. We estimate that the time to the most recent common ancestor (TMRCA) of Neandertal and modern human Y chromosomes is ∼588 thousand years ago (kya) (95% confidence interval [CI]: 447–806 kya). This is ∼2.1 (95% CI: 1.7–2.9) times longer than the TMRCA of A00 and other extant modern human Y-chromosome lineages. This estimate suggests that the Y-chromosome divergence mirrors the population divergence of Neandertals and modern human ancestors, and it refutes alternative scenarios of a relatively recent or super-archaic origin of Neandertal Y chromosomes. The fact that the Neandertal Y we describe has never been observed in modern humans suggests that the lineage is most likely extinct. We identify protein-coding differences between Neandertal and modern human Y chromosomes, including potentially damaging changes to PCDH11Y, TMSB4Y, USP9Y, and KDM5D. Three of these changes are missense mutations in genes that produce male-specific minor histocompatibility (H-Y) antigens. Antigens derived from KDM5D, for example, are thought to elicit a maternal immune response during gestation. It is possible that incompatibilities at one or more of these genes played a role in the reproductive isolation of the two groups.

Mendez et al., The Divergence of Neandertal and Modern Human Y Chromosomes, AJHG, Volume 98, Issue 4, p728–734, 7 April 2016, DOI:

Friday, April 1, 2016

Inferring heterozygosity from ancient and low coverage genomes

Interesting new preprint at bioRxiv:

Abstract: While genetic diversity can be quantified accurately from high coverage sequencing, it is often desirable to obtain such estimates from low coverage data, either to save costs or because of low DNA quality as observed for ancient samples. Here we introduce a method to accurately infer heterozygosity probabilistically from very low coverage sequences of a single individual. The method relaxes the infinite sites assumption of previous methods, does not require a reference sequence and takes into account both variable sequencing errors and potential post-mortem damage. It is thus also applicable to non-model organisms and ancient genomes. Since error rates as reported by sequencing machines are generally distorted and require recalibration, we also introduce a method to infer accurately recalibration parameter in the presence of post-mortem damage. This method does also not require knowledge about the underlying genome sequence, but instead works from haploid data (e.g. from the X-chromosome from mammalian males) and integrates over the unknown genotypes. Using extensive simulations we show that a few Mb of haploid data is sufficient for accurate recalibration even at average coverages as low as 1-3x. At similar coverages, our method also produces very accurate estimates of heterozygosity down to $10^{-4}$ within windows of about 1Mb. We further illustrate the usefulness of our approach by inferring genome-wide patterns of diversity for several ancient human samples and found that 3,000-5,000 samples showed diversity patterns comparable to modern humans. In contrast, two European hunter-gatherer samples exhibited not only considerably lower levels of diversity than modern samples, but also highly distinct distributions of diversity along their genomes. Interestingly, these distributions were also very different between the two samples, supporting earlier conclusions of a highly diverse and structured population in Europe prior to the arrival of farming.

However, I posted this observation under the abstract:

It seems like you're contradicting yourselves with these comments:

"Second, the ancestry of modern Europeans traces only partly back to European hunter gatherers with early Neolithic people from the Aegean (Hofmanová et al. 2015) and Yamnaya steppe herders (Haak et al. 2015) contributing the majority of the modern day genetic make up.


The exceptions were the two European hunter-gatherers that showed patterns very different from both modern samples as well as from one another, further corroborating the view (Jones et al. 2015) that these samples represent different and ancient clades that contributed only marginally to the genetic make-up of modern day Europeans."

Yes, of course Early European farmers (EEF) from the Balkans and the Yamnaya and related groups from the Eastern European steppe make up the majority of modern European genetic structure.

However, Yamnaya are essentially a 50/50 mix of Eastern European hunter-gatherers and Kotias-related hunter-gatherers from the Caucasus. So how can you say hunter-gatherers closely related to Kotias didn't contribute much to the modern European gene pool? They're like ~25% of our genomes.

Also, EEF carry a large chunk of Balkan and Central European hunter-gatherer ancestry very closely related to Bichon (same WHG clade). So again, doesn't that contradict your conclusion?
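The ~25% figure in the comment above is just two ancestry fractions multiplied together, which can be sketched as follows. Only the 50/50 split within Yamnaya comes from the comment itself. The share of Yamnaya-related ancestry in present-day Europeans is an assumption, set to 0.5 here purely because that value reproduces the ~25%; it is not a measured estimate and it varies a lot between populations.

```python
# Back-of-the-envelope chained ancestry; not an analysis of real data.

# From the comment: Yamnaya are "essentially a 50/50 mix" of Eastern European
# hunter-gatherers and Kotias-related (CHG) hunter-gatherers.
chg_share_in_yamnaya = 0.5

# ASSUMPTION: Yamnaya-related share in a present-day European genome.
# Chosen only so the product matches the ~25% quoted in the comment.
yamnaya_share_in_modern = 0.5

# Ancestry passed on through an intermediate population multiplies through.
chg_share_in_modern = chg_share_in_yamnaya * yamnaya_share_in_modern
print(f"Kotias-related share via Yamnaya: {chg_share_in_modern:.0%}")
```

Plugging in a lower Yamnaya-related share gives a proportionally lower Kotias-related share, but under these assumptions it remains well above "marginal" for any population with substantial steppe ancestry.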

Kousathanas et al, Inferring heterozygosity from ancient and low coverage genomes, bioRxiv, Posted April 1, 2016, doi: