Tuesday, March 28, 2017

"Heavily sex-biased" population dispersals into the Indian Subcontinent (Silva et al. 2017)

And so it begins. BMC Evolutionary Biology has a very interesting, but hardly surprising, new paper by Silva et al. on the population history of the Indian Subcontinent. Emphasis is mine:

Background: India is a patchwork of tribal and non-tribal populations that speak many different languages from various language families. Indo-European, spoken across northern and central India, and also in Pakistan and Bangladesh, has been frequently connected to the so-called “Indo-Aryan invasions” from Central Asia ~3.5 ka and the establishment of the caste system, but the extent of immigration at this time remains extremely controversial. South India, on the other hand, is dominated by Dravidian languages. India displays a high level of endogamy due to its strict social boundaries, and high genetic drift as a result of long-term isolation which, together with a very complex history, makes the genetic study of Indian populations challenging.

Results: We have combined a detailed, high-resolution mitogenome analysis with summaries of autosomal data and Y-chromosome lineages to establish a settlement chronology for the Indian Subcontinent. Maternal lineages document the earliest settlement ~55–65 ka (thousand years ago), and major population shifts in the later Pleistocene that explain previous dating discrepancies and neutrality violation. Whilst current genome-wide analyses conflate all dispersals from Southwest and Central Asia, we were able to tease out from the mitogenome data distinct dispersal episodes dating from between the Last Glacial Maximum to the Bronze Age. Moreover, we found an extremely marked sex bias by comparing the different genetic systems.

Conclusions: Maternal lineages primarily reflect earlier, pre-Holocene processes, and paternal lineages predominantly episodes within the last 10 ka. In particular, genetic influx from Central Asia in the Bronze Age was strongly male-driven, consistent with the patriarchal, patrilocal and patrilineal social structure attributed to the inferred pastoralist early Indo-European society. This was part of a much wider process of Indo-European expansion, with an ultimate source in the Pontic-Caspian region, which carried closely related Y-chromosome lineages, a smaller fraction of autosomal genome-wide variation and an even smaller fraction of mitogenomes across a vast swathe of Eurasia between 5 and 3.5 ka.


There are now sufficient high-quality Y-chromosome data available (especially Poznik et al. [58]) to be able to draw clear conclusions about the timing and direction of dispersal of R1a (Fig. 5). The indigenous South Asian subclades are too young to signal Early Neolithic dispersals from Iran, and strongly support Bronze Age incursions from Central Asia. The derived R1a-Z93 and the further derived R1a-Z94 subclades harbour the bulk of Central and South Asian R1a lineages [55, 58], as well as including some Russian and European lineages, and have been variously dated to 5.6 [4.0;7.3] ka [55], 4.5-5.3 ka with expansions ~4.0-4.5 ka [58], or 4.7 [4.0;5.5] ka (Yfull tree v4.10 [54]). The South Asian R1a-L657, dated to ~4.2 ka [3.3;5.1] (Yfull tree v4.10 [54]), is the largest (in the 1KG dataset) of several closely related subclades within R1a-Z94 of very similar time depth. Moreover, not only has R1a been found in all Sintashta and Sintashta-derived Andronovo and Srubnaya remains analysed to date at the genome-wide level (nine in total) [76, 77], and been previously identified in a majority of Andronovo (2/3) and post-Andronovo Iron Age (Tagar and Tachtyk: 6/6) male samples from southern central Siberia tested using microsatellite analysis [101], it has also been identified in other remains across Europe and Central Asia ranging from the Mesolithic up until the Iron Age (Fig. 5).

The other major member of haplogroup R in South Asia, R2, shows a strikingly different pattern. It also has deep non-Subcontinental branches, nesting a South Asian specific subclade. But the deep lineages are mainly seen in the eastern part of the Near East, rather than Central Asia or eastern Europe, and the Subcontinental specific subclade is older, dating to ~8 ka [55].

Altogether, therefore, the recently refined Y-chromosome tree strongly suggests that R1a is indeed a highly plausible marker for the long-contested Bronze Age spread of Indo-Aryan speakers into South Asia, although dated aDNA evidence will be needed for a precise estimate of its arrival in various parts of the Subcontinent. aDNA will also be needed to test the hypothesis that there were several streams of Indo-Aryan immigration (each with a different pantheon), for example with the earliest arriving ~3.4 ka and those following the Rigveda several centuries later [12]. Although they are closely related, suggesting they likely spread from a single Central Asian source pool, there do seem to be at least three and probably more R1a founder clades within the Subcontinent [58], consistent with multiple waves of arrival. Genomic Y-chromosome phylogeography is in its infancy compared to mito-genome analysis so it is of course likely that the picture will evolve with sequencing of further South Asian Y-chromosomes, but the picture is already sufficiently clear that we do not expect it to change drastically.

Silva et al., A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals, BMC Evolutionary Biology, Published: 23 March 2017, DOI: 10.1186/s12862-017-0936-9

See also...

On the doorstep of India

Indian confirmation bias

The Poltavka outlier

Caste is in the genes

Sunday, March 26, 2017

The Medieval pilgrim

Recently at PLoS Neglected Tropical Diseases:

Abstract: We have examined the remains of a Pilgrim burial from St Mary Magdalen, Winchester. The individual was a young adult male, aged around 18–25 years at the time of death. Radiocarbon dating showed the remains dated to the late 11th–early 12th centuries, a time when pilgrimages were at their height in Europe. Several lines of evidence in connection with the burial suggested this was an individual of some means and prestige. Although buried within the leprosarium cemetery, the skeleton showed only minimal skeletal evidence for leprosy, which was confined to the bones of the feet and legs. Nonetheless, molecular testing of several skeletal elements, including uninvolved bones all showed robust evidence of DNA from Mycobacterium leprae, consistent with the lepromatous or multibacillary form of the disease. We infer that in life, this individual almost certainly suffered with multiple soft tissue lesions. Genotyping of the M.leprae strain showed this belonged to the 2F lineage, today associated with cases from South-Central and Western Asia. During osteological examination it was noted that the cranium and facial features displayed atypical morphology for northern European populations. Subsequently, geochemical isotopic analyses carried out on tooth enamel indicated that this individual was indeed not local to the Winchester region, although it was not possible to be more specific about their geographic origin.


During analysis, the cranial morphology of the individual was noted as being of an unusual type and unlike other individuals from the cemetery (Fig 4). Therefore, the cranial measurements (S1 Table) were inputted into FORDISC and CRANID, with additional measurements being taken where necessary. The individual was found not to have an affinity with any of the populations contained within the program databases, which do include some from northern Europe, although not Britain. Therefore, the individual could be said not to share a physical affinity with these northern European samples, although this should not be taken as implying anything about their specific identity or origin. Populations that are poorly represented in the database include those from southern Europe and northern Africa (with the exception of Egypt), so there is a possibility that the individual could share physical cranial affinities with such populations, as his cranial morphology does bear similarities to other individuals from British archaeological populations who were also unclassifiable by FORDISC and have been suggested, on isotopic data, to originate from these areas [20]; (Stephany Leach personal communication, 2012).

Citation: Roffey S, Tucker K, Filipek-Ogden K, Montgomery J, Cameron J, O’Connell T, et al. (2017) Investigation of a Medieval Pilgrim Burial Excavated from the Leprosarium of St Mary Magdalen Winchester, UK. PLoS Negl Trop Dis 11(1): e0005186. doi:10.1371/journal.pntd.0005186

Wednesday, March 22, 2017

Trouble in early Mesolithic Iberia

Humans may have dined on other humans during the Epipalaeolithic-Mesolithic transition in Iberia, according to a new paper at the Journal of Anthropological Archaeology.

If true, I wonder if this had anything to do with the spread of the so-called Villabruna cluster across Europe at around that time? I'm not suggesting that Villabruna forager bands ate most of the other European foragers, but rather that they coped best with the stresses associated with the Epipalaeolithic-Mesolithic transition.

The paper is behind a paywall, but the figures can be viewed here.

Abstract: The identification of unarticulated human remains with anthropic marks in archaeological contexts normally involves solving two issues: a general one associated with the analysis and description of the anthropic manipulation marks, and another with regard to the interpretation of their purpose. In this paper we present new evidence of anthropophagic behaviour amongst hunter-gatherer groups of the Mediterranean Mesolithic. A total of 30 human remains with anthropic manipulation marks have been found in the Mesolithic layers of Coves de Santa Maira (Castell de Castells, Alicante, Spain), dating from ca. 10.2–9 cal ky BP. We describe the different marks identified on both human and faunal remains at the site (lithic, tooth, percussion and fire marks on bone cortex). As well as describing these marks, and considering that both human and faunal remains at the site present similar depositional and taphonomic features, this paper also contextualizes them within the archaeological context and subsistence patterns described for Mesolithic groups in the region. We cannot entirely rule out the possibility that these practices may be the result of periodic food stress suffered by the human populations. These anthropophagic events at the site coincide with a cultural change at the regional Epipalaeolithic-Mesolithic transition.

Morales-Pérez et al., Funerary practices or food delicatessen? Human remains with anthropic marks from the Western Mediterranean Mesolithic, Journal of Anthropological Archaeology, Volume 45, March 2017, Pages 115–130.

Saturday, March 18, 2017

Greek confirmation bias

A new paper at the EJHG claims that Slavic admixture in Peloponnesean Greeks averages a few per cent at best (see abstract below). However, I'd say the authors are making two potentially erroneous assumptions: 1) that Slavic invaders arrived in Greece straight from the Slavic homeland, probably located somewhere in East Central or Eastern Europe, and 2) modern-day Northern Slavs (Belarusians, Poles, Russians and Ukrainians) are accurate proxies for these ancient invaders.

Keep in mind that when the Slavs moved into the Balkans during the Early Middle Ages, they routinely absorbed the natives into their bands as free men and women (excellent paper on the topic here). So their numbers swelled thanks to this more southerly, local input, and, at the same time, their genetic structure shifted in a big way, probably from more or less Northern Slavic to modern-day Southern Slavic. Indeed, it's likely that by the time they arrived in the Peloponnese, they were less like this and more like this, or even this.

So was Fallmerayer correct when he theorized that the Peloponnese was totally re-populated by Slavs during the Medieval period? Probably not, but the population shift may still have been profound, totaling much more than a few per cent.

I can't wait for more ancient DNA from Greece and Italy, especially from the Bronze and Iron Ages. Based on my experiences with many Greeks and Italians, it's sure to be a big eye opener for them, and a beautiful thing.

Abstract: Peloponnese has been one of the cradles of the Classical European civilization and an important contributor to the ancient European history. It has also been the subject of a controversy about the ancestry of its population. In a theory hotly debated by scholars for over 170 years, the German historian Jacob Philipp Fallmerayer proposed that the medieval Peloponneseans were totally extinguished by Slavic and Avar invaders and replaced by Slavic settlers during the 6th century CE. Here we use 2.5 million single-nucleotide polymorphisms to investigate the genetic structure of Peloponnesean populations in a sample of 241 individuals originating from all districts of the peninsula and to examine predictions of the theory of replacement of the medieval Peloponneseans by Slavs. We find considerable heterogeneity of Peloponnesean populations exemplified by genetically distinct subpopulations and by gene flow gradients within Peloponnese. By principal component analysis (PCA) and ADMIXTURE analysis the Peloponneseans are clearly distinguishable from the populations of the Slavic homeland and are very similar to Sicilians and Italians. Using a novel method of quantitative analysis of ADMIXTURE output we find that the Slavic ancestry of Peloponnesean subpopulations ranges from 0.2 to 14.4%. Subpopulations considered by Fallmerayer to be Slavic tribes or to have Near Eastern origin, have no significant ancestry of either. This study rejects the theory of extinction of medieval Peloponneseans and illustrates how genetics can clarify important aspects of the history of a human population.

Stamatoyannopoulos et al., Genetics of the peloponnesean populations and the theory of extinction of the medieval peloponnesean Greeks, European Journal of Human Genetics advance online publication 8 March 2017; doi: 10.1038/ejhg.2017.18

See also...

Greeks in a Longobard cemetery

Friday, March 17, 2017

Yamnaya X chromosomes

In this analysis I'm using the same qpAdm method and almost the same reference samples as Lazaridis & Reich 2017. However, to improve the resolution, in the right pops (or outgroups) I added European Late Upper Paleolithic forager Villabruna, and dropped the low quality Siberian Late Upper Paleolithic forager AfontovaGora3. Also, I ran tests with and without the allsnps: YES flag.

In the left pops, apart from test group Steppe_EMBA (Early Middle Bronze Age steppe conglomerate made up of closely related Afanasievo, Poltavka and Yamnaya samples), we have the putative ancestral populations: Eastern European Hunter-Gatherers (EHG), Caucasus Hunter-Gatherers (CHG), Kura-Araxes (Armenia_EBA), a Chalcolithic Anatolian (Anatolia_ChL), Chalcolithic Armenians (Armenia_ChL), and/or Chalcolithic farmers from Iran (Iran_ChL).

As far as I can tell, these are the best statistical fits with the X chromosome and genome-wide data, respectively. Feel free to set me straight; the full output is in a zip file here.


Steppe_EMBA X
CHG 0.617±0.178
EHG 0.383±0.178
chisq 1.868 taildiff 0.93139015
allsnps: YES

Steppe_EMBA genome-wide
Anatolia_ChL 0.139±0.050
CHG 0.356±0.063
EHG 0.505±0.025
chisq 5.084 taildiff 0.405658017
allsnps: YES

In my opinion, despite the relatively low resolution of the X chromosome analysis, the Steppe_EMBA X chromosomes show a strong southern, in particular CHG, character, which suggests that CHG admixture into Steppe_EMBA was mediated largely via female gene flow.
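A rough way to put numbers on that low-resolution caveat is to express the gap between the X chromosome and genome-wide estimates in units of their combined standard error. The sketch below assumes the two estimates are independent and roughly normal, which is a simplification and not something qpAdm reports, and the function name is made up for illustration. Keep in mind that the two fits use different source sets (two-way versus three-way), so this is indicative at best.

```python
import math

def z_difference(est_a, se_a, est_b, se_b):
    """Z-score for the difference of two estimates, assuming independent errors."""
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_a - est_b) / combined_se

# Coefficients copied from the two qpAdm fits above (X chromosome vs genome-wide).
chg_z = z_difference(0.617, 0.178, 0.356, 0.063)
ehg_z = z_difference(0.383, 0.178, 0.505, 0.025)

print(f"CHG: X minus genome-wide = {0.617 - 0.356:+.3f}, Z = {chg_z:+.2f}")
print(f"EHG: X minus genome-wide = {0.383 - 0.505:+.3f}, Z = {ehg_z:+.2f}")
```

The sign of each Z shows the direction of the shift (more CHG and less EHG on the X chromosome), while its size shows how much of that shift the wide X chromosome error bars can absorb.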

Interestingly, in one of the models, the Steppe_EMBA X chromosomes are fitted successfully as a two-way mixture of CHG and Iran_ChL (see here). It's impossible to model Steppe_EMBA in such a way with genome-wide data (for instance, see here and here).
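For anyone checking the fits by hand, the taildiff values in the output above look like upper-tail chi-square probabilities. The sketch below reproduces them to about three decimal places under the assumption of 6 degrees of freedom for the two-way X chromosome model and 5 for the three-way genome-wide model. Those degrees of freedom are inferred from the numbers and are not printed in the excerpt quoted here, and the helper function is hypothetical.

```python
import math

def chi2_tail(chisq, dof):
    """Upper-tail probability of a chi-square variable (closed-form series)."""
    half = chisq / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!
        terms = (half ** j / math.factorial(j) for j in range(dof // 2))
        return math.exp(-half) * sum(terms)
    # Odd dof: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{(dof-1)/2} (x/2)^(j-1/2) / Gamma(j+1/2)
    terms = (half ** (j - 0.5) / math.gamma(j + 0.5)
             for j in range(1, (dof - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * sum(terms)

x_tail = chi2_tail(1.868, 6)      # two-way X chromosome fit, assumed dof = 6
auto_tail = chi2_tail(5.084, 5)   # three-way genome-wide fit, assumed dof = 5

print(f"X chromosome fit: tail probability {x_tail:.4f} (reported 0.93139015)")
print(f"genome-wide fit:  tail probability {auto_tail:.4f} (reported 0.405658017)")
```

Small differences in the later decimals are expected, because the chisq values are only quoted to three decimal places.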

Wednesday, March 15, 2017

Failure to replicate

Just in at bioRxiv:

We fail to replicate a genetic signal for sex bias in the steppe migration to central Europe after ~5,000 years proposed by Goldberg et al. PNAS 114(10):2657-2662. Estimation of X-chromosome steppe ancestry in the Bronze Age central European population with the qpAdm method (Haak et al. Nature 522, 207-11) does not indicate lower steppe ancestry on the X-chromosome than in the autosomes. We perform a simulation which indicates presence of estimation bias of -19.5% in the inference of X-chromosome admixture proportions using the method used by Goldberg et al., largely eliminating the observed sex bias.

Iosif Lazaridis, David Reich, Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe, bioRxiv, Posted March 14, 2017, doi:

Update 04/04/2017: Goldberg et al. reply:

Comparing the sex-specifically inherited X chromosome to the autosomes in ancient genetic samples, we (1) studied sex-specific admixture for two prehistoric migrations. For each migration, we used several admixture estimation procedures, including ADMIXTURE model-based clustering (2), comparing X-chromosomal and autosomal ancestry in contemporaneous Central Europeans, interpreting greater admixture from the migrating population on the autosomes as male-biased migration. For migration into late Neolithic/Bronze Age Central Europeans (BA) from the Pontic-Caspian steppe (SP), we inferred male-biased admixture at 5-14 males per migrating female. Lazaridis & Reich (3) contest this male-biased migration claim. For simulated individuals, they claim that ADMIXTURE provides biased X-chromosomal ancestry estimates. They argue that if the bias is taken into account, then X-chromosomal steppe ancestry is similar to our autosomal ancestry estimate, and that hence, steppe male and female contributions are similar. We conduct simulations of ancient and modern data under a range of conditions. We conclude that our inference of male-biased Pontic-Caspian steppe migration, seen using ADMIXTURE, STRUCTURE, mechanistic simulations, and X/autosomal FST, is robust. Our analysis further illuminates the impact of small haploid reference samples on ADMIXTURE; we look forward to refining sex-specific migration estimates as larger, higher-coverage ancient samples become available.
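To see why comparing X-chromosomal and autosomal ancestry says anything about sex bias, a minimal single-pulse sketch may help. It assumes one generation of admixture with equal numbers of males and females in the resulting population, so that mothers supply half of the autosomes but two thirds of the X chromosomes. This is a textbook simplification, not the multi-generation mechanistic model used by Goldberg et al., and the input values are hypothetical rather than taken from either manuscript.

```python
def expected_ancestry(s_f, s_m):
    """Autosomal and X-chromosomal steppe ancestry after one admixture pulse.

    s_f and s_m are the fractions of mothers and fathers drawn from the
    migrating (steppe) population.
    """
    autosomal = (s_f + s_m) / 2      # each parent passes on half the autosomes
    x_chrom = (2 * s_f + s_m) / 3    # two of every three X copies come from mothers
    return autosomal, x_chrom

def sex_specific_contributions(autosomal, x_chrom):
    """Invert the two equations above to recover s_f and s_m."""
    s_f = 3 * x_chrom - 2 * autosomal
    s_m = 4 * autosomal - 3 * x_chrom
    return s_f, s_m

# Hypothetical inputs: 50% steppe ancestry on the autosomes, 40% on the X.
s_f, s_m = sex_specific_contributions(0.50, 0.40)
print(f"female contribution {s_f:.2f}, male contribution {s_m:.2f}, "
      f"ratio {s_m / s_f:.1f}")
```

With these made-up inputs the implied ratio is about four migrating males per female. Lower X-chromosomal ancestry relative to the autosomes pushes the ratio up, which is why a bias in the X estimate matters so much to this argument.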

Goldberg et al., Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe, bioRxiv, Posted April 3, 2017, doi:

Tuesday, March 14, 2017

Epic fail

This is the somewhat dubious conclusion from a new paper by Balanovsky et al. at Human Genetics dealing with, amongst other things, the Y-chromosomes of the Early Bronze Age Yamnaya people:

The currently available dataset does not contradict the hypothesis that R-GG400 marks a link between the East European steppe dwellers and West Asians, though the route and even direction of this migration is disputable. It does, however, demonstrate that present-day West European R1b chromosomes do not originate from the Yamnaya populations analyzed in (Haak et al. 2015; Mathieson et al. 2015) and raises the question of their origin. A Bronze Age origin is more likely than a Neolithic one (Balaresque et al. 2010), but further ancient DNA studies may be necessary to identify this source.

More to the point, the authors are trying to argue the following two rather far-fetched and tenuous positions:

- R1b-GG400, the most common Y-haplogroup in Yamnaya samples sequenced to date, moved into Eastern Europe from West Asia, and therefore the Indo-European homeland was in West Asia

- there was no massive Kurgan expansion deep into Europe from the Pontic-Caspian Steppe, because the most common type of R1b in much of Europe is R1b-L51 and not R1b-GG400.

What they're ignoring is that a wide range of European Upper Paleolithic and Mesolithic foragers, mostly from Eastern Europe, belong to R1b, including R1b-P297, the ancestral lineage to both R1b-GG400 and R1b-L51 (see here and here). On the other hand, not a single West Asian forager or even Neolithic farmer sampled to date belongs to R1b (see here).

Hence, even though it's still possible that R1b-GG400 moved into Eastern Europe from West Asia, it's no longer a parsimonious or convincing theory because it's contradicted by direct evidence from currently available ancient DNA.

The authors are also ignoring very solid evidence from genome-wide data that Yamnaya, or closely related populations from the Pontic-Caspian Steppe, contributed in a big way to the ethnogenesis of modern-day Europeans. Considering that R1b-L51 is a sister clade of R1b-GG400, it's only logical to think that it could have been one of the main Y-chromosome haplogroups associated with this event.

The paper has some nice data and maps, but it's an epic fail as a whole, because it's basically an exercise in confirmation bias.


Balanovsky, O., Chukhryaeva, M., Zaporozhchenko, V. et al., Genetic differentiation between upland and lowland populations shapes the Y-chromosomal landscape of West Asia, Hum Genet (2017). doi:10.1007/s00439-017-1770-2

Sunday, March 12, 2017

Eastern Scythians = Steppe_MLBA + East Eurasians

OK, I said I wasn't going to make any bold statements regarding this issue until we see more ancient genomes from Central Asia, but I'm pretty sure now that the steppe ancestry in the eastern Scythians from Unterländer et al. is mostly of the Steppe Middle Late Bronze Age (Steppe_MLBA) kind, rather than the Steppe Early Middle Bronze Age (Steppe_EMBA) kind.

For background info, refer to the discussion in the comments here. Now, check out the graph below (based on the datasheet here). I see four things when I look at this model:

- Steppe_MLBA and Steppe_EMBA are different because the former show excess Central European Middle Neolithic (Central_MN) affinity, and thus cluster at the top of the graph and above the line of best fit, while the latter show excess Caucasus Hunter-Gatherer (Caucasus_HG) affinity, and so cluster at the top of the graph but below the line of best fit

- Indo-Aryan-speaking South Asians fall below the line of best fit, which suggests that they don't have much, if any, Central_MN ancestry, so they're probably largely of Steppe_EMBA origin (though their Iran Neolithic-related farmer ancestry might be skewing things to some extent here, because it's more closely related to Caucasus_HG than to Central_MN)

- Both the ancient and most modern-day Eastern Iranian-speakers (Sarmatians and Pamir Tajiks, respectively) more or less hug the line of best fit, suggesting that they're a mixture of Steppe_MLBA and Steppe_EMBA

- all of the Scythians fall above the line of best fit, suggesting that their steppe ancestry largely derives from Steppe_MLBA.
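
The above/below reading in these four points boils down to the sign of each population's residual from the regression line. Here is a minimal sketch, assuming an ordinary least-squares fit and assuming that the horizontal axis tracks Caucasus_HG affinity and the vertical axis Central_MN affinity. The coordinates and population labels are invented for illustration and are not taken from the datasheet.

```python
def fit_line(points):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residual(point, slope, intercept):
    """Vertical distance from the line: positive = above, negative = below."""
    x, y = point
    return y - (slope * x + intercept)

# Invented coordinates, purely illustrative.
toy = {
    "pop_1": (0.10, 0.14),
    "pop_2": (0.20, 0.18),
    "pop_3": (0.30, 0.33),
    "pop_4": (0.40, 0.37),
    "pop_5": (0.50, 0.52),
}
slope, intercept = fit_line(list(toy.values()))
for name, point in toy.items():
    r = residual(point, slope, intercept)
    side = "above" if r > 0 else "below"
    print(f"{name}: residual {r:+.3f} ({side} the line)")
```

A population with a positive residual has more of the vertical-axis affinity than its horizontal-axis value predicts, and a negative residual means the opposite.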

As per point 2, it's possible that the outcomes for the South and also Central Asians are skewed by their Iran Neolithic-related farmer ancestry, but this shouldn't be much of an issue for the eastern Scythians, and if it is, then in fact their Central_MN/Steppe_MLBA affinity is being underestimated here.

Moreover, word around the campfire is that the R1a-Z93 in the eastern Scythian bam files is of the same type as in the Sintashta samples (Z2124+). Not 100% sure if that's true, but it might well be, because it lines up very nicely with the above graph.


Unterländer et al., Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe, Nature Communications 8, Article number: 14615 (2017), doi:10.1038/ncomms14615

Tuesday, March 7, 2017

North Pontic Steppe Scythians: heirs of the Srubnaya people

Open access at Scientific Reports. Emphasis is mine:

Abstract: Scythians were nomadic and semi-nomadic people that ruled the Eurasian steppe during much of the first millennium BCE. While having been extensively studied by archaeology, very little is known about their genetic identity. To fill this gap, we analyzed ancient mitochondrial DNA (mtDNA) from Scythians of the North Pontic Region (NPR) and successfully retrieved 19 whole mtDNA genomes. We have identified three potential mtDNA lineage ancestries of the NPR Scythians tracing back to hunter-gatherer and nomadic populations of east and west Eurasia as well as the Neolithic farming expansion into Europe. One third of all mt lineages in our dataset belonged to subdivisions of mt haplogroup U5. A comparison of NPR Scythian mtDNA lineages with other contemporaneous Scythian groups, the Saka and the Pazyryks, reveals a common mtDNA package comprised of haplogroups H/H5, U5a, A, D/D4, and F1/F2. Of these, west Eurasian lineages show a downward cline in the west-east direction while east Eurasian haplogroups display the opposite trajectory. An overall similarity in mtDNA lineages of the NPR Scythians was found with the late Bronze Age Srubnaya population of the Northern Black Sea region which supports the archaeological hypothesis suggesting Srubnaya people as ancestors of the NPR Scythians.


Mitochondrial lineages in the NPR Scythians analyzed in this study appear to consist of a mixture of west and east Eurasian haplogroups. West Eurasian lineages were represented by subdivisions of haplogroup U5 (U5a2a1, U5a1a1, U5a1a2b, U5a2b, U5a1b, U5b2a1a2, six individuals total, 31.6%), H (H and H5b, three individuals total, 15.8%), J (J1c2 and J2b1a6, two individuals, 10.5%), as well as haplogroups N1b1a, W3a and T2b (one individual each, 5.3% each specimen). East Eurasian mt lineages were represented by haplogroups A, D4j2, F1b, M10a1a1a, and H8c (represented by a single individual), in total, comprising 26.3% of our sample set.
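
The percentages quoted above follow directly from the 19 mitogenomes. Here is a quick arithmetic check that uses only the per-haplogroup counts given in the passage. The dictionary labels are just shorthand, and the East Eurasian count of five assumes one individual for each of the five haplogroups listed.

```python
# Individuals per lineage group, as listed in the quoted passage.
counts = {
    "U5": 6,
    "H": 3,
    "J": 2,
    "N1b1a": 1,
    "W3a": 1,
    "T2b": 1,
    # Five haplogroups, assumed one individual each.
    "East Eurasian (A, D4j2, F1b, M10a1a1a, H8c)": 5,
}
total = sum(counts.values())
percentages = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(f"total individuals: {total}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The counts add up to the 19 genomes reported in the abstract, and six U5 carriers out of 19 is where the "one third" figure comes from.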

Juras, A. et al. Diverse origin of mitochondrial lineages in Iron Age Black Sea Scythians. Sci. Rep. 7, 43950; doi: 10.1038/srep43950 (2017).

See also...

Cimmerians, Scythians and Sarmatians came from...

Genetic origins and legacy of the Scythians and Sarmatians

Neolithic Europe: it's complicated (Lipson et al. 2017 preprint)

The dam is breaking. Just in at bioRxiv:

Abstract: Ancient DNA studies have established that European Neolithic populations were descended from Anatolian migrants who received a limited amount of admixture from resident hunter-gatherers. Many open questions remain, however, about the spatial and temporal dynamics of population interactions and admixture during the Neolithic period. Using the highest-resolution genome-wide ancient DNA data set assembled to date---a total of 177 samples, 127 newly reported here, from the Neolithic and Chalcolithic of Hungary (6000-2900 BCE, n = 98), Germany (5500-3000 BCE, n = 42), and Spain (5500-2200 BCE, n = 37)---we investigate the population dynamics of Neolithization across Europe. We find that genetic diversity was shaped predominantly by local processes, with varied sources and proportions of hunter-gatherer ancestry among the three regions and through time. Admixture between groups with different ancestry profiles was pervasive and resulted in observable population transformation across almost all cultural transitions. Our results shed new light on the ways that gene flow reshaped European populations throughout the Neolithic period and demonstrate the potential of time-series-based sampling and modeling approaches to elucidate multiple dimensions of historical population interactions.

Lipson et al., Parallel ancient genomic transects reveal complex population history of early European farmers, bioRxiv, Posted March 6, 2017, doi:

Update 08/03/2017: In fact, there's nothing overly complicated in this manuscript. The table below says it all: Neolithic farmers across space and time in most of Europe were very closely related, and only differed in their levels of Western Hunter-Gatherer (HG) admixture. Admittedly, things would look a lot simpler if not for that somewhat unexpected R, R1 and R1b1 in Middle Neolithic Germany, but this doesn't appear to be a game changer, and is not flagged as such in the preprint.

Sunday, March 5, 2017

Scythians and Sarmatians in the Global 10

The Global 10 datasheet now includes the new Scythian and Sarmatian samples from Unterländer et al. 2017. They're freely available at the Reich lab datasets page. Here they are on the Global 10 genetic map.

This may have been pointed out in the paper, but what I find intriguing is that the Scythians from the Zevakino-Chilikta group look somewhat different from the rest, because instead of falling on the Europe-Siberia cline, they fall on the Europe-Central Asia cline. Not sure what that's about yet; might be worth investigating.

See also...

Global 10: A fresh look at global genetic diversity

Saturday, March 4, 2017

Modern-day Europeans: a post-Neolithic product

There's a new preprint at bioRxiv looking at the relationship between ancient and modern-day Europeans. I think it misses its mark, because the author concludes that the Neolithic transition created the modern-day European gene pool.

This is only partly true, because modern-day Europeans are in fact, by and large, the product of intense Indo-European expansions from the Late Neolithic to the Migration period.

Just take a look at the Y-haplogroup landscape in much of Europe and you'll see that our direct ancestors did not mostly spring from Neolithic farming communities. If you want to find them in the ancient DNA record, then seek out post-Neolithic populations rich in R1b-L51, R1a-Z645 and I1-M253.

By the way, the author uses Mormons from Utah (also known as CEU) to represent Europeans. I don't know if this is a problem (it might well be), but in any case, why Utah Mormons? Why not a wide variety of actual Europeans all the way from the Atlantic to the Urals? They're freely available online nowadays.

Abstract: Genetic material sequenced from ancient samples is revolutionizing our understanding of the recent evolutionary past. However, ancient DNA is often degraded, resulting in low coverage, error-prone sequencing. Several solutions exist to this problem, ranging from simple approaches such as selecting a read at random for each site to more complicated approaches involving genotype likelihoods. In this work, we present a novel method for assessing the relationship of an ancient sample with a modern population while accounting for sequencing error by analyzing raw reads from multiple ancient individuals simultaneously. We show that when analyzing SNP data, it is better to sequence more ancient samples to low coverage: two samples sequenced to 0.5x coverage provide better resolution than a single sample sequenced to 2x coverage. We also examined the power to detect whether an ancient sample is directly ancestral to a modern population, finding that with even a few high coverage individuals, even ancient samples that are very slightly diverged from the modern population can be detected with ease. When we applied our approach to European samples, we found that no ancient samples represent direct ancestors of modern Europeans. We also found that, as shown previously, the most ancient Europeans appear to have had the smallest effective population sizes, indicating a role for agriculture in modern population growth.

Joshua Schraiber, Assessing the relationship of ancient and modern populations, bioRxiv, Posted March 4, 2017, doi:

Friday, March 3, 2017

The genetic history of Northern Europe (or rather the South Baltic)

A second preprint in only a few days on the Neolithic transition in the Baltic region has just appeared at bioRxiv: Mittnik et al. 2017. You can read about the first one here. Keep in mind also that we recently saw a paper on the same topic at Current Biology.

Can't these labs coordinate things a little better and perhaps focus on different parts of Europe? Wouldn't that be the sensible thing to do considering the limited funding for ancient DNA research?

Nevertheless, Mittnik et al. is an important addition to what we've already seen, for me mainly because it shows that largely unadmixed Western Hunter-Gatherers (WHG) lived in the South Baltic region at least as late as ~4,450 calBCE, which is the date assigned to the four Narva samples in the preprint. So now we have a plausible explanation for the inflated WHG-related ancestry in modern-day Balts and Northern Slavs.

Despite its geographical vicinity to EHG [Eastern Hunter-Gatherers], the eastern Baltic individual associated with the Mesolithic Kunda culture shows a very close affinity to WHG in all our analyses, with a small but significant contribution from EHG or SHG [Scandinavian Hunter-Gatherer], as revealed by significant D-statistics of the form D(Kunda, WHG; EHG/SHG, Mbuti) (Z>3; Supplementary Information Table S2).


The results for the Kunda individual are mirrored in the four later eastern Baltic Neolithic hunter-gatherers of the Narva culture (Fig. 2) and further supported by the lack of significantly positive results for the D-statistic D(Narva, Kunda; X, Mbuti) (Supplementary Information Table S2) demonstrating population continuity at the transition from Mesolithic to Neolithic, which in the eastern Baltic region is signified by a change in networks of contacts and the use of pottery rather than a stark shift in economy as seen in Central and Southern Europe [15].


Furthermore, the individual Spiginas2, which is dated to the very end of the Late Neolithic, has a higher proportion of the hunter-gatherer ancestry, as seen in ADMIXTURE (darker blue component in Fig. 2b), and is estimated to be admixed between 78±4% Central European CWC and 22±4% Narva (Supplementary Information Table S6). A reliance on marine resources persisted especially in the north-eastern Baltic region until the end of the Late Neolithic [29] and in combination with the proposed large population size for Baltic hunter-gatherers a ‘resurgence’ of hunter-gatherer ancestry in the local population through admixture between foraging and farming groups is likely, and has been described for the European Middle Neolithic [2,30].
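
To make the D-statistics quoted above a little more concrete, here's a rough sketch of how a statistic of the form D(P1, P2; P3, Outgroup) can be computed from per-SNP allele frequencies, using the standard frequency-based definition. The function name and the toy frequencies are hypothetical, and the block jackknife that produces Z-scores like the ones cited in the preprint is left out.

```python
def d_statistic(p1, p2, p3, outgroup):
    """D(P1, P2; P3, Outgroup) from lists of per-SNP allele frequencies."""
    num = 0.0
    den = 0.0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        num += (a - b) * (c - o)
        den += (a + b - 2 * a * b) * (c + o - 2 * c * o)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den

# Toy frequencies at two SNPs: P1 leans towards P3 more than P2 does.
d_shared = d_statistic([0.9, 0.1], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0])
# P1 and P2 identical, so neither is closer to P3 than the other.
d_null = d_statistic([0.5, 0.5], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0])
```

Under this sign convention a positive value means that P1 shares more alleles with P3 than P2 does, which is the pattern the authors report for Kunda relative to WHG.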

The only gripe I have with this manuscript is the Principal Component Analysis (PCA) plots. They just look messy and appear to suffer from projection bias, so they're hard to read and probably confusing for a lot of people.

Projection bias is also known as shrinkage. Basically it's when the PCA space shrinks for the projected samples compared to the reference samples. It happens a lot in ancient DNA papers. I find it irritating. But whenever I bring up this issue with authors of these papers, I'm basically told that their PCA plots look like the PCA plots from similar papers, so there's no problem. So, essentially, since everybody's doing it wrong, then it's the right way to do it. Awesome logic there.
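
Here's a small simulation that shows the effect, as a sketch under simplified assumptions rather than a reconstruction of anyone's actual pipeline. Two made-up populations differ slightly at every marker, PC1 is computed from the reference samples only (via power iteration on their Gram matrix), and then fresh samples from the very same two populations are projected onto it. All of the names and parameter values are my own choices for illustration.

```python
import math
import random

def simulate_shrinkage(n_per_pop=10, n_markers=2000, shift=0.15, seed=1):
    rng = random.Random(seed)

    def sample(sign):
        # One individual: population offset (+/- shift) plus unit noise per marker.
        return [sign * shift + rng.gauss(0.0, 1.0) for _ in range(n_markers)]

    signs = [1] * n_per_pop + [-1] * n_per_pop
    ref = [sample(s) for s in signs]   # samples used to compute the PCA
    held = [sample(s) for s in signs]  # same populations, projected only

    # Centre both sets on the per-marker means of the reference samples.
    means = [sum(col) / len(ref) for col in zip(*ref)]
    ref = [[x - m for x, m in zip(row, means)] for row in ref]
    held = [[x - m for x, m in zip(row, means)] for row in held]

    # Sample-by-sample Gram matrix of the reference set.
    n = len(ref)
    gram = [[sum(a * b for a, b in zip(ref[i], ref[j])) for j in range(n)]
            for i in range(n)]

    # Power iteration for the leading eigenvector of the Gram matrix.
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(300):
        u = [sum(g * x for g, x in zip(row, u)) for row in gram]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]

    # Map the eigenvector back to marker space to get the unit-length PC1 axis.
    axis = [sum(u[i] * ref[i][k] for i in range(n)) for k in range(n_markers)]
    norm = math.sqrt(sum(a * a for a in axis))
    axis = [a / norm for a in axis]

    def mean_abs_score(rows):
        # Average distance from the origin along PC1.
        return sum(abs(sum(x * a for x, a in zip(row, axis))) for row in rows) / len(rows)

    return mean_abs_score(ref), mean_abs_score(held)

ref_spread, projected_spread = simulate_shrinkage()
```

Even though the projected samples come from exactly the same populations as the reference samples, they land closer to the middle of the plot, because the PC axis has absorbed some of the random noise of the reference samples but none of theirs.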


Mittnik et al., The Genetic History of Northern Europe, bioRxiv, Posted March 3, 2017, doi:

Update 08/01/2018: the paper was published at Nature Communications today under the title The genetic prehistory of the Baltic Sea region (see here).

See also...

Modern-day Poles vs Bronze Age peoples of the East Baltic

Genetic origins and legacy of the Scythians and Sarmatians

Nature Communications has a new paleogenetic paper focusing on Iron Age steppe nomads. Emphasis is mine:

Abstract: During the 1st millennium before the Common Era (BCE), nomadic tribes associated with the Iron Age Scythian culture spread over the Eurasian Steppe, covering a territory of more than 3,500 km in breadth. To understand the demographic processes behind the spread of the Scythian culture, we analysed genomic data from eight individuals and a mitochondrial dataset of 96 individuals originating in eastern and western parts of the Eurasian Steppe. Genomic inference reveals that Scythians in the east and the west of the steppe zone can best be described as a mixture of Yamnaya-related ancestry and an East Asian component. Demographic modelling suggests independent origins for eastern and western groups with ongoing gene-flow between them, plausibly explaining the striking uniformity of their material culture. We also find evidence that significant gene-flow from east to west Eurasia must have occurred early during the Iron Age.


In the East, we find a balanced mixture of mitochondrial lineages found today predominantly in west Eurasians, including a significant proportion of prehistoric hunter-gatherer lineages, and lineages that are at high frequency in modern Central and East Asians already in the earliest Iron Age individuals dating to the ninth to seventh century BCE and an even earlier mtDNA sample from Bronze Age Mongolia [49]. Typical west Eurasian mtDNA lineages are also present in the Tarim Basin [16] and Kazakhstan [8] and were even predominant in the Krasnoyarsk area during the 2nd millennium BCE [31]. This pattern points to an admixture process between west and east Eurasian populations that began in earlier periods, certainly before the 1st millennium BCE [13,50], a finding consistent with a recent study suggesting the carriers of the Yamnaya culture are genetically indistinguishable from the Afanasievo culture peoples of the Altai-Sayan region. This further implies that carriers of the Yamnaya culture migrated not only into Europe [26] but also eastward, carrying west Eurasian genes—and potentially also Indo-European languages—to this region [17]. All of these observations provide evidence that the prevalent genetic pattern does not simply follow an isolation-by-distance model but involves significant gene flow over large distances.

All Iron Age individuals investigated in this study show genomic evidence for Caucasus hunter-gatherer and Eastern European hunter-gatherer ancestry. This is consistent with the idea that the blend of EHG and Caucasian elements in carriers of the Yamnaya culture was formed on the European steppe and exported into Central Asia and Siberia [26]. All of our analyses support the hypothesis that the genetic composition of the Scythians can best be described as a mixture of Yamnaya-related ancestry and East Asian/north Siberian elements.

Concerning the legacy of the Iron Age nomads, we find that modern human populations with a close genetic relationship to the Scythian groups are predominantly located in close geographic proximity to the sampled burial sites, suggesting a degree of population continuity through historical times. Contemporary descendants of western Scythian groups are found among various groups in the Caucasus and Central Asia, while similarities to eastern Scythian are found to be more widespread, but almost exclusively among Turkic language speaking (formerly) nomadic groups, particularly from the Kipchak branch of Turkic languages (Supplementary Note 1). The genealogical link between eastern Scythians and Turkic language speakers requires further investigation, particularly as the expansion of Turkic languages was thought to be much more recent—that is, sixth century CE onwards—and to have occurred through an elite expansion process.

Unterländer et al., Ancestry and demography and descendants of Iron Age nomads of the Eurasian Steppe, Nature Communications 8, Article number: 14615 (2017), doi:10.1038/ncomms14615

See also...

Eastern Scythians = Steppe MLBA + East Eurasians

Cimmerians, Scythians and Sarmatians came from...

Thursday, March 2, 2017

Baltic Corded Ware: rich in R1a-Z645

An important preprint has just appeared at bioRxiv. It includes ancient DNA from four Estonian Corded Ware Culture (CWC) individuals from two different sites.

These CWC samples belong to Y-haplogroup R1a, and more specifically to its R1a-Z645 clade, which encompasses almost all R1a lineages in the world today, including in South Asia, despite a relatively recent coalescent time of 5,400 yr BP. One of the samples is further classified as belonging to R1a-Z283. The vast majority of modern-day European R1a belongs to this clade.

The new data also include a Comb Ceramic Culture (CCC) male that belongs to R1a5-YP1272. This might be an extinct line, or one that is now extremely rare in Eastern Europe. From the paper:

All four of the Estonian CWC individuals could be assigned to the R1a-Z645 sub-clade of hg R1a-M417 which together with N is one of the most common Y chromosome haplogroups in present-day Estonians (33%) [44]. Importantly, this R1a lineage is only distantly related to the R1a5 lineage we found in the CCC sample. The finding of high frequency of R1a-M417 in Estonian CWC samples is consistent with the observations made for other Corded Ware sites that, along with Late Bronze Age remains associated with Sintashta Culture, also show high frequency of hg R1a-M417 [2,25].


The coalescent time for the R1a-Z645 clade, estimated from modern data at 5,400 yr BP (95% CI 4,950–6,000) [43], predates the time when the CWC individuals carrying the R1a-Z645 lineages lived in Estonia (4,000–4,800 yr BP). The fact that all four of the CWC male individuals from two distinct sites in Estonia belonged to this recently expanded R1a branch, different from the one carried by CCC, suggests that admixture between CWC farmers and CCC hunter-gatherers may have been limited at least in the male lineages during the early stages of farming in Estonia.

Now, can anyone explain to me how the authors came to this conclusion? Was it based on their ADMIXTURE output?

Furthermore, the presence of a genetic component associated with Caucasus hunter-gatherers and later with people representing the Yamnaya Culture in Eastern hunter-gatherers and Estonian CCC individuals means that the expansion of the CWC cannot be seen as the sole means for the spread of this genetic component, at least in Eastern Europe.

If it is indeed based on ADMIXTURE, then they really need to back it up with some robust formal stats and qpAdm, because ADMIXTURE is not a formal mixture test.

Moreover, they used the projection (P) option in their ADMIXTURE analysis. I'm not a huge fan of this option when running fine scale intra-continental analyses, because I find that it usually results in severe projection bias. In other words, the test samples are treated differently from the reference samples, and essentially show results that they shouldn't.

Speaking of projection bias, I'm quite certain that their Principal Component Analysis (PCA) suffers from it. The ancient samples look like they're being pulled into the middle of the plot, so much so that one of the foragers basically clusters with modern-day Lithuanians, while the CWC individuals appear too western. They need to fix this.

I do note that the authors used the lsqproject option when running their PCA. A lot of people assume that once they do this they've taken care of projection bias. This is not so. lsqproject doesn't solve this problem; it just makes sure that missing markers don't skew the projection.
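
Here's a sketch of the general idea behind a least-squares projection, assuming just two PC axes that have already been computed from the reference samples. It's an illustration of the principle, not the actual code behind the lsqproject option, and the function names and toy numbers are hypothetical.

```python
def lsq_project(sample, axis1, axis2):
    """Least-squares coordinates on two PC axes, using observed markers only."""
    obs = [k for k, x in enumerate(sample) if x is not None]
    # Normal equations restricted to the markers that are actually observed.
    a11 = sum(axis1[k] ** 2 for k in obs)
    a22 = sum(axis2[k] ** 2 for k in obs)
    a12 = sum(axis1[k] * axis2[k] for k in obs)
    b1 = sum(sample[k] * axis1[k] for k in obs)
    b2 = sum(sample[k] * axis2[k] for k in obs)
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("too few observed markers to place the sample")
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

def naive_project(sample, axis1, axis2):
    """Plain dot products, which silently treat missing markers as zero."""
    obs = [k for k, x in enumerate(sample) if x is not None]
    return (sum(sample[k] * axis1[k] for k in obs),
            sum(sample[k] * axis2[k] for k in obs))

# Two orthonormal toy axes over four markers.
axis1 = [0.5, 0.5, 0.5, 0.5]
axis2 = [0.5, -0.5, 0.5, -0.5]
# A sample with true coordinates (2, 4), i.e. [3, -1, 3, -1], missing its last marker.
sample = [3.0, -1.0, 3.0, None]

lsq_coords = lsq_project(sample, axis1, axis2)
naive_coords = naive_project(sample, axis1, axis2)
```

In this toy case the least-squares version recovers the true coordinates of (2, 4) despite the missing marker, while the naive projection is thrown off. But that's all this kind of step does. It has nothing to say about the axes themselves being fitted to the reference samples, which is where the shrinkage comes from.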


Saag et al., Extensive farming in Estonia started through a sex-biased migration from the Steppe, bioRxiv, March 2, 2017, doi:

Confirmation bias

Every time I put up a thread that is even remotely linked to the Proto-Indo-European (PIE) homeland debate it gets hijacked by people who appear to have a pathological hate for the Kurgan PIE theory.

You'd think that considering the latest ancient DNA results from across Eurasia, which have thus far been very favorable to the Kurgan theory, these people would pipe down a little, at least for the time being, until something shows up that genuinely supports their stance. But nope.

The amount of confirmation bias in such threads is phenomenal. I'm going to start blocking and deleting the worst examples of this nonsense from now on. I'd also urge all reasonable and objective commentators here to try and ignore such comments, so that the offenders are left with no one to talk to.

If you're not quite sure how to spot an off-the-dial confirmation bias effort, here's an example from the last thread. I couldn't be bothered replying to this claptrap initially, but I will now, just to illustrate how off the mark it really is.

The truth is coming out.

Wonderful. Let's hear it.

EBA in South Asians is closer to Afanasievo than to Andronovo.

Maybe, but Afanasievo and Andronovo genomes aren't all that different, and at the moment we only have four Andronovo individuals, presumably from elite burials.

Who knows what more sampling from the territory of the vast Andronovo horizon, and a wider cross section of the Andronovo population, is going to uncover? We might find Andronovo samples that are perfect proxies for the Early Bronze Age (EBA) steppe admixture in South Asians; better than Afanasievo even.

Andronovo is different from Afanasievo because of Western/Caucasus expansions.

No, actually, Andronovo has Middle Neolithic farmer admixture from deep in Europe that Afanasievo lacks, or at least has much less of. Considering the preponderance of Y-haplogroup R1a in Andronovo remains, this admixture was probably mediated via female gene flow at the western edge of the Western Steppe, not near the Caucasus.

Afanasievo is leaning exclusively (for now) R1b. South Asians do not have Andronovo DNA.

As per above, South Asians may well have Andronovo ancestry. Or they may have ancestry from an R1a-rich sister group to the early Corded Ware population, which, based on an early Baltic Corded Ware genome, was in all likelihood basically identical to Afanasievo and Yamnaya in terms of genome-wide genetic structure (see here).

Ergo, EBA in South Asians did not come from Afanasievo. And R1a did not arrive in South Asia with Yamnayans or Andronovo/Afan.

But like I say, R1a may have arrived in South Asia with a Corded Ware-related population basically identical to Afanasievo and Yamnaya. Or with an Andronovo group basically identical to Afanasievo and Yamnaya.

This leads to at least three options:

1) PIE did not come from R1a or R1b, but J2. @Nirjhar J2 is present in UC Indians.

Y-haplogroups don't speak languages, so there's that. But we might be able to say, with a high degree of confidence, which Y-haplogroups were common in the PIE community based on their frequencies in ancient and modern-day populations.

Clearly, as things stand, the best candidates for so-called PIE markers are R1a and R1b, and probably more specifically R1a-M417 and R1b-M269. J2 is a poor candidate for a PIE marker. See here: Y-hg J2 cannot be a Proto-Indo-European marker

2) R1a is source of PIE and did not come from Yamnaya, Andronovo, Afanasievo.

Well, most of the R1a in the world today may well be from an as yet unsampled Yamnaya population from the Pontic Steppe north of the Black Sea.

3) PIE came from several haplogroups, likely Caucasus area.

But why, considering the predominance of R1a and R1b on the ancient steppe and in modern-day speakers of Indo-European languages?

There are two potential locations for R1a. And I'm fine with both because they've made sense from the very beginning, unlike Yamnaya.

You mean the two potential main expansion points for most of the R1a in the world today? Surely either the Pontic Steppe or the Caspian Steppe?

One thing certain, R1a does not come from anywhere near Europe.

Hard to say where R1a comes from originally. My bet is that it was born in Upper Paleolithic Siberia, on the Mammoth Steppe that straddled Europe and Asia, possibly at a location very close to Europe.

R1a was the invasion that pushed R1b to the oceans of the Atlantic.

Actually, it seems that R1a-rich and R1b-rich steppe clans did their own thing when expanding into Asia and Europe during the Eneolithic and Early Bronze Age, and there's no evidence that they got in each other's way.

It's only during the Middle and Late Bronze Age that we see a population shift from R1b-rich to R1a-rich groups on the Caspian Steppe. This shift may have been accompanied by violence and language change, but if so, it's likely that one Indo-European language replaced another.

Also, please note that this R1a-rich population came from somewhere west of the Caspian Steppe, possibly the Pontic Steppe or the nearby forest steppe, because it had a higher level of Middle Neolithic European farmer admixture than the R1b-rich population that it replaced. So it's impossible to posit that this was an invasion from Asia, that pushed the R1b-rich population to the Atlantic.

This reply didn't take me long to put together. All I did was knock down the proverbial straw man over and over again. Easy work. But, at the same time, irritating and depressing.

Main takeaway point: if someone claims to know the "truth", chances are they're full of shit.

Wednesday, March 1, 2017

R1b-M269 in Afanasievo

Back in 2015, Allentoft et al. published four Afanasievo genomes that finally confirmed beyond any doubt that the enigmatic Afanasievo people were migrants to the Altai region from Eastern Europe.

However, all four samples came from female remains, which left us wondering about the Y-haplogroup composition of the Afanasievo population. As it turns out, a French study from 2014 found that three Afanasievo individuals belonged to R1b, with two classified as R1b-M269 (see here).

There's nothing remarkable about this, considering that the above mentioned four Afanasievo samples look essentially identical to M269-rich eastern Yamnaya samples from Kalmykia and Samara in terms of genome-wide genetic structure (for instance, see here). But it's a useful bit of info that has somehow eluded us all until now. Thanks to Kristiina for the find.


Clémence Hollard. Peuplement du sud de la Sibérie et de l'Altaï à l'âge du Bronze : apport de la paléogénétique. Paléontologie. Université de Strasbourg, 2014. Français. NNT : 2014STRAJ002. tel-01296484