
Sunday, August 28, 2016

Ancient vs modern day West Eurasian variation

The Principal Component Analyses (PCA) with ancient samples that I post on this blog are amongst the most accurate and best examples of their kind that you'll see anywhere. That's not just wishful thinking; it's a fact.

My PCA don't suffer from projection bias or shrinkage, which is a handicap of PCA in many ancient DNA papers, and they're run only on observed (rather than imputed) genotypes.

However, even my PCA are far from perfect, because they're based entirely on present-day variation. In other words, I still project the ancients onto eigenvectors computed with modern day reference samples. I guess that's the equivalent of putting the cart before the horse, when originally the horse may have been a donkey, or something like that.

Nevertheless, it's the only sensible way to plot heavily degraded ancient samples with a lot of missing data. But it does often leave me wondering whether the output says anything useful about the ancient world.
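To make the projection idea concrete, here's a minimal NumPy sketch, assuming genotypes coded 0/1/2 with missing calls stored as NaN. The axes come from a complete modern reference panel. Each ancient sample is then placed on them by least squares using only its observed SNPs, so missing data doesn't drag it toward the origin the way zero-filling does. EIGENSOFT's smartpca has a comparable least-squares option (lsqproject), but the function names here are hypothetical, and this isn't necessarily how the plots on this blog were produced.

```python
import numpy as np


def reference_axes(ref, k=2):
    """PCA axes from a complete reference matrix (samples x SNPs, coded 0/1/2)."""
    mean = ref.mean(axis=0)
    # Rows of vt are the SNP loadings (eigenvectors) of the centred reference panel
    _, _, vt = np.linalg.svd(ref - mean, full_matrices=False)
    return mean, vt[:k]


def project_lsq(sample, mean, axes):
    """Place one sample (NaN = missing) on the axes by least squares over its observed SNPs."""
    observed = ~np.isnan(sample)
    x = sample[observed] - mean[observed]
    v = axes[:, observed].T  # (n_observed_snps, k)
    coords, *_ = np.linalg.lstsq(v, x, rcond=None)
    return coords


def project_zero_fill(sample, mean, axes):
    """Naive alternative: treat missing SNPs as the reference mean, then take dot products."""
    x = np.where(np.isnan(sample), 0.0, sample - mean)
    return axes @ x


rng = np.random.default_rng(0)
ref = rng.integers(0, 3, size=(40, 500)).astype(float)  # toy "modern" panel
mean, axes = reference_axes(ref)

# Toy "ancient" sample lying exactly at PC coordinates (3, -2), with ~90% of SNPs missing
truth = np.array([3.0, -2.0])
ancient = mean + truth @ axes
ancient[rng.random(ancient.size) < 0.9] = np.nan

print(project_lsq(ancient, mean, axes))        # recovers (3, -2) up to rounding
print(project_zero_fill(ancient, mean, axes))  # shrunk toward (0, 0)
```

The zero-filled version can only shrink: with most SNPs missing, it returns roughly the observed fraction of the true coordinates, which is the classic shrinkage problem with sparse ancient genotypes.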

Thanks to the recent release of a lot of fairly high quality ancient genotype data from West Eurasia (most of it freely available at the Reich Lab website here), I can now test how well my trademark PCA of ancient West Eurasia reflects reality.

Below are two PCA featuring ancient composite samples. The first PCA is based on ~650,000 SNPs, with 100% call rates in each of the composites. For the second PCA, I pruned the markers for linkage disequilibrium (LD) and also made sure that about half of the SNPs were from transversion sites, which are less likely to be affected by postmortem damage. That left ~125,000 SNPs, hopefully of relatively high quality.

Obviously, the plots are very similar, which makes me wonder whether there's any point in thinning the markers when running decent quality ancient sequences. The datasheets are available for download here and here.
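The two thinning steps described above can be sketched roughly as follows, assuming a samples × SNPs genotype matrix in physical order and a REF/ALT allele pair per SNP. The LD step is a simplified greedy version of the sliding-window idea behind PLINK's --indep-pairwise (PLINK's own implementation differs in its details). The window, step and r² threshold are illustrative values, not the settings used for these plots, and the function names are hypothetical. The transversion test is what you'd use to hit a target share of transversions, such as "about half".

```python
import numpy as np

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transversion(ref, alt):
    """True for purine<->pyrimidine changes. Postmortem deamination produces
    C->T and G->A changes (transitions), so transversion sites avoid that damage."""
    pair = frozenset((ref.upper(), alt.upper()))
    return len(pair) == 2 and pair not in TRANSITIONS


def r_squared(x, y):
    """Squared Pearson correlation between two genotype columns (0 if either is monomorphic)."""
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def ld_prune(genotypes, window=50, step=5, r2_max=0.2):
    """Greedy sliding-window pruning: within each window, drop the later SNP
    of any still-kept pair whose r^2 exceeds r2_max. Returns a boolean keep-mask."""
    n_snps = genotypes.shape[1]
    keep = np.ones(n_snps, dtype=bool)
    for start in range(0, n_snps, step):
        end = min(start + window, n_snps)
        for a in range(start, end):
            if not keep[a]:
                continue
            for b in range(a + 1, end):
                if keep[b] and r_squared(genotypes[:, a], genotypes[:, b]) > r2_max:
                    keep[b] = False
    return keep


# Toy data: SNP 1 duplicates SNP 0 (r^2 = 1), SNP 2 is uncorrelated with both
g = np.array(
    [[0, 0, 1], [1, 1, 2], [2, 2, 1], [0, 0, 1], [1, 1, 0], [2, 2, 1]], dtype=float
)
alleles = [("A", "C"), ("G", "A"), ("G", "T")]
ld_keep = ld_prune(g)
tv = np.array([is_transversion(r, a) for r, a in alleles])
print(ld_keep, tv)
```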

Now, below is a recent example of my PCA of ancient West Eurasia. It's almost identical to the plots above. This is very cool, and also very important, because it means that my strategy for running PCA with ancient samples produces solid and relevant results.

Interestingly, on closer inspection, the distance between the western and eastern Neolithic farmers on the first two plots appears inflated. Conversely, the distances between the northern Hunter-Gatherer (HG) samples are somewhat reduced. Any ideas why?

Update 31/08/2016: Open Genomes generated a 3D plot based on a new PCA datasheet that I posted in the comments. Click on the image below to check it out.

Update 01/09/2016: I added present-day samples to the PCA. Very happy with the outcome. The relevant datasheet is available here.

See also...

Ust'-Ishim man x2

Wednesday, August 24, 2016

On the remarkable genetic homogeneity of Denmark

Open access at Genetics:

Abstract: Denmark has played a substantial role in the history of Northern Europe. Through a nationwide scientific outreach initiative, we collected genetic and anthropometrical data from ~800 high school students and used them to elucidate the genetic makeup of the Danish population, as well as to assess polygenic predictions of phenotypic traits in adolescents. We observed remarkable homogeneity across different geographic regions, although we could still detect weak signals of genetic structure reflecting the history of the country. Denmark presented genomic affinity with primarily neighboring countries with overall resemblance of decreasing weight from Britain, Sweden, Norway, Germany and France. A Polish admixture signal was detected in Zealand and Funen and our date estimates coincided with historical evidence of Wend settlements in the south of Denmark. We also observed considerably diverse demographic histories among Scandinavian countries, with Denmark having the smallest current effective population size compared to Norway and Sweden. Finally, we found that polygenic prediction of self-reported adolescent height in the population was remarkably accurate (R² = 0.639 ± 0.015). The high homogeneity of the Danish population could render population structure a lesser concern for the upcoming large-scale gene-mapping studies in the country.

Athanasiadis et al., Nationwide Genomic Study in Denmark Reveals Remarkable Population Homogeneity, Genetics Early online August 17, 2016; DOI: 10.1534/genetics.116.189241

Friday, August 19, 2016

Maybe first direct hints of Yamnaya-related gene flow into South Central Asia

Unfortunately, this is just an abstract for a presentation poster from the upcoming 6th DNA Polymorphisms in Human Populations conference in Paris. However, it might be important because, as far as I know, it's the first ancient DNA report supporting the idea that Bronze Age herders from the Eastern European steppe had a profound impact on the ancient populations of South Central Asia.

At the end of the Bronze Age, the proto-urban Oxus Civilisation in Southern Central Asia (Uzbekistan, Turkmenistan) disappeared and was replaced by Iron Age Yaz Cultures. Environmental changes such as aridification, as well as geopolitical factors, have been invoked to explain this cultural transition. However, evidence of settlements from Andronovo populations during the late Bronze Age suggests that this transition was associated with migrations from northern steppe populations. Indeed, palaeogenetic studies (Allentoft et al., 2015; Haak et al., 2015) have already shown that gene flow from Yamnaya steppe populations occurred in Europe and Altai at the end of the Neolithic, suggesting that the steppe inhabitants spoke Indo-European languages.

To investigate the role of migrations in the Bronze Age/Iron Age transition in Southern Central Asia, we turned to palaeogenetic studies. DNA was extracted from 17 skeletons excavated at the Ulug Depe archaeological site (Turkmenistan). The hypervariable region I of the mitochondrial (mt) genome was sequenced for 6 individuals from the Bronze Age and 4 from the Iron Age.

Criteria of authentication for ancient DNA were met: experiments were done in a clean room dedicated to ancient DNA analysis, and blank DNA extraction and PCR controls were performed. Moreover, we observed DNA damage specific to ancient DNA and an inverse correlation between the efficiency of the PCR and the length of the amplified DNA fragment. Thus, we first established that ancient DNA is preserved in Southern Central Asia. After sequencing and assignment of individuals to human mitochondrial haplotypes, a high diversity of haplotypes at Ulug Depe was observed. All the haplogroups found at Ulug Depe are characteristic of modern western Eurasian populations.

Haplogroups shared between steppe populations and Ulug Depe were identified, suggesting gene flow between Southern Central Asia and the Steppe. Genetic data suggest a close relationship between Yamnaya-related populations and the Iron Age Ulug Depe population. However, no significant genetic discontinuity between the Bronze and Iron Ages was detected, which may be due to the limited sample dataset and calls for nuclear DNA analysis.

Monnereau, A., Lhuillier, J., Bendezu-Sarmiento, J., Bon, C., Palaeogenetic analysis of Bronze Age/Iron Age transition in Southern Central Asia, poster, 6th DNA Polymorphisms in Human Populations, Musée de l’Homme, Paris, 7-10 December, 2016

See also...

Pots were people in Bronze Age southern Central Asia too

Tuesday, August 16, 2016

EAA 2016 abstracts

The abstract book for this year's meeting in Vilnius can be gotten here. I'm hoping there's a paper coming real soon based on this talk on the genetic history of the East Baltic. Emphasis is mine.

Recent studies of ancient genomes have revealed two large-scale prehistoric population movements into Europe after the initial settlement by modern humans: A first expansion from the Near East that brought agricultural practices, also known as the Neolithic revolution; and a second migration from the East that was seen in a genetic component related to the Yamnaya pastoralists of the Pontic Steppe, which appears in Central Europe in people of the Late Neolithic Corded Ware and has been present in Europeans since then in a decreasing North-East to South-West gradient. This migration has been proposed to be the source of the majority of today’s Indo-European languages within Europe.

In this paper we aim to show how these processes affected the Eastern Baltic region where the archeological record shows a drastically different picture than Central and Southern Europe. While agricultural subsistence strategies were commonplace in most of the latter by the Middle Neolithic, ceramic-producing hunter-gatherer cultures still persisted in the Eastern Baltic up until around 4000 BP and only adopted domesticated plants and animals at a late stage after which they disappeared into the widespread Corded Ware culture.

We present the results of ancient DNA analyses of 81 individuals from the territory of today’s Lithuania, Latvia and Estonia that span from the Mesolithic to Bronze Age. Through study of the uniparentally inherited mtDNA and Y-chromosome as well as positions across the entire genome that are informative about ancient ancestry we reveal the dynamics of prehistoric population continuity and change within this understudied region and how they are reflected in today’s Baltic populations.

Mittnik et al., A genetic perspective on population dynamics of the pre-historic Eastern Baltic region, EAA 2016 presentation, TH4-11 Abstract 06

Monday, August 15, 2016

A few mito genomes from Maikop (or Maykop)

The mtDNA haplogroup list below is from a new paper at the Journal of Archaeological Science. I can't remember seeing mt-hgs M52, U8 or V7 in any of the results to date from the Bronze Age steppe. So perhaps we can tentatively say that Maikop-Novosvobodnaya populations didn't have an important impact on the maternal ancestry of early steppe pastoralists?

- Krasnodar Krai, Maikop burial, 4000-3000 BCE, mt-hg U8b1a2

- Krasnodar Krai, Maikop burial, 3700-3300 BCE, mt-hg U8b1a2

- Republic of Adygea, Maikop burial, 3700-3300 BCE, mt-hg M52

- Republic of Adygea, Novosvobodnaya burial, 3700-3300 BCE, mt-hg V7

- Krasnodar Krai, unknown burial, 3700-3300 BCE, mt-hg N1b1

- Republic of Adygea, unknown burial, 3700-3300 BCE, mt-hg T2b

Also, interestingly, the Novosvobodnaya individual suffered from Bang's disease (brucellosis). People usually catch it from unpasteurized milk and other dairy products, or from contact with infected animals.


Sokolov et al., Six complete mitochondrial genomes from Early Bronze Age humans in the North Caucasus, Journal of Archaeological Science, Volume 73, September 2016, Pages 138–144, doi:10.1016/j.jas.2016.07.017

See also...

Big deal of 2018: Yamnaya not related to Maykop

Genetic borders are usually linguistic borders too

On the genetic prehistory of the Greater Caucasus (Wang et al. 2018 preprint)

Basal-rich K7 vs D-stats: the puzzle

It's interesting that, as per the graphs below, the K7 Villabruna cluster shows an awesome correlation with Villabruna affinity. At the same time, the K7 Basal-rich cluster shows an awesome inverse correlation with AG3 (aka AfontovaGora3) affinity.

Conversely, the K7 Basal-rich cluster shows a much poorer inverse correlation with Villabruna affinity, and the K7 AG3-MA1 cluster shows a much poorer correlation with AG3 affinity.
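For anyone who wants to put numbers on "awesome" versus "much poorer", the usual measure is Pearson's r between the K7 cluster proportions and the D-stat affinities. A population sitting above or below the fitted line simply has a positive or negative residual. Below is a minimal sketch in plain Python; the function names are hypothetical and the input values are made up for illustration, not taken from the datasheet.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(xs, ys):
    """Vertical distances from the least-squares line y = a + b*x
    (positive = above the line, negative = below it)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical values: K7 cluster proportion (x) vs. D-stat affinity (y) for five groups
k7 = [0.10, 0.20, 0.30, 0.40, 0.50]
affinity = [0.011, 0.019, 0.032, 0.040, 0.049]
print(round(pearson_r(k7, affinity), 3))
print([round(r, 4) for r in residuals(k7, affinity)])
```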

Why is that? Anyone know? If you think you do, please post your answer in the comments. I already know why and shall reveal the answers shortly. The relevant datasheet is available here.

Update 16/08/2016: Without further ado, here are the answers (and feel free to disagree with me, but please make sure you have some convincing arguments if you do)...

- the Villabruna cluster shows a strong correlation with Villabruna affinity simply because the K7 is pretty good at estimating proportions of Villabruna-related admixture and, at the same time, Villabruna is an excellent reference sample for Villabruna-related affinity in Western Eurasians

- the Basal-rich cluster shows a strong inverse correlation with AG3 affinity because, paradoxically, AG3 is a fairly poor reference for AG3-related admixture in most Western Eurasians, thereby acting as a pretty good reference for overall Basal Eurasian-free forager ancestry (note that the correlation only breaks down somewhat for samples unusually rich in AG3-related ancestry compared to their neighbors, like those from the Bronze Age steppe and Caucasus)

- the Basal-rich cluster shows a fairly poor inverse correlation with Villabruna affinity because, as noted above and somewhat paradoxically, Villabruna is an excellent proxy for Villabruna-related admixture in Western Eurasians, which pushes groups with unusually high Villabruna-related admixture and affinity well above the diagonal line

- the AG3-MA1 cluster shows a fairly poor correlation with AG3 affinity simply because, as per above, AG3 is a fairly poor reference for AG3-related admixture in Western Eurasians, and especially West Central Asians, thereby forcing them well below the diagonal line

Update 17/08/2016: And Matt's explanation is in the comments here, along with the graphs below.

See also...

The Basal-rich K7

Saturday, August 13, 2016

PCA: Neolithic Central Anatolians

Note that the individuals from the earlier site of Boncuklu basically cluster with early Neolithic Europeans, while those from Tepecik-Ciftlik are shifted south and east, suggesting an influx of admixture into central Anatolia from perhaps eastern Anatolia and the Levant after the early Neolithic. This is in accordance with the findings of Kılınç et al. who published these genomes.

I also tested the same samples with the Basal-rich K7 (refer to the spreadsheet here). Their results appear to correlate very nicely with the PCA. However, I deleted Tep001 from the PCA plot because his PCA and Basal-rich K7 outcomes didn't match, suggesting that one or both of them were spurious. This isn't surprising, however, since Tep001 has a coverage of only 0.023x.
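As a rough back-of-the-envelope check on why 0.023x is so marginal: if reads landed uniformly at random (a Poisson idealisation, since real ancient DNA coverage is patchier), the chance that a given SNP is covered by at least one read is 1 - e^(-coverage). The 650,000-SNP panel size and the function name below are illustrative assumptions.

```python
import math


def expected_sites_covered(coverage, n_sites):
    """Expected number of sites hit by >= 1 read under a Poisson model of read depth."""
    return n_sites * (1.0 - math.exp(-coverage))


for cov in (0.023, 0.5, 1.0):
    print(cov, round(expected_sites_covered(cov, 650_000)))
```

At 0.023x that works out to only around 15,000 of 650,000 sites, almost all of them backed by a single read, so a small number of miscalls or damaged reads can plausibly shift such a sample noticeably.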


Gülşah Merve Kılınç et al., The Demographic Development of the First Farmers in Anatolia, Current Biology, August 8, 2016, DOI:

Update 15/08/2016: Below are a few admixture f3-stats from an analysis involving the new Anatolian samples. Please note, the more negative the Z score, the more likely that the target is admixed. Also, I had to use transversion SNPs to make this work, so the Z scores aren't as imposing as they might have been with more markers behind them. I'm posting all of the outcomes with Z scores lower than -1, but it might be best to ignore anything above -2.
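For anyone wondering where numbers like these come from, below is a bare-bones sketch of an admixture f3-statistic, f3(Target; A, B), averaged over SNPs from allele frequencies, with a Z score from a delete-one-block jackknife. It assumes the frequencies are already estimated and the SNPs are in genomic order, and the function names are hypothetical. Real implementations such as ADMIXTOOLS' qp3Pop add refinements this sketch leaves out (for example a correction for the finite number of target samples) and define jackknife blocks by genetic distance rather than equal SNP counts. The toy data are synthetic.

```python
import numpy as np


def f3_per_snp(target, a, b):
    """Per-SNP admixture f3 term (t - a)(t - b) from allele frequencies. A significantly
    negative average over many SNPs suggests the target mixes A-like and B-like sources."""
    return (target - a) * (target - b)


def jackknife_z(values, n_blocks=100):
    """Mean of per-SNP values and its Z score, using a delete-one-block jackknife
    over contiguous blocks holding (roughly) equal numbers of SNPs."""
    values = np.asarray(values, dtype=float)
    blocks = np.array_split(values, n_blocks)
    total, n = values.sum(), values.size
    estimate = total / n
    # Estimate recomputed with each block left out in turn
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    g = len(blocks)
    se = np.sqrt((g - 1) / g * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate / se


# Synthetic check: a target formed as a 50/50 mix of two sources gives f3 < 0
rng = np.random.default_rng(1)
a = rng.uniform(0.05, 0.95, 20_000)
b = np.clip(a + rng.normal(0, 0.15, a.size), 0, 1)
target = 0.5 * a + 0.5 * b
f3, z = jackknife_z(f3_per_snp(target, a, b))
print(f"f3 {f3:.6f}  Z {z:.2f}")
```

For an exact 50/50 mixture each per-SNP term equals -(a - b)²/4, so the average is guaranteed to be negative. In real data, drift in the target after admixture pulls f3 back toward zero, which is one reason weak Z scores are hard to interpret.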

Boncuklu_EN + Levant_N > Barcin_N f3 -0.005525 Z -2.62 SNPs 48620
Boncuklu_EN + Natufian > Barcin_N f3 -0.004252 Z -1.34 SNPs 28893

Boncuklu_EN + Natufian > Tepecik-Ciftlik_N f3 -0.013262 Z -1.566 SNPs 4384

Barcin_N + Villabruna > LBK_EN f3 -0.003652 Z -2.685 SNPs 49325
Barcin_N + LaBrana1 > LBK_EN f3 -0.003382 Z -2.462 SNPs 53537
Barcin_N + Motala_HG > LBK_EN f3 -0.002539 Z -2.388 SNPs 57533
Barcin_N + Loschbour > LBK_EN f3 -0.003089 Z -2.272 SNPs 48728
Tepecik-Ciftlik_N + Villabruna > LBK_EN f3 -0.004176 Z -1.452 SNPs 34905
Barcin_N + Hungary_HG > LBK_EN f3 -0.001939 Z -1.32 SNPs 41610
Boncuklu_EN + Levant_N > LBK_EN f3 -0.003035 Z -1.221 SNPs 40815

Barcin_N + Loschbour > Iberia_EN f3 -0.002457 Z -1.171 SNPs 38141
Tepecik-Ciftlik_N + Hungary_HG > Iberia_EN f3 -0.004022 Z -1.039 SNPs 21848

Barcin_N + Villabruna > Hungary_N f3 -0.006408 Z -4.28 SNPs 44545
Barcin_N + Hungary_HG > Hungary_N f3 -0.005216 Z -3.355 SNPs 39808
Barcin_N + Bichon > Hungary_N f3 -0.002554 Z -1.667 SNPs 48500
Barcin_N + Motala_HG > Hungary_N f3 -0.001559 Z -1.298 SNPs 51052
Tepecik-Ciftlik_N + Villabruna > Hungary_N f3 -0.003929 Z -1.239 SNPs 31535
Tepecik-Ciftlik_N + Hungary_HG > Hungary_N f3 -0.003897 Z -1.192 SNPs 28472
Barcin_N + LaBrana1 > Hungary_N f3 -0.00179 Z -1.083 SNPs 46955
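Following the suggestion above to ignore anything above -2, here is a small sketch that parses result lines in the format used in this post and keeps only those with Z at or below -2. The regular expression and function name are hypothetical and simply assume the exact "A + B > Target f3 ... Z ... SNPs ..." layout. The sample input is four lines copied from the list above.

```python
import re

# Pattern for lines like "Barcin_N + Villabruna > Hungary_N f3 -0.006408 Z -4.28 SNPs 44545"
LINE = re.compile(
    r"^(?P<a>\S+) \+ (?P<b>\S+) > (?P<target>\S+) "
    r"f3 (?P<f3>-?\d+(?:\.\d+)?) Z (?P<z>-?\d+(?:\.\d+)?) SNPs (?P<snps>\d+)$"
)


def strong_hits(text, z_cutoff=-2.0):
    """Return (A, B, target, Z) for every parsable line with Z at or below the cutoff."""
    hits = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m and float(m["z"]) <= z_cutoff:
            hits.append((m["a"], m["b"], m["target"], float(m["z"])))
    return hits


sample = """\
Boncuklu_EN + Levant_N > Barcin_N f3 -0.005525 Z -2.62 SNPs 48620
Boncuklu_EN + Natufian > Barcin_N f3 -0.004252 Z -1.34 SNPs 28893
Barcin_N + Villabruna > Hungary_N f3 -0.006408 Z -4.28 SNPs 44545
Barcin_N + LaBrana1 > Hungary_N f3 -0.00179 Z -1.083 SNPs 46955
"""

for hit in strong_hits(sample):
    print(hit)
```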

Tuesday, August 9, 2016

On the enigmatic early Neolithic farmers from Iran

There still seems to be a lot of confusion around the traps, including in the comments at this blog, about the genetic structure of the early Neolithic Iranian farmers.

They're certainly a unique and mysterious West Eurasian population, but I'd say the picture is generally pretty straightforward considering that they were dug up on the border between the Near East and Central Asia.

As per my K7 test, they're closely related to other West Eurasians, and especially Near Easterners, via an ancient component that appears to be a mixture of Basal Eurasian and something very similar to the Villabruna cluster (see post here and the last page of the accompanying comments).

Apart from that, they harbor a lot of AG3-related ancestry, albeit probably only distantly related. My guess for now is that this is mostly admixture from an as yet unsampled Central Asian forager population, perhaps with elevated affinity to Ust_Ishim (update: probably not, see here).

The graphs below are based on the datasheet available here. Like I say, these ancient Zagros farmers are unique and eastern shifted, but, at the same time, don't show the type of Southeast Asian pull that characterizes present-day South and South Central Asians.

Saturday, August 6, 2016

Yamnaya dogs (?)

Just in at bioRxiv:

Abstract: Europe has played a major role in dog evolution, harbouring the oldest uncontested Palaeolithic remains and having been the centre of modern dog breed creation. We sequenced the whole genomes of an Early and End Neolithic dog from Germany, including a sample associated with one of Europe’s earliest farming communities. Both dogs demonstrate continuity with each other and predominantly share ancestry with modern European dogs, contradicting a Late Neolithic population replacement previously suggested by analysis of mitochondrial DNA and a Late Neolithic Irish genome. However, our End Neolithic sample possesses additional ancestry found in modern Indian dogs, which we speculate may be derived from dogs that accompanied humans from the Eastern European steppe migrating into Central Europe. By calibrating the mutation rate using our oldest dog, we narrow the timing of dog domestication to 20,000-40,000 years ago. Interestingly, the extreme copy number expansion of the AMY2B gene found in modern dogs was not observed in the ancient samples, indicating that the AMY2B copy number increase arose as an adaptation to starch-rich diets after the advent of agriculture in the Neolithic period.

And on page 17:
The age of the samples provides a time frame, between ~7,000 and 5,000 years ago, for CTC to obtain its additional Indian-like ancestry component. Considering that CTC shows similar admixture patterns to Central Asian and Middle Eastern modern dog populations, as seen in the PCA (Figure 2) and ADMIXTURE (Supplementary Figure S8.3.2.) analysis, and that the cranium was found next to two individuals associated with the Neolithic Corded Ware Culture, we speculate that the Indian-like gene flow may have been acquired by admixture with incoming populations of dogs that accompanied steppe people migrating from the East. Moreover, ADMIXTUREGRAPH and f4 statistics support the possibility that the Indian and the wolf ancestry are the consequence of the same admixture event, involving a dog population that carried the two ancestries. This scenario is further supported by the model estimated by G-PhoCS, which infers substantial migration from wolves to the lineage represented by Indian village dogs (and as much as 0.36 migration rate when Indian wolves are included in the tree (Supplementary Methods 12)).

Botigue et al., Ancient European dog genomes reveal continuity since the early Neolithic, bioRxiv, posted August 5, 2016, doi:

Thursday, August 4, 2016

On the origins of the first farmers in Anatolia

Open access at Current Biology:

Summary: The archaeological documentation of the development of sedentary farming societies in Anatolia is not yet mirrored by a genetic understanding of the human populations involved, in contrast to the spread of farming in Europe [1–3]. Sedentary farming communities emerged in parts of the Fertile Crescent during the tenth millennium and early ninth millennium calibrated (cal) BC and had appeared in central Anatolia by 8300 cal BC [4]. Farming spread into west Anatolia by the early seventh millennium cal BC and quasi-synchronously into Europe, although the timing and process of this movement remain unclear. Using genome sequence data that we generated from nine central Anatolian Neolithic individuals, we studied the transition period from early Aceramic (Pre-Pottery) to the later Pottery Neolithic, when farming expanded west of the Fertile Crescent. We find that genetic diversity in the earliest farmers was conspicuously low, on a par with European foraging groups. With the advent of the Pottery Neolithic, genetic variation within societies reached levels later found in early European farmers. Our results confirm that the earliest Neolithic central Anatolians belonged to the same gene pool as the first Neolithic migrants spreading into Europe. Further, genetic affinities between later Anatolian farmers and fourth to third millennium BC Chalcolithic south Europeans suggest an additional wave of Anatolian migrants, after the initial Neolithic spread but before the Yamnaya-related migrations. We propose that the earliest farming societies demographically resembled foragers and that only after regional gene flow and rising heterogeneity did the farming population expansions into Europe occur.

Gülşah Merve Kılınç et al., The Demographic Development of the First Farmers in Anatolia, Current Biology, August 8, 2016, DOI:
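The abstract doesn't spell out how "genetic diversity" was measured. One simple approach often used with low-coverage ancient genomes is the average pairwise mismatch rate between pseudo-haploid individuals (one allele sampled per covered SNP) at sites both happen to cover, where a lower value means a less diverse group. The sketch below illustrates that idea on a made-up toy matrix; the function name is hypothetical, and this is not necessarily the estimator used by Kılınç et al.

```python
import itertools

import numpy as np


def mean_pairwise_mismatch(calls):
    """calls: individuals x SNPs array of pseudo-haploid alleles (0/1), NaN = no read.
    Returns the average, over all pairs of individuals, of the fraction of
    jointly covered SNPs at which the pair carries different alleles."""
    rates = []
    for i, j in itertools.combinations(range(calls.shape[0]), 2):
        both = ~np.isnan(calls[i]) & ~np.isnan(calls[j])
        if both.any():
            rates.append(np.mean(calls[i][both] != calls[j][both]))
    return float(np.mean(rates))


nan = np.nan
# Made-up toy group of three pseudo-haploid individuals at five SNPs
group = np.array([
    [0, 1, 0, nan, 1],
    [0, 1, 1, 0, nan],
    [0, nan, 0, 0, 1],
])
print(mean_pairwise_mismatch(group))
```

Using only sites covered in both individuals keeps missing data from being counted as either a match or a mismatch, which matters when most SNPs are missing.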

See also...

PCA analysis: Neolithic Central Anatolians

The genetic structure of the world's first farmers (Lazaridis et al. preprint)

Early Neolithic genomes from the eastern Fertile Crescent (Broushaki et al. 2016)