
Sunday, July 26, 2015

Global PCA of selected Late Neolithic/Bronze Age Eurasians


I was curious how the Bronze Age steppe and Corded Ware genomes from the Rise dataset would behave in Principal Component Analyses (PCA) alongside populations from across the globe. Ten genomes had enough high-confidence (transversion) markers to be analyzed accurately in this way. I also ran an Iron Age Swedish sample, just to see how it differed from the older genomes.
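For anyone wondering how a low-coverage ancient genome can be placed on a PCA built from present-day samples, here's a minimal sketch. It isn't the software used for the plots in this post, which isn't named here. It assumes a complete 0/1/2 genotype matrix for the reference samples and an ancient sample with missing calls coded as NaN, and the function names are made up for illustration. The ancient sample is projected by least squares, using only the SNPs at which it has a call.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Fit a PCA on a complete (n_samples, n_snps) matrix of 0/1/2 genotypes.

    Returns the per-SNP means, the SNP loadings (n_snps, n_components)
    and the coordinates of the reference samples.
    """
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T
    coords = centered @ loadings
    return mean, loadings, coords

def project_sample(sample, mean, loadings):
    """Project one sample onto the fitted axes, ignoring missing SNPs (NaN)."""
    sample = np.asarray(sample, dtype=float)
    called = ~np.isnan(sample)
    # Least-squares fit restricted to the SNPs that were actually called.
    coords, *_ = np.linalg.lstsq(
        loadings[called], sample[called] - mean[called], rcond=None
    )
    return coords
```

With no missing calls the projection matches an ordinary PCA coordinate. As the number of usable markers drops, the position of the ancient sample becomes noisier.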

Click on the links to go to my drive to download the plots. If you're having trouble finding the ancient samples, type their IDs into the PDF search field and hit enter.

RISE509_Afanasievo

RISE511_Afanasievo

RISE500_Andronovo

RISE505_Andronovo

RISE00_Corded_Ware

RISE94_Corded_Ware

RISE493_Karasuk

RISE496_Karasuk

RISE548_Yamnaya

RISE552_Yamnaya

RISE174_Iron_Age_Scandinavia

I can't see any major surprises. But I do find it remarkable how very European the Andronovo individuals appear on these plots. Keep in mind that they're ~3,000-year-old samples from the Altai region of Russia. Their ancestors probably migrated there from the Trans-Urals steppe sometime during the Middle Bronze Age.

The Andronovo Culture was succeeded in the Altai region during the Late Bronze Age by the Karasuk Culture, which was probably a new composite of local and perhaps foreign groups. Interestingly, the Karasuk samples featured above are obviously of mixed European/East Asian origin.

Note also that the Afanasievo and Yamnaya individuals fall outside the range of present-day European variation in many of the dimensions, basically as if they were pulling towards the Karitiana Indians of the Amazon. No doubt, this is their excess ANE talking.

By the way, I recently ran some of the same samples in PCA limited to West Eurasian populations. You can see the results here.

Wednesday, July 22, 2015

High-res R1b tree featuring 16 ancient sequences


Here's a useful R1b phylogenetic tree that was posted recently at the R1b-M269 (P312- U106-) DNA Project site.


If these results are correct (and judging by the quality of work at the aforementioned R1b project, I'm pretty sure they are), it would appear that the Samara hunter-gatherer, marked I0124, was not directly ancestral or even all that closely related to any of the Yamnaya/Pit-Grave samples from the North Caspian region (each one also marked with an I~ ID).

On the other hand, the North Caspian Yamnaya sequences are very similar to the rest of the Yamnaya sequences, which come from just north of the Caucasus (marked RISE~). Indeed, all of these Yamnaya samples are almost identical in terms of genome-wide genetic structure (see here).

What this suggests is that the Yamnaya nomads emigrated to the North Caspian from somewhere near the Caucasus, or they were the descendants of such migrants. And if we assume that their ancestral homeland abutted the territory of the Maikop Culture, as shown on this map from Dolukhanov 2014 (look for 9 - early Pit-graves), it becomes easy to understand why they carried such significant maternal and genome-wide genetic Caucasus-related admixture (usually estimated at around 50%).

However, if you're one of those online Near Eastern patriots who like to imagine the Yamnaya as your own, please don't jump for joy just yet. The Yamnaya nomads still look very much like a people native to the western steppe, and this is probably also where their R1b comes from.

Sunday, July 19, 2015

The real thing


A couple of years ago Moorjani et al. concluded that present-day Georgians of the Transcaucasus were the best available proxy for the ancient West Eurasian population that mixed into the South Asian gene pool.

This was a solid statistical fit. And you can see on the TreeMix graph below, featuring a Georgian and a Kalash, why it worked so well.




But it was also a big fat coincidence, because check out what happens when I add another migration edge to the same graph.




Thus, the Indo-Iranian and hence Indo-European speaking Kalash no longer looks very similar to the Kartvelian speaking Georgian. In fact, he appears to be most closely related to the supposedly Indo-European speaking Afanasievo and Yamnaya nomads of the Early Bronze Age Eurasian steppe. The rest of his ancestry is probably best described as South Central Asian, which is an unknown quantity to me at this stage, but probably in large part of indigenous South Asian origin (see here).

I'm only able to show this thanks to the ancient samples that are on the tree, for which, as far as I know, there aren't any useful substitutes among present-day populations. Obviously, Moorjani et al. didn't have this luxury, so they ended up with a model that was statistically sound, but didn't make much sense otherwise, especially in terms of linguistics.

My TreeMix model is easily reproducible with most of the other South Asian samples from the Human Origins dataset, and it gels nicely with uniparental marker data too. For instance, here's a close-up from a similar graph featuring a Pathan, with a few extra details.




Yep, not only do Pathans cluster among these ancients of the Eurasian steppe, but most of them also carry the same Y-chromosome haplogroup: R1a-Z93, which is derived from R1a-M417, and in all likelihood first expanded in a big way with the Proto-Indo-Iranians of the Trans-Ural steppe.

By the way, the Human Origins dataset has four different sets of Gujarati samples from Houston, USA, marked A, B, C and D, and each one shows a different level of ancient steppe admixture as inferred with my test. GujaratiA scores around 50%, while GujaratiD scores only 40%. Does anyone know why these Gujaratis were grouped in such a way? Was it based on genetic structure or caste origin?





Full output from the analysis above is available in a zip file here. The reference samples and markers are listed here and here. The ancient samples are from Allentoft et al. 2015 and Haak et al. 2015.

See also...

The Poltavka outlier

Friday, July 17, 2015

Iron Age and Anglo-Saxon genomes from eastern England (Schiffels et al. preprint)


I haven't read this properly yet, but the results appear to be very similar to those I obtained with some of the same ancient genomes (see here), which must be very heartening for the authors (j/k). By the way, it's interesting to note that the word Celtic doesn't appear anywhere in the paper. I wonder why?

British population history has been shaped by a series of immigrations and internal movements, including the early Anglo-Saxon migrations following the breakdown of the Roman administration after 410CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences generated from ten ancient individuals found in archaeological excavations close to Cambridge in the East of England, ranging from 2,300 until 1,200 years before present (Iron Age to Anglo-Saxon period). We use present-day genetic data to characterize the relationship of these ancient individuals to contemporary British and other European populations. By analyzing the distribution of shared rare variants across ancient and modern individuals, we find that today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and estimate that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland. We gain further insight with a new method, rarecoal, which fits a demographic model to the distribution of shared rare variants across a large number of samples, enabling fine scale analysis of subtle genetic differences and yielding explicit estimates of population sizes and split times. Using rarecoal we find that the ancestors of the Anglo-Saxon samples are closest to modern Danish and Dutch populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.

Schiffels et al., Iron Age and Anglo-Saxon genomes from East England reveal British migration history, bioRxiv, Posted July 17, 2015. doi: http://dx.doi.org/10.1101/022723

Wednesday, July 15, 2015

Population genomics of Early Bronze Age Europe in three simple graphs


Thanks to recent advances in ancient genomics there's very little doubt now that the Pontic-Caspian Steppe was the source of massive population movements deep into Europe during the Late Neolithic/Early Bronze Age.

But some people still don't get it, maybe because genomics isn't their thing. Others just refuse to get it, probably because it's at odds with what they've been hoping to see.

To help the former, and piss off the latter some more, I've put together three simple TreeMix graphs featuring ancient samples from a wide range of European archeological cultures, along with a little bit of commentary. Enjoy.





Full output from the analysis above is available in a zip file here. The samples and markers are listed here and here. The ancient samples are from Allentoft et al. 2015 and Haak et al. 2015. The Sub-Saharan Africans are from the fully public Human Origins dataset available here.

Wednesday, July 8, 2015

Another look at the ancient mtDNA from Xiaohe, Tarim Basin


BMC Genetics has just published a new paper on the famous Tarim Basin mummies. It's a bit of a shame that it only deals with their mtDNA. Here's the abstract:

Background: The Tarim Basin in western China, known for its amazingly well-preserved mummies, has been for thousands of years an important crossroad between the eastern and western parts of Eurasia. Despite its key position in communications and migration, and highly diverse peoples, languages and cultures, its prehistory is poorly understood. To shed light on the origin of the populations of the Tarim Basin, we analysed mitochondrial DNA polymorphisms in human skeletal remains excavated from the Xiaohe cemetery, used by the local community between 4000 and 3500 years before present, and possibly representing some of the earliest settlers.

Results: Xiaohe people carried a wide variety of maternal lineages, including West Eurasian lineages H, K, U5, U7, U2e, T, R*, East Eurasian lineages B, C4, C5, D, G2a and Indian lineage M5.

Conclusion: Our results indicate that the people of the Tarim Basin had a diverse maternal ancestry, with origins in Europe, central/eastern Siberia and southern/western Asia. These findings, together with information on the cultural context of the Xiaohe cemetery, can be used to test contrasting hypotheses of route of settlement into the Tarim Basin.

Five years ago some of the same scientists published a paper on an older set of human remains from the same burial site, and found that all of the males belonged to Y-chromosome haplogroup R1a (see here). Last year one of them apparently left a comment under that paper saying this:

Our results show that Xiaohe settlers carried Hg R1a1 in paternal lineages, and Hgs H, K, C4, M* in maternal lineages. Though Hg R1a1a is found at highest frequency in both Europe and South Asia, Xiaohe R1a1a more likely originate from Europe because of it not belong to R1a1a-Z93 branch (our recently unpublished data) which mainly found in Asians.

So I'm pretty sure another paper is on the way. But hopefully the data will include much more than just broad Y-haplogroup classifications. A few full genomes from several layers of the Xiaohe cemetery would be really nice.

Citation...

Chunxiang Li et al., Analysis of ancient human mitochondrial DNA from the Xiaohe cemetery: insights into prehistoric population movements in the Tarim Basin, China, BMC Genetics 2015, 16:78, doi:10.1186/s12863-015-0237-5

See also...

Lots of ancient Y-DNA from China

Bronze Age Tarim Basin Caucasoids belonged to Y-haplogroup R1a1a

Friday, July 3, 2015

ADMIXTURE analysis of Allentoft et al. and Haak et al. ancient genomes


I haven't had a chance to study the output in detail yet, and I don't know what the cross-validation errors are for each of these unsupervised runs, but I'd say they all look pretty good. A Principal Component Analysis (PCA) of some of the K=10 data, showing how present-day Armenians compare to two Bronze Age Armenians, can be seen here.

K=6 spreadsheet

K=7 spreadsheet

K=8 spreadsheet

K=9 spreadsheet

K=10 spreadsheet

I did attempt to go up to K=11, but the algorithm appeared to be struggling to find a solution, so I killed the run. I'll have another go when more samples come in.

By the way, the analysis is based on the Human Origins fully public dataset available at the Reich lab website here.

To reduce errors, I limited the markers to transversion SNPs, and only kept samples with minimum call rates of 20%. This left 113K SNPs and 101 ancient genomes; 47 from Allentoft et al., 36 from Haak et al., and 18 from other recent papers. I didn't thin the markers to correct for LD, because in my experience this often results in less accurate outcomes.
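The two filters described above can be sketched roughly as follows. This is only an illustration, not the actual pipeline. It assumes genotypes are held in a samples-by-SNPs array with NaN for missing calls, that the two alleles of each SNP are available as a pair of bases, and that the call rate is computed on the transversion SNPs only. The function names are hypothetical.

```python
import numpy as np

# A transversion swaps a purine (A, G) for a pyrimidine (C, T) or vice versa.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transversion(ref, alt):
    """Return True if the allele pair is a transversion rather than a transition."""
    ref, alt = ref.upper(), alt.upper()
    return (ref in PURINES and alt in PYRIMIDINES) or (
        ref in PYRIMIDINES and alt in PURINES
    )

def filter_dataset(genotypes, alleles, min_call_rate=0.20):
    """Keep transversion SNPs, then keep samples with enough called genotypes.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, NaN = missing.
    alleles:   list of (ref, alt) base pairs, one per SNP.
    Returns the filtered matrix plus the indices of kept samples and kept SNPs.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    snp_keep = np.array([is_transversion(r, a) for r, a in alleles], dtype=bool)
    tv = genotypes[:, snp_keep]
    # Call rate = fraction of the remaining SNPs with a non-missing genotype.
    call_rate = 1.0 - np.isnan(tv).mean(axis=1)
    sample_keep = call_rate >= min_call_rate
    return tv[sample_keep], np.flatnonzero(sample_keep), np.flatnonzero(snp_keep)
```

Restricting to transversions drops C/T and G/A sites, which is where post-mortem deamination damage in ancient DNA shows up as false calls.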

Friday, June 26, 2015

Genetic substructures among Late Neolithic/Bronze Age Scandinavians


I may have discovered an interesting pattern in the Allentoft et al. data. It seems that during the Late Neolithic/Bronze Age, Scandinavia was populated by two somewhat different populations; one characterized by Y-Chromosome haplogroup R1b and a genome-wide genetic structure typical of present-day Northwestern Europeans, and another by Y-Chromosome haplogroup R1a and a relatively more eastern genome-wide genetic profile.

Below are two Principal Component Analyses (PCA), both featuring ancient Swedish genomes classified as part of the Late Neolithic Battle-Axe archeological culture. However, the first sample clusters near present-day Norwegians and belongs to Y-haplogroup R1b-U106, which is nowadays typically known as a Germanic paternal marker. On the other hand, the second sample clusters among present-day Russians and Mordovians, from all the way near the Volga, and belongs to Y-haplogroup R1a-Z645, which very likely expanded from Eastern Europe during the Late Neolithic.




Here's another example of basically the same thing, but this time with two ancient genomes from Denmark. If you're having trouble finding the ancient samples, download the PDF files and type their IDs in the PDF search field.



Coincidence? Probably not, but we obviously need more samples to confirm these results and establish that there is indeed a pattern.

Citation...

Allentoft et al., Population genomics of Bronze Age Eurasia, Nature 522, 167–172 (11 June 2015) doi:10.1038/nature14507

Monday, June 22, 2015

First look at an ancient genome from Neolithic Anatolia


Felix at GGT is in the process of uploading the genomes from the recent Pinhasi et al. paper. The file for the early Neolithic sample from Barcin, Turkey, is basically ready. I analyzed it with my K8 model and got these results (click on the image to enlarge).


I was only able to use a couple hundred SNPs for the test, so the outcome can't be taken too seriously. But it does make sense. The lack of Ancient North Eurasian (ANE) ancestry isn't surprising, because it mirrors the results of early European farmers we've seen to date.

Moreover, the relatively high level of Western European Hunter-Gatherer (WHG) ancestry, or at least something very similar, is also in line with expectations, considering that the sample was dug up in far western Anatolia, almost on the European border.

I also ran an Identical-by-State (IBS) affinity test using the Human Origins dataset and around 1800 SNPs. The results broadly back up the K8 analysis, with southern Europeans topping the list.
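In case it helps, here's a rough sketch of how an IBS affinity ranking like this can be computed. It's an assumption-laden illustration, not the tool I used: genotypes are coded 0/1/2 with NaN for missing calls, similarity is the average proportion of alleles shared across SNPs called in both individuals, and the function names are invented.

```python
import numpy as np

def ibs_similarity(a, b):
    """Mean proportion of alleles shared IBS over SNPs called in both samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return float("nan")
    # With 0/1/2 coding, two genotypes share 2 - |difference| alleles.
    shared = 2.0 - np.abs(a[both] - b[both])
    return float(shared.mean() / 2.0)

def rank_populations(ancient, ref_genotypes, ref_labels):
    """Average the ancient sample's IBS to each population, sorted high to low."""
    scores = {}
    for geno, label in zip(ref_genotypes, ref_labels):
        scores.setdefault(label, []).append(ibs_similarity(ancient, geno))
    means = {label: float(np.nanmean(vals)) for label, vals in scores.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)
```

With only a couple of thousand overlapping SNPs, the differences between neighboring populations in such a ranking are small, so the order is best read broadly rather than position by position.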


Citation...

Pinhasi R, Fernandes D, Sirak K, Novak M, Connell S, Alpaslan-Roodenberg S, et al. (2015) Optimal Ancient DNA Yields from the Inner Ear Part of the Human Petrous Bone. PLoS ONE 10(6): e0129102. doi:10.1371/journal.pone.0129102

See also...

The Near East ain't what it used to be

Wednesday, June 10, 2015

101 ancient Eurasian genomes (Allentoft et al. 2015)


It'll take me a while to digest all of the information in this massive new Allentoft et al. paper. But I've already noticed that, just like in Haak et al. 2015, the Yamnaya samples are again from the eastern half of the Yamnaya horizon. This time, however, not all of the Yamnaya individuals carry Y-haplogroup R1b; one of the five samples belongs to Y-haplogroup I2a (see here).

So I'm wondering what more westerly Yamnaya sites will reveal in the future, considering the predominance of Y-haplogroup R1a among the Corded Ware individuals sampled to date, and the close genome-wide relationship between the Yamnaya and Corded Ware.

Abstract: The Bronze Age of Eurasia (around 3000–1000 BC) was a period of major cultural changes. However, there is debate about whether these changes resulted from the circulation of ideas or from human migrations, potentially also facilitating the spread of languages and certain phenotypic traits. We investigated this by using new, improved methods to sequence low-coverage genomes from 101 ancient humans from across Eurasia. We show that the Bronze Age was a highly dynamic period involving large-scale population migrations and replacements, responsible for shaping major parts of present-day demographic structure in both Europe and Asia. Our findings are consistent with the hypothesized spread of Indo-European languages during the Early Bronze Age. We also demonstrate that light skin pigmentation in Europeans was already present at high frequency in the Bronze Age, but not lactose tolerance, indicating a more recent onset of positive selection on lactose tolerance than previously thought.

Allentoft et al., Population genomics of Bronze Age Eurasia, Nature 522, 167–172 (11 June 2015) doi:10.1038/nature14507

See also...

R1a-M417 from Eneolithic Ukraine!!!11