
Thursday, August 13, 2015

Kets are rich in Ancient North Eurasian (ANE) ancestry


There's an interesting and very thorough preprint at bioRxiv looking at the genomic structure of Kets, the last nomadic hunter-gatherers of Siberia. From the paper:

Based on all analyses, we can tentatively model Kets as a two-way mixture of East Asians and ANE. Therefore, ANE ancestry in Kets can be estimated using various f4-ratios from 27% to 62% (depending on the dataset and reference populations), vs. 2% in Nganasans, 30 ‒ 39% in Karitiana, and 23 ‒ 28% in Mayans (Suppl. file S7, see details in Suppl. Information Section 8). Integrating data by different methods, we conservatively estimate that Kets have the highest degree of ANE ancestry among all investigated modern Eurasian populations west of Chukotka and Kamchatka. We speculate that ANE ancestry in Kets was acquired in the Altai region, where the Bronze Age Okunevo culture was located, with a surprisingly close genetic proximity to Mal'ta. Later, Yeniseian-speaking people occupied this region until the 16th-18th centuries. We suggest that Mal'ta ancestry was later introduced into Uralic-speaking Selkups, starting to mix with Kets extensively in the 17-18th centuries.
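For anyone not familiar with f4-ratios, here's a minimal sketch of the idea behind estimates like the ones quoted above. It is not the authors' pipeline: the population roles, the simulated allele frequencies and the 30% mixture proportion are all assumptions made up for illustration. The sketch builds a target population as an exact two-way mixture and then recovers the mixture proportion as a ratio of two f4 statistics.

```python
import random

def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d) on allele frequencies."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / len(p_a)

def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """Share of B-related ancestry in X, estimated as f4(A,O;X,C) / f4(A,O;B,C)."""
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)

def clip(v):
    """Keep a simulated allele frequency inside [0, 1]."""
    return min(1.0, max(0.0, v))

rng = random.Random(0)  # fixed seed so the toy example is repeatable
n_snps = 5000

# Hypothetical populations (toy allele frequencies, not real data):
outgroup = [rng.random() for _ in range(n_snps)]                  # O
ane_like = [clip(o + rng.gauss(0, 0.2)) for o in outgroup]        # B: ANE-like source
ane_relative = [clip(b + rng.gauss(0, 0.05)) for b in ane_like]   # A: relative of B
east_asian = [clip(o + rng.gauss(0, 0.2)) for o in outgroup]      # C: second source

# X: a target built as an exact 30/70 mixture of B and C (assumed proportion).
true_alpha = 0.3
target = [true_alpha * b + (1 - true_alpha) * c for b, c in zip(ane_like, east_asian)]

alpha_hat = f4_ratio(ane_relative, outgroup, target, ane_like, east_asian)
print(f"estimated B-related ancestry in X: {alpha_hat:.3f}")
```

With real data the choice of reference populations matters a lot, which is presumably why the paper reports a range as wide as 27% to 62% for the Kets.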

I'd say these findings make a lot of sense. Below is a spatial map put together by Sergey, based on my K8 model, showing the distribution of ANE across much of Eurasia. Note the ANE peak of around 28% among the Kets.


Citation...

Flegontov et al., Genomic study of the Ket: a Paleo-Eskimo-related ethnic group with significant ancient North Eurasian ancestry, bioRxiv, Posted August 13, 2015, doi: https://dx.doi.org/10.1101/024554

Tuesday, August 11, 2015

Finngolians


Of course, Mongolians never made it to Finland or even Northern Russia (Kargopol area) where these Russian samples are from. How did this crap get through peer review?

The proportions of admixture from ancestral EUR and EAS [European and East Eurasian, respectively] were estimated, and are shown in Table 2. CEU populations mostly originating from France and Germany had a small fraction (0.7 +/- 0.8%) of genetic material from EAS. People from Great Britain such as British (GBR) and Orcadian inherited 2.5%–3.8% from ancestral EAS. Finnish (FIN) and Russians inherited significantly more genetic material (>12%) from ancestral EAS, which is consistent with their historical record of admixture with Mongolian populations. Besides, Adygei from Caucasus inherited 3.2 +/- 1.0% from ancestral EAS.

Pengfei Qin et al., Quantitating and Dating Recent Gene Flow between European and East Asian Populations, Scientific Reports 5, 02 April 2015, Article number: 9500, doi:10.1038/srep09500

See also...

Finngolians #2

Comic relief


When, and how exactly, did ANI become ANE? I didn't get the memo.

The geographical distribution of the dark green component (ASI or Ancestral South Indian- unique to the subcontinent) was largely limited to the Indian subcontinent, and can be seen among all the populations of the subcontinent albeit in variable amount, whereas the second major component (light green: ANI or Ancestral North Indian (now ANE- Ancestral North Eurasian [76])) was shared with Central Asia, the Caucasus, Middle East and Europe (Fig 1c). The geographical origin of light green component (ANI or ANE) is so far unclear and more research is needed from unsampled area as well as from ancient DNA; however, the time of spread of this component from its origin place (either of any; the Caucasus, Near East, Indus Valley, or Central Asia) has happened more than 12.5 thousand years before [38], which is significantly earlier than the purported expansion of Dravidians and Aryans languages from outside the subcontinent. Notably, the Andaman Islanders are not the only population carrying the ASI component exclusively, as was suggested before [37]. Austroasiatic speakers (more precisely, the South Munda) of the subcontinent also seem to possess the ASI component in near unadulterated form (Fig 1c). More research with complete genome analysis would be required to clear the geographic center of the ANE component; however, it is evident from the present analysis that the dark green component (ASI) can be considered as a connecting thread for all the Indian populations (Fig 1c). Taken together, these results support the second hypothesis suggesting that all Indians, irrespective of their caste or tribal affiliations, share a common genetic ancestry, which is undoubtedly founded over the indigenous ASI component.

Citation: Chaubey G, Kadian A, Bala S, Rao VR (2015) Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana. PLoS ONE 10(6): e0127655. doi:10.1371/journal.pone.0127655

Sunday, August 9, 2015

Latvians: very similar to Lithuanians


The Estonian Biocentre has uploaded a new dataset from a forthcoming paper on the population structure of Slavs. It includes Latvians, Slovakians and Slovenians. Below is a Principal Component Analysis (PCA) featuring these individuals. Most of the other samples are from the Human Origins dataset.


Obviously, Latvians aren't Slavs; they're Balts just like Lithuanians. They're probably in this dataset because Balts are close neighbors and relatives of Slavs.

Note that these six Latvians are overall the most northerly group in this analysis, which suggests that they have the highest ratio of European hunter-gatherer ancestry. Nevertheless, they're obviously still very similar to Lithuanians and their Uralic neighbors to the north, the Estonians.

See also...

Finno-Ugric Poles in Kushniarevich et al. 2015

Wednesday, July 29, 2015

The ancient DNA case against the Anatolian hypothesis


In the debate over the location of the Proto-Indo-European urheimat, Colin Renfrew's Anatolian hypothesis is usually mentioned as the most viable alternative to the steppe or Kurgan hypothesis. But probably not for very much longer.

Below is a Principal Component Analysis (PCA) featuring extant Indo-European and non-Indo-European groups from West Eurasia, a couple of typical early Neolithic farmers from Central Europe, a typical Western Hunter-Gatherer, also from Central Europe, and the Iceman from the Copper Age Tyrolean Alps, again typical of his time and place.*

It's just a taste of the ancient genomic data we have available from prehistoric Europe, but it has almost everything that is pertinent to the issue at hand.


You don't need to be familiar with PCA methodology to be able to read the plot. Basically, it shows that the present-day European population structure is the result of two main events:

- the arrival of early farmers from Anatolia during the Neolithic transition, which eventually caused the extinction of people like the Western Hunter-Gatherer, who is the most obvious outlier on the plot

- the expansion of Kurgan groups such as the Yamnaya, which led to the formation of the Corded Ware horizon across much of Europe and shifted the genetic structure of almost all Europeans to the east, away from the Neolithic and Copper Age samples.

These were massive population turnovers, and, as a rule, massive population turnovers are accompanied by language change. So it's highly unlikely that any Europeans today are speaking languages derived from those of the Western Hunter-Gatherers or early Neolithic farmers of Central Europe (i.e., according to Renfrew, the ancestors of Celts, Germanics and other Indo-Europeans). Moreover, consider this:

- most present-day Indo-European speaking Europeans form an elongated cluster between the Neolithic farmers and the Corded Ware sample, pointing to the steppe-derived Corded Ware Culture as the proximate agent of the Indo-European expansion in much of Europe

- the only present-day Europeans who closely resemble Neolithic farmers are some Sardinians (the small Romance cluster just above the two Neolithic samples), but Sardinians spoke Paleo-Sardinian or Nuragic languages until they adopted Indo-European speech, in the form of Latin, from the Romans (see page 118 here).

Also, this isn't shown on the plot, but the dominant Y-chromosome haplogroup of early Neolithic farmers is G2a, which is a low frequency marker in Europe today. The two most common Y-chromosome haplogroups among present-day Europeans are R-M198 and R-M269, which are also typical of Corded Ware and Yamnaya males, respectively, and probably originally from the steppe.

So is there any way to rework the Anatolian hypothesis so that it can be salvaged? I doubt it. Even making the steppe a homeland for all of the main Indo-European branches apart from Anatolian and Armenian probably won't help.

It is true that the Yamnaya nomads carried Near Eastern-related ancestry which may represent Proto-Indo-European admixture from outside of the steppe. But there's no evidence that it came from Anatolia.

In fact, if Neolithic Anatolians were basically identical to early Neolithic European farmers, which seems to be the case (see here and here), then it's unlikely that it did, because the latter carried a peculiar genome-wide signal that is missing in Yamnaya genomes (orange cluster in the ADMIXTURE bar graph below).** Heck, even the early Corded Ware genomes from Germany barely show any of it.

I won't go into the linguistic arguments for why the Anatolian hypothesis is implausible. But it might be worth checking out a new book on the topic by linguists Asya Pereltsvaig and Martin W. Lewis: The Indo-European Controversy: Facts and Fallacies in Historical Linguistics. I haven't read it yet, so I welcome the opinions here of those who have. I did, however, read a lot of the online articles on which the book is based. As far as I know most of them are still available here and here.


*Another version of the same PCA, with the samples labeled individually, is available here. All possible combinations of dimensions 1 to 4 are shown here. The samples are listed here. All of the samples are from Haak et al. and Allentoft et al. The PCA was run using ~56K high confidence SNPs listed here.

The Corded Ware sample is a composite of Corded Ware sequences from Germany, Scandinavia, Estonia and Poland. The Yamnaya sample is a composite of Yamnaya sequences from the Kalmykia and Samara regions of Russia.

I chose to use these composites instead of individual sequences because I didn't want to run any samples with genotype rates of less than 98%.
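To make the genotype rate point concrete, here's a rough sketch of how pooling several low coverage individuals into a composite can push the genotype rate up. This is only one plausible way of doing it and not necessarily how the composites above were built: the data layout (allele counts of 0, 1 or 2, with None for a missing call), the function names and the "first non-missing call wins" rule are all assumptions for illustration.

```python
def genotype_rate(calls):
    """Fraction of sites with a non-missing genotype call."""
    return sum(c is not None for c in calls) / len(calls)

def make_composite(samples):
    """Pool individuals site by site, taking the first non-missing call.

    Assumed merge rule for illustration; a site stays missing (None)
    only if every individual is missing there.
    """
    n_sites = len(samples[0])
    if any(len(s) != n_sites for s in samples):
        raise ValueError("all samples must cover the same list of sites")
    return [
        next((c for c in site_calls if c is not None), None)
        for site_calls in zip(*samples)
    ]

# Hypothetical low coverage individuals from one culture (toy data).
ind1 = [0, None, 2, None, 1]
ind2 = [None, 1, 2, None, 0]
ind3 = [None, None, None, 2, None]

composite = make_composite([ind1, ind2, ind3])
print(genotype_rate(ind1), genotype_rate(composite))

# The cutoff mentioned above, applied to the toy composite.
MIN_GENOTYPE_RATE = 0.98
passes = genotype_rate(composite) >= MIN_GENOTYPE_RATE
```

The trade-off is that a composite no longer represents a single individual, so it's best read as an average for its group.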

** For a more detailed ADMIXTURE analysis comparing early Neolithic farmers to Yamnaya refer to Haak et al. Supplementary Information 6. Note the minimal sharing of components at the higher K between the early Neolithic farmers and Yamnaya, especially at K=16, which has the lowest median cross-validation (CV) error. This is in agreement with the PCA above.
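Picking the K with the lowest median CV error is simple to do once the errors from repeated runs are tabulated. Here's a minimal sketch; the CV error values below are invented for illustration and are not taken from Haak et al., and the function name is hypothetical.

```python
from statistics import median

def best_k(cv_errors):
    """Return the K with the lowest median CV error, plus all the medians.

    cv_errors maps each K to the CV errors from its replicate runs.
    """
    medians = {k: median(errs) for k, errs in cv_errors.items()}
    return min(medians, key=medians.get), medians

# Invented CV errors from three replicate runs per K (illustration only).
cv_errors = {
    15: [0.4412, 0.4409, 0.4415],
    16: [0.4401, 0.4398, 0.4420],
    17: [0.4406, 0.4411, 0.4403],
}

k, medians = best_k(cv_errors)
print(k, medians[k])
```

Using the median rather than the minimum keeps a single lucky run from deciding the outcome.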

See also...

Population genomics of Early Bronze Age Europe in three simple graphs

Sunday, July 26, 2015

Global PCA of selected Late Neolithic/Bronze Age Eurasians


I was curious how the Bronze Age steppe and Corded Ware genomes from the RISE dataset would behave in Principal Component Analyses (PCA) alongside populations from across the globe. Ten genomes had enough high confidence (transversion) markers to be analyzed accurately in such a way. I also ran an Iron Age Swedish sample, just to see how it differed from the older genomes.
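In case it's not clear what restricting an analysis to transversion markers involves, here's a minimal sketch. Transitions (A/G and C/T SNPs) are the sites most affected by ancient DNA damage, so the usual trick is to drop them and keep only transversions. The SNP list and its layout are hypothetical, and this isn't the actual filtering script used for the plots.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def is_transversion(allele1, allele2):
    """True if a biallelic SNP is a transversion (purine <-> pyrimidine)."""
    pair = frozenset((allele1.upper(), allele2.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError(f"not a biallelic A/C/G/T SNP: {allele1}/{allele2}")
    return pair not in TRANSITIONS

# Hypothetical marker list: (snp_id, allele1, allele2).
snps = [
    ("snp1", "C", "T"),  # transition, dropped
    ("snp2", "A", "C"),  # transversion, kept
    ("snp3", "G", "A"),  # transition, dropped
    ("snp4", "G", "T"),  # transversion, kept
]

kept = [snp_id for snp_id, a1, a2 in snps if is_transversion(a1, a2)]
print(kept)
```

The cost is that most SNPs on a typical array are transitions, which is why only some of the genomes retain enough markers after this step.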

Click on the links to go to my drive to download the plots. If you're having trouble finding the ancient samples, type their IDs into the PDF search field and hit enter.

RISE509_Afanasievo
RISE511_Afanasievo
RISE500_Andronovo
RISE505_Andronovo
RISE00_Corded_Ware
RISE94_Corded_Ware
RISE493_Karasuk
RISE496_Karasuk
RISE548_Yamnaya
RISE552_Yamnaya
RISE174_Iron_Age_Scandinavia

I can't see any major surprises. But I do find it remarkable how very European the Andronovo individuals appear on these plots. Keep in mind that they're ~3,000-year-old samples from the Altai region of Russia. Their ancestors probably migrated there from the Trans-Urals steppe sometime during the Middle Bronze Age.

The Andronovo Culture was succeeded in the Altai region during the Late Bronze Age by the Karasuk Culture, which was probably a new composite of local and perhaps foreign groups. Interestingly, the Karasuk samples featured above are obviously of mixed European/East Asian origin.

Note also that the Afanasievo and Yamnaya individuals fall outside the range of present-day European variation in many of the dimensions, basically as if they were pulling towards the Karitiana Indians of the Amazon. No doubt, this is their excess ANE talking.

By the way, I recently ran some of the same samples in PCA limited to West Eurasian populations. You can see the results here.

Wednesday, July 22, 2015

High-res R1b tree featuring 16 ancient sequences


Here's a useful R1b phylogenetic tree that was posted recently at the R1b-M269 (P312- U106-) DNA Project site.


If these results are correct (and judging by the quality of work at the aforementioned R1b project, I'm pretty sure they are), it would appear that the Samara hunter-gatherer, marked I0124, was not directly ancestral or even all that closely related to any of the Yamnaya/Pit-Grave samples from the North Caspian region (each one also marked with an I~ ID).

On the other hand, the North Caspian Yamnaya sequences are very similar to the rest of the Yamnaya sequences, which come from just north of the Caucasus (marked RISE~). Indeed, all of these Yamnaya samples are almost identical in terms of genome-wide genetic structure (see here).

What this suggests is that the Yamnaya nomads migrated to the North Caspian from somewhere near the Caucasus, or they were the descendants of such migrants. And if we assume that their ancestral homeland abutted the territory of the Maikop Culture, as shown on this map from Dolukhanov 2014 (look for 9 - early Pit-graves), it becomes easy to understand why they carried such significant maternal and genome-wide genetic Caucasus-related admixture (usually estimated at around 50%).

However, if you're one of those online Near Eastern patriots who like to imagine the Yamnaya as your own, please don't jump for joy just yet. The Yamnaya nomads still look very much like a people native to the western steppe, and this is probably also where their R1b comes from.

Sunday, July 19, 2015

The real thing


A couple of years ago Moorjani et al. concluded that present-day Georgians of the Transcaucasus were the best available proxy for the ancient West Eurasian population that mixed into the South Asian gene pool.

This was a solid statistical fit. And you can see on the TreeMix graph below, featuring a Georgian and a Kalash, why it worked so well.




But it was also a big fat coincidence, because check out what happens when I add another migration edge to the same graph.




Thus, the Indo-Iranian and hence Indo-European speaking Kalash no longer looks very similar to the Kartvelian speaking Georgian. In fact, he appears to be most closely related to the supposedly Indo-European speaking Afanasievo and Yamnaya nomads of the Early Bronze Age Eurasian steppe. The rest of his ancestry is probably best described as South Central Asian, which is an unknown quantity to me at this stage, but probably in large part of indigenous South Asian origin (see here).

I'm only able to show this thanks to the ancient samples that are on the tree, for which, as far as I know, there aren't any useful substitutes among present-day populations. Obviously, Moorjani et al. didn't have this luxury, so they ended up with a model that was statistically sound, but didn't make much sense otherwise, especially in terms of linguistics.

My TreeMix model is easily reproducible with most of the other South Asian samples from the Human Origins dataset, and it gels nicely with uniparental marker data too. For instance, here's a close up from a similar graph featuring a Pathan, with a few extra details.




Yep, not only do Pathans cluster among these ancients of the Eurasian steppe, but most of them also carry the same Y-chromosome haplogroup: R1a-Z93, which is derived from R1a-M417, and in all likelihood first expanded in a big way with the Proto-Indo-Iranians of the Trans-Ural steppe.

By the way, the Human Origins dataset has four different sets of Gujarati samples from Houston, USA, marked A, B, C and D, and each one shows a different level of ancient steppe admixture as inferred with my test. GujaratiA scores around 50%, while GujaratiD scores only around 40%. Does anyone know why these Gujaratis were grouped in such a way? Was it based on genetic structure or caste origin?





Full output from the analysis above is available in a zip file here. The reference samples and markers are listed here and here. The ancient samples are from Allentoft et al. 2015 and Haak et al. 2015.

See also...

The Poltavka outlier

Friday, July 17, 2015

Iron Age and Anglo-Saxon genomes from eastern England (Schiffels et al. preprint)


I haven't read this properly yet, but the results appear to be very similar to those I obtained with some of the same ancient genomes (see here), which must be very heartening for the authors (j/k). By the way, it's interesting to note that the word Celtic doesn't appear anywhere in the paper. I wonder why?

British population history has been shaped by a series of immigrations and internal movements, including the early Anglo-Saxon migrations following the breakdown of the Roman administration after 410CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences generated from ten ancient individuals found in archaeological excavations close to Cambridge in the East of England, ranging from 2,300 until 1,200 years before present (Iron Age to Anglo-Saxon period). We use present-day genetic data to characterize the relationship of these ancient individuals to contemporary British and other European populations. By analyzing the distribution of shared rare variants across ancient and modern individuals, we find that today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and estimate that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland. We gain further insight with a new method, rarecoal, which fits a demographic model to the distribution of shared rare variants across a large number of samples, enabling fine scale analysis of subtle genetic differences and yielding explicit estimates of population sizes and split times. Using rarecoal we find that the ancestors of the Anglo-Saxon samples are closest to modern Danish and Dutch populations, while the Iron Age samples share ancestors with multiple Northern European populations including Britain.

Schiffels et al., Iron Age and Anglo-Saxon genomes from East England reveal British migration history, bioRxiv, Posted July 17, 2015. doi: http://dx.doi.org/10.1101/022723

Wednesday, July 15, 2015

Population genomics of Early Bronze Age Europe in three simple graphs


Thanks to recent advances in ancient genomics there's very little doubt now that the Pontic-Caspian Steppe was the source of massive population movements deep into Europe during the Late Neolithic/Early Bronze Age.

But some people still don't get it, maybe because genomics isn't their thing. Others just refuse to get it, probably because it's at odds with what they've been hoping to see.

To help the former, and piss off the latter some more, I've put together three simple TreeMix graphs featuring ancient samples from a wide range of European archeological cultures, along with a little bit of commentary. Enjoy.





Full output from the analysis above is available in a zip file here. The samples and markers are listed here and here. The ancient samples are from Allentoft et al. 2015 and Haak et al. 2015. The Sub-Saharan Africans are from the fully public Human Origins dataset available here.