Monday, March 28, 2016

PCA/nMonte open thread


Below are a few nMonte models of ancient individuals based on 25 principal components (PCs). The relevant datasheet and nMonte R script can be downloaded here and here, respectively.
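For anyone who wants a feel for what this kind of modeling does before opening the R script, here is a minimal Python sketch of a Monte Carlo mixture fit on PC coordinates. It is a simplified stand-in written for illustration, not the nMonte script itself: the function name `fit_mixture`, the batch size, the step count and the toy coordinates are all assumptions. The idea is to hold a batch of source-population labels, average their coordinates, and keep any random single-slot swap that moves the average closer to the target.

```python
import math
import random


def fit_mixture(target, sources, batch=200, steps=20000, seed=0):
    """Estimate mixture proportions by greedy random single-slot replacement.

    target  -- list of PC coordinates for the individual being modeled
    sources -- dict mapping a population label to its PC coordinates
    Returns (proportions, final Euclidean distance to the target).
    """
    rng = random.Random(seed)
    names = list(sources)
    dims = len(target)

    # Start from a random batch of source labels.
    slots = [rng.choice(names) for _ in range(batch)]
    # Column sums of the coordinates currently in the batch.
    sums = [sum(sources[s][d] for s in slots) for d in range(dims)]

    def dist(col_sums):
        # Distance between the batch average and the target.
        return math.sqrt(
            sum((col_sums[d] / batch - target[d]) ** 2 for d in range(dims))
        )

    best = dist(sums)
    for _ in range(steps):
        i = rng.randrange(batch)
        new = rng.choice(names)
        old = slots[i]
        if new == old:
            continue
        trial = [sums[d] - sources[old][d] + sources[new][d] for d in range(dims)]
        d_trial = dist(trial)
        # Keep the swap only if it improves the fit.
        if d_trial < best:
            slots[i], sums, best = new, trial, d_trial

    proportions = {n: slots.count(n) / batch for n in names}
    return proportions, best


# Toy example (made-up 2D coordinates, not real data):
# the target is constructed as 50% A + 30% B + 20% C.
toy_sources = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
toy_target = [0.3, 0.2]
toy_props, toy_dist = fit_mixture(toy_target, toy_sources)
print(toy_props, toy_dist)
```

With real data the target and source rows would come from the datasheet, with one row of 25 PC values per sample or population. The smaller the final distance, the better the fit.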


Many of the outcomes are basically perfect. Others could certainly be better. But they all make sense.

The more complex the ancestry, the more difficult it is to model. Also, deamination, low coverage and missing markers are probably skewing things to some degree for most of these samples. So although time-consuming, it might be a good idea to use population averages minus the most obvious outliers.
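One possible way to build such averages is sketched below in Python. It is only an illustration under stated assumptions: the function name `trimmed_population_mean`, the cutoff of three times the typical distance, and the sample coordinates are all hypothetical, and deciding what counts as an "obvious" outlier is still a judgment call. The sketch takes the per-dimension median as the population centre, drops samples that sit much further from it than the typical sample does, and averages the rest.

```python
import math
import statistics


def trimmed_population_mean(samples, max_ratio=3.0):
    """Average PC coordinates after dropping samples far from the centre.

    samples   -- dict mapping a sample ID to its list of PC coordinates
    max_ratio -- hypothetical cutoff: drop a sample if its distance to the
                 median centre exceeds max_ratio times the median distance
    Returns (mean coordinates of kept samples, list of kept sample IDs).
    """
    dims = len(next(iter(samples.values())))
    # Robust centre: the median of each dimension.
    centre = [statistics.median(v[d] for v in samples.values()) for d in range(dims)]
    dists = {k: math.dist(v, centre) for k, v in samples.items()}
    typical = statistics.median(dists.values())
    # If the typical distance is zero, no sample is dropped.
    kept = [k for k, d in dists.items() if typical == 0 or d <= max_ratio * typical]
    mean = [statistics.fmean(samples[k][d] for k in kept) for d in range(dims)]
    return mean, kept


# Toy population (made-up coordinates): four tight samples and one outlier.
toy_samples = {
    "s1": [0.10, 0.20],
    "s2": [0.11, 0.19],
    "s3": [0.09, 0.21],
    "s4": [0.10, 0.22],
    "odd": [0.50, -0.30],
}
toy_mean, toy_kept = trimmed_population_mean(toy_samples)
print(toy_mean, toy_kept)
```

The resulting average row can then be used in place of the individual samples when modeling.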

Are there any other ways to improve the analysis? Is 25 dimensions too many or too few? Let's run plenty of tests and see where this takes us.

I can update the datasheet with many more populations and dimensions later this week. Feel free to post your requests in the comments and I'll run them if I have them. Also, if anyone's wondering, I don't know yet which commercial genotype files I can run in this test, if any. I'll check.

Update 04/04/2016: A modified datasheet with 50 dimensions and many more samples is available here. It should be more useful in modeling South Central Asians, especially the Kalash. However, as far as I can tell, using just 9 dimensions, like in the version here, is faster and produces more accurate results.

Wednesday, March 16, 2016

Sintashta, BMAC and the Indo-Iranians


I'm perusing the online archives of Harvard Sanskrit Professor Michael Witzel. The links below are worth checking out for some background info on the prehistory of Eastern Europe and Central Asia. There's a very cool map on page 6 of the second PDF.

Sintashta, BMAC and the Indo-Iranians. A query.

Linguistic Evidence for Cultural Exchange in Prehistoric Western Central Asia.

The Home of the Aryans.

Autochthonous Aryans? The Evidence from Old Indian and Iranian Texts.

Looking back, these old-school linguistics articles make a lot more sense than most of the supposedly cutting-edge population genetics papers coming out at around the same time dealing with the Indo-Aryan question.

Many population geneticists back then took the view that the ancestors of the Indo-Aryans could not have spread from the European steppes to India because Y-chromosome haplogroup R1a apparently showed the greatest haplotype diversity in the Indus Valley. Well, what a crock that turned out to be.

See also...

The Poltavka outlier

Friday, March 11, 2016

D-stats/nMonte open thread


I'll start the ball rolling with a 9-way mixture analysis of 93 European, Near Eastern and Central Asian present-day and ancient populations. The relevant datasheet and R script are available here and here.


Below is a simple tree/cluster analysis based on the results, using the freely available Past3 software. Makes perfect sense, I'd say.


It's important to understand that these sorts of tests are basically designed to estimate ancient ancestry proportions, rather than calculate minor admixtures. With that in mind, here are a few observations:

- Karasuk outlier RISE497 (the easternmost Karasuk individual) is surprisingly important for Near Eastern populations

- Nordic LNBA and Sintashta look very similar in terms of overall ancestry proportions, suggesting that they perhaps derive from the same ancestral population

- The effects of postmortem deamination, or DNA damage, appear to be expressed in many of the non-UDG-treated ancient samples as minor Sub-Saharan admixture

Can anyone put together a better model for West Eurasians? Also, I'd really like to see a well thought out D-stats/nMonte analysis of South Central Asia.

See also...

Yamnaya = Khvalynsk + extra CHG + maybe something else

D-stats/nMonte open thread #2

Sunday, March 6, 2016

D-stats/4mix tour of ancient Eurasia


This 4mix experiment is based on a series of statistics of the form D(Chimp,Reference_pop/Test_pop)(Mbuti,X), where X represents one of 9 ancient and present-day outgroups. The input data is available here. Feel free to try it yourself and post your models in the comments below.
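To make the form of these statistics concrete, here is a minimal Python sketch of a D-statistic computed from per-site allele frequencies, plus a helper that builds one population's row of D(Chimp,Pop)(Mbuti,X) values across several outgroups X. It uses the common frequency-based estimator and is only an illustration: how the input data for this post was actually produced isn't covered here, the function names `d_stat` and `d_profile` are hypothetical, the frequencies are made up, and sign conventions can differ between tools.

```python
def d_stat(w, x, y, z):
    """D(W,X)(Y,Z) from per-site allele frequencies in four populations.

    Each argument is a list of frequencies (0 to 1) of the same allele
    at the same sites. Positive and negative values indicate which pair
    of populations shares more alleles under this sign convention.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(w, x, y, z):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


def d_profile(chimp, pop, mbuti, outgroups):
    """One row of D(Chimp,pop)(Mbuti,X), with one value per outgroup X."""
    return {name: d_stat(chimp, pop, mbuti, freqs) for name, freqs in outgroups.items()}


# Toy allele frequencies at five sites (made up for illustration).
toy_chimp = [0.0, 0.0, 1.0, 0.0, 1.0]
toy_pop = [0.2, 0.9, 0.6, 0.1, 0.7]
toy_mbuti = [0.1, 0.5, 0.9, 0.0, 0.8]
toy_outgroups = {
    "X1": [0.3, 0.8, 0.5, 0.2, 0.6],
    "X2": [0.0, 0.4, 1.0, 0.1, 0.9],
}
toy_row = d_profile(toy_chimp, toy_pop, toy_mbuti, toy_outgroups)
print(toy_row)
```

A row like this, computed for each reference and test population, is the kind of input that a mixture-fitting tool can then work on, with the outgroups acting as the dimensions.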


Here's a Principal Component Analysis (PCA) based on the D-stats. As far as I can see, it makes very good sense. Click to enlarge.

See also...

Yamnaya = Khvalynsk + extra CHG + maybe something else

PCA/nMonte open thread

Thursday, March 3, 2016

Irano-Turko-Slavic roots of Ashkenazi Jews?


As far as I've been able to discern, Ashkenazi Jews are very similar to Sephardic Jews, except with minor admixture from Central and Eastern Europe, and perhaps Central Asia (via the Silk Road). So the hypothesis presented in this new paper at Genome Biology and Evolution doesn't work for me:

The Yiddish language is over one thousand years old and incorporates German, Slavic, and Hebrew elements. The prevalent view claims Yiddish has a German origin, whereas the opposing view posits a Slavic origin with strong Iranian and weak Turkic substrata. One of the major difficulties in deciding between these hypotheses is the unknown geographical origin of Yiddish speaking Ashkenazic Jews (AJs). An analysis of 393 Ashkenazic, Iranian, and mountain Jews and over 600 non-Jewish genomes demonstrated that Greeks, Romans, Iranians, and Turks exhibit the highest genetic similarity with AJs. The Geographic Population Structure (GPS) analysis localized most AJs along major primeval trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from "Ashkenaz." Iranian and mountain Jews were localized along trade routes on the Turkey's eastern border. Loss of maternal haplogroups was evident in non-Yiddish speaking AJs. Our results suggest that AJs originated from a Slavo-Iranian confederation, which the Jews call "Ashkenazic" (i.e., "Scythian"), though these Jews probably spoke Persian and/or Ossete. This is compatible with linguistic evidence suggesting that Yiddish is a Slavic language created by Irano-Turko-Slavic Jewish merchants along the Silk Roads as a cryptic trade language, spoken only by its originators to gain an advantage in trade. Later, in the 9th century, Yiddish underwent relexification by adopting a new vocabulary that consists of a minority of German and Hebrew and a majority of newly coined Germanoid and Hebroid elements that replaced most of the original Eastern Slavic and Sorbian vocabularies, while keeping the original grammars intact.

Das et al., Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz, Genome Biol Evol (2016), doi: 10.1093/gbe/evw046

See also...

Khazar shmazar

Khazar shmazar #2