Whose results are these? Feel free to post your guesses in the comments section below. I'll reveal the answer and make the sample available online in a couple of days.
Eurogenes K15 results
4 Ancestors Oracle results
1 MA-1+Tabassaran+Tabassaran+Tabassaran @ 7.771513
2 Kalash+MA-1+Tabassaran+Tabassaran @ 7.785069
3 Lezgin+MA-1+Tabassaran+Tabassaran @ 7.960974
4 Kalash+Lezgin+MA-1+Tabassaran @ 7.96793
5 Kalash+Kalash+MA-1+Tabassaran @ 8.119039
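The figure after each "@" above is a distance, so lower means a closer fit. As a rough illustration of how a four-way oracle of this kind can work, here is a minimal sketch. It assumes the distance is the Euclidean distance between the sample's component percentages and the plain average of four reference populations, with repeats allowed. The population names and percentages are hypothetical placeholders, not actual K15 values, and the real calculator may differ in its details.

```python
import math
from itertools import combinations_with_replacement


def oracle4(sample, refs, top=5):
    """Rank every 4-population mix (repeats allowed) by distance to the sample.

    sample: list of component percentages for the test individual.
    refs:   dict mapping population name -> list of component percentages.
    Assumption: distance is plain Euclidean distance to the equal-weight mix.
    """
    results = []
    for combo in combinations_with_replacement(sorted(refs), 4):
        # Average the four reference vectors component by component.
        mix = [sum(refs[pop][i] for pop in combo) / 4 for i in range(len(sample))]
        results.append((math.dist(sample, mix), combo))
    results.sort()
    return results[:top]


# Hypothetical three-component reference table (not real K15 data).
refs = {
    "PopA": [80.0, 15.0, 5.0],
    "PopB": [10.0, 70.0, 20.0],
    "PopC": [30.0, 20.0, 50.0],
}
# A toy sample built as one part PopA, two parts PopB and one part PopC.
sample = [32.5, 43.75, 23.75]

for rank, (dist, combo) in enumerate(oracle4(sample, refs), start=1):
    print(rank, "+".join(combo), "@", round(dist, 6))
```

Listing the same population more than once, as in the first result above, simply gives that population a larger share of the mix.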
Update 27/08/2014: OK, the sample is a composite of two Lezgins, a people from the Northeast Caucasus, and two Ancient North Eurasian (ANE) genomes from Upper Paleolithic Siberia: Mal'ta boy or MA-1 and Afontova Gora-2 or AG-2. It can be downloaded here.
I chose these two Lezgins because they showed higher than average levels of ANE ancestry (well over 30% in most tests). Basically, I wanted to see where a Lezgin-like individual with unusually high ANE, as well as a dab of WHG, would land on a Principal Component Analysis (PCA) or genetic map of West Eurasia. That's because I now believe that a population like this played a key role in the formation of the modern European gene pool during the early metal ages.
My rough estimate is that the composite genome is around 50% ANE, around 40% early European farmer (EEF), and a few per cent Western European hunter-gatherer (WHG). For a detailed description of these three ancestral components see here.
The outcome is very interesting, because it puts the composite more or less between the Maris and North Caucasians, which roughly translates to the Russo-Kazakh border. This is an area generally accepted to be part of the Proto-Indo-European (PIE) homeland, and fits with a recent theory that populations expanding from this region after the Neolithic might be responsible for the widespread occurrence of ANE across Europe today (see here).
However, formal statistics, rather than PCA, are the favored method for studying ancient genomes in the scientific literature. So I thought I'd run f3 and D-statistics to see whether this composite was indeed the closest thing to a PIE individual in my dataset.
I picked a set of French samples as the test group, and chose French Basques as the main reference group, alongside the composite and a variety of populations that are documented to carry, or suspected of carrying, high levels of ANE. The assumption I made was that the French used to be like the French Basques, their non-Indo-European neighbors, before someone pushed in from the east and changed both their language and genomes.
The results can be seen in the spreadsheet below. Please note that if the f3-statistic is negative, the target group is assumed to be admixed. Moreover, for a D-statistic of the form D(W, X; Y, Z), a positive Z-score means that gene flow occurred either between W and Y or between X and Z. A negative Z-score means that gene flow occurred either between W and Z or between X and Y.
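For readers who want to see where these signs come from, here is a minimal sketch of both statistics computed from population allele frequencies. It assumes the naive f3 estimator (the mean of (c - a)(c - b) across SNPs, without the small-sample correction), the usual D-statistic ratio for D(W, X; Y, Z), and a simple delete-one-block jackknife over equal contiguous SNP blocks for the Z-score. The function names and the simulated frequencies are hypothetical, and this is not the software used to produce the spreadsheet.

```python
import math
import random


def f3(target, ref_a, ref_b):
    """Naive f3(target; ref_a, ref_b): mean of (c - a) * (c - b) over SNPs."""
    return sum((c - a) * (c - b) for c, a, b in zip(target, ref_a, ref_b)) / len(target)


def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from allele frequencies, as a ratio of sums over SNPs."""
    rows = list(zip(w, x, y, z))
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in rows)
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi) for wi, xi, yi, zi in rows)
    return num / den


def jackknife_z(stat, pops, n_blocks=10):
    """Z-score from a delete-one-block jackknife (assumes equal-weight blocks)."""
    n = len(pops[0])
    size = n // n_blocks
    full = stat(*pops)
    leave_one_out = []
    for k in range(n_blocks):
        lo = k * size
        hi = n if k == n_blocks - 1 else (k + 1) * size
        # Recompute the statistic with SNPs lo..hi-1 removed from every population.
        leave_one_out.append(stat(*[p[:lo] + p[hi:] for p in pops]))
    mean = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((v - mean) ** 2 for v in leave_one_out)
    return full / math.sqrt(var)


# Simulated allele frequencies (hypothetical, for illustration only).
rng = random.Random(0)
n_snps = 2000
ref_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
ref_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
# The target is an even mix of the two references, so f3 should come out negative.
target = [(a + b) / 2 for a, b in zip(ref_a, ref_b)]

pop_x = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
pop_y = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
pop_z = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
# W is mostly like X but carries 20% ancestry from Y, so D(W, X; Y, Z) should be positive.
pop_w = [0.8 * x + 0.2 * y for x, y in zip(pop_x, pop_y)]

print("f3:", f3(target, ref_a, ref_b), "Z:", jackknife_z(f3, [target, ref_a, ref_b]))
print("D:", d_stat(pop_w, pop_x, pop_y, pop_z), "Z:", jackknife_z(d_stat, [pop_w, pop_x, pop_y, pop_z]))
```

In this toy setup the admixed target produces a negative f3, and the simulated gene flow from Y into W produces a positive D, which matches the reading of the signs given above.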