Thursday, November 26, 2015

The Khvalynsk men

This is where the three Samara Eneolithic or Khvalynsk samples from the recent Mathieson et al. paper plot on my Principal Component Analysis (PCA) of ancient West Eurasia. They're labeled as Steppe_CA (steppe Copper Age). I've also marked them with their Y-chromosome haplogroups.

Individual 10433, belonging to Y-chromosome haplogroup R1a, is almost a pure Eastern European Hunter-Gatherer, which is perhaps surprising, considering he was buried with copper artifacts. On the other hand, sample 10434, the one belonging to haplogroup Q1a, and positioned further east than the other two, appears to have been whacked over the head a few times and simply thrown into a ditch.

The PCA also has most of the other samples featured in Mathieson et al., including Neolithic Anatolians (labeled Anatolia_N), as well as extra samples from Allentoft et al. and Jones et al.

Nirjhar007 said...

the one belonging to haplogroup Q1a, and positioned further east than the other two, appears to have been whacked on the head a few times and simply thrown in a ditch.
I see.

Dmytro said...

I am willing to bet that when their yDNA is in, a majority of Dnipro-Donetsk men will be R1a, and so will much if not most of Serednj Stih (Sredny Stog), as well as much of the Dnipro-Donetsk-admixed North Trypilian Chapajevka people (early inhumation phase). Interesting times ahead...

Bernard said...

"Individual 10433, belonging to Y-chromosome haplogroup R1a, is almost a pure Eastern hunter-gatherer, which is somewhat surprising, considering he was buried with copper artifacts."
The individual buried with copper artifacts is 10122 of R1b Y haplogroup.
See pages 9 and 10 of Supplementary Information:
"10122 / SVP35 (grave 12)
Male (confirmed genetically), age 20-30, positioned on his back with raised knees, with 293 copper artifacts, mostly beads, amounting to 80% of the copper objects in the combined cemeteries of Khvalynsk I and II. Probably a high-status individual, his Y-chromosome haplotype, R1b1, also characterized the high-status individuals buried under kurgans in later Yamnaya graves in this region, so he could be regarded as a founder of an elite group of patrilineally related families. His MtDNA haplotype H2a1 is unique in the Samara series."

Davidski said...

This is what it says on page 10.

10433 / SVP46 (grave 1)
Male (confirmed genetically), age 30-35, positioned on his back with raised knees, with a copper ring and a copper bead. His R1a1 haplotype shows that this haplotype was present in the region, although it is not represented later in high-status Yamnaya graves. His U5a1i MtDNA haplotype is part of a U5a1 group well documented in the Samara series.

These qualify as copper artifacts.

Bernard said...

Maybe he got them through trade.

Davidski said...


Are you autistic or something?

Bernard said...


Rob said...

Davo, is it possible to see where that ancient African genome from a couple of weeks ago would plot?

Davidski said...

I can't put a Sub-Saharan African on a West Eurasian plot. You'd just see a tight ball of West Eurasians and a lone African.

Here's a global plot with Mota and Kotias.

Put their names into the search field to find them.

Rob said...


Nirjhar007 said...

I can't read it.

Gökhan said...

David, do you have any plans to create a new calculator using all of those new Anatolian, CHG and Greek samples?

Matt said...

@Davidski, off topic, but would it still be possible to run these lists of D (Chimp,Test)(Mbuti,Pop) stats at some point?

Ust Ishim -
Dai -
Yoruba -

Rob said...

@Gökhan

I second your question

I suspect adding the new CHG to the mix might alter the results (possibly significantly) by cannibalizing some of the other components, possibly including the EEF fraction.
It'll be very interesting to see.

Roy King said...

"Here's a global plot with Mota and Kotias."
Please do a global plot of PC2 vs PC3 and PC2 vs PC4 with Mota and Kotias.
Thanks! These are very helpful.

Open Genomes said...

David, you've done Human Origins World 1&2 PCAs for various ancient samples. Can you add NE1 and KO2 (as a proxy for the Neolithic Anatolians) to the same plot as Mota and Kotias? If you can accommodate K14 and MA-1, and the WHGs, SHGs, and EHGs, add those too.

As you can see from the World 1&2 PCA, Stuttgart is *not* where the Bedouin B are at all. Stuttgart is "above the peak of the apex of the triangle", higher up than the Sardinians. La Brana-1 is not far from Kotias, near the North_Ossetians, and the other WHGs are in the same vicinity. This is the true picture of Eurasian variation. We have a continuum that runs from the "EF" (or rather, the Kebaran hunter-gatherers at the end of the LGM) to the Ulchi at the other end. Kotias is already "on the way" into Eurasia, which is why the CHGs cluster with Kostenki and MA-1 on the TreeMix graphs, apparently from 41,500 years ago onward, when the climate turned colder and drier, the populations became isolated, and the drift began.

The basic point here is that you cannot show the real picture of Eurasian variation without including Africans, particularly Mota and the Hadza. Are PC 1&2 83% of the variation? Other dimensions, PC 1&3, 2&3, etc. would be valuable too.

Shaikorth said...

@Open Genomes, the "triangle"-shaped PCA with Africans actually doesn't show much about the real variation of Eurasians as Africans will squeeze some of the Eurasian variation out and the Sardinia vs. East Asia polarity is largely due to drift or sample sizes. This obscures certain things easily verifiable with formal testing, like Papuans actually being the most divergent Eurasians.

If you want Africans and Eurasians on the same plot, I think SpaceMix from Coop et al. provides a decent one, like:

Davidski said...

Placing ancient samples in the less significant PCA dimensions is difficult. But here's a PCA datasheet with data for lots of ancient samples and nine dimensions.

It can be plotted with any plotting software like Gnuplot, Past3 or Excel. The Gnuplot arguments for a 1&2 dimensional PCA are...

plot 'Global9.txt' using 3:4:1 with labels

For a 1&3 PCA they are...

plot 'Global9.txt' using 3:5:1 with labels

For a 1&9 PCA they are...

plot 'Global9.txt' using 3:11:1 with labels

For a 3D PCA they are...

splot 'Global9.txt' using 3:4:5:1 with labels

You can zoom in with the zoom tool, and you can print images like this...

set term png size 2000, 1000

...hit enter...

set output "MDS.png"

...hit enter...

plot 'Global9.txt' using 3:4:1 with labels
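The column numbers in these commands follow one pattern: column 1 holds the sample label and principal component n appears to sit in column n + 2 (3 for PC1, 4 for PC2, 5 for PC3). For anyone who prefers to plot outside Gnuplot, here is a minimal Python sketch that mimics the `using x:y:label` selection. The file layout is inferred from the commands above rather than documented, the helper names are hypothetical, and the two data rows are placeholders, not real Global9 values (what column 2 holds is not stated, so a dummy value is used).

```python
# Hypothetical helpers mimicking gnuplot's `using X:Y:LABEL` on a Global9-style file.
# Assumption (inferred, not documented): whitespace-separated rows, label in
# column 1, PC1 in column 3, so principal component n is in column n + 2.
# Column numbers here are 1-based, as in gnuplot.


def pc_column(n):
    """Return the 1-based column assumed to hold principal component n."""
    return n + 2


def select_columns(lines, x_col, y_col, label_col=1):
    """Return (x, y, label) tuples, like gnuplot `using x_col:y_col:label_col`."""
    points = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        points.append((float(fields[x_col - 1]),
                       float(fields[y_col - 1]),
                       fields[label_col - 1]))
    return points


if __name__ == "__main__":
    # Placeholder rows for illustration only; not real Global9 data.
    rows = [
        "SampleA dummy 0.010 -0.020 0.003",
        "SampleB dummy -0.015 0.011 0.007",
    ]
    # Rough equivalent of: plot 'Global9.txt' using 3:4:1 with labels
    for x, y, label in select_columns(rows, pc_column(1), pc_column(2)):
        print(label, x, y)
```

Reading the real file would just mean passing `open('Global9.txt')` instead of the placeholder list, if the layout assumption holds.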

Davidski said...

Someone else just used population averages from the Global9 datasheet to run a West Eurasia only PCA. Pretty cool.

ryukendo kendow said...
ryukendo kendow said...
Chad Rohlfsen said...


I can do a few runs again, with some South Asians. I haven't converted my new set with the Paniyas yet, but I can do several groups. It would be preferable to use CHG, but I'm not set up for that yet. The thing that I'm looking at is the potential for ENA and CHG in EHG. I've got several intriguing Dstats which I will post here in a couple minutes. I have to move to my laptop with Plink and Admixture.

VOX said...

Davidski, given that Yamnaya = EHG + CHG, and given the archaeological context of this mix, would it be possible to date this admixture event using ROLLOFF? That would also test this model's accuracy.

Open Genomes said...

Here is a 3D Global9 PC plot of PC1-PC2-PC3:

Global9 3D PC plot with ancient DNA samples

We can see that it's really critical to show *all* samples, including Africans. The presence of Africans, Oceanians, East Asians, and Native Americans changes the picture completely for Eurasia.

Rather than "compressing" Eurasians, the presence of Africans shows us some fascinating things: The WHG-SHG-EHG group trends downward toward MA-1, who in turn leads to the Inuit and Na-Dene, and distantly to the Native Americans. However, Ust'-Ishim leads off on a separate upper South-Asian / Austroasiatic edge towards a vertex consisting of Japanese and Taiwanese Aboriginals. These in turn lead downward along another edge, through Paleo-Siberians (Chukchi, etc.), to Native Americans.

A closer examination of the upper left reveals that the Early Farmers (EF) are nowhere near the Bedouin B, who appear to be admixed with Sub-Saharans, but rather, represent their own separate Eurasian vertex, today only populated by WHG-admixed Sardinians. The Kebaran Levantine Hunter-Gatherers would be even more isolated, beyond KO2 (Starcevo_EN) and the Anatolian Neolithic.

The lesson here is that all ancient and modern genomes need to be plotted together, and only then can we "zoom in" on a particular region of interest, knowing which way the drift is headed. The real "projection bias" (or rather "biased projection" ;) is when certain regions are left out, and an arbitrary 2-dimensional projection leaves out key variation that makes samples appear to be "related" when in fact they are not.

Rob said...

Open Genomes

I'm not an expert on PCAs but I think your PCA looks very good

Davidski said...


Austroasiatic Kanjar LBK_EN Kotias 0.0024 0.971 111571
Austroasiatic Pulliyar LBK_EN Kotias -0.0009 -0.285 111572
Austroasiatic Paniya LBK_EN Kotias 0.0007 0.264 111881
Mbuti Yamnaya_Kalmykia LBK_EN Kotias 0.011 3.253 496714
Mbuti Yamnaya_Samara LBK_EN Kotias 0.0081 2.517 504561
Mbuti Yamnaya_Kalmykia Anatolia_Neolithic Kotias 0.0147 4.431 497977
Mbuti Yamnaya_Samara Anatolia_Neolithic Kotias 0.0122 3.918 505714
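Each row above lists four populations followed by what appear to be the D value, its Z-score and the number of SNPs used. A minimal Python sketch for reading such rows, assuming that layout, is below. It flags |Z| >= 3 as significant, which is a common rule of thumb rather than a fixed standard, and reports which of the last two populations the second one leans toward, assuming the ADMIXTOOLS sign convention in which a positive D(W, X; Y, Z) with an outgroup as W means X shares more alleles with Z. The class and function names are illustrative only.

```python
# Sketch: parse D-statistic rows of the assumed form
#   W X Y Z D Z_score n_snps
# The |Z| >= 3 cutoff is a rule of thumb, not a fixed standard.
from dataclasses import dataclass


@dataclass
class DStat:
    w: str
    x: str
    y: str
    z: str
    d: float
    z_score: float
    n_snps: int

    def significant(self, threshold=3.0):
        return abs(self.z_score) >= threshold

    def x_leans_toward(self):
        """Assuming the ADMIXTOOLS convention for D(W, X; Y, Z) with W as
        the outgroup: positive D means X shares more alleles with Z,
        negative D means X shares more alleles with Y."""
        if self.d > 0:
            return self.z
        if self.d < 0:
            return self.y
        return None


def parse_row(row):
    w, x, y, z, d, z_score, n = row.split()
    return DStat(w, x, y, z, float(d), float(z_score), int(n))


rows = [
    "Mbuti Yamnaya_Kalmykia LBK_EN Kotias 0.011 3.253 496714",
    "Mbuti Yamnaya_Samara LBK_EN Kotias 0.0081 2.517 504561",
]
for stat in map(parse_row, rows):
    print(stat.x, "->", stat.x_leans_toward(), "significant:", stat.significant())
```

Under these assumptions both Yamnaya rows lean toward Kotias, but only the Kalmykia row clears |Z| >= 3.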

South Asia is still difficult to crack, because of all the layers of admixture, and the geographic and social clines in these layers that exist there.

But Yamnaya always shows a clear preference for Kotias. This can be seen in ADMIXTURE especially, and I might post some results later today or during the week.

The interesting thing is that Yamnaya Samara often shows inflated affinity to WHG, and in ADMIXTURE also some admixture from WHG. Whatever this is, it might be pulling Yamnaya Samara closer to LBK_EN too.

Btw, I don't think it's possible to run any X chromosome tests with the steppe samples. They usually have far fewer than 4,000 SNPs on the X, which isn't enough.


Kotias-related admixture entered the steppe during the Khvalynsk period, at the latest. This can be seen on the PCA above, with the Khvalynsk samples forming a cline from EHG to the Bronze Age steppe.

I don't think ROLLOFF can provide more accurate dates than ancient DNA, especially in this case, because most of the admixture appears to have happened gradually over a long period of time.

So the interesting question is why the admixture happened during the Khvalynsk period. As per the sad tale above of the Q1a man ending up dead in a ditch, it might have happened amidst both hostile and friendly relations with people coming from the south and east.

In other words, I suspect that in some cases men were killed and their women taken, and in others women were married off and moved hundreds of kilometers from their homes to be with their husbands.

Chad Rohlfsen said...


Looking at various stats with Ust_Ishim and the hunters, I'm curious about something. I think it may be possible that Motala is closer to CHG, not because Motala is closer to Crown West Eurasian, but because they have some CHG, maybe about the same as Karelia. I've seen that despite being closer to ENA than the other hunters, they're also further from Ust_Ishim, and more significantly so than WHG. I'm wondering if this means that they have a few percent of actual ENA and CHG. Maybe someone else can come up with some more stats to test.

Could you test the following:

Primate_Gorilla Kotias Iberia_Mesolithic Karelia_HG
Primate_Gorilla Kotias Loschbour Karelia_HG
Primate_Gorilla Kotias Hungary_HG Karelia_HG
Primate_Gorilla Kotias Motala_HG Karelia_HG
Primate_Gorilla Satsurblia Iberia_Mesolithic Karelia_HG
Primate_Gorilla Satsurblia Loschbour Karelia_HG
Primate_Gorilla Satsurblia Hungary_HG Karelia_HG
Primate_Gorilla Satsurblia Motala_HG Karelia_HG

Davidski said...


Primate_Gorilla Kotias Iberia_Mesolithic Karelia_HG 0.0104 1.467 374334
Primate_Gorilla Kotias Loschbour Karelia_HG 0.0016 0.232 384919
Primate_Gorilla Kotias Hungary_HG Karelia_HG 0.0011 0.142 325855
Primate_Gorilla Kotias Motala_HG Karelia_HG -0.0038 -0.711 417770
Primate_Gorilla Satsurblia Iberia_Mesolithic Karelia_HG 0.0161 2.06 305759
Primate_Gorilla Satsurblia Loschbour Karelia_HG 0.0039 0.49 312556
Primate_Gorilla Satsurblia Hungary_HG Karelia_HG 0.003 0.366 258522
Primate_Gorilla Satsurblia Motala_HG Karelia_HG 0.0027 0.45 339369

Aram said...


Your PCA is much better scaled than anything I've seen in these recent studies. Just a few questions to figure out who is who there.
Who are the most 'Southern' Near Easterners (crosses)? Bedouin_Bs?
And what sign are the modern Armenians? Caucasian Circles or Near Eastern Crosses?
Thanks in advance.

a said...

@Open Genomes
I agree with Rob-
Fantastic and interesting 3d plot.

Chad Rohlfsen said...

One way to crack it would be

Primate_Gorilla Paniya Dai Kotias
Primate_Gorilla Paniya Dai Anatolia_Neolithic

That might tell us how close the West and Wast Eurasian are and which is better.

Chad Rohlfsen said...

East, sorry.

Davidski said...



Primate_Gorilla Paniya Dai Kotias -0.0499 -9.72 101132
Primate_Gorilla Paniya Dai Anatolia_Neolithic -0.0526 -16.075 119765

Kristiina said...

@ Chad “I'm wondering if this means that [Motala] have a few percent of actual ENA and CHG”

Haplogroups also support that view. On the one hand, we see C1f in Mesolithic Karelia (a sister clade of C1c found in Apache and Arsario), and C1e is found in modern Icelanders (a sister clade of C1b found in Apache and Cayapa). People (yDNA Q1a?) who brought these haplogroups to Scandinavia probably carried Northeast Asian ENA.

On the other hand, we see H, which is probably H2a2b (which I previously erroneously named H2b), and yDNA J in Mesolithic Karelia. CHG was probably carried to Fennoscandia along with these haplogroups. H2a2b is still sporadically found throughout Fennoscandia. All of mtDNA H2a looks as if it spread from north of the Caucasus to Fennoscandia.

As for yDNA J, this FamilyTree map is interesting. The distribution of “J2b M12 confirmed near predicted and suspected, subclade un recognizable” in Scandinavia could be the result of a Caucasian wave. J2b M12 is still frequent today in the Vologda and Rybinsk area and in the Volga-Ural region.

Alberto said...

Comparing these ones:

Mbuti Yamnaya_Kalmykia LBK_EN Kotias 0.011 3.253 496714
Mbuti Yamnaya_Samara LBK_EN Kotias 0.0081 2.517 504561

Primate_Gorilla Yamnaya Kotias LBK_EN -0.001 -0.245 271071

Is it Gorilla screwing things up or something else?

ryukendo kendow said...
Chad Rohlfsen said...

The ASI got better fits as a mix of Onge, Papuan, and Atayal. I posted them over at Anthrogenica, but can't find them.

Chad Rohlfsen said...

Wasn't there a Mesolithic Indian that showed more modern-like admixture? Using another South or South Central Asian group without a lot of ENA could be better.

Chad Rohlfsen said...

Primate_Gorilla LBK_EN Paniya Dai
Primate_Gorilla Anatolia_Neolithic Paniya Dai
Primate_Gorilla Armenian Paniya Dai
Primate_Gorilla Georgian Paniya Dai
Primate_Gorilla Kotias Paniya Dai
Primate_Gorilla Paniya Armenian Dai
Primate_Gorilla Paniya Georgian Dai

These might at least give an idea of the amount of ENA in ASI and in Paniya in general. We might not have been looking with the right pop.

To check if it's West Eurasian admixture in Mbuti, we could look at...
Primate_Gorilla Mbuti Anatolia_Neolithic Kotias

Chad Rohlfsen said...

I can't seem to find all of those qpAdm stats at Anthrogenica, unfortunately. I'm about to send people on a scavenger hunt.

ryukendo kendow said...
Chad Rohlfsen said...

These are three different models for the Kharia

15.5% Onge
26.6% Papuan
30.8% Atayal
05.3% Bedouin
21.8% Georgian
chi-square .342 tail-prob .951898

23.2% Onge
18.2% Papuan
30.9% Atayal
01.0% Hadza
26.7% Georgian
chi-square .706 tail-prob .871684

37.1% Onge
32.2% Atayal
02.2% Hadza
28.5% Georgian
chi-square .811 tail-prob .936969

Chad Rohlfsen said...

Here's another:

36.2% Onge
32.9% Atayal
02.2% Hadza
25.9% Georgian
02.6% Corded Ware
chi-square .882 tail-prob .829837
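The qpAdm models above can be sanity-checked with simple arithmetic: the proportions in each model should sum to about 100% (allowing for rounding), and the tail probability indicates whether the model is rejected. The Python sketch below assumes the common convention of rejecting a model when the tail probability falls below 0.05; that cutoff is a choice, not something qpAdm enforces, and a model that passes can still have standard errors too large to take seriously. The helper name is hypothetical.

```python
# Sketch: sanity-check qpAdm-style mixture models posted in this thread.
# Assumptions: proportions are percentages rounded to one decimal place,
# and tail prob >= 0.05 is treated as "not rejected" (a convention only).


def check_model(proportions, tail_prob, alpha=0.05, tolerance=0.5):
    """Return (rounded total %, sums to ~100%?, not rejected at alpha?)."""
    total = sum(proportions.values())
    sums_ok = abs(total - 100.0) <= tolerance  # allow for rounding
    return round(total, 1), sums_ok, tail_prob >= alpha


kharia_1 = {"Onge": 15.5, "Papuan": 26.6, "Atayal": 30.8,
            "Bedouin": 5.3, "Georgian": 21.8}
kharia_4 = {"Onge": 36.2, "Atayal": 32.9, "Hadza": 2.2,
            "Georgian": 25.9, "Corded Ware": 2.6}

print(check_model(kharia_1, 0.951898))
print(check_model(kharia_4, 0.829837))
```

Both Kharia models pass these two checks; the second sums to 99.8% because of rounding in the posted figures.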

a said...


Any idea when we will see the Mathieson et al. 2015 and Jones et al. 2015 data converted/incorporated into Eurogenes K6-K10 & K15 for public viewing? For example, the above R1a, R1b and Q Khvalynsk samples?

Alberto said...


I doubt it, but if it's West Eurasian admixture in Mbuti, these ones should tell us more and identify the possible offender(s) and the degree of it. Yoruba has at least double the West Eurasian admixture of Mbuti, and Mota possibly none. So the three should be significantly different. Chimp and Gorilla should be equal, and closer to Mota:

Mbuti Yamnaya_Kalmykia LBK_EN Kotias
Yoruba Yamnaya_Kalmykia LBK_EN Kotias
Mota Yamnaya_Kalmykia LBK_EN Kotias
Chimp Yamnaya_Kalmykia LBK_EN Kotias
Primate_Gorilla Yamnaya_Kalmykia LBK_EN Kotias

It would be important to sort this out first, because either the stats with Mbuti are wrong, or the stats with Gorilla are wrong (or maybe it's something else at play).

Gihanga Rwanda said...


Actually, one of the strangest things about that Mota study is that it estimated a purported ~7% layer of Western Eurasian "admixture" across the board, with the Mbuti at 6% and the Yoruba at 7%; the Dinka, Ju'hoansi, and Bantu speakers had identical results.

Alberto said...

@Gihanga Rwanda

Yes, you're right about Yoruba. I thought it was quite a bit higher than Mbuti. Then, instead of Yoruba, probably something like:

Khomani Yamnaya_Kalmykia LBK_EN Kotias

could work to know if it's West Eurasian admixture in SSA making those differences in the stats. Though Mota vs Mbuti/Yoruba would tell by itself too.

Davidski said...


Primate_Gorilla LBK_EN Paniya Dai -0.0078 -2.578 119527
Primate_Gorilla Anatolia_Neolithic Paniya Dai -0.0079 -2.663 119765
Primate_Gorilla Armenian Paniya Dai -0.0097 -3.271 119904
Primate_Gorilla Georgian Paniya Dai -0.009 -3.024 119904
Primate_Gorilla Kotias Paniya Dai -0.0121 -2.873 101132
Primate_Gorilla Paniya Armenian Dai 0.0405 12.77 119904
Primate_Gorilla Paniya Georgian Dai 0.0402 12.863 119904

Mbuti Yamnaya_Kalmykia LBK_EN Kotias 0.011 3.253 496714
Yoruba Yamnaya_Kalmykia LBK_EN Kotias 0.0123 3.713 496714
Mota Yamnaya_Kalmykia LBK_EN Kotias 0.0106 2.427 451956
Chimp Yamnaya_Kalmykia LBK_EN Kotias 0.0119 2.984 496714
Primate_Gorilla Yamnaya_Kalmykia LBK_EN Kotias 0.0115 2.787 445305
Khomani Yamnaya_Kalmykia LBK_EN Kotias 0.0163 5.048 496714

By the way, rk, Chad was probably talking about this...??

Because there's definitely no Mesolithic ancient DNA from South Asia yet.

Alberto said...

Thanks Davidski.

So Khomani does introduce some bias, but all the rest of the outgroups are pretty much the same. I wonder then why the difference between these two:

Primate_Gorilla Yamnaya Kotias LBK_EN -0.001 -0.245 271071
Primate_Gorilla Yamnaya_Kalmykia LBK_EN Kotias 0.0115 2.787 445305

Maybe the first one was using only transversion sites? Anyway, I think all the others make more sense than that first one.

Re: Indian Mesolithic sample, I also remember not long ago something about it. It was some HG from the Gangetic plain that showed some degree of affinity with modern inhabitants of the region, but it was not DNA, only craniometric data, I seem to remember. Not sure, though.

Chad Rohlfsen said...

Yeah, that might be right David.

Looking at those stats, the Paniya sure as hell don't look 50% West Eurasian. Maybe not even half that. Any ideas, rk?

Looking at those Yamnaya numbers, I see no issue with Gorilla. Africans move the numbers with their differing relationship to West Eurasians.

Chad Rohlfsen said...

Time for Treemix?

Chad Rohlfsen said...

Maybe using Ju_hoan_North, Mbuti, Yoruba, Mota, Denisovan, Papuan, Atayal, Dai, Paniya, Anatolia_Neolithic, Kotias, Andronovo_BA?

Roy King said...

OGF via Ted Kandell did a nice 3D interactive graphic for the world PC1 vs PC2 vs PC3 furnished by Davidski:

Davidski said...

Very nice indeed.

Btw, qpAdm shows Paniya to be 65/35 CHG/Dai and 0% BA steppe.

Open Genomes said...

Here is an INTERACTIVE 3-D PCA Plot of Global9 PC1-PC2-PC3 which can be rotated and enlarged, and where samples can be identified when you mouse over them.

Interactive 3-D Eurogenes Global9 PCA Plot with ancient and modern samples

Here is a PCA projection / guide showing the population and migration edges for Eurasia in the foreground:

Eurogenes Global9 PCA Plot showing populations and migration edges

A 3-D PCA plot is much more informative than a 2-D projection, because any projection can falsely superimpose samples and foreshorten distances. With this interactive 3-D plot it's easy to see the true relationships between populations and ancient samples, and even the directions of admixture.

For example, it's possible to see that Mota clusters with the Hadza and Sandawe rather than with the Aari Cultivators of Ethiopia.

There does seem to be a correlation between Y haplogroups and the plot. Notice that the Early Farmer (EF) (Y-DNA G, T, and H2) is a completely basal branch of Eurasians right at "Out of Africa", and that there is a progression of Y-DNA J => H1/H3 => NO => O toward the Austronesians, while another migration edge is roughly I2 => R1a/R1b => C2 => Q1a toward the Americas.

Have a look, see what you can find, and have fun!

a said...

Open Genomes, T.K. & company: gratitude.
You can see a line from R* hunter-gatherers through Karelia and Samara to the R1s leading into Europe.

Chad Rohlfsen said...


I see one qpAdm that failed. Are there more? Can you drop Dai and try Papuan + Atayal, and Australian + Atayal? Thanks!

Chad Rohlfsen said...

It might not hurt to add Anatolia_Neolithic too.

Qagan said...

I have an off-topic question.

I am confused: is ANE a West or East Eurasian component, a mix of both, or a unique component?

I ask this because I notice that, for example, the Ulchi samples score approximately 13% ANE according to estimates by Lazaridis in this thread:

but at the same time score approximately 100% East Eurasian in this ADMIXTURE run at K3:

Does this mean that the Ulchi samples actually have some West Eurasian ancestry, and if so, how much?

Thank you very much

Davidski said...

Ulchi are part EHG, which was classified as ANE in the Laz paper, and yes, this represents West Eurasian admixture in them.

Qagan said...


You mean Ulchi are 1/4 EHG? Do you have a spreadsheet with the averages for each population?

Thank you very much

Davidski said...


As far as I can remember based on some tests I ran, Ulchi are around 15% EHG or ANE. I'd need to double check that. I don't have a spreadsheet.


The Papuans and Australians are basically interchangeable in this model. But the standard errors are too high for these results to be considered valid IMO.

Qagan said...


Thank you very much. Yes, if you can check on it, I would appreciate it. So if I want to find out the actual West Eurasian ancestry, I need to look at the ANE percentages in each population?

Davidski said...

OK, for Ulchis I'm getting 8.3% MA1 and 6.6% Karelia_HG, with ~2% error margins. Scroll down to the bottom here...

The reason for these different estimates is the lack of correct ancient reference samples. But the upshot is that yes, Ulchis have some West Eurasian admixture of the hunter-gatherer kind.

Qagan said...

Thank you very much. Do you know why Ulchis are shown as 100% East Eurasian in the Eurasia K3 run?

So Ulchis have around 6.6-8.3% West Eurasian admixture based on MA1 and Karelia_HG? Sorry for asking such questions out of ignorance, but I am still pretty new to population genetics.

Alberto said...

@Open Genomes

Thank you, that looks amazing. Very informative, indeed.

Is it possible with that same data to make a West Eurasian only one? Or you'd need a different dataset for that?

Alberto said...


In qpAdm, do the standard errors refer to the best-coefficients option alone? Because otherwise the second option:

Atayal: 25.6%
Papuan: 30.6%
Kotias: 43.7%

chisq: 2.758 tail prob: 0.43

Looks quite decent. So maybe running the same model without Anatolia_Neolithic would give lower errors by picking this second option as the best one.

ryukendo kendow said...
Balaji said...

Alberto, Davidski,

Thanks for clarifying that, from the D statistics, the Yamnaya (and Afanasievo) are the only populations that we know to favor Kotias over LBK_EN. Even Corded Ware, which is supposed to be 80% Yamnaya, favors LBK_EN over Kotias.

Mbuti Corded_Ware_LN Kotias LBK_EN 0.0178 4.702 302875

All modern European populations must strongly favor LBK_EN. South Asian and East Asian populations do not choose between LBK_EN and Kotias. For the Near East, I found the following statistics that Davidski calculated.

Primate_Gorilla BedouinB LBK_EN Kotias -0.0382 -9.955 271793
Primate_Gorilla Armenian LBK_EN Kotias -0.0235 -5.95 271793

It would be good to find out how more Near Eastern populations choose between the two. I suspect that they will favor LBK_EN, even the people of the Caucasus. Davidski, could you calculate the following D statistics when you get the time?

Chimp Lithuanian LBK_EN Kotias
Chimp Georgian LBK_EN Kotias
Chimp Lezgin LBK_EN Kotias
Chimp Assyrian LBK_EN Kotias
Chimp Syrian LBK_EN Kotias
Chimp Itanian LBK_EN Kotias

Davidski said...

West_Eurasia9 PCA datasheet...


This expanded Corded Ware sample varies from 75% Yamnaya to as little as 35%. The average is about 60%. The rest is Middle Neolithic European.

ryukendo kendow said...
Kurti said...

I don't understand why people are always talking about Kotias when asking for CHG to be included in a new calculator. Isn't it clear from the paper that Kotias is (a) the younger and (b) the EF-admixed (25%), and therefore less pure, of the two CHG samples?

Satsurblia is the one that shows no signs of outside admixture whatsoever, so if anything, Satsurblia should be used for future calculators, not Kotias.

Davidski said...



Chimp Lithuanian LBK_EN Kotias -0.0273 -7.781 507266
Chimp Georgian LBK_EN Kotias -0.0003 -0.081 507266
Chimp Lezgin LBK_EN Kotias -0.0021 -0.631 507266
Chimp Assyrian LBK_EN Kotias -0.0245 -5.684 112556
Chimp Syrian LBK_EN Kotias -0.0232 -6.978 507266
Chimp Italian_Tuscan LBK_EN Kotias -0.0371 -10.629 507266


Satsurblia looks less mixed because it's a low coverage haploid genome. Kotias looks more mixed because it's a high coverage diploid individual representing a whole population against various heavily drifted modern populations.

By the way, the Scythian from Mathieson shares highest drift with Latvians and Lithuanians. :p

Tobus said...


Chimp Kharia Onge Dai 0.022 7.992
Chimp Kharia Onge Japanese 0.0166 6.218
Chimp Kharia Onge Papuan -0.0397 -10.151
Kharia Papuan Onge Dai -0.0301 -10.838
Hadza Kharia Onge Dai 0.0227 10.255
Ust_Ishim Kharia Onge Dai 0.0231 6.536
Papuan Kharia Onge Dai 0.0301 10.838
LBK_EN Kharia Onge Dai 0.0171 7.504

Kharia Onge Dai Georgian -0.0032 -1.547
Kharia Onge Dai Lezgin -0.0046 -2.214
Kharia Onge Dai Armenian -0.0024 -1.166
Kharia Onge Dai Abkhasian -0.0043 -2.05
Kharia Onge Dai Balochi -0.007 -3.678
Kharia Onge Dai Brahui -0.0076 -3.935

ryukendo kendow said...
Alberto said...


Thanks. So that's interesting for the methodology of using qpAdm. Probably, as RK said a while back, it's better to add populations one by one to test each combination separately and see whether they improve the model.

Also, knowing that Paniya is ASI+CHG and takes no European LNBA, it could serve as a more realistic base than Dai for getting accurate results about Andronovo/Sintashta admixture. For example, an Indo-Aryan population could be modeled as Paniya + Kotias + X, where X can be Sintashta/Andronovo, EHG, MA1,...

Kurti said...

Davidski said

"By the way, the Scythian from Mathieson shares highest drift with Latvians and Lithuanians. :p"

According to which study, lol? He fits perfectly as a "mixed" individual based on his admixture results, similar to Yamna and Andronovo (if not even slightly more southern- and eastern-shifted) on PCA plots.

Maybe he shares "highest drifts" with them but that doesn't mean he is automatically very close to them either. As we know things such as "highest" are relative ;)

Kurti said...

And about the Kotias and Satsurblia issue: the former might be high coverage, but the study itself states that there is EEF-like admixture in Kotias, probably from Anatolian farmers slowly reaching the Caucasus. Satsurblia, on the other hand, is low coverage, yes, but he doesn't seem to show signs of EF admixture, and in combination with his age this is a strong indication that he is less mixed.

Shaikorth said...

Kurti, Lithuanians and Latvians share the most drift with many populations that may look more distant based on ADMIXTURE or PCA, both ancient and modern:

Mordovian Lithuania : Chuvash MBUTI -0.0039 -4.050
[Kargopol] Russian Lithuania : Chuvash MBUTI -0.0033 -3.406

It should come as no surprise if they peak the Scythian sharing.

capra internetensis said...


Kharia are Austro-Asiatic, they share recent O2a1 haplogroups with Dai extensively.

VOX said...

Tobus, can you try:

Chimp Onge Dai Japanese

Correct me if I'm wrong, but I think this would indicate if the Dai have South Eurasian type admixture, if negative enough.

Kurti said...


That's what I tried to say. It's no surprise that they share the "highest" drift compared with other populations. But "highest" is relative. Turkic groups in Iran have the "highest" East Asian admixture, but that doesn't make them remotely similar to East Asians.
Just to give a more extreme and drastic example to make my point clear.

Obviously, any ancient Satem Indo-European and Uralic group from the steppe region will share significant drift with Lithuanians/Latvians. But from what I have seen, the Iron Age Scythian sample looks like it belongs to a population that can be modeled in between Lithuanians and a different West and South_Central Asian group. This leads to my statement years ago that the North and East Iranic groups are the ones who fill the missing gap between the North Caucasus, South_Central Asians and East Europeans.

Davidski said...

Turkic groups in Iran have "highest" East Asian admixture but that doesn't make them remotely similar to East Asians.

I didn't say Lithuanians share the highest drift with the Scythian from among Europeans. I said Lithuanians share the highest drift with the Scythian.

And no, the Scythian can't be modeled as Lithuanian/South Central Asian, because he lacks South Asian ancestry.

Chad Rohlfsen said...

The use of Dai gives a worse fit in qpAdm. In fact, Dai are closer to West Eurasian Caucasus pops than the Onge are. I've said it before, and I still believe, that the Dai do have West Eurasian admixture, and it causes more problems. Here are the Kharia, with the Dai:

Onge 46.0%
Dai 27.6%
Hadza 1.0%
Georgian 25.4%

chisq 2.330 tail prob .675224

The Dai clearly have West Eurasian ancestry, and show clear affinity to EHG, MA1, and Nganasan. This is why they are closer to South Asians: it is their West Eurasian ancestry and ENA. The Onge clearly model with better fits for ASI, which is probably a mix of Onge and Atayal-like stuff.

result: Gorilla Karelia_HG Onge Dai 0.0162 4.088 15538 15043 317554
result: Gorilla Yamnaya Onge Dai 0.0077 2.285 14404 14185 297501
result: Gorilla LBK_EN1 Onge Dai 0.0057 1.820 15807 15629 328505
result: Gorilla Spain_EN Onge Dai 0.0050 1.516 15556 15401 323550
result: Gorilla Armenian Onge Dai 0.0082 2.800 15876 15618 329241
result: Gorilla Georgian Onge Dai 0.0082 2.813 15883 15626 329241
result: Gorilla BedouinB Onge Dai 0.0063 2.160 15655 15458 329241
result: Gorilla Iraqi_Jew Onge Dai 0.0085 2.793 15828 15562 329241
result: Gorilla Kharia Onge Dai 0.0212 7.772 16330 15652 329241
result: Gorilla Onge Atayal Dai 0.0039 1.689 14572 14459 329241
result: Gorilla Onge Han Dai 0.0030 2.085 14592 14505 329241
result: Gorilla Onge Dai Atayal -0.0039 -1.689 14459 14572 329241
result: Gorilla Nganasan Onge Dai 0.0739 24.203 17301 14920 329241
result: Gorilla Dai Onge Nganasan 0.0577 16.732 17301 15414 329241
result: Gorilla Atayal Onge Dai 0.1093 35.865 18007 14459 329241
result: Gorilla Australian Onge Dai -0.0057 -1.582 15567 15747 329240
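The posted D values can be reproduced from the counts at the end of each line. Assuming the columns after the four populations are D, Z-score, BABA count, ABBA count and number of SNPs (my reading of ADMIXTOOLS qpDstat output, not something stated in the thread), D is simply (BABA - ABBA)/(BABA + ABBA). With an outgroup in the first slot, a positive D means the second population shares more alleles with the fourth, and a negative D that it shares more with the third; that is the reading used in the comments (e.g. Karelia_HG leaning towards Dai over Onge). A minimal sketch; the helper name d_stat is hypothetical:

```python
def d_stat(baba: int, abba: int) -> float:
    """Patterson's D from site-pattern counts: (BABA - ABBA) / (BABA + ABBA)."""
    return (baba - abba) / (baba + abba)

# (W, X, Y, Z, posted D, BABA, ABBA) copied from the result lines above
rows = [
    ("Gorilla", "Karelia_HG", "Onge", "Dai", 0.0162, 15538, 15043),
    ("Gorilla", "Nganasan", "Onge", "Dai", 0.0739, 17301, 14920),
    ("Gorilla", "Onge", "Dai", "Atayal", -0.0039, 14459, 14572),
]
for w, x, y, z, posted, baba, abba in rows:
    d = d_stat(baba, abba)
    # With an outgroup in position W: D > 0 means X shares more alleles with Z,
    # D < 0 means X shares more alleles with Y.
    closer = z if d > 0 else y
    print(f"D({w}, {x}; {y}, {z}) = {d:+.4f} (posted {posted:+.4f}): {x} leans towards {closer}")
```

The Z-score column cannot be recomputed from these counts alone, since it comes from a block jackknife over the genome.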

Atayal 89.4%
MA1 9.9%
Australian 0.7%

chisq 4.236 tail prob .237128

Atayal 58.3%
Nganasan 30.5%
Australian 11.2%

chisq 3.484 tail prob .322788

Atayal 36.3%
Nganasan 44.2%
Onge 19.6%

chisq 1.511 tail prob .679826
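On the qpAdm numbers: the "tail prob" appears to be the upper-tail chi-square probability of the reported chisq, so values well above a commonly used 0.05 cutoff mean the model is not rejected. The degrees of freedom are not posted; taking df = 4 for the four-way Kharia model and df = 3 for the three-way models above reproduces the posted tail probabilities to about three decimals, but those df values are an inference. A minimal standard-library sketch (the helper name chi2_tail_prob is hypothetical):

```python
import math

def chi2_tail_prob(chisq: float, df: int) -> float:
    """Upper-tail probability P(X > chisq) for a chi-square variable with integer df."""
    half = chisq / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=0}^{(df-3)/2} x^i / (2i+1)!!
    tail = math.erfc(math.sqrt(half))
    if df > 1:
        term = total = 1.0
        for i in range(1, (df - 1) // 2):
            term *= chisq / (2 * i + 1)
            total += term
        tail += math.sqrt(2.0 * chisq / math.pi) * math.exp(-half) * total
    return tail

# (model, chisq, inferred df, posted tail prob) for the fits above
models = [
    ("Kharia: Onge + Dai + Hadza + Georgian", 2.330, 4, 0.675224),
    ("Atayal + MA1 + Australian", 4.236, 3, 0.237128),
    ("Atayal + Nganasan + Australian", 3.484, 3, 0.322788),
    ("Atayal + Nganasan + Onge", 1.511, 3, 0.679826),
]
for label, chisq, df, posted in models:
    p = chi2_tail_prob(chisq, df)
    print(f"{label}: recomputed {p:.4f}, posted {posted:.4f}")
```

The closed-form series only covers integer df, which is all this needs; if SciPy is available, scipy.stats.chi2.sf(chisq, df) should give the same numbers.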

I'm still working on better fits.

Shaikorth said...

Chad, can you do these too:

Gorilla Karelia_HG Onge Atayal
Gorilla Georgian Onge Atayal
Gorilla Nganasan Onge Atayal

Tobus said...


Chimp Onge Dai Japanese -0.0016 -0.996

Tobus said...

.. and while I'm at it:


Gorilla Karelia_HG Onge Atayal 0.0165 3.809
Gorilla Georgian Onge Atayal 0.0062 1.89
Gorilla Nganasan Onge Atayal 0.0745 20.851

Shaikorth said...

Thanks, looks like whatever affinities there are between EHG/ANE and Dai extend to Atayal.

Seinundzeit said...

A side note, but I just realized that the qpAdm model of Pashtuns and Kalash as CHG + EHG + AEN/EEF + ENA is strikingly similar to Zack's old K11 Onge run. A comparison, using Pashtuns:

57.5% CHG + 17.7% EHG + 12.8% ENA + 12% AEN/EEF

South Asia=48%
SW Asian=17%
East Asian=1%

Obviously, we are dealing with radically different methods, and radically different kinds of output. Any direct comparison is somewhat problematic. And anything based on formal stats takes precedence over ADMIXTURE output. The qpAdm output is determinative. But I'm simply struck by the similarity. For example, "South Asian" is a West Eurasian component that peaks in South Asia, West Asia, the Caucasus, and has a bias towards appearing more strongly in Northern Europe rather than Southern Europe. Basically, it acts like CHG. "SW Asian" is a composite of the EEF-like, Bedouin-like, and CHG-like components that often appear in ADMIXTURE. Here, it takes the place of AEN/EEF for Pashtuns, but takes some CHG with it. "South Asian" + some of "SW Asian" is identical to the amount of CHG shown by qpAdm. The "European" score is almost identical to the EHG percentage. And the percentage of "Onge" + "East Asian" is identical to the ENA score in qpAdm. I just find it interesting that the two sets of results are so similar. Probably a good indication that this model closely approximates reality.

A weird detail that I just noticed, Pashtuns have about the same amount of EHG as populations from the British Isles, while the Kalash have about the same amount as Scandinavian and Eastern European populations, looking at the Haak et al. supplements.


If possible, could you try to model Pashtuns as Andronovo + Paniya + Armenian, and Kalash as Andronovo + Paniya + Georgian? In the absence of South Asian aDNA, the Paniya are great for this sort of thing. Thanks in advance.

Chad Rohlfsen said...

TreeMix did show a 16% edge from the root of EHG into all non-Papuan ENA.

FrankN said...

@Chad: "The Dai clearly have West Eurasian ancestry, and show clear affinity to EHG, MA1, and Nganasan. This is why they are closer to South Asians."
There may be another, much simpler explanation for Dai being close to South Asians: the IVC is known to have grown rice, at least during its late stages. While the "homeland" of rice domestication hasn't yet been unambiguously determined, Yunnan ranks at the top of the candidate regions. Since the migration of crops tends to be associated with the migration of people, I consider a migration of Dai-like people into the Indus Valley by around, say, the first half of the 3rd millennium BC anything but unlikely.

Conversely, Yunnan is the world's single largest tin producer today, a commodity that is indispensable for bronze production but only found in mineable concentrations in a few places around the globe. Bronze appears rather early in Yunnan, with high technical and artistic sophistication, geographically disconnected from the main entry route of bronzeworking into East Asia along the northern branch of the Silk Road.
"By this time (2nd c. BC), agricultural technology in Yunnan had improved markedly. The local people used bronze tools, plows and kept a variety of livestock, including cattle, horses, sheep, goats, pigs and dogs. Anthropologists have determined that these people were related to the people now known as the Tai."

Moreover, the standard theory of bronzemaking being disseminated southward from Northern China into SEA is increasingly in conflict with C14 dates from SEA sites. The discussion is still ongoing.

VOX said...

"Chimp Onge Dai Japanese -0.0016 -0.996"

Hi Tobus, thanks for the stats. It looks like Dai might be slightly closer to Onge, although not at a significant level. According to the analysis of Khrunin et al., the Han, Sherpa, Dai and Malaysians harbour about 19% Australian-like admixture. Does anybody else have any ideas?

Chad Rohlfsen said...

It's equal because the Dai can be modeled as Atayal, Siberian and Papuan, and the Japanese as Atayal and Siberian. It's the same with ADMIXTURE, and you can see it on a PCA.

Balaji said...


Thanks for the D-stats. It is interesting to compare Georgians to Armenians.

Primate_Gorilla Armenian LBK_EN Kotias -0.0235 -5.95 271793
Chimp Georgian LBK_EN Kotias -0.0003 -0.081 507266

Whereas Armenian has much more LBK_EN-related than CHG-related ancestry, Georgian shows no preference for either LBK_EN or CHG. Clearly Georgian has much more CHG ancestry than Armenian. The Caucasus mountains have been quite effective in impeding gene flow. I had meant to request the statistics for Iranian but mistyped.

Chimp Iranian LBK_EN Kotias

I expect Iranian to favor LBK_EN over Kotias but I may be wrong and perhaps Iranian too will have no preference.

Open Genomes said...

@a and @Alberto - Thanks! :)

Eurogenes Global9 3-D PCA Plot PC1-PC2-PC3

Eurogenes Global9 PCA plot PC1-PC2-PC3 2-D projection showing populations and some Y haplogroups

Alberto, I think the "secret" of this 3-D PCA Plot is that in fact it does include Africans. The way to examine West Eurasians closely is just to rotate the plot appropriately, and then zoom in real close on that smaller section of Eurasia, and examine the 3-D relationships. There's no way we could have seen the "pull" toward Sub-Saharan Africans in the Palestinians, Bedouin and North Africans which is *not* from the EF Early Farmers unless we have Mota and the other Africans on the plot. The most striking finding is that the EF (Early Farmers) are indeed *the* "Basal Eurasian" branch, and the CHGs (Caucasus Hunter-Gatherers) are in fact *not* any sort of "Basal Eurasian" but something headed out to Ust'-Ishim, the Austroasiatics, and the Austronesians of Taiwan. Call it "ASI/ANI" if you will. ("ASI" includes the Andamanese.) Likewise, the WHG/SHG/EHG group is related to Mal'ta boy and on to the Native Americans, it's a kind of "WHG-ANE" continuum. Of course, this is precisely what we've seen in TreeMix, except there's a bit of confusion between the Early *European* Farmers and the EF "Basal Eurasians".

Given that there are "poles of drift", or rather the "points, tail and string of the kite" ;), perhaps TreeMix would work best with Atayal, Karitiana, Ust'-Ishim, MA-1, Kostenki K14 and the Starcevo Early Farmer KO2 (who seems to be an "ultra-Anatolian Farmer"), with Mota and the Ju-Hoan as outgroups. The Papuans/Australians may prove useful too. That way, adding the CHGs (preferably Satsurblia over Kotias) and the various "European" Hunter-Gatherers will reveal the real combination of admixture found in any "test" individual or population.

Also, is Saqqaq Man in the data? What about Clovis Anzick-1 and Kennewick? Given that Native Americans are one "pole" of admixture, these ancient genomes are going to be very important, particularly to fill in the "ANE" migration path between Mal'ta boy and the Native Americans. Since the R1a1* Karelian EHG comes out at "16% Native American", it's very important to have these ancient Native Americans to distinguish any so-called "ANE" from the Caucasus and the Tajikistan area from something related to Scandinavian Q-L805. Q-L805 falls in a "Native American" Y clade, Q-M930, which also includes Q-M3 (Kennewick); Q-M930 in turn sits below Q-M1107, which also includes Q-Z780, the sister clade of Q-M930 to which Clovis Anzick-1 belongs. It would seem that this single Q-L805 lineage represents the unique instance of actual Beringian admixture in north Eurasia.
BTW, the sample I0434 from Khvalynsk is Q-L474 xL56, in the same clade as Saqqaq Man.

I think with these additional ancient American samples the North Eurasian drift towards the Native Americans will become clearer.

Really, all Eurasians are some combination of drift toward or away from KO2, the Ami/Atayal and the Karitiana, and back towards Mota, except for the Oceanians, whose Denisovan admixture pulls them in another direction.

Let's see what these other ancient American genomes do to the PCA and TreeMix.

Davidski said...


Chimp Iranian LBK_EN Kotias -0.0099 -2.857 507266

capra internetensis said...

Would someone who is running D stats mind doing

Chimp Ust_Ishim Karitiana Clovis ?

As a test for artifacts of age differences.


Q-L805 is not the only Beringian suspect, there are also the Eurasian mitochondrial C1 clades (and it is possible that more upstream clades like L330 are Beringian too). We don't know if I0434 is more related to Saqqaq than any given Q1a; Saqqaq is in Q1a1a-NWT01(xM120) specifically, the Khvalynsk man is just Q1a(xQ1a2).

Amerindians are a "pole" of admixture because they have a lot of specific drift, not necessarily because they have any importance as an ancestral population outside the New World - though I do think there is likely to be significant Beringian ancestry in Eurasia, I strongly doubt it is the main source of ANE.

Arch Hades said...

Khvalynsk didn't have wheeled vehicles or the domesticated horse. Only the later Yamnaya had those, right? If I'm not mistaken.

If that's true, then they might be some very early form of Proto-Indo-European, but their culture isn't classic Proto-Indo-European.

Krefter said...


Can you post ADMIXTURE or PCA results for the Scythian?

ryukendo kendow said...
Davidski said...


Khvalynsk is generally considered Pre-Proto-PIE, and Yamnaya late PIE. Samara and/or Sredny Stog are generally seen as Proto-PIE.


Chimp Ust_Ishim Karitiana Clovis 0.0024 0.38 352350


I'm still working on the ADMIXTURE stuff, but I can tell you that this Scythian has around 10% Siberian ancestry, and I'm not talking about ANE here. That's much more than Finns, but not as much as the Chuvash.


Must've missed it. Please re-post the list.

Balaji said...


Thank you very much. The statistics that you have provided show that CHG could not have moved from the Caucasus to the Indian Subcontinent. CHG had a hard enough time going from Georgia to Armenia. From the Caucasus to Iran to the Subcontinent there are more formidable geographical barriers.

Iranian has more CHG-related ancestry than Armenian. I think this is because Iranian received CHG-related ancestry both from the Caucasus and from India. Still Iranian has more LBK_EN related ancestry than CHG-related ancestry.

Chimp Iranian LBK_EN Kotias -0.0099 -2.857 507266

All this goes to show that agriculture in South Asia which is at least 10,000 years old did not come with migrants from the Near East. They would have had more of LBK_EN ancestry. It was an indigenous development. This also means that ANI has been in the Subcontinent since the Late Pleistocene. ADMIXTURE analysis further suggests that CHG-related ancestry in South Asia is of the Gedrosia kind different from the Caucasus kind.

Shaikorth said...

These stats should allow for an f4-ratio estimate of how much ancestry full Austronesians share with ANE; the same method was used for Siberians by Flegontov et al.

f4(Loschbour, Gorilla; Atayal, Onge)
f4(Loschbour, Gorilla; MA-1, Onge)
f4(Onge, Gorilla; Loschbour, MA-1) (Z<2 here ensures the Onge is a decent reference)

Tobus said...

Loschbour Gorilla Atayal Onge 0.0006 0.14
Loschbour Gorilla MA1 Onge 0.0569 7.769
Onge Gorilla Loschbour MA1 -0.0086 -1.151

Karl_K said...


"agriculture in South Asia which is at least 10,000 years old did not come with migrants from the Near East. They would have had more of LBK_EN ancestry."

It is an interesting situation for sure. The archaeology shows multiple centers of early farming with local plants. Yet clearly, agriculturalists across the entire Fertile Crescent very early on started using the exact same domesticated crops. Any useful traits were bred into their own local landraces.

So it seems that in that region 10,000 years ago, at least some seeds and animals were traded and passed around much faster than people were admixing.

And some farming knowledge must have also been passed around. How else could such diverse people all coincidentally domesticate the exact same eight plant species at exactly the same time?

Shaikorth said...

Thanks Tobus, the result here is just 1%, which may be too low to explain the preference of Caucasus/EHG/Siberia for Atayal over Onge.

If you have time, could you do the same stats but MA-1 replaced with Karelia HG and Kostenki14?

ryukendo kendow said...
Chad Rohlfsen said...

I think Dai having West Eurasian ancestry is causing them to look closer than the Onge. It's a weak fit to model the Dai without Siberian and Onge admixture. Using Onge and Atayal makes a much better fit for ASI.

Open Genomes said...

@capra internetensis, the idea that Native Americans are a "pole of admixture" does not mean that in fact Eurasians have (any substantial) "Beringian" ancestry, aside from the somewhat small clades you mentioned. Rather, the correct term should be a "pole of drift", where Central Siberian migration to the Americas was the end result of a process of isolation and drift we already see in Eurasia, i.e. with Mal'ta boy. Regardless, this "pole of drift" is in fact important for our understanding of Eurasian drift and admixture. As we know, "ANE" was modeled on the derived alleles shared between the Karitiana and Mal'ta boy, so this represents what has been called "ANE", even if the concept may not be entirely accurate regarding more southerly Eurasians such as in the Northeast Caucasus and the other "ANE hotspot" around Tajikistan.

The value in using the ancient Native American genomes is of course that they are much closer in time (and space) to the Eurasian source of the drift. It may be that Saqqaq man is purely Paleo-Siberian (probably a Koryak) and therefore a completely different source population and migration than Clovis and Kennewick. Between this apparent "Dorset" population in Eastern Canada and Greenland, the Na-Dene related to the Kets and other Yeniseians, the Amerinds / "First Americans" (with both an "East Asian" and an "ANE" component to their ancestry), and the apparent minor "Papuan" element among a few South American tribes like the Karitiana, we can see there were quite a few populations that contributed to this "pole of drift". This is really why Native Americans are at the extreme of a "triangle" rather than a "line". (Notice too that South American tribes "make a turn" in the general drift in the Americas, due to some additional ancestral element, perhaps this "Papuan" ancestry.)

Regardless, since the Native Americans are at several extremes of drift that was already taking place before the settlement of the Americas, all of these ancestral components accentuate and emphasize this Eurasian drift in a way that would not be possible if they were not on the 3-D PCA.

I can think of other apparent population isolates that are not in this Human Origins Array dataset, namely the Onge and the Tibetans, the Semang of Malaysia and the Aeta of the Philippines. I suspect that the Tibetans in particular will show up at some unusual place within the triangle because of their long isolation due to their physical adaptations to the extreme altitude of the Tibetan Plateau. We can see this in their unusually high percentage of Y haplogroup D, just like the Andamanese and Japanese, other physically isolated East Asian populations.

Perhaps something can be done to "round out" the dataset by including these other isolates along with the ancient Americans?

I suspect that this may create some "pull outward" even for such Siberian-admixed populations like the Karelian Hunter-Gatherers and further clarify the PCA plot.

The main point here is that even a close examination of a small region of the PCA, such as Europe, cannot be done properly without including *all* extremes of drift in the same analysis. Without the Atayal, Karitiana, Mbuti and San on the very same plot as the LBK, Corded Ware and Bell Beaker samples, we would never have seen that the CHGs were very different from the Early Farmers ("Basal Eurasians") and headed in the direction of the Austronesians, that the European Hunter-Gatherers (all three groups) were headed in the direction of the Native Americans, the true nature of "LBK" (in fact, EF) admixture in Africa, or that the EFs are the only true "Basal Eurasians" and not at all "Bedouin_B-like".

Open Genomes said...

@David, can we have a Global9 with Clovis, Kennewick, Saqqaq, Tibetans, Onge, Semang and Aeta (or related people), to "round out" the PCA?
It seems reasonable that the Europeans and the CHGs would be "less compressed" if these were on the plot, because they should accentuate the sources of drift in Eurasia. Thanks.

Shaikorth said...

However Atayal gives similar West Eurasian shifts compared to Onge as Dai do, so if there is that kind of ancestry in Dai it should be in Atayal too. This would leave Onge as the one pure ENA reference since Papuans etc. are complicated by archaic admixture.

Gorilla Karelia_HG Onge Dai 0.0162
Gorilla Nganasan Onge Dai 0.0739

Gorilla Karelia_HG Onge Atayal 0.0165
Gorilla Nganasan Onge Atayal 0.0745

postneo said...

While knowledge may have been passed around, not the same species of crops and animals were domesticated. Even for barley, two different strains were domesticated in different regional centers.

Alberto said...

After reading FrankN's interesting comment and looking at the stats, it's looking more like most of what we call ASI could be a late migration from SE Asia to India. This migration is also supported by a recent study from National Geographic regarding Y haplogroup O-M95.

This would also be a more parsimonious explanation for the late estimates of admixture between ANI and ASI. It looks quite clear that ANI was in the Indus Valley long before 2200 BC (oldest estimate date of admixture), so it could have been ASI which arrived during the late Harappan period there.

I don't think that all the ANI-ASI admixture will be a single event/migration. It's probably going to be much more complicated, with different waves at different times, in both directions. But it's looking like the biggest event might have been this hypothetical late Bronze Age migration from SE Asia to India.

This would increase the chances of the Harappan DNA (if/when it comes) being pure ANI, which would be quite interesting too.

Karl_K said...

"it's looking more like most of what we call ASI could be a late migration from SE Asia to India"

Then who was in India before that? ANE like people, CHG like people? This should have been a territory with a large sustained population for a very long time. How could they have disappeared with so little a trace in such recent history?

jparada said...


So, was the Scythian a Baltic speaker? Anyway, these Baltic peoples seem to be quite isolated: not only do they share the most drift with Mesolithic Europeans, but now also with an Iron Age steppe individual.


Why should Europeans and West Asians fall on an uninterrupted cline? If anything, before the Neolithic they were farther apart than they are now.

Chad Rohlfsen said...

The Kharia have more admixture from a Dai/Atayal group. Paniyas should be more Onge like. Onge like people are probably native to South Asia, and I would be surprised if something ANI like dates to the Paleolithic/Mesolithic. That may be why the South Asian cluster is a pain in the ass to break down. It's a mixed and heavily drifted group. I think the Austronesian came later on, more like the Mesolithic to Neolithic timeframe.

Chad Rohlfsen said...

Typo above. I meant to say that I wouldn't be surprised if something ANI-like dated back to the Paleolithic/Mesolithic timeframe.

Alberto said...

Yes, before ASI arrived in North India, the people would have been something like CHG + ANE. That's still the base of the populations of North India and Pakistan, so they didn't disappear; they just received an influx from ASI populations.

Chad is probably right. There was probably an Onge-like component in South India earlier, and during the Bronze Age a SE Asian migration might have taken place, bringing Austro-Asiatic and expanding southern populations to the north.

Probably a complicated history, but in any case the point is mostly about the study about ANI-ASI admixture from a few years back with age estimates between 2200 BC and 900 BC (?). This was taken by many as a proof of Aryan invasions, but it's looking more that what it was detecting was a Dai-like migration to India and the subsequent ASI (a mix of Dai and Onge) expansion to the north.

Hypothetical, of course. But now more parsimonious than the old Aryan Invasion theory, I think.

postneo said...

ASI could have been there in peninsular and central India for a long time before moving into Harappan areas.

capra internetensis said...

Thanks David

Zero result, no sign of any age effect (at least using genomes with decent coverage).


There was certainly a late Neolithic migration (or multiple waves of migration) from Southern China/Southeast Asia into India (c. 2000 BC?), bringing Austroasiatic languages, polished shouldered axes, and corded ware, as well as Y haplogroup O2a1-M95. Some of the Neolithic Gangetic sites have very early dates, before 6000 BC, but I'm not sure whether these are securely associated with Southeast Asian elements.

But this wave is associated with Austroasiatic tribals, and to a lesser degree with East Indians generally; O2a1 is of Holocene age (there are not enough samples, but I suspect that in East India it is largely or almost entirely the young O2a1a2-F789 clade). South Indians with high ASI have negligible Y-DNA O and do not show the Southeast Asian component in HarappaWorld admixture that the Munda do (nor any significant East Eurasian ancestry outside of what is contained in the South Indian component).

There are earlier connections with Southeast Asia, e.g. Hoabinhian-type lithics, but this is all poorly dated. During the LGM there was mostly horrible desert to the northwest of India (though the edge of the Himalayas and Pamirs was probably OK), while India was mostly covered with savanna grassland and open forest, separated by the Naga Hills from not too different habitats in Southeast Asia.

Altogether I see no reason to think ASI is predominantly due to late gene flow from the East. I also think the Harappans were mainly ANI, but the earliest admixture dates come from Dravidian speakers of South India, and may represent the arrival of Neolithic/Chalcolithic farmers/pastoralists from the north. I guess the situation in the subcontinent was quite complicated, with plenty of gene flow in and out, and with major autochthonous components. It will be very hard to disentangle without aDNA.

capra internetensis said...

The North Indian admixture dates from Moorjani et al are very late indeed, Iron Age and historical era. Considerably later than the appearance of Southeast Asian Neolithic elements in the Ganges valley and the eastward migration of the Harappans. There must have been early admixture events but they are being obscured by late ones.

capra internetensis said...

How about

f4(Onge, Gorilla; Karelia_HG, Loschbour)
f4(Onge, Gorilla; Dai, Loschbour)

for East Asian admixture in EHG?

Rob said...

Hi, interesting hypothesis. Like Capra, however, I was going to point out that Y hg O and Austroasiatic are not common enough in India to account for ASI? But I'm sure you've thought about this :)

Chad Rohlfsen said...

result: Gorilla Onge Loschbour Karelia_HG -0.0035 -0.526 14410 14510 314720
result: Gorilla Onge Dai Loschbour -0.0593 -12.307 15648 17621 326345
result: Gorilla Onge Dai Karelia_HG -0.0627 -12.192 15043 17056 317554

Ryan said...

"And no, the Scythian can't be modeled as Lithuanian/South Central Asian, because he lacks South Asian ancestry."

Don't the Kalash share a strange amount of drift with Lithuanians? Is the Scythian a ghost population that contributed to both Lithuanians and the Kalash?

Davidski said...

Is the Scythian a ghost population that contributed to both Lithuanians and the Kalash?

No, the early Indo-Europeans from the Bronze Age steppe carrying R1a-M417/Z645 is the ghost population that contributed to Scythians, Lithuanians and South Asians.

There might be some minor Scythian ancestry in Lithuanians. But it can't be much considering the very low level of R1a-Z93 in the East Baltic and Siberian admixture at only a couple per cent, if even that. That Scythian is R1a-Z93 and has around 10% of Siberian admixture.

Alberto said...


Thanks, I don't know much about the details of Indian prehistory so it's good to hear a good summary and that it's not in disagreement with what I'm more or less seeing.

Indeed, Y hg O is restricted to Austroasiatic and is not too relevant in itself, but the tests so far don't seem to show two clearly different types of ASI. The ENA in Paniya and in Kharia don't look too different, and both look quite Dai-like, with Dai itself being a mix of Atayal-like and Onge- or Papuan-like ancestry. So yes, probably a complicated history, but with the result that these components are mixed more or less equally in South Indian and in SE Asian populations.

Maybe further tests will be able to find the difference (as ADMIXTURE does, though ADMIXTURE could be doing so for other reasons), but for now it looks to me as if whatever migrations went from SE Asia to India homogenized the ENA component, regardless of hg O or AA language. Or maybe we just don't have any good proxy for the "real" ASI, so it shows up as Atayal + Onge/Papuan because the Onge are just too drifted and not very closely related to continental ASI.

Tobus said...

@Shaikorth: If you have time, could you do the same stats but MA-1 replaced with Karelia HG and Kostenki14?

Loschbour Gorilla Karelia_HG Onge 0.1131 16.563
Onge Gorilla Loschbour Karelia_HG 0.0035 0.525

Loschbour Gorilla Kostenki14 Onge 0.06 8.919
Onge Gorilla Loschbour Kostenki14 0.0175 2.604

@ryukendo: is it possible for you to pass the data for CHG to Chad or Tobus in any form?

David has said the CHG data is freely available from the author, so I should be able to get it easily enough. The issue is that it can take a while to process and merge with my existing data set, depending on the format etc., and I don't have the time to dedicate to that at present... maybe this weekend I'll give it a go.

Shaikorth said...

Karelian gives only 0.5% into Atayal, which is definitely too little to explain the significant shift of Karelia HG towards Atayal over Onge. Maybe the formula from the Flegontov paper, though it worked for Siberians and Native Americans, just isn't good enough here.

Kostenki was 1%, but that isn't a very good reference, since Onge shares additional drift with Loschbour relative to Kostenki.
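The percentages quoted here and in the earlier reply to Tobus (about 1% with MA1, 0.5% with Karelia_HG, 1% with Kostenki14) follow from dividing the Atayal statistic by each source statistic, as in the f4-ratio set-up proposed above, alpha = f4(Loschbour, Gorilla; Atayal, Onge) / f4(Loschbour, Gorilla; source, Onge). A minimal sketch; Tobus's posted values look like D statistics rather than raw f4 values, so using them in the ratio is an approximation, and the helper name f4_ratio is hypothetical:

```python
def f4_ratio(numerator: float, denominator: float) -> float:
    """Admixture proportion estimate alpha = f4(A, O; X, C) / f4(A, O; B, C)."""
    return numerator / denominator

# f4(Loschbour, Gorilla; Atayal, Onge), as posted by Tobus
NUMERATOR = 0.0006

# source: (f4(Loschbour, Gorilla; source, Onge), Z of f4(Onge, Gorilla; Loschbour, source))
posted = {
    "MA1": (0.0569, -1.151),
    "Karelia_HG": (0.1131, 0.525),
    "Kostenki14": (0.06, 2.604),
}
for source, (denominator, ref_z) in posted.items():
    alpha = f4_ratio(NUMERATOR, denominator)
    # Shaikorth's check: |Z| < 2 suggests Onge is a reasonable reference for this source
    ok = abs(ref_z) < 2
    print(f"{source}: alpha = {alpha:.1%}, Onge reference OK: {ok}")
```

Only the Kostenki14 row fails the reference check, which matches the caveat about Kostenki above.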

Alberto said...


I'm not sure if the above stats are correct, but the ones requested by Capra and run by Chad give some 6% Dai into Karelia_HG, so that might be enough to explain it.

Shaikorth said...

That should bring EHG closer to Onge too, assuming it is in the same clade as Dai and Atayal. North Siberians and Native Americans also prefer Atayal over Onge, which should not happen if both Atayal and Onge are fully ENA.

This is probably going to reveal nothing but we might still try

Loschbour Gorilla Motala12 Onge
Onge Gorilla Loschbour Motala12

capra internetensis said...


Unfortunately the sign is wrong, it's -6%, noise result.
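The sign comes from population order. Swapping the two populations within either pair flips the sign of D (or f4), while swapping within both pairs leaves it unchanged, so Chad's statistics first have to be put into the order Capra asked for, f4(Onge, Gorilla; Karelia_HG, Loschbour) / f4(Onge, Gorilla; Dai, Loschbour). A minimal sketch; the helper reorient is hypothetical, and D values again stand in for f4:

```python
def reorient(d: float, swap_first_pair: bool = False, swap_second_pair: bool = False) -> float:
    """Return D after swapping populations within a pair.

    D(W,X;Y,Z) = -D(X,W;Y,Z) = -D(W,X;Z,Y), so each within-pair swap flips the sign.
    """
    sign = (-1) ** (int(swap_first_pair) + int(swap_second_pair))
    return sign * d

# Chad's results
d_losch_karelia = -0.0035  # D(Gorilla, Onge; Loschbour, Karelia_HG)
d_dai_losch = -0.0593      # D(Gorilla, Onge; Dai, Loschbour)

# Capra's order: f4(Onge, Gorilla; Karelia_HG, Loschbour) / f4(Onge, Gorilla; Dai, Loschbour)
numerator = reorient(d_losch_karelia, swap_first_pair=True, swap_second_pair=True)
denominator = reorient(d_dai_losch, swap_first_pair=True)
print(f"Dai into Karelia_HG: {numerator / denominator:.1%}")
```

Tobus's D(Onge, Gorilla; Loschbour, Karelia_HG) = 0.0035 gives the same numerator after one within-pair swap, so the two runs agree on the negative sign.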

Chad Rohlfsen said...

result: Loschbour Gorilla Motala_HG Onge 0.1788 36.462 19709 13729 318936
result: Onge Gorilla Loschbour Motala_HG 0.0047 0.909 13729 13600 318936

I'm not sure where you guys are getting these 1% and 6% numbers from. That is not what qpAdm shows. The best fit for Dai is a mix of Atayal, Nganasan and Onge. If we have more Karelia into Dai and more Onge into Dai, then both can appear more closely related than they are. The Onge are further from West Eurasians than the Dai and Atayal. Onge into Dai alone would make the Dai a bit further from West Eurasians, but the additional West Eurasian into Dai almost evens it out.

bellbeakerblogger said...

@ Alberto, Capra,

You are probably already well aware of this, but it hasn't been mentioned here. The early Mehrgarh folk were largely Sundadonts, which pretty much necessitates ancestry from SE Asia (or Dai-like)

One thing to keep in mind is that Sundaland has been very heavily Sinicized, so a good proxy might be something more akin to Ainu at its northernmost edge (assuming they and Okinawans are slightly more Jomonese, less Yahyoized in their make-up)]
In that case, this Dai-Mehrgarh component of ASI or whatever, might have had paternal haplogroups more something like C & D..?

Chad Rohlfsen said...

Here's some stats. Dai is closer to Karelia than Onge. Dai is closer to Onge than Atayal and Ami. Ami is significantly closer to Atayal than Dai. They're not all the same.

result: Gorilla Loschbour Dai Atayal -0.0019 -0.683 14224 14277 326345
result: Gorilla Motala_HG Dai Atayal -0.0012 -0.514 14039 14074 321796
result: Gorilla Karelia_HG Dai Atayal 0.0004 0.131 13906 13895 317554
result: Gorilla Karitiana Dai Atayal 0.0029 1.129 14755 14671 329241
result: Gorilla Karitiana Dai Onge -0.0585 -17.356 15151 17034 329241
result: Gorilla Ami Dai Atayal 0.0255 11.476 15341 14579 329241
result: Gorilla Han Dai Atayal -0.0003 -0.140 14901 14910 329241
result: Gorilla Nganasan Dai Atayal 0.0010 0.429 14756 14727 329241
result: Gorilla Nganasan Ami Atayal -0.0013 -0.541 14392 14429 329241
result: Gorilla Nganasan Dai Ami 0.0022 1.182 14761 14695 329241
result: Gorilla Onge Dai Atayal -0.0039 -1.689 14459 14572 329241
result: Gorilla Onge Dai Ami -0.0028 -1.503 14453 14535 329241
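To see which of these results clear the usual significance bar, the "result:" lines can be parsed and filtered on the Z-score. The sketch below assumes the column layout used throughout the thread (four populations, D, Z, BABA, ABBA, SNP count) and the common rule of thumb that |Z| >= 3 counts as significant; DResult and parse_result are hypothetical names:

```python
from typing import NamedTuple

class DResult(NamedTuple):
    w: str
    x: str
    y: str
    z: str
    d: float
    z_score: float
    baba: int
    abba: int
    nsnps: int

def parse_result(line: str) -> DResult:
    """Parse one 'result:' line (column order as assumed above)."""
    fields = line.split()
    if fields[0] != "result:":
        raise ValueError("not a result line")
    w, x, y, z = fields[1:5]
    d, z_score = float(fields[5]), float(fields[6])
    baba, abba, nsnps = (int(v) for v in fields[7:10])
    return DResult(w, x, y, z, d, z_score, baba, abba, nsnps)

lines = """\
result: Gorilla Ami Dai Atayal 0.0255 11.476 15341 14579 329241
result: Gorilla Karitiana Dai Onge -0.0585 -17.356 15151 17034 329241
result: Gorilla Onge Dai Atayal -0.0039 -1.689 14459 14572 329241
result: Gorilla Onge Dai Ami -0.0028 -1.503 14453 14535 329241
""".splitlines()

for r in map(parse_result, lines):
    verdict = "|Z| >= 3" if abs(r.z_score) >= 3 else "|Z| < 3"
    print(f"D({r.w}, {r.x}; {r.y}, {r.z}) = {r.d:+.4f}, Z = {r.z_score:+.3f} ({verdict})")
```

On these four rows only the Ami and Karitiana statistics pass |Z| >= 3; the two Onge rows fall below it under this rule of thumb.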

Davidski said...

@ bbb

You are probably already well aware of this, but it hasn't been mentioned here. The early Mehrgarh folk were largely Sundadonts, which pretty much necessitates ancestry from SE Asia (or Dai-like).

I have mentioned this in the past in other comment threads. Here is a paper on the topic.

FrankN said...

@ Capra: "There was certainly a late Neolithic migration (or multiple waves of migration) from Southern China/Southeast Asia into India (c. 2000 BC?), bringing Austroasiatic languages .."

The Austroasiatic (Munda) migration is beyond doubt, but AFAIK it should have come from somewhere further south than Yunnan. The question of the Tai-Kadai homeland appears to be nearly as intensely debated as the IE homeland, so I refrain from any opinion on whether during the Late Neolithic the Dai already lived where they are recorded today, or much further toward the South Chinese coast. In any case, they don't appear to be a particularly good proxy for a "pure" population.

Here is another Sino-Indian link that (for climatic/ ecological reasons) possibly went via Yunnan:

"Recent archaeological discoveries in Harappa and Chanhu-daro suggest that sericulture, employing wild silk threads from native silkworm species, existed in South Asia during the time of the Indus Valley Civilization dating between 2450 BC and 2000 BC, while evidence for silk production in China back to around 2570 BC and earlier.[4][5] The Indus silks were obtained from more than one species Antheraea and Philosamia (Eri silk). Antheraea assamensis and A. mylitta were widely used. It is widely believed that silk process techniques of degumming and reeling were purely Chinese technology."

Hence, I tend to stick to my "first half of the 3rd millennium" dating. 2000 BC seems slightly too young for the move into India, though it may be correct for a "tin explorer and bronze producer" scenario running from India to SW China/SEA.

Chad Rohlfsen said...

Honestly, it kind of looks like 2-3 clades of ENA (Australoid/Papuan, Onge, Atayal/Ami), with the Atayal/Ami branch having West Eurasian admixture closer to EHG/MA1 than to Loschbour.

Chad Rohlfsen said...

Never mind. Maybe just two branches. I've got the Papuans as follows:

Onge 95.9%
Denisovan 4.1%

chisq 2.467 tail prob .781509

Standard errors were both 0.8%. The fit is worse with Atayal included, with standard errors over 30%.

Chad Rohlfsen said...

Oddly, though...

Onge 91%
Denisovan 4%
MA1 5%

chisq 1.810 tail prob .770567

Slightly better fit, but with standard errors between 09-12.3%.
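The "tail prob" figures can be checked against the chisq values if, as I assume here, they are upper-tail chi-square probabilities of the kind qpAdm reports. Under that assumption the two fits above come out consistent with 5 and 4 degrees of freedom respectively, which is what you would expect if adding MA1 as a third source uses up one degree of freedom. A standard-library sketch using the closed forms for integer degrees of freedom (rather than, say, scipy.stats.chi2.sf):

```python
import math


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer dof >= 1."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i=0}^{dof/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(dof-1)/2} x^(i-1) / (1*3*...*(2i-1))
    term, total = 1.0, 0.0
    for i in range(1, (dof - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


if __name__ == "__main__":
    # Degrees of freedom are inferred, not stated in the comments above.
    for label, chisq, dof in [("Onge + Denisovan", 2.467, 5), ("Onge + Denisovan + MA1", 1.810, 4)]:
        print(f"{label}: chisq={chisq} dof={dof} tail prob={chi2_sf(chisq, dof):.6f}")
```

The results agree with the quoted .781509 and .770567 to about three decimal places; the small remaining gap comes from chisq being printed with only three decimals.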

FrankN said...

@Chad: " I've got the Papuans as follows..

Onge 95.9%
Denisovan 4.1%"

Interesting, and it confirms something I have been supposing for some time. The possible link may be the "Sea Nomads", today scattered in three groups (Andaman Sea, Southern Sumatra, Borneo/Sulawesi/Southern Philippines), but possibly a far more widespread phenomenon in ancient times (I wonder whether any DNA analysis has ever been done on them).

Why an "anciently more widespread phenomenon"? For a start, Sulawesi/E. Borneo, i.e. today's epicentre of the Sama Bajau, corresponds to the genetic and linguistic "homeland" of the Malagasy people on Madagascar.
Nearby Halmahera, the largest island of the Moluccas, has been shown to be the genetic origin of the Polynesian rat, and as such is believed to be the origin of the Lapita expansion (from 3000 BC) into Melanesia, Samoa and Tonga. There is evidence for obsidian trade from New Britain to NE Borneo at the end of the 4th millennium BC, a distance of 3,500 km!

In addition, around 300 BC, domesticated coconuts from the southern Philippines were shipped to southern Ecuador:

Last but not least, there is the story of the banana: originally domesticated on New Guinea, with additional hybridisation in the southern Philippines and a second one somewhere around the South China Sea, all before 3000 BC. From 2000 BC, there is archeological evidence of bananas in Pakistan. Most banana terms on the Indian subcontinent can be traced back to *qaRutay, a root that developed in the northern Philippines. Reflexes of this root are present both on northern Sumatra and the Nicobars, and along a 'land route' through North Vietnam, Yunnan, Burma and northern Bangladesh, which makes it difficult to define the migration path. By about the same time at the latest, Papuan/East Indonesian domesticates reached East Africa (East African 'banana' terms are a pre-Bantu substrate, which provides a terminus ante quem). A separate transfer brought bananas from around the Celebes Sea directly to West Africa, i.e. without the genotypes in question being found in India, Arabia or East Africa, with the first archeological evidence (Cameroon) dating to around 500 BC.

English 'banana' is believed to have been borrowed from Wolof 'banaana', which may be a reflex of the root 'punti', widespread around Eastern Indonesia and also spread eastward into Melanesia. The origin of Spanish 'plátano' is somewhat obscure: it is assumed to have been borrowed from a Carib language, which, however, would imply a pre-Columbian presence of bananas in the Caribbean.

In short: a maritime network centered around the Celebes Sea, today's home of the Sama Bajau "sea nomads", appears to have existed from at least 3000 BC onwards. Around 300 BC, this network stretched from Cameroon to Ecuador, probably sporadically but intensively enough to transplant bananas and coconuts and to allow the colonisation of Madagascar and Polynesia. The Andamans (plus Sri Lanka, the Maldives, etc.) would have been apt stopovers.
If anybody gets bored over the long winter nights and feels like running some admixture statistics along the above-mentioned routes, say Buginese (Sulawesi) vs. Onge, Mbum (Cameroon) and Amerindians from Ecuador, I'd be curious about the results.

capra internetensis said...


We are doing f4 ratio estimates for the connection between Dai and EHG, so far with no success.
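For readers unfamiliar with f4-ratio estimates: in the usual formulation (the one ADMIXTOOLS' qpF4ratio implements), with A a sister of source B, O an outgroup, X the putative mixture and C the other source, the mixing proportion is estimated as alpha = f4(A, O; X, C) / f4(A, O; B, C). The sketch below computes f4 directly from per-SNP allele frequencies and checks the estimator on a toy example where X is built as an exact alpha:(1 - alpha) mix of B and C. The frequencies are randomly generated and purely hypothetical, and real runs also need block-jackknife standard errors, which this leaves out.

```python
import random


def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d), from allele frequencies."""
    if not len(a) == len(b) == len(c) == len(d):
        raise ValueError("frequency vectors must have equal length")
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def f4_ratio(a, o, x, b, c):
    """Mixing proportion alpha = f4(A, O; X, C) / f4(A, O; B, C)."""
    return f4(a, o, x, c) / f4(a, o, b, c)


if __name__ == "__main__":
    rng = random.Random(42)
    n_snps = 2000
    # Hypothetical toy frequencies: B and C are the sources, A is a noisy
    # sister of B, and O is an unrelated outgroup.
    b = [rng.random() for _ in range(n_snps)]
    c = [rng.random() for _ in range(n_snps)]
    o = [rng.random() for _ in range(n_snps)]
    a = [min(1.0, max(0.0, bi + rng.gauss(0.0, 0.05))) for bi in b]
    alpha = 0.3
    # X is an exact alpha:(1 - alpha) mix of B and C.
    x = [alpha * bi + (1.0 - alpha) * ci for bi, ci in zip(b, c)]
    print(f"true alpha = {alpha}, f4-ratio estimate = {f4_ratio(a, o, x, b, c):.6f}")
```

Because X here is a linear mix of B itself, f4(A, O; X, C) is exactly alpha times f4(A, O; B, C), so the estimate recovers 0.3 up to floating-point error. With real data, B is only a relative of the true source and drift and sampling noise enter, which is where such estimates can fail.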


I am somewhat skeptical of the value of dental morphology in tracing long-term genetic relationships (as opposed to detecting the appearance of a novel population, etc.). Sundadonty in particular seems to be a relatively generic pattern, possibly close to the ancestral form; e.g. Africans and some mixed populations (like South Siberians) cluster near Sundadonts.


I expect it was complicated, as usual. The questions of Daic, Tibeto-Burman, and Austroasiatic homelands have already seen some genetic study at relatively low resolution, but the proliferation of full sequences and the likelihood of more ancient DNA from China are very promising.

Nirjhar007 said...

As I wait for Indian DNA to wash out some of the bullshit here, here's something related to anthropology:
There appear to have been two types of hunters in the Holocene. The first type clustered strongly with Upper Paleolithic Europeans and was concentrated in the Ganges plain and further west; some even find that these North Indians were taller than other Mesolithic populations of Eastern and Western Europe! The second type of hunter contrasted with the Ganges type and was concentrated in the South. A good hypothesis is that the Ganges type was perhaps related to ANE and the southern type to South Eurasians (ASI).
The Harappans were largely Caucasoid, similar to the modern North Indian populations around Haryana etc.
I think we just have to wait for DNA to solve the riddle.

Nirjhar007 said...

^ Make it ANI instead of ANE.

ryukendo kendow said...
postneo said...

Yes, EDAR-like hair is insignificant except in the Northeast.
The dominant hair phenotype resembles that of either Australian Aborigines or Veddas.

Surprisingly, similar hair traits are not seen in the intervening Papuans, Fijians or SE Asians.
Body hair levels are also higher.

Tobus said...

I don't have Pulliya or Paniya (unless they're also known by different names). Do you know which data set they're from? If I'm going to rebuild with CHG I might as well add these pops too.

Karl_K said...

"EDAR mutation for thick straight hair, which swept to fixation in East Asians and Native Americans 10ky ago,"

It was surely long before 10ky ago, as it must have been near fixation for both the founding Native American populations and Chinese Neolithic populations.

Was there any data on the EDAR variant for Anzick-1 or Kennewick Man? I assume MA1 did not carry the 370A variant, but is that actually the case?

ryukendo kendow said...

@ Davidski

David, are you using the Behar et al dataset Paniya?

@ Tobus

If he is, here they are. This set is a gold mine.

ryukendo kendow said...
Ebizur said...

Ryukendo wrote,

"...all populations carrying East Asian post-neolithic ancestry, incl Southeast Asians, Polynesians and Indians such as the Austroasiatics, have high levels of EDAR, while all ENA populations without East Asian post-Neolithic ancestry, such as Papuans and Onge, do not."

Chaubey et al. (2011) have published the following figures for the frequency of the 1540C allele of the EDAR gene in their Indian samples grouped by language family:

Language group                  n     1540C frequency
Tibeto-Burman                   57    0.61
Austroasiatic (Khasi-Aslian)    20    0.40
Austroasiatic (Munda)           379   0.05
Indo-European                   338   0.01
Dravidian                       283   0.00

61% in Tibeto-Burmans (but with only 57 samples), 40% in Khasis (but with only 20 samples), 5% in Mundas, 1% in Aryans, 0% in Dravidians.

The frequency of EDAR 1540C does appear to be moderately high in the Khasis (though not nearly fixed as it is in e.g. Native Americans, northern Han Chinese, or Koreans), but it is actually quite low in Kolarian populations of India. Rather than saying that "Indians such as the Austroasiatics...have high levels of EDAR," I think it would be prudent to say that Munda-speaking populations exhibit non-zero frequencies of the EDAR 1540C allele.
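To put rough numbers on the "only 57 samples" and "only 20 samples" caveats, one can attach a binomial confidence interval to each frequency. The sketch below uses the Wilson score interval and assumes that n counts individuals, so each group contributes 2n chromosomes. Both the interval choice and the diploid counting are my assumptions for illustration, and the interval ignores relatedness and structure within each group.

```python
import math

# EDAR 1540C frequencies as quoted above from Chaubey et al. (2011): (group, n, frequency).
CHAUBEY_2011 = [
    ("Tibeto-Burman", 57, 0.61),
    ("Austroasiatic (Khasi-Aslian)", 20, 0.40),
    ("Austroasiatic (Munda)", 379, 0.05),
    ("Indo-European", 338, 0.01),
    ("Dravidian", 283, 0.00),
]


def wilson_interval(p_hat: float, n_obs: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion p_hat observed in n_obs trials."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    z2 = z * z
    denom = 1.0 + z2 / n_obs
    centre = (p_hat + z2 / (2.0 * n_obs)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1.0 - p_hat) / n_obs + z2 / (4.0 * n_obs * n_obs))
    # Clamp to [0, 1] to absorb floating-point noise at p_hat = 0 or 1.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    for group, n_individuals, freq in CHAUBEY_2011:
        chromosomes = 2 * n_individuals  # assumption: diploid, n counts individuals
        low, high = wilson_interval(freq, chromosomes)
        print(f"{group:30s} n={n_individuals:3d}  1540C={freq:.2f}  95% CI {low:.2f}-{high:.2f}")
```

The Wilson interval is used instead of the simpler p ± 1.96·SE because it stays sensible at a frequency of 0, as in the Dravidian row. Note that the input frequencies are themselves rounded to two decimals.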

Alberto said...


I would think the -ve sign is irrelevant there, no? I mean, it's just because Chad changed the order of the Onge and Gorilla, so both results are negative. And -/- = + (the results would be -ve only if one was -ve and the other +ve).


But it doesn't seem very clear that Dai and Onge form any kind of tight clade. The 6% Dai in Karelia_HG is probably more from a Han-like source from Siberia, so not related to Onge.


Yes, I'm not saying that ASI didn't exist in South India long before as a specific component. But with the samples we have now and the tests run so far, it doesn't seem to show a pattern or signs specifically different from SE Asian. Maybe it's just that we don't have the right samples to see it, but it looks strange that Paniya is no more Onge-like vs. Atayal-like than Dai is. Some kind of homogenization seems to have taken place, even if it didn't bring AA languages, Y hg O, EDAR or straight hair to Paniya (or the opposite to Dai). But let's see whether further tests can actually find differences between the components.

Sisophon said...

Razib shared data which included Paniya samples a few years ago, but I recall that some of the samples looked like they were mislabeled. Or are there two unrelated populations called Paniya? Anyway, when you get the Paniya samples, please check that you have a single population before analyzing them, rather than a meaningless mixture.

In the samples I have, GSM536916 is not the same population as GSM536806, GSM536807 and GSM536808 but they are all labeled Paniya.

And it is possible that I made some copy-and-paste mistake when I was first learning how to work with the data following Razib's tutorials. I am not an expert in this.


Davidski said...

The Paniya I have are all pretty much the same.

Davidski said...

Open Genomes,

Top 9 eigenvectors for Clovis, Kennewick and Saqqaq.

The latter two, however, were missing quite a few markers in this analysis. So they're a bit iffy.

Btw, I don't have the Asian populations you specified.

capra internetensis said...


result: Gorilla Onge Loschbour Karelia_HG -0.0035 -0.526
result: Gorilla Onge Dai Loschbour -0.0593 -12.307

The position of Loschbour is switched, so the negative result of the numerator is really positive, and overall it is negative.

Since Loschbour is insignificantly closer to Onge than Karelia_HG is, using these references Karelia_HG appears to have no East Eurasian ancestry at all.
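The sign bookkeeping in this exchange follows from the symmetries of f4/D statistics: reversing the order within either pair flips the sign, reversing both pairs leaves it unchanged, and swapping the two pairs as wholes also leaves it unchanged. The same rule applies to the Z-score. A small sketch that rewrites a statistic in a chosen orientation, so results from runs with different population orders can be compared directly (the function name is mine, for illustration):

```python
def reorient(pops, value, first_pair, second_pair):
    """Rewrite stat(W, X; Y, Z) = value as stat(first_pair; second_pair).

    pops is the original (W, X, Y, Z). Reversing the order within a pair flips
    the sign; exchanging the two pairs as wholes does not.
    """
    w, x, y, z = pops
    original = {frozenset((w, x)): (w, x), frozenset((y, z)): (y, z)}
    if {frozenset(first_pair), frozenset(second_pair)} != set(original):
        raise ValueError(f"{first_pair} and {second_pair} do not match the pairs in {pops}")
    sign = 1
    for target in (first_pair, second_pair):
        if original[frozenset(target)] != tuple(target):
            sign = -sign  # this pair is reversed relative to the original
    return (*first_pair, *second_pair), sign * value


if __name__ == "__main__":
    # First statistic above, rewritten with Loschbour in the last slot as in the Dai statistic.
    print(reorient(("Gorilla", "Onge", "Loschbour", "Karelia_HG"), -0.0035,
                   ("Gorilla", "Onge"), ("Karelia_HG", "Loschbour")))
```

For example, Gorilla Onge Loschbour Karelia_HG -0.0035 becomes Gorilla Onge Karelia_HG Loschbour +0.0035, which can then be read side by side with Gorilla Onge Dai Loschbour -0.0593. Writing it as Onge Gorilla Karelia_HG Loschbour instead reverses both pairs and keeps -0.0035, which is the "-/- = +" point above.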

Alberto said...


Ah, you are right. I thought only Onge and Gorilla were switched in both stats, but Loschbour is also switched in the first one, and not in the second one.

I think that stat would work with Samara_HG, but not with Karelia_HG. They might have quite different affinity to Onge.

Simon_W said...

@ Open Genomes

Interesting work your 3D PCA, thanks for sharing. But I've made some observations that seem kind of odd:
- The CHG are in no way outliers but cluster closely with some modern people from SE Europe, West Asia and the Caucasus.
- The BA Armenians plot very far from each other. One is like a true outlying pole of genetic variation, much more than the CHG, while another one plots close to central European Bell Beakers, Sintashta, Swedish Battle Axe and modern Latvians! I didn't see anything that extremely northern or divergent among the BA Armenians in previous analyses. So in this PCA the IA Armenian can be modeled as a mix of different BA Armenians.
- A modern Makrani plots close to Estonians, Loschbour and Bichon. That's too odd to be true.
- The Andronovo people are extremely diverse. Some close to Corded Ware, another one far off in the Aleut area. I think that may be the admixed one, so this observation is less odd than the others.

Kristiina said...

Open Genomes, were you able to add Clovis, Kennewick and Saqqaq to your PC analysis or to your 3D model? I find your model very illustrative and advanced!