Tuesday, July 11, 2017

Working topology for Eurasian population structure

Here's my new "basic" qpGraph topology that I'll be using to test phylogenetic and mixture models for Eurasians. I think it reconciles a few key findings from recent scientific literature. Please note that since my main interest is post-Neolithic prehistory of West Eurasia, and in particular the early Indo-European expansions, I don't want to make this model unnecessarily complex by adding "dead end" Upper Paleolithic genomes.

But I welcome ideas on how to improve and make use of this topology, so if, say, adding Ust_Ishim helps, then let's do it. The ancient samples featured in the above graph are listed here and the graph file is available here. Feel free to post your own versions of the graph file in the comments and I'll run them as soon as possible. But please remember to label the samples correctly at all times.

Update 13/07/2017: Thanks to Matt in the comments, here's a neater version of the same model, with a lower worst (highest) Z score and slightly different mixture coefficients. It includes a couple of zero edges, which are generally undesirable, but these might disappear when more populations are added to the topology. The graph file is available here.
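For readers new to qpGraph: the fit of a graph is judged by comparing the f-statistics the graph predicts with those observed in the data, and the "highest Z score" is the worst of those mismatches, measured in standard errors. A minimal sketch of that idea in Python, assuming hypothetical statistic labels and values; this is an illustration, not qpGraph's code or output format.

```python
from dataclasses import dataclass


@dataclass
class FStat:
    label: str       # populations in the statistic (hypothetical labels)
    observed: float  # estimate from the data
    fitted: float    # value implied by the graph topology and edge weights
    se: float        # standard error, e.g. from a block jackknife


def z_score(s: FStat) -> float:
    """Mismatch between data and graph, in standard errors."""
    return (s.observed - s.fitted) / s.se


def worst_outlier(stats):
    """Return the label and Z of the statistic with the largest |Z|."""
    worst = max(stats, key=lambda s: abs(z_score(s)))
    return worst.label, z_score(worst)


# Hypothetical numbers, for illustration only.
stats = [
    FStat("f4(W,X;Y,Z)", 0.0020, 0.0015, 0.0005),
    FStat("f4(W,Y;X,Z)", -0.0010, 0.0008, 0.0006),
]
label, z = worst_outlier(stats)
n_over_2 = sum(abs(z_score(s)) > 2 for s in stats)  # outliers with |Z| > 2
print(label, round(z, 2), n_over_2)
```

A lower worst Z means the graph leaves no single statistic badly unexplained; counting statistics with |Z| > 2 gives a rough sense of how many are poorly fitted.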


Tesmos said...

Is ANE part Eastern Eurasian?

Davidski said...

Looks like it.

Samuel Andrews said...

Wow, super interesting, and it makes a lot of sense. The deep connections AnatoliaN has with IranNeo and CHG are especially interesting. There should be no doubt AnatoliaN has non-Basal Eurasian common ancestors with IranNeo and CHG. There's no other way to explain the mtDNA links between IranNeo and AnatoliaN.

Palacista said...

What is "D", in words, please?

Davidski said...

West Eurasian.

Ryan said...

I think it's been pretty obvious that ANE is part East Asian based on the Y chromosomes. Haplogroup P seems to be either South Asian or Southeast Asian, and K2 is about as firmly rooted in East Asia as one could ever ask.

Have you tried modelling WHG as part East Asian rather than Han as part WHG, though? That's along the lines of what Fu et al. suggested, and I'd be curious to see if that works too. I wouldn't be surprised if the truth is actually more complex, though.

I got further results back on my grandfather (the one with excess Caucasian ancestry) and his Y chromosome. He is R1a-Y2905, and as far as I can tell he is basal R1a-Y2905. Y2905 seems to be poorly understood, though, and is present from Poland to Finland to Kazakhstan. Polish Jews seem to have it at a reasonable frequency. Is there anything else you know about it, David?

His one and only Y-DNA match is a mayor of a town in Tatarstan in Russia. I think he has an Avar name, but I'd be lying if I said I was confident in my ability to tell the difference between an Avar name and a Tatar one.

Any thoughts? Oh, the ancestor in question was born in Austria (now the Czech Republic) the year after the Russian conquest of the Caucasus.


Ryan said...

And pardon the incoherent spelling. I'm on my phone at my construction site waiting for a meeting :/

VOX said...

Davidski, if you put Native Americans (i.e. Karitiana) into this tree's topology, are they still modeled as ~40% ANE / 60% East Asian, or is a revised model produced?

For the king said...

B stands for Basal Eurasian? Were there two closely related Basal Eurasian populations?

Also, what's the A population? Crown Eurasian? Thanks.

epoch2013 said...

But ANE didn't show a clear signal of East Asian ancestry, which was the finding of the Mal'ta paper and which led that paper to conclude that American Indians were a mixture. How can that be with such a large chunk of Han-related affinity? Even with the rather large drifts (203 and 417) separating MA1 and proto-Han.

epoch2013 said...

Maybe add Kostenki 14, as it is closely related to all European HGs and rather old?

Ryan said...

@Epoch - Here's what the Mal'ta paper said about MA-1's East Eurasian admixture:

Putative western Eurasian ancestry in Native Americans does not exclude the possibility of some gene flow into the MA-1 lineage from eastern Eurasian populations. To test this, we compared f3(Yoruba; MA-1, X) and f3(Yoruba; Sardinian, X), where Population X represented one population from a set of worldwide populations. Under a model where MA-1 is from the same lineage as the Sardinians, the ratio of these two statistics for unrelated populations is expected to be 1.0, but East Asians and Oceanians were both observed as being closer to MA-1 than to the Sardinian (Figure SI 27). Since the ratio is ~1.09 for both Oceanians and East Asians in the outgroup-ascertained data, the MA-1 lineage may have absorbed some ancestry from populations ancestral to these groups. Indeed, this is also consistent with the admixture graphs inferred using MixMapper (SI 12). However, a note of caution here is that the MA-1 data is of lower quality than the Sardinian data.

Page 91:

If you look at their TreeMix runs, there's a pretty large residual between Han/Dai and MA-1 too. Interestingly enough, there's also a large residual between Denisova and MA-1.
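The ratio test in the quoted passage compares two outgroup f3 statistics that share the same outgroup (Yoruba) and the same test population X. A minimal sketch in Python, assuming allele frequencies already aligned at the same SNPs and using the plain per-SNP estimator; real analyses add a sample-size bias correction and a block jackknife for standard errors, both omitted here. The function names and toy frequencies are hypothetical.

```python
from statistics import fmean


def outgroup_f3(o, a, b):
    """Outgroup f3(O; A, B): mean over SNPs of (o - a) * (o - b).

    Larger values mean A and B share more drift since splitting from O.
    Plain estimator only: no sample-size bias correction.
    """
    if not (len(o) == len(a) == len(b)):
        raise ValueError("frequency lists must cover the same SNPs")
    return fmean((oi - ai) * (oi - bi) for oi, ai, bi in zip(o, a, b))


def f3_ratio(outgroup, test, reference, x):
    """f3(outgroup; test, X) / f3(outgroup; reference, X).

    If test and reference belong to the same lineage, the ratio is expected
    to be ~1.0 for an X unrelated to either; above 1 means X shares more
    drift with test than with reference.
    """
    return outgroup_f3(outgroup, test, x) / outgroup_f3(outgroup, reference, x)


# Toy allele frequencies at two SNPs (hypothetical, not real data).
outgroup = [0.0, 1.0]   # stands in for Yoruba
x = [0.5, 0.5]          # population X
test = [0.5, 0.5]       # stands in for MA-1
reference = [0.2, 0.8]  # stands in for Sardinian
ratio = f3_ratio(outgroup, test, reference, x)
print(round(ratio, 3))
```

In the paper's setup, a ratio around 1.09 for East Asians and Oceanians is what suggested some eastern Eurasian input into the MA-1 lineage.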

Matt said...

Have to admit, not sure if I literally believe it (35% of post-Basal Eurasian, pre-ENA+ANE divergence into WHG?), and I am not sure if it would work to fit a lot of extra UP samples / the Levant samples, but if it works for modeling admixture between the terminal clades for Anatolia, CHG, WHG, EHG, that's pretty cool.

Just to see if these work, or if they get "looping":

Adding Yamnaya and Iberia_CA to the graph:

Adding Corded_Ware and Bell_Beaker to the graph:

Adding Lithuanian and Sardinian to the graph:

Davidski said...

That C2 > pANE edge isn't really 36% because first there's an 8% West Eurasian edge from D2 to C1.

Ryan said...

It's 33% when you account for that.
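The adjustment works by multiplying mixture weights along the path: if the source of the 36% edge had itself received 8% West Eurasian (D2) input, only the remaining 92% of that edge is ancestry from the original lineage. A minimal sketch using the numbers quoted in the comments, assuming the graph is read the way Davidski describes it; the helper name is hypothetical.

```python
from math import prod


def path_weight(edge_weights):
    """Share of ancestry passed along one path of an admixture graph:
    the product of the mixture weights on the edges along that path."""
    return prod(edge_weights)


nominal_edge = 0.36  # the C2 > pANE edge as drawn
d2_input = 0.08      # West Eurasian (D2) share received upstream at C1
c_lineage_share = path_weight([nominal_edge, 1 - d2_input])
print(f"{c_lineage_share:.1%}")  # 0.36 * 0.92 = 0.3312
```

The same rule applies to any chain of admixture edges: the effective contribution is the product of the weights, which is why nested edges always shrink the nominal percentage.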

I'm not sure why folks wouldn't believe it though. Y-chromosome aside, there's also EDAR showing up in ANE-admixed genomes in Europe.

Matt said...

Does this work, with lots of the edges from the base A, D and C groups fleshed out into separate descendants:

Chad Rohlfsen said...

Adding Ust_Ishim will really change it, as will adding a UP European. I'll put another one up soon.

Davidski said...


Your last model has too many edges at A.

Btw, in regards to the first model you posted, it seems like it's stuck. But as far as I can tell, you're trying to put CHG/EHG into Iberia_Chalcolithic. Isn't that unnecessarily complex, considering that there are no signals of CHG/EHG in Iberia_Chalcolithic in any other analyses?

Davidski said...


I don't have any insights about your seemingly exotic ancestor from the information you're giving me. It just seems that he was somewhat exotic compared to the average North/Central European.

Ryan said...

I meant more about the haplogroup lol, but no worries. :3

MaxT said...

What is that 35% in Anatolia_N? Is it UHG?

Matt said...

@Davidski: Your last model has too many edges at A.

Amended to:

Isn't that unnecessarily complex, considering that there are no signals of CHG/EHG in Iberia_Chalcolithic in any other analyses?

Yeah, it's based on the same model as the Sardinian/Lithuanian and Bell_Beaker/Corded_Ware graph. I just substituted in the labels rather than taking any time to modify the topology on the thinking that it would just produce a 0% edge from a CHG/EHG into Iberia_Chalcolithic. I guess the other two don't work at all either?

Davidski said...


Yeah, this is neater and with a much lower Z. Nice quick run too.

This might work as a basic topology. If not, it might have to be downsized accordingly for each test. The others were just hanging for ages.

Project "Magnus Ducatus Lituaniae" said...

Could you include Iran_ChL and Levant_N into your tree?

Project "Magnus Ducatus Lituaniae" said...

It would be interesting to see how the Z score changes after a) placing Kostenki as a sister node of D1 and D1a (under the D parent node), or b) placing Ust-Ishim as a sister clade of D1 (or maybe even D).

Chad Rohlfsen said...

I'm not sure D2 exists, as in, flowing into East Asians. No UP Euro will do that. GoyetQ is closer to Ust-Ishim with many outgroups, and also closer to East Asians than other UP Euros. Still, GoyetQ is no closer to East Asians than Ust-Ishim. Flow seems to be just east to west.

Davidski said...


This is the same with Levant_N added. It's just a more basal sister clade of Barcin.

Not sure yet how to model Iran_ChL exactly. Have to think about it.

Rob said...

RE: Matt's last tree that Dave put up:

1) It makes sense for ANF to have both UHG and actual WHG admixture.

2) Also in that tree (reading it literally, which undoubtedly oversimplifies), the 'UHG' (here 'D1a1') in ANF is the same as in Iran Neol. and the CHG foragers.

The sister clade of D1a1 (D1a2), represents the 'west Eurasian' component in ANE (Mal'ta, AFG), the other being ~ENA (30%).

3) The parallel clade of D1a is the one that ended up as Villabruna.

With regard to point 2, I'm not sure it was 100% clear after the Fu paper on Ice Age Europe exactly which relations and affinities mixed into CHG, Iran Neolithic, Levant Neolithic, etc. How good a fit is Matt's tree?

One last thing: can we 'throw in' a couple of modern populations from Europe (say, Danish, Greek, Spanish, Finnish) as an extra level on that tree, to see how they are shaped as a function of these five or so fundamental late Upper Palaeolithic groups?

Davidski said...

I'm not sure if it's possible to add another level to this topology. I can't even add Iran_ChL right now. The run just hangs.

It might be necessary to strip parts of the tree when testing different models with new populations.

Ryan said...

On the Iberian Chalcolithic front, I think it'd also be interesting (at some point) to try to verify that claim of a Bell Beaker sample that has EHG but not CHG (and include Yamnaya in the graph too ideally).

Simon said...

Do any of the nodes bear any resemblance to the 'Basal-rich component'?

Josep Coderch said...

Does the D3 node represent Vestonice?

Ryan said...

@Simon - B is Basal Eurasian.

Matt said...

@ Davidski, cool that that model worked and lowered the stats.

There were a few arbitrary choices I made there in how to adapt the model while moving the edges away from the base to separate nodes, and now that I've seen the final model, I have a few questions about them*.

Here's one where I've kind of tried to make some opposite choices to the previous one and restructured around EHG, just to see if it works better or worse or what:

(I suspect it will work worse for reasons I don't fully understand!)

*I guess the big question is around pEHG in this model; it doesn't seem to cohere very well with an ANE+WHG type model in the normal sense. D2 in the topology I wrote up has some similarities to WHG, but is really like WHG without the majority Paleolithic European component (A2), and E isn't so much like classical ANE as a mix between East Asian and a Near Eastern population lacking Basal Eurasian... The above has an EHG and ANE model that's more like the "norm" as I understand it, whether it works better or worse.

George Okromchedlishvili said...

@Ryan - can you tell us the name of the guy your grandfather has a match with?
Couldn't he (your grandfather, I mean) simply have some Jewish ancestry?

epoch2013 said...


Hugely off-topic, but I thought you'd like to know. I emailed the archaeologists of the Dalfsen Trechterbeker graveyard, which reputedly yielded some bone and tooth material that was investigated for DNA. It turned out to be too degraded. No DNA was retrieved.

Davidski said...


Only one outlier over 2.

Matt said...

@ Davidski, cheers. Looks like the only difference in fit between the two is that the earlier model resolves relatedness between Han, Andamanese, MA1 and other West Eurasians slightly better. There are 5 stats beyond the highest of the earlier model, and they all involve at least two, usually three, of those populations.

Largely it seems that MA1 is slightly less related to Han than this model supposes, and that may be why the other model had a slightly improved fit. In that earlier model more of ANE's ancestry came from the Near Eastern HG (branches of the D1a node), rather than the D2 node, contra EHG, which took more as a direct donation from D2. So that seems to have reduced relatedness between Han and ANE and increased relatedness between Iran / Barcin and MA1 in useful ways.

(The shuffling of the A2 node and shuffling of how CHG+Barcin share ancestry doesn't seem to have made much difference between the two!).

saman sistani said...

Nice work, Davidski. Looking at the graph, I see ANE on a completely different timeline and branch from CHG and IranN. Does this mean they don't harbour ANE after all?

Davidski said...

I see ANE in a complete different timeline and branch from CHG and IranN, does this mean they dont harbour ANE afterall?

They do harbor ancestry closely related to ANE (via node D), except that they don't have the East Eurasian input that ANE has, or at least not much of it.

So it's not surprising that in other analyses both CHG and Iran_N look significantly ANE.

It seems like all of the non-Basal West Eurasian ancestry comes from the same ANE-related source, except proto-ANE picked up something eastern in Siberia or Central Asia, while proto-WHG picked up something old from Upper Paleolithic Europeans, which isn't as divergent from East Eurasians as the main West Eurasian clade.

saman sistani said...

Thank you, Davidski, but that same node contributes 36% to Anatolian Neolithic, and various methods have never shown them to have any ANE-like ancestry. Correct me if I'm wrong, please.

Chad Rohlfsen said...


I see something completely different in my runs. I'll share that as soon as I can.

Davidski said...

@saman sistani

It's all relative, especially here since these are very closely related populations and recent drift can really mask ancient phylogenetic relationships.

Lazaridis et al. 2016 modeled Anatolia_N as something like 40% Iran_N, while I found CHG-related and all sorts of eastern signals in it.

Bottom line is that at this sort of fine scale there are infinitely many ways to model these relationships, so saying a particular population lacks ANE is only true in a very specific context.

saman sistani said...

Got it. So basically that ~40% IranN/CHG in AnatolianN that you and Lazaridis observed is being interpreted in that qpGraph run as node D. Are you able to run Natufian in there?

Davidski said...

Natufian looks like Barcin_N in the latest tree, just more basal.

So this is basically what my PCAs show. But it might be possible to show other types of influence in the Natufians by adding an extra edge from somewhere, and thus lowering the Z score. I haven't tried that yet, though.

Matt said...

@ Davidski:

Just dropping the GoyetQ116-1 and Ust_Ishim into the models, to test if this really makes the Z blow up or not:



Tesmos said...


Man, that sucks. Generally, the Netherlands is a bad place for ancient DNA, so it's not surprising.

Davidski said...


First run...

Second run...

> 2 edges at A2a

epoch2013 said...


Especially since they found a Single Grave Culture (=CWC) battle axe in the field! It was out of any context, because it was found underneath a medieval ploughed field.

Ryan said...

@George - Ayrat Khayrullin is the match. He shares a name with (and I'm inclined to think is the same person as) the mayor of Almetyevsk in Tatarstan. I can't recall his most distant paternal ancestor's name offhand; I'll have to look that up and post it in a later thread.

Jewish is possible, but not Ashkenazi I think. Too Mediterranean. Crimean Karaites or Krymchaks might fit.

The identity and ethnicity of this mystery fellow weren't ever recorded anywhere, but apparently they were known in the community, and my great-grandfather being 1/4 of whatever this group was made the Nazis consider him less than fully Aryan. The Nazis weren't exactly picky in deciding which ethnicities to dislike, though: my great-grandmother was in a similar situation for being 1/4 Czech.

Ryan said...

@George - Logged into FTDNA. Best match (and only match really) is the fellow mentioned above. Distance of 2 at 25 markers. Most distant male ancestor is:

Габдерахимов Габдельвахит, 1840-1920, village of Каркали (Karkali), Tatar

Any insights appreciated.

Matt said...

@ Davidski: -3.70, not as bad a worst Z as I thought it could be.

Take two on the second run:

Davidski said...

That one looks way off. Z 9.683

epoch2013 said...


Bit offtopic, but could you do:

Han, Kostenki14, Yamnaya, Mbuti
Han, GoyetQ116-1, Yamnaya, Mbuti
Han, El_Miron, Yamnaya, Mbuti
Han, Vestonice16, Yamnaya, Mbuti
Han, Villabruna, Yamnaya, Mbuti
Han, MA1, Yamnaya, Mbuti
Han, Bichon, Yamnaya, Mbuti
Han, Loschbour, Yamnaya, Mbuti
Han, LaBrana1, Yamnaya, Mbuti
Han, Hungary_HG, Yamnaya, Mbuti

Also, you had a list of proper names for samples. I forgot where to find that.
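The statistics requested above have the form D(W, X; Y, Z). A minimal sketch of the standard allele-frequency form of the D statistic, assuming frequencies for the four populations at the same SNPs. In real use the Z score comes from a block jackknife over the genome, which is omitted here; the function name and toy frequencies are hypothetical.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies in four populations.

    D = sum((w - x)(y - z)) / sum((w + x - 2wx)(y + z - 2yz)).
    Positive: W shares more alleles with Y (or X with Z) than a tree predicts.
    Negative: X shares more alleles with Y (or W with Z).
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations must be typed at the same SNPs")
    num = den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Toy frequencies at three SNPs (hypothetical, not real data),
# laid out as D(Han, X; Yamnaya, Mbuti).
han = [1.0, 0.0, 0.6]
test = [0.0, 0.0, 0.4]
yamnaya = [1.0, 0.0, 0.5]
mbuti = [0.0, 1.0, 0.5]
d = d_statistic(han, test, yamnaya, mbuti)
print(round(d, 3))
```

Swapping W and X flips the sign, so each line in the list can be read directly as whether the test sample or Han shares more alleles with Yamnaya relative to the Mbuti outgroup.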