Tuesday, July 11, 2017

Working topology for Eurasian population structure

Here's my new "basic" qpGraph topology that I'll be using to test phylogenetic and mixture models for Eurasians. I think it reconciles a few key findings from recent scientific literature. Please note that since my main interest is post-Neolithic prehistory of West Eurasia, and in particular the early Indo-European expansions, I don't want to make this model unnecessarily complex by adding "dead end" Upper Paleolithic genomes.

But I welcome ideas on how to improve and make use of this topology, so if, say, adding Ust_Ishim helps, then let's do it. The ancient samples featured in the above graph are listed here and the graph file is available here. Feel free to post your own versions of the graph file in the comments and I'll run them as soon as possible. But please remember to label the samples correctly at all times.

Update 13/07/2017: Thanks to Matt in the comments, here's a neater version of the same model, with a lower worst (highest) Z score and slightly different mixture coefficients. It includes a couple of zero edges, which are generally undesirable, but these might disappear when more populations are added to the topology. The graph file is available here.


  1. Is ANE part Eastern Eurasian?

  2. Wow, super interesting, and it makes a lot of sense. The deep connections AnatoliaN has with IranNeo and CHG are especially interesting. There should be no doubt AnatoliaN has non-Basal Eurasian common ancestors with IranNeo and CHG. There's no other way to explain the mtDNA links between IranNeo and AnatoliaN.

  3. I think it's been pretty obvious that ANE is part East Asian based on the Y chromosomes. Haplogroup P seems to be either South Asian or Southeast Asian, and K2 is about as firmly rooted in East Asia as one could ever ask.

    Have you tried modelling WHG as part East Asian rather than Han as part WHG though? That's along the lines of what Fu et al. suggested and I'd be curious to see if that works too. I wouldn't be surprised if the truth is actually more complex though.

    I got further results back on my grandfather (the one with excess Caucasian ancestry) and his Y chromosome. He is R1a-Y2905, and as far as I can tell it is basal R1a-Y2905. It looks like Y2905 is poorly understood though, and is present from Poland to Finland to Kazakhstan. Polish Jews seem to have it at a reasonable frequency. Is there anything else you know about it, David?

    His one and only Y DNA match is from a mayor of a town in Tatarstan in Russia. I think he has an Avar name but I'd be lying if I said I am confident in my ability to tell the difference between an Avar name and a Tatar one.

    Any thoughts? Oh, the ancestor in question was born in Austria (now the Czech Republic) the year after the Russian conquest of the Caucasus.


  4. And pardon the incoherent spelling. I'm on my phone at my construction site waiting for a meeting :/

  5. Davidski, if you put Native Americans (i.e. Karitiana) into this topology, are they still modeled as ~40% ANE / 60% East Asian, or is a revised model produced?

  6. B stands for Basal Eurasian? Were there two closely related Basal Eurasian populations?

    Also, what's the A population? Crown Eurasian? Thanks.

  7. But ANE didn't show a clear signal of East-Asians, which was the discovery in the Mal'ta paper and which led that paper to conclude American Indians were a mixture. How can that be with such a large chunk of Han related affinity? Even with the rather large drifts (203 and 417) separating MA1 and proto-Han.

  8. Maybe add Kostenki 14 as it is well related to all Euro HG's and rather old?

  9. @Epoch - Here's what the Mal'ta paper said about MA-1's East Eurasian admixture:

    Putative western Eurasian ancestry in Native Americans does not exclude the
    possibility of some gene flow into the MA-1 lineage from eastern Eurasian
    populations. To test this, we compared f3(Yoruba; MA-1, X) and f3(Yoruba; Sardinian,
    X), where Population X represented one population from a set of worldwide
    populations. Under a model where MA-1 is from the same lineage as the Sardinians,
    the ratio of these two statistics for unrelated populations is expected to be 1.0, but
    East Asians and Oceanians were both observed as being closer to MA-1 than to the
    Sardinian (Figure SI 27). Since the ratio is ~1.09 for both Oceanians and East Asians
    in the outgroup-ascertained data, the MA-1 lineage may have absorbed some ancestry
    from populations ancestral to these groups. Indeed, this is also consistent with the
    admixture graphs inferred using MixMapper (SI 12). However, a note of caution here
    is that the MA-1 data is of lower quality than the Sardinian data.

    Page 91: https://www.nature.com/article-assets/npg/nature/journal/v505/n7481/extref/nature12736-s1.pdf

    If you look at their treemix runs there's a pretty large residual between Han/Dai and MA-1 too. Also a large residual between Denisova and MA-1 interestingly enough.
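
As a rough illustration of the comparison quoted above, the ratio of the two outgroup f3 statistics can be sketched in a few lines of Python. This is a minimal sketch, not the paper's pipeline: it uses the plain estimator, the mean of (o - a)(o - b) across SNPs, without the bias correction or standard errors that dedicated tools report, and the allele frequencies are made-up placeholders rather than real data.

```python
import numpy as np

def outgroup_f3(outgroup, pop_a, pop_b):
    """Plain f3(O; A, B): mean over SNPs of (o - a) * (o - b)."""
    o, a, b = (np.asarray(v, dtype=float) for v in (outgroup, pop_a, pop_b))
    return float(np.mean((o - a) * (o - b)))

def f3_ratio(outgroup, test, reference, pop_x):
    """Ratio f3(O; test, X) / f3(O; reference, X)."""
    return outgroup_f3(outgroup, test, pop_x) / outgroup_f3(outgroup, reference, pop_x)

# Hypothetical allele frequencies at four SNPs (illustration only).
yoruba    = [0.5, 0.5, 0.5, 0.5]
ma1       = [0.8, 0.2, 0.6, 0.5]
sardinian = [0.7, 0.3, 0.5, 0.5]
pop_x     = [0.7, 0.3, 0.6, 0.5]

ratio = f3_ratio(yoruba, ma1, sardinian, pop_x)
print(f"f3 ratio (MA-1 vs Sardinian): {ratio:.3f}")
```

A ratio above 1.0 means X shares more drift with the test population than with the reference, which is the pattern the paper reports for East Asians and Oceanians (~1.09).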

  10. Have to admit, not sure if I literally believe it (35% of post-Basal Eurasian, pre-ENA+ANE divergence into WHG?), and I am not sure if it would work to fit a lot of extra UP samples / the Levant samples, but if it works for modeling admixture between the terminal clades for Anatolia, CHG, WHG, EHG, that's pretty cool.

    Just to see if these work, or if they get "looping":

    Adding Yamnaya and Iberia_CA to the graph: https://pastebin.com/7E9jZmBN

    Adding Corded_Ware and Bell_Beaker to the graph: https://pastebin.com/TD4pcgeF

    Adding Lithuanian and Sardinian to the graph: https://pastebin.com/2fYb4Yax

  11. That C2 > pANE edge isn't really 36% because first there's an 8% West Eurasian edge from D2 to C1.

  12. It's 33% when you account for that.

    I'm not sure why folks wouldn't believe it though. Y-chromosome aside, there's also EDAR showing up in ANE-admixed genomes in Europe.
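
The adjustment from 36% to 33% can be checked with simple arithmetic. This sketch assumes, as the two comments above imply, that the 36% edge into proto-ANE descends from C1, which itself carries the 8% West Eurasian edge from D2; the function name is purely illustrative.

```python
def effective_share(edge_weight, upstream_dilution):
    """Share of ancestry left on an admixture edge after discounting
    the fraction of its source that came from elsewhere upstream."""
    return edge_weight * (1.0 - upstream_dilution)

# 36% edge from C2 into pANE, with 8% of C1 coming from D2 (West Eurasian).
share = effective_share(0.36, 0.08)
print(f"{share:.1%}")  # 33.1%
```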

  13. Does this work, with lots of the edges from the base A, D and C groups fleshed out into separate descendants: https://pastebin.com/gAuJEUNx

  14. Adding Ust_Ishim will really change it. As well as a UP Euro. I'll put another up soon.

  15. @Matt

    Your last model has too many edges at A.

    Btw, in regards to the first model you posted, it seems like it's stuck. But as far as I can tell, you're trying to put CHG/EHG into Iberia_Chalcolithic. Isn't that unnecessarily complex, considering that there are no signals of CHG/EHG in Iberia_Chalcolithic in any other analyses?

  16. @Ryan

    I don't have any insights about your seemingly exotic ancestor from the information you're giving me. It just seems that he was somewhat exotic compared to the average North/Central European.

    1. I meant more about the haplogroup lol but no worries. :3

  17. What is that 35% in Anatolia_N, is it UHG?

  18. @Davidski: Your last model has too many edges at A.

    Amended to: https://pastebin.com/sizs8Hxt

    Isn't that unnecessarily complex, considering that there are no signals of CHG/EHG in Iberia_Chalcolithic in any other analyses?

    Yeah, it's based on the same model as the Sardinian/Lithuanian and Bell_Beaker/Corded_Ware graph. I just substituted in the labels rather than taking any time to modify the topology on the thinking that it would just produce a 0% edge from a CHG/EHG into Iberia_Chalcolithic. I guess the other two don't work at all either?

  19. @Matt

    Yeah, this is neater and with a much lower Z. Nice quick run too.


    This might work as a basic topology. If not, it might have to be downsized accordingly for each test. The others were just hanging for ages.

  20. Could you include Iran_ChL and Levant_N into your tree?

  21. It will be interesting to see how the Z score changes after a) placing Kostenki as a sister node of D1 and D1a (under the D parent node) or b) placing Ust-Ishim as a sister clade of D1 (or maybe even D).

  22. I'm not sure if D2 exists. As in, into East Asians. No UP Euro will do that. GoyetQ is closer to Ust-Ishim with many outgroups, and also closer to East Asians than other UP Euros. Still, GoyetQ is no closer to East Asians than Ust-Ishim. Flow seems just East to West.

  23. @Vadim

    This is the same with Levant_N added. It's just a more basal sister clade of Barcin.


    Not sure yet how to model Iran_ChL exactly. Have to think about it.

  24. RE: Matt's last tree that Dave put up (https://drive.google.com/file/d/0B8XSV9HEoqpFLUNfT2ZWLTFDdVU/view?usp=sharing)

    1) It makes sense for ANF to have both an UHG and actual WHG admixture.

    2) Also in that tree (reading it literally, which undoubtedly oversimplifies), the 'UHG' - here 'D1a1' - in ANF is the same as in Iran Neol. and the CHG foragers.

    The sister clade of D1a1 (D1a2), represents the 'west Eurasian' component in ANE (Mal'ta, AFG), the other being ~ENA (30%).

    3) The parallel clade of D1a is that which made it as Villabruna.

    With regard to point 2, I'm not sure it was 100% clear after the Fu paper on Ice Age Europe what the exact relations and affinities were of the groups that mixed into CHG, Iran Neolithic, Levant Neolithic, etc. How good a fit is Matt's tree?

    One last thing, can we 'throw in' as an extra level onto that tree a couple of modern populations from Europe to see how they are shaped as a function of these 5 or so fundamental late Upper Palaeolithic groups? (Say, Danish, Greek, Spanish, Finnish.)

  25. I'm not sure if it's possible to add another level to this topology. I can't even add Iran_ChL right now. The run just hangs.

    It might be necessary to strip parts of the tree when testing different models with new populations.

  26. On the Iberian Chalcolithic front, I think it'd also be interesting (at some point) to try to verify that claim of a Bell Beaker sample that has EHG but not CHG (and include Yamnaya in the graph too ideally).

  27. Do any of the nodes bear any resemblance to the 'Basal-rich component'?

  28. Does the D3 node represent Vestonice?

  29. @Simon - B is Basal Eurasian.

  30. @ Davidski, cool that that model worked and lowered the stats.

    There were a few arbitrary choices I made there in how to adapt the model while moving the edges away from the base to separate nodes, and a few questions I have about that seeing the final model*.

    Here's one where I've kind of tried to make some opposite choices to the previous one and restructured around EHG, just to see if it works better or worse or what: https://pastebin.com/nxfJNdqw

    (I suspect it will work worse for reasons I don't fully understand!)

    *I guess the big question is around pEHG in this model; it doesn't seem to cohere very well to a ANE+WHG type model in the normal sense. D2 in the topology I wrote up has some similarities to WHG, but is really like WHG without majority Paleolithic European (A2), and E isn't really so much like classical ANE as a mix between a Near Eastern population, only without Basal Eurasian, and East Asian... The above has an EHG and ANE model that's more like the "norm" as I understand it... Whether or not it will work better or worse.

  31. @Ryan - can you tell the name of this guy your Grandfather has a match with?
    Couldn't he simply have some Jewish ancestry (Grandfather I mean)?

  32. @Tesmos

    Hugely offtopic, but I'd think you'd like to know. I emailed the archaeologists of the Dalfsen Trechterbeker-graveyard which reputedly yielded some bone and tooth material that was investigated for DNA. Turned out it was too degraded. No DNA was retrieved.

  33. @Matt


    Only one outlier over 2.


  34. @ Davidski, cheers. Looks like the only difference in fit between the two is that the earlier model resolves relatedness between Han, Andamanese, MA1 and other West Eurasians slightly better. There are 5 stats beyond the highest of the earlier model, and they all involve at least 2, usually 3 of those populations.

    Largely it seems that MA1 is slightly less related to Han than this model supposes, and that may be why the other model had a slightly improved fit. In that earlier model more of ANE's ancestry came from the Near Eastern HG (branches of the D1a node), rather than the D2 node, contra EHG which took more as a direct donation from D2. So that seems to have reduced relatedness between Han and ANE and increased relatedness between Iran / Barcin and MA1 in useful ways.

    (The shuffling of the A2 node and shuffling of how CHG+Barcin share ancestry doesn't seem to have made much difference between the two!).

  35. Nice work Davidski. Looking at the graph, I see ANE in a completely different timeline and branch from CHG and IranN. Does this mean they don't harbour ANE after all?

  36. I see ANE in a completely different timeline and branch from CHG and IranN. Does this mean they don't harbour ANE after all?

    They do harbor ancestry closely related to ANE (via node D), except that they don't have the East Eurasian input that ANE has, or at least not much of it.

    So it's not surprising that in other analyses both CHG and Iran_N look significantly ANE.

    It seems like all of the non-Basal West Eurasian ancestry comes from the same ANE-related source, except proto-ANE picked up something eastern in Siberia or Central Asia, while proto-WHG picked up something old from Upper Paleolithic Europeans, which isn't as divergent from East Eurasians as the main West Eurasian clade.

  37. Thank you Davidski, but that same node contributes 36% to Anatolian Neolithic, and they were never shown to have any ANE-like ancestry through various methods. Correct me if I'm wrong, please.

  38. David,

    I see something completely different in my runs. I'll share that as soon as I can.

  39. @saman sistani

    It's all relative, especially here since these are very closely related populations and recent drift can really mask ancient phylogenetic relationships.

    Lazaridis et al. 2016 modeled Anatolia_N as something like 40% Iran_N, while I found CHG-related and all sorts of eastern signals in it.


    Bottom line is that at this sort of fine scale there are countless ways to model these relationships, so saying a particular population lacks ANE is only true in a very specific context.

  40. I got you, so basically that ~40% IranN/CHG in AnatolianN that you and Lazaridis observed is being interpreted by treegraph in that run as node D. Are you able to run Natufian in there?

  41. Natufian looks like Barcin_N in the latest tree, just more basal.


    So this is basically what my PCA shows. But it might be possible to show other types of influence in the Natufians by adding an extra edge from somewhere, and thus lowering the Z score. But I haven't tried that yet.

  42. @ Davidski:

    Just dropping the GoyetQ116-1 and Ust_Ishim into the models, to test if this really makes the Z blow up or not:

    1: https://pastebin.com/NJCmcWRb
    2: https://pastebin.com/RjaCg7Rm


  43. @epoch2013,

    Man, that sucks. Generally, the Netherlands is a bad place for ancient DNA so it's not surprising.

  44. @Matt

    First run...



    Second run...

    > 2 edges at A2a

  45. @Tesmos

    Especially since they found a Single Grave Culture (=CWC) battle axe in the field! Out of any context because found underneath a medieval ploughed field.

  46. @George - Ayrat Khayrullin is the match. He shares a name with (and I'm inclined to think is the same person as) the mayor of Almetyevsk in Tatarstan. I can't recall his most distant paternal ancestor's name offhand - I'll have to look that up and post it in a later thread.

    Jewish is possible, but not Ashkenazi I think. Too Mediterranean. Crimean Karaites or Krymchaks might fit.

    The identity and ethnicity of this mystery fellow wasn't ever recorded anywhere, but apparently it was known in the community, and my great-grandfather being 1/4 of whatever this group was made the Nazis consider him less than fully Aryan. The Nazis weren't exactly picky in deciding which ethnicities to dislike though - my great-grandmother was in a similar situation for being 1/4 Czech.

  47. @George - Logged into FTDNA. Best match (and only match really) is the fellow mentioned above. Distance of 2 at 25 markers. Most distant male ancestor is:

    Габдерахимов Габдельвахит 1840-1920 д.Каркали Tatar

    Any insights appreciated.

  48. @ Davidski: -3.70, not as bad a worst Z as I thought it could be.

    Take two on the second run: https://pastebin.com/fDEwwa99

  49. That one looks way off. Z 9.683

  50. @David

    Bit offtopic, but could you do:

    Han, Kostenki14, Yamnaya, Mbuti
    Han, GoyetQ116-1, Yamnaya, Mbuti
    Han, El_Miron, Yamnaya, Mbuti
    Han, Vestonice16, Yamnaya, Mbuti
    Han, Villabruna, Yamnaya, Mbuti
    Han, MA1, Yamnaya, Mbuti
    Han, Bichon, Yamnaya, Mbuti
    Han, Loschbour, Yamnaya, Mbuti
    Han, LaBrana1, Yamnaya, Mbuti
    Han, Hungary_HG, Yamnaya, Mbuti

    Also, you had a list of proper names for samples. I forgot where to find that.
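
For anyone wanting to set up a batch like this, here is a minimal sketch that builds the ten quadruples and evaluates a D statistic from allele frequencies using the ADMIXTOOLS-style definition, the sum of (w - x)(y - z) over the sum of (w + x - 2wx)(y + z - 2yz). The function names and the example frequencies are hypothetical, and the layout of four whitespace-separated labels per line is an assumption about what qpDstat expects in its population list, so check it against the ADMIXTOOLS documentation before relying on it.

```python
import numpy as np

TEST_POPS = ["Kostenki14", "GoyetQ116-1", "El_Miron", "Vestonice16", "Villabruna",
             "MA1", "Bichon", "Loschbour", "LaBrana1", "Hungary_HG"]

def quadruple_lines(pops, w="Han", y="Yamnaya", z="Mbuti"):
    """One 'W X Y Z' line per test population X (assumed list layout)."""
    return [f"{w} {x} {y} {z}" for x in pops]

def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies, no standard error."""
    w, x, y, z = (np.asarray(v, dtype=float) for v in (w, x, y, z))
    num = np.sum((w - x) * (y - z))
    den = np.sum((w + x - 2 * w * x) * (y + z - 2 * y * z))
    return float(num / den)

lines = quadruple_lines(TEST_POPS)
print("\n".join(lines))

# Hypothetical frequencies at two SNPs: identical W and X give D = 0.
print(d_statistic([0.5, 0.2], [0.5, 0.2], [0.3, 0.6], [0.1, 0.9]))
```

The sketch omits the block jackknife that produces the Z scores discussed in this thread; it only shows how the point estimate is formed.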


  51. Interesting model with Oase1 as a very early West Eurasian.


