
Wednesday, July 10, 2019

Global25 workshop 4: a neighbour joining tree

Phylogenetic trees are easy to produce, but there are countless ways to run them, and, depending on the input data, some methods are a lot more effective than others. In this tutorial I'm going to demonstrate one method that has worked well for me when looking at the fine-scale genetic relationships between ancient and present-day human populations with my Global25 data.

To get started download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Clustering > Neighbour joining. Here's a screen cap of me doing just that...

Then, from the tabs on the right, choose Chord as the similarity index and MAR_Iberomaurusian, the most distinct unit in the datasheet, as the root. PAST offers an exceptionally large range of similarity indices and they generally produce similar results, but, in my experience, Chord creates some of the most visually pleasing outcomes when dealing with fine-scale genetic substructures.

This is the tree you should see after exporting the image via the graph settings tab in PAST, and, if you like, rotating it 90 degrees with the image editing software of your choice. Note the fairly substantial differences between the populations from Northwestern Europe, which are often difficult to tease apart in such analyses.
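For anyone curious about what happens behind the Chord and Neighbour joining options, here's a minimal Python sketch of the two steps. It is not PAST's code. It assumes the Chord index is the Euclidean distance between vectors scaled to unit length, and that the tree is built with the standard Saitou-Nei neighbour joining procedure. The function names and the toy coordinates are made up for illustration, and the sketch returns an unrooted topology only (no branch lengths, no rooting on an outgroup).

```python
import math


def chord_distance(x, y):
    """Euclidean distance between x and y after scaling both to unit length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # max() guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, 2.0 - 2.0 * dot / (norm_x * norm_y)))


def neighbour_joining(labels, coords):
    """Return an unrooted tree topology as nested tuples of labels."""
    nodes = list(labels)
    dist = [[chord_distance(a, b) for b in coords] for a in coords]
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in dist]
        # Pick the pair (i, j) that minimises the Q criterion.
        _, i, j = min(
            ((n - 2) * dist[i][j] - totals[i] - totals[j], i, j)
            for i in range(n) for j in range(i + 1, n)
        )
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node.
        new_row = [0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in keep]
        dist = [[dist[a][b] for b in keep] + [new_row[p]]
                for p, a in enumerate(keep)]
        dist.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]
    return tuple(nodes)


# Toy two-dimensional vectors (NOT Global25 data): A/B and C/D form two pairs.
toy = {"A": (1.0, 0.1), "B": (1.0, 0.2), "C": (0.1, 1.0), "D": (0.2, 1.0)}
tree = neighbour_joining(list(toy), list(toy.values()))
print(tree)
```

With real Global25 rows each vector would have 25 values instead of two. The sketch assumes no all-zero rows, since those can't be scaled to unit length.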

If you have your own Global25 coordinates you can add them to my PAST-compatible datasheet to see where you cluster in this tree. And, of course, you can design your own PAST-compatible datasheets and trees with any combination of populations and/or individuals from the Global25 text files at the links below. It's easy; just copy and paste the coordinates of your choice into an empty text file, open it with PAST and then save it with the dat extension to create a new PAST datasheet. But make sure never to mix up the scaled and non-scaled coordinates.

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient


Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern
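The copy and paste step described above can also be scripted. Here's a rough Python sketch, assuming the Global25 text files are comma-separated, with one header line followed by one row per population or individual, and the label in the first field. The function name, the sample labels other than MAR_Iberomaurusian, and the three-column toy data are made up for illustration; real rows carry 25 PC values.

```python
def select_g25_rows(text, wanted):
    """Return the header line plus every row whose label starts with a wanted prefix."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    width = len(header.split(","))
    kept = []
    for row in rows:
        fields = row.split(",")
        if not fields[0].startswith(tuple(wanted)):
            continue
        # A row with the wrong number of columns usually means a bad paste.
        if len(fields) != width:
            raise ValueError(f"unexpected column count in row: {fields[0]}")
        kept.append(row)
    return "\n".join([header] + kept) + "\n"


# Toy stand-in for a Global25 text file (values are invented).
sample = """\
,PC1,PC2,PC3
MAR_Iberomaurusian,0.1,0.2,0.3
Polish,0.4,0.5,0.6
Swedish,0.7,0.8,0.9
"""
subset = select_g25_rows(sample, ["MAR_Iberomaurusian", "Polish"])
print(subset, end="")
```

The returned text can be written to an empty text file, opened in PAST and saved with the dat extension, as described above. The function can't tell scaled from non-scaled coordinates, so only feed it rows from one kind of file.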

An important point to keep in mind when running these sorts of analyses is that PAST and other such programs need enough genetic differentiation to latch onto in order to produce meaningful results. Thus, even when studying the relationships between very closely related populations, it's not just useful to include a root population or individual, but also some near and far related groups to help the analysis algorithm flesh out the key genetic substructures.

To be honest, I don't really know whether using the Chord index and rooting the tree with MAR_Iberomaurusian is the best way to run a neighbour joining tree analysis of ancient and present-day West Eurasian genetic variation. What do you think? Feel free to let me know in the comments.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)


JuanRivera said...

A Scythian paper:
Shifts in the Genetic Landscape of the Western Eurasian Steppe Associated with the Beginning and End of the Scythian Dominance - Cell Press . It may have already been covered here.

Arza said...

@ Davidski

Can you check such D-stats:

Yoruba Chern Ukrainian Swedish
Yoruba Chern Ukrainian Latvia_HG

first with all three Chern samples and then without MJ-36?

Sgt said...

Natufian (12,750 yBP) an end-branch? I1072 shows some genetic continuity to ERS1790732.SG from Sidon (3,650 yBP) and to Lebanon and the modern Levant in general.

Davidski said...


Can you check such D-stats:

Yoruba Chern Ukrainian Swedish
Yoruba Chern Ukrainian Latvia_HG

I'll run these in a day or so, but I can tell you right now that D-stats won't be able to discriminate between Ukrainians and Swedes.

Davidski said...


Natufian (12,750 yBP) an end-branch?

Yes, but this doesn't mean what you think it means. This is not a Y-chromosome mutations tree.

J.S. said...

@ Davidski
How do you explain France Brittany cluster with England-Roman?

Sgt said...

@ Davidski
No, this is supposed to be a neighbor-joining tree, but (in places) as an evolutionary scaffold it is backwards.

Davidski said...


How do you explain France Brittany cluster with England-Roman?

Aren't there some good historical reasons for that?


No, this is supposed to be a neighbor-joining tree, but (in places) as an evolutionary scaffold it is backwards.

The problem is that your assumptions about the Natufians are too basic and just wrong.

It's a fact that Natufians are outliers within West Eurasia and not significantly ancestral to other West Eurasians, even if they contributed some ancestry to other ancient and modern West Eurasians. That's why their position looks unusual in the tree.

You would need to design a different tree with ancient populations from outside of West Eurasia, like the Iberomaurusians, to correctly explain the evolutionary relationship between the Natufians and other West Eurasians.

Davidski said...


The following samples have been added to the Global25 datasheets. Same links as always.


I've also updated the neighbor joining tree in the blog post above with these samples, so the tree looks different now than it did originally.

Davidski said...

Alright, I updated the datasheet and the tree. The root population is now MAR_Iberomaurusian.

The tree does look a lot better now. LOL

Davidski said...

One of the two Chernyakhiv samples that I could put into the G25 clusters with Poles, but close to Germans, and another is a minimal outlier from Central/Eastern/Northern Europe, probably as a result of Balkan ancestry.

G25 North Euro PCA

Synome said...

I would definitely expect a mixed population in the Chernyakhov zone. It's thought that there may have been Goths, proto-Slavs, Geto-Dacians, and Sarmatians all living in the same region under a Gothic hegemony.

There's definitely a lot to be learned about the social dynamics of this crossroads of interaction.

JuanRivera said...

Did a lot of modeling in the last two hours. It turns out Yana is not only Kostenki/Sunghir+Tianyuan, but also has some Villabruna and Simulated_AASI_NW (though it isn't preferred against Tianyuan). MA1 is Yana+Villabruna+Simulated_AASI_NW, whereas AG3 is identical to MA1 in composition. Kolyma_Meso seems to be MA1+Devils_Gate_N+Alaska_Trailcreek_9000BP. Baikal_N indeed is partly Kolyma_Meso, and Baikal_EBA has some steppe ancestry (15.83% Okunevo, which translates to ~7% steppe). Ust_Belaya has no Anzick or Alaska_Trailcreek_9000BP, unlike later nearby cultures, and is strangely ~90 Shamanka_EBA, with extra Okunevo at 3.33% (translating to extra 1.47% steppe). Magadan_BA doesn't have apparent steppe ancestry, but given that it's ~13% Ust_Belaya, it has some (~0.89%). Since the Ust_Belaya samples come from the Ust'-Belaya culture (obviously), which is noted to have ties with the Bel'kachi culture, which in turn has ties with the Lake Baikal region, it provides a path of migration of steppe ancestry all the way to the Bering Sea.

Andrzejewski said...

Would the Natufian and Iberomaurusian link explain why Semitic and Hamitic languages are similar?

Would you suggest that the link between Levant_N and Anatolian_N points to similarities between pre-Islamic Middle East and Neolithic Europe, genotypically or even phenotypically?

Andrzejewski said...

Yes. Maybe Steppe ancestry is the reason why Chukchi Kamchatka have so many cognates with PIE.

And like I said before, maybe PIE are NOT a mixture of two ANE-derived populations (EHG + CHG) but a completely distinct Yana descendant population?

JuanRivera said...

Neo-eskimos (Ekven) are a mixture of Magadan_BA and Anzick, at a ratio of 2:1 (67%:33%). The Chukchi are mixtures of Magadan_BA, Ekven and traces of Yana_MA. The Koryak and Itelmen are mixtures of Magadan_BA, very minor Ust_Belaya and traces of Yana_MA. The Nivh and Nivkh (shouldn't they be the same people?) are mixtures of Devils_Gate_N, Jomon and Magadan_BA (plus 3.33% Ust_Belaya in the Nivh). As such, Ust_Belaya and Magadan_BA seem mostly to correspond to the Chukotko-Kamchatkan languages. The presence of Magadan_BA (and also Ust_Belaya) in the Nivkh and Nivh seems to validate Fortescue's hypothesis of a relationship between Chukotko-Kamchatkan and Nivkh, and as such, making a Magadan_BA-like population the likely speakers of a Chukotko-Kamchatkan-Amuric protolanguage.

JuanRivera said...

Well, the Goths very likely came from the Wielbark culture (which is in Poland), so it's not surprising to see a Chernyakhov sample that clusters with modern-day Poles.

JuanRivera said...

The Semitic and "Hamitic" (a polyphyletic group made up of Berber, Cushitic and Ancient Egyptian [the ancestor of modern Coptic]) are actually part of a single language family called Afroasiatic.

JuanRivera said...

We need Wielbark samples, as well as samples from the Neolithic of Yakutia, Chukotka and Kamchatka.

Andrzejewski said...

I thought they originated in (modern day) Sweden

Andrzejewski said...

Which may be due to Iberomaurusians’ influence on Natufians?

JuanRivera said...

Well, somewhat pulled towards modern-day Germans. The Wielbark culture is also near Germany.

JuanRivera said...

Poles (and Germans) are somewhat influenced by Scandinavians, so I wouldn't discard an ultimate origin of the Goths in Scandinavia.

Samuel Andrews said...

nMonte runs for the two Goths. I included Lombard and Swedish as Germanic references, and Ukrainian as the Slavic reference. All the non-Lombard samples from Lombard burials in Hungary serve as Balkan references, as well as classical Greek and Bulgaria_IA.



Cimmerian_Moldova:cim357 cim358,4.1



Cimmerian_Moldova:cim357 cim358,9.7

Samuel Andrews said...

The fit improves for the most Slavic-like Goth when Baltic_BA is included. Early Slavs were of largely Baltic BA origin.



Cimmerian_Moldova:cim357 cim358,11.9
Aegean_Italy Medieval,3

EastPole said...

“One of the two Chernyakhiv samples that I could put into the G25 clusters with Poles, but close to Germans”

What Germans? East Germans and Germans close to Poles didn’t exist at the time of Chernyakhiv culture. They were created after Germans conquered Wends in X century A.D. and mixed with them.

UKR_Chernyakhiv_Legedzine:MJ19 in PC1, which explains 55% of variance, is more eastern than Czech and Slovakian and in PC2, which explains 12% of variance, they are below anything Scandinavian. I don’t see that ‘Goths in Chernyakhiv’ theory has been proven by your PCA.
It is still an unproven theory:

André de Vasconcelos said...

Any idea what causes 'range check error'? I've always had this when trying to make Neighbour Joining

Arza said...

@ Davidski

Data S1

D-stats, Z-score > 3

Yorubas Chern Ukrainians Latvia_HG 0,0115 4,328
Yorubas Chern Ukrainians Iron_Gates_HG 0,0105 4,155
Yorubas Chern Ukrainians Lithuania_HG 0,0109 3,514
Yorubas Chern Ukrainians WHG 0,0116 3,383

Yorubas Chern Ukrainians Swedes 0,0067 4,451
Yorubas Chern Ukrainians Lithuanians 0,0063 3,589

highest f3

Romania_HG____ Chern Yorubas 0,212347 0,003795 55.961 52422
Latvia_CCC_WHG Chern Yorubas 0,210285 0,003086 68.144 98259
WHG___________ Chern Yorubas 0,209424 0,002326 90.041 141182
Latvia_HG_____ Chern Yorubas 0,20939 0,002163 96.804 145860
Bichon________ Chern Yorubas 0,209166 0,002748 76.108 139383
Iron_Gates_HG_ Chern Yorubas 0,209014 0,002075 100.722 141967
Lithuania_HG__ Chern Yorubas 0,208825 0,002281 91.546 133929
Poprad________ Chern Yorubas 0,208284 0,003415 60.998 83820
Ukr_BA________ Chern Yorubas 0,208281 0,003968 52.494 49426
SHG___________ Chern Yorubas 0,208077 0,002413 86.224 127678

Swedes_____ Chern Yorubas 0,207174 0,00193 107.371 147001
Lithuanians Chern Yorubas 0,206994 0,002026 102.167 146564
Latvians___ Chern Yorubas 0,206517 0,002057 100.393 146010
Estonians__ Chern Yorubas 0,205851 0,001917 107.401 147040
Orcadians__ Chern Yorubas 0,20549 0,001946 105.613 146912
Belarusians Chern Yorubas 0,205427 0,001904 107.911 146954
Poles______ Chern Yorubas 0,205338 0,002004 102.457 146390
Hungarians_ Chern Yorubas 0,204833 0,001916 106.914 147007
Germans____ Chern Yorubas 0,204583 0,00193 105.979 146795
Finns______ Chern Yorubas 0,204436 0,001933 105.751 147000

When you look at modern populations the "Gothic source" seems to be clear: Swedes. But the ancients show that possibly it's just a pull towards classic WHG that drags Chern towards Scandinavians. What I want to check is whether Swedes will be on top without MJ36, which seems to be heavily shifted towards WHG:

Latvia_HG Cher_All__ Mbuti.DG 0.292576 0.003888 75.259 131688
Latvia_HG Cher_19_37 Mbuti.DG 0.291298 0.003967 73.438 125714
Latvia_HG Cher_36___ Mbuti.DG 0.318453 0.010219 31.164 8930

BTW Note that in f3 Germanic prince from Poprad is higher than modern Swedes.

Arza said...

Visigoths looked like this:

"Olalde re: Visigoths

These individuals, archaeologically interpreted as Visigoths, are shifted from those at L'Esquerda in the direction of Northern and Central Europe (Figs. 1D and 2C and table S18), and we observe the Asian mitochondrial DNA (mtDNA) haplogroup C4a1a also found in Early Medieval Bavaria (20), supporting a recent link to groups with ancestry originally derived from Central and Eastern Europe.

Table S18. Best 2-way and 3-way models for NE_Iberia_c.6CE_PL (Pla de l'Horta). The models
in bold were used for Fig. 2C.

Selected model (p-value 9.20E-02):

NE_Iberia_c.6-8CE_ES 0.732 ±0.067
Bavaria_EarlyMedieval.SG 0.226 ±0.050
TSI 0.041 ±0.054

Bavaria -> Pannonia -> Northern Italy -> Spain?"

And now we have a "Balkan Goth" in Iberia and a "Balkan Goth" in Chernyakhov:


Iberia_Northeast_c.6CE_PL:I12163 100%
Distance 2.7157%

This is not a Sweden -> Poland -> Ukraine -> Balkans -> Italy -> Spain path, but rather:

Sweden -> Germany -> Balkans -> Chernyakhov
Sweden -> Germany -> Balkans -> Italy -> Spain

Arza said...

10 samples closest to:


HUN_MA_Szolad:SZ5 0.025195
UKR_Chernyakhiv_Shyshaky:MJ37 0.027156
German:German28 0.030095
French:French10 0.032713
Belgian:Belgium8 0.033221
German:German61 0.033360
Bell_Beaker_CZE:I4945 0.033501
German:German55 0.034090
Belgian:Belgium24 0.034096
French:French47 0.034798


Iberia_Northeast_c.6CE_PL:I12163 0.027156
HUN_MA_Szolad:SZ5 0.032573
Hungarian:NA15207 0.0329875
Montenegrin:Montenegro7 0.034371
Serbian:Serbian_Serbia5 0.034649
German:German28 0.037150
Austrian:Austria7 0.037305
Austrian:Austria13 0.037937
Austrian:Austria16 0.038019
Montenegrin:Montenegro1 0.038079

Drago said...

@ Arza

“Sweden -> Germany -> Balkans -> Chernyakhov”

The archaeological and historical evidence points to Goths coming to Balkans via Ukraine; and originally from the Vistula
The two samples thus far don’t contradict that perspective

Parastais said...

With nMonte and only pops more ancient than Chernyakhiv (not sure re Sarmatian RUS Caucasus though):
"sample": "Test1:UKR_Chernyakhiv_Legedzine",
"fit": 2.9136,
"Scythian_HUN": 30.83,
"SWE_IA": 26.67,
"Baltic_LVA_BA": 25,
"Sarmatian_RUS_Caucasus": 17.5,

J.S. said...


"Aren't there some good historical reasons for that?"

Well, yes, obviously you're right...I have always been told Brittany cluster close to the Irish samples, so, that's why I was a bit confused.

JuanRivera said...

Turns out that the best model of Ust_Belaya so far is Shamanka_EBA+Kurma_EBA+Ust_Ida_EBA+Kolyma_Meso. Both Kurma_EBA and Ust_Ida_EBA have higher Okunevo ancestry (~20% for Kurma_EBA, translating to ~8.83% steppe ancestry; ~24% for Ust_Ida_EBA, which translates to ~10.6% steppe ancestry). Shamanka_EBA ancestry comprises 60% in the model, Kolyma_Meso 4.17% and the rest is comprised of Kurma_EBA and Ust_Ida_EBA. Overall, Baikal_EBA ancestry is 95.83% in the model of Ust_Belaya.

Cy Tolliver said...


Do you have any opinions on what kind of population the Iberomaurusians actually were? They seem to be a kind of strange hybrid population of vaguely SSA/West Eurasians. I believe when the original paper came out they were modeled as a mix of Yoruba and West Eurasian, and then the Dzudzuana pre-print flipped the script and had the gene flow going IBM -> Yoruba, but had IBM deriving half or so of its ancestry from an "Ancient North Africa" that appeared to be related to Mota.

Davidski said...

@Cy Tolliver

Apparently the Iberomaurusians weren't really a mix between SSA and West Eurasians. See here...

Paleolithic DNA from the Caucasus reveals core of West Eurasian ancestry

JuanRivera said...

Discovered that Samara_HG has more extra ANE than Karelia_HG, however, it ends up having less ANE than Karelia_HG because Samara_HG has higher Ukraine_Mesolithic admixture (which has higher WHG, CHG and Pinarbasi_HG than EHG as a whole), plus it has extra raw CHG admixture and probably Hotu_HG and Pinarbasi_HG not present in either Sidelkino_HG or Ukraine_Mesolithic.

Andrzejewski said...

@JuanRivera so maybe it’s the Ukraine_Mesolithic (R1a1 guys?) who were Bug Dniester (WHG-rich) and Dnieper Donets (Anatolia farmer-rich) who introduced PIE from the west to Samara/Khvalynsk? If so, then @Dragos ain’t completely wrong.

JuanRivera said...

Dnieper-Donets are Ukraine_N (who model as Ukraine_Mesolithic+Samara_HG+CHG+Romania_HG). Bug-Dniester would be something between Ukraine_N and Romania_HG. It's later that WHG-rich EEF ancestry appears (though the Ukraine_N outlier is a WHG-rich EEF, without either Piedmont, EHG or Ukraine_HG). The fact that progressively later samples from all over the steppe (even to Western Siberia) have admixture from other parts of the steppe indicates long-distance interactions. As for Khvalynsk, I think it's very unlikely for it to have Karelia_HG admixture; instead, it more likely has West_Siberia_N admixture. Oddly, Khvalynsk seems to have a faint Kolyma_Meso signal absent in Samara_HG. I think the reason why Khvalynsk chooses Karelia_HG is similar to why Ukraine_Eneolithic chooses Maykop and Piedmont chooses Sarazm (overfitting, leading to results not observed in other programs such as qpAdm).

Davidski said...


I've got a new blog post coming about the ancestry of the East Germanics later today or early tomorrow.

Andrzejewski said...

Romania_HG as in Iron Gates?

Alexandros said...

Does the tree reveal anything regarding the origin of the Ashkelon samples? I am no expert in interpreting phylogenetic trees (hence the question), but it seems clear that Ashkelon_LBA and Ashkelon_IA2 derive their ancestry directly from the Levant. On the contrary, Ashkelon_IA1 sits nicely with post-Neolithic Anatolian samples. Would this be an indication of ancestry from that region rather than the Aegean? Particularly, given that they do not sit in the Minoan/Mycenaean/Empuries2 branch of the tree appearing further down. Not sure if I am overinterpreting..

Davidski said...


The placement of Ashkelon_IA1 with the post-Neolithic Anatolian samples might be coincidental. That is, if they're a mixture between Ashkelon_LBA and a Mediterranean European population then this might be creating a post-Neolithic Anatolian-like effect in the tree.

Bob Floy said...

"I have always been told Brittany cluster close to the Irish samples"

The Breton are descended from "Celtic" Britons who were fleeing the advancing Anglo-Saxons.

JuanRivera said...

Romania_HG is a different sample than Iron_Gates_HG.

JuanRivera said...

Neither MA1 nor AG3 show extra Kostenki/Sunghir, but they do show Villabruna (as stated before, MA1 and AG3 are autosomally identical). That and the fact that Yana_UP also shows Villabruna means that the KS cluster was replaced by a Villabruna-like population very early on. In some models, Yana_UP also shows some extra Ust_Ishim admixture.

JuanRivera said...

Though, I haven't yet tried any models with Vestonice.

JuanRivera said...

Yana_UP prefers Vestonice to Villabruna, whereas MA1 (and AG3) prefers Villabruna to Vestonice.

Matt said...

@Alexandros, as a belated comment on your question re: Ashkelon_IA1, the four infants are diverse individuals, most of whom IMO probably aren't admixed between Ashkelon_LBA and any single other population. So looking at the average probably doesn't tell you too much.

Puree said...

Hi, I'm learning how to use this function and I'm not clear on something. The West Eurasia scaled pop averages .dat file, when plugged into PAST, shows up as columns labeled 'PC2' to 'PC25' and then a column labeled 'Y'. I would like to put my own PC25 into the table but am not sure what to do about my PC1 and also what to put in the Y column. What am I missing? Thanks.