Wednesday, July 10, 2019

Global25 workshop 4: a neighbour joining tree

Phylogenetic trees are easy to produce, but there are countless ways to run them, and, depending on the input data you're using, some methods are a lot more effective than others. In this tutorial I'm going to demonstrate one method that has worked well for me when looking at the fine-scale genetic relationships between ancient and present-day human populations with my Global25 data.

To get started download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Clustering > Neighbour joining. Here's a screen cap of me doing just that...

Then, from the tabs on the right, choose Chord as the similarity index and MAR_Iberomaurusian, the most distinct unit in the datasheet, as the root. PAST offers an exceptionally large range of similarity indices and they generally produce similar results, but, in my experience, Chord produces some of the most visually pleasing outcomes when dealing with fine-scale genetic substructures.
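For reference, the Chord index is usually defined as the Euclidean distance between two rows after each row has been scaled to unit length. The Python sketch below assumes that this textbook definition matches what PAST computes; the function name and the three-value coordinates are made up for illustration (real Global25 rows have 25 values).

```python
import math

def chord_distance(x, y):
    """Euclidean distance between x and y after scaling each to unit length."""
    norm_x = math.sqrt(sum(v * v for v in x))
    norm_y = math.sqrt(sum(v * v for v in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("chord distance is undefined for an all-zero row")
    cosine = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    # Clamp to guard against tiny floating point overshoot outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return math.sqrt(2.0 - 2.0 * cosine)

# Hypothetical coordinates, for illustration only.
pop_a = [0.12, 0.03, -0.01]
pop_b = [0.24, 0.06, -0.02]   # same direction as pop_a, twice the length
pop_c = [0.03, -0.12, 0.00]   # at a right angle to pop_a

print(chord_distance(pop_a, pop_b))  # very close to zero
print(chord_distance(pop_a, pop_c))  # very close to sqrt(2)
```

Under this definition only the direction of a row matters, not its length, so two rows that differ by a constant factor end up at a distance of practically zero.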

This is the tree you should see after exporting the image via the graph settings tab in PAST, and, if you like, rotating it 90 degrees with image editing software of your choice. Note the fairly substantial differences between the populations from Northwestern Europe, which are often difficult to tease apart in such analyses.
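To give a rough idea of what happens under the hood, here is a minimal Python sketch of the first step of the standard neighbour joining algorithm: picking the pair of units to join and working out their branch lengths to the new node. It is not PAST's code, and the function name, labels and distance matrix are hypothetical.

```python
def nj_first_join(labels, dist):
    """Return the first pair joined by neighbour joining and their branch lengths.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal.
    """
    n = len(labels)
    if n < 3:
        raise ValueError("need at least three units")
    totals = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: favours pairs that are close to each other
            # and, on average, far from everything else.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    len_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return labels[i], labels[j], len_i, len_j

# Hypothetical five-unit distance matrix, for illustration only.
labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
first_join = nj_first_join(labels, dist)
print(first_join)
```

The full algorithm replaces the joined pair with the new node, recomputes the distances to it, and repeats until the tree is resolved.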

If you have your own Global25 coordinates you can add them to my PAST-compatible datasheet to see where you cluster in this tree. And, of course, you can design your own PAST-compatible datasheets and trees with any combination of populations and/or individuals from the Global25 text files at the links below. It's easy; just copy and paste the coordinates of your choice into an empty text file, open it with PAST and then save it with the .dat extension to create a new PAST datasheet. But make sure never to mix up the scaled and non-scaled coordinates.
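If you'd rather script the copy and paste step, a minimal Python sketch might look like the one below. It assumes each row has the form Label,v1,...,v25 with no header line, and that you tag each block yourself as scaled or raw; the function name, the tags and the sample rows are all hypothetical. The output is just the combined text, which still needs to be opened in PAST and saved as a datasheet.

```python
def combine_global25(blocks):
    """Merge (kind, text) blocks of Global25 rows; kind is 'scaled' or 'raw'."""
    kinds = {kind for kind, _ in blocks}
    if len(kinds) > 1:
        raise ValueError("refusing to mix scaled and non-scaled coordinates")
    rows, seen = [], set()
    for _, text in blocks:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            label, *values = line.split(",")
            if len(values) != 25:
                raise ValueError(f"{label}: expected 25 coordinates, got {len(values)}")
            for value in values:
                float(value)  # raises ValueError if a coordinate is not numeric
            if label in seen:
                raise ValueError(f"duplicate label: {label}")
            seen.add(label)
            rows.append(line)
    return "\n".join(rows) + "\n"

# Made-up rows, for illustration only.
fake = ",".join(["0.01"] * 25)
combined = combine_global25([
    ("scaled", f"Pop_A,{fake}\n"),
    ("scaled", f"Me,{fake}\n"),
])
print(combined)
```

Tagging the blocks by hand is a deliberate choice here, because nothing in the rows themselves reliably says whether they are scaled or not.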

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient


Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern

An important point to keep in mind when running these sorts of analyses is that PAST and other such programs need enough genetic differentiation to latch onto in order to produce meaningful results. Thus, even when studying the relationships between very closely related populations, it's not just useful to include a root population or individual, but also some near and far related groups to help the analysis algorithm flesh out the key genetic substructures.

To be honest, I don't really know whether using the Chord index and rooting the tree with MAR_Iberomaurusian is the best way to run a neighbour joining tree analysis of ancient and present-day West Eurasian genetic variation. What do you think? Feel free to let me know in the comments.

Arza said...

@ Davidski

Can you check such D-stats:

Yoruba Chern Ukrainian Swedish
Yoruba Chern Ukrainian Latvia_HG

first with all three Chern samples and then without MJ-36?

Sgt said...

Natufian (12,750 yBP) an end-branch? I1072 shows some genetic continuity to ERS1790732.SG from Sidon (3,650 yBP) and to Lebanon and the modern Levant in general.

Davidski said...


"Can you check such D-stats:

Yoruba Chern Ukrainian Swedish
Yoruba Chern Ukrainian Latvia_HG"

I'll run these in a day or so, but I can tell you right now that D-stats won't be able to discriminate between Ukrainians and Swedes.

Davidski said...


"Natufian (12,750 yBP) an end-branch?"

Yes, but this doesn't mean what you think it means. This is not a Y-chromosome mutations tree.

J.S. said...

@ Davidski
How do you explain France Brittany cluster with England-Roman?

Sgt said...

@ Davidski
No, this is supposed to be a neighbor-joining tree, but (in places) as an evolutionary scaffold it is backwards.

Davidski said...


"How do you explain France Brittany cluster with England-Roman?"

Aren't there some good historical reasons for that?


"No, this is supposed to be a neighbor-joining tree, but (in places) as an evolutionary scaffold it is backwards."

The problem is that your assumptions about the Natufians are too basic and just wrong.

It's a fact that Natufians are outliers within West Eurasia and not significantly ancestral to other West Eurasians, even if they contributed some ancestry to other ancient and modern West Eurasians. That's why their position looks unusual in the tree.

You would need to design a different tree with ancient populations from outside of West Eurasia, like the Iberomaurusians, to correctly explain the evolutionary relationship between the Natufians and other West Eurasians.

Davidski said...


The following samples have been added to the Global25 datasheets. Same links as always.


I've also updated the neighbor joining tree in the blog post above with these samples, so the tree looks different now than it did originally.

Davidski said...

Alright, I updated the datasheet and the tree. The root population is now MAR_Iberomaurusian.

The tree does look a lot better now. LOL

Davidski said...

One of the two Chernyakhiv samples that I could put into the G25 clusters with Poles, but close to Germans, and the other is a minimal outlier from Central/Eastern/Northern Europe, probably as a result of Balkan ancestry.

G25 North Euro PCA

Synome said...

I would definitely expect a mixed population in the Chernyakhov zone. It's thought that there may have been Goths, proto-Slavs, Geto-Dacians, and Sarmatians all living in the same region under a Gothic hegemony.

There's definitely a lot to be learned about the social dynamics of this crossroads of interaction.

Andrzejewski said...

Would the Natufian and Iberomaurusian link explain why Semitic and Hamitic languages are similar?

Would you suggest that the link between Levant_N and Anatolian_N points to similarities between pre-Islamic Middle East and Neolithic Europe, genotypically or even phenotypically?

Andrzejewski said...

Yes. Maybe Steppe ancestry is the reason why Chukchi Kamchatka have so many cognates with PIE.

And like I said before, maybe PIE are NOT a mixture of two ANE-derived populations (EHG + CHG) but a completely distinct Yana descendant population?

Andrzejewski said...

I thought they originated in (modern day) Sweden

Andrzejewski said...

Which may be due to Iberomaurusians’ influence on Natufians?

Samuel Andrews said...

nMonte runs for the two Goths.....I included Lombard and Swedish as Germanic references. Ukrainian for Slavic reference. All the non-Lombard samples from Lombard burials in Hungary as Balkan reference as well as classical Greek, Bulgaria_IA.



Cimmerian_Moldova:cim357 cim358,4.1



Cimmerian_Moldova:cim357 cim358,9.7

Samuel Andrews said...

The fit improves for the most Slavic-like Goth when Baltic_BA is included. Early Slavs were of largely Baltic BA origin.



Cimmerian_Moldova:cim357 cim358,11.9
Aegean_Italy Medieval,3

EastPole said...

“One of the two Chernyakhiv samples that I could put into the G25 clusters with Poles, but close to Germans”

What Germans? East Germans and Germans close to Poles didn’t exist at the time of Chernyakhiv culture. They were created after Germans conquered Wends in the 10th century A.D. and mixed with them.

UKR_Chernyakhiv_Legedzine:MJ19 in PC1, which explains 55% of variance, is more eastern than Czech and Slovakian and in PC2, which explains 12% of variance, they are below anything Scandinavian. I don’t see that ‘Goths in Chernyakhiv’ theory has been proven by your PCA.
It is still an unproven theory:

André de Vasconcelos said...

Any idea what causes 'range check error'? I've always had this when trying to make Neighbour Joining

Arza said...

@ Davidski

Data S1

D-stats, Z-score > 3

Yorubas Chern Ukrainians Latvia_HG 0.0115 4.328
Yorubas Chern Ukrainians Iron_Gates_HG 0.0105 4.155
Yorubas Chern Ukrainians Lithuania_HG 0.0109 3.514
Yorubas Chern Ukrainians WHG 0.0116 3.383

Yorubas Chern Ukrainians Swedes 0.0067 4.451
Yorubas Chern Ukrainians Lithuanians 0.0063 3.589

highest f3

Romania_HG____ Chern Yorubas 0.212347 0.003795 55.961 52422
Latvia_CCC_WHG Chern Yorubas 0.210285 0.003086 68.144 98259
WHG___________ Chern Yorubas 0.209424 0.002326 90.041 141182
Latvia_HG_____ Chern Yorubas 0.20939 0.002163 96.804 145860
Bichon________ Chern Yorubas 0.209166 0.002748 76.108 139383
Iron_Gates_HG_ Chern Yorubas 0.209014 0.002075 100.722 141967
Lithuania_HG__ Chern Yorubas 0.208825 0.002281 91.546 133929
Poprad________ Chern Yorubas 0.208284 0.003415 60.998 83820
Ukr_BA________ Chern Yorubas 0.208281 0.003968 52.494 49426
SHG___________ Chern Yorubas 0.208077 0.002413 86.224 127678

Swedes_____ Chern Yorubas 0.207174 0.00193 107.371 147001
Lithuanians Chern Yorubas 0.206994 0.002026 102.167 146564
Latvians___ Chern Yorubas 0.206517 0.002057 100.393 146010
Estonians__ Chern Yorubas 0.205851 0.001917 107.401 147040
Orcadians__ Chern Yorubas 0.20549 0.001946 105.613 146912
Belarusians Chern Yorubas 0.205427 0.001904 107.911 146954
Poles______ Chern Yorubas 0.205338 0.002004 102.457 146390
Hungarians_ Chern Yorubas 0.204833 0.001916 106.914 147007
Germans____ Chern Yorubas 0.204583 0.00193 105.979 146795
Finns______ Chern Yorubas 0.204436 0.001933 105.751 147000

When you look at modern populations, the "Gothic source" seems to be clear: Swedes. But ancients show that possibly it's just a pull towards classic WHG that drags Chern towards Scandinavians. What I want to check is whether Swedes will be on top without MJ36, which seems to be heavily shifted towards WHG:

Latvia_HG Cher_All__ Mbuti.DG 0.292576 0.003888 75.259 131688
Latvia_HG Cher_19_37 Mbuti.DG 0.291298 0.003967 73.438 125714
Latvia_HG Cher_36___ Mbuti.DG 0.318453 0.010219 31.164 8930

BTW Note that in f3 Germanic prince from Poprad is higher than modern Swedes.

Arza said...

Visigoths looked like this:

"Olalde re: Visigoths

These individuals, archaeologically interpreted as Visigoths, are shifted from those at L'Esquerda in the direction of Northern and Central Europe (Figs. 1D and 2C and table S18), and we observe the Asian mitochondrial DNA (mtDNA) haplogroup C4a1a also found in Early Medieval Bavaria (20), supporting a recent link to groups with ancestry originally derived from Central and Eastern Europe.

Table S18. Best 2-way and 3-way models for NE_Iberia_c.6CE_PL (Pla de l'Horta). The models in bold were used for Fig. 2C.

Selected model (p-value 9.20E-02):

NE_Iberia_c.6-8CE_ES 0.732 ±0.067
Bavaria_EarlyMedieval.SG 0.226 ±0.050
TSI 0.041 ±0.054

Bavaria -> Pannonia -> Northern Italy -> Spain?"

And now we have a "Balkan Goth" in Iberia and a "Balkan Goth" in Chernyakhov:


Iberia_Northeast_c.6CE_PL:I12163 100%
Distance 2.7157%

This is not a Sweden -> Poland -> Ukraine -> Balkans -> Italy -> Spain path, but rather:

Sweden -> Germany -> Balkans -> Chernyakhov
Sweden -> Germany -> Balkans -> Italy -> Spain

Arza said...

10 samples closest to:


HUN_MA_Szolad:SZ5 0.025195
UKR_Chernyakhiv_Shyshaky:MJ37 0.027156
German:German28 0.030095
French:French10 0.032713
Belgian:Belgium8 0.033221
German:German61 0.033360
Bell_Beaker_CZE:I4945 0.033501
German:German55 0.034090
Belgian:Belgium24 0.034096
French:French47 0.034798


Iberia_Northeast_c.6CE_PL:I12163 0.027156
HUN_MA_Szolad:SZ5 0.032573
Hungarian:NA15207 0.0329875
Montenegrin:Montenegro7 0.034371
Serbian:Serbian_Serbia5 0.034649
German:German28 0.037150
Austrian:Austria7 0.037305
Austrian:Austria13 0.037937
Austrian:Austria16 0.038019
Montenegrin:Montenegro1 0.038079
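
Lists like the two above can be produced by ranking every sample by its distance to a target sample. A minimal Python sketch follows, assuming plain Euclidean distance (the metric isn't stated in the comment) and using made-up labels with two-value coordinates instead of the full 25.

```python
import math

def closest_samples(target, samples, k=10):
    """Return the k (label, distance) pairs nearest to target, closest first."""
    ranked = sorted(
        (math.dist(target, coords), label) for label, coords in samples.items()
    )
    return [(label, round(distance, 6)) for distance, label in ranked[:k]]

# Hypothetical coordinates, for illustration only.
samples = {
    "Sample_A": [0.10, 0.02],
    "Sample_B": [0.13, 0.06],
    "Sample_C": [0.50, -0.20],
}
target = [0.10, 0.05]

for label, distance in closest_samples(target, samples, k=3):
    print(label, distance)
```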

Drago said...

@ Arza

“Sweden -> Germany -> Balkans -> Chernyakhov”

The archaeological and historical evidence points to Goths coming to the Balkans via Ukraine, and originally from the Vistula.
The two samples thus far don’t contradict that perspective.

Anonymous said...

With nMonte and only pops more ancient than Chernyakhiv (not sure re Sarmatian RUS Caucasus though):
"sample": "Test1:UKR_Chernyakhiv_Legedzine",
"fit": 2.9136,
"Scythian_HUN": 30.83,
"SWE_IA": 26.67,
"Baltic_LVA_BA": 25,
"Sarmatian_RUS_Caucasus": 17.5,

J.S. said...


"Aren't there some good historical reasons for that?"

Well, yes, obviously you're right...I have always been told Brittany cluster close to the Irish samples, so, that's why I was a bit confused.

Cy Tolliver said...


Do you have any opinions on what kind of population the Iberomaurusians actually were? They seem to be a kind of strange hybrid population of vaguely SSA/West Eurasians. I believe when the original paper came out they were modeled as a mix of Yoruba and West Eurasian, and then the Dzudzuana pre-print flipped the script and had the gene flow going IBM -> Yoruba, but had IBM deriving half or so of its ancestry from an "Ancient North Africa" that appeared to be related to Mota.

Davidski said...

@Cy Tolliver

Apparently the Iberomaurusians weren't really a mix between SSA and West Eurasians. See here...

Paleolithic DNA from the Caucasus reveals core of West Eurasian ancestry

Andrzejewski said...

@JuanRivera so maybe it’s the Ukraine_Mesolithic (R1a1 guys?) who were Bug Dniester (WHG-rich) and Dnieper Donets (Anatolia farmer-rich) who introduced PIE from the west to Samara/Khvalynsk? If so, then @Dragos ain’t completely wrong.

Davidski said...


I've got a new blog post coming about the ancestry of the East Germanics later today or early tomorrow.

Andrzejewski said...

Romania_HG as in Iron Gates?

Alexandros said...

Does the tree reveal anything regarding the origin of the Ashkelon samples? I am no expert in interpreting phylogenetic trees (hence the question), but it seems clear that Ashkelon_LBA and Ashkelon_IA2 derive their ancestry directly from the Levant. On the contrary, Ashkelon_IA1 sits nicely with post-Neolithic Anatolian samples. Would this be an indication of ancestry from that region rather than the Aegean? Particularly, given that they do not sit in the Minoan/Mycenaean/Empuries2 branch of the tree appearing further down. Not sure if I am overinterpreting..

Davidski said...


The placement of Ashkelon_IA1 with the post-Neolithic Anatolian samples might be coincidental. That is, if they're a mixture between Ashkelon_LBA and a Mediterranean European population then this might be creating a post-Neolithic Anatolian-like effect in the tree.

Bob Floy said...

"I have always been told Brittany cluster close to the Irish samples"

The Breton are descended from "Celtic" Britons who were fleeing the advancing Anglo-Saxons.

Matt said...

@Alexandros, as a belated comment on your question re: Ashkelon_IA1, the four infants are diverse individuals, most of whom IMO probably aren't admixed between Ashkelon_LBA and any single other population. So looking at the average probably doesn't tell you too much.

Puree said...

Hi, I'm learning how to use this function and I'm not clear on something. The West Eurasia scaled pop averages .dat file, when plugged into PAST, shows up as columns labeled 'PC2' to 'PC25' and then a column labeled 'Y'. I would like to put my own PC25 into the table but am not sure what to do about my PC1 and also what to put in the Y column. What am I missing? Thanks.