Friday, July 12, 2019

Getting the most out of the Global25

The first thing you need to know about the Global25 is that I update the relevant datasheets regularly, usually every few weeks, but they're always at these links:

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient


Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.
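As a rough illustration of the two codes, the sketch below splits a datasheet row into its population code, individual code and coordinates. It assumes that labels look like Population:Individual (as in ITA_Collegno_MA:CL36) and are followed by comma-separated values. The function name split_label and the shortened sample row are hypothetical.

```python
def split_label(row):
    """Split one comma-separated datasheet row into codes and coordinates.

    Assumes the first field is 'Population:Individual'. Population
    averages are assumed to carry no ':' and so have no individual code.
    """
    fields = row.strip().split(",")
    label, coords = fields[0], [float(v) for v in fields[1:]]
    population, _, individual = label.partition(":")
    return population, individual or None, coords


# Hypothetical row with only three coordinates for brevity.
pop, ind, coords = split_label("ITA_Collegno_MA:CL36,0.12,0.14,-0.01")
print(pop, ind, coords)
```

The individual code returned here is the part worth pasting into a search engine when looking up a sample.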

So if you'd like to know more details about the samples, try searching for their individual codes with a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.

The main purpose of the Global25 is to provide data for mixture modeling, in other words, for estimating ancestry proportions, both ancient and modern (see here). This can be done on your computer with the R program and the nMonte R script, or online with a couple of different tools, which I discuss below.

If you don't have R installed on your computer, you can get it here, while nMonte is available here. For this tutorial please download nMonte and nMonte3, and store them in your main working folder (usually My Documents).

Once you have R set up, make sure its working directory is the same place where you stored nMonte. You can check this in R by clicking on "File" and then "Change dir". Additionally, you'll need two nMonte input files in the working directory titled "data" and "target". Examples of these files are available here. We'll be using them to test the ancient ancestry proportions of a sample set from present-day England.

Before you can begin the analysis you need to first call the nMonte script by typing or copy pasting source('nMonte.R') into the R console window, and then hitting "enter" on your keyboard. This is what you should see in the R console window afterwards.

To start the mixture modeling process, type or copy paste getMonte('data.txt', 'target.txt') into the R console window, hit "enter", and wait for the results. After a short time, probably less than a minute or two, you should see this output.

The data and target files contain population averages. And, as you can see, the results that these population averages have produced are in line with what one would expect from such a model focusing on the genetic shifts in Northern Europe during the Late Neolithic. Very similar ancient ancestry proportions have been reported for the English and other Northern Europeans recently in scientific literature.
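To give a feel for what mixture modeling does with the coordinates, here is a minimal sketch in Python. It searches for the proportions of reference populations whose weighted average lies closest to the target. This is not the nMonte code: it is a simple random search used as a stand-in, and the function names, the two-dimensional coordinates and the population names are all hypothetical.

```python
import math
import random


def mix(sources, weights):
    """Weighted average of source coordinate vectors."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_proportions(sources, target, steps=20000, seed=1):
    """Random search for non-negative weights that sum to 1.

    Starts from equal weights and repeatedly moves a small slice of
    ancestry from one source to another, keeping any move that does
    not increase the distance to the target.
    """
    rng = random.Random(seed)
    n = len(sources)
    weights = [1.0 / n] * n
    best = distance(mix(sources, weights), target)
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue
        step = min(weights[i], rng.uniform(0.0, 0.05))
        trial = list(weights)
        trial[i] -= step
        trial[j] += step
        d = distance(mix(sources, trial), target)
        if d <= best:
            weights, best = trial, d
    return weights, best


# Hypothetical 2-D coordinates (the real Global25 has 25 dimensions).
sources = {"Pop_A": [0.0, 0.0], "Pop_B": [1.0, 0.0], "Pop_C": [0.0, 1.0]}
target = [0.3, 0.2]  # built as 50% Pop_A + 30% Pop_B + 20% Pop_C
weights, dist = fit_proportions(list(sources.values()), target)
for name, w in zip(sources, weights):
    print(f"{name}: {100 * w:.1f}%")
print(f"distance: {dist:.4f}")
```

Because the target was constructed from known proportions, the fitted weights should land close to 50/30/20 with a distance near zero. With real data the distance will not reach zero, and its size is a rough guide to how well the chosen references explain the target.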

However, when focusing on exceptionally fine-scale genetic variation that isn't reflected too well in the Global25 population averages, a more effective strategy might be to use multiple individuals from each reference population and let nMonte3 aggregate and average the inferred ancestry proportions.

This is often the case when attempting to model ancestry proportions for more recent periods, such as the Middle Ages. So let's try this with the English sample set using a modified data file, which is available here.

Replace the old data file with the new one in your working directory, and, like before, copy paste into the R console window the following two commands, hitting "enter" after each one: source('nMonte3.R') and getMonte('data.txt', 'target.txt'). This is what you should eventually see.

It's difficult to say how accurate these estimates are. But they look more or less correct considering the limited and less than ideal reference samples. For instance, the individuals labeled SWE_Viking_Age_Sigtuna are supposed to be stand-ins for Danish and Norwegian Vikings, but they're a relatively heterogeneous group from Sweden, possibly with some British or Irish ancestry, so they might be skewing the results.

However, I'll be adding many more ancient samples to the Global25 datasheets as they become available, including lots of new Vikings, which should greatly improve the accuracy of these sorts of fine-scale mixture models.
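The aggregation idea described above for nMonte3 (many individuals per reference population, with the results summed per population) can be sketched as follows. This is not the script's own code, just an illustration in Python. It assumes labels of the form Population:Individual, and the labels and percentages are made up.

```python
from collections import defaultdict


def aggregate_by_population(individual_results):
    """Sum per-individual percentages into per-population totals.

    Labels are assumed to look like 'Population:Individual'.
    """
    totals = defaultdict(float)
    for label, percent in individual_results:
        population = label.split(":", 1)[0]
        totals[population] += percent
    # Sort from largest to smallest contribution.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-individual output, not real results.
results = [
    ("Pop_A:ind1", 30.0),
    ("Pop_B:ind1", 25.0),
    ("Pop_A:ind2", 20.0),
    ("Pop_B:ind2", 15.0),
    ("Pop_C:ind1", 10.0),
]
for population, percent in aggregate_by_population(results):
    print(f"{population},{percent:.1f}")
```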

An alternative to the R-based approach is the online Global25 nMonte Runner [LINK]. This is a free tool, and easy to work with via several drop down menus, but users must become sponsors to unlock all of its available features. To run an analysis follow these three steps:

1) use the first drop down menu to pick the reference populations of your choice (up to four are allowed for free users)

2) move down to the second set of the drop down lists and either pick a test population that is already in the system or copy paste a set of Global25 coordinates into the space labeled "Enter/Paste Sets of Coordinates - Scaled and Comma-separated"

3) feel free to experiment with the additional options if you're game and willing to part with a little cash to help pay for the site.

Another exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is freely available HERE. And it works offline too, after downloading the web page onto your computer. Just copy paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen caps below show me doing just that.

However, it's important to note that the Global25 is a Principal Component Analysis (PCA), so it makes good sense to also use it for producing PCA graphs. To do this just plot any combination of two or three of its Principal Components (PCs) to create 2D or 3D graphs, respectively. This can be done with a wide variety of programs, including PAST, which is freely available here.

To produce a 2D graph, open a Global25 datasheet in PAST, choose comma as the separator, highlight any two columns of data, click on the "Plot" tab and, from the drop down list, pick "XY graph". Below is a series of graphs that I created in exactly this way. I also color coded the samples according to their geographic origins. This was done by ticking the "Row attributes" tab.
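For readers who prefer scripting to a point-and-click program, the same column selection can be sketched in Python. The example below pulls two chosen PCs out of a datasheet and groups the points by population code, ready to be handed to any plotting library for color coding. It assumes rows of the form label,PC1,PC2,... with an optional header row. The function name and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def pc_pairs_by_population(datasheet_text, pc_x=1, pc_y=2):
    """Return {population: [(x, y), ...]} for two chosen PCs.

    Assumes each row is 'label,PC1,PC2,...'. A header row, if present,
    is detected by its non-numeric fields and skipped.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(datasheet_text)):
        if len(row) <= max(pc_x, pc_y):
            continue  # blank or too-short line
        try:
            x, y = float(row[pc_x]), float(row[pc_y])
        except ValueError:
            continue  # header row
        population = row[0].split(":", 1)[0]
        groups[population].append((x, y))
    return dict(groups)


# Hypothetical three-PC datasheet, not real coordinates.
sample = """,PC1,PC2,PC3
Pop_A:ind1,0.10,0.20,0.30
Pop_A:ind2,0.12,0.18,0.31
Pop_B:ind1,-0.05,0.40,0.00
"""
for population, points in pc_pairs_by_population(sample).items():
    print(population, points)
```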

PAST can also be used to run PCA on subsets of the Global25 scaled data to produce remarkably accurate plots of fine-scale population structure. For instance, here's a plot based on present-day populations from north of the Alps, Balkans and Pyrenees.

To try this create a new text file with your choice of populations from the Global25 scaled datasheet, open it with PAST and choose Multivariate > Ordination > Principal Components Analysis. I've already put together several datasheets limited to European, Northern European, West Eurasian and South Asian populations. They're available at the links below along with more details on how to run them with PAST.

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists
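For anyone curious about what running PCA on a subset involves, here is a minimal sketch in Python using only the standard library. It centers the columns without rescaling them and finds the first principal component by power iteration. This is an illustration under those assumptions rather than a description of what PAST does internally. The function name and the coordinates are hypothetical, and only PC1 is computed to keep the example short.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, scores) of PC1 for a list of coordinate rows.

    Columns are centered but not rescaled. The leading eigenvector of
    the covariance matrix is found by power iteration.
    """
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(dims)]
        for a in range(dims)
    ]
    # Random start so the vector is not accidentally orthogonal to PC1.
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dims)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(dims)) for a in range(dims)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # no variance in the data
        vec = [v / norm for v in nxt]
    scores = [sum(c[d] * vec[d] for d in range(dims)) for c in centered]
    return vec, scores


# Hypothetical subset: four samples, three dimensions.
subset = [
    [0.10, 0.02, 0.00],
    [0.12, 0.03, 0.01],
    [-0.08, 0.05, 0.00],
    [-0.10, 0.04, -0.01],
]
direction, scores = first_principal_component(subset)
print([round(v, 3) for v in direction])
print([round(s, 3) for s in scores])
```

The sign of a principal component is arbitrary, so a plot made this way may come out mirrored relative to one made with another program.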

Another free, easy to use online tool that works with Global25 coordinates is the Principal Component Analysis (PCA) runner HERE. Below is a screen cap of me checking out one of the many PCAs that it offers.

And if you're fond of tree-like structures as a means to describe fine-scale genetic variation, please see this blog post...

Global25 workshop 4: a neighbour joining tree


Samuel Andrews said...

Thanks for sharing G25 David. It is by far the best free ancestry tool around. It explains the fundamentals of genetic variation for the whole world.

Alexandros said...

David, are you still accepting samples? I have a few previously analysed with Global10 and a few new ones. Can I send these over to the relevant email address?

Davidski said...

Yes, you can, and people with Global10 coordinates get their Global25 coordinates for free.

Samuel Andrews said...

David added a lot of new modern pops to the G25 PCA including Syrians.

Syrians are diverse. Some have a lot of Arab ancestry. On average, they're closest to Lebanese. They're intermediate between Kurds & Levant.


Syrian (many outliers)


Samuel Andrews said...

So just a handful of tiny North Caucasus tribes are the only other people with as much Yamnaya/Steppe as northern Europeans. They're fascinating isolates with an ancestry unlike anyone else.

Technically they live in Europe but genetically they shouldn't be considered European as they don't descend from European hunter gatherers or farmers at all. Their Steppe ancestry is straight from Yamnaya not later Corded Ware-derived Srubnaya.

Samuel Andrews said...

Test out the North Caucasus pops David added. They fit really well as Maykop+Yamnaya. Some have 40% Yamnaya ancestry.

Kaitag_Caucasus: Yamnaya_Kalmykia,38.9
Ingushian_Caucasus: Yamnaya_Kalmykia,19.1
Kubachinian_Caucasus: Yamnaya_Kalmykia,37.6
Karachay_Caucasus (Turkic): Yamnaya_Kalmykia,14

Aram said...


Even more amazing is that they got the bulk of their Steppe ancestry after the Bronze Age. And even more surprising, via exogamy, because Ingushes for example have virtually no Steppic Y-DNA.
Here is a citation from the Wang paper.


First, sometime after the BA present-day North Caucasian populations must have received additional gene-flow from steppe populations that now separates them from southern Caucasians, who largely retained the BA ancestry profile. The archaeological and historic records suggest numerous incursions during the subsequent Iron Age and Medieval times [33], but ancient DNA from these time periods will be needed to test this directly.

Aram said...


Which is better, the scaled or the unscaled lists? Or do they serve different purposes?

Also I noticed in your previous NJ Chord tree that Catacomb and Afanasievo form a tight cluster.
There is an archaeological theory that Catacomb was influenced by Afanasievo. Cranial deformation.
What do you think, was there a back migration from Afanasievo to Catacomb?

Davidski said...


The scaled coordinates produce more stable and realistic results most of the time. But use both and see what works better in each case in comparison to other methods and scientific literature.

And I reckon that Afanasievo probably came from the same place on the Pontic-Caspian steppe as Catacomb, and this is what is likely to explain the similarities between them.

Aram said...

Catacomb Ukraine is more EEF admixed than the Catacomb RUS, as I expected.

[1] "distance%=3.628"




[1] "distance%=3.5462"



Darkveti Meshoko was included but it didn't want it.

I expect that in the Multi-cordoned Ware period (after Catacomb) there will be an even greater increase in EEF ancestry.

priscus said...

Good work as always David.
I wanted to ask, since I only use R both for nMonte but also to make G25 based PCAs, when interested for instance in west Eurasian specific variation, does PAST essentially compute something like prcomp(g25_subset, center=TRUE, scale=FALSE) ? I've been using the latter and the results seem more or less similar both to yours and academia, though I'm not completely confident.
Also, when considering WE specific variation, which "eastern" populations do you typically end up including for reference?

Davidski said...


"I wanted to ask, since I only use R both for nMonte but also to make G25 based PCAs, when interested for instance in west Eurasian specific variation, does PAST essentially compute something like prcomp(g25_subset, center=TRUE, scale=FALSE)?"

I haven't actually tried this yet, but yes R should be able to do exactly what PAST does.

"Also, when considering WE specific variation, which "eastern" populations do you typically end up including for reference?"

I often extend my West Eurasian analyses as far as West Siberia and the Indus Valley (minus the really eastern groups along the way, like the Kalmyks), which does help, especially when dealing with ANE-rich ancient populations that no longer exist.

Slumbery said...


In your opinion, is it advisable to use the distance penalty in nMonte runs? In some cases the results can be drastically different.

For example, I ran some tests on Central European populations to seek sources for EF + steppe, and the Lengyel vs. Globular Amphora match came out with completely different results, depending on the penalty.
The difference between the fits is not informative in itself, because of course including the penalty results in a worse fit.

Andrzejewski said...

You both nailed it right. It wasn’t a prehistoric Maykop -> Steppe vector but ultimately a post-BA Steppe -> NW Caucasus one, which explains why Northern Caucasus people look more “Northern shifted” than Southern Caucasus.

Some Georgians and Armenians look European-like (Stalin could pass as a Southern European) because Darkveti Meshoko and KA have CHG and Anatolia_N ancestry.

Samuel Andrews said...


Is there a simple way to make an ADMIXTURE type test from PAST?

Drago said...

@ Andre

“Stalin could pass as a Southern European”

Stalin looks neither Greek nor Italian nor Spanish. So which Southern European would he pass for ?
Is all you base your statement on the olive shade of his skin ?

Bob Floy said...

"Stalin looks neither Greek nor Italian nor Spanish. So which Southern European would he pass for ?
Is all you base your statement on the olive shade of his skin?"

C'mon, man, he's short and has a moustache, that's not good enough for you?

Drago said...

Bob; you’re right !

Bob Floy said...

Whennn the moon hits your eye like a big pizza pie, that's-a STAAAALIN...

Davidski said...


Enough with the he looks like that, she looks like this, they look European etc.

It's outdated and too subjective, and doesn't lead to anything useful. Stick to genetics and learn to analyze the data.

@Samuel Andrews

There's no simple way to estimate ancestry proportions with PAST. But it should be possible one way or another.

Samuel Andrews said...

"Even more amazing is that they got the bulk of their Steppe ancestry after Bronze Age. And even more surprising via exogamy. Because Ingushes for example virtually don't have any Steppic Y dna.
Here is a citation from Wang paper."

That is amazing. I just wonder what post-Bronze Age pop from Europe it was. The Caucasus in general is interesting for genetics because it has been isolated in the last 6,000 years and has preserved a large variety of segregated ethnic groups/languages.

Davidski said...

Recent founder effects may have eliminated the steppe Y-haplogroups in some of those Caucasus ethnic groups with high levels of genome-wide steppe ancestry. Founder effects are especially common in isolated, endogamous communities.

Andrzejewski said...

Sam, would you characterize the current populations of the Caucasus as largely descending from the Meshoko-Darkveti?

Garvan said...

"Samuel Andrews said... Is there a simple way to make an ADMIXTURE type test from paste?"

I have used the mclust package in R to create clusters that can be displayed as stacked bar charts in Excel. You can load mclust from the “Load Package” menu entry in R. The source.txt file in the example below is in the same comma-separated format as used by nMonte.
From my notes, I think this will work:


Samuel Andrews said...

"Sam, would you characterize the current populations of the Caucasus as largely descending from the Meshoko-Darkveti?"

Yes or other similar ancient Caucasus groups.

Samuel Andrews said...


Thanks. Wow, that sounds like it works. I downloaded mclust online. When I try to install it, it says I must uninstall R 3.5.1. Is there a way to keep R 3.5.1, which I use for nMonte, and have mclust?

Andrzejewski said...

Which, by and large, means that most people of the Caucasus are roughly a 1:1 admixture of CHG : ANF, with some minor WHG, Steppe and Iran_N, correct?

Alex Desira said...

I have a question about the Maori sample. It has a fair share of Austronesian ancestry, which makes sense, but it also seems to have a notable amount of European ancestry. How reliable is the sample itself?

Other than that, great work! Thanks for continuing to update this.

P.S. Also, thank you for adding country tags to some of the ancient samples, they are very helpful.

Garvan said...

Samuel Andrews said..."I downloaded mclust online. When I try to install it is says I must uninstall R 3.5.1. Is there a way to keep R 3.5.1 which I use for nMonte & have mclust?"

I have R version 3.5.2. I must have installed mclust at some stage, but have forgotten. I always install from the menu in R, so I get the compatible versions of scripts with fewer errors during the install. If you downloaded the source separately, try again using “Packages – Install packages…” from the menu in R, and let R download the package and install it for you automatically.

Davidski said...

@Alex Desira

The Maori is from the Simons Genome Diversity Project. That's all I know. See here...

I'm not sure, but I don't think there are any unadmixed Maoris left.

Samuel Andrews said...

Bretons in France really are (near) pure-blood descendants of the Britons who settled there in the 5th century AD. Few people know that at the western tip of France there are British people who have lived there for 1,500 years and spoke their own language until a few generations ago. Few people also know England was founded by Germans but.....

This was expected based on Y DNA. I think like 80% were previously shown to be R1b P312 which is significantly higher than the French average.




Alex Desira said...

Ah, I see. Thank you for clearing that up.

Bob Floy said...


Georgians definitely have more than minor steppe, with almost no WHG.
And I don't think it would be safe to say that they're a 1:1 mix of CHG and ANF, they definitely have much more CHG than ANF.

J.S. said...

@ Samuel Andrews

"This was expected based on Y DNA. I think like 80% were previously shown to be R1b P312 which is significantly higher than the French average. "

Actually, we still don't know the French average.

According to the study "Prehistoric migrations through the Mediterranean basin shaped Corsican Y-chromosome diversity", Provence is 90% R1b n=259

J.S. said...

The multiple maternal legacy of the Late Iron Age group of Urville-Nacqueville (France, Normandy) documents a long-standing genetic contact zone in northwestern France

"Maternal affinities with geographically close extant populations were confirmed by the low FST values between the UN group and five extant populations from regions located in northwestern France (Sarthe, FST = 0.00211; Morbihan, FST = 0.00221; Somme, FST = 0.00385; Calvados, FST = 0.00752 and Finistere, FST = 0.00867; Fig 3A) or between UN and Irish (FST = 0.00309) or British populations (FST = 0.00338) (S11 Table)."

Morbihan and Finistère are in Brittany.

Andrzejewski said...

Few people know that England was founded by Germans thanks to the post-war (Second World War) propaganda by the BBC and the British education system. Most residents of England (especially in northern and eastern shires like York and Manchester) and also in Lowland Scotland are mostly Angles, Saxons and Jutes, but because of WWII a deliberate effort was made to distance the UK from its roots.

a said...

Andrzejewski said...
"Few people know that England was founded by Germans...."
Would you say the festival of Angeln's/Saxon's history month; to celebrate culture, food [physical customs] and the language type we use-has been replaced by other self serving groups?

Andrzejewski said...

Are Georgians really much more CHG than ANF/EEF? It’s really bewildering, all these large-scale population dynamics, a post-Imeretian shift from pure CHG (Satsurblia) into Sioni, Meshoko Darkveti and Shulaveri-Shomu/Kura-Araxes. Apparently there was a massive introduction of agriculture from Anatolia, resulting in the onset and development of viticulture from the 6th millennium BCE onward. Besides, Svans and Laz are predominantly Haplogroup G Y-DNA, which may, as in the case with IE languages, indicate a uniparental paternal linguistic (and genetic?) founder effect.

Going off on a tangent a bit, what’s the real impact (population turnover) of the so-called “Uruk expansion”? Did people of Mesopotamia ancestry (Ubaidian and/or Sumerian) really pack up and move to work at metallurgy at the foothills of the Caucasus mountains? Was Johanna Nichols right to refer to the Nakh as descendants of the first agriculturalists from the Northern Fertile Crescent? Or was she mistaking the date and place of origins with earlier farmers from Anatolia rather?

All these questions are relevant and pertinent, and answers to them may shed more light on the prehistory of IE languages. Or not.

truth said...

There are still not many European samples.
North-East Italians, Swiss, South German, Tyrol, other parts of France, etc.

Alexandros said...

Great thanks! Will be sending the samples over the next couple of days.

Simon_W said...

I wouldn't say England was founded by Germans. Unless by Germans you mean Germanic people, which includes other Germanic nations. The Anglo-Saxons in the Global 25 are quite distinct from modern-day Germans, except perhaps the Frisians and Low Saxons on the North Sea coast. There is a small landscape called Angeln in the Northeast of present-day Schleswig-Holstein, so part of the Angles may have come from there, but it's really small, so probably it's not their whole place of origin, which may have included parts of Jutland in Denmark. And the Jutes were from Jutland anyway.

Simon_W said...

@ Samuel Andrews




I immediately checked this with a few other samples, this is what I got:

[1] "distance%=1.2423"



[1] "distance%=0.9136"



Quite amazing! Because of phys. anthro I didn't expect an outcome like this.
But we still don't know how British-like the Gauls of Aremorica were, so this doesn't necessarily mean near complete replacement.

Simon_W said...

The best model without overfit that I found for my own ancestry (50% Alemannic from Germany and Switzerland, 25% East Prussian German, 25% Romagnol North Italian):


DEU_MA, 37.5
ITA_Collegno_MA:CL36, 22.7
CZE_Hallstatt_Bylany:DA111, 22.5
ITA_Collegno_MA:CL121, 10
Baltic_LTU_Late_Antiquity_low_res:DA171, 7.3

Leaving out the Collegno samples, and using older, more or less sensible substitutes instead:


DEU_MA, 47.1
CZE_Hallstatt_Bylany:DA111, 22.2
HRV_Early_IA, 13.1
Bell_Beaker_ITA, 6.2
Baltic_LTU_Late_Antiquity_low_res:DA171, 5
EGY_Hellenistic, 3.6
Levant_ISR_Askelon_LBA, 2.8

Overall quite similar to the former model. The biggest difference being the larger proportion of Germanic ancestry and the lower proportion of overall Southern admixture. This probably means that CL36 from Collegno has some Germanic ancestry, and eats it up in the first model. Striking also the ancient Egyptian and southern Levantine admixture, in all likelihood from my North Italian ancestors. Also note the substantial Gaulish/Hallstatt_Bylany ancestry. No, these people didn't completely vanish, I'm their descendant! But 13.1% HRV_Early_IA + 6.2% Bell_Beaker_ITA + 3.6% EGY_Hellenistic + 2.8% Levant_ISR_Askelon_LBA = 25.7%, precisely what I inherited from my Italian grandfather, so he didn't have Gaulish ancestry, in spite of being from Northern Italy.
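The sum quoted above is easy to double-check. The short Python sketch below copies the percentages from the second model and adds up the four components counted as southern. The grouping into a list called southern is only a restatement of that sum, not part of any tool's output.

```python
# Percentages copied from the second model above.
model = {
    "DEU_MA": 47.1,
    "CZE_Hallstatt_Bylany:DA111": 22.2,
    "HRV_Early_IA": 13.1,
    "Bell_Beaker_ITA": 6.2,
    "Baltic_LTU_Late_Antiquity_low_res:DA171": 5.0,
    "EGY_Hellenistic": 3.6,
    "Levant_ISR_Askelon_LBA": 2.8,
}
southern = [
    "HRV_Early_IA",
    "Bell_Beaker_ITA",
    "EGY_Hellenistic",
    "Levant_ISR_Askelon_LBA",
]
# Round to one decimal to hide floating-point noise.
southern_total = round(sum(model[name] for name in southern), 1)
model_total = round(sum(model.values()), 1)
print(southern_total)  # 25.7
print(model_total)     # 100.0
```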

Matt said...

Re Breton samples, one thing I'd note is that the samples in G25 actually have quite a large spread in G25:

(Brittany samples in black, other sets of samples have their own color)

Seems slightly larger than English or English Cornwall, despite fewer Breton samples?

Some of the samples are as "northern" as the most "northern" English samples, others are slightly "southern" of the most "southern" individuals in the English cluster and overlap with the most "northern" individuals in the French set.

Quite diverse relative to their sample size and relative to BI (where intra-country diversity is quite low for comparable land area), almost as much as the Scots or Irish, with a smaller sample size.

Drago said...


“I wouldn't say England was founded by Germans”

J.S. said...

@ Matt

Spatial variation of local genetic differentiation (Fst at 30 km) and of LD (at 15 kb).

Samuel Andrews said...

@Bob Floy, Andre

Georgians do have more CHG than EEF. Roughly 30% EEF, 55% CHG. That's how they cluster in G25 PCA.

Andrzejewski said...

@Sam and 15% Steppe Indo-Europeans?

Bob Floy said...


That's more or less what I thought, thanks. More than half CHG.


I think Armenians have more ANF than Georgians, speaking of the Caucasus in general. Modern Armenians, that is.

Chechens are really interesting, to me they basically look like Georgians with more steppe.

Samuel Andrews said...


This link has ancient ancestry estimates I made for West Eurasia. You should save it somewhere.

Anatolian ancestry is much bigger in (southern) Europe than anywhere in the Middle East. Anatolian ancestry does not reach above 30% in the Middle East outside of Anatolia (Turkey).

Georgians, Abkhazians, some North Caucasians are basically a continuation of the Neolithic Caucasus. 50-60% CHG, 30% Anatolia, 10-20% other stuff (mostly IranN, some Steppe).

Anatolian admix in Iran & Saudi Arabia is very low. This makes sense because Neolithic Anatolians basically took over Europe. While, when they moved into new land in the Middle East it was a different story.

Bob Floy said...


Thanks for that, but am I reading this right? Northern ethnic groups like the Irish, Scots, Norwegians, etc., have less than 1% CHG? Or does it not show up in that column because it's part of the "Yamnaya"?

Samuel Andrews said...

CHG is in the Yamnaya.

Simon_W said...

In my opinion, the view that's favoured by current leftist/social liberals is that anyone can belong to any people, DNA and ancestry don't matter. Once you're naturalised, your old ancestry no longer matters and you're part of the new club. Seen that way it doesn't make any sense saying that I'm 1/4 North Italian, because it was my great-great-grandparents who left the newly founded Italy in the 19th century for Switzerland. That there's still quite a lot of foreign blood involved is completely overlooked, because it's all human and "we're all the same". Many people with a migration background embrace this view and hate being asked about their "true origins", which they consider to be a racialist question. Others however are proud of their diverse exotic roots and like sharing what they know about them. There's no consensus on how to deal with this matter.

At any rate, in Swiss high school I didn't hear anything about the Germanic, Alemannic migration to Switzerland either. I don't think it's because of leftist indoctrination. To the contrary, I rather guess it's because the Celtic Helvetii and their socii are a better projection surface for Swiss nationalist feelings, because they are common to both the French Swiss and the German Swiss, and they help set ourselves apart from Germany.

Simon_W said...


"Re Breton samples, one thing I'd note is that the samples in G25 actually have quite a large spread in G25:"

Makes sense, considering that Brittany has always been divided into a Breton speaking western half and a Gallo speaking eastern half. Despite its name, Gallo is a Romance, French-related dialect.

Matt said...

@Simon_W, yeah that's an interesting note, note that the paper which J.S. references above shows a split between the three Breton speaking provinces, and Ile-et-Vilaine which falls under Pays-Gallo (as wiki describes the linguistic geography -

ADMIXTURE results -

Intra-NW France PCA - (note position of Ile-et-Vilaine centroid vs other Bretagne regions)

With Europe PCA - and (unfortunately does not narrow down the Western France subregions of DESIR-Rep)

Definitions of provinces -

Though this paper defines Brittany as excluding some other parts of Western France that would be included in wiki's main article's definition:

It might be interesting to know which of the samples in G25 are from which subregions of Brittany/NW France - I'd imagine the samples close to matching Welsh/Irish are probably from the westernmost (and most Breton) subregions.

But I would guess the actual subregions will probably be found somewhere deep in latitude and longitude scores within the humanorigins panel accompanying sample description file.

Nezih Seven said...

I created a model with Global 25 mainly for the peoples of Anatolia, South Caucasus, Iran and Mesopotamia but it works well for Balkans, North Caucasus, some parts of Central Asia and Levant too. The article is in Turkish, but the images of the results are not:

Alexandros said...

Quick question. How do you make 'CORRELATION OF ADMIXTURE POPULATIONS' appear at the end of the output?

From the screenshot above, it seems as if it is a default setting, but my nmonte3 analyses do not show this. I guess it is important for determining overfitting in the model.

Davidski said...

You'll see the 'CORRELATION OF ADMIXTURE POPULATIONS' at the end of the output in nMonte, but not nMonte3.

Alexandros said...

Great, thanks! I'll check it there.

ancient dna said...

Davidski, what's the meaning of the _o, _o1, _o2 in sample names? Thanks!

Davidski said...

The _o suffix stands for "outlier".

So, Sintashta_MLBA_o1 means Sintashta_MLBA_outlier1.
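When filtering a datasheet, it can be handy to detect this suffix automatically. The Python sketch below is one way to do it. It assumes the suffix always sits at the very end of a population label with no individual code attached, and the function name split_outlier is hypothetical.

```python
import re

# Matches a trailing '_o' optionally followed by a number, e.g. '_o', '_o1'.
OUTLIER_SUFFIX = re.compile(r"_o(\d*)$")


def split_outlier(label):
    """Return (base_label, outlier_number), or (label, None) if no suffix.

    outlier_number is '' for a bare '_o' suffix.
    """
    match = OUTLIER_SUFFIX.search(label)
    if match is None:
        return label, None
    return label[: match.start()], match.group(1)


print(split_outlier("Sintashta_MLBA_o1"))  # ('Sintashta_MLBA', '1')
print(split_outlier("Sintashta_MLBA"))     # ('Sintashta_MLBA', None)
```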

Simon_W said...

Speaking of Switzerland, I just noticed that there are now averages for all three major Swiss ethnicities available in the Global 25. So I developed a model that should make sense for them all and checked how differently they score in that model.

First of all I noticed that the Celtic component appears to be more like French_South than like Hallstatt_Bylany:DA111. Which does make kind of sense, because Switzerland lies southwest of Bohemia. But I didn't want to use the modern Southern French in my models, hence I decided to make my own average of French Bell Beakers, using all French Bell Beakers except the two from northern France. Which worked pretty well, as you'll see below.

But then I also had to choose a proxy for the Roman admixture. I decided to use CL121 from Collegno, because he's from Italy, he's South Italian/Sicilian-like and he's without Longobard admixture.

So now for the models, first the French Swiss:

[1] "distance%=1.586"



More than half of their ancestry is Celtic/Gaulish. But nearly 1/4 of their ancestry is Germanic. Probably rather Burgundian than Alemannic. A considerable Roman admixture is also apparent. No wonder they call themselves "Romands", i.e. Romans!

But now on to the German Swiss:

[1] "distance%=1.1823"



They are nearly 50% Germanic, makes sense, because they speak German. The rest is a Roman admixed Gaulish Substrate, the ratio Gaulish:Roman is very similar to the ratio in the French Swiss. Interestingly they also lack Hallstatt_Bylany, like the Romands.

And finally the Italian Swiss:

[1] "distance%=2.5274"



Here CL121 has more than 50%, the Germanic admixture is low and presumably mostly from the Longobards. Gaulish-like ancestry comes second. This suggests once more how big the upheavals even at the northern fringe of Italy were during the Roman age. The Italian Swiss and the North Italians are not simply a continuation of the local Celts, but considerably Mediterraneanized and Romanized also on the genetic level.

Samuel Andrews said...


Yes, Switzerland was a missing piece in the G25 PCA. The Italian Swiss cluster with Tuscans, so they look like immigrants from central Italy? Like your grandpa was.

Maybe you would want to try modelling German & French Swiss with French_1. It's the main cluster in France.
FrenchCluster1 0.126831429 0.142174143 0.044338714 0.013981286 0.041458143 0.004661429 -0.002047857 0.002736143 0.011015143 0.022935857 -0.003270714 0.004774286 -0.009896571 -0.007824857 0.009151429 0.000473571 -0.002868429 0.001212429 0.001598143 0.000643 -0.000802143 0.001095286 -0.003556571 0.006506857 -0.000188286

zardos said...

@Simon and all:
Anyone tried something similar with German local populations?
It will be very interesting if future studies confirm the Roman impact, and not just in Switzerland.

Simon_W said...


Roman impact in Germany is very possible west of the Rhine and south of the Danube, the parts of Germany that belonged to the Roman Empire for an extended period.

In fact, my maternal grandmother, whose ancestry is 3/4 from Swabia in southwestern Germany and 1/4 from Northwestern Switzerland, scores like this in my model:

[1] "distance%=1.1557"



Very similar to the German Swiss in the amount of Germanic admixture, just particularly Italian in relation to the Celtic proportion. I suspect it's because of her ancestry from Biberach in Upper Swabia, south of the Danube; the relatives from that branch look rather exotic and southern.

Simon_W said...

@Samuel Andrews

I tried it, but apparently there is a value missing in row 5:

FrenchCluster1 0.126831429 0.142174143 0.044338714 0.013981286 0.041458143 0.004661429 -0.002047857 0.002736143 0.011015143 0.022935857 -0.003270714 0.004774286 -0.009896571 -0.007824857 0.009151429 0.000473571 -0.002868429 0.001212429 0.001598143 0.000643 -0.000802143 0.001095286 -0.003556571 0.006506857 -0.000188286
PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15
PC16 PC17 PC18 PC19 PC20 PC21 PC22 PC23 PC24 PC25
FrenchCluster1 NA NA NA NA NA NA NA NA NA NA
Error in check_formats(myData, myTarget) : Missing value in row 5

Simon_W said...

Pushing my model further, Bergamo looks similar to the Italian Swiss:

[1] "distance%=1.9968"



Simon_W said...

Does anyone know where the French_East come from? Judging from my model they could be German speaking Alsatians:

[1] "distance%=1.1655"



zardos said...

Thank you. If your model is correct, it would mean about 50% old German ancestry in the south, with about one quarter southern ancestry, possibly a large portion of it real Roman ancestry.
How about the Rhineland and the north?
Do you have early Slavs for comparison?

Simon_W said...

I don't have regional German samples, I'm not a collector of such things. And what do you mean by early Slavs for comparison? How they are mixed? Or how they are mixed into the Germans?

Simon_W said...

BTW, I said French_East could be German speaking Alsatians, although we all know they're predominantly French speaking by now. A more accurate wording would be: they could be French speaking Alsatians who used to be German speaking until a few generations ago. At any rate, this sample looks similar to the German Swiss, just a bit less Roman.

Simon_W said...

Oops, I just saw why I got that error message when trying to model with the French cluster 1! I have to put it into comma separated format. Wait a minute!
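A minimal sketch of that conversion, written in Python rather than R purely for illustration. It assumes the pasted row is whitespace-separated, with the label first and the coordinates after it; the function name `g25_row_to_csv` is hypothetical and not part of nMonte.

```python
def g25_row_to_csv(row: str, n_dims: int = 25) -> str:
    """Turn one whitespace-separated coordinate row into a comma-separated one."""
    fields = row.split()
    if not fields:
        raise ValueError("empty row")
    label, coords = fields[0], fields[1:]
    if len(coords) != n_dims:
        raise ValueError(f"expected {n_dims} coordinates, got {len(coords)}")
    for value in coords:
        float(value)  # raises ValueError if a field is not numeric
    # Keep the original number strings so no precision is lost.
    return ",".join([label] + coords)


if __name__ == "__main__":
    # Toy three-dimensional row, not real Global25 data.
    print(g25_row_to_csv("ToySample 0.1 -0.02 0.003", n_dims=3))
```

Checking the number of coordinates up front should catch a malformed row before it reaches the modelling script.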

Simon_W said...

[1] "distance%=1.4474"



[1] "distance%=1.0016"



[1] "distance%=2.3308"



Looks like all Swiss ethnic groups alike can be modelled as roughly 50% of the French cluster 1. However, I don't think this is a useful model as long as we've got decent ancient samples at hand. The French cluster 1 is a modern cluster of mixed origin, so it hides the ancient origins rather than uncovering them.

zardos said...

I meant how much early Slavic influence can be seen in German subpopulations.

Simon_W said...


The Slavic admixture in the East German subpopulation available in the Global25 sheet seems considerable:

[1] "distance%=1.7412"



However, if I apply the same model on the other non-Eastern German sample, I obtain an overfit:

[1] "distance%=1.0809"



The French Beakers and the early Bohemian Slavs are abused here to adjust the coords as closely to the German sample as possible, even though historically speaking Slavic admixture west of the Elbe and Saale must be very scant and limited to a few small areas. The fit of the model is too good.

The only way I can deal with this is by removing the early Slavs from the model, which results in

[1] "distance%=1.5133"



Judging from this, non-Eastern Germans are predominantly Germanic, with some (regionally varying) Celtic infusion.
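For readers wondering what the fit values measure, here is a rough Python sketch. It assumes the reported "distance%" is the Euclidean distance between the target's coordinates and the weighted mix of the source coordinates, multiplied by 100. That reading is an assumption and has not been checked against the nMonte source, and the function name `mix_distance_pct` is hypothetical.

```python
import math


def mix_distance_pct(target, sources, weights):
    """Distance (x100) between a target and a weighted mix of sources.

    Assumption: Euclidean distance scaled by 100, as a stand-in for the
    "distance%" values quoted in the thread.
    """
    if len(sources) != len(weights):
        raise ValueError("need exactly one weight per source")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Weighted average of the sources, one dimension at a time.
    mix = [
        sum(w * src[i] for w, src in zip(weights, sources))
        for i in range(len(target))
    ]
    return 100.0 * math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))


if __name__ == "__main__":
    # Toy two-dimensional coordinates, not real Global25 data.
    print(mix_distance_pct([1.0, 1.0], [[0.0, 0.0], [2.0, 2.0]], [0.5, 0.5]))
```

Under this reading, adding a source can never make the best attainable distance worse, since the optimizer is free to give it zero weight. So a lower distance on its own does not show that the extra source is historically real.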

MasterOfAnimals said...

Please add the Copts of Egypt.

WesternPonticSteppe said...

Why aren't some paleo samples in the G25 dataset (Oase1, Satsurblia, KremsWA3, Ostuni1)?

Davidski said...

They're too old, heavily damaged, and/or they lack enough data.

You can't really analyze deep ancestry that's far out of the range of modern humans with this sort of methodology.

Puree said...

Would it be possible to include the date in the names of your updated datasheets so that users may know if they already have obtained the update?

Puree said...

@Davidski Your post of Dec 15, 2019 raises a question in my mind: how many SNPs are enough to consider a sample sufficient for G25-style use? On this point, could you please explain the terms 'coverage' and 'endogenous' when used to describe ancient samples? If this is answered elsewhere, I haven't found the place yet...

Davidski said...

The Global25 is based on ~300,000 SNPs. I generally only run samples that have at least 15% of these SNPs.

Lower coverage samples, in other words those with fewer markers, aren't included, or sometimes they are but they're marked with the "low_res" suffix.
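As a quick worked example of that cutoff: 15% of roughly 300,000 SNPs comes to about 45,000 markers. A small Python sketch of the arithmetic follows. The constants are the approximate figures quoted above, and the function names are hypothetical.

```python
PANEL_SIZE = 300_000  # approximate panel size quoted above
MIN_PERCENT = 15      # minimum share of panel SNPs quoted above


def min_snps(panel_size: int = PANEL_SIZE, min_percent: int = MIN_PERCENT) -> int:
    """Smallest whole number of SNPs that reaches the percentage cutoff."""
    # Ceiling division in integers avoids floating-point rounding.
    return -(-panel_size * min_percent // 100)


def passes_cutoff(snps_covered: int,
                  panel_size: int = PANEL_SIZE,
                  min_percent: int = MIN_PERCENT) -> bool:
    """True if a sample covers at least min_percent of the panel."""
    return snps_covered * 100 >= panel_size * min_percent


if __name__ == "__main__":
    print(min_snps())
    print(passes_cutoff(44_999), passes_cutoff(45_000))
```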

CrM said...

Do the "Ossetian" samples represent South Ossetians?

Davidski said...


Don't know. See here...

CrM said...


Thanks. One more question, do the Georgian samples come from the same study?

Davidski said...

You can probably track them down via their individual IDs.

Unknown said...


I've always had this question: is it better to use the pop average spreadsheets or the full datasheets?

Davidski said...

In theory, population averages are more robust than singleton results.

However, in reality many of the population averages aren't representative enough to be useful, especially when it comes to large countries with significant genetic substructures.

So the best thing to do in many cases is to create your own population averages from the most relevant samples.
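A minimal Python sketch of building such a custom average, assuming it is simply the per-dimension mean of the chosen samples' coordinates. The sample labels and values below are toy data, and `population_average` is a hypothetical helper, not something shipped with the Global25 or nMonte.

```python
def population_average(rows, keep):
    """Per-dimension mean of the selected samples.

    rows: dict mapping sample label -> list of coordinates
    keep: labels of the samples to include in the average
    """
    selected = [rows[label] for label in keep]
    if not selected:
        raise ValueError("no samples selected")
    n = len(selected)
    # zip(*selected) walks through the coordinates one dimension at a time.
    return [sum(column) / n for column in zip(*selected)]


if __name__ == "__main__":
    # Toy two-dimensional coordinates, not real Global25 data.
    toy = {
        "Pop:sample1": [1.0, 2.0],
        "Pop:sample2": [3.0, 4.0],
        "Pop:outlier": [100.0, 100.0],
    }
    print(population_average(toy, ["Pop:sample1", "Pop:sample2"]))
```

Leaving a sample out of `keep` is how less relevant samples get excluded, along the lines of the French Bell Beaker average earlier in the thread that omitted the two samples from northern France.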

Unknown said...

How will I be able to get my G25 coordinates? Do I have to email you?

Samuel Andrews said...

There are two Armenian pops in the G25 PCA: Armenian_Hemsheni and Armenian. The latter is distinguished by a large dose of Levant ancestry not present in the former. Is "Armenian" a diaspora population living in the Levant?