
Friday, July 12, 2019

Getting the most out of the Global25


The first thing you need to know about the Global25 is that I update the relevant datasheets regularly, usually every few weeks, but they're always at these links:

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient

...

Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern

Global25 data for samples from a variety of papers that have been published recently will eventually be incorporated into the main datasheets linked above, but the process might take several weeks or even months. In the meantime, feel free to use the temporary datasheets below. Thanks for your patience.

Allentoft 2023

Chylenski 2023

Jeong 2024

Koptekin 2022

Olalde 2023

Peltola 2022

Penske 2023

Posth 2023

Sirak 2024

Skourtanioti 2023

Stolarek 2023

Varela 2023

Wang 2023

Yu 2023

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.

So if you'd like to know more about the samples, try searching for their individual codes with a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.

The main purpose of the Global25 is to provide data for mixture modeling, in other words, for estimating ancestry proportions, both ancient and modern (see here). This can be done on your computer with R and the nMonte R script, or online with a couple of different tools, which I discuss below.
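As a rough illustration of what "mixture modeling" means here, the sketch below scores one candidate set of ancestry proportions by the Euclidean distance between a target and the weighted mix of its sources. The three-dimensional coordinates, the population names and the helper `mix_distance` are all invented for illustration. nMonte itself searches over many candidate proportions, and this sketch makes no attempt to reproduce its algorithm.

```r
# Toy coordinates (hypothetical values, 3 PCs instead of 25).
sources <- rbind(
  Source_A = c(0.10, 0.02, -0.01),
  Source_B = c(-0.05, 0.08, 0.03)
)
target <- c(0.025, 0.05, 0.01)

# Distance between the target and a weighted mix of the sources.
# 'props' must be non-negative and sum to 1 (one weight per source row).
mix_distance <- function(props, sources, target) {
  stopifnot(all(props >= 0), abs(sum(props) - 1) < 1e-8)
  modelled <- as.vector(props %*% sources)  # weighted average of source rows
  sqrt(sum((modelled - target)^2))          # Euclidean distance to the target
}

mix_distance(c(0.5, 0.5), sources, target)  # even mix
mix_distance(c(0.9, 0.1), sources, target)  # lopsided mix
```

With these toy numbers the even mix lands almost exactly on the target, so it scores a much smaller distance than the lopsided one. A smaller distance means a better fit, which is the sense in which the "distance%" figures quoted in the comments can be compared.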

If you don't have R installed on your computer, you can get it here, while nMonte is available here. For this tutorial please download nMonte and nMonte3, and store them in your main working folder (usually My Documents).

Once you have R set up, make sure its working directory is the same place where you stored nMonte. You can check this in R by clicking on "File" and then "Change dir". Additionally, you'll need two nMonte input files in the working directory titled "data" and "target". Examples of these files are available here. We'll be using them to test the ancient ancestry proportions of a sample set from present-day England.

Before you can begin the analysis, you first need to load the nMonte script by typing or pasting source('nMonte.R') into the R console window and hitting "enter" on your keyboard. This is what you should see in the R console window afterwards.


To start the mixture modeling process, type or copy paste getMonte('data.txt', 'target.txt') into the R console window, hit "enter", and wait for the results. After a short time, probably less than a minute or two, you should see this output.
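The two commands can also be wrapped in a short script that first checks the working directory. This is only a sketch: the file names are the ones used in this tutorial, `missing_files` is a hypothetical helper, and `getMonte()` only exists once nMonte.R has been sourced.

```r
# Files the tutorial expects to find in the working directory.
needed <- c("nMonte.R", "data.txt", "target.txt")

# Return the names of any files that are not present in 'dir'.
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_files(needed)
if (length(absent) > 0) {
  # Nothing is run; report what is missing and where R looked.
  message("Not found in ", getwd(), ": ", paste(absent, collapse = ", "))
} else {
  source("nMonte.R")                  # defines getMonte()
  getMonte("data.txt", "target.txt")  # same call as in the tutorial
}
```

If a file is reported as missing, either move it into the folder shown in the message or change the working directory as described above.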


The data and target files contain population averages. And, as you can see, the results that these population averages have produced are in line with what one would expect from such a model focusing on the genetic shifts in Northern Europe during the Late Neolithic. Very similar ancient ancestry proportions have been reported for the English and other Northern Europeans recently in scientific literature.

However, when focusing on exceptionally fine-scale genetic variation that isn't reflected too well in the Global25 population averages, a more effective strategy might be to use multiple individuals from each reference population and let nMonte3 aggregate and average the inferred ancestry proportions.
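To make the idea of aggregating individual-level results concrete, here is a small sketch that sums inferred percentages by population code. The labels follow the "Population:Individual" style seen in the datasheets, but the populations and numbers are invented, and whether nMonte3 performs its aggregation in exactly this way is an assumption.

```r
# Invented individual-level results: sample label and inferred percentage.
fit <- data.frame(
  label = c("Pop_A:ind1", "Pop_A:ind2", "Pop_B:ind1", "Pop_B:ind2"),
  pct   = c(30.5, 24.5, 40.0, 5.0)
)

# Population code = everything before the first colon.
fit$pop <- sub(":.*$", "", fit$label)

# Sum the percentages of all individuals that share a population code.
by_pop <- aggregate(pct ~ pop, data = fit, FUN = sum)
by_pop[order(-by_pop$pct), ]  # largest contribution first
```

The point of the design is that each individual is free to take up a different share of the model, and only the population-level totals are read off at the end.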

This is often the case when attempting to model ancestry proportions for more recent periods, such as the Middle Ages. So let's try this with the English sample set using a modified data file, which is available here.

Replace the old data file with the new one in your working directory, and, like before, copy paste into the R console window the following two commands, hitting "enter" after each one: source('nMonte3.R') and getMonte('data.txt', 'target.txt'). This is what you should eventually see.


It's difficult to say how accurate these estimates are. But they look more or less correct considering the limited and less than ideal reference samples. For instance, the individuals labeled SWE_Viking_Age_Sigtuna are supposed to be stand ins for Danish and Norwegian Vikings, but they're a relatively heterogeneous group from Sweden, possibly with some British or Irish ancestry, so they might be skewing the results.

However, I'll be adding many more ancient samples to the Global25 datasheets as they become available, including lots of new Vikings, which should greatly improve the accuracy of these sorts of fine-scale mixture models.

An exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is VahaduoJS. It's freely available HERE, and it also works offline after downloading the web page. Just paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen caps below show me doing just that.






However, it's important to note that the Global25 is a Principal Component Analysis (PCA), so it makes good sense to also use it for producing PCA graphs. To do this just plot any combination of two or three of its Principal Components (PCs) to create 2D or 3D graphs, respectively. This can be done with a wide variety of programs, including PAST, which is freely available here.

To produce a 2D graph, open a Global25 datasheet in PAST, choose comma as the separator, highlight any two columns of data, click on the "Plot" tab and, from the drop down list, pick "XY graph". Below is a series of graphs that I created in exactly this way. I also color coded the samples according to their geographic origins. This was done by ticking the "Row attributes" tab.
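For readers who would rather stay in R, a comparable 2D graph can be sketched with base graphics. The data frame below is a toy stand-in with invented values and only two PCs, and the colouring by population code is just one possible choice.

```r
# Toy stand-in for a Global25 datasheet (values invented; real sheets have 25 PCs).
g25 <- data.frame(
  label = c("Pop_A:ind1", "Pop_A:ind2", "Pop_B:ind1", "Pop_B:ind2"),
  PC1 = c(0.10, 0.12, -0.04, -0.06),
  PC2 = c(0.02, 0.04, 0.08, 0.07)
)

# Population code (text before the colon), used here to colour the points.
pop <- factor(sub(":.*$", "", g25$label))

# Scatter plot of the two chosen PCs, one colour per population.
plot(g25$PC1, g25$PC2, col = as.integer(pop), pch = 19,
     xlab = "PC1", ylab = "PC2", main = "Global25: PC1 vs PC2")
legend("topright", legend = levels(pop),
       col = seq_along(levels(pop)), pch = 19)
```

Swapping `PC1` and `PC2` for any other pair of columns gives the other 2D views.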


PAST can also be used to run PCA on subsets of the Global25 scaled data to produce remarkably accurate plots of fine-scale population structure. For instance, here's a plot based on present-day populations from north of the Alps, Balkans and Pyrenees.


To try this create a new text file with your choice of populations from the Global25 scaled datasheet, open it with PAST and choose Multivariate > Ordination > Principal Components Analysis. I've already put together several datasheets limited to European, Northern European, West Eurasian and South Asian populations. They're available at the links below along with more details on how to run them with PAST.
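In R, one possible counterpart to this step is `prcomp` with centring switched on and rescaling switched off. Whether this matches PAST's output exactly is an assumption, and the matrix below is filled with random numbers purely to show the shape of the input.

```r
# Toy subset: rows are samples, columns are PCs (random values, for shape only).
set.seed(1)
g25_subset <- matrix(rnorm(6 * 4), nrow = 6,
                     dimnames = list(paste0("Pop_", LETTERS[1:6]),
                                     paste0("PC", 1:4)))

# Centre each column, but do not rescale the columns to unit variance.
pca <- prcomp(g25_subset, center = TRUE, scale. = FALSE)

# Share of the subset's variance captured by each new axis.
summary(pca)$importance["Proportion of Variance", ]

# Plot the samples on the first two new axes and label them.
plot(pca$x[, 1], pca$x[, 2], xlab = "New PC1", ylab = "New PC2", pch = 19)
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3, cex = 0.7)
```

Leaving the columns unscaled keeps the relative weight that the scaled Global25 coordinates already give to each PC, which is the reason for `scale. = FALSE` in this sketch.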

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

Another free, easy-to-use online tool that works with Global25 coordinates is the Vahaduo Global25 Views [LINK]. Below is a screen cap of me checking out one of the many PCAs that it offers.


And if you're fond of tree-like structures as a means to describe fine-scale genetic variation, please see this blog post...

Global25 workshop 4: a neighbour joining tree

See also...

New Global25 interpretation tools

165 comments:

Samuel Andrews said...

Thanks for sharing G25 David. It is by far the best free ancestry tool around. It explains the fundamentals of genetic variation for the whole world.

Alexandros said...

David, are you still accepting samples? I have a few previously analysed with Global10 and a few other new ones. Can I send these over to the relevant email address?

Davidski said...

Yes, you can, and people with Global10 coordinates get their Global25 coordinates for free.

Samuel Andrews said...

David added a lot of new modern pops to the G25 PCA including Syrians.

Syrians are diverse. Some have a lot of Arab ancestry. On average, they're closest to Lebanese. They're intermediate between Kurds & Levant.

"distance%=1.0572"

Syrian (many outliers)

Kurdish,30.1
Lebanese_Druze,21.1
Samaritan,20.6
Levant_BA_North,12.7
Sintashta_MLBA,5.4
BedouinB,5.4
Yoruba,3.4
Lebanese_Christian,1.3
Assyrian,0

Samuel Andrews said...

So just a handful of tiny North Caucasus tribes are the only other people with as much Yamnaya/Steppe as northern Europeans. They're fascinating isolates with an ancestry unlike anyone else's.

Technically they live in Europe but genetically they shouldn't be considered European as they don't descend from European hunter gatherers or farmers at all. Their Steppe ancestry is straight from Yamnaya not later Corded Ware-derived Srubnaya.

Samuel Andrews said...

Test out the North Caucasus pops David added. They fit really well as Maykop+Yamnaya. Some have 40% Yamnaya ancestry.

Kaitag_Caucasus: Yamnaya_Kalmykia,38.9
Ingushian_Caucasus: Yamnaya_Kalmykia,19.1
Kubachinian_Caucasus: Yamnaya_Kalmykia,37.6
Karachay_Caucasus (Turkic): Yamnaya_Kalmykia,14

Aram said...

Samuel

Even more amazing is that they got the bulk of their Steppe ancestry after Bronze Age. And even more surprising via exogamy. Because Ingushes for example virtually don't have any Steppic Y dna.
Here is a citation from Wang paper.

-----

First, sometime after the BA present-day North Caucasian populations must have received additional gene-flow from steppe populations that now separates them from southern Caucasians, who largely retained the BA ancestry profile. The archaeological and historic records suggest numerous incursions during the subsequent Iron Age and Medieval times [33], but ancient DNA from these time periods will be needed to test this directly.

Aram said...

Davidski

Which is better, the scaled or the unscaled datasheets? Or do they serve different purposes?

Also I noticed in your previous NJ Chord tree that Catacomb and Afanasievo form a tight cluster.
There is an archaeological theory that Catacomb was influenced by Afanasievo (cranial deformation).
What do you think, was there a back migration from Afanasievo to Catacomb?

Davidski said...

@Aram

The scaled coordinates produce more stable and realistic results most of the time. But use both and see what works better in each case in comparison to other methods and scientific literature.

And I reckon that Afanasievo probably came from the same place on the Pontic-Caspian steppe as Catacomb, and this is what is likely to explain the similarities between them.

Aram said...

Catacomb Ukraine is more EEF admixed than Catacomb RUS, as I expected.

[1] "distance%=3.628"

UKR_Catacomb

RUS_Catacomb,92.6
UKR_Trypillia_En,7.4

and

[1] "distance%=3.5462"

UKR_Catacomb

RUS_Catacomb,90.4
UKR_Trypillia_En,4.2
Corded_Ware_POL,2.4
Corded_Ware_Proto-Unetice_POL,1.6
RUS_Afanasievo,1.4

Darkveti Meshoko was included but it didn't want it.

I expect that in Multi Cordon Ware period (after Catacomb) there will be even more increase in EEF ancestry.

claravallensis said...

Good work as always David.
I wanted to ask, since I only use R both for nMonte but also to make G25 based PCAs, when interested for instance in west Eurasian specific variation, does PAST essentially compute something like prcomp(g25_subset, center=TRUE, scale=FALSE) ? I've been using the latter and the results seem more or less similar both to yours and academia, though I'm not completely confident.
Also, when considering WE specific variation, which "eastern" populations do you typically end up including for reference?

Davidski said...

@priscus

I wanted to ask, since I only use R both for nMonte but also to make G25 based PCAs, when interested for instance in west Eurasian specific variation, does PAST essentially compute something like prcomp(g25_subset, center=TRUE, scale=FALSE)?

I haven't actually tried this yet, but yes R should be able to do exactly what PAST does.

Also, when considering WE specific variation, which "eastern" populations do you typically end up including for reference?

I often extend my West Eurasian analyses as far as West Siberia and the Indus Valley (minus the really eastern groups along the way, like the Kalmyks), which does help, especially when dealing with ANE-rich ancient populations that no longer exist.

Slumbery said...

@Davidski

In your opinion is it advised to use distance penalty in nMonte runs? In some cases the results can be drastically different.

For example I run some tests on Central European populations to seek sources for EF + steppe and the Lengyel vs. Globular Amphora match came out with completely different results, depending on the penalty.
The difference between the fits is not informative in itself, because of course including the penalty results in a worse fit.

Andrzejewski said...

You both nailed it right. It wasn’t a prehistoric Maykop -> Steppe vector but ultimately a post-BA Steppe -> NW Caucasus one, which explains why Northern Caucasus people look more “Northern shifted” than Southern Caucasus.

Some Georgians and Armenians look European-like (Stalin could pass as a Southern European) because Darkveti Meshoko and KA have CHG and Anatolia_N ancestry.

Samuel Andrews said...

@Davidski,

Is there a simple way to make an ADMIXTURE type test from PAST?

Drago said...

@ Andre

“Stalin could pass as a Southern European”

Lol
Stalin looks neither Greek nor Italian nor Spanish. So which Southern European would he pass for ?
Is all you base your statement on the olive shade of his skin ?

Bob Floy said...

@Drago
"Stalin looks neither Greek nor Italian nor Spanish. So which Southern European would he pass for ?
Is all you base your statement on the olive shade of his skin?"

C'mon, man, he's short and has a moustache, that's not good enough for you?

Drago said...

Bob; you’re right !

Bob Floy said...

Whennn the moon hits your eye like a big pizza pie, that's-a STAAAALIN...

Davidski said...

@Andrzejewski

Enough with the he looks like that, she looks like this, they look European etc.

It's outdated and too subjective, and doesn't lead to anything useful. Stick to genetics and learn to analyze the data.

@Samuel Andrews

There's no simple way to estimate ancestry proportions with PAST. But it should be possible one way or another.

Samuel Andrews said...

@Aram,
"Even more amazing is that they got the bulk of their Steppe ancestry after Bronze Age. And even more surprising via exogamy. Because Ingushes for example virtually don't have any Steppic Y dna.
Here is a citation from Wang paper."

That is amazing. I just wonder what post-Bronze age pop from Europe it was. The Caucasus in general is interesting for genetics because it has been isolated in the last 6,000 years, has preserved a large variety of segregated ethnic groups/languages.

Davidski said...

Recent founder effects may have eliminated the steppe Y-haplogroups in some of those Caucasus ethnic groups with high levels of genome-wide steppe ancestry. Founder effects are especially common in isolated, endogamous communities.

Andrzejewski said...

Sam, would you characterize the current populations of the Caucasus as largely descending from the Meshoko-Darkveti?

Garvan said...

"Samuel Andrews said... Is there a simple way to make an ADMIXTURE type test from PAST?"

I have used the mclust package in R to create clusters that can be displayed as stacked bar charts in excel. You can load mclust from the “Load Package” menu entry in R. The source.txt file in the example below is the same comma separated format as used by nmonte.
From my notes, I think this will work:

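library(mclust)  # provides Mclust()
data <- read.csv('source.txt', header = TRUE, row.names = 1)  # first column holds the sample labels
C <- Mclust(data)  # model-based clustering of the coordinates
write.csv(C$z, 'test.csv', quote = FALSE)  # cluster membership probabilities per sample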

Samuel Andrews said...

@Andre,
"Sam, would you characterize the current populations of the Caucasus as largely descending from the Meshoko-Darkveti?"

Yes or other similar ancient Caucasus groups.

Samuel Andrews said...

@Garvan,

Thanks. Wow, that sounds like it works. I downloaded mclust online. When I try to install it, it says I must uninstall R 3.5.1. Is there a way to keep R 3.5.1, which I use for nMonte, and still have mclust?

Andrzejewski said...

Which, by and large, means that most people of the Caucasus are roughly a 1:1 admixture of CHG and ANF, with some minor WHG, Steppe and Iran_N, correct?

Alex Desira said...

I have a question about the Maori sample. It has a fair share of Austronesian ancestry, which makes sense, but it also seems to have a notable amount of European ancestry. How reliable is the sample itself?

Other than that, great work! Thanks for continuing to update this.

P.S. Also, thank you for adding country tags to some of the ancient samples, they are very helpful.

Garvan said...

Samuel Andrews said..."I downloaded mclust online. When I try to install it is says I must uninstall R 3.5.1. Is there a way to keep R 3.5.1 which I use for nMonte & have mclust?"

I have R version 3.5.2. I must have installed mclust at some stage, but have forgotten. I always install from the menu in R, so I get the compatible versions of scripts with fewer errors during the install. If you downloaded the source separately, try again using “Packages – Install packages…” from the menu in R, and let R download the package and install it for you automatically.

Davidski said...

@Alex Desira

The Maori is from the Simons Genome Diversity Project. That's all I know. See here...

https://www.simonsfoundation.org/simons-genome-diversity-project

I'm not sure, but I don't think there are any unadmixed Maoris left.

Samuel Andrews said...

Bretons in France really are (near) pure blood descendants of the Britons who settled there in the 5th century AD. Few people know that at the western tip of France there are British people who have lived there for 1,500 years and spoke their own language till a few generations ago. Few people also know England was founded by Germans but.....

This was expected based on Y DNA. I think like 80% were previously shown to be R1b P312 which is significantly higher than the French average.


"distance%=0.7811"

French_Brittany

Welsh,44.2
Irish,38.7
FrenchCluster1,12
England_IA,5.1

Alex Desira said...

Ah, I see. Thank you for clearing that up.

Bob Floy said...

@Andre

Georgians definitely have more than minor steppe, with almost no WHG.
And I don't think it would be safe to say that they're a 1:1 mix of CHG and ANF, they definitely have much more CHG than ANF.

J.S. said...

@ Samuel. Andrews

"This was expected based on Y DNA. I think like 80% were previously shown to be R1b P312 which is significantly higher than the French average. "

Actually, we still don't know the French average.

According to the study "Prehistoric migrations through the Mediterranean basin shaped Corsican Y-chromosome diversity", Provence is 90% R1b n=259

J.S. said...

The multiple maternal legacy of the Late Iron Age group of Urville-Nacqueville (France, Normandy) documents a long-standing genetic contact zone in northwestern France

"Maternal affinities with geographically close extant populations were confirmed by the low FST values between the UN group and five extant populations from regions located in northwestern France (Sarthe, FST = 0.00211; Morbihan, FST = 0.00221; Somme, FST = 0.00385; Calvados, FST = 0.00752 and Finistere, FST = 0.00867; Fig 3A) or between UN and Irish (FST = 0.00309) or British populations (FST = 0.00338) (S11 Table)."

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0207459

Morbihan and Finistère are in Brittany.

Andrzejewski said...

Few people know that England was founded by Germans thanks to the Post-War (Second World War) propaganda by the BBC and British education system. Most residents of England (especially in North and East shires like York and Manchester) and also in Lowland Scotland are mostly Angles, Saxons and Jutes but because of WWII it was deliberately done the efforts to distance UK from their roots.

a said...

Andrzejewski said...
"Few people know that England was founded by Germans...."
Would you say the festival of Angeln's/Saxon's history month; to celebrate culture, food [physical customs] and the language type we use-has been replaced by other self serving groups?

Andrzejewski said...

Are Georgians really much more CHG than ANF/EEF? All these large-scale population dynamics are really bewildering: a post-Imertian shift from pure CHG (Satsurblia) into Sioni, Meshoko Darkveti and Shulaveri-Shomu/Kura-Araxes. Apparently there was a massive introduction of agriculture from Anatolia, resulting in the onset and development of viticulture from the 6th millennium BCE onward. Besides, Svans and Laz are predominantly Y-DNA haplogroup G, which may, as in the case of IE languages, indicate a paternal linguistic (and genetic?) founder effect.

Going off on a tangent a bit, what’s the real impact (population turnover) of the so-called “Uruk expansion”? Did people of Mesopotamia ancestry (Ubaidian and/or Sumerian) really pack up and move to work at metallurgy at the foothills of the Caucasus mountains? Was Johanna Nichols right to refer to the Nakh as descendants of the first agriculturalists from the Northern Fertile Crescent? Or was she mistaking the date and place of origins with earlier farmers from Anatolia rather?

All these questions are relevant and pertinent, and answers to them may shed more light on the prehistory of IE languages. Or not.

truth said...

There are still not many European samples:
North-East Italians, Swiss, South Germans, Tyrol, other parts of France, etc.

Alexandros said...

Great thanks! Will be sending the samples over the next couple of days.

Simon_W said...

I wouldn't say England was founded by Germans. Unless by Germans you mean Germanic people, which includes other Germanic nations. The Anglo-Saxons in the Global 25 are quite distinct from modern-day Germans, except perhaps the Frisians and Low Saxons on the North Sea coast. There is a small landscape called Angeln in the Northeast of present-day Schleswig-Holstein, so part of the Angles may have come from there, but it's really small, so probably it's not their whole place of origin, which may have included parts of Jutland in Denmark. And the Jutes were from Jutland anyway.

Simon_W said...

@ Samuel Andrews

"0.7811"

French_Brittany

Welsh,44.2
Irish,38.7
FrenchCluster1,12
England_IA,5.1"

I immediately checked this with a few other samples, this is what I got:

[1] "distance%=1.2423"

French_Brittany

English_Cornwall,94.9
French,5.1

[1] "distance%=0.9136"

French_Brittany

England_IA,48.7
England_Roman,35.2
CZE_Hallstatt_Bylany:DA111,16.1

Quite amazing! Because of phys. anthro I didn't expect an outcome like this.
But we still don't know how British-like the Gauls of Aremorica were, so this doesn't necessarily mean near complete replacement.

Simon_W said...

The best model without overfit that I found for my own ancestry (50% Alemannic from Germany and Switzerland, 25% East Prussian German, 25% Romagnol North Italian):

"distance%=1.6053"

DEU_MA, 37.5
ITA_Collegno_MA:CL36, 22.7
CZE_Hallstatt_Bylany:DA111, 22.5
ITA_Collegno_MA:CL121, 10
Baltic_LTU_Late_Antiquity_low_res:DA171, 7.3

Leaving away the Collegno samples, and using older, more or less sensible substitutes instead:


"distance%=1.7152"

DEU_MA, 47.1
CZE_Hallstatt_Bylany:DA111, 22.2
HRV_Early_IA, 13.1
Bell_Beaker_ITA, 6.2
Baltic_LTU_Late_Antiquity_low_res:DA171, 5
EGY_Hellenistic, 3.6
Levant_ISR_Askelon_LBA, 2.8

Overall quite similar to the former model. The biggest difference is the larger proportion of Germanic ancestry and the lower proportion of overall Southern admixture. This probably means that CL36 from Collegno has some Germanic ancestry, and eats it up in the first model. Also striking is the ancient Egyptian and southern Levantine admixture, in all likelihood from my North Italian ancestors. Also note the substantial Gaulish/Hallstatt_Bylany ancestry. No, these people didn't completely vanish, I'm their descendant! But 13.1% HRV_Early_IA + 6.2% Bell_Beaker_ITA + 3.6% EGY_Hellenistic + 2.8% Levant_ISR_Askelon_LBA = 25.7%, precisely what I inherited from my Italian grandfather, so he didn't have Gaulish ancestry, in spite of being from Northern Italy.

Matt said...

Re Breton samples, one thing I'd note is that the samples in G25 actually have quite a large spread in G25: https://imgur.com/a/V3cBjvw

(Brittany samples in black, other sets of samples have their own color)

Seems slightly larger than English or English Cornwall, despite fewer Breton samples?

Some of the samples are as "northern" as the most "northern" English samples, others are slightly "southern" of the most "southern" individuals in the English cluster and overlap with the most "northern" individuals in the French set.

Quite diverse relative to their sample size and relative to BI (where intra-country diversity quite low for comparable land area), almost as much as the Scots or Irish, with a smaller sample size.

Drago said...

Simon

“I wouldn't say England was founded by Germans”

But the funny thing was the claim that only a few people knew that. I guess that might be true in some parts of the western world - a reflection of the education system, which is being held captive by marxists / feminists

Bob Floy said...

@Drago
"But the funny thing was the claim that only a few people knew that. I guess that might be true in some parts of the western world - a reflection of the education system, which is being held captive by marxists / feminists"

That's it right there.

Bob Floy said...

@Andre
"Are Georgians really much more CHG than ANF/EEF?"

If I'm wrong, someone can correct me.

"what’s the real impact (population turnover) of the so-called 'Uruk expansion'?"

The impact seems to be shrinking.

J.S. said...

@ Matt

Spatial variation of local genetic differentiation (Fst at 30 km) and of LD (at 15 kb).

https://www.nature.com/articles/ejhg2014175/figures/4

Samuel Andrews said...

@Dragos, Andre

I agree with Dragos it has nothing in particular to do with British being afraid of associating with Germans after WW2 but more to do with post-WW2 worldview many historians have which is a reaction to Nazis, racism, etc.

I think what it really comes down to is that a left-wing worldview dislikes ingroup bias towards people with the same ancestry, language and culture as you have. Or, in other words, they don't like nationalism.

So, left-leaning historians do their best to downplay 'nationalism' (or simply preference for one's ethnic group/tribe/family) in history.

They would rather people in the past have had no ingroup bias for their ethnic group but instead to be just as likely to work/join/identify with people from different ethnic groups/languages/cultures.


So far, genetic replacement/change is the norm when a new group migrates into land where people already live. The reason replacement/change is the norm so far is that people obviously have ingroup bias for those who share ancestry, language and culture with them.

People work to make their ethnic group/tribe grow in size, have more food, have more land, etc. This creates large healthy population who 'replaces' many of the genes & languages of the people already living there.

Left-leaning historians' willingness to downplay this fundamental aspect of human nature is why they were so wrong about the population history of Europe and, so far it seems, of most of the world.

Samuel Andrews said...

@Bob Floy, Andre

Georgians do have more CHG than EEF. Roughly 30% EEF, 55% CHG. That's how they cluster in G25 PCA.

Andrzejewski said...

@Sam and 15% Steppe Indo-Europeans?

Bob Floy said...

@Sam

That's more or less what I thought, thanks. More than half CHG.

@Andre

I think Armenians have more ANF than Georgians, speaking of the Caucasus in general. Modern Armenians, that is.

Chechens are really interesting, to me they basically look like Georgians with more steppe.

Samuel Andrews said...

@Andre,

This link has ancient ancestry estimates I made for West Eurasia. You should save it somewhere.
https://docs.google.com/spreadsheets/d/1LPWAEC3dbAEDu8aBAAcxIOa5CQjuflt0f0cvhCpZ_ME/edit#gid=2101783313

Anatolian ancestry is much bigger in (southern) Europe than anywhere in the Middle East. Anatolian ancestry does not reach above 30% in the Middle East outside of Anatolia (turkey).

Georgians, Abkhazians and some North Caucasians are basically a continuation of the Neolithic Caucasus. 50-60% CHG, 30% Anatolia, 10-20% other stuff (mostly IranN, some Steppe).

Anatolian admix in Iran & Saudi Arabia is very low. This makes sense because Neolithic Anatolians basically took over Europe. While, when they moved into new land in the Middle East it was a different story.

Bob Floy said...

@Sam

Thanks for that, but am I reading this right? Northern ethnic groups like the Irish, Scots, Norwegians, etc., have less than 1% CHG? Or does it not show up in that column because it's part of the "Yamnaya"?

Samuel Andrews said...

CHG is in the Yamnaya.

Drago said...

@ Sam

'' it has nothing in particular to do with British being afraid of associating with Germans after WW2 but more to do with post-WW2 worldview many historians have which is a reaction to Nazis, racism, etc.
So, left-leaning historians do their best to downplay 'nationalism' (or simply preference for one's ethnic group/tribe/family) in history.''

Sure, there will be in-discipline debates between approaches, and these might relate to broader ideological backgrounds. Some of the anti-aDNA statements have put forth points which are mostly in the theoretical realm. Otherwise, there is nothing wrong with some classically 'Left wing' arguments - especially social progress and a balance on unfettered capitalism. What better to bind a people than if they (truly) have some collective ownership of their land?

However, I was referring more to some of the currents in the general education system. Little European history is being taught, as there are some attempts to minimise Stem-type subjects.
Relating back to the original point, this has resulted in the average person in Slovakia knowing more about the Anglo-Saxon migration than the average person in the UK, Australia or New Zealand.

Simon_W said...

In my opinion, the view that's favoured by current leftists/social liberals is that anyone can belong to any people, DNA and ancestry don't matter. Once you're naturalised, your old ancestry no longer matters and you're part of the new club. Seen that way it doesn't make any sense saying that I'm 1/4 North Italian, because it was my great-great-grandparents who left the newly founded Italy in the 19th century for Switzerland. That there's still quite a lot of foreign blood involved is completely overlooked, because it's all human and "we're all the same". Many people with a migration background embrace this view and hate being asked about their "true origins", which they consider to be a racialist question. Others however are proud of their diverse exotic roots and like sharing what they know about them. There's no consensus on how to deal with this matter.

At any rate, in Swiss high school I didn't hear anything about the Germanic, Alemannic migration to Switzerland either. I don't think it's because of leftist indoctrination. On the contrary, I rather guess it's because the Celtic Helvetii and their socii are a better projection surface for Swiss nationalist feelings, because they are common to both the French Swiss and the German Swiss, and they help set us apart from Germany.

Simon_W said...

@Matt

"Re Breton samples, one thing I'd note is that the samples in G25 actually have quite a large spread in G25: https://imgur.com/a/V3cBjvw"

Makes sense, considering that Brittany has always been divided into a Breton speaking western half and a Gallo speaking eastern half. Despite its name, Gallo is a Romance, French-related dialect.

Matt said...

@Simon_W, yeah that's an interesting point. Note that the paper which J.S. references above shows a split between the three Breton speaking provinces and Ille-et-Vilaine, which falls under Pays-Gallo (as wiki describes the linguistic geography - https://en.wikipedia.org/wiki/Gallo_language#/media/File:Pays_Gallo.svg).

ADMIXTURE results - https://media.nature.com/original/nature-assets/ejhg/journal/v23/n6/extref/ejhg2014175x3.jpg

Intra-NW France PCA - https://media.nature.com/original/nature-assets/ejhg/journal/v23/n6/extref/ejhg2014175x4.jpg (note position of Ille-et-Vilaine centroid vs other Bretagne regions)

With Europe PCA - https://media.nature.com/original/nature-assets/ejhg/journal/v23/n6/extref/ejhg2014175x6.jpg and https://media.nature.com/original/nature-assets/ejhg/journal/v23/n6/extref/ejhg2014175x7.jpg (unfortunately does not narrow down the Western France subregions of DESIR-Rep)

Definitions of provinces - https://media.nature.com/original/nature-assets/ejhg/journal/v23/n6/extref/ejhg2014175x1.jpg

Though this paper defines Brittany as excluding some other parts of Western France that would be included in wiki's main article's definition: https://en.wikipedia.org/wiki/Brittany.

It might be interesting to know which of the samples in G25 are from which subregions of Brittany/NW France - I'd imagine the samples close to matching Welsh/Irish are probably from the westernmost (and most Breton) subregions.

But I would guess the actual subregions will probably be found somewhere deep in the latitude and longitude fields of the sample description file that accompanies the Human Origins panel.

Nezih Seven said...

I created a model with Global 25 mainly for the peoples of Anatolia, South Caucasus, Iran and Mesopotamia but it works well for Balkans, North Caucasus, some parts of Central Asia and Levant too. The article is in Turkish, but the images of the results are not:

https://nezihseven.wordpress.com/2019/07/17/antik-dna-analizi/

Alexandros said...

Quick question. How do you make 'CORRELATION OF ADMIXTURE POPULATIONS' appear at the end of the output?

From the screenshot above, it seems as if it is a default setting, but my nmonte3 analyses do not show this. I guess it is important for determining overfitting in the model.

Davidski said...

You'll see the 'CORRELATION OF ADMIXTURE POPULATIONS' at the end of the output in nMonte, but not nMonte3.

Alexandros said...

Great, thanks! I'll check it there.

ancient dna said...

Davidski, what's the meaning of the _o, _o1, _o2 in sample names? Thanks!

Davidski said...

The _o suffix stands for "outlier".

So, Sintashta_MLBA_o1 means Sintashta_MLBA_outlier1.
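
As a rough illustration of the naming convention described above, the following Python sketch splits a population label into its base name and its outlier tag. The function name `split_outlier_suffix` and the regular expression are assumptions made for this example only; they are not part of any Global25 tool, and other suffixes such as `low_res` are deliberately left untouched.

```python
import re

# Matches a trailing "_o" optionally followed by digits, e.g. "_o", "_o1", "_o2".
# This pattern is an assumption based on the labels mentioned in the thread.
_OUTLIER_RE = re.compile(r"^(?P<base>.+)_o(?P<num>\d*)$")


def split_outlier_suffix(label):
    """Return (base_label, is_outlier, outlier_number_or_None)."""
    match = _OUTLIER_RE.match(label)
    if match is None:
        # No outlier suffix: return the label unchanged.
        return label, False, None
    num = match.group("num")
    return match.group("base"), True, int(num) if num else None


print(split_outlier_suffix("Sintashta_MLBA_o1"))
print(split_outlier_suffix("Sintashta_MLBA"))
```

A helper like this could be used to filter outliers out of a datasheet before building a custom population average.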

Simon_W said...

Speaking of Switzerland, I just noticed that there are now averages for all three major Swiss ethnicities available in the Global 25. So I developed a model that should make sense for them all and checked how differently they score in that model.

First of all I noticed that the Celtic component appears to be more like French_South than like Hallstatt_Bylany:DA111, which does kind of make sense, because Switzerland lies southwest of Bohemia. But I didn't want to use the modern Southern French in my models, hence I decided to make my own average of French Bell Beakers, using all French Bell Beakers except the two from northern France. That worked pretty well, as you'll see below.

But then I also had to choose a proxy for the Roman admixture. I decided to use CL121 from Collegno, because he's from Italy, he's South Italian/Sicilian-like and he's without Longobard admixture.

So now for the models, first the French Swiss:

[1] "distance%=1.586"

Swiss_French

Bell_Beaker_FRA,52.8
DEU_MA,24.5
ITA_Collegno_MA:CL121,22.7
CZE_Hallstatt_Bylany:DA111,0

More than half of their ancestry is Celtic/Gaulish. But nearly 1/4 of their ancestry is Germanic. Probably rather Burgundian than Alemannic. A considerable Roman admixture is also apparent. No wonder they call themselves "Romands", i.e. Romans!

But now on to the German Swiss:

[1] "distance%=1.1823"

Swiss_German

DEU_MA,48.2
Bell_Beaker_FRA,33.7
ITA_Collegno_MA:CL121,18.1
CZE_Hallstatt_Bylany:DA111,0

They are nearly 50% Germanic, which makes sense, because they speak German. The rest is a Roman-admixed Gaulish substrate; the ratio Gaulish:Roman is very similar to the ratio in the French Swiss. Interestingly they also lack Hallstatt_Bylany, like the Romands.

And finally the Italian Swiss:

[1] "distance%=2.5274"

Swiss_Italian

ITA_Collegno_MA:CL121,56.6
Bell_Beaker_FRA,28.4
DEU_MA,15
CZE_Hallstatt_Bylany:DA111,0

Here CL121 has more than 50%, the Germanic admixture is low and presumably mostly from the Longobards. Gaulish-like ancestry comes second. This suggests once more how big the upheavals even at the northern fringe of Italy were during the Roman age. The Italian Swiss and the North Italians are not simply a continuation of the local Celts, but considerably Mediterraneanized and Romanized also on the genetic level.

Samuel Andrews said...

@Simon_W,

Yes, Switzerland was a missing piece in the G25 PCA. The Italian Swiss cluster with Tuscans, so they look like immigrants from central Italy? Like your grandpa was.

Maybe you would want to try modelling German & French Swiss with French_1. It's the main cluster in France.
FrenchCluster1 0.126831429 0.142174143 0.044338714 0.013981286 0.041458143 0.004661429 -0.002047857 0.002736143 0.011015143 0.022935857 -0.003270714 0.004774286 -0.009896571 -0.007824857 0.009151429 0.000473571 -0.002868429 0.001212429 0.001598143 0.000643 -0.000802143 0.001095286 -0.003556571 0.006506857 -0.000188286

zardos said...

@Simon and all:
Anyone tried something similar with German local populations?
Very interesting if future studies will prove the Roman impact, not just in Switzerland.

Simon_W said...

@zardos

Roman impact in Germany is very possible west of the Rhine and south of the Danube, the parts of Germany that belonged to the Roman empire for an extended period.

In fact, my maternal grandmother, whose ancestry is 3/4 from Swabia in southwestern Germany and 1/4 from Northwestern Switzerland, scores like this in my model:

[1] "distance%=1.1557"

maternal_grandmother

DEU_MA,53.4
ITA_Collegno_MA:CL121,22.9
Bell_Beaker_FRA,22.7
Hun_Tian_Shan,1
CZE_Hallstatt_Bylany:DA111,0

Very similar to the German Swiss in the amount of Germanic admixture, just particularly Italian in relation to the Celtic proportion. I suspect it's because of her ancestry from Biberach in Upper Swabia, south of the Danube; the relatives from that branch look rather exotic and southern.

Simon_W said...

@Samuel Andrews

I tried it, but apparently a value is missing in row 5:

FrenchCluster1 0.126831429 0.142174143 0.044338714 0.013981286 0.041458143 0.004661429 -0.002047857 0.002736143 0.011015143 0.022935857 -0.003270714 0.004774286 -0.009896571 -0.007824857 0.009151429 0.000473571 -0.002868429 0.001212429 0.001598143 0.000643 -0.000802143 0.001095286 -0.003556571 0.006506857 -0.000188286
PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15
FrenchCluster1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA
PC16 PC17 PC18 PC19 PC20 PC21 PC22 PC23 PC24 PC25
FrenchCluster1 NA NA NA NA NA NA NA NA NA NA
Error in check_formats(myData, myTarget) : Missing value in row 5
:-/

Simon_W said...

Pushing my model further, Bergamo looks similar to the Italian Swiss:

[1] "distance%=1.9968"

Italian_Bergamo

ITA_Collegno_MA:CL121,49.2
Bell_Beaker_FRA,40.5
DEU_MA,5.5
CZE_Hallstatt_Bylany:DA111,4.8

Simon_W said...

Does anyone know where the French_East come from? Judging from my model they could be German speaking Alsatians:

[1] "distance%=1.1655"

French_East

DEU_MA,49
Bell_Beaker_FRA,36.1
ITA_Collegno_MA:CL121,14.9
CZE_Hallstatt_Bylany:DA111,0

zardos said...

Thank you. If your model is correct, it would mean about 50% old German in the South with about one quarter Southern, possibly a large portion of it real Roman ancestry.
How about the Rhineland and the North?
Do you have early Slavs for comparison?

Simon_W said...

I don't have regional German samples, I'm not a collector of such things. And what do you mean by early Slavs for comparison? How they are mixed? Or how they are mixed into the Germans?

Simon_W said...

BTW, I said French_East could be German speaking Alsatians, although we all know they're predominantly French speaking by now. More correct would be the wording: They could be French speaking Alsatians who used to be German speaking until a few generations ago. At any rate this sample looks similar to the German Swiss, just a bit less Roman.

Simon_W said...

Oops, I just saw why I got that error when trying to model with the French cluster 1! I have to put it into comma-separated format. Wait a minute!
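
The fix described above, turning a whitespace-separated row into the comma-separated form that the datasheets use, can be sketched in a few lines of Python. The helper name `to_g25_csv` is hypothetical, and the check for exactly 25 coordinates is an assumption based on the Global25 having 25 columns; the row itself is the FrenchCluster1 row posted earlier in this thread.

```python
def to_g25_csv(row):
    """Turn 'Label v1 v2 ... v25' into 'Label,v1,v2,...,v25'."""
    fields = row.split()  # split on any run of whitespace
    label, values = fields[0], fields[1:]
    if len(values) != 25:
        raise ValueError(f"expected 25 coordinates, got {len(values)}")
    for value in values:
        float(value)  # raises ValueError if a field is not numeric
    return ",".join([label] + values)


row = (
    "FrenchCluster1 0.126831429 0.142174143 0.044338714 0.013981286 "
    "0.041458143 0.004661429 -0.002047857 0.002736143 0.011015143 "
    "0.022935857 -0.003270714 0.004774286 -0.009896571 -0.007824857 "
    "0.009151429 0.000473571 -0.002868429 0.001212429 0.001598143 "
    "0.000643 -0.000802143 0.001095286 -0.003556571 0.006506857 -0.000188286"
)
csv_row = to_g25_csv(row)
print(csv_row)
```

Validating the column count before running a model catches the kind of "Missing value" error shown above at the point where the row is pasted, rather than inside the modelling script.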

Simon_W said...

[1] "distance%=1.4474"

Swiss_French

FrenchCluster1,46.7
Bell_Beaker_FRA,32.1
ITA_Collegno_MA:CL121,11.9
DEU_MA,9.3
CZE_Hallstatt_Bylany:DA111,0

[1] "distance%=1.0016"

Swiss_German

FrenchCluster1,45.1
DEU_MA,33.5
Bell_Beaker_FRA,13.8
ITA_Collegno_MA:CL121,7.6
CZE_Hallstatt_Bylany:DA111,0

[1] "distance%=2.3308"

Swiss_Italian

FrenchCluster1,53.8
ITA_Collegno_MA:CL121,44.2
Bell_Beaker_FRA,2
CZE_Hallstatt_Bylany:DA111,0
DEU_MA,0

Looks like all Swiss ethnic groups alike can be modelled as roughly 50% of the French cluster 1. However, I don't think this is a useful model as long as we've got decent ancient samples at hand, because the French cluster 1 is a modern cluster of mixed origin, hence it hides the ancient origins rather than uncovering them.

zardos said...

I meant how much early Slavic influence can be seen in German subpopulations.

Simon_W said...

@zardos

The Slavic admixture in the East German subpopulation available in the Global25 sheet seems considerable:

[1] "distance%=1.7412"

German_East

CZE_Early_Slav,44.6
DEU_MA,40.5
CZE_Hallstatt_Bylany:DA111,14.9
Bell_Beaker_FRA,0
ITA_Collegno_MA:CL121,0

However, if I apply the same model on the other non-Eastern German sample, I obtain an overfit:

[1] "distance%=1.0809"

German

DEU_MA,56
CZE_Early_Slav,20.5
Bell_Beaker_FRA,20.1
ITA_Collegno_MA:CL121,3.4
CZE_Hallstatt_Bylany:DA111,0

The French Beakers and the early Bohemian Slavs are abused here to adjust the coords as closely to the German sample as possible, even though historically speaking Slavic admixture west of the Elbe and Saale must be very scant and limited to a few small areas. The fit of the model is too good.

I can't deal with this otherwise than by deleting the early Slavs from the model, which results in

[1] "distance%=1.5133"

German

DEU_MA,75.9
Bell_Beaker_FRA,20.1
ITA_Collegno_MA:CL121,4
CZE_Hallstatt_Bylany:DA111,0

Judging from this, non-Eastern Germans are predominantly Germanic, with some (regionally varying) Celtic infusion.
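
To make the "distance%" figures in these models more concrete, here is a minimal Python sketch. It assumes that the reported distance is the Euclidean distance between the target and the weighted average of the source coordinates, expressed as a percentage; that is an assumption about the output, not a description of nMonte's internals. The function name `mixture_distance`, the source labels and the three-dimensional coordinates are all hypothetical and chosen only for illustration.

```python
import math


def mixture_distance(target, sources, weights):
    """Euclidean distance between the target and a weighted mix of sources.

    `weights` maps source labels to percentages that must sum to 100.
    """
    if abs(sum(weights.values()) - 100.0) > 1e-6:
        raise ValueError("weights must sum to 100")
    dims = len(target)
    mix = [0.0] * dims
    for name, pct in weights.items():
        for i in range(dims):
            mix[i] += sources[name][i] * pct / 100.0
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))


# Hypothetical 3-dimensional coordinates, for illustration only.
sources = {
    "SourceA": [0.10, 0.00, 0.00],
    "SourceB": [0.00, 0.10, 0.00],
}
target = [0.05, 0.05, 0.01]

d = mixture_distance(target, sources, {"SourceA": 50, "SourceB": 50})
print(f"distance%={100 * d:.4f}")
```

Because adding another source can only keep the best achievable distance the same or lower it, a smaller distance alone does not show that the extra source is historically meaningful, which is the overfitting concern raised above.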

MasterOfAnimals said...

David,
Please add the Copts of Egypt.
Thanks

WesternPonticSteppe said...

Why aren't some paleo samples in the G25 dataset (Oase1, Satsurblia, KremsWA3, Ostuni1)?

Davidski said...

They're too old, heavily damaged, and/or they lack enough data.

You can't really analyze deep ancestry that's far out of the range of modern humans with this sort of methodology.

Puree said...

Would it be possible to include the date in the names of your updated datasheets so that users may know if they already have obtained the update?

Puree said...

@Davidski Your post of Dec 15, 2019 raises a question in my mind: how many SNPs are enough to consider a sample sufficient for G25-style use? On this point, could you please explain the terms 'coverage' and 'endogenous' when used to describe ancient samples? If this is answered elsewhere I haven't yet found the place....

Davidski said...

The Global25 is based on ~300,000 SNPs. I generally only run samples that have at least 15% of these SNPs, or roughly 45,000 markers.

Lower coverage samples, in other words those with fewer markers, aren't included, or sometimes they are but they're marked with the "low_res" suffix.

CrM said...

Do the "Ossetian" samples represent South Ossetians?

Davidski said...

@AuckeS

Don't know. See here...

https://www.nature.com/articles/s41559-019-0878-2

CrM said...

@Davidski

Thanks. One more question, do the Georgian samples come from the same study?

Davidski said...

You can probably track them down via their individual IDs.

Unknown said...

@Davidski

I've always had this question: is it better to use the pop average spreadsheets or the full datasheets?

Davidski said...

In theory, population averages are more robust than singleton results.

However, in reality many of the population averages aren't representative enough to be useful, especially when it comes to large countries with significant genetic substructures.

So the best thing to do in many cases is to create your own population averages from the most relevant samples.
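
As a sketch of what creating your own population average could look like, the Python snippet below averages the coordinate columns of a hand-picked set of samples. It assumes datasheet rows of the form `Population:ID,coord1,coord2,...`; the function name `average_rows`, the sample IDs and the three-column toy data are hypothetical, and a real datasheet row would carry 25 coordinates.

```python
import csv
import io


def average_rows(csv_text, wanted_ids, new_label):
    """Average the coordinate columns of the selected samples."""
    rows = [
        r for r in csv.reader(io.StringIO(csv_text))
        if r and r[0] in wanted_ids  # skip blank lines and unselected samples
    ]
    if not rows:
        raise ValueError("no matching samples")
    n_dims = len(rows[0]) - 1
    means = [
        sum(float(r[i + 1]) for r in rows) / len(rows)
        for i in range(n_dims)
    ]
    return new_label, means


# Toy datasheet with three coordinates per sample, for illustration only.
sheet = """Pop_A:s1,0.10,0.20,0.30
Pop_A:s2,0.30,0.40,0.50
Pop_B:s3,0.90,0.90,0.90
"""

label, means = average_rows(sheet, {"Pop_A:s1", "Pop_A:s2"}, "Pop_A_custom")
print(label + "," + ",".join(f"{m:.6f}" for m in means))
```

Selecting samples by individual code rather than by population label keeps the choice explicit, which matters when a population label covers a genetically diverse group.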

Unknown said...

How will I be able to get my G25 coordinates? Do I have to email you?

Samuel Andrews said...

There are two Armenian pops in the G25 PCA: Armenian_Hemsheni and Armenian. The latter is distinguished by a large dose of Levant ancestry not present in the former. Is "Armenian" a diaspora population living in the Levant?

E-Smoove said...

I'm new to this. Can anybody tell me what the difference is between our two sets of coordinates, the scaled one and the regular one?

Which one should I use as the target when doing comparisons?

Davidski said...

@E-Smoove

Scaled coordinates are optimized to correlate with analyses straight from raw data, that's why, for instance, it's possible to create PCA plots with them that look like PCA plots from genetics papers, or even better.

But the Global25 is all about experimentation, so why not experiment with both sets of coordinates to see how they line up with your known ancestry?

However, don't ever run your scaled coordinates with the non-scaled reference coordinates, or your non-scaled coordinates with the scaled reference coordinates, because you'll get nonsensical results.

E-Smoove said...

What's up?
Do you know where I could find the population abbreviations?

For instance, which populations do ZAF_400BP, SWE_LN_low_res and IRN_Ganj_Dareh_N represent?

Davidski said...

@E-Smoove

From the blog post above...

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.

So if you'd like to know more details about the samples try searching for their individual codes via a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.

https://reich.hms.harvard.edu/downloadable-genotypes-worlds-published-ancient-dna-data


Anonymous said...

@Davidski

are you still accepting g25 coordinate requests?
I would love to purchase mine.

Davidski said...

@Frequency

Yes I am. You'll find the instructions here...

https://eurogenes.blogspot.com/2017/10/genetic-ancestry-online-store-to-be.html

Alfred said...

@Davidski are you still accepting G25 requests? I’m new and would like to purchase mine, thanks

Davidski said...

Starting up again on February 1.

Alfred said...

Thank you for your prompt reply @davidski looking forward

ancient dna said...

Hi David - I assume you're still going to accept G25 requests starting tomorrow? I can pay and send money today. Do you have any preference between a MyHeritage and a LivingDNA file?

Thanks,

Mathew.

Davidski said...

MyHeritage files are better than LivingDNA for this.

Alfred said...

@Davidski can I send you the request for the G25 today by email? Thanks in advance

Anonymous said...

@Davidski I know this will sound very paranoid, but I wanted to ask what you do with the raw data of someone after giving them the coordinates?
Do you just delete it? What about their coordinates? I am really interested in G25, but also really concerned about my privacy.
Also, how good is 23andme V5 data for this, compared to other sorts of raw data?

I would greatly appreciate it if you could answer my questions.

Davidski said...

I delete the data and results.

23andMe V5 works OK.

Anonymous said...

Thanks a lot for the quick response.

Josh Grammenos said...

Good evening Mr. Wesolowski,
I’ve used your services before for me and friends of mine and I’m really satisfied with the result. I greatly appreciate your good work and your Global 25 calculator.
I know you take your modern average references from academic studies, and I found the following academic study, which has really interesting subgroups and subpopulations.
It could probably be useful for you and your calculator; if you contact the people who made it, they could provide you with the samples they used so you can enlarge your database.
Keep up the good work, have a good day.

https://www.nature.com/articles/s41598-017-01802-4

Maestro said...

Hi, Mr. Davidski. First, I want to say I appreciate the toolkit setup you have provided an amateur like me to explore stuff on G25. But to my main question, are you going to upload the coordinates of the Christian Nubian samples from the newly published Kulubnarti paper?

Davidski said...

Sure, do you have a link to the genotype data for these Nubian ancients?

Ayhan said...

I took a DNA test at FTDNA (myOrigins). I would like to see my results in the G25. I would be glad if you could help me.

Davidski said...

Please email me to discuss this further.

zeza said...

Hello sir

One of my friends told me to convert my raw DNA files to G25,
so I searched on the web and I came across a website called Dnagenics.
I ended up paying 8 euros to do G25, but instead they gave me a broken link to get my coordinates.

PLEASE HELP ME

I really want to do this, I am not quite satisfied with my results

Davidski said...

@zeza

I'm sorry to hear about your predicament, but you need to pay more attention to what you're paying for online.

If you email me in July, we can discuss this further.

eurogenesblog@gmail.com

zeza said...

@Davidski

Sure thing, will email in July.

Thank you

Aaa said...

Why is Turkish Trabzon closer to Armenian than to Georgian Laz when Trabzon formed a part of Colchis and its people are known as Hellenized Laz? Armenians never inhabited Trabzon either. Something similar is present in the mdlkp 23b calculator, where Armenian is much closer to Artvin Laz than Georgian Laz is. No need to mention that the Laz-speaking people from Artvin live literally right next to Georgian Lazes. So, considering these, would it be right to assume that there is a need for a better calculator for Caucasus populations?

Onur Dincer said...

@Aaa

Why is Turkish Trabzon closer to Armenian than to Georgian Laz when Trabzon formed a part of Colchis and its people are known as Hellenized Laz? Armenians never inhabited Trabzon either. Something similar is present in the mdlkp 23b calculator, where Armenian is much closer to Artvin Laz than Georgian Laz is. No need to mention that the Laz-speaking people from Artvin live literally right next to Georgian Lazes. So, considering these, would it be right to assume that there is a need for a better calculator for Caucasus populations?

Greek Trabzon and Turkish Trabzon, who are basically the same population genetically speaking, are indeed genetically closer to Armenians than to Laz, so they are not basically Hellenized Laz, they are a mixture of ancient Anatolian, Armenian Highlander and Colchian populations, not to mention whatever mix they have from the Greek colonists. Laz, on the other hand, are genetically between Western Georgians and Trabzon people, normal given their geography.

throne said...

Hey David, I was wondering if a contaminated sample should be avoided when trying to find an accurate fit for a target population.

Davidski said...

If possible, try not to use contaminated samples for anything.

Cynthia=== said...

Are we still able to order Global25 kits? I took the Ancestry kit but I really want to learn more about everything.

Chemx said...

I've seen there's a website offering G25 coordinates called "Illustrativedna" and I've read they work with you... Is that so? Do you know if they're legit?

Davidski said...

Yes, they're legit.

CrM said...

David, could you add the Georgian samples from the Behar et al. 2010 paper?
The raw data can be found here: https://evolbio.ut.ee/jew/

Sadly the samples aren't regionally categorized, they seem to be very diverse judging by their K12 models, so they should be from all over Georgia.

Ramber said...

@Davidski

I want to ask something. G25 distance runs show many Uralics such as Mari, Udmurt, Saami, etc. to be genetically closer to many Turkics/Central Asians than to most Euros. Is this true?

Distance to: Mari
0.11547308 Bashkir
0.13857526 Tatar_Siberian
0.14622919 Tatar_Siberian_Zabolotniye
0.15879922 Mansi
0.16609517 Turkmen
0.17229907 Uzbek
0.17348576 Khanty
0.17705239 Finnish_East
0.17783813 Nogai
0.18422090 Iran_Turkmen
0.19255765 Finnish
0.19535284 Tajik_Shugnan
0.19586713 Hazara_Afghanistan
0.19923069 Uygur
0.19996889 Karakalpak
0.20260554 Tajik_Ishkashim
0.20344348 Hazara
0.21081083 Russian_Tver
0.21182997 Tubalar
0.21232139 Tajik_Yagnobi
0.21763448 Estonian
0.22557314 Russian_Smolensk
0.22981453 Polish
0.23081106 Hungarian
0.23128921 Latvian
0.23172467 Swedish
0.23796045 Icelandic
0.23996963 Bosnian
0.24026907 German
0.24118352 Irish
0.24241418 English
0.24252065 Romanian
0.26465765 Italian_Piedmont

Distance to: Udmurt
0.09797134 Bashkir
0.12750545 Finnish_East
0.13114207 Turkmen
0.13310796 Tatar_Siberian
0.14089777 Tajik_Shugnan
0.14179315 Finnish
0.14393915 Iran_Turkmen
0.15093452 Uzbek
0.15098617 Tajik_Ishkashim
0.15155844 Tatar_Siberian_Zabolotniye
0.15961422 Tajik_Yagnobi
0.15979505 Russian_Tver
0.17017327 Estonian
0.17152629 Mansi
0.17241783 Nogai
0.17796271 Russian_Smolensk
0.18020743 Swedish
0.18059071 Polish
0.18103482 Hungarian
0.18369906 Latvian
0.18382164 Hazara_Afghanistan
0.18579149 Khanty
0.18647838 Icelandic
0.18890984 Uygur
0.18894553 Irish
0.19010287 German
0.19029509 Bosnian
0.19164537 English
0.19445041 Hazara
0.19510247 Romanian
0.19868197 Karakalpak
0.21214030 Tubalar
0.22032703 Italian_Piedmont

Distance to: Saami
0.11182593 Bashkir
0.12036459 Finnish_East
0.13826217 Finnish
0.14216068 Tatar_Siberian
0.15607780 Tatar_Siberian_Zabolotniye
0.16075159 Russian_Tver
0.16336793 Turkmen
0.16632238 Estonian
0.17367742 Uzbek
0.17400811 Mansi
0.17827602 Tajik_Shugnan
0.17868672 Iran_Turkmen
0.17931411 Latvian
0.17952044 Russian_Smolensk
0.18157993 Nogai
0.18279075 Swedish
0.18381285 Polish
0.18837516 Tajik_Ishkashim
0.18848438 Khanty
0.18949914 Hungarian
0.19039592 Icelandic
0.19534091 Irish
0.19658845 Tajik_Yagnobi
0.19719818 German
0.19782542 English
0.20117721 Bosnian
0.20194140 Hazara_Afghanistan
0.20432707 Uygur
0.20636918 Karakalpak
0.20953171 Romanian
0.21102027 Hazara
0.21653079 Tubalar
0.23687648 Italian_Piedmont

Most East Eurasian-shifted Udmurt individual
Distance to: Udmurt:udmurd8
0.08187760 Bashkir
0.11499101 Tatar_Siberian
0.12768661 Turkmen
0.13030726 Tatar_Siberian_Zabolotniye
0.14008865 Uzbek
0.14503560 Iran_Turkmen
0.15002433 Finnish_East
0.15051991 Mansi
0.15082985 Tajik_Shugnan
0.15733351 Nogai
0.15932898 Tajik_Ishkashim
0.16412459 Finnish
0.16433979 Khanty
0.16977493 Hazara_Afghanistan
0.17150233 Tajik_Yagnobi
0.17470455 Uygur
0.17977632 Hazara
0.18225833 Karakalpak
0.18263746 Russian_Tver
0.19334510 Estonian
0.19378021 Tubalar
0.20046634 Russian_Smolensk
0.20108449 Swedish
0.20161450 Hungarian
0.20302204 Polish
0.20680134 Icelandic
0.20715504 Latvian
0.20848813 Irish
0.21031218 German
0.21038218 Bosnian
0.21127078 English
0.21365826 Romanian
0.23643305 Italian_Piedmont

Just wondering if these Finno-Ugrics are really genetically closer to many Turkics/Central Asians than to most Euros?
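
For readers who want to reproduce this kind of list, the Python sketch below ranks reference populations by their distance to a target. It assumes the figures above are plain Euclidean distances across the coordinate columns, which is an assumption about the tool that produced them; the function name `rank_by_distance`, the labels and the two-dimensional toy coordinates are hypothetical.

```python
import math


def rank_by_distance(target, references):
    """Return (distance, label) pairs sorted from closest to farthest."""
    return sorted(
        (math.dist(target, coords), label)
        for label, coords in references.items()
    )


# Toy 2-dimensional coordinates, for illustration only.
target = [0.0, 0.0]
references = {
    "Ref_far": [0.3, 0.4],
    "Ref_near": [0.03, 0.04],
}

ranking = rank_by_distance(target, references)
for distance, label in ranking:
    print(f"{distance:.8f} {label}")
```

A ranking like this measures overall similarity in the coordinate space; on its own it does not say which ancestry components are responsible for two populations sitting close together.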

CHG Chad said...

David, when will the store be open?

zeza said...

Hi
Please check your email. How can I do the G25 test?
David, please help me with this.
Thanks

Davidski said...

I'll reply to all emails asking for the G25 before the end of this month.

Davidski said...

@All

The Global25 is available again.

https://eurogenes.blogspot.com/2017/10/genetic-ancestry-online-store-to-be.html

Please email me on eurogenesblog@gmail.com

CHG Chad said...

For some reason I can't donate to you on PayPal. It doesn't allow me to send you money in USD.

Davidski said...

Yep, I don't use PayPal, so people have to email me to get the details about how to pay.

Puree said...

To G25 Modern Users: I am seeking to identify the gender of a group of samples in the DB. I have tried searching the sample names as they appear on the samples, but am not getting any results. I have done this many times with the Ancient samples. This is my first attempt with Modern samples. Is there a way of locating some metadata on the Modern samples (such as Gender, location, etc)? Thanks.

Davidski said...

@Puree

It's possible to find metadata on many of the modern samples by putting their individual codes into Google.

However, many of the modern samples are sourced from rather obscure collections and their details aren't available online or anywhere.

Puree said...

I've noticed that most scientific studies use components 1 and 2 for their scatter charts. In an array of 25 components there are many other pairs, and some pairs can produce vastly different scatter charts. This fact raises various questions in my mind: Is there a valid scientific reason why most researchers use 1 and 2? Is there any informational value a researcher can get by using other pairs of components in their modeling? Does a Vahaduo modeling examine all the pairs, or only some, or one? The kernel of my question is this: if I look at a scatter chart which plots, say, component 6 and component 20, what aspect of the underlying autosomal data am I looking at versus a component 1 and 2 chart? Are there any good reference guides available to me on this question, in the context of autosomal data? Thanks.

Davidski said...

The reason that Principal Components 1 and 2 are usually used to build scatter plots is because they carry most of the variation.

But even though they carry most variation, they only carry fractions of total variation.

So sometimes it's useful to look at the other significant PCs, like 3, 4 or 5, because they can reveal relationships that aren't obvious or even shown by plotting PCs 1 and 2.

And if you're modeling ancestry, then obviously you need more than two PCs to do that accurately.
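
The point about PCs 1 and 2 carrying most, but not all, of the variation can be illustrated with a small Python sketch that computes the share of total variance found in each coordinate column. It assumes coordinates in which the spread of each column reflects the weight of that component; the function name `variance_share` and the four-column toy data are hypothetical and are not taken from the Global25 datasheets.

```python
from statistics import pvariance


def variance_share(samples):
    """Fraction of total variance carried by each coordinate column."""
    columns = list(zip(*samples))  # transpose rows into columns
    variances = [pvariance(col) for col in columns]
    total = sum(variances)
    return [v / total for v in variances]


# Toy data: four samples with four components each, for illustration only.
samples = [
    [0.10, 0.05, 0.01, 0.00],
    [-0.10, 0.05, -0.01, 0.01],
    [0.10, -0.05, -0.01, 0.00],
    [-0.10, -0.05, 0.01, -0.01],
]

shares = variance_share(samples)
for i, share in enumerate(shares, start=1):
    print(f"PC{i}: {share:.1%}")
```

In this toy example the first two columns dominate, yet the remaining columns still separate samples that coincide on the first two, which is why a model uses more than two components.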

CeRcVa said...

In general, Georgians look like Europeans (Balkan-southern).

But Armenians and Georgians differ in both culture and appearance. We are not as alike as Wales and England, or European countries in general.

Gedrosia said...

IRN_Seh_Gabi_LN,0.05122,0.081242,-0.153488,-0.021964,-0.101865,0.006414,0.014336,0.003231,-0.068925,-0.045559,0,0.00045,0.012636,-0.008533,0.02348,0.059665,0.001956,0.008615,0.008296,-0.034266,0.012478,-0.020897,-0.00456,-0.024823,0.01916

IRN_Tepe_Abdul_Hosein_N,0.0421147,0.0639783,-0.1541163,0.0010767,-0.119407,0.0212883,0.0104187,-0.0023077,-0.0809913,-0.057283,-0.002436,-0.002148,0.0040137,-0.0096337,0.0314873,0.0555107,-0.0067363,0.0061233,0.0131143,-0.0351837,0.0062807,-0.0250607,-0.0085453,-0.0374753,0.019958

What are the object IDs for the samples above? For example, for Seh Gabi, is it "I1671" or "I1674", etc.?

Davidski said...

https://drive.google.com/open?id=1UrhcfNMLW0oMXIbHGUE60v2taCM7PFw1

Gedrosia said...

@Davidski, many thanks for providing the original reference sources. I asked because many of the ancient reference calculators on Vahaduo don’t list these “G25 coordinates” together with the “object IDs”, so it was confusing to use them for rigorous study purposes…

Simon_W said...

The "reduce" function on vahaduo is a very useful improvement. Previously we either had to pre-select a small set of samples to be used, which is prone to human error and prejudice, or live with an insanely overfitted, nonsensical model.

I'll post some models of my close ancestors and myself, which I obtained using the "reduce" function.
First of all, I removed all samples older than the Bronze Age, because I wasn't caring about the deep ancestry, but about the more recent one.

My East Prussian German grandmother, fully from the northern Ermland/northern Warmia:

Distance: 2.1428% / R4P

30.8 VK2020_Scotland_Orkney_VA
29.6 VK2020_UKR_Shestovitsa_VA
26.8 Baltic_LTU_Late_Antiquity_low_res
12.8 ITA_Boville_Ernica_IA

I had to remove VK2020_SWE_Oland_IA for this model, because it tended to mask the Balto-Slavic admixture. Apparently this pop is somewhat Balto-Slavic shifted. Thus, my East Prussian grandmother seems to be about 50% Balto-Slavic, with Baltic and Slavic admixture in nearly equal quantities. Boville Ernica can't mean real Italian admixture; instead it must represent some ancient Celtic ancestry that is in between the northwest Europeans and Boville Ernica.

My father, with one half from the above East Prussian grandmother, and the other half mostly from the Hotzenwald region near the southwestern border of Germany, with some minor northwestern Swiss and central Black Forest admixture:

Distance: 1.8840% / R3P

68.0 VK2020_Faroes_EM
26.8 VK2020_POL_Sandomierz_VA
5.2 Levant_Yehud_IBA

The Balto-Slavic admixture of his mother is halved here, as expected, and tilted towards the Slavic side. The Levant_Yehud admixture is surprising.

My maternal grandmother, of Lower Swabian, Upper Swabian, Alsatian and northwestern Swiss extraction, 1/4 each:

Distance: 0.5685% / R4P

37.4 Scotland_LBA
25.2 VK2020_ITA_Foggia_MA
21.8 ISL_Viking_Age_Early_Christian
15.6 ITA_Boville_Ernica_IA

She definitely seems to have some ancient central-southern Italian admixture, quite a substantial amount, judging from that Foggia_MA. Again there is some Boville Ernica, which I suspect reflects some unsampled Celtic population rather than ancient Italic admixture.



Simon_W said...

Last, but not least, a model for my own ancestry, which is composed of the above ancestors + 1/4 North Italian ancestry from the province of Forli-Cesena:

Distance: 1.2714% / R5P

44.8 VK2020_Faroes_EM
20.4 FRA_Hauts_De_France_IA2
16.6 ITA_Etruscan_Campiglia
10.8 Levant_Yehud_IBA
7.4 Baltic_LTU_Late_Antiquity_low_res

Again I had to remove VK2020_SWE_Oland_IA and also SVK_Poprad_MA in order for my Balto-Slavic admixture not to be masked. It's back on the Baltic side. Otherwise, the Gaulish Celtic ancestry is quite strong in this model, and it is hardly from my Italian grandfather (from whom I've already got the Etruscan and most of the Levantine admixture). This confirms what I said above: the Boville Ernica in the above models exerts a southwards pull reflecting some unsampled Celtic populations.

For comparison, this is what I obtained using modern pops only:

Distance: 1.5030% / R5P

43.0 English_Cornwall
20.4 Spanish_Pais_Vasco
17.0 German_East
11.2 Samaritan
8.4 Polish_Kashubian

Quite similar overall.

a Gorilla said...

Will there ever be a Turk Cypriot G25 average?

Alexandros said...

@Varkoume Ipervolika

I do not believe there are any published Turkish Cypriot autosomal DNA samples. Unless David has had samples sent to him directly, this does not seem possible.

Anyhow, based on Y-DNA evidence (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0179474), Turkish Cypriots are simply a more (recently) admixed version of Greek Cypriots, therefore I would not consider them to be of much relevance as a reference population.

If you would like to discuss further Cypriot population genetics, you can contact me directly.

Alexandros said...

@Varkoume Ipervolika
@Why are you gae
You can contact me at cypriotgenes@gmail.com
Happy to discuss your Global 25 models or anything else related.

Davidski said...

@All

Please note that the G25 coords won't be available until July 2022.

Updates here...

https://bga101.blogspot.com/2021/08/genetic-ancestry-online-store.html

a Gorilla said...

@Davidski the Cypriot samples have Maronite Cypriots who have more Neolithic Levant admix than actual Lebanese people. They are only a small minority of around 5,000 people. Why is it not separated into Greek Cypriot and Maronite Cypriot? It makes no sense to use Maronite Cypriot to represent Greek Cypriots. People are using that Cypriot average and make misleading assumptions.

discreetmaverick said...

Hi,

Can I ask: is the sample labelled ROU_Glavanesti_o1 the one that was found to be under R1a-Z93 or Z94? Thanks.

Mar said...

Are you doing G25 coordinates now that it's July?

Davidski said...

Yes

Ethan Matthews said...

Hello Davidski,

I was wondering if you could add the scaled G25 coordinates for sample R3481? https://www.biorxiv.org/content/10.1101/2022.05.15.491973v1.full

Fazan2022 said...

Hello Mr. Davidski,
Could you please tell me how I can get G25 coordinates for the ancient Iraq samples?
I have Nermik9 PPN
Nermik9 LBA
But I need the other new ancient samples from northern Iraq.
Samples:
J1
J2
G
Thanks

Davidski said...

There's not enough data in the samples that are missing from the G25 datasheets to run them.

We have to wait for new samples from Iraq, or wait for the current samples to be sequenced again.

Simon_W said...

Urbino_Bivio works remarkably well for me:

Target: Simon_W
Distance: 1.7649% / 0.01764923

55.6 father
25.2 maternal grandmother
19.2 ITA_Urbino_Bivio_Imperial

As I said before, the grandparents of my maternal grandfather were from Cesena and surrounds, which is not far from Urbino. Obviously in this model my other relatives compensate for the lack of Medieval northern admixture in these ancient Roman-age samples from Urbino. Since my IBD sharing with my maternal grandmother is 24.3%, my maternal grandfather accounts for 50 - 24.3 = 25.7%, so there must be 25.7 - 19.2 = 6.5% northern admixture from my Italian ancestors. That's little compared to other north/central Italian regions.
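
The arithmetic in the paragraph above can be retraced as a quick check. This is only a sketch, assuming the maternal side totals 50% so that the maternal grandfather's share is 50% minus the measured grandmother share; the variable names are mine.

```python
# Figures taken from the comment; the 50% maternal total is an assumption.
grandmother_share = 24.3   # % IBD sharing with the maternal grandmother
urbino_share = 19.2        # % assigned to ITA_Urbino_Bivio_Imperial in the model

grandfather_share = 50.0 - grandmother_share   # remaining maternal share
unexplained = grandfather_share - urbino_share # share not covered by Urbino_Bivio

print(round(grandfather_share, 1), round(unexplained, 1))  # prints: 25.7 6.5
```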

Also, it seems to follow that all the previous models (ADMIXTURE and G25) that seemed to suggest that I have an unusual Italian ancestry with strangely low Caucasus/Anatolia and elevated Near Eastern components, were wrong, because the samples from Urbino_Bivio are not like this.

Maestro said...

Hi, Davidski. I'm wondering if you can get the coordinates for the Kadruka sample from the new Scientific Reports article:

https://www.nature.com/articles/s41598-022-25384-y

Would be an awesome addition.

Davidski said...

Do you have the link to the genotype data?

Maestro said...

I wouldn't know where to check, to be honest. Is it not available or something?

I sent a message to one of the authors. Hopefully I will receive some positive information, if any at all.

Maestro said...

No response.

This exists, but I don't know if there's anything useful there:

https://www.ebi.ac.uk/ena/browser/view/PRJEB53198?show=reads

Aram said...

Is the Dnagenics G25 compatible with your Eurogenes G25?

https://www.dnagenics.com/services/G25Coordinates#:~:text=The%20G25%20coordinates%20are%20a,relation%20to%20other%20world%20populations.

Davidski said...

@Aram

Nope.

They try to simulate the G25 coords. So it's a different analysis.

marinella said...

Where can I find the G25 coordinates for Byzantine_Mugla_Stratonikeia_1000-1200AD?

Davidski said...

The G25 coords for the samples from the new Olalde paper are here.

https://drive.google.com/file/d/1CmELrecqqmDDEzoRilqq6bRM1NJXaLdq/view?usp=sharing

But only the samples that have enough data for the G25 are listed.

I can't run the rest, because there's nothing to run.

Genes of the Ancients said...

I made a post on my blog about the new Balkan study:
"Attempting to improve the qpAdm models from the new study: A genetic history of the Balkans from Roman frontier to Slavic migrations"
https://genes-of-the-ancients.blogspot.com/2023/12/attempting-to-improve-qpadm-models-from.html

antifa said...

Wang 2023 and Yu 2023 seem to be the same file. Can you fix it?

Jalisciense said...

@Davidski

This is my first time asking and doing all this, so I don't know if this is enough for you to make the G25 coordinates for these samples, please:


*Study: 16th century sub-Saharan African origin for all three individuals

Study data: https://www.ebi.ac.uk/ena/browser/view/PRJEB37490



*Study: Aridoamerica and Mesoamerica of pre-Hispanic civilizations thrived between 2,500 BCE and 1,521 CE

Study data: https://www.ebi.ac.uk/ena/browser/view/PRJEB51440



*Study: 40 ancient northern Mexicans dating to 7400-200 years before present (BP).

Study data: https://www.ebi.ac.uk/ena/browser/view/PRJEB66319