Wednesday, May 23, 2018

Global25 workshop 2: intra-European variation


Even though the Global25 focuses on world-wide human genetic diversity, it can also reveal a lot of information about genetic substructures within continental regions.

Several of the dimensions, for instance, reflect Balto-Slavic-specific genetic drift. I ensured that this would be the case by running a lot of Slavic groups in the analysis. A useful by-product of this strategy is that the Global25 is very good at exposing relatively recent intra-European genetic variation.

To see this for yourself, download the datasheet below and plug it into the PAST program, which is freely available here. Then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components.

G25_Europe_scaled.dat

You should end up with the plot below. Note that to see the group labels and outlines, you need to tick the appropriate boxes in the panel to the right of the image. To improve the experience, it might also be useful to color-code different parts of Europe, and you can do that by choosing Edit > Row colors/symbols. Of course, if you have Global25 coordinates you can add yourself to the datasheet to see where you plot.
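If you'd rather script this than click through PAST, the same kind of analysis can be sketched in Python with NumPy. This is only a rough sketch under a few assumptions: that the datasheet is comma-separated with a label in the first column followed by the coordinates, and that a plain covariance-based PCA (no per-column standardization) is what you want. The helper names `load_g25` and `pca` are made up for this example, and the sample rows are invented, not real Global25 data.

```python
import csv
import io

import numpy as np


def load_g25(text):
    """Parse rows of the assumed form: label, coord1, coord2, ..."""
    labels, rows = [], []
    for rec in csv.reader(io.StringIO(text)):
        if len(rec) < 2:
            continue  # skip blank or label-only lines
        try:
            values = [float(v) for v in rec[1:]]
        except ValueError:
            continue  # skip a header row, if the file has one
        labels.append(rec[0])
        rows.append(values)
    return labels, np.array(rows)


def pca(X):
    """Covariance-based PCA via SVD of the column-centered data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S              # coordinates of each row on the new axes
    explained = S**2 / np.sum(S**2)
    return scores, Vt, explained


# Tiny invented sample in the assumed layout (not real Global25 data).
sample = """Pop_A:ind1,0.12,0.10,0.01
Pop_A:ind2,0.11,0.12,0.00
Pop_B:ind1,0.02,0.14,0.03
Pop_B:ind2,0.03,0.13,0.05
"""

labels, X = load_g25(sample)
scores, loadings, explained = pca(X)
for label, row in zip(labels, scores):
    print(f"{label}: PC1={row[0]:+.4f} PC2={row[1]:+.4f}")
```

To look at other component pairs, pick different columns of `scores`; for example, `scores[:, 0]` against `scores[:, 2]` gives components 1 and 3. Keep in mind that the sign of each component is arbitrary, so a scripted plot may come out mirrored relative to the one from PAST.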


Components 1 and 2 pack the most information and, more or less, recapitulate the geographic structure of Europe. However, many details can only be seen by plotting the less significant components. For instance, a plot of components 1 and 3 almost perfectly separates Northeastern Europe into two distinct clusters made up of the speakers of Indo-European and Finno-Ugric languages.


This plot might also be useful for exploring potential Jewish ancestry, because Ashkenazi, Italian and Sephardi Jews appear to be relatively distinct in this space. Thus, people with significant European Jewish ancestry will "pull" towards the lower left corner of the plot. For example, someone who is half Ashkenazi and half German will probably land in the empty space between the Northwest Europeans and Jews.
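As a rough illustration of why a mixed individual lands in between, here's a small sketch that estimates the expected position of a two-way mix as a weighted average of two group averages. It assumes that coordinates combine approximately linearly with ancestry proportions, which is a simplification, and the numbers and the function name `expected_mix` are invented for the example.

```python
import numpy as np


def expected_mix(coords_a, coords_b, share_a=0.5):
    """Weighted average of two coordinate vectors (assumed linear mixing)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    return share_a * a + (1.0 - share_a) * b


# Invented group averages on three components (not real Global25 values).
group_a = [0.10, -0.02, 0.04]
group_b = [0.06, 0.06, -0.02]

half_and_half = expected_mix(group_a, group_b)
print(half_and_half)
```

A real person won't sit exactly on the midpoint, because of individual variation and because the parental groups are themselves spread out, but the midpoint is a reasonable first guess for where to look on the plot.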

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 3: genes vs geography in Northern Europe

Global25 workshop 4: a neighbour joining tree

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)

17 comments:

Unknown said...

Very well done. In the first plot, the Irish seem to be too close to England_Anglo-Saxon - if I am reading the plot correctly. Are you sure that the England samples were Anglo-Saxon and not Celtic?

Davidski said...

They're definitely Anglo-Saxons, but I don't think they're all pure Anglo-Saxons straight from continental Europe.

Also, keep in mind that different PCA and PCs will show different things. This is where these Anglo-Saxons cluster compared to the Irish in an intra-North European PCA.

http://eurogenes.blogspot.com/2017/10/genetic-and-linguistic-structure-across.html

ph2ter said...

Your Austrian samples 1, 2, and 7 look suspicious. They cluster with English, Irish, Dutch, Belgian, and French samples.

Unknown said...

Maybe this is evidence that Austrians, English, Irish, Dutch, Belgian and French all share some degree of Celtic ancestry?

Davidski said...

@ph2ter

I've tested people from Austria who turned out surprisingly western and northwestern for their geography, so I'm not sure if those samples are outliers.

ph2ter said...

They are then a unique case in your European database.
It is impossible that they cluster with English samples.

Unknown said...

Why is it impossible that they cluster with English samples?

ph2ter said...

It is not impossible, but such samples are not representative.

AWood said...

Austria was the core Celtic zone, so I'm not sure why anyone would question the results.

ph2ter said...

Is there any scientific article about Austrians being autosomally similar to British people?

Skordo said...

Would you say this shows that mainland Greeks plot more Balkan than Cretans mainly because of Albanian and Slavic input rather than the Mycenaean elite, given how far left the Mycenaeans plot?

Anonymous said...

Botai have two R1b1a1.

Matt said...

Introducing selected Bronze Age populations into this plot can illustrate where the averages for ancient cultural groups fall: https://imgur.com/a/9g4GpEu

Though as some of these are very sprawling and heterogeneous compared to present-day populations, it's generally not possible to see crisp clusters, and it's a bit overstuffed.

dsjm1 said...

David,

Again thanks - am really learning a lot from your tutorials. Now in the process of loading the Global25.dat file for myself into these tables.
(puts me right in the Anglo-Saxon Cline where I imagined it would be).

Very instructive. Prior to this I really did not know what to make of these charts loaded with coloured dots (and I bet I was not alone) :)

Activating the group labels was a great help.

Keep up your good work - you are winning over PCA newbies like myself and am loving it.

DougM

dsjm1 said...

Just as a comment

On my screen with 1920 x 1080 res, these plots look very good. The res self-adjusts back to 1760 x 994 but that is fine.

Am feeling like a kid who has learned to tie his shoe-laces and ride his 1st bike all on the same day !

Doug

Matt said...

Following might be of interest to someone.

One trick you can do with the Global25 PCA data and these subset PCAs, is that after computing a subset PCA, you can then project other populations in the G25 onto it, based on their G25 data.

E.g. let's say you first do a subset PCA on modern Europeans - https://imgur.com/CkZC2Dw

You can then take the loadings from that - https://imgur.com/xEnJ0Yl

And finally use the loadings as a conversion table to put other populations back on that subset PCA - https://imgur.com/a/fg0I1u7

(Actually how you do the conversion table is a bit complex to explain. I do it in spreadsheets, but I think Eren came up with a quick R script that does this better).

Because this PCA is basically lossless compression, this should all be lossless provided you use all 25 dimensions in the initial subset PCA and again in the conversion.

Of course, this is probably less informative about total distances than simply reprocessing back through PCA, but if you're interested in where an ancient population would "project" back onto a PCA of moderns, etc. it could be quite useful.
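For anyone who wants to try this outside a spreadsheet, the projection can be sketched in a few lines of Python with NumPy. This is not the spreadsheet method or the R script mentioned above, just one way to do the same kind of thing: fit the subset PCA on a reference set, keep the reference mean and loadings, then apply them to other rows. The data here is random and has 5 columns standing in for the 25, and the function names are made up for the example.

```python
import numpy as np


def fit_subset_pca(reference):
    """Return the column means and loadings of a PCA on the reference rows."""
    mean = reference.mean(axis=0)
    _, _, Vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, Vt


def project(rows, mean, Vt):
    """Place rows on the subset PCA using the reference mean and loadings."""
    return (rows - mean) @ Vt.T


# Random stand-ins for G25 rows (5 columns here instead of 25).
rng = np.random.default_rng(0)
moderns = rng.normal(size=(30, 5))   # the subset the PCA is computed on
ancients = rng.normal(size=(4, 5))   # rows to project afterwards

mean, Vt = fit_subset_pca(moderns)
modern_scores = project(moderns, mean, Vt)
ancient_scores = project(ancients, mean, Vt)

# With all components kept, the projection is a shift plus a rotation,
# so distances between samples are unchanged.
d_before = np.linalg.norm(ancients[0] - moderns[0])
d_after = np.linalg.norm(ancient_scores[0] - modern_scores[0])
print(np.isclose(d_before, d_after))
```

The distance check at the end is the "lossless" point in practice: as long as every component is kept, nothing is lost. Dropping components (say, plotting only the first two) is where information goes missing.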

Brian Boru said...

And finally use the loadings as a conversion table to put other populations back on that subset PCA - https://imgur.com/a/fg0I1u7

Fascinating, though the Swedish and Norwegian samples seem to be a little too close to Beaker Britain and the Celtic populations.