
Monday, May 21, 2018

Global25 workshop 1: that classic West Eurasian plot


In this Global25 workshop I'm going to show how to reproduce, more or less, that classic plot of West Eurasian genetic diversity seen regularly in ancient DNA papers and at this blog (for instance, here). To do this you'll need the datasheet below, which I'll be updating regularly, and the PAST program, which is freely available here.

G25_West_Eurasia_scaled.dat

This is what you'll get if you follow my instructions to the letter. Note the fairly strong correlation with geography. I think this is impressive for so many reasons.


OK, so, download the said datasheet, plug it into PAST, select columns 1 to 8, and go to Multivariate > Ordination > Principal Components. Here's a screen cap of me doing it:


The initial output won't resemble my plot above. So you'll need to place PC2 on the X axis, PC1 on the Y axis, and set the image size to 1206x706. After doing that, you should end up with exactly this:


Then, export the image, flip it horizontally with any imaging software that can do the job, and that's it, unless you want to add some labels like I did. Feel free to ask questions and make suggestions in the comments below.
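For readers who would rather script these steps than click through PAST, here is a minimal Python sketch. It assumes a variance-covariance PCA (no rescaling of columns), which is equivalent to an SVD of the mean-centred data. The names `pca_scores` and `west_eurasia_layout` and the random `demo` matrix are illustrative stand-ins, not part of the datasheet or of PAST. Component signs are arbitrary, so a scripted plot may come out mirrored relative to the PAST one.

```python
import numpy as np

def pca_scores(data, n_dims=8):
    """PCA scores computed from the first n_dims columns of data."""
    x = np.asarray(data, dtype=float)[:, :n_dims]
    centered = x - x.mean(axis=0)
    # SVD of the centred matrix: the rows of vt are the principal axes,
    # ordered from largest to smallest variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

def west_eurasia_layout(scores):
    """PC2 on the X axis (mirrored horizontally), PC1 on the Y axis."""
    x = -scores[:, 1]  # the sign change plays the role of the image flip
    y = scores[:, 0]
    return np.column_stack([x, y])

# Stand-in data (assumption): 50 samples x 25 coordinates of random noise.
rng = np.random.default_rng(0)
demo = rng.normal(size=(50, 25))

scores = pca_scores(demo, n_dims=8)
layout = west_eurasia_layout(scores)
```

With the real datasheet, the numeric columns would take the place of `demo`. How the file is parsed depends on its layout, which is not assumed here.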

See also...

Global25 workshop 2: intra-European variation

Global25 PAST-compatible datasheets

36 comments:

Gill said...

Thanks!

Le skipper de Pytheas said...

I am asked to require permission from you to access
G25_West_Eurasia_scaled.dat
This was not necessary for the previous G25 data
Is it the way you want it ?

Matt said...

@Davidski, you should get the same thing if you use the same process but all the dimensions in the sheet, rather than just 1-8? And actually with using 25 dimensions, should recapture more of the distances.

One thing I'd add is that rather than exporting the image and reversing the axis in image software, if you go under the scores tab in screenshot 3, then extract the scores, export to a spreadsheet, then multiply the Component 2 you want to reverse by -1, you can then re-export back into PAST3 to visualise. This is useful if you want to visualise Component 2 x 3, 2 x 4, etc. within PAST3.

(Speaking more generally, you can basically reverse any dimension in a PCA that you want and the distance will remain the same. Think of it this way; an XY coordinate system and A is -1,2 and B is 1,2, then they're no more or less distant than if A is 1,2 and B is -1,2, and of course this holds true however many dimensions you go up to. Whether any particular axis has population A on the negative end and B on the positive seems pretty much completely arbitrary).
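Matt's point about reversing a dimension can be checked numerically. The sketch below uses his A and B points plus one extra made-up point, and confirms that multiplying one axis by -1 leaves every pairwise distance unchanged. The function names are illustrative only.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def flip_axis(points, axis):
    """Return a copy of points with one coordinate axis multiplied by -1."""
    flipped = points.copy()
    flipped[:, axis] *= -1
    return flipped

# A = (-1, 2) and B = (1, 2) from the comment, plus an arbitrary third point.
coords = np.array([[-1.0, 2.0], [1.0, 2.0], [0.5, -3.0]])
flipped = flip_axis(coords, axis=0)

# True when all pairwise distances survive the sign flip.
same = np.allclose(pairwise_distances(coords), pairwise_distances(flipped))
```

The same holds in any number of dimensions, because each squared coordinate difference is unaffected by negating both coordinates.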

Anonymous said...

Back at AG, ffoucart noticed that the Dolmen BA samples are modeled as almost fully CHG. There is one sample used for modeling but we have three samples with mtDNA: U2e1, T1a2 and H6a1a2a.

https://anthrogenica.com/showthread.php?14285-The-genetic-prehistory-of-the-Greater-Caucasus-preprint-Harvard-Jena&p=398777&viewfull=1#post398777

Matt said...

@epoch, that's a nice spot and cool for people to be looking, but I think it may reflect some of the "swinginess" inherent in trying to detect offsets between fairly convergent Anatolia_Chl, Iran_N, CHG via outgroups. Compare to the PCA and ADMIXTURE; Dolmen_LBA is not offset towards the CHG samples and away from Seh_Gabi and Anatolia_Chl relative to Maykop. If anything, very small, offset towards Europe...

@all, in the other thread, Sein explained how simple k-means clustering could be used to split up the populations into subsets. If you cluster based on Dimensions 1 and 2, you basically get three clusters:

A: Africans: https://pastebin.com/Jsdaawf9

B: West Eurasians+West Eurasian like South+Central Asians: (Pretty much like Davidski's PCA, though with more populations along the South Asian cline, and North Africans. File too big for Pastebin! Looks like: https://imgur.com/a/S4G3QjI).

C: East Eurasians+East Eurasian like South+Central Asians: https://pastebin.com/Jyj4u1Ve

Visualising Cluster C: https://imgur.com/a/DJIwSpF - very clear separate clines

(Without Native Americans and Bashkirs in Cluster C: https://imgur.com/a/fot2mfx)

Higher k clustering on G25 PC1+PC2 would start to separate out the South+Central Asians and other "big picture" transitional groups into their own clusters.
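As a rough illustration of clustering on Dimensions 1 and 2, here is a small k-means sketch in Python. It is not the procedure Sein or Matt actually ran: the farthest-point initialisation is a choice made here to keep the result deterministic, and the `demo` matrix is synthetic data with three well-separated groups, not Global25 coordinates.

```python
import numpy as np

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from all centres chosen so far (an assumption of this sketch)."""
    centres = [points[0]]
    for _ in range(k - 1):
        d = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centres], axis=0
        )
        centres.append(points[d.argmax()])
    return np.array(centres)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centre, then re-average."""
    centres = farthest_point_init(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Synthetic stand-in: 60 samples x 25 dims, with three groups of 20
# separated only in the first two dimensions.
rng = np.random.default_rng(0)
blob_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
demo = rng.normal(scale=0.1, size=(60, 25))
demo[:, :2] += np.repeat(blob_centres, 20, axis=0)

# Cluster on dimensions 1 and 2 only.
labels, centres = kmeans(demo[:, :2], k=3)
```

Raising `k` is how the higher-k splits mentioned above would be explored, although real data rarely separates as cleanly as this toy example.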

Davidski said...

@Matt

You should get the same thing if you use the same process but all the dimensions in the sheet, rather than just 1-8? And actually with using 25 dimensions, should recapture more of the distances.

That should, in theory, be a more accurate way of doing things, but I have tried it, and visually it reminds me less of my West Eurasian plot.

Davidski said...

@Le skipper de Pytheas

Try the link now.

Seinundzeit said...

Matt,

I think it's interesting how those 3 clusters correspond rather nicely with the old physio-anthropological "Caucasoid, Negroid, Mongoloid/Australoid" categories.

Of course, those categories were quite problematic, in a multitude of ways.

Like how they were meant to map "subspecies/races", when in truth contemporary genetic variation doesn't involve that much divergence (Neanderthals were a distinct human "race"/subspecies, but genetically diverged and "geographical extreme" living populations just aren't that diverged, and we almost always see vast genetic clines across space connecting these genetically diverged and geographically extreme poles of variation).

Furthermore, no physical anthropologist ever combined the peoples of East Asia and Australasia.

And even more glaring, the Caucasoid category was wrong in the sense that "West Eurasians" are not an actual phylogenetic unit, but are rather complex mixtures between distantly related western and eastern streams (UP European/WHG and MA1/AG3/Botai) with the addition of pervasive ENA (for both WHG and ANE, with additional minor ENA in northeastern Europe + the Volga + southern Central Asia + northwestern South Asia) and a heavy helping of something even more divergent (Basal Eurasian). Not to mention varying amounts of African ancestry (Near East/North Africa).

Still, even though "Caucasoid, Negroid, and Mongoloid/Australoid" did not represent either objective phylogenetic units (except perhaps in the case of East Asians) or deeply diverged subspecies/races, this simple k-means clustering does show that those categories did represent geographically structured clines of genetic similarity.

For example, like a broad swathe of populations which share very substantial amounts of genetic ancestry on time scales involving the Neolithic and beyond, stretching from northwestern Europe all the way to northwestern India (West Eurasians/"Caucasoids").

Seinundzeit said...

Oh also, after redoing averages based on K-means clustering, I'm seeing much cleaner/sensible models, and the fits tend to be tighter. I'd say it's worth the effort.

Anthro Survey said...

@Seinundzeit

People also forget that "Negroid" is a problematic category, Horners and Maghrebis aside. The differences between some of those SSAs can be immense and were more so prior to the Bantu expansions (which, tbf, were a homogenizer), even if a quick glance at them and their clustering in 2D PCA space doesn't immediately suggest this. Things can be complicated.

For instance, Yoruba, Mende, etc. can be modeled as para-Eurasian(immediate sister clade to OOA) and basal-human if you've seen qpGraphs in the last two papers on Africa. Without basal human admixture, they'd probably share more drift with OOAs than any other SSA group. Since that's not the case, the distinction belongs to the Hadza, iirc, whose primary ancestry doesn't immediately clade w/OOA, unlike w/W. Africans.

That being said, readily visible phenotypic differences between most Eurasians, who feature more specialization in this than Africans do, often correlate nicely with ancestral streams.

Anthro Survey said...

What I like to do sometimes is take the rectangular PCA coordinates from PAST and convert them into polar coordinates in Excel w/ a couple of relevant formulas (radius length and angle are the resulting dimensions, as opposed to x,y). I then add/subtract some radians/degrees to rotate to my liking and convert this new sheet of coordinates back to rectangular coords (and export back to PAST).

This is pretty convenient and allows more direct interaction with my desired PCA orientation.
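The polar-coordinate trick described here can also be sketched in a few lines of Python instead of spreadsheet formulas. The function name and the two sample points are illustrative only.

```python
import numpy as np

def rotate_via_polar(xy, degrees):
    """Rotate 2D points counter-clockwise by going through polar coordinates."""
    x, y = xy[:, 0], xy[:, 1]
    radius = np.hypot(x, y)      # distance from the origin
    angle = np.arctan2(y, x)     # angle in radians
    angle = angle + np.deg2rad(degrees)
    # Convert back to rectangular coordinates.
    return np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])

points = np.array([[1.0, 0.0], [0.0, 2.0]])
rotated = rotate_via_polar(points, 90)
# (1, 0) moves to roughly (0, 1) and (0, 2) to roughly (-2, 0).
```

A rotation keeps every point at the same distance from the origin and from every other point, so only the orientation of the plot changes.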

Le skipper de Pytheas said...

@Davidski

The link is now fine. Thanks ....

Samuel Andrews said...

@Sein,

"And even more glaring, the Caucasoid category was wrong in the sense that "West Eurasians" are not an actual phylogenetic unit, but are rather complex mixtures between distantly related western and eastern streams"

True. How distant they were we don't know. We don't know what the relations between ANE, WHG, EEF, bla, bla are to each other yet. Before we do, we should be cautious of exaggerating their differences. Just saying. People tend to do that because the expectation was that Caucasians descend from a single line.

Natufian and EEF have some kind of "recent" common ancestor. Barcin has WHG-like ancestry. IranNeo has ANE ancestry. There are plenty of distant but important mtDNA links between Barcin and IranNeo. Where'd the Y DNA J2a in Neolithic Europe come from?

See what I'm seeing. They definitely aren't as distant from each other as, let's say, West Eurasians (eg, Persians) and East Asians (eg, Chinese), which David Reich likes to say. Maybe their total genetic distance is. But they definitely have significant recent common ancestors that Chinese and Persians don't.

Matt said...

West Eurasians is an illustrative example, but Africans and East Eurasians just as illustrative;

- African groups (besides recent West Eurasian ancestry) are mixes between the OoA clade and a basal AMH clade

- East Asians are a mix of Tianyuan clade and a specific branch with Onge subsequent to Onge-Papuan separation, and Tianyuan itself shares an unresolved relationship with European Upper Paleolithic (on top of which is further SE Asian admixture with Onge-like and Siberian admix with ANE).

- Papuans are an East Eurasian clade + the most deeply branching sort of ancestry, Denisovan, among AMH ancestry the least paraphyletic*, but among human ancestry as a whole probably the most.

*well, I doubt that, but we'll need more aDNA from China's Paleolithic to really begin to look at it.

Paraphyletic dynamics are kind of not so different between regions, but the branching depth of each admixing ancestry is limited by the origin and diffusion of Out of Africa, and the depth of pre-OoA divergence.

E.g. among AMH structure Africa has deeper phylogenetic structure to admix than West Eurasia, which is generally deeper than East Eurasia, while structure for other humans may be almost the inverse (Basal Human closer to other AMH than Neanderthals, who are closer than Denisovans) depending on what comes out of Africa.

Anonymous said...

@Matt

I see what you mean.

OTOH, all the relevant Iranians cluster in that PCA and the modeling is done as CHG + Iran Chalcolithic + Anatolian Chalcolithic so all samples are modeled as sum of three samples with a substantial amount of CHG. Couldn't that muddle the differences in the PCA?

Matt said...

Actually, thinking about it, with the data at the moment, in the broad details the East Eurasian case is quite parallel to West Eurasia:

1) Early split of Tianyuan as outgroup to main East Eurasian ancestry parallels early split off of Basal Eurasian, as sort of Basal East Eurasian

2) Split off of main East Eurasian ancestry into Oceanian and Onge-like (Deeply Diverged East Asian), paralleling split of European and Siberian Upper Paleolithic

3) Hybridization of ancestry from Onge-like with Tianyuan in China parallels reflux of WHG-like into ME to mix with Basal Eurasian and form later Near East

4) Movements of agriculturalists into SE Asia and absorption of Onge-like ancestry parallels movements of EEF into Europe and WHG absorption.

5) Finally, admixture of ANE and East Asians in Siberia parallels admixture of Iberomaurasians and Near East groups in North Africa; ANE is functionally basal to the Tianyuan-main East Eurasian split in the same way Iberomaurasian is basal to the Basal Eurasian-EuroSiberian UP split.

On top of these parallels, some details differ: Tianyuan-GoyetQ113 relations are not clear, there is very low level but introgressively important "Denisovan" ancestry in different East Asian populations (contrasted with the actual ancestry in Papuans and to some degree South Asians being a distinguishable pulse of southern Denisovan-related ancestry), and some fairly generically East Eurasian signal back into ANE and WHG.

@epoch, gotta run, I'll have a think about that later and get back to you.

Seinundzeit said...

Matt,

"1) Early split of Tianyuan as outgroup to main East Eurasian ancestry parallels early split off of Basal Eurasian, as sort of Basal East Eurasian"

Ah, therein lies the distinction; East Eurasians do not have any noticeable Eurasian ancestry outside of what RK dubbed "Crown Eurasian", while essentially all West Eurasians (from Europe to northern India) have substantial "Basal Eurasian" admixture. The processes are parallel, but the magnitude of genetic differentiation is quite different (Papuan/Onge/Tianyuan are all part of the East Eurasian clade, and ANE is "Crown Eurasian").

As you noted, I guess the structuration at play with the mixing lineages is almost subject to cumulative simplification and temporal "thinning", as one moves from Africa towards East Eurasia.

Regardless, what you say is absolutely true; parallel processes at play in all three groups, but with differing amounts of divergence between the lineages which are mixing/melding in the context of the three groups (and with the differences going in opposite directions when it comes to the geographic pattern you mentioned, if looking at AMH vs broadly human ancestry).

Anthro Survey,

I completely concur.

Matt said...

@sein:The processes are parallel, but the magnitude of genetic differentiation is quite different

That kind of depends on the model specification! Phylogenetically, they have to be deeper, as is, even more so than for the Basal Eurasian case, the case for OoA+Basal AMH in Africans; but models are still all over the map on whether Basal Eurasian is almost at the divergence of "Crown Eurasian", which is the classical model in Laz 2014, or at a high depth. It may be that the split in f2 units between Tianyuan->Onge-like is of a similar size to Basal->West_Eurasian.

Plus, a further issue is compound drift in all the trees; if drift happens more at the expanding edge (as in the classical serial founder effect model), Basal Eurasian may not end up being so differentiated from each of the Crown Eurasians, in raw f2 / fst, as they are from each other. That was the case in the classical Lazaridis 2014 model, e.g. in that model Basal_Eurasian->para_West Eurasian distance was (25+29+4=58) while para_East Eurasian->para_West Eurasian was about (25+32=57). Though this was with a "high fraction" of Basal Eurasian, at 36% of Stuttgart's ancestry.

Not so clear to me that Basal Eurasian-Crown Eurasian split is "big potatoes" while splits within the CE clade are "small potatoes", yet...

Ryan said...

Forgive the off topic question, but is it possible to distinguish between a half sibling and an aunt/uncle using DNA alone? I have a 27% match on Ancestry and I'm trying to figure out if he's a half brother or an uncle. Would the number of segments be higher for an uncle than a half brother due to there being recombination happening twice for the uncle as opposed to once for the half sibling? I have 2 half sisters in Ancestry as well and they share 47 and 51 segments with me, vs 51 segments for this mystery individual, and 48 for my grandfather.

Shaikorth said...

@Matt
Lazaridis' tweets about Kamm et al. imply he's open to Basal percentages going down quite a bit, which would push the component further back from generic Eurasian clades. Lazaridis 2016 estimates about 25% for ENF, Kamm about 10%.

On another note, Global25 nMonte (scaled, pen=0) persistently suggests lots of WSHG in EHG's instead of fitting them as a mix of just AG3 or MA1 and WHG's. This would be something to test with qpAdm etc.

Matt said...

@Shaikorth, yeah, IMO Basal Eurasian to Anatolians could be something like 36% as in Laz 2014, 9% as in Kamm 2018, or something between the two like 24%, which would be a comparable % to Tianyuan in Ami per McColl's preprint.

Mike the Jedi said...

@ Dave

Thanks for the new data sheet and the tutorial.

EastPole said...

Interesting article in “Der Spiegel” about recent genetic research. Fragment about migrations from Eastern Europe to India Google translated by me:

“The researchers now trace the stages of the centuries-old migration. It started in around 4800 years ago roughly on the territory of present-day Belarus. There, penetrating Yamnaya had mixed with the local farmer population. So to speak as souvenirs they had henceforth their genes into the luggage.
The next detectable station is more than 2000 kilometers further East. There, so at least it reconstruct the linguists, the traveler grabbed new terms that suggest, that they learned the art of chariots. This agrees with the findings of the archaeologists: you situate the birthplace of this acting mighty war machine at the foot of the Ural mountains.
So far, the migrants in the ox-cart were wincing through the steppe, henceforth they surge in graceful two-wheeled carriages through the grassland. That leaves close to social changes. Archaeologist Kristiansen at least considers it very likely, that the advance towards India did not take place in the form of marauding youth gangs as in Europe, but rather militarily organized.”


https://s31.postimg.cc/w5q4srsff/screenshot_394.png


https://www.academia.edu/36689289/Invasion_aus_der_Steppe

They mention Belarus although I think it could be more like Northern Ukraine, Belarus and Eastern Poland where oldest Corded Ware culture evolved before migrating east. Also I am not sure whether it was Yamnaya or Late Sredny Stog Dereivka as a source steppe population.

Davidski said...

@EastPole

I wouldn't trust the German media, or any mass media, to tell you where Sintashta originated. In fact, I'd go so far as to say that if they claim it was Belarus, then it probably wasn't Belarus.

Note that in the Global25, the least admixed (most western) Sintashta samples show more affinity to Czech Corded Ware (minus the MN farmer-like outlier) than to Baltic Corded Ware/BA.

Anthro Survey said...

@Samuel

As far as I remember, Reich was saying this about comparing the vertices of the West Eurasian PCA diagonally.

In that case, it does make sense that the distance b/ween WHG and Iran_N (or between ANE and Levant_N) should be comparable to(but still less than) that between Chinese and contemp. Iranians. In both cases, you're taking a Crown Eurasian population and comparing it to a basal-rich one w/a considerably different clade of Crown Eurasian ancestry.

Anthro Survey said...

@Davidski

What I'm wondering about is if Poltavka-like populations form an appreciable ancestral layer in Sintashta/Andronovo(say, >10%) or whether they were essentially undiluted CWC offshoots.

If the former is true, we're potentially looking at the dominant, non-Poltavka CWC layer closely resembling/clustering with contemporary East Europeans in 2D PCA space. I don't know if you remember the revised Harappa calculator results or not, but Brahmins get about 15% "East Euro" on that. Their Steppe_MLBA numbers across various estimates are slightly higher than this. I'm not a huge fan of ADMIXTURE percentages, but it's an interesting note.

Anthro Survey said...

^ to add: i.e. the question is whether populations of CWCs, westward-shifted enough to make them touch/cluster with Poles or Ukrainians, existed at one time in some region. I see no reason why not because we don't have a full spatio-temporal transect of CWCs right now, and the few that we do have show some variation.

EastPole said...

@Davidski
Baltic Corded Ware/BA was to the north of Belarus area. We don’t have much Corded Ware from Belarus, Poland and Ukraine. There steppe people probably were more admixed with farmers and less with Hunter Gatherers. Genetics is not everything. Look at the language, religion, culture. Everything supports the area north of the steppe.

Davidski said...

@All

Idiotic comments from banned commentator Olympus Mons deleted.

This ridiculous person can't come to terms with the fact that R1b-L23 isn't native to the Southern Caucasus, and, thus, that there won't be any R1b-L23 in any pre-Bronze Age Near Eastern populations, which means no R1b-L23 in Shulaveri Shomu.

Please ignore all such bleating from Olympus Mons in the future. I'll delete his comments as soon as I see them.

Anthro Survey said...

@EastPole

"Belarus,etc....There steppe people probably were more admixed with farmers and less with Hunter Gatherers"

My hunch exactly for CWCs there circa 2200BC or so!
Not sure that this would hold for the people in the heavily forested regions of those lands, though, where I wouldn't expect extensive penetration of either EEF or steppe-like ancestry at this date.

dsjm1 said...

David,

Thanks - had it done in 10 mins - you now have me hooked into delving deeper :)

Cheers

Doug Marker
Sydney Australia

Seinundzeit said...

Matt,

"Not so clear to me that Basal Eurasian-Crown Eurasian split is "big potatoes" while splits within the CE clade are "small potatoes", yet..."

I think your circumspection is completely warranted.

We really need an actual Basal Eurasian aDNA sample, to crack this puzzle (without actual aDNA, the models can be tuned in a very large number of ways, with very different structural features and drift lengths. Hell, even with aDNA, the models are far from clear/unambiguous!).

For a long time, I've been greatly interested in seeing relatively "recent" samples which display a very close relationship with MA1/AG3, and of seeing ancient samples which are essentially West Eurasian and yet of a strongly "South Asian" genetic character, and those wishes have come true (for the former, Botai and West_Siberia_N, and for the latter, Shahr_I_Soktha_BA2).

So, hopefully my desire to see a genome of around (at least) 80%-90% BEA will also come true, eventually?

Question is, where should they be digging, and do those ancient North African genomes (not the Iberomaurusians, the samples from the other preprint) fit the bill (even if partly, that would still be a thing of great interest)?

ryukendo kendow said...

For various reasons I'm not going to be posting here any longer.

I want to thank all of you, and especially Matt, Rob, Shaikorth, Chad, Alberto, Sein, Kristiina, Santosh, Karl_K, Ebizur, and Capra_Internetensis for the conversations and spirited debates we have had here. It has been very stimulating, and it's early days, but it has probably permanently changed the trajectory of my career. I know it's very low stakes and all, but this is the first place where I've ever had the experience of being wrong and explicitly contradicted by findings in a very public way, and the epistemic humility that comes from tasting that is something better tasted while very young haha.

And of course thank you Davidski, you've got a very special project here, one of its kind in fact.

So long and best of luck to all of you!

Alberto said...

@Ryukendo Kendow

Many thanks to you for all the good times here!

Since I'm in a similar situation where I'm not going to be posting here anymore, I want to join in to thank everyone here who has made this place so interesting. It includes all the above mentioned, and Ryu himself, and probably a long list that I could hardly compile here, like FrankN, Nirjhar, Jaydeep, Tobus, Arza and many more (sorry for not being able to mention everyone, I hope I've thanked many others along the way already).

Hope we'll still see each other around somewhere. All the best!

Shaikorth said...

Global25 was fitting significant WSHG admixture to EHG samples on top of WHG and AG3. Now it looks like qpAdm (rightpops CHG, Neolithic Near East, UP Europe, Oceania, Ami, Anzick, MA-1) supports it. Tail probs for modeling the oldest EHG Sidelkino as various WHG's+WSHG are between 0.35 and 0.5. Models with WHG+AG3 or WHG+AG3+WSHG have values below 0.3. Karelia-HG is Sidelkino with extra WHG (tail prob 0.9). This is a pretty important discovery and wasn't checked in Narasimhan preprint, hopefully some tests will be in the final paper.

Simon_W said...

@ Anthro Survey

I wanted to send you a PM on anthrogenica about an interesting new book, but you have exceeded your "stored private messages quota".