
Thursday, August 10, 2017

Basal-rich K7 & Global 10 updates (10/08/2017)


I've updated the Basal-rich K7 spreadsheet and the Global 10 datasheets with a plethora of ancient individuals and populations, including Anglo-Saxons, British Celts (labeled England_IA), Minoans, Mycenaeans, Bronze Age Iberians and many more.

Basal-rich K7 spreadsheet

Global 10 main datasheet

Global 10 ancient averages datasheet

Please keep in mind that the K7 can be somewhat conservative with minor ancestry proportions, especially Ancient North Eurasian (ANE) admixture, and low coverage samples can behave in odd ways in the Global 10. So when modeling ancestry with ancient samples it might be useful to stick to high coverage individuals that show consistent results. If you don't know what the Basal-rich K7 and Global 10 are, then these links will be useful.

The Basal-rich K7

Global 10: A fresh look at global genetic diversity

An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests

39 comments:

MaxT said...

All the Bell Beakers in the Basal-rich spreadsheet are from Germany. These are from Haak et al. 2015, I think?

Any new ones from the recent Olalde et al. 2017 study?

Davidski said...

The new Beaker dataset hasn't been released yet. It will be when the paper is published in a journal.

andrew said...

What does British_IA stand for?

Ryan said...

@Andrew - British Iron Age.

@David - Any chance of adding this to Gedmatch? Your ANE K7 one is a bit dated lol, lumping CHG and WHG together and mistaking Iranian Chalcolithic ancestry for ASI.

Which reminds me...

Davidski said...

@andrew

"What does British_IA stand for?"

There's only England_IA, and it stands for England Iron Age.

@Ryan

"Any chance of adding this to Gedmatch? Your ANE K7 one is a bit dated lol, lumping CHG and WHG together and mistaking Iranian Chalcolithic ancestry for ASI."

I can't because people have paid for it.

And I can't remember lumping CHG with WHG in any old tests, but maybe I did. It's a steep learning curve making tests like this.

Alexandros said...

This is excellent! Actually, I was about to write a comment on the main Mycenaean/Minoan post, asking whether you were planning to update the Basal-rich K7! I will dig into it over the weekend.

Aram said...

Can someone run nMonte for Kum4 using Kum6 as the only earlier reference, and not any other Anatolian Neolithic?
Thanks in advance.

Alexandros said...

@David
Actually, I couldn't wait. The first thing I noticed is that the Anatolia_BA samples in the Basal-rich test are almost indistinguishable from some modern populations (needless to say which...). Should we take this with a pinch of salt, or are you confident in this result? I am asking because these Anatolia_BA individuals behave somewhat differently in Lazaridis' PCA plot (there, it is Anatolia_ChL that clusters with moderns instead).

Rob said...

@ Alex

Have a look at http://imgur.com/Eu2Kcbo

Rob said...

@ Aram

Kum4 with all 5000 BC Anatolians included

Kumtepe_LN:kum4
Tepecik_Ciftlik_N:Tep006 61.9 %
Kotias:KK1 14.6 %
Latvia_HG:ZVEJ32 9.25 %
Ukraine_N1:StPet12 8.5 %
Nganasan 5.7 %
Hungary_N:I1495 0.05 %


Only with Kum 6

Kumtepe_LN:kum4
Kumtepe_LN:kum6 46.15 %
Hungary_N:I1495 23.5 %
Kotias:KK1 14.35 %
Ukraine_N1:StPet12 7.4 %
Nganasan 6.15 %
Latvia_HG:ZVEJ32 1.8 %
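For readers new to nMonte: fits like the two above model a target sample as a weighted mix of reference samples, choosing the weights that minimize the distance between the target and the mix in the calculator's dimensions. The sketch below is not nMonte's actual code; it is a minimal stochastic search in the same spirit, assuming each sample is a plain vector of Global 10-style coordinates. The function name, slot count and iteration count are hypothetical choices for illustration.

```python
import numpy as np

def fit_mixture(target, sources, n_slots=500, n_iter=20000, seed=0):
    """Approximate target as the mean of n_slots rows drawn from sources.

    Each slot holds one source index, so a source's weight is its share
    of the slots. Each iteration reassigns one random slot to a random
    source and keeps the change only if the Euclidean distance to the
    target does not increase.
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    sources = np.asarray(sources, dtype=float)
    k = len(sources)
    slots = rng.integers(0, k, size=n_slots)
    total = sources[slots].sum(axis=0)          # running sum of slot rows
    dist = np.linalg.norm(total / n_slots - target)
    for _ in range(n_iter):
        i = rng.integers(n_slots)
        new = rng.integers(k)
        cand = total - sources[slots[i]] + sources[new]
        cand_dist = np.linalg.norm(cand / n_slots - target)
        if cand_dist <= dist:                   # accept non-worsening moves only
            slots[i], total, dist = new, cand, cand_dist
    weights = np.bincount(slots, minlength=k) / n_slots
    return weights, dist
```

With toy inputs such as `fit_mixture([0.7, 0.3], [[1.0, 0.0], [0.0, 1.0]])` the weights settle near 0.7/0.3. Because the weights come from a finite number of slots, small percentages (like the 0.05 % above) sit close to the resolution of this kind of search.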

Alexandros said...

Thanks Rob! Very useful! May I ask, which raw data were used for constructing the specific PCA plot you shared?

Also, is anyone aware of the main difference between Kum4 and Kum6? I understand that they are both LN Anatolians, separated by around a millennium, and that one of them is a low coverage sample. Is this the reason for the genetic difference between the two (i.e. bias introduced by low coverage), or are we seeing a real transitional change here?

Davidski said...

@Alexandros

I haven't looked at this in detail yet, but it seems that modern-day Cypriots fall on a cline between Anatolia_BA and Anatolia_ChL. So it looks like the genetic structure of that period of Anatolian history has been preserved in Cyprus, perhaps because it's an island?

And there's definitely something European-like about Kum4 compared to all of the earlier Anatolians.

Rob said...

@ Alex

It's using Dave's Global 10 in different ways.
I agree that, despite the low coverage, Kum4 might show early steppe admixture.
But of course, there were multiple things happening in Anatolia when you compare Kumtepe IV, the south Anatolian Anatolia_BA, and Barcin Chalcolithic. They don't represent a single post-Neolithic phenomenon.

Ryan said...

@David - Yah fair enough.

"And I can't remember lumping CHG with WHG in any old tests, but maybe I did. It's a steep learning curve making tests like this."

Well it's that or I'm 62% WHG so... I'll go with that. Not just a learning curve though - I don't think the whole situation was as well understood by anyone a couple years ago.

Looking at this sheet here, Basques seem about 50% Bronze Age Hungarian and 50% Iberian Middle Neolithic. Makes sense.

Samuel Andrews said...

@Ryan,
"Looking at this sheet here, Basques seem about 50% Bronze Age Hungarian and 50% Iberian Middle Neolithic. Makes sense."

Nah, I bet Basques are basically 50% North Bell Beaker (R1b P312), 50% Iberia Neolithic.

Hopefully Maju will eventually realize this. I confronted him with the new Bell Beaker and Bronze Age Portugal papers a few times. He just said "Well we just don't have enough aDNA from Neolithic France." But he has stopped posting on his blog, which I think means he has given up on genetics, since his whole agenda concerning European genetics has fallen apart.

MaxT said...

Interesting how Minoans and Sardinians completely lack ANE admixture. Their Basal/WHG admixture proportions look almost identical here.

I imagine Minoans must have looked a lot like modern-day Sardinians.

Davidski said...

Minoans don't lack ANE in the K7. They do have a bit due to their Caucasus-related ancestry.

They're more like the Tepecik Anatolian farmers rather than Sardinians in the K7.

Matt said...

@ Davidski, I think the Kumtepe rows may be off in your Basal-rich K7 spreadsheet?

I combined the values in the Basal-rich K7 sheet and a projection onto the Principal Coordinates of the components' Fst values: http://imgur.com/a/cGKyQ

(Not that this tells us anything much that the K7 values themselves don't, I guess!)

MaxT said...

@David

Yeah, they have it at around 1-2%.

"They're more like the Tepecik Anatolian farmers rather than Sardinians in the K7."

Ah, thanks. I was thinking more in terms of phenotype.

Matt said...

Not sure if the Basal-rich K7 is picking up Portugal BA correctly. It looks like they can't have any more than 4% steppe ancestry based on the ANE fractions.

Mycenaean as Minoan + Steppe_EMBA looks 89:11 in the BRK7, as expected.
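Both checks in this comment are two-source mixing arithmetic. If a target is a fraction alpha of source A plus (1 - alpha) of source B, each calculator component of the target should lie between the sources' values, so alpha can be estimated by least squares across all components; with a single component (such as ANE) it reduces to the ratio (target - B) / (A - B). The sketch below is an illustration under that simple linear-mixing assumption, not how the spreadsheet or nMonte computes anything, and the numbers in the usage note are made up.

```python
import numpy as np

def two_way_fraction(target, a, b):
    """Least-squares alpha for target ~ alpha * a + (1 - alpha) * b.

    Minimizing ||target - b - alpha * (a - b)||^2 gives
    alpha = (target - b) . (a - b) / |a - b|^2, clipped here to [0, 1].
    """
    t, a, b = (np.asarray(v, dtype=float) for v in (target, a, b))
    d = a - b
    alpha = np.dot(t - b, d) / np.dot(d, d)
    return float(np.clip(alpha, 0.0, 1.0))
```

For example, a hypothetical target with 1% ANE, modelled from a steppe source with 20% ANE and a local source with none, gives `two_way_fraction([0.01], [0.20], [0.0])` = 0.05, i.e. about 5% steppe. A component the calculator systematically underestimates will drag this single-column ratio down, which is why the multi-component fit is usually the safer check.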

Davidski said...

@Matt

Yep, there was a formatting error in the K7 spreadsheet. Fixed now.

And yeah, minor ANE is often underestimated in the K7, because it's confused with Villabruna-related, which also has some ANE.

The Portuguese BA samples should be OK in the Global 10 though, with around 15% Yamnaya stuff.

Ryan said...

@Sam

"Nah, I bet Basque are basically 50% North Bell Beaker (R1b P312), 50% Iberia Neolithic. "

Effectively that's the same thing. I don't think Maju really doubted that the Basques are strongly tied to Bell Beakers though. It's just a question of where R1b came from in Bell Beakers, and frankly I don't think the assumption that it came from IE people makes sense (rather I think Yamnaya got M269 from a WHG source).

Josep Coderch said...

It's curious that Sardinians get 0 ANE yet have around 20% R1b. If ANE is underestimated, what would the true value be, maybe 2-3%?

Davidski said...

Yeah, something like 2-3% ANE for the Sardinians from the HGDP, but some other Sardinians from the less isolated areas of the island are as eastern shifted as mainland Italians.

P Piranha said...

Some work from Ryukendo at Anthrogenica, using what he calls the "sequential exclusion" method:

[1] "distance%=0.4352 / distance=0.004352"

Portugal_BA

Iberia_MN 68.05
Ireland_MN 12.90
Yamnaya_Samara 11.35
Corded_Ware_Estonia 4.50
Bell_Beaker_Germany 1.90
Iran_N 0.95
Yamnaya_Kalmykia 0.25
Halberstadt_LBA 0.10
Afanasievo 0.00
Alberstedt_LN 0.00
Anatolia_BA 0.00
Anatolia_ChL 0.00
Andronovo 0.00
Armenia_ChL 0.00
Armenia_EBA 0.00
Armenia_MLBA 0.00
Baalberge_MN 0.00
Barcin_N 0.00
BattleAxe_Sweden 0.00
Corded_Ware_Germany 0.00
England_Roman 0.00
England_Roman_outlier 0.00
Germany_Bronze_Age 0.00
Greece_LN 0.00
Greece_MN 0.00
Hungary_BA 0.00
Hungary_CA 0.00
Hungary_HG 0.00
Hungary_N 0.00
Iberia_ChL 0.00
Iberia_EN 0.00
Iberia_HG 0.00
Iceman_MN 0.00
Iran_ChL 0.00
Iran_IA 0.00
Iran_LN 0.00
Ireland_EBA 0.00
Israel_Natufian 0.00
Jordan_EBA 0.00
Karasuk 0.00
Karasuk_outlier 0.00
Karelia_HG 0.00
Kumtepe_LN 0.00
Lapita_Tonga 0.00
Lapita_Vanuatu 0.00
Levant_N 0.00
Loschbour 0.00
MA1 0.00
Maros 0.00
Minoan_Lasithi 0.00
Minoan_Odigitria 0.00
Mota 0.00
Nordic_IA 0.00
Nordic_LBA 0.00
Nordic_MN_B 0.00
Okunevo 0.00
Portugal_LN 0.00
Portugal_MN 0.00
Potapovka 0.00
Remedello_BA 0.00
Salzmuende_MN 0.00
Samara_HG 0.00
Sarmatian_Pokrovka 0.00
Scythian_AldyBel 0.00
Scythian_Pazyryk 0.00
Scythian_Samara 0.00
Scythian_ZevakinoChilikta 0.00
Sintashta 0.00
Srubnaya 0.00
Srubnaya_outlier 0.00
Unetice_EBA 0.00
Ust_Ishim 0.00
Vatya 0.00
Villabruna 0.00

P Piranha said...

His model is already pretty good, so he performed just two follow up models.

[1] "distance%=0.4399 / distance=0.004399"

Portugal_BA

Iberia_MN 58.4
Ireland_MN 15.1
Bell_Beaker_Germany 12.6
Corded_Ware_Estonia 11.6
Iran_N 2.2
Karelia_HG 0.1

[1] "distance%=0.4431 / distance=0.004431"

Portugal_BA

Iberia_MN 66.5
Bell_Beaker_Germany 19.1
Halberstadt_LBA 5.6
Ireland_MN 5.4
Iran_N 2.5
Karelia_HG 0.9

The Ireland Middle Neolithic (I'm assuming that's Ballynahatty?) appears of its own accord, a rather fitting outcome.

Josep Coderch said...

It's also impressive how high Japanese and Koreans score in ANE, almost at the same level as southwestern Europeans.
From Siberia to the furthest west and east of Eurasia and south too, ANEs sure spread far and wide.

P Piranha said...

@ Davidski

Any reason why you stopped doing those K13 and K15 runs? They always seem interesting, as they would be expected to capture shared drift with recent modern clusters. I would be curious whether Portugal BA had a large quantity of "Atlantic" and "North Sea" mixed with "West Med", for example. That would tell us that some of the drift associated with these regional clusters had already begun.

Honestly the more I look at this the more I think Iberia was shaped by later population movements. No sign of the 10% North African ancestry yet, and Steppe input must also increase to quite a dramatic degree to get to present-day values. Maybe some came with Urnfield and the rest with Germanics.


@ Shaikorth I know previous studies showed few IBD signals in Western Europe. Any sign of recent IBD that could mark Germanic migrations into Iberia, say from Busby et al.? How would you go about detecting a Germanic migration signal independently of shared recent European signals in David's IBD matrix?


Rob said...

Halberstadt shouldn't be used, as it's from the same time as, or slightly later than, Portugal BA.

Samuel Andrews said...

@Rob,
"Halberstadt shouldn't be used, as it's the same time or slightly later than Portugal BA"

You do know the time period makes no difference, right? Even modern Norwegians would give an accurate score. They all have similar Steppe/MN ratios.

Samuel Andrews said...

@P Piranha,

"Any reason why you stopped doing those K13 and K15 runs? They always seem interesting as they would be expected to capture shared drift with modern recent clusters"

I second that.

"Would be curious if Portugal BA had a large quantity of "Atlantic" and "North Sea" mixed with "West Med" for example."

That would be interesting. It'll also be interesting to see the "North Sea" and "Atlantic" scores in British Beaker folk. And hopefully someone will be able to run them through genetic genealogy tests like 23andMe. Think about it: high British/Irish scores in recent arrivals from mainland Europe would be groundbreaking.

Aram said...

Rob

I see. So it really came from the European side.

Ryan

"rather I think Yamnaya got M269 from a WHG source"


Well, there's not a single pre-M269 from Mesolithic/Neolithic Central/Western Europe, not even a P297+.
Meanwhile, in the Eneolithic Altai, P297 showed up in the very first sample.
For Basques, this is a good read:

https://www.nature.com/articles/s41598-017-07710-x

Matt said...

@ Davidski: And yeah, minor ANE is often underestimated in the K7, because it's confused with Villabruna-related, which also has some ANE.

That Villabruna vs ANE probably explains why that problem exists for Portuguese_BA and not so much for Mycenaeans - additional WHG in Portugal_MN adds to difficulty. I wonder if this is also what hit Martiniano's ADMIXTURE analysis in the Portugal paper?

@P Piranha: "Any reason why you stopped doing those K13 and K15 runs? They always seem interesting as they would be expected to capture shared drift with modern recent clusters. Would be curious if Portugal BA had a large quantity of "Atlantic" and "North Sea" mixed with "West Med" for example. That would tell us some of the drift associated with these regional clusters had already began."

This stuff is interesting, though I'm not so sure how much we could actually infer about whether regional cluster drift in Bell Beakers or other ancient populations had happened.

As I recall, when Davidski ran a projection of ancients onto a PCA made of recent European samples, excluding Sardinians and Basques, we found that all the ancients (Corded Ware, Bell Beaker, etc.) except for Hungary_BA and Czech_Early_Slavic loaded onto the West European side.

That might be due to additional drift on the East European side of the plot, though if so it is at a very mild level: by Fst, East European groups have pretty much exactly the distance from the African outgroups that their ancestry composition would predict (see http://i.imgur.com/zLP0Wu5.png), rather than anything like the +0.020 difference seen between Kalash and Pathans.

So if we ran this and found that many ancients were sitting in the "Atlantic" / "North Sea" clusters, that may just be telling us that the West European groups who form the basis of those clusters have slightly less drift from the ancestral state, rather than being a strong, specific signal of new ancient groups splitting off.

Davidski said...

@Matt

"That Villabruna vs ANE probably explains why that problem exists for Portuguese_BA and not so much for Mycenaeans - additional WHG in Portugal_MN adds to difficulty. I wonder if this is also what hit Martiniano's ADMIXTURE analysis in the Portugal paper?"

Yes, good chance it did, plus I think the R1b-M269 ancestors of the Portuguese BA individuals came from Ukraine, where the forager ancestry in the BA pastoralist groups there was likely to have been closer to SHG along the WHG > EHG cline, and thus they had less ANE than Samara Yamnaya.

Samuel Andrews said...

@Matt, David,

Have you guys considered that Yamnaya or proto-Corded Ware could be (Ukraine_HG + an HG with more ANE than EHG) + CHG, rather than EHG + CHG?

Shaikorth said...

@P.Piranha


I tried various combinations with Spain_LNCA, Basques, Tuscans, Nordic_IA, Germans and Scandinavians. There were inconsistencies; separating a specifically Germanic Northwest European IBD signal is indeed difficult. Busby et al. give French signals but no Germanic signals, though admittedly the French could be a proxy.

For example with the IBD matrix:

http://i.imgur.com/zwl5bnR.jpg

This is probably the most decent attempt using a Germanic population: Spanish_Pais is always higher above the trendline with Scandinavians, but the position of the others varies. However, using the French makes almost all populations hug the line, suggesting that French and Basque IBD can't easily separate the Spanish populations from each other or from Basques (a concern for anyone searching for a pattern, since the French have much more regular NW European ancestry than Basques). Germanic samples from Migration Period SW Europe are yet to be tested, though.

Matt said...

@ Sam, well, I don't know that it couldn't have been like that. But I just don't think we have any specific evidence that it would be like that!

At the moment EHG+CHG is the simplest model, and the most plausible for what we have about populations who lived relatively close to the Yamnaya at a relatively close time. In time perhaps methods like the simulations in the new Lazaridis paper can test other possibilities.

There are still lots of gaps in sampling. One possibility I entertain is that the Yamnaya had less EHG from Samara or anything like it, and may instead have had ancestry from a North Caucasus / south steppe population that was already intermediate between EHG and CHG, just as CHG has more EHG than Iran_N, and as Levant_N -> Barcin -> Boncuklu show progressive levels of WHG ancestry.

There's just not enough sampling to comment on anything like this at the moment. We know that EHG + CHG is a good working model for Yamnaya, and that refinements suggest additional low-level ancestry from the Near East would probably improve it slightly; there's not much else we can really say.

Ebizur said...

"It's also impressive how high japanese and koreans score in ANE, almost the same level of southwestern europeans."

But still lower than a Han from North China (HGDP01287) and a Naxi (HGDP01345) in the case of a Japanese (HGDP00748), and also lower than a Yi (HGDP01179) in the case of a Korean (ND13299). The Naxi and Yi are Tibeto-Burman-speaking populations from southwestern China.

The minima for ANE affinity in eastern Asia appear to be found among the aborigines of Taiwan (as represented by Atayal and Ami) and the aborigines of the Andaman Islands (as represented by Onge). On the other hand, the inhabitants of most of China (with the possible exception of the southeastern quadrant even today) appear to have as much or more affinity for the Mal'ta genome than Japanese or Koreans have affinity for the Mal'ta genome. That would make geographical sense, too, considering the place of deposition of the Mal'ta specimen.

Matt said...

Bit off topic, but thought this might be interesting to some:
When we have dimensional data, like the PCA and Global 10, we can calculate the Euclidean distances in those dimensions between populations. Alberto has done this before on the weighted and unweighted Global 10.

Those tend to show the relative distances between populations that we expect.
But ultimately it's just a number, and it's hard to compare with anything much.
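As a minimal sketch of the distance calculation described above, assuming each population is a row of numeric coordinates (for example its Global 10 values), pairwise Euclidean distances can be computed as below. The optional per-dimension `weights` argument is a hypothetical stand-in for a "weighted" variant; it is not the actual weighting used for the Global 10.

```python
import numpy as np

def distance_matrix(coords, weights=None):
    """Pairwise Euclidean distances between the rows of coords.

    If weights is given, each dimension is multiplied by its weight
    before the distances are taken (a hypothetical scaling scheme).
    """
    x = np.asarray(coords, dtype=float)
    if weights is not None:
        x = x * np.asarray(weights, dtype=float)
    diff = x[:, None, :] - x[None, :, :]        # all row-by-row differences
    return np.sqrt((diff ** 2).sum(axis=-1))
```

For three toy points `[[0, 0], [3, 4], [0, 4]]` this returns 5 between the first two, 4 between the first and third, and 3 between the second and third. The result is only a relative scale, which is the limitation the next paragraph addresses.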

A way to use the Basal-rich K7, or really any ADMIXTURE calculator, to produce population distances that are more meaningful to interpret might be as follows.

1) First take the Fst matrix for the relatedness of the components - http://i.imgur.com/Ln4G1p9.png

2) Now run the Principal Coordinates Analysis function on them in PAST3 - a) http://i.imgur.com/fSn6YlI.png, b) http://i.imgur.com/dSzIUEP.png (use the same settings)

3) Copying out the dimensions and scores, you can check that they've kept the same information from the Fst matrix by re-running the Euclidean distance calculation over the output - a) http://i.imgur.com/vTyJIgd.png, b) http://i.imgur.com/7z9iizK.png. There are some very small differences, but it's basically 1:1.

4) Finally, you can project all the rows from the normal Basal-rich K7 onto these dimensions: https://pastebin.com/7sD7mgRk
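The four steps above can be scripted roughly as follows. This is a sketch under assumptions, not a reproduction of the PAST3 run: it assumes classical (Torgerson/Gower) principal coordinates with the component Fst values used directly as distances, and that "projecting" a population means placing it at the admixture-weighted average of the component coordinates. The function names are hypothetical.

```python
import numpy as np

def pcoa(dist):
    """Classical principal coordinates of a symmetric distance matrix.

    Returns coordinates whose Euclidean distances reproduce dist as
    closely as possible. Dimensions with non-positive eigenvalues are
    dropped, which is one source of small mismatches (step 3).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                 # double-centred squared distances
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]              # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals > 1e-10 * np.abs(vals).max()
    return vecs[:, keep] * np.sqrt(vals[keep])

def project(proportions, component_coords):
    """Place each row of admixture proportions at the weighted mean of component coordinates (step 4)."""
    q = np.asarray(proportions, dtype=float)
    q = q / q.sum(axis=1, keepdims=True)        # rows sum to 1
    return q @ component_coords
```

Rerunning the distance calculation over `pcoa(fst)` is the step 3 check: for an Fst matrix that fits Euclidean space exactly, the rebuilt distances match it, and any negative eigenvalues show up as small differences. The projected rows can then go into nMonte, 4mix or a plain distance matrix.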

While this seems like a pretty long-winded process, the advantage is that when you use nMonte or 4mix on this final data, or run a simple Euclidean distance, the output should be somewhat comparable to an Fst matrix.

When I run a Euclidean distance matrix over the output (e.g. https://pastebin.com/bF4ug99K), I do find that the output Fst scores generally bear some resemblance to *real* population Fst (assuming that the Sub-Saharan component here represents Mbuti, Oceanian is mostly Papuan, etc.).

This isn't very precise for closely related recent groups; using this method on Eurogenes K15 actually seems to work better in that regard (and higher-dimensional calculators like Eurogenes K36 might be more precise still).
(Ideally, I'd have a script to run this whole process end to end.)