
Thursday, June 17, 2021

Balto-Slavic drift


A few years ago I began using the term "Balto-Slavic genetic drift" to describe the fine-scale genetic signal that is shared by the speakers of Baltic and Slavic languages to the exclusion of Europeans without significant Balto-Slavic ancestry.

As a result, nowadays, many people online use the term "Balto-Slavic drift" when referring to this phenomenon.

The easiest way to show that Balto-Slavic drift exists is to run a fine-scale Principal Component Analysis (PCA) of European genetic variation with a lot of Balto-Slavic samples in the mix. Indeed, my Global25 PCA does a great job of illustrating the impact of Balto-Slavic drift on the population structure of Europe, both in PCA plots and in mixture models (for instance, see here).

It's also possible to tease out Balto-Slavic drift with formal statistics. I showed this indirectly in a recent blog post about Greek population structure (see here). In this post I'm going to demonstrate how to explicitly and formally test for Balto-Slavic drift both in ancient and present-day samples.

To do this we need to find stats that basically split Baltic and Slavic speakers from other Europeans, such as f4(Outgroup,Test;Bell_Beaker_NDL,Baltic_LVA_BA). In this f4-stat, Baltic_LVA_BA is the ancient reference population with an unusually high level of Balto-Slavic drift, while Bell_Beaker_NDL is a fairly similar population overall in terms of ancient ancestry components, but with practically zero Balto-Slavic drift.
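As a rough illustration of what such a statistic measures, here is a minimal Python sketch that computes f4(A,B;C,D) as the mean over SNPs of (a - b)(c - d) and attaches a Z score from a simple delete-one block jackknife. Everything in it is an assumption made for illustration: the allele frequencies are simulated, the two test population names are hypothetical, and the equal-sized blocks are a simplification (real tools such as ADMIXTOOLS define blocks along the genome and weight them).

```python
import numpy as np

def f4_z(freqs, a, b, c, d, n_blocks=20):
    """Estimate f4(a, b; c, d) and a Z score from a delete-one block jackknife."""
    per_snp = (freqs[a] - freqs[b]) * (freqs[c] - freqs[d])
    est = per_snp.mean()
    blocks = np.array_split(per_snp, n_blocks)
    total, n = per_snp.sum(), per_snp.size
    # Leave-one-block-out estimates of the mean.
    loo = np.array([(total - blk.sum()) / (n - blk.size) for blk in blocks])
    se = np.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return float(est), float(est / se)

# Simulated toy frequencies (hypothetical, not real data, not clipped to [0, 1]).
rng = np.random.default_rng(0)
n_snps = 20000
root = rng.uniform(0.3, 0.7, n_snps)
european = root + rng.normal(0, 0.05, n_snps)   # drift shared by all non-outgroup pops
shared_drift = rng.normal(0, 0.05, n_snps)      # stand-in for Balto-Slavic drift

def noise():
    """Small population-specific drift."""
    return rng.normal(0, 0.02, n_snps)

freqs = {
    "Outgroup": root + rng.normal(0, 0.05, n_snps),
    "Bell_Beaker_NDL": european + noise(),
    "Baltic_LVA_BA": european + shared_drift + noise(),
    "Test_with_drift": european + 0.5 * shared_drift + noise(),   # hypothetical
    "Test_without_drift": european + noise(),                     # hypothetical
}

est_bs, z_bs = f4_z(freqs, "Outgroup", "Test_with_drift",
                    "Bell_Beaker_NDL", "Baltic_LVA_BA")
est_w, z_w = f4_z(freqs, "Outgroup", "Test_without_drift",
                  "Bell_Beaker_NDL", "Baltic_LVA_BA")
print(f"with drift:    f4={est_bs:.6f}  Z={z_bs:.1f}")
print(f"without drift: f4={est_w:.6f}  Z={z_w:.1f}")
```

In this toy setup only the test population that shares drift with Baltic_LVA_BA should produce a clearly positive statistic, which is the pattern the f4-stat above is meant to pick up.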

Note that the statistics with the most significant Z scores (>3) involve populations that speak Baltic or Slavic languages, or their neighbors who plausibly harbor significant Baltic and/or Slavic ancestry. Among the ancient, mostly Scandinavian, populations (from Margaryan et al. 2020 and marked with the VK2020 prefix), significant Balto-Slavic drift only appears in the more easterly and/or later groups from the Viking Age (VA).


Unfortunately, one of the problems with this analysis is that Baltic_LVA_BA and Bell_Beaker_NDL aren't identical in terms of their ancient ancestry proportions. For one, the latter has significantly more Neolithic farmer ancestry. No wonder then, that Greeks, who are mostly of early farmer stock, don't show a significant Z score, despite probably packing a significant amount of Balto-Slavic ancestry dating to the Middle Ages.

In the near future, as more ancient samples become available, it might be possible to find better reference populations for the job and create more accurate, finer-scaled tests.

See also...

Uralian genes

That old chestnut: Northeast vs Northwest Euros

233 comments:

«Oldest   ‹Older   201 – 233 of 233
George said...

Off Topic;

A Middle Pleistocene Homo from Nesher Ramla, Israel
https://science.sciencemag.org/content/372/6549/1424

From the Abstract;
"The authors present comprehensive qualitative and quantitative analyses of fossilized remains from a site in Israel dated to 140,000 to 120,000 years ago indicating the presence of a previously unrecognized group of hominins representing the last surviving populations of Middle Pleistocene Homo in Europe, southwest Asia, and Africa."

Also see:
https://www.sciencedaily.com/releases/2021/06/210624141540.htm

"The bones of an early human, unknown to science, who lived in the Levant at least until 130,000 years ago, were discovered in excavations at the Nesher Ramla site, near the city of Ramla. Recognizing similarity to other archaic Homo specimens from 400,000 years ago, found in Israel and Eurasia, the researchers reached the conclusion that the Nesher Ramla fossils represent a unique Middle Pleistocene population, now identified for the first time."

"This is a group in itself, with distinct features and characteristics. At a later stage small groups of the Nesher Ramla Homo type migrated to Europe -- where they evolved into the 'classic' Neanderthals that we are familiar with, and also to Asia, where they became archaic populations with Neanderthal-like features. As a crossroads between Africa, Europe and Asia, the Land of Israel served as a melting pot where different human populations mixed with one another, to later spread throughout the Old World. The discovery from the Nesher Ramla site writes a new and fascinating chapter in the story of humankind."

Andrzejewski said...

@Heyerdahl “ 80% genetic replacement? I have my doubts. Max Planck loves ostentatious statements.”

If a 90% replacement of the Brit Neolithic pop by the Dutch Beakers was possible, why not 80% of Welsh ancestors by AS?

Rob said...

As with any large-scale genetic shift, the context is extremely important. Although 80% leaves little doubt as to the extent, the question is why and how. SE England was the most Romanized area. It might have been abandoned when indefensible, and people shifted to the Iron Age hillforts in the West and Maritime contacts with the Mediterranean via the Bristol

Ric Hern said...

@ George

Very interesting. Thanks George. I still wonder what an archaic population on the way to becoming Sapiens, which split recently from the common ancestor of Neanderthal and Sapiens, would look like genetically...

ambron said...

David, therefore, it is difficult to understand why the eastern GAC areas are to be preferred over the central areas. For example, Gava is primarily a Slovak culture and the genome from this culture is most similar to the Slovak genome:

Target: HUN_Gava_BA:I20771
Distance: 3.1854% / 0.03185423 | R3P
63.6 Slovakian
28.2 Spanish_La_Rioja
8.2 SRB_Iron_Gates_HG

Distance to: HUN_Gava_BA:I20771
0.04727892 Slovakian
0.05270727 Czech
0.05271682 Austrian
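
For readers unfamiliar with this kind of output: a "Distance to" list like the one above is, as far as I understand it, a ranking of reference populations by Euclidean distance to the target in Global25 coordinate space. A minimal Python sketch of that ranking follows. The three-dimensional coordinates and the population names are hypothetical stand-ins (real Global25 vectors have 25 dimensions), so the numbers bear no relation to the results quoted above.

```python
import numpy as np

def rank_by_distance(target, references):
    """Return (name, Euclidean distance) pairs sorted from nearest to farthest."""
    target = np.asarray(target, dtype=float)
    dists = {
        name: float(np.linalg.norm(np.asarray(vec, dtype=float) - target))
        for name, vec in references.items()
    }
    return sorted(dists.items(), key=lambda kv: kv[1])

# Hypothetical toy coordinates, three dimensions only.
target = [0.10, 0.05, -0.02]
references = {
    "Pop_C": [0.20, 0.05, -0.02],
    "Pop_A": [0.11, 0.05, -0.02],
    "Pop_B": [0.10, 0.08, -0.02],
}

ranking = rank_by_distance(target, references)
for name, dist in ranking:
    print(f"{dist:.8f} {name}")
```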

Matt said...

Used Admixtools2 and tested out whether Romania_C_HG_o.SG (the sample showing high BS specific drift in G25, but mainly HG and ANF ancestry) gave an enriched signal in f4 statistics: https://imgur.com/a/nR4Tewj

(Blue = East Europe, Red = West Europe, X = ancient, Dot = Modern).

(The control population used against Romania_C_HG_o.SG in f4 stats was the Blatterhohle_MN population, which in G25 came out with around 3% difference in total HG, in a simple HG+ANF model).

Got about 250k SNPs so hopefully the stats are OK.

It basically looks like there's some signal present, even when we disentangle any presence of Steppe_Eneo ancestry in Romania_C a bit; there's still some linkage to East Europe today, however it's not very strong! May be nothing. No significant Z scores with an outgroup (though Basque:Lithuanian *might* just about reach significance under a direct f4(B,L;Blat,Romania_C_HG_o)).

Compare this to the kind of signals obtained from Spain_MLN and Latvia_BA, or comparing Latvia_BA-Beaker difference to Turkey-HG difference: https://imgur.com/a/zD6nWAt

Much stronger.

So I guess that favors the idea that Romania_C_o has her G25 position due to some kind of projection issue?

(Btw, still seeing GAC have some kind of correlation / reduced differentiation with Steppe_Eneo, relative to trend: https://imgur.com/a/hrKH7Vh . Something seems unexplained.)

(Another interesting thing: if I take the plots from the above comparing Latvia_BA-Beaker difference to Turkey-HG difference, which seems to very effectively isolate the "Balto-Slavic cline", then, assuming that the Slavic expansion was Belarusian-Ukrainian-like (as a composite of a Slovakian-like and an LTU_Baltic_BA-like population), and assuming a Greek-Roman base that's more Sicilian-like than Cretan-like, the ultimate ancestry from that origin in Greeks seems likely to be 14-18%, or roughly 1/6, from the relative positions. Could be higher using intermediate points on the cline, but since those populations might be composites of Greek-like ancestry it becomes hard to say).
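
The "relative positions" reasoning above can be sketched as a projection onto the line between an assumed base population and an assumed source population: the fraction of the way along that line is read as the mixture proportion. The Python sketch below is only an illustration of that idea under a simple two-way mixture assumption. The function name and the 2D coordinates are hypothetical and are not taken from the plots linked in the comment.

```python
import numpy as np

def cline_fraction(target, base, source):
    """Fraction of the way from base to source at which target projects."""
    target, base, source = (np.asarray(v, dtype=float) for v in (target, base, source))
    axis = source - base
    return float(np.dot(target - base, axis) / np.dot(axis, axis))

# Hypothetical positions in a two-statistic plot.
base = [0.0, 0.0]       # stand-in for the assumed pre-expansion base
source = [1.0, 0.5]     # stand-in for the assumed expansion source
target = [0.15, 0.10]   # stand-in for the population being assessed

frac = cline_fraction(target, base, source)
print(f"estimated source-related fraction: {frac:.2f}")
```

The estimate is only as good as the choice of end points, which is the caveat the comment itself raises about intermediate populations on the cline.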

(Can post up any of these stats if anyone is interested).

Matt said...

Another quick set of f4 comparisons using the Romania_C_o sample (GB): https://imgur.com/a/SHR4puw

It does seem like the sample is overall similar in proportions of HG, ANF to the average of the 3 Blatterhohle, though possibly very slightly shifted towards EHG on the HG cline (but by a very small amount).

And it does seem like there is relationship between the differences between GB and Blatterhohle and present day Europeans that goes beyond deep proportions. However, this does seem statistically weaker compared to differences using BA populations, though these are of course more confounded by deep proportions.

Cy Tolliver said...

Does this new version of Admixtools have the same R/Windows compatibility that the old version did?

Matt said...

Cy, the new version Lazaridis has mentioned (https://twitter.com/iosif_lazaridis/status/1407389498850177029) is the original implementation. I don't *think* the ADMIXTOOLS 2 R package has received the same update. ADMIXTOOLS 2 is the name for the R package implementation. (It seems a bit confusing to have a "2" at the same time as an updating version number in the original package but this is how they're rolling it seems ;-) )

Parastais said...

Not sure what this graph is called:
https://2.bp.blogspot.com/-ecMCYawqenE/XHe5xPgQtyI/AAAAAAAAHmA/m-V4dvxN29sRtHVLOLZigjsVHDFJb3tKQCLcBGAs/s1600/Piedmont_Eneolithic_qpGraph2.png
But could something similar be modelled for Latvians - Poles - Serbs (or any other 3 pops on BS cline, extreme ends and middle) and bunch of currently known ancient BA, IA and MA samples?

Norfern-Ostrobothnian said...

@Arza
I did the Wallacea sample (Eigenstrat and Plink genotypes)
I wonder if there is any mitochondrial DNA available from the sequences?

https://www.mediafire.com/file/umox84zglz68er6/GUP001.zip/file

@Davidski
I found this interesting method of ancestry inference, an alternative to PCA. Seems to be less susceptible to being swayed by overly large populations or noise. "Spectral-GEM" as it is called.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4610359/

Rob said...

@ Matt

So what are your conclusions Matt ?

Matt said...

@Rob, I think my conclusion overall is that ROU_C_HG_o seems to only capture a part of the Balto-Slavic shift; there are probably a lot of different possibilities compatible with this*.

The sample might still be useful in qpAdm for trying to detect Balto-Slavic drift without using LVA_BA as a reference in the outgroups tho.

* A sample of them: 1) GB's HG ancestry is on a cline to the true source but the drift had already occurred in her time, 2) the drift had started in her HG population but only completed in a HG population later than her time, 3) Balto-Slavic drift comes partly from drift in a HG population and partly from further drift in a BA population with HG admixture, and a single HG population with all the drift is just a statistical abstraction.

Norfern-Ostrobothnian said...

@Davidski
Could we get coordinates for the southeast Chinese samples: http://bigd.big.ac.cn/gsa-human/browse/HRA000451
Getting ones for GUP001 would be great too.

Davidski said...

@Norfern

Can you compile them into a 1240K Plink file?

Rob said...

Thanks for the explanation, Matt. I think Rou-c_o is a very interesting sample

Norfern-Ostrobothnian said...

The GUP001 genotypes had a mistake in them until now. Here are the Chinese genotypes:
https://www.mediafire.com/file/22cbu1txph5tt83/guangxi.zip/file

Davidski said...

@Norfern

Is this the correct link for GUP001?

https://www.mediafire.com/file/umox84zglz68er6/GUP001.zip/file

Norfern-Ostrobothnian said...

@Davidski
The replaced files have the same link.
Pretty convenient.

ambron said...

Matt, I wonder if the Balto-Slavic drift is not just a projection of the Neolithic WHG/EEF cline, transferred to the PCA center by the admixture of Yamna/early CWC?

Davidski said...

Seriously WTF?

ambron said...

David, I am seriously wondering because this is what the PCA looks like.

Davidski said...

You mean the PCA here? Are you sure?

https://eurogenes.blogspot.com/2018/05/global25-workshop-2-intra-european.html

ambron said...

I mean WE PCA with ancient and modern samples. Note that the cline created by WHG's participation in the EEF can in principle be moved up, towards Yamna/early CWC.

Davidski said...

The WE PCA focuses on deep ancestry in its first two dimensions, so it doesn't show the effects of Balto-Slavic drift but basically just the ratios of EEF, WHG, steppe etc.

Rob said...

The funny thing is ambron is right, and all the know-it-alls at AG are wrong: Slavs don’t come from the Kiev culture, just like Scythians don’t come from Asia :)

ambron said...

Rob, I like you already, friend.

Davidski said...

The kiss of death.

Arza said...

@ EastPole
I would also like to fall off my chair

There was nothing that could cause such a fall. And nothing can change this picture:
https://i.postimg.cc/yBpxQjgb/isbspspb.jpg

Arza said...

@ Matt

Admixtools are by design blind to unreferenced drift. That's one of the reasons why ROU_C_o as a reference gives a much weaker signal than Latvia_BA (~7.5% vs. ~35-45% of admixture from a hypothetical source).

So the results you got were expected.

Davidski said...

@Luuk

I'm not sure how else to explain this to you.

All I can say is that the Steppe Maykop outlier with Y-HG T has nothing to do with Khvalynsk judging by his ancestry.

So he's not Khvalynsk related at all. In any way.

Matt said...

@arza, sorry, what do we mean by "unreferenced drift" here?

Arza said...

@ Matt

A specific set of mutations that is present in just one of the populations used to calculate a particular f3/f4 statistic.

To have an effect on the f3/f4 statistic this set needs to be shared with another population from the triplet/quadruple.
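
A small simulation can illustrate this point. In the Python sketch below, drift that is private to the test population leaves f4(Outgroup,Test;Ref1,Ref2) close to zero, while the same drift shifts the statistic once Ref2 also carries it. All frequencies are simulated and every population is a hypothetical stand-in, so this is an illustration of the principle under simple assumptions, not a reanalysis of any samples discussed here.

```python
import numpy as np

def f4(a, b, c, d):
    """f4(a, b; c, d) as the mean over SNPs of (a - b) * (c - d)."""
    return float(np.mean((a - b) * (c - d)))

rng = np.random.default_rng(1)
n_snps = 20000
base = rng.uniform(0.3, 0.7, n_snps)      # shared ancestral frequencies
drift = rng.normal(0, 0.05, n_snps)       # the drift of interest

def pop(extra=0.0):
    """Toy population: base frequencies, optional extra drift, private noise."""
    return base + extra + rng.normal(0, 0.02, n_snps)

outgroup, ref1 = pop(), pop()
test = pop(drift)             # test population carries the drift
ref2_without = pop()          # reference lacking the drift ("unreferenced")
ref2_with = pop(drift)        # reference sharing the drift with the test

f4_private = f4(outgroup, test, ref1, ref2_without)
f4_shared = f4(outgroup, test, ref1, ref2_with)
print(f"drift only in test:        f4={f4_private:.6f}")
print(f"drift in test and in ref2: f4={f4_shared:.6f}")
```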
