
Showing posts with label Slavs. Show all posts

Monday, March 25, 2024

High-resolution stuff


I just emailed this to the authors of High-resolution genomic ancestry reveals mobility in early medieval Europe, a new preprint at bioRxiv [LINK].

I appreciate that Polish population history is not the main focus of your preprint, and also that you're constrained by the lack of relevant and suitably high quality ancient genomes from East-Central and Eastern Europe. However, I must say that your analysis of the Medieval Polish population and resulting conclusions about Polish population history don't reflect reality.

Your Poland_Middle_Ages genomic cluster is made up of just six samples that don't fully represent the genetic complexity of the core population of Medieval Poland.

As a result, you classified PCA0148 as one of the Poland_Middle_Ages outliers, even though this sample isn't an outlier when analyzed within the context of the full set of published Polish Medieval genomes.

Moreover, PCA0148 is very similar to several Polish Viking Age samples that show Scandinavian-specific genome-wide and Y-chromosome haplotypes, and probably likewise shows some Scandinavian-related ancestry.

This is important to note when attempting to recapitulate Polish population history, because it suggests that Scandinavian-related ancestry played a formative role in the shaping of the core Polish Medieval genetic cluster.

Thus, you might be correct when you claim that the six samples in your Poland_Middle_Ages cluster don't show any "detectable" Scandinavian-related ancestry, but this doesn't necessarily mean that this type of ancestry isn't a key part of the post-Iron Age Polish population history.

Below is a self-explanatory Principal Component Analysis (PCA) plot that illustrates my points. Interestingly, Figure 3c in your preprint shows very similar outcomes with regard to the post-Iron Age Polish population history. But the style and scale of your figure make it difficult to spot the subtle but likely genuine Northwest European-related genetic shifts shown by PCA0148, the Viking context samples and present-day Poles relative to the Poland_Middle_Ages cluster.

However, I'm also skeptical that your Poland_Middle_Ages cluster doesn't carry any detectable or even significant Scandinavian-related ancestry. That's because I suspect that there might be some technical issues with your analysis that are masking this type of ancestry in the Polish samples.

Your top mixture model for the Poland_Middle_Ages cluster is, in all likelihood, an extreme statistical abstraction of reality, rather than a close reflection of it. That's because, due to a combination of historical, geographical and genetic factors, neither Italy.Imperial(I).SG nor Lithuania.IronRoman.SG are realistic formative source populations for the Medieval Polish gene pool.

One of the reasons why you ended up with such a surprising result is probably the lack of suitable samples from East-Central and Eastern Europe, especially those associated with plausibly the earliest Slavic-speaking populations.

It's also possible that basing your mixture model on formal statistics played a key part.

Formal statistics-based mixture models are known to be biased towards outcomes involving mixture sources from the extremes of mixture clines. If your analysis is affected by this problem, then this would help to explain why you characterized the Poland_Middle_Ages cluster as simply a two-way mixture between a Middle Eastern-related group from Imperial Rome and a Baltic population with a very high cut of European hunter-gatherer ancestry.

I do note that on page 6 of your manuscript you consider the possibility that the Southern European-related signal in the Poland_Middle_Ages cluster might only be very distantly related to Italy.Imperial(I).SG, and that it may even have spread across Poland with early Slavic speakers. This is a great point, and I think it should be emphasized and expanded upon, because I suspect that the problem runs deeper than this.

For instance, if the early Slavic ancestors of Poles carried substantially more Southern European-related ancestry than Lithuania.IronRoman.SG, and this ancestry was, say, more Balkan-related than Italian-related, then this might radically change your modeling of the Poland_Middle_Ages cluster. That's because these early Slavs would be positioned in a very different genetic space than Lithuania.IronRoman.SG, which could potentially require a significant signal of Scandinavian-related ancestry to get a robust mixture model.

Finally, it might be useful to consider Isolation-by-Distance as a partial vector for the Italy.Imperial(I).SG-related signal in Medieval Poland.

The full set of published Polish Medieval genomes includes a number of outliers with obvious ancestry from Western Europe and the Balkans. These people probably don't represent any large-scale migrations into Poland, but rather the movements of individuals and small groups. Over time, such small-scale mobility may have had a fairly significant impact on the genetic character of the Polish population.

Update 26/03/2024: I sent another email to Speidel et al., this time with regard to their analysis of present-day Hungarians.

Your preprint also claims that present-day Hungarians are genetically similar to Scythians, and that this is consistent with the arrival of Magyars, Avars and other eastern groups in this part of Europe.

However, present-day Hungarians are overwhelmingly derived from Slavic and German peasants from near Hungary. This is not a controversial claim on my part; it's backed up by historical sources and a wide range of genetic analyses.

Hungarians still show some minor ancestry from Hungarian Conquerors (early Magyars), but this signal only reliably shows up in large surveys of Y-chromosome samples.

The Scythians that you used to model the ancestry of present-day Hungarians are of local, Pannonian origin, and they don't show any eastern nomad ancestry. So they're either acculturated Scythians, or, more likely, wrongly classified as Scythians by archeologists.

And since these so-called Scythians lack eastern nomad ancestry, the similarity between them and present-day Hungarians is not a sign of the impact from Avars, Hungarian Conquerors and the like, but rather a lack of significant input from such groups in present-day Hungarians.

Citation...

Speidel et al., High-resolution genomic ancestry reveals mobility in early medieval Europe, bioRxiv, Posted March 19, 2024, doi: https://doi.org/10.1101/2024.03.15.585102

See also...

Wielbark Goths were overwhelmingly of Scandinavian origin

Saturday, January 13, 2024

Romans and Slavs in the Balkans (Olalde et al. 2023)


It's always amusing to see some random Jovan or Dimitar arguing online that Slavic speakers have been in the Balkans since at least the Neolithic.

Obviously, Slavic peoples only turned up in the Balkans during the early Middle Ages. It's just that their linguistic and genetic impact on the region was so profound that it may seem like they've been there forever.

A new paper at Cell by Olalde et al. makes this point well. See here.

That's not to say, however, that it's an ideal effort. The paper's qpAdm mixture models probably could've been more precise and realistic. Genes of the Ancients has a useful discussion on the topic here.

Interestingly, Olalde et al. admit that they can't detect much, if any, admixture from the Italian Peninsula in the Balkans, even in samples dating to the Roman period. And yet, this doesn't stop them from accepting that the Roman Empire had a massive cultural and demographic impact on the Balkans.

I also assume that, by extension, they don't deny that Latin was introduced into the Balkans from the Italian Peninsula.

That is, Latin spread into the Balkans without any noticeable genetic tracer dye, and it eventually gave rise to modern Romanian spoken by millions of people today in the eastern Balkans. This might be a useful data point to keep in mind when discussing the spread of Indo-European languages into Anatolia.

See also...

Dear Iosif, about that ~2%

Friday, November 10, 2023

Wielbark Goths were overwhelmingly of Scandinavian origin


When used properly, Principal Component Analysis (PCA) is an extraordinarily powerful tool and one of the best ways to study fine-scale genetic substructures within Europe.

The PCA plot below is based on Global25 data and focuses on the genetic relationship between Wielbark Goths and Medieval Poles, including from the Viking Age, in the context of present-day European genetic variation.


I'd say that it's a wonderfully self-explanatory plot, but here are some key observations:

- the Wielbark Goths (Poland_Wielbark_IA) and Medieval Poles (Poland_Middle_Ages) are two distinct populations

- moreover, the Wielbark Goths form a relatively compact Scandinavian-related cluster and must surely represent a homogeneous population overwhelmingly of Scandinavian origin

- on the other hand, the Medieval Poles form a more extensive and heterogeneous cluster that overlaps with present-day groups all the way from Central Europe to the East Baltic, and that's because they are likely to be in large part of mixed origin

- I know for a fact that at least some of these early Poles harbor recent admixture, because their burials are similar to those of Vikings and their haplotypes have been shown to be partly of Scandinavian origin (see here)

- one of the Wielbark females is an obvious genetic outlier (Poland_Wielbark_IA_outlier), and basically looks like a first generation mixture between a Goth and a Balt.

Please note that the PCA is only based on relatively high quality genomes, so as not to confuse the picture with spurious results and noise. Also, all outliers with potentially significant ancestry from outside of Central, Eastern and Northern Europe were removed from the analysis. The relevant datasheet is available here.

However, sanity checks are always important when studying complex topics like fine-scale genetic ancestry. To that end I've prepared a graph that plots f3-statistics of the form f3(X,Estonia_BA,Cameroon_SMA) against those of the form f3(X,Ireland_Megalithic,Cameroon_SMA), and it reproduces the key features of my PCA. The relevant datasheet is available here.

Polish groups from the Middle Ages are marked with the MA suffix, while the Iron Age Wielbark Goths are marked with the IA suffix.

If you're wondering why I plotted the f3-statistics that I did, take a look at this (all groups largely of Scandinavian origin are emboldened):

f3(X,Estonia_BA,Cameroon_SMA)
Poland_Legowo_MA 0.226406
Poland_Ostrow_Lednicki_MA 0.225996
Poland_Plonsk_MA 0.225017
Poland_Trzciniec_Culture 0.224215
Poland_Lad_MA 0.224142
Poland_Viking 0.223838
Poland_Niemcza_MA 0.223659
Poland_Weklice_IA 0.223549
Poland_Kowalewko_IA 0.222584
Poland_Pruszcz_Gdanski_IA 0.222324
Sweden_Viking 0.222091
Russia_Viking 0.222042
Poland_Maslomecz_IA 0.221914
Norway_Viking 0.221825
Denmark_EarlyViking 0.221257
Denmark_Viking 0.221174
England_Viking 0.220979

f3(X,Ireland_Megalithic,Cameroon_SMA)
Poland_Maslomecz_IA 0.219816
Poland_Weklice_IA 0.219501
Denmark_Viking 0.2192
Poland_Kowalewko_IA 0.219176
Poland_Ostrow_Lednicki_MA 0.218916
Norway_Viking 0.218854
Poland_Pruszcz_Gdanski_IA 0.218684
Sweden_Viking 0.218626
Denmark_EarlyViking 0.218529
England_Viking 0.218308
Russia_Viking 0.217999
Poland_Viking 0.217914
Poland_Plonsk_MA 0.217756
Poland_Lad_MA 0.217719
Poland_Legowo_MA 0.21765
Poland_Niemcza_MA 0.217001
Poland_Trzciniec_Culture 0.216551
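To make the contrast between the two lists easier to see, here's a minimal Python sketch that pairs the two f3 values for each group and ranks the groups by the difference between them. The values are copied from the lists above. The function and variable names are just my illustrative choices, and the difference is only a convenient summary of the two lists, not a formal test.

```python
# f3(X,Estonia_BA,Cameroon_SMA), copied from the first list above
f3_estonia = {
    "Poland_Legowo_MA": 0.226406,
    "Poland_Ostrow_Lednicki_MA": 0.225996,
    "Poland_Plonsk_MA": 0.225017,
    "Poland_Trzciniec_Culture": 0.224215,
    "Poland_Lad_MA": 0.224142,
    "Poland_Viking": 0.223838,
    "Poland_Niemcza_MA": 0.223659,
    "Poland_Weklice_IA": 0.223549,
    "Poland_Kowalewko_IA": 0.222584,
    "Poland_Pruszcz_Gdanski_IA": 0.222324,
    "Sweden_Viking": 0.222091,
    "Russia_Viking": 0.222042,
    "Poland_Maslomecz_IA": 0.221914,
    "Norway_Viking": 0.221825,
    "Denmark_EarlyViking": 0.221257,
    "Denmark_Viking": 0.221174,
    "England_Viking": 0.220979,
}

# f3(X,Ireland_Megalithic,Cameroon_SMA), copied from the second list above
f3_ireland = {
    "Poland_Maslomecz_IA": 0.219816,
    "Poland_Weklice_IA": 0.219501,
    "Denmark_Viking": 0.2192,
    "Poland_Kowalewko_IA": 0.219176,
    "Poland_Ostrow_Lednicki_MA": 0.218916,
    "Norway_Viking": 0.218854,
    "Poland_Pruszcz_Gdanski_IA": 0.218684,
    "Sweden_Viking": 0.218626,
    "Denmark_EarlyViking": 0.218529,
    "England_Viking": 0.218308,
    "Russia_Viking": 0.217999,
    "Poland_Viking": 0.217914,
    "Poland_Plonsk_MA": 0.217756,
    "Poland_Lad_MA": 0.217719,
    "Poland_Legowo_MA": 0.21765,
    "Poland_Niemcza_MA": 0.217001,
    "Poland_Trzciniec_Culture": 0.216551,
}


def rank_by_difference(a, b):
    """Return (group, a - b) pairs, largest difference first."""
    shared = a.keys() & b.keys()  # only groups present in both lists
    diffs = [(group, round(a[group] - b[group], 6)) for group in shared]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_difference(f3_estonia, f3_ireland)
for group, diff in ranking:
    print(f"{group:28s} {diff:.6f}")
```

Ranked this way, every group with the MA suffix ends up above every group with the IA suffix, with Poland_Legowo_MA at the top and Denmark_Viking at the bottom.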

Interestingly, the Middle Bronze Age samples associated with the Trzciniec Culture (Poland_Trzciniec_Culture) show a closer genetic relationship to Medieval Poles than to Wielbark Goths or Northwestern Europeans. This is indeed the case both in terms of genome-wide and uniparental markers, including some very specific lineages under Y-chromosome haplogroup R1a.

But that's a much more complex issue that I'll leave for another time. So please stay tuned.

See also...

Slavs have little, if any, Scytho-Sarmatian ancestry

Saturday, November 4, 2023

Slavs have little, if any, Scytho-Sarmatian ancestry


Here's an abstract of a new study from the David Reich Lab about ancient Slavs, titled "Genetic identification of Slavs in Migration Period Europe using an IBD sharing graph". Emphasis is mine:

Popular methods of genetic analysis relying on allele frequencies such as PCA, ADMIXTURE and qpAdm are not suitable for distinguishing many populations that were important historical actors in the Migration Period Europe. For instance, differentiating Slavic, Germanic, and Celtic people is very difficult relying on these methods, but very helpful for archaeologists given a large proportion of graves with no inventory and frequent adoption of a different culture. To overcome these problems, we applied a method based on autosomal haplotypes. Imputation of missing genotypes and phasing was performed according to a protocol by Rubinacci et al. (2021), and IBD inference was done for ancient Eurasian individuals with data available at >600,000 1240K sites. IBD links for a subset of these individuals were represented as a graph, visualized with a force-directed layout algorithm, and clusters in this graph are inferred with the Leiden algorithm. One of the clusters in the IBD graph emerged that includes nearly all individuals in the dataset annotated archaeologically as “Slavic”. According to PCA a hypothesis for the origin of this population can be proposed: it was formed by admixture of a Baltic-related group with East Germanic people and Sarmatians or Scythians. The individuals belonging to the “Slavic” IBD sharing cluster form a chronological gradient on the PCA plot, with the earliest samples close to the Baltic LBA/EIA group. Later “Slavic” individuals are shifted to the right, closer to Central and Southern Europeans and probably reflecting further admixture of Slavs with local populations during the Migration Period.

Apparently this abstract is causing a bit of confusion online because of the mention of possible Sarmatian or Scythian ancestry in Slavs.

However, it's important to understand that the authors are referring to certain Slavic or even just Slavic-related individuals, usually from culturally heterogeneous frontier settlements deep in what is now Russia.

So yes, it's possible that some of these individuals carry Sarmatian, Scythian or other exotic eastern ancestry. But even if this is true, then obviously we can't extend this inference to all ancient and modern-day Slavs.

Indeed, below is a G25/Vahaduo Principal Component Analysis (PCA) that shows why modern-day Slavic speakers can't be linked genetically to Sarmatians or Scythians. To experience a more detailed version of the PCA paste the data here into the relevant field here.

As you can see, dear reader, most of the Slavs (Belarusians, Poles, Ukrainians and many Russians) cluster with the Irish near the western end of the plot.

Some Russians are shifted significantly east of them along the "Uralic cline" and, as a result, they cluster with various Uralic speakers such as Mordovians. That's because when Slavs migrated deep into what is now northern Russia they mixed with Uralic speakers who were there before them.

Most of the Sarmatians and Scythians form a cluster southeast of the Slavs and Irish because they carry significant levels of East Asian ancestry. This type of eastern ancestry is basically missing in modern-day Slavs (see here).

Several of the Scythians cluster among the Slavs and Irish, but that's because they're genetic outliers, whose existence, if anything, suggests that some Scythians had significant Slavic-related and/or Irish-related ancestry.

Now, even though most of the Slavs do cluster with the Irish in the above PCA plot, I strongly disagree with the authors of the abstract when they claim that "differentiating Slavic, Germanic, and Celtic people is very difficult" with PCA. It's actually pretty damn easy and I've been doing it successfully for many years. For instance, see here.

See also...

Wielbark Goths were overwhelmingly of Scandinavian origin

The Caucasus is a semipermeable barrier to gene flow

Thursday, October 6, 2022

Balto-Slavs and Sarmatians in the Battle of Himera


G25 coordinates for most of the samples from the recent Reitsema et al. paper are available in a text file here. They're also in the G25 datasheets at the usual link here.

A basic distance analysis with the G25 data at Vahaduo shows that the two samples labeled Himera_480BCE_3 are either early Balts or Slavs. I suspect that they're Slavs, because I believe that early Slavs had this type of Baltic-like genetic structure before mixing with their non-Slavic-speaking neighbors. Well, that's my pet theory for now, so take it or leave it.

Distance to: ITA_Sicily_Himera_480BCE_3:I10943
0.03393838 HUN_IA_La_Tene_o:I18226
0.03572886 DEU_MA_Krakauer_Berg:KRA001
0.03618075 RUS_Pskov_VA:VK159
0.03899963 SWE_Gotland_VA:VK463
0.03915018 Baltic_EST_IA:s19_V12_1

Distance to: ITA_Sicily_Himera_480BCE_3:I10949
0.03573636 HUN_IA_La_Tene_o3:I25524
0.03698768 HUN_IA_La_Tene_o:I18226
0.03732752 SWE_Skara_VA:VK397
0.03767022 Baltic_EST_IA:s19_V12_1
0.03772687 DEU_MA_Krakauer_Berg:KRA001

On the other hand, I'm almost certain that the two Himera_480BCE_4 samples are Sarmatians. The good old G25 does it again!

Distance to: ITA_Sicily_Himera_480BCE_4:I10944
0.03100861 KAZ_Segizsay_Sarmatian:SGZ002
0.03548059 MDA_Sarmatian:I11925
0.03619219 RUS_Urals_Sarmatian:MJ56
0.03626538 RUS_Urals_Sarmatian:chy001
0.03904260 RUS_Urals_Sarmatian:MJ41

Distance to: ITA_Sicily_Himera_480BCE_4:I10947
0.02989458 RUS_Urals_Sarmatian:MJ43
0.03052790 RUS_Urals_Sarmatian:chy002
0.03170622 KAZ_Kangju:DA226
0.03288789 TUR_BlackSea_Samsun_Anc_C:I4529
0.03310149 KAZ_Aigyrly_Sarmatian:AIG003
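
For anyone curious about what the distance lists above represent, here's a minimal sketch in Python, assuming the distance is the plain Euclidean distance between two sets of G25 coordinates. The three-dimensional coordinates and sample names below are made up purely for illustration (real G25 rows have 25 columns), and `closest` is a hypothetical helper of mine, not part of any Vahaduo tool.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two coordinate vectors."""
    if len(a) != len(b):
        raise ValueError("coordinate vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest(target, references, n=5):
    """Return the n reference samples nearest to target as (distance, name)."""
    scored = [(euclidean(target, coords), name) for name, coords in references.items()]
    return sorted(scored)[:n]


# Made-up coordinates, for illustration only
target = [0.130, 0.120, 0.060]
references = {
    "Ref_A": [0.131, 0.125, 0.058],
    "Ref_B": [0.100, 0.090, 0.010],
    "Ref_C": [0.128, 0.118, 0.065],
}

for dist, name in closest(target, references, n=2):
    print(f"{dist:.8f} {name}")
```

A short distance only says that two samples sit close together in this coordinate space. Interpreting what that means in terms of ancestry is a separate step.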

See also...

Slavic-like Medieval Germans

Tuesday, June 21, 2022

My take on the Erfurt Jews


I had a quick look at the genotype data from the recent Waldman et al. preprint focusing on the ancestry of early Jews from Erfurt, Germany. My impression is that the genetic origins of these Jews are somewhat more complex than claimed in the manuscript.

Indeed, I'd say the Waldman et al. characterization of the Erfurt Jews as a three-way mixture between populations similar to present-day Lebanese, South Italians and Russians doesn't exactly reflect reality.

Unlike Waldman et al., I designed an ADMIXTURE analysis that separated East Asian ancestry into East Asian and Siberian clusters, and also included Mediterranean and North African clusters. The output is available in a spreadsheet HERE. Below is a bar graph based on some of the output.

Now, keeping in mind that ADMIXTURE is not a formal mixture test, and that it estimates ancestry proportions from inferred populations, as opposed to ancient groups that actually existed, here are some key observations:

- in terms of fine-scale ancestry, the Erfurt Jews show enough variation to be divided into three or four clusters, as opposed to just two as per Waldman et al.

- some of the Erfurt Jews show excess "Mediterranean" ancestry, while others excess "North African" ancestry, and this cannot be explained with ancestral populations similar to Lebanese and/or South Italians, but rather with significant gene flow from the western Mediterranean and possibly North Africa

- several of the Erfurt Jews show relatively high levels of "East Asian" ancestry that cannot be explained by admixture from Russians, or even any Russian-like populations, because such populations almost lack this type of ancestry, and instead show significant "Siberian" admixture

- as far as I can see, there are no correlations between any of the observations above and the quality of the samples. That is, low coverage doesn't appear to be causing the aforementioned excess "Mediterranean", "North African" and/or "East Asian" ancestry proportions.

Investigating this in more detail with, say, formal statistics will take some time. But I was able to reproduce the results from the above ADMIXTURE run using several somewhat different datasets, so that's something.

It seems to me that Waldman et al. want a simple and elegant model to explain the data, which is understandable, but I do think they should at least expand their ADMIXTURE analysis to include "Siberian", "Mediterranean" and "North African" clusters, and go from there depending on what they find.

Citation...

Waldman et al., Genome-wide data from medieval German Jews show that the Ashkenazi founder event pre-dated the 14th century, bioRxiv, posted May 16, 2022, doi: https://doi.org/10.1101/2022.05.13.491805

See also...

Mediterranean PCA update

Saturday, September 4, 2021

The genomic formation of modern Balkan peoples (Olalde et al. 2021 preprint)


Over at bioRxiv at this LINK. This preprint deals with some very complex issues, so I can't say much about it until I have a good look at the relevant genotype data. However, for now, my impression is that the authors have oversimplified the genetic origins of most Balkan peoples.

For instance, they model the present-day Greek population as a two-way mixture between ancient Greeks from a Greek colony in Iberia and present-day Mordovians. The Mordovians are basically a proxy for the Slavs who moved into the Balkans during the Medieval period.

However, the problem is that, strictly speaking, this isn't a historically plausible model, because Mordovians are actually a Uralic-speaking group from the Volga region with significant Siberian ancestry. Needless to say, it's extremely unlikely that anyone like them had an appreciable impact on the present-day Greek gene pool.

So instead I'd like to see the authors try three-way and four-way models with ancients from Mycenae, Anatolia and some places (well to the west of the Volga River) likely to have been inhabited by early Slavs.

Feel free to let me know what you think about this preprint in the comments below. Here's the abstract:

The Roman Empire expanded through the Mediterranean shores and brought human mobility and cosmopolitanism across this inland sea to an unprecedented scale. However, if this was also common at the Empire frontiers remains undetermined. The Balkans and Danube River were of strategic importance for the Romans acting as an East-West connection and as a defense line against “barbarian” tribes. We generated genome-wide data from 70 ancient individuals from present-day Serbia dated to the first millennium CE; including Viminacium, capital of Moesia Superior province. Our analyses reveal large scale-movements from Anatolia during Imperial rule, similar to the pattern observed in Rome, and cases of individual mobility from as far as East Africa. Between ∼250-500 CE, we detect gene-flow from Central/Northern Europe harboring admixtures of Iron Age steppe groups. Tenth-century CE individuals harbored North-Eastern European-related ancestry likely associated to Slavic-speakers, which contributed >20% of the ancestry of today’s Balkan people.

Olalde et al., Cosmopolitanism at the Roman Danubian Frontier, Slavic Migrations, and the Genomic Formation of Modern Balkan Peoples, bioRxiv, posted August 31, 2021, doi: https://doi.org/10.1101/2021.08.30.458211

See also...

A Greek tragedy

Friday, August 27, 2021

R1a vs R1b in third millennium BCE Central Europe (Papac et al. 2021)


R1a-M417 and R1b-L51 are by far the most important Y-chromosome haplogroups in Europe today. More precisely, R1a-M417 dominates in Eastern Europe, while R1b-L51 in Western Europe.

It's been obvious for a while now, at least to me, that both of these Y-haplogroups are closely associated with the men of the Late Neolithic Corded Ware culture (CWC). Indeed, in my mind they're the main genetic signals of its massive expansion, probably from a homeland somewhere north of the Black Sea in what is now Ukraine.

I'm still not exactly sure how the east/west dichotomy between R1a and R1b emerged in Europe, but, thanks to a new paper by Papac et al. at Science Advances, at least now I have a working hypothesis about that. Below is a quote from the said paper, emphasis is mine:

In addition to autosomal genetic changes through time, we observe a sharp reduction in Y-chromosomal diversity going from five different lineages in early CW to a dominant (single) lineage in late CW (Fig. 4A). We used forward simulations to explore the demographic scenarios that could account for the observed reduction in Y-chromosomal diversity. Performing 1 million simulations of a population with a starting frequency of R1a-M417(xZ645) centered around the observed starting frequency in Bohemia_CW_Early (3 of 11, 0.27), we assessed the plausibility of this lineage reaching the observed frequency in Bohemia_CW_Late (10 of 11, 0.91) in the time frame of 500 years under a model of a closed population and random mating (Materials and Methods). We reject the “neutral” hypothesis, i.e., that this change in frequency occurred by chance, given a wide range of plausible population sizes. Instead, our results suggest that R1a-M417(xZ645) was subject to a nonrandom increase in frequency, resulting in these males having 15.79% (4.12 to 44.42%) more surviving offspring per generation relative to males of other Y-haplogroups. We also find that this change in Y chromosome frequency is extreme compared to the changes in allele frequencies at fully covered autosomal 1240k sites within the same males, suggesting a process that disproportionately affected Y-chromosomal compared to autosomal genetic diversity, ruling out a population bottleneck as the likely cause. Our results suggest that the Y-lineage diversity in early CW males was supplanted by a nonrandom process [selection, social structure, or influx of nonlocal R1a-M417(xZ645) lineages] that drove the collapse in Y-chromosomal diversity. A simultaneous decline of Y-chromosomal diversity dating to the Neolithic has been observed across most extant Y-haplogroups (64), possibly due to increased conflict between male-mediated patrilines (65). 
We view that changes in social structure (e.g., an isolated mating network with strictly exclusive social norms) could be an alternative cause but would be difficult to distinguish in the underlying model parameters.
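To get a feel for what the "neutral" hypothesis in the quote means, here's a minimal sketch of a neutral Wright-Fisher forward simulation in Python. This is not the simulation framework used by Papac et al. The male population size, the 25-year generation time (so that 500 years is 20 generations) and the number of runs are all my own illustrative assumptions. Only the starting and final frequencies (0.27 and 0.91) come from the quote.

```python
import random


def fraction_reaching(start_freq, target_freq, n_males, generations, n_sims, seed=1):
    """Fraction of neutral runs whose final lineage frequency is >= target_freq."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        carriers = round(start_freq * n_males)
        for _ in range(generations):
            freq = carriers / n_males
            # Closed population, random mating: each male in the next
            # generation draws his father at random from the current one.
            carriers = sum(1 for _ in range(n_males) if rng.random() < freq)
            if carriers == 0 or carriers == n_males:
                break  # lineage lost or fixed, nothing more can change
        if carriers / n_males >= target_freq:
            hits += 1
    return hits / n_sims


# Assumed values: 100 males, 20 generations (500 years / 25 years), 500 runs
p_neutral = fraction_reaching(0.27, 0.91, n_males=100, generations=20, n_sims=500)
print(f"fraction of neutral runs reaching 0.91: {p_neutral}")
```

Drift is stronger in small populations, so it's worth trying a range of values for `n_males` before reading anything into the output. The paper makes its case across a wide range of plausible population sizes, which a toy run like this doesn't replace.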

Right, so even though the CWC was clearly a community of closely related groups, there must have been some competition between its different clans. And since these clans were highly patriarchal and patrilineal, this competition probably led to different paternal lineages dominating different parts of the CWC horizon, with M417 becoming especially common in the east and L51 in the west.

Of course, the expansions of post-Corded Ware groups, such as the M417-rich Slavs in Eastern Europe and L51-rich Celts in Western Europe, were also instrumental in creating Europe's R1a/R1b dichotomy, but obviously these groups were in large part the heirs of the CWC.

By the way, most of the samples from Papac et al. are already in the Global25 datasheets linked here. Look for the labels listed here. Below is a plot made from the Global25 data courtesy of regular commentator Matt.

Citation: L. Papac, M. Ernée, M. Dobeš, M. Langová, A. B. Rohrlach, F. Aron, G. U. Neumann, M. A. Spyrou, N. Rohland, P. Velemínský, M. Kuna, H. Brzobohatá, B. Culleton, D. Daněček, A. Danielisová, M. Dobisíková, J. Hložek, D. J. Kennett, J. Klementová, M. Kostka, P. Krištuf, M. Kuchařík, J. K. Hlavová, P. Limburský, D. Malyková, L. Mattiello, M. Pecinovská, K. Petriščáková, E. Průchová, P. Stránská, L. Smejtek, J. Špaček, R. Šumberová, O. Švejcar, M. Trefný, M. Vávra, J. Kolář, V. Heyd, J. Krause, R. Pinhasi, D. Reich, S. Schiffels, W. Haak, Dynamic changes in genomic and social structures in third millennium BCE central Europe. Sci. Adv. 7, eabi6941 (2021).

See also...

On the origin of the Corded Ware people

Understanding the Eneolithic steppe

Conan the Barbarian probably belonged to Y-haplogroup R1a

Thursday, June 17, 2021

Balto-Slavic drift


A few years ago I began using the term "Balto-Slavic genetic drift" to describe the fine-scale genetic signal that is shared by the speakers of Baltic and Slavic languages to the exclusion of Europeans without significant Balto-Slavic ancestry.

As a result, nowadays, many people online use the term "Balto-Slavic drift" when referring to this phenomenon.

The easiest way to prove that Balto-Slavic drift exists is to run a fine-scale Principal Component Analysis (PCA) of European genetic variation with a lot of Balto-Slavic samples in the mix. Indeed, my Global25 PCA analysis does a great job of illustrating the impact of Balto-Slavic drift on the population structure of Europe both in PCA plots and mixture models (for instance, see here).

It's also possible to tease out Balto-Slavic drift with formal statistics. I showed this indirectly in a recent blog post about Greek population structure (see here). In this post I'm going to demonstrate how to explicitly and formally test for Balto-Slavic drift both in ancient and present-day samples.

To do this we need to find stats that basically split Baltic and Slavic speakers from other Europeans, such as f4(Outgroup,Test;Bell_Beaker_NDL,Baltic_LVA_BA). In this f4-stat, Baltic_LVA_BA is the ancient reference population with an unusually high level of Balto-Slavic drift, while Bell_Beaker_NDL is a fairly similar population overall in terms of ancient ancestry components, but with practically zero Balto-Slavic drift.
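For readers who haven't worked with f4-stats before, here's a minimal sketch of how such a statistic and its Z score can be computed from per-SNP allele frequencies, using a simple delete-one block jackknife over equal-sized blocks. The frequencies below are simulated toy data, and the function and variable names are hypothetical. Real tools, such as qpDstat in ADMIXTOOLS, also handle missing data, sample-size corrections and properly defined genomic blocks, all of which this sketch ignores.

```python
import math
import random


def f4_with_z(pa, pb, pc, pd, n_blocks=20):
    """Return (f4, Z) for f4(A,B;C,D) = mean over SNPs of (a - b) * (c - d)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    size = len(terms) // n_blocks
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least two blocks and one SNP per block")
    terms = terms[: size * n_blocks]  # drop the remainder so blocks are equal
    total = sum(terms)
    estimate = total / len(terms)
    # Leave-one-block-out estimates
    loo = [
        (total - sum(terms[j * size:(j + 1) * size])) / (len(terms) - size)
        for j in range(n_blocks)
    ]
    mean_loo = sum(loo) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else 0.0
    return estimate, z


def clamp(p):
    """Keep a simulated allele frequency inside [0, 1]."""
    return min(1.0, max(0.0, p))


# Toy data: the test population and one reference share an extra bit of drift
rng = random.Random(0)
n_snps = 2000
base = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
shared = [rng.gauss(0, 0.05) for _ in range(n_snps)]
outgroup = [clamp(p + rng.gauss(0, 0.05)) for p in base]
ref_without_drift = [clamp(p + rng.gauss(0, 0.05)) for p in base]  # stands in for Bell_Beaker_NDL
ref_with_drift = [clamp(p + s + rng.gauss(0, 0.05)) for p, s in zip(base, shared)]  # stands in for Baltic_LVA_BA
test_pop = [clamp(p + s + rng.gauss(0, 0.05)) for p, s in zip(base, shared)]

estimate, z = f4_with_z(outgroup, test_pop, ref_without_drift, ref_with_drift)
print(f"f4 = {estimate:.6f}, Z = {z:.2f}")
```

With this ordering of populations, a significantly positive result means that the test population shares drift with the second reference to the exclusion of the first.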

Note that the statistics with the most significant Z scores (>3) involve populations that speak Baltic or Slavic languages, or their neighbors who plausibly harbor significant Baltic and/or Slavic ancestry. Among the ancient, mostly Scandinavian, populations (from Margaryan et al. 2020 and marked with the VK2020 prefix), significant Balto-Slavic drift only appears in the more easterly and/or later groups from the Viking Age (VA).


Unfortunately, one of the problems with this analysis is that Baltic_LVA_BA and Bell_Beaker_NDL aren't identical in terms of their ancient ancestry proportions. For one, the latter has significantly more Neolithic farmer ancestry. No wonder then, that Greeks, who are mostly of early farmer stock, don't show a significant Z score, despite probably packing a significant amount of Balto-Slavic ancestry dating to the Middle Ages.

In the near future, as more ancient samples become available, it might be possible to find better reference populations for the job and create more accurate, finer-scaled tests.

See also...

Uralian genes

That old chestnut: Northeast vs Northwest Euros

Friday, May 14, 2021

A Greek tragedy


I wasn't going to blog about the Clemente et al. "Aegean palatial civilizations" paper, because I think that it's a rather strange effort overall. But apparently a lot of people want to know my thoughts on the topic, so here goes.

If you download the relevant PDF file (here) and do a search for "Slav", you'll see that the word doesn't even appear in the bibliography. How is that possible, considering the massive impact that the Slavs had on the Balkans, including Greece, during the Middle Ages?

Indeed, here's a quote from page 12 of the PDF: "Present-day Greeks - who also carry Steppe-related ancestry - share ~90% of their ancestry with MBA northern Aegeans, suggesting continuity between the two time periods."

That's a very optimistic view. In fact, there's no evidence whatsoever in the paper that there's even 1% genetic continuity between present-day Greeks and any ancient Greek population, let alone the MBA northern Aegeans.

The genetic impact of Medieval Slavic migrations on most present-day Greek populations is easy to see. For instance, below are several linear models based on D-statistics of the form D(Outgroup,Test;Ancient1,Ancient2). You don't need a PhD in mathematics to understand them. The relevant data file is available here.
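In case the notation is unfamiliar, here's a minimal sketch of how a D-statistic of this form can be computed from per-SNP allele frequencies, using the usual normalized form of the statistic. The five-SNP dataset is made up for illustration, with each population represented by a single pseudo-haploid genome (frequencies of 0 or 1). The function name is hypothetical, and standard errors are left out to keep it short.

```python
def d_statistic(pw, px, py, pz):
    """D(W,X;Y,Z) from per-SNP allele frequencies of four populations."""
    numerator = sum((w - x) * (y - z) for w, x, y, z in zip(pw, px, py, pz))
    denominator = sum(
        (w + x - 2 * w * x) * (y + z - 2 * y * z)
        for w, x, y, z in zip(pw, px, py, pz)
    )
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Made-up data: one pseudo-haploid genome per population, five SNPs
outgroup = [0, 0, 0, 0, 0]
test_pop = [1, 1, 1, 0, 1]
ancient1 = [0, 0, 1, 0, 1]
ancient2 = [1, 1, 0, 1, 1]

d_value = d_statistic(outgroup, test_pop, ancient1, ancient2)
print(f"D(Outgroup,Test;Ancient1,Ancient2) = {d_value:.4f}")
```

In this toy case only three SNPs are informative. Two of them group Test with Ancient2 and one with Ancient1, so the result is positive, meaning that Test shares more alleles with Ancient2 than with Ancient1.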

Note that most of the present-day Greek groups cluster together, and they also form fairly neat clines with the other Greeks, as well as Cypriots, other Balkan populations, including those speaking Slavic languages, and also the Slavic-speaking Ukrainians. On the other hand, they don't overlap with any of the ancient groups from Greece and surrounds, nor do they generally form obvious clines with them.

To me this suggests that most present-day Greeks harbor significant levels of Slavic ancestry and some sort of recent Cypriot-related ancestry, and in large part they're only coincidentally similar to ancient Aegeans, including those from the MBA (labeled Greece_Helladic_MBA in my graphs).

And let me assure you that no matter which ancient populations you run in such D-stats, you'll always see similar present-day Greek clusters and present-day Balkan clines.
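
For anyone who hasn't met a D-statistic before, here's a minimal Python sketch of one common frequency-based definition, D(W,X;Y,Z) = sum((w-x)(y-z)) / sum((w+x-2wx)(y+z-2yz)). It isn't the software or the data behind the models above. The function name `d_stat` and the allele frequencies are made up purely for illustration, and sign conventions can differ between tools.

```python
def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (lists of equal length).

    Assumed definition: sum((w-x)(y-z)) / sum((w+x-2wx)(y+z-2yz)).
    """
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    return num / den

# Toy allele frequencies for 4 SNPs (invented, not real data).
outgroup = [0.0, 0.0, 1.0, 0.0]
test     = [0.8, 0.6, 0.3, 0.9]
ancient1 = [0.5, 0.4, 0.5, 0.6]
ancient2 = [0.6, 0.5, 0.4, 0.7]

d = d_stat(outgroup, test, ancient1, ancient2)
print(f"D(Outgroup,Test;Ancient1,Ancient2) = {d:.4f}")
```

With this convention a positive value means Test shares more alleles with Ancient2 than with Ancient1, and zero means Test is symmetrically related to both. Running many such stats per Test population gives the coordinates that a linear model can then be fitted to.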

Obviously, it's fair enough to assume that there's been some genetic continuity in the Aegean from the Iron Age, Bronze Age, and even the Copper Age and Neolithic era to the present day. But the point I'm making is that no one has yet proved this, or even attempted to measure it properly.

See also...

Greek confirmation bias

Sunday, January 17, 2021

That old chestnut: Northeast vs Northwest Euros


In the last comment thread reader Greg put forth this question:

David, when are you going to explain the genetic discrepancy between Northeastern and Northwestern Europeans? You know, the one that people believe is due to Baltic Hunter-Gatherer admixture, whereas you believe it is due to genetic drift? You ought to make a post about this issue at some point, because a lot of people are wondering what's causing the differences.

Well, Greg, this issue has been discussed to the proverbial death here and elsewhere. In fact, there were two posts and rather lengthy comment threads on the same topic at this blog just a few months ago. See here and here.

Nevertheless, it seems that a fair number of people are still befuddled, so I'm going to try to explain this one last time, as briefly as I can, using just a handful of f4-stats.

Admittedly, Northeast Europeans generally do pack higher levels of indigenous European hunter-gatherer ancestry than Northwest Europeans. This is especially true of Balts, who show more of this type of ancestry than even Scandinavians in practically every type of analysis.

The f4-stats below back this up unambiguously. Note the significantly positive (>3) Z scores, which suggest that Latvians and Lithuanians harbor more Baltic hunter-gatherer-related ancestry than Norwegians and Swedes.

Chimp Baltic_HG Norwegian Latvian 0.001301 7.114
Chimp Baltic_HG Swedish Latvian 0.001017 4.205
Chimp Baltic_HG Norwegian Lithuanian 0.001023 7.341
Chimp Baltic_HG Swedish Lithuanian 0.000763 3.408
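
To show roughly where numbers like these come from, here's a minimal Python sketch of an f4-statistic and its Z score. It assumes per-SNP allele frequencies and a simple delete-one block jackknife over equal-sized blocks, whereas real pipelines typically weight blocks by SNP count. It isn't the software that produced the stats above. The function names and the toy frequencies are invented for illustration.

```python
import math

def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    return sum((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)) / len(a)

def drop_block(freqs, start, stop):
    """Return the frequency list with SNPs in [start, stop) removed."""
    return freqs[:start] + freqs[stop:]

def f4_with_z(a, b, c, d, n_blocks):
    """Return (f4, Z) using a delete-one block jackknife (equal-sized blocks)."""
    if n_blocks < 2:
        raise ValueError("need at least two blocks for a jackknife")
    size = len(a) // n_blocks
    n = size * n_blocks  # ignore any leftover SNPs at the end
    a, b, c, d = a[:n], b[:n], c[:n], d[:n]
    estimate = f4(a, b, c, d)
    # Recompute the statistic with each block left out in turn.
    leave_one_out = [
        f4(*(drop_block(v, i * size, (i + 1) * size) for v in (a, b, c, d)))
        for i in range(n_blocks)
    ]
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (t - mean_loo) ** 2 for t in leave_one_out
    )
    std_err = math.sqrt(variance)
    return estimate, (estimate / std_err if std_err > 0 else float("nan"))

# Toy allele frequencies for 6 SNPs (invented, not real data).
chimp     = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
baltic_hg = [0.8, 0.6, 0.3, 0.9, 0.2, 0.7]
pop1      = [0.5, 0.4, 0.5, 0.6, 0.5, 0.4]
pop2      = [0.6, 0.5, 0.4, 0.7, 0.4, 0.5]

estimate, z = f4_with_z(chimp, baltic_hg, pop1, pop2, n_blocks=3)
print(f"f4 = {estimate:.4f}, Z = {z:.2f}")
```

In this toy setup pop2 is shifted towards baltic_hg at every SNP, so the statistic comes out positive, in the same way that the Latvian and Lithuanian rows do above.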

Greg, I know what you're thinking: the naysayers are right! But wait, because there's a twist to this tale. Check out these f4-stats:

Chimp Baltic_HG Norwegian Belarusian 0.000265 1.934
Chimp Baltic_HG Swedish Belarusian 0.000152 0.7
Chimp Baltic_HG Norwegian Polish 6.4E-05 0.519
Chimp Baltic_HG Swedish Polish -0.000235 -1.074

Please note, Greg, that none of the Z scores reach significance, which means that these Northwest Europeans and Slavs are symmetrically related to Baltic_HG. They're also symmetrically related to other relevant ancient groups such as the Yamnaya steppe herders. This, of course, suggests that they harbor very similar levels of basically the same ancient genetic components.

Chimp Karelia_HG Norwegian Belarusian 0.000136 0.844
Chimp Karelia_HG Swedish Belarusian 7.9E-05 0.32
Chimp Karelia_HG Norwegian Polish -4.7E-05 -0.304
Chimp Karelia_HG Swedish Polish -0.000134 -0.54

Chimp Yamnaya_Samara Norwegian Belarusian -0.000134 -1.085
Chimp Yamnaya_Samara Swedish Belarusian -6.6E-05 -0.34
Chimp Yamnaya_Samara Norwegian Polish -0.000225 -1.995
Chimp Yamnaya_Samara Swedish Polish -0.000311 -1.574

Chimp Barcin_N Norwegian Belarusian -0.000335 -2.809
Chimp Barcin_N Swedish Belarusian -0.000284 -1.491
Chimp Barcin_N Norwegian Polish -0.000222 -2.057
Chimp Barcin_N Swedish Polish -0.000318 -1.662

Chimp Baikal_N Norwegian Belarusian 0.000186 1.3
Chimp Baikal_N Swedish Belarusian -7E-05 -0.33
Chimp Baikal_N Norwegian Polish -4.6E-05 -0.351
Chimp Baikal_N Swedish Polish -0.000477 -2.277
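
The |Z| > 3 rule of thumb used in this post can be applied mechanically. Here's a minimal Python sketch that parses the Baltic_HG rows quoted above and flags the significant ones. The column order (outgroup, test, pop1, pop2, f4, Z) is an assumption based on how the rows read, and the helper names are made up.

```python
ROWS = """\
Chimp Baltic_HG Norwegian Latvian 0.001301 7.114
Chimp Baltic_HG Swedish Latvian 0.001017 4.205
Chimp Baltic_HG Norwegian Lithuanian 0.001023 7.341
Chimp Baltic_HG Swedish Lithuanian 0.000763 3.408
Chimp Baltic_HG Norwegian Belarusian 0.000265 1.934
Chimp Baltic_HG Swedish Belarusian 0.000152 0.7
Chimp Baltic_HG Norwegian Polish 6.4E-05 0.519
Chimp Baltic_HG Swedish Polish -0.000235 -1.074
"""

def parse_rows(text):
    """Parse whitespace-separated rows into (outgroup, test, pop1, pop2, f4, z)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        outgroup, test, pop1, pop2, f4_value, z_score = parts
        rows.append((outgroup, test, pop1, pop2, float(f4_value), float(z_score)))
    return rows

def significant(rows, threshold=3.0):
    """Keep only rows whose absolute Z score exceeds the threshold."""
    return [row for row in rows if abs(row[5]) > threshold]

rows = parse_rows(ROWS)
flagged = significant(rows)
for outgroup, test, pop1, pop2, f4_value, z_score in flagged:
    print(f"{pop1} vs {pop2}: f4 = {f4_value:.6f}, Z = {z_score:.3f}")
```

Only the four rows with Latvians and Lithuanians pass the threshold, while the Belarusian and Polish rows don't.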

Interestingly, pairing up Ukrainians with English samples from Cornwall and Kent produces similar outcomes. But that's because most ancient ancestry proportions in Europe show a closer correlation with latitude than longitude.

Chimp Baltic_HG English_Cornwall Ukrainian 0.000282 2.242
Chimp Baltic_HG English_Kent Ukrainian 0.000225 1.748

Chimp Karelia_HG English_Cornwall Ukrainian 0.000323 2.175
Chimp Karelia_HG English_Kent Ukrainian 0.000239 1.634

Chimp Yamnaya_Samara English_Cornwall Ukrainian -6.6E-05 -0.569
Chimp Yamnaya_Samara English_Kent Ukrainian -0.000112 -0.977

Chimp Barcin_N English_Cornwall Ukrainian -0.000519 -4.641
Chimp Barcin_N English_Kent Ukrainian -0.000598 -5.232

Chimp Baikal_N English_Cornwall Ukrainian 0.000385 2.874
Chimp Baikal_N English_Kent Ukrainian 0.00036 2.836

Now, Greg, if at least in terms of genetic ancestry, Latvians, Lithuanians, Belarusians, Poles and Ukrainians all qualify as Northeast Europeans, then what makes them different, as a group, from Northwest Europeans? Do you believe that the key factor is admixture from Baltic hunter-gatherers? Or is it genetic drift?

Of course, considering all of the f4-stats above, logic dictates that it must be relatively recent genetic drift.

Keep in mind, however, that this only applies to Balto-Slavic speaking Northeast Europeans without significant Uralian ancestry. Overall, Uralic speakers have a more complex population history, and indeed genetic differences between them and Northwest Europeans are in large part due to somewhat different ancestry proportions and also Siberian admixture.

See also...

So who's the most (indigenous) European of us all?

Friday, November 13, 2020

Fatyanovo as part of the wider Corded Ware family (Nordqvist and Heyd 2020)


There's a new archeological paper about the Fatyanovo culture at the Proceedings of the Prehistoric Society [LINK]. It includes this quote on page 18:

In the traditional narrative, the Fatyanovo people – like the CWC populations in general – are regarded as Indo-European, representing the pre-Balto-Slavic (-Germanic) stage (Carpelan & Parpola 2001, 88; Anthony 2007, 380; also Gimbutas 1956, 163; Tretyakov 1966, 109) in the spread of Indo-European languages.

That's correct, but considering the latest ancient DNA research on the Fatyanovo people, the traditional narrative is probably wrong. Fatyanovo males were rich in Y-haplogroup R1a-Z93, which is found at very low frequencies in Balto-Slavic populations (see here). It's actually much more common nowadays in Central and South Asia, where it often reaches frequencies of over 50% in Indo-Iranian speaking groups.

Balts and Slavs are rich in R1a-Z282, which is a sister clade of R1a-Z93 that has been found in Corded Ware and Corded Ware-related samples from west of Fatyanovo sites. That is, in present-day Poland and the Baltic states.

Therefore, the origins of the Balto-Slavs should be sought somewhere west of the Fatyanovo culture, probably in the Corded Ware derived populations from what is now the border zone between Poland, Belarus and Ukraine.

Indeed, in my view the Fatyanovo people are more likely to have spoken Proto-Indo-Iranian than anything ancestral to Baltic or Slavic (see here).

Nordqvist and Heyd, The Forgotten Child of the Wider Corded Ware Family: Russian Fatyanovo Culture in Context, Proceedings of the Prehistoric Society, online 12 November 2020, DOI: https://doi.org/10.1017/ppr.2020.9

See also...

The oldest R1a to date