Saturday, February 13, 2021

The Uralic cline with kra001 - no projection this time

A whole lot of nonsense was posted online, often by people who should've known better, after I claimed that kra001 was a solid proxy for a proto-Uralic genome (see here).

For those of you who still don't get it, below are three Principal Component Analysis (PCA) plots featuring Uralic speakers and other present-day Eurasians. Kra001 is also there. These graphs are based on genotype data, not on reprocessed Global25 data. The relevant datasheet is available here.

Compared to my previous PCA with kra001, here I included a bigger range of East Eurasian populations to help mitigate the effects of extreme genetic drift in some of the Siberian groups, at least on the first few Principal Components (PCs). Moreover, kra001 wasn't projected onto PCs computed with modern-day samples, so he was free to influence the outcome of the PCA.
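To make the projection point concrete, here's a minimal Python sketch using made-up toy genotypes. Nothing in it is the actual dataset or the software behind these plots, and the function names pca_fit and project are hypothetical, chosen just for illustration. In the projected case the axes are computed from the modern samples only and the ancient sample is placed on them afterwards. In the non-projected case the ancient sample sits in the matrix, so it contributes to the mean and to the axes.

```python
import numpy as np


def pca_fit(X, n_components=2):
    """Centre X (samples x markers) and return the mean, the PC axes and the sample scores."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:n_components]
    scores = Xc @ axes.T
    return mean, axes, scores


def project(x, mean, axes):
    """Place samples on axes that were computed without them."""
    return (x - mean) @ axes.T


# Toy data (assumption): 50 "modern" samples and 1 "ancient" sample, 200 markers coded 0/1/2.
rng = np.random.default_rng(0)
modern = rng.integers(0, 3, size=(50, 200)).astype(float)
ancient = rng.integers(0, 3, size=(1, 200)).astype(float)

# Projected: the ancient sample has no say in the axes.
mean_m, axes_m, scores_m = pca_fit(modern)
ancient_projected = project(ancient, mean_m, axes_m)

# Not projected: the ancient sample is part of the input, so it can shift the mean and axes.
joint = np.vstack([modern, ancient])
mean_j, axes_j, scores_j = pca_fit(joint)
ancient_joint = scores_j[-1]
```

With a single sample among dozens the difference is usually small, but an unprojected sample is no longer forced onto a structure defined purely by the moderns.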

Note the east-to-west clines made up largely of Uralic-speaking groups on the first two plots. These plots are based on PCs 1/2 and 1/3, respectively. The third plot, based on PCs 1/4, is more complex and thus more difficult to interpret, but it also manages to isolate many of the Uralic populations from the others.

The Uralic-specific clines do intersect with the clines and clusters formed by the other linguistic groups. However, based on the three plots, the Yeniseian-speaking Kets are the only Asian group that can plausibly be mistaken for Uralic speakers.

Importantly, apart from the Kets, kra001 is the only Asian individual who shifts his position on all three plots as if he were a Uralic speaker. This might well be a coincidence, and we'll never know what language was spoken by kra001, but it does suggest to me that his genome is a solid proxy for a proto-Uralic genome.

See also...

First taste of Early Medieval DNA from the Ural region

The BOO people: earliest Uralic speakers in the ancient DNA record?

Fresh off the sledge

Friday, February 5, 2021

Finally, a proto-Uralic genome

Obviously, genes don't speak languages, people do. But sometimes it's possible to associate a linguistic group with a very specific genetic signature.

A while ago many of us in the blogosphere spotted an uncanny connection between the Uralic language family, Y-haplogroup N-L1026 and Nganasan-like genome-wide genetic ancestry.

As a result, we expected a Nganasan-like population rich in N-L1026 to eventually appear in the ancient DNA record, probably somewhere in Siberia and in burials from a likely proto-Uralic archeological culture. This hasn't happened yet, but we now have direct evidence that such a population must have existed somewhere deep in Siberia as early as the Bronze Age.

Kra001, whose genome was published recently in Kilinc et al., belongs to N-L1026 and, at least in terms of genome-wide genetic structure, could well be from a population directly ancestral to present-day Nganasans. Of course, the Nganasan language is part of the Samoyedic branch of Uralic.

Below is a series of Principal Component Analyses (PCA) featuring kra001. He's labeled RUS_Krasnoyarsk_BA, after the location and age of his burial. Note the obvious Uralic cline running across the plots. That is, from west to east. Kra001 is positioned at the end of this cline very close to a small cluster of Nganasans. To see interactive versions of the plots, paste the Global25 coordinates here into the relevant field here.

Admittedly, there's no way of knowing whether this individual spoke proto-Uralic or not. Indeed, he may have spoken something totally unrelated. The important point is that the very specific genetic signature shared by almost all present-day Uralic speakers, except perhaps Hungarians, is now finally represented in the ancient DNA record. And I can reveal to you that we'll soon be seeing many more ancients very similar to kra001 in upcoming papers.

See also...

The Uralic cline with kra001 - no projection this time

The BOO people: earliest Uralic speakers in the ancient DNA record?

Fresh off the sledge

Wednesday, January 27, 2021

The great shift

Here's a Principal Component Analysis (PCA) featuring some of the ancients from the recent Saag et al. paper at Science Advances. To see an interactive version of the plot, paste the Global25 coordinates here into the relevant field here.

Note that the Fatyanovo culture agropastoralists, who are rich in Y-haplogroup R1a and steppe ancestry, cluster with present-day Eastern Europeans. On the other hand, the Volosovo culture singleton sits near the European hunter-gatherer cline, which no longer exists.

This Volosovo individual belongs to Y-haplogroup Q1a. However, most of the Volosovo males whose genomes are soon to be published belong to Y-haplogroup R1b.

Thus, in much of Eastern Europe during the Bronze Age, agropastoralists rich in R1a and steppe ancestry replaced hunter-gatherers rich in R1b and with no steppe ancestry. Of course, that's not where the story ends, but I'll get back to that later this year.

By the way, the relatively high coverage Fatyanovo Y-chromosome sequences are being analyzed at YFull. You can check out the results here.

Sunday, January 17, 2021

A tantalizing link

A new paper at PLoS ONE reports on the first human genomes reliably associated with the Single Grave culture (SGC). They were sequenced from remains in a burial at Gjerrild, Denmark, roughly dating to 2,500 BCE.

Surprisingly, one of the male genomes belongs to Y-haplogroup R1b-V1636, which is an exceedingly rare marker both in ancient and present-day populations.

However, the results do make sense, because the earliest instances of R1b-V1636 are in three Eneolithic males from burial sites on the Pontic-Caspian (PC) steppe in Eastern Europe, which is precisely where one would expect to find the paternal ancestors of the SGC population. The SGC, of course, is the westernmost variant of the Corded Ware culture (CWC), and there's very little doubt nowadays that the CWC had its roots on the PC steppe.

A Copper Age individual from Arslantepe in central Anatolia also belongs to R1b-V1636, which suggests that Northern Europe shared a very specific link with Anatolia via Eastern Europe during a period generally regarded to have been the time of early Indo-European dispersals.

Numerous SGC barrows or kurgans dot the landscape in what are now the Netherlands, northwestern Germany and Denmark. Unfortunately, most SGC human remains have been eaten up by the acidic soils that exist in this area.

Citation: Egfjord AF-H, Margaryan A, Fischer A, Sjögren K-G, Price TD, Johannsen NN, et al. (2021) Genomic Steppe ancestry in skeletons from the Neolithic Single Grave Culture in Denmark. PLoS ONE 16(1): e0244872.

See also...

Maykop ancestry in Copper Age Arslantepe

That old chestnut: Northeast vs Northwest Euros

In the last comment thread reader Greg put forth this question:

David, when are you going to explain the genetic discrepancy between Northeastern and Northwestern Europeans? You know, the one that people believe is due to Baltic Hunter-Gatherer admixture, whereas you believe it is due to genetic drift? You ought to make a post about this issue at some point, because a lot of people are wondering what's causing the differences.

Well, Greg, this issue has been discussed to the proverbial death here and elsewhere. In fact, there were two posts and rather lengthy comment threads on the same topic at this blog just a few months ago. See here and here.

Nevertheless, it seems that a fair number of people are still befuddled, so I'm going to try to explain this one last time, as briefly as I can, using just a handful of f4-stats.

Admittedly, Northeast Europeans generally do pack higher levels of indigenous European hunter-gatherer ancestry than Northwest Europeans. This is especially true of Balts, who show more of this type of ancestry than even Scandinavians in practically every type of analysis.

The f4-stats below back this up unambiguously. Note the significantly positive (>3) Z scores, which suggest that Latvians and Lithuanians harbor more Baltic hunter-gatherer-related ancestry than Norwegians and Swedes.

Chimp Baltic_HG Norwegian Latvian 0.001301 7.114
Chimp Baltic_HG Swedish Latvian 0.001017 4.205
Chimp Baltic_HG Norwegian Lithuanian 0.001023 7.341
Chimp Baltic_HG Swedish Lithuanian 0.000763 3.408
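For anyone wondering what sits behind a line like the ones above, here's a minimal Python sketch of an f4-stat of the form f4(A, B; C, D) and its Z score. It's only an illustration under loud assumptions: the allele frequencies are made up, the function names are hypothetical, and the standard error comes from a simplified jackknife over equal-sized blocks of consecutive markers. Real software works from genotype data with blocks defined along the genome, so this is not how the numbers above were produced. A positive value means that B and D (or A and C) share extra drift.

```python
import numpy as np


def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over markers of (a - b) * (c - d)."""
    return float(np.mean((p_a - p_b) * (p_c - p_d)))


def f4_block_jackknife(p_a, p_b, p_c, p_d, n_blocks=50):
    """Return (estimate, standard error, Z) using a delete-one-block jackknife.

    Simplification (assumption): blocks are equal-sized runs of consecutive markers.
    """
    per_marker = (p_a - p_b) * (p_c - p_d)
    usable = (len(per_marker) // n_blocks) * n_blocks  # drop the ragged tail
    blocks = per_marker[:usable].reshape(n_blocks, -1)
    block_len = blocks.shape[1]
    total = blocks.sum()
    # Estimate with each block left out in turn.
    leave_one_out = (total - blocks.sum(axis=1)) / (usable - block_len)
    estimate = total / usable
    se = np.sqrt((n_blocks - 1) / n_blocks
                 * np.sum((leave_one_out - leave_one_out.mean()) ** 2))
    return float(estimate), float(se), float(estimate / se)


# Toy allele frequencies (assumption): D shares half of its frequencies with B.
rng = np.random.default_rng(1)
n_markers = 20000
outgroup = rng.random(n_markers)
hunter_gatherer = rng.random(n_markers)
pop_c = rng.random(n_markers)
pop_d = 0.5 * hunter_gatherer + 0.5 * rng.random(n_markers)

estimate, se, z = f4_block_jackknife(outgroup, hunter_gatherer, pop_c, pop_d)
print(f"f4 = {estimate:.5f}, SE = {se:.5f}, Z = {z:.2f}")
```

In this toy setup pop_d was built to share drift with hunter_gatherer, so the statistic comes out positive with a large Z score, which is the same pattern as in the Latvian and Lithuanian rows.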

Greg, I know what you're thinking: the naysayers are right! But wait, because there's a twist to this tale. Check out these f4-stats:

Chimp Baltic_HG Norwegian Belarusian 0.000265 1.934
Chimp Baltic_HG Swedish Belarusian 0.000152 0.7
Chimp Baltic_HG Norwegian Polish 6.4E-05 0.519
Chimp Baltic_HG Swedish Polish -0.000235 -1.074

Please note, Greg, that none of the Z scores reach significance, which means that these Northwest Europeans and Slavs are symmetrically related to Baltic_HG. They're also symmetrically related to other relevant ancient groups such as the Yamnaya steppe herders. This, of course, suggests that they harbor very similar levels of basically the same ancient genetic components.

Chimp Karelia_HG Norwegian Belarusian 0.000136 0.844
Chimp Karelia_HG Swedish Belarusian 7.9E-05 0.32
Chimp Karelia_HG Norwegian Polish -4.7E-05 -0.304
Chimp Karelia_HG Swedish Polish -0.000134 -0.54

Chimp Yamnaya_Samara Norwegian Belarusian -0.000134 -1.085
Chimp Yamnaya_Samara Swedish Belarusian -6.6E-05 -0.34
Chimp Yamnaya_Samara Norwegian Polish -0.000225 -1.995
Chimp Yamnaya_Samara Swedish Polish -0.000311 -1.574

Chimp Barcin_N Norwegian Belarusian -0.000335 -2.809
Chimp Barcin_N Swedish Belarusian -0.000284 -1.491
Chimp Barcin_N Norwegian Polish -0.000222 -2.057
Chimp Barcin_N Swedish Polish -0.000318 -1.662

Chimp Baikal_N Norwegian Belarusian 0.000186 1.3
Chimp Baikal_N Swedish Belarusian -7E-05 -0.33
Chimp Baikal_N Norwegian Polish -4.6E-05 -0.351
Chimp Baikal_N Swedish Polish -0.000477 -2.277
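The way I'm reading these tables can be spelled out in a few lines of Python. This is just a sketch: the rows are copied from the Baltic_HG stats above, the column order (four populations, then the f4 value, then the Z score) is taken from the text, and the helper names parse_f4 and is_significant are hypothetical. The cutoff of |Z| > 3 is the same one used in this post.

```python
from typing import NamedTuple, Tuple


class F4Row(NamedTuple):
    pops: Tuple[str, ...]  # the four populations, in the order listed
    f4: float              # the f4 value
    z: float               # the Z score


def parse_f4(line: str) -> F4Row:
    """Split one whitespace-separated line into populations, f4 and Z."""
    *pops, f4_value, z_score = line.split()
    return F4Row(tuple(pops), float(f4_value), float(z_score))


def is_significant(row: F4Row, threshold: float = 3.0) -> bool:
    """True when the Z score is beyond the threshold in either direction."""
    return abs(row.z) > threshold


lines = [
    "Chimp Baltic_HG Norwegian Latvian 0.001301 7.114",
    "Chimp Baltic_HG Swedish Latvian 0.001017 4.205",
    "Chimp Baltic_HG Norwegian Lithuanian 0.001023 7.341",
    "Chimp Baltic_HG Swedish Lithuanian 0.000763 3.408",
    "Chimp Baltic_HG Norwegian Belarusian 0.000265 1.934",
    "Chimp Baltic_HG Swedish Belarusian 0.000152 0.7",
    "Chimp Baltic_HG Norwegian Polish 6.4E-05 0.519",
    "Chimp Baltic_HG Swedish Polish -0.000235 -1.074",
]
rows = [parse_f4(line) for line in lines]

for row in rows:
    verdict = "significant" if is_significant(row) else "not significant"
    print(" ".join(row.pops), row.z, verdict)
```

The four Baltic rows clear the threshold, while the four Belarusian and Polish rows don't, which is the whole point of the comparison.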

Interestingly, pairing up Ukrainians with English samples from Cornwall and Kent produces similar outcomes. But that's because most ancient ancestry proportions in Europe show a closer correlation with latitude than longitude.

Chimp Baltic_HG English_Cornwall Ukrainian 0.000282 2.242
Chimp Baltic_HG English_Kent Ukrainian 0.000225 1.748

Chimp Karelia_HG English_Cornwall Ukrainian 0.000323 2.175
Chimp Karelia_HG English_Kent Ukrainian 0.000239 1.634

Chimp Yamnaya_Samara English_Cornwall Ukrainian -6.6E-05 -0.569
Chimp Yamnaya_Samara English_Kent Ukrainian -0.000112 -0.977

Chimp Barcin_N English_Cornwall Ukrainian -0.000519 -4.641
Chimp Barcin_N English_Kent Ukrainian -0.000598 -5.232

Chimp Baikal_N English_Cornwall Ukrainian 0.000385 2.874
Chimp Baikal_N English_Kent Ukrainian 0.00036 2.836

Now, Greg, if at least in terms of genetic ancestry, Latvians, Lithuanians, Belarusians, Poles and Ukrainians all qualify as Northeast Europeans, then what makes them different, as a group, from Northwest Europeans? Do you believe that the key factor is admixture from Baltic hunter-gatherers? Or is it genetic drift?

Of course, considering all of the f4-stats above, logic dictates that it must be relatively recent genetic drift.

Keep in mind, however, that this only applies to Balto-Slavic speaking Northeast Europeans without significant Uralian ancestry. Overall, Uralic speakers have a more complex population history, and indeed genetic differences between them and Northwest Europeans are in large part due to somewhat different ancestry proportions and also Siberian admixture.

See also...

So who's the most (indigenous) European of us all?

Thursday, January 14, 2021

David Anthony on Y-haplogroup R1a

Archeologist David Anthony has a new theory which attempts to explain why Y-haplogroup R1a hasn't yet been found in any Yamnaya graves. Basically, he thinks that it was carried by Yamnaya men who weren't buried in kurgans, because they were part of a social underclass, and so their remains are now difficult to locate. See here.

This is an interesting attempt to find a socio-archeological solution to a genetic question, but it's unnecessarily complicated and, in fact, also unnecessary.

The important thing to understand about R1a is that it's rarely seen in the ancient DNA record before the rise of the Corded Ware culture (CWC). Moreover, the vast majority of the R1a lineages in the world today belong to the R1a-M417 subclade, which is a relatively young (Eneolithic era?) marker and closely associated with the CWC population and its rapid expansion.

Indeed, modern R1a lineages show a very strong star-like phylogeny indicative of a series of rapid and massive expansions starting from a handful of lineages only a few thousand years ago.

So if R1a was actually present in the Yamnaya population, then the obvious reason why it hasn't yet been found in any Yamnaya remains is that it was only carried by a very small group of Yamnaya men. Simple as that.

Its expansions from the Pontic-Caspian (PC) steppe, predominantly via the highly successful R1a-M417, may have coincidentally and rather ironically started in a socially disenfranchised Yamnaya clan.

But my view is that R1a-M417 just happened to be present in a small group of early Yamnaya or Yamnaya-related males who came up with an economic package that allowed them to expand out of the PC steppe like no one else before them, and so they did just that.

Anthony is currently collaborating on a new paper about the Eneolithic era on the PC steppe with scientists from Harvard's David Reich Lab (see here). I'm really hoping that they get this right.

See also...

Fatyanovo as part of the wider Corded Ware family

Tuesday, December 29, 2020

Fully automated graph exploration

Scientists at Broad MIT are working on a new feature-packed and "lightning fast" version of Admixtools that runs in R. It's already available via this link...

I don't have access to a Linux machine right now, but since this thing runs in R, it also runs on Windows, and I do have a Windows computer here.

One of the most interesting and useful features in the new R package is arguably the find_graphs function, which automatically searches for admixture graphs that reflect the observed f-statistics. That is, once the user chooses the samples and settings, find_graphs runs an unsupervised admixture graph analysis.

Here are a couple of graphs that I knocked out with find_graphs in about five minutes each. The commands and settings that I used are listed in a text file here.

The two topologies above were among the most commonly seen in a series of about 50 runs with the same sample set. A couple of basic inferences based on the output:

- RUS_Progress-Vonyuchka_En harbors GEO_Kotias-Satsurblia_HG-related ancestry, not IRN_Ganj_Dareh_N-related ancestry

- IRN_Ganj_Dareh_N and TKM_Geoksyur_En form a clade to the exclusion of GEO_Kotias-Satsurblia_HG.

The results are certainly in line with those from other types of analyses that I've done on this blog (for instance, see here and here).

Update 05/01/21: Robert Maier, one of the creators of Admixtools2, has left this message in the comments below.

I'm glad to see that there is so much interest in Admixtools2! I very much appreciate any comments and suggestions on how to improve it and how to make it more user friendly.

Because it's still under active development, some things are likely to change in the future. For example, there is a faster successor to "find_graphs", called "find_graphs2", but in the future they will probably be merged into one.

I'm in David Reich’s group at Harvard and Broad and we are hoping to publish a paper describing Admixtools2 where we illustrate its value by using it to test how robust several previously published results are by exploring a large number of alternative models for each of them. If any of you use Admixtools2 to find graphs that are significantly better fits than published graphs and are also historically plausible - or if you find families of graphs that are equally good fits to the published ones but provide qualitatively different conclusions about population relationships - please contact us. That would be a meaningful contribution to the paper we write about this and we’d be open to including someone as a co-author based on identifying case studies like this.

Sunday, December 6, 2020

Looking forward to a post-Covid world

I was hoping that the Covid-19 pandemic wouldn't have an immediate impact on the publication of ancient DNA papers and new data, but considering how much things have slowed down in this respect, it seems that I was fooling myself.

So let's take a break until early next year, and then see what happens.

Trust me, we've got a lot to look forward to in the post-Covid-19 world. Based on what I've heard from various sources, here are some predictions about what we might see:

- the search for the Proto-Indo-European homeland will shift west to the North Pontic steppe

- on the other hand, the search for the Proto-Uralic homeland will move deep into Siberia

- the key role of the Single Grave (westernmost Corded Ware) culture in the population history of Western Europe will finally get some attention

- following on from the above, Y-haplogroup R1b-L51 will be revealed as a Single Grave marker

- the idea that the Pontic-Caspian steppe was colonized by migrants from Mesopotamia during the Bronze Age will be forgotten, and, ironically, we'll instead learn that there was a significant influx of steppe ancestry into ancient Mesopotamia

- Old Kingdom Egyptians will come out less Sub-Saharan African than present-day Egyptians.

I probably shouldn't blab everything out, so that's all you're getting from me for now. You'll just have to wait for the rest until next year, or perhaps even the year after that.

Friday, November 13, 2020

Fatyanovo as part of the wider Corded Ware family (Nordqvist and Heyd 2020)

There's a new archeological paper about the Fatyanovo culture in the Proceedings of the Prehistoric Society. It includes this quote on page 18:

In the traditional narrative, the Fatyanovo people – like the CWC populations in general – are regarded as Indo-European, representing the pre-Balto-Slavic (-Germanic) stage (Carpelan & Parpola 2001, 88; Anthony 2007, 380; also Gimbutas 1956, 163; Tretyakov 1966, 109) in the spread of Indo-European languages.

That's correct, but considering the latest ancient DNA research on the Fatyanovo people, the traditional narrative is probably wrong. Fatyanovo males were rich in Y-haplogroup R1a-Z93, which is found at very low frequencies in Balto-Slavic populations (see here). It's actually much more common nowadays in Central and South Asia, where it often reaches frequencies of over 50% in Indo-Iranian speaking groups.

Balts and Slavs are rich in R1a-Z282, which is a sister clade of R1a-Z93, and has been found in Corded Ware and Corded Ware-related samples from west of Fatyanovo sites. That is, in present-day Poland and the Baltic states.

Therefore, the origins of the Balto-Slavs should be sought somewhere west of the Fatyanovo culture, probably in the Corded Ware derived populations from what is now the border zone between Poland, Belarus and Ukraine.

Indeed, in my view the Fatyanovo people are more likely to have spoken Proto-Indo-Iranian than anything ancestral to Baltic or Slavic (see here).

Citation: Nordqvist and Heyd, The Forgotten Child of the Wider Corded Ware Family: Russian Fatyanovo Culture in Context, Proceedings of the Prehistoric Society, online 12 November 2020, DOI:

See also...

The oldest R1a to date

Saturday, November 7, 2020

Slavic-like Medieval Germans

The samples labeled DEU_Krakauer_Berg_MA in the Principal Component Analysis (PCA) plot below are from a recent paper by Parker et al. at Scientific Reports. Their remains were excavated from a Medieval cemetery in the now abandoned village of Krakauer Berg in eastern Germany.

Krakauer sounds sort of like Kraków, doesn't it? That's probably not a coincidence, especially considering how these people behave in my analysis. To see an interactive version of the plot, paste the coordinates from the text file here into the relevant field here.

See also...

Yamnaya-related ancestry proportions in present-day Poles

Warriors from at least two different populations fought in the Tollense Valley battle

Viking world open analysis and discussion thread