Sunday, May 19, 2019

Who were the people of the Nordic Bronze Age?

Ancient DNA has revealed that large scale migrations and population replacements have often accompanied major cultural changes in prehistoric Europe. But, for now, my opinion is that the formation of the archeologically ostentatious Nordic Bronze Age wasn't associated with any significant foreign gene flow into Scandinavia. I've tested this as best as I could with the few relevant ancient samples that are currently available.

For instance, below are among the most successful qpAdm mixture models that I was able to find for various ancient Scandinavian groups dating back to the local Middle Neolithic (MN) period. The Nordic Bronze Age population is represented by three individuals labeled Nordic_BA. Unfortunately, the guy pictured above, from the famous Borum Eshøj barrow burial in what is now Denmark, didn't make the cut. For more details about my sampling and labeling strategies refer to the text file here.

CWC_CZE 0.822±0.059
POL_Globular_Amphora 0.178±0.059
chisq 14.478
tail prob 0.341086
Full output

CWC_Baltic_early 0.662±0.028
POL_Globular_Amphora 0.338±0.028
chisq 11.234
tail prob 0.591189
Full output

Nordic_MN_B 0.928±0.069
SWE_TRB 0.072±0.069
chisq 12.139
tail prob 0.516307
Full output

Nordic_LN 0.851±0.061
SWE_TRB 0.149±0.061
chisq 10.897
tail prob 0.619475
Full output

It's impossible to successfully model the ancestries of Nordic_MN_B and SWE_Battle_Axe simply with the populations that were living in Scandinavia before them. Therefore, it's likely that they were migrants or the recent descendants of migrants to Scandinavia. But there's nothing surprising about that, because they're archeologically associated with the Corded Ware culture (CWC), which has always been seen as intrusive to Scandinavia from the south and east.

Conversely, it's easy to produce statistically sound mixture models for both Nordic_LN and Nordic_BA exclusively with earlier Scandinavian populations. Indeed, based on the outgroups or right pops that I'm using, Nordic_LN is almost indistinguishable from Nordic_MN_B, and the same can be said of Nordic_BA with regard to Nordic_LN.
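As a rough illustration of what "statistically sound" means here, the "tail prob" printed with each model is the upper tail probability of the reported chi-square value. The sketch below recomputes it in plain Python. The degrees of freedom aren't shown in the snippets above (they would be in the full output), so the value of 13 is an assumption; it does reproduce the 0.341086 reported next to chisq 14.478. The function name is hypothetical and not part of qpAdm.

```python
import math

def chi2_tail_prob(chisq: float, dof: int) -> float:
    """Upper tail probability P(X >= chisq) for a chi-square variable."""
    if chisq <= 0:
        return 1.0
    a, x = dof / 2.0, chisq / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, x):
    # P(a, x) = x^a * e^-x / Gamma(a) * sum_{n>=0} x^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return 1.0 - lower

# dof=13 is an assumption; the post only quotes chisq and tail prob.
print(round(chi2_tail_prob(14.478, 13), 4))
```

A model is usually treated as acceptable when this probability stays above a chosen cutoff such as 0.05, so a higher tail prob means the model is harder to reject, not that it is necessarily the true one.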

Of course, if I mixed and matched reference populations from across prehistoric Europe, I could probably come up with some spectacular statistical fits even without the need for any Scandinavians. Essentially that's because Nordic_LN and Nordic_BA are closely related to many earlier and contemporaneous peoples living all the way from the Atlantic facade to the Ural Mountains. My point, however, is that this isn't crucial, despite the dearth of ancient samples from Scandinavia.

This is how things look in a Principal Component Analysis (PCA) of Northern European genetic variation based on my Global25 test. Strikingly, Nordic_MN_B, SWE_Battle_Axe, Nordic_LN and Nordic_BA more or less recapitulate the cluster made up of present-day Swedish samples. The relevant datasheet is available here.
Granted, two of the Nordic_BA samples sit just south of the Swedes, no doubt due to their slightly higher ratios of Neolithic farmer (SWE_TRB-related) ancestry, but this is also an area of the plot that many present-day Danes call home (not shown, because I don't have any suitable academic Danish samples to run).

I'll eat my hat if it turns out that Scandinavia experienced a major population shift (say, more than a collateral ~10%) during the LN and/or BA periods. And I'll post a clip of it online too.

The Trundholm sun chariot was found in a peat bog on the island of Zealand, Denmark, in 1902. It's thought to be an Indo-European religious artifact dating back to the Nordic Bronze Age; a representation of a horse pulling the sun and perhaps also the moon in a spoked wheel chariot. So one way or another it appears to be a reference to the Divine Twins mythos.

Thursday, May 16, 2019

Fresh off the sledge

As things stand, the closest individual to a Proto-Uralic speaker in the ancient DNA record is arguably OLS10 from an Iron Age tarand grave in what is now Estonia. I say that because:

- isotopic data suggest that OLS10 wasn't born where he died, and considering his elevated Siberian ancestry relative to earlier and most contemporaneous Baltic ancients, he was very likely a migrant to the Baltic region from the east

- the tarand grave tradition appears to be specifically a Finnic (west Uralic) phenomenon that probably spread from the Volga-Oka region, which is just west of where most people place the Proto-Uralic homeland

- OLS10 belongs to Y-chromosome haplogroup N-L1026, a paternal marker that is especially closely associated with Uralic-speaking populations and probably only appeared in the East Baltic region during the transition from the Bronze Age to the Iron Age

You can find more background info about OLS10 and other relevant samples in Saag et al. 2019 (see here). This is where he sits in my Principal Component Analyses (PCA) focusing on fine scale Northern European genetic diversity. The relevant datasheets are available here and here, respectively.

Note that OLS10 doesn't cluster strongly with any ancient or modern populations. To investigate this in more detail I ran a series of two-way qpAdm analyses, testing tens of ancient individuals and populations as potential admixture sources. These two models stood out above the rest in terms of their statistical fits, chronology and overall plausibility.

Baltic_EST_BA 0.826±0.045
RUS_Sintashta_MLBA_o1 0.174±0.045

chisq 12.527
tail prob 0.564048
Full output

Baltic_EST_BA 0.683±0.102
RUS_Mezhovskaya 0.317±0.102

chisq 13.811
tail prob 0.463864
Full output

Please note that RUS_Sintashta_MLBA_o1 isn't representative of the Sintashta culture population as a whole. It's a group of the most extreme genetic outliers among the Sintashta samples, and they may or may not have been Uralic speakers (see here). Interestingly, the Mezhovskaya culture population is generally associated with the Ugric branch of the Uralic language family.

I was also able to closely replicate these results with the Global25/nMonte method; down to almost one per cent. However, the statistical fits (distances) are poor, probably because the reference populations aren't the real mixture sources. This is in line with the fact that their Y-haplogroups are Q1a, R1a and R1b, rather than any type of N.





I do realize that two Bronze Age samples from Bolshoy Oleni Ostrov, Kola Peninsula, belong to N-L1026, but adding them to my mixture models doesn't help. Little wonder, because the Kola Peninsula lies within the Arctic Circle, and I'm pretty sure that OLS10 and his N-L1026 came from somewhere just north of the mixture cline marked on the map below. Unfortunately, I can't test this directly yet due to the scarcity of ancient samples from this region.

Saturday, May 11, 2019

Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

I've started analyzing the ancient genotype data from the recent Saag et al. paper on the expansion of Uralic languages and associated spread of Siberian ancestry into the East Baltic region. The paper is freely available here and the data are here.

I really like the paper, but I don't agree with the authors' claim that the appearance of Y-chromosome haplogroup N in what is now Estonia and surrounds during the Iron Age is "not matched by a clear shift in autosomal profiles". In my opinion it certainly is, and, as one would expect, it's a shift towards a genetic profile typical of western Uralic speakers.

I'd say that the easiest way to find this signal is with a Principal Component Analysis (PCA) focusing on fine scale genetic substructures within Northern Europe, like the one below. The relevant datasheet is available here.

Note that the East Baltic Iron Age samples, all from burial sites in what is now Estonia, appear to be peeling away from their Bronze Age predecessors and overlapping strongly with present-day Estonians, who are Uralic speakers. Indeed, the PCA suggests to me that the formation of the greater part of the present-day Estonian gene pool took place in the East Baltic during the transition from the Bronze Age to the Iron Age. That is, when Uralic languages are generally accepted to have arrived in the region from near the Ural Mountains in the east.

I was also able to closely replicate these outcomes with my Global25 data using the method described here. However, in this effort, present-day Estonians are clearly more western than the Estonian Iron Age samples (EST_IA), which might be due to the presence of low level Germanic ancestry in Estonia dating to the medieval period. The relevant datasheet is available here.

Interestingly, the Estonian Bronze Age samples (EST_BA) come from stone-cist graves which are widely hypothesized to have been introduced to the East Baltic from the Nordic Bronze Age civilization. I even recall reading a paper on the topic which claimed that the remains buried in such graves were those of Proto-Germanic-speaking Scandinavian migrants. Well, I haven't had a chance to study these samples in any great detail yet, but considering that in both of the PCAs above they're overlapping strongly with Latvian Bronze Age samples (LVA_BA) and sitting far away from the nearest Scandinavians, I'd say they're probably of local stock from way back.

Thursday, May 9, 2019

It was always going to be this way

The native peoples of the East Baltic - Estonians, Latvians and Lithuanians - are genetically alike and their paternal gene pools are dominated by the same two Y-chromosome haplogroups: R1a and N3a.

Linguistically, however, Estonians are a world apart from Latvians and Lithuanians. That's because the Estonian language belongs to the Uralic language family, which has an obvious North Eurasian character. On the other hand, Latvian and Lithuanian are both classified as Indo-European languages, along with the vast majority of other European languages.

The Uralic and Indo-European language families may or may not descend from the same ancestral tongue, but even if they do, their relationship is very distant.

So how is it that Estonians came to speak a Uralic language? As far back as I can remember, the basic explanation accepted by most people was that Uralic speech arrived in what is now Estonia and neighboring Finland during the Bronze Age with migrants, or perhaps invaders, rich in N3a from somewhere around the Ural Mountains. Conversely, Latvians and Lithuanians were generally assumed to have retained the Indo-European speech of their R1a-rich forefathers from the Pontic-Caspian steppe, who colonized much of Eastern Europe north of the steppe during the Late Neolithic.

Ancient DNA has now uncannily corroborated these theories (for instance, see Mittnik et al. 2018 and, published today, Saag et al. 2019). All it took was a handful of samples from a few relevant sites. I think that's awesome; I love it when sensible, long-standing hypotheses are validated by cutting edge science.

I'll have a lot more to say about the spread of Uralic languages and Uralian genes to the East Baltic when I get my hands on the genotype data from the new Saag et al. paper. I also have a post coming soon about the Nordic Bronze Age. Stay tuned.

Update 10/05/2019: Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

Tuesday, May 7, 2019

The execution

Around 2,800 BCE, in what is now southern Poland, a family group of fifteen individuals associated with the Globular Amphora culture (GAC) were massacred. They were probably captured and executed, because each victim was killed with a blow to the head from the same type of weapon, possibly a stone axe, and lacked defensive wounds. The dead were mostly women and children. They were buried in a mass grave, but with great care and very likely by someone who knew them well.

This Late Neolithic mass grave is the focus of a new ancient DNA and archeological research paper at PNAS by Schroeder et al. (see here). The authors tentatively attribute the massacre to the Corded Ware culture (CWC) people, who were expanding rapidly at the time across much of Europe from their homeland on the Pontic-Caspian steppe.

The CWC people may or may not have been responsible; we'll never know for sure. The perpetrators could just as easily have been a competing GAC family group.

In any case, it's interesting to see that the GAC males belong to Y-chromosome haplogroup I2a-L801. This is today a rather uncommon subclade of I2, and almost exclusively found in Germanic-speaking populations, especially Scandinavians. To me this suggests that some Polish GAC males were incorporated into Indo-European-speaking CWC populations that ended up in Scandinavia, and their paternal lineages eventually became a part of the Proto-Germanic gene pool. Admittedly, though, that's just one of many possible scenarios.

Sunday, May 5, 2019

Conan the Barbarian probably belonged to Y-haplogroup R1a

A fresh batch of Iron Age genomes from across the Eurasian steppe is about to be published along with a new paper at Current Biology. The manuscript, titled Shifts in the Genetic Landscape of the Western Eurasian Steppe Associated with the Beginning and End of the Scythian Dominance, is still under review but freely available here.

Most of the male ancients, including two Cimmerians from the North Pontic steppe, in what is now Ukraine, belong to Y-chromosome haplogroup R1a. Wasn't Conan the Barbarian supposed to be a Cimmerian? From the preprint, emphasis is mine:

The Early Iron Age nomadic Scythians have been described as a confederation of tribes of different origins, based on ancient DNA evidence [1-3]. It is still unclear how much of the Scythian dominance in the Eurasian Steppe was due to movements of people and how much reflected cultural diffusion and elite dominance. We present new whole-genome sequences of 31 ancient Western and Eastern Steppe individuals including Scythians as well as samples pre- and postdating them, allowing us to set the Scythians in a temporal context (in the Western/Ponto-Caspian Steppe). We detect an increase of eastern (Altaian) affinity along with a decrease in Eastern Hunter-Gatherer (EHG) ancestry in the Early Iron Age Ponto-Caspian gene pool at the start of the Scythian dominance. On the other hand, samples of the Chernyakhiv culture postdating the Scythians in Ukraine have a significantly higher proportion of Near Eastern ancestry than other samples of this study. Our results agree with the Gothic source of the Chernyakhiv culture and support the hypothesis that the Scythian dominance did involve a demic component.


Out of the 31 samples of this study, 16 are male, and with sufficient Y-chromosome coverage for haplogroup assignment (Table S2). R1a (43%) and I (27%) are the two most frequent Y-chromosome hgs in present-day Ukrainians [142]. R1a is also the predominant lineage among Cimmerians, Scy_Ukr and ScySar_SU in our data, and present among Scy_Kaz as well. Thus, although acknowledging our small sample size, the individuals sampled from archaeological context associated with Scythian identity do not appear to stand out from the context of other groups living in the region before and after them. One notable difference from the present is the absence of hg N, nowadays widespread in the Volga-Uralic region and West Siberia as well as among Mongols and Altaians [165-167]; however, this result is consistent with the absence of hg N among Bronze Age and Eneolithic males from the Steppe [168]. In context of their claimed Altaian homeland it is interesting to note that one Scy_Ukr and the single Sar_Cau sample belong to the Q1c-L332 lineage which is a sub-clade of hg Q1c-L330 that today has peak frequency of 68% in Western Mongolians [169] and occurs at 17% in South Altaians [170] while being very rare (<1%) in East European populations and absent elsewhere (

Järve et al., Shifts in the Genetic Landscape of the Western Eurasian Steppe Associated with the Beginning and End of the Scythian Dominance, Current Biology (preprint), Posted: 6 Mar 2019,

Friday, May 3, 2019

Inferring the linguistic affinity of long dead and non-literate peoples: a multidisciplinary approach

Ancient DNA has treated us to many surprises in recent years. But it has also uncannily corroborated some well established hypotheses that were formulated decades ago from historical linguistics and archeological data. One such hypothesis is that the population associated with the Late Neolithic Corded Ware culture (CWC), and its myriad offshoots, spoke early Indo-European languages and spread them across much of Europe and into the Indian subcontinent.

Below is a series of figures in which I explain why the CWC and its likely close relative, the Sintashta culture, are widely regarded as early Indo-European-speaking cultures, even though their languages aren't attested. To view the images at their maximum size, right click on the thumbs and choose "open link in a new tab".

It's a damn shame that we still don't know where the modern domesticated horse lineage ultimately came from. I'm pretty sure that it came from the Pontic-Caspian steppe, but I was hoping this would be confirmed in the latest paper on horse genomics published today at Current Biology: Tracking Five Millennia of Horse Management with Extensive Ancient Genome Time Series. Nope, the topic wasn't even covered, and no wonder, because the sampling strategy in the paper didn't allow it to be. What we desperately need are samples associated with such archeological cultures as Khvalynsk, Repin, Sredny Stog and Yamnaya. Maybe next time, eh?

Thursday, April 25, 2019

Some myths die hard

Ancient DNA tells us that the Bronze Age wasn't kind to the indigenous populations of Central Asia. It seems to have wiped them out totally. Indeed, Central Asia might well be the only major world region in which native hunter-gatherers failed to make a perceptible impact on the genetics of any extant populations.

Before the Neolithic transition, much of Central Asia was home to hunter-gatherers closely related to those of nearby western Siberia. During the Neolithic, agriculturalists and pastoralists from the Near East gradually moved into the more arable parts of southern and eastern Central Asia, eventually giving rise to the Bactria Margiana Archaeological Complex, or BMAC, and other similar communities.

It's not clear what their relationship was like with the native hunter-gatherers in these areas. But they did mix with them in varying degrees. This is obvious because genome-wide genetic ancestry characteristic of the Botai people, who hunted and eventually domesticated horses on the Kazakh steppe during the 4th millennium BCE, and were probably the archetypal Central Asians for their time, is found at significant levels in a number of later samples from Central Asian farmer and pastoralist sites, such as Dali, Gonur Tepe and Sarazm.

Thus, even though the Neolithic transition did have a big impact on Central Asia, and clearly led to large scale population replacements in some parts of the region, this was just the beginning of these population shifts. Moreover, in some cases the expanding farmer and pastoralist populations seem to have acquired significant indigenous Central Asian ancestry and spread it with them.

The precise geographic extent of the relatively unique Botai-related ancestry in prehistoric Eurasia is still something of a mystery. But to give you a general picture of where it was found from around 6,000 BCE to 2,000 BCE, here's a map with info about samples with significant levels of this type of ancestry from a wide range of sites in space and time.

Going by this map, I'd say it's safe to infer that the Botai-related ancestry was a major feature of practically all forager populations living between the Caspian Sea and the Altai Mountains. It was also present in the Early Bronze Age (EBA) pastoralist population associated with the Steppe Maykop archeological culture of Eastern Europe, so it may have already been in Europe as early as 3,800 BCE, because that's when the Steppe Maykop culture first appeared.

It's an interesting question where the ancestors of the Steppe Maykop herders came from. I once simply assumed that they were closely related to the Maykop people who lived in the Caucasus Mountains. But it's now clear that the populations associated with these two similar cultures were starkly different, with the Maykop people being basically of Near Eastern origin and lacking any discernible Botai-like ancestry. My guess for now is that the Steppe Maykop herders were in large part the descendants of the Kelteminar culture population from just east of the Caspian Sea, but we'll see about that when more ancient DNA comes in.

The other great mystery is what eventually happened to the Steppe Maykop people. Around 3,000 BCE, their culture vanished from the archeological record and their particular genetic signature disappeared from the steppe ancient DNA record. Where did they go? Did they migrate back east?

I don't know, but at about that time other Eastern European steppe herders, those associated with the Yamnaya and Corded Ware archeological cultures, began to stir and migrate in big numbers in basically all directions, including into Steppe Maykop territory. Indeed, unlike the Steppe Maykop population, these groups weren't closely related to any contemporaneous or earlier Central Asians. But they ended up moving into Central Asia, and in a big way too.

Their impact all the way from the Ural Mountains to what are now China and India was profound. For instance, not only did they end up totally replacing the Botai people, but also their horses. For more details on this topic check out the YouTube clip here. I have a strong suspicion that the same sort of thing happened to the aforementioned Steppe Maykop people. In other words, they may have been forced out from the Eastern European steppe, and perhaps sought shelter in the Caucasus Mountains?

Admittedly, I'm not offering anything new here. I just wanted to emphasize a few key points, because I'm still seeing some confusion online about the population history of Central Asia, and especially how it relates to the population history of Europe, and also the Proto-Indo-European homeland question. Make no mistake, thanks to the ancient DNA already available from Central Asia, we can confidently infer the following:

- the chance that the ancient European populations associated with the Yamnaya, Corded Ware and other closely related archeological cultures formed as a result of migrations from Central Asia is zero

- the chance that the Proto-Indo-European homeland was located in Central Asia is zero

- the chance that present-day Europeans, by and large, derive from any ancient Central Asian populations is zero

Monday, April 22, 2019

R1b-M269 in the Bronze Age Levant

The new Harvard genotype datasets that I blogged about recently include a couple of potentially very useful samples from the Levant dated to 1400-1100 BCE. Search for IDs I2062 and I1934 in the anno files here. They're both from an archeological paper about a Late Bronze Age (LBA) burial site in what is now Israel that was published back in 2017 (see here).

Surprisingly, individual I2062 is listed in the anno files as belonging to Y-haplogroup R1b1a1a2, which is also known as R1b-M269. This is a surprise to me because R1b-M269 is closely associated with the Bronze Age expansions of pastoralists from the Pontic-Caspian steppe in Eastern Europe, and these expansions didn't impact the Levant in any direct or significant way.

The Y-haplogroup assignment may or may not be correct. Sometimes the Y-haplogroups in these sorts of datasheets are indeed wrong. Unfortunately, as far as I know, the BAM file for I2062 isn't available anywhere online, so I can't check whether he does really belong to R1b-M269. But, intriguingly, his autosomes do show a subtle signal of Yamnaya-related ancestry from the Pontic-Caspian steppe that is missing in earlier ancients from the Levant.

To characterize his genome-wide ancestry, I first ran a series of unsupervised and supervised analyses with the Global25/nMonte3 method (using this datasheet). For the sake of simplicity, I narrowed things down to the mixture models below based on three reference populations each. Levant_ISR_C is made up of Chalcolithic samples from Israel. The identities of the other reference sets should be obvious to most readers. If confused, feel free to ask for more details in the comments below.


[1] distance%=1.8905


[1] distance%=2.0856


[1] distance%=2.1738
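To give a feel for what a distance-based mixture fit like this involves, here is a minimal sketch that searches for the blend of three reference coordinate vectors lying closest to a target. It is not the nMonte3 code, which is a separate R script with its own sampling scheme. The coordinates are made up for illustration, the function names are hypothetical, and an exhaustive 1% grid stands in for whatever search the real tool performs.

```python
import math
from itertools import product

def mix(sources, weights):
    """Weighted average of the source coordinate vectors."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]

def best_mixture(target, sources, step_percent=1):
    """Try every weight combination summing to 100% and keep the closest one."""
    steps = 100 // step_percent
    best_weights, best_dist = None, math.inf
    for combo in product(range(steps + 1), repeat=len(sources) - 1):
        last = steps - sum(combo)
        if last < 0:
            continue  # weights would exceed 100%
        weights = [c / steps for c in combo] + [last / steps]
        dist = math.dist(target, mix(sources, weights))
        if dist < best_dist:
            best_weights, best_dist = weights, dist
    return best_weights, best_dist

# Hypothetical three-dimensional coordinates (real Global25 data has 25).
refs = [[0.10, 0.02, -0.03], [0.04, -0.06, 0.01], [0.12, 0.09, 0.05]]
target = mix(refs, [0.7, 0.2, 0.1])  # a target built as an exact 70/20/10 blend
weights, dist = best_mixture(target, refs)
print(weights, dist)
```

The smaller the final distance, the better the references jointly approximate the target. A low distance still doesn't prove that the references are the real mixture sources, which is why a formal cross-check is worthwhile.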

To further confirm the reliability of my models, I tested them with the formal statistics-based qpAdm software. As far as I can tell, the output from qpAdm looks very solid across the board.

IRN_Seh_Gabi_C 0.193±0.052
Levant_ISR_C 0.710±0.038
Yamnaya_RUS_Samara 0.098±0.026

chisq 9.304
tail prob 0.67676
Full output

Kura-Araxes_ARM_Kaps 0.249±0.076
Levant_ISR_C 0.681±0.051
Yamnaya_RUS_Samara 0.071±0.035

chisq 11.101
tail prob 0.52032
Full output

Levant_ISR_C 0.661±0.042
Kura-Araxes_RUS_Velikent 0.339±0.042

chisq 7.979
tail prob 0.844942
Full output

Admittedly, even though I2062 can be modeled with Yamnaya-related admixture, he doesn't need to be. Indeed, his ratio of this type of ancestry varies significantly between the models, from around 10% to nothing. This appears to be dependent on the geography of the non-Levant and non-Yamnaya reference populations; the closer they are to the Pontic-Caspian steppe, the smaller the ratio of Yamnaya-related ancestry in I2062. I'd describe this as an artifact of the isolation-by-distance phenomenon, and it totally makes sense, but it prevents me from confirming beyond any doubt that I2062 does harbor genome-wide steppe ancestry. Unfortunately, individual I1934 doesn't offer enough data to be analyzed with the same methods.
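One simple way to see how firmly each model supports the Yamnaya-related component is to divide the estimate by its standard error. The sketch below does this for the two three-way models above. Treating the qpAdm standard errors as roughly normal is an assumption, and the helper name is hypothetical.

```python
def z_score(estimate: float, std_err: float) -> float:
    """Number of standard errors separating an estimate from zero."""
    return estimate / std_err

# Yamnaya_RUS_Samara coefficients and standard errors from the two models above,
# keyed by the other non-Levant reference in each model.
yamnaya_estimates = {
    "with IRN_Seh_Gabi_C": (0.098, 0.026),
    "with Kura-Araxes_ARM_Kaps": (0.071, 0.035),
}

for label, (est, se) in yamnaya_estimates.items():
    print(f"{label}: {est:.3f} is {z_score(est, se):.1f} standard errors above zero")
```

The estimate sits about 3.8 standard errors above zero in the first model but only about 2.0 in the second, which matches the point that the signal weakens as the other reference population gets closer to the steppe.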

Samples associated with the Kura-Araxes or Early Transcaucasian culture are particularly strong references for the eastern ancestry in I2062. This probably isn't a coincidence, and it might also explain his Y-haplogroup, because, at its maximum extent, the territory occupied by the Kura-Araxes culture stretched all the way from the Pontic-Caspian steppe to the southern Levant. The map below is from Wilkinson 2014.

By the way, what's the chance that I2062 is an awesome proxy for the earliest Jews? I reckon it's pretty good, considering that Samaritans from Israel are his closest present-day population in terms of genome-wide affinity. Who wants to test this theory with the Global25? If I see some good stuff in the comments I'll post it here in an update.

Thursday, April 18, 2019

Early chariot riders of Transcaucasia came from...

I'm finding it increasingly difficult nowadays to fully appreciate all of the ancient DNA samples that are accumulating in my dataset. But it's not entirely my fault.

Among the hundreds of ancient samples published last year there were a couple of Middle Bronze Age (MBA) individuals from what is now Armenia labeled "Lchashen Metsamor" (see here). I wasn't planning to do much with these samples because, even after reading the Nature paper that they came with a couple of times over, I didn't have a clue what they were about. But after some digging around, I now know that their people, those associated with the Lchashen Metsamor archeological culture, were among the earliest in Transcaucasia, and indeed the Near East, to use the revolutionary spoked-wheel horse chariot. How awesome is that?

The invention of the spoked-wheel chariot is generally credited to the Middle Bronze Age Sintashta culture of the Trans-Ural steppe in Central Asia, and its rapid spread is often associated with the early expansions of Indo-European languages deep into Asia. On the other hand, some have argued that this type of chariot was first developed in the Near East, and directly derived from solid-wheeled wagons pulled by donkeys.

It's now obvious, thanks to ancient DNA, that the Sintashta people were by and large migrants to Central Asia from somewhere in Eastern Europe, and that they didn't harbor any recent ancestry from the Near East. So if chariot technology spread into the steppes from the Near East, then it did so without any accompanying gene flow, which is possible but not entirely convincing. This raises the question of whether the Lchashen Metsamor population was of Sintashta-related origin, because if it was, then this would corroborate the consensus that spoked-wheel chariots were introduced into Transcaucasia from the steppes to the north.

Below is a Principal Component Analysis (PCA) of West Eurasian genetic variation. It does suggest that the Lchashen Metsamor pair (labeled Armenia_MBA_Lchashen), as well as most of the other currently available samples from what is now Armenia dating to the Middle to Late Bronze Age (MLBA), harbor some steppe ancestry. That's because they appear to form a cline between samples associated with the Sintashta and Kura-Araxes cultures. Of course, the Kura-Araxes culture was a major Early Bronze Age (EBA) archeological phenomenon centered on Transcaucasia and surrounds, so its population can be reasonably assumed to have formed the genetic base of most subsequent populations in the region. The relevant PCA datasheet is available here.

To investigate the possibility of Sintashta-related admixture in Lchashen Metsamor with formal methods, I ran a series of mixture models with the qpAdm software. Here are the three statistically most sound outcomes that I was able to come up with for Lchashen Metsamor:

CWC_Kuyavia 0.183±0.036
Kura-Araxes_Kaps 0.817±0.036
chisq 13.941
tail prob 0.378021
Full output

Balkans_BA_I2163 0.193±0.045
Kura-Araxes_Kaps 0.807±0.045

chisq 14.780
tail prob 0.321267
Full output

Kura-Araxes_Kaps 0.788±0.043
Sintashta_MLBA 0.212±0.043

chisq 14.871
tail prob 0.315451
Full output

I sorted the output by "tail prob", but the fact that Sintashta_MLBA is in third place isn't a problem because the stats in all of these models are basically identical. Indeed, CWC_Kuyavia (Corded Ware culture samples from present-day Kuyavia, North-Central Poland) and Balkans_BA_I2163 (a Bronze Age singleton from what is now Bulgaria) are both very similar and probably closely related to each other and to the Sintashta samples.

Interestingly, and, I'd say, importantly, ancients from the steppe that are closest to Lchashen Metsamor in both space and time, but not particularly closely related to the Sintashta people, don't work too well as a mixture source in such models.

Kubano-Tersk 0.184±0.046
Kura-Araxes_Kaps 0.816±0.046

chisq 22.179
tail prob 0.0526526
Full output

A couple of months ago I suggested that populations associated with the Early to Middle Bronze Age (EMBA) Catacomb culture were the vector for the spread of steppe ancestry into what is now Armenia during the MLBA (see here). After taking a closer look at the Lchashen Metsamor samples, I now think that the peoples of the Sintashta and related cultures were also important in this process. If so, they may have moved from the steppe into Transcaucasia both from the west via the Balkans and the east via Central Asia, and brought with them spoked-wheel chariots. I don't have a clue what language they spoke, but I'm guessing that it may have been something Indo-European.

Friday, April 12, 2019

Armenians vs Georgians

Armenians and Georgians are ethnic groups that live side by side in the south Caucasus, or Transcaucasia. By all accounts, they've both been there since prehistoric times and they're very similar in terms of overall genetic structure.

However, they speak languages from totally unrelated families: Indo-European and Kartvelian, respectively. How did this happen and might the answer lie in the small genetic differences that do exist between them?

To investigate this issue, I ran a series of qpAdm formal mixture models of present-day Armenians and Georgians using tens of ancient reference populations. To keep the results as straightforward and meaningful as possible, I constrained myself to two-way models. I then discarded the runs that produced "tail probs" under 0.1 or retained fewer than 400K SNPs. Only a handful of models passed muster, including these two, for Armenians and Georgians respectively:

Mycenaeans_&_Empuries2 0.233±0.041
Kura-Araxes_Kaps 0.767±0.041

chisq 18.422
tail prob 0.142151
Full output

Globular_Amphora 0.071±0.025
Kura-Araxes_Kaps 0.929±0.025

chisq 18.419
tail prob 0.142266
Full output

At the most basic level, the results suggest that both Armenians and Georgians are overwhelmingly derived from populations of Bronze Age Transcaucasia associated with the Kura-Araxes archeological culture, albeit with minor ancestries from somewhat different sources from the west. As far as I can see, when using more than 400K SNPs and a wide range and large number of outgroups (or right pops), neither Armenians nor Georgians can pass perfectly for any one ancient population in my dataset.
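
The filtering rule described above (two-way models only, tail prob of at least 0.1, at least 400K SNPs) can be sketched roughly as follows. This is only an illustration, not the script I used. The class and field names are hypothetical, and so are the SNP counts and the two "Hypothetical" runs, because the post doesn't list them. The targets are inferred from the discussion of the two models.

```python
from dataclasses import dataclass

MIN_TAIL_PROB = 0.1   # runs below this are discarded
MIN_SNPS = 400_000    # runs retaining fewer SNPs are discarded


@dataclass
class QpAdmRun:
    target: str
    sources: tuple
    tail_prob: float
    n_snps: int  # hypothetical values below; not given in the post


def passes(run: QpAdmRun) -> bool:
    """Keep two-way models that clear both thresholds."""
    return (
        len(run.sources) == 2
        and run.tail_prob >= MIN_TAIL_PROB
        and run.n_snps >= MIN_SNPS
    )


runs = [
    # Tail probs copied from the two models above; SNP counts are made up.
    QpAdmRun("Armenian", ("Mycenaeans_&_Empuries2", "Kura-Araxes_Kaps"), 0.142151, 450_000),
    QpAdmRun("Georgian", ("Globular_Amphora", "Kura-Araxes_Kaps"), 0.142266, 450_000),
    # Entirely hypothetical runs that fail one threshold each.
    QpAdmRun("Armenian", ("Hypothetical_A", "Hypothetical_B"), 0.03, 500_000),
    QpAdmRun("Georgian", ("Hypothetical_C", "Hypothetical_D"), 0.40, 250_000),
]

# Sort the survivors by tail prob, highest first.
kept = sorted((r for r in runs if passes(r)), key=lambda r: r.tail_prob, reverse=True)
for run in kept:
    print(run.target, " + ".join(run.sources), run.tail_prob)
```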

The best proxies for the minor but significant western ancestry in Armenians are Mycenaeans of the Bronze Age Aegean region and Greek colonists from Iron Age Iberia (Empuries2). Obviously, and perhaps importantly, these are both attested Indo-European-speaking groups. On the other hand, the very minor western ancestry in Georgians is best characterized as gene flow from Middle to Late Neolithic European farmers rich in indigenous European forager ancestry. It's practically impossible to say what language or languages these farmers spoke. How about something Kartvelian?

In any case, for me, the perplexing thing about present-day Armenians is that they harbor very little steppe ancestry. By and large, no more than a few per cent. Compare that to the currently available samples from what is now Armenia dating to the Middle to Late Bronze Age, which show ratios of steppe ancestry of up to 25%. For now, I'm guessing that what we're dealing with here is the classic bounce back of older ancestry layers that has been documented for different parts and periods of prehistoric Europe.

Sunday, April 7, 2019

On the association between Uralic expansions and Y-haplogroup N

Almost all present-day populations speaking Uralic languages show moderate to high frequencies of Y-chromosome haplogroup N. I reckon there are two likely explanations for this:

- the speakers of Proto-Uralic were rich in N because they lived in an area, probably somewhere around the Ural Mountains, where it was common, and they spread it with them as they expanded from their homeland

- Uralic languages often came to be spoken in areas of North Eurasia where N was already found at moderate to high frequencies

The major exception to this rule is the Hungarians, whose language belongs to the Ugric branch of Uralic. Their frequency of N is close to zero and they don't differ much in terms of overall genetic structure from their Indo-European-speaking neighbors in East Central Europe.

This is an issue that has generated much debate over the years about the nature of Uralic expansions, who the Hungarians really were, and how the Hungarian language came to be spoken in the heart of Europe (for instance, see here).

But I never understood what the fuss was about, because based on historical sources alone it seemed rather obvious that Hungarian was introduced into the Carpathian Basin during the Middle Ages by a relatively small number of invaders from the east, probably from somewhere around the Ural Mountains, who imposed it on local Indo-European-speaking populations.

As far as I can remember, this has always been the academic consensus, and the results from one of the first ancient DNA studies of human remains soundly corroborated it. Back in 2008, Csányi et al. reported that two out of four skeletons from elite Hungarian conqueror graves dating to the 10th century carried the Tat C allele, which meant that they belonged to Y-haplogroup N (see here).

We've since had to wait over a decade to get a more comprehensive look at the Y-chromosome haplogroups of medieval Hungarians. The most useful effort to date, a manuscript courtesy of Neparáczki et al., was posted this week at bioRxiv (see here).

The results in the preprint suggest a much more complex picture than simply a migration of an obviously Uralic-speaking population rich in Y-haplogroup N into the medieval Carpathian Basin. But they do confirm the presence of N in Hungarian conqueror elites, and, in fact, of very specific subclades of N that link them to the present-day speakers of Uralic languages from around the Ural Mountains. Here are some pertinent quotes from the preprint:

Three Conqueror samples belonged to Hg N1a1a1a1a2-Z1936, the Finno-Permic N1a branch, being most frequent among northeastern European Saami, Finns, Karelians, as well as Komis, Volga Tatars and Bashkirs of the Volga-Ural region. Nevertheless this Hg is also present with lower frequency among Karanogays, Siberian Nenets, Khantys, Mansis, Dolgans, Nganasans, and Siberian Tatars 23.


It is generally accepted that the Hungarian language was brought to the Carpathian Basin by the Conquerors. Uralic speaking populations are characterized by a high frequency of Y-Hg N, which have often been interpreted as a genetic signal of shared ancestry. Indeed, recently a distinct shared ancestry component of likely Siberian origin was identified at the genomic level in these populations, modern Hungarians being a puzzling exception 36. The Conqueror elite had a significant proportion of N Hgs, 7% of them carrying N1a1a1a1a4-M2118 and 10% N1a1a1a1a2-Z1936, both of which are present in Ugric speaking Khantys and Mansis 23.


Population genetic data rather position the Conqueror elite among Turkic groups, Bashkirs and Volga Tatars, in agreement with contemporary historical accounts which denominated the Conquerors as “Turks” 38. This does not exclude the possibility that the Hungarian language could also have been present in the obviously very heterogeneous, probably multiethnic Conqueror tribal alliance.

Indeed, a large proportion of the 44 males from elite Hun, Avar and Hungarian conqueror burials analyzed in the study belonged to Y-haplogroups that can't be plausibly associated with the earliest Uralic speakers, but rather with those of various Indo-European languages, such as I1 and R1b-U106 (these are Germanic-specific markers), I2a-L621 and R1a-CTS1211 (obviously Slavic) and R1a-Z2124 (largely Eastern Iranian).

If most of these results aren't due to contamination, then it's likely that both the early Hungarian commoners and elites were, by and large, derived from Indo-European-speaking populations. No wonder then, that present-day Hungarians are basically indistinguishable genetically from their Indo-European-speaking neighbors and, like them, show hardly any Y-haplogroup N.

Thursday, April 4, 2019

Downloadable genotypes of present-day and ancient DNA data

They're freely available via the Harvard Medical School at this LINK. The linked web page includes this message:

We would be grateful if users of this dataset could alert us to any errors they detect and help us to fill in missing data. This could include: (1) errors or missing information for location, latitude, longitude, archaeological context, date, and group label, (2) concerns about Y chromosome or mitochondrial DNA haplogroup determinations, and (3) evidence for other problems in the data or annotations for individuals. Please write to Swapan 'Shop' Mallick and David Reich with any suggestions. We would also be grateful if members of the community could suggest additional content that would be helpful to add to this page to make it maximally useful. Finally, please let us know if there is any ancient DNA data we should be including that we have missed.

By the way, I've updated my Global25 datasheets with many of the samples from this new Harvard release. Same links as always...

Global 25 datasheet (scaled)

Global 25 pop averages (scaled)

Global 25 datasheet

Global 25 pop averages

Sunday, March 31, 2019

Map of pre-Corded Ware culture (>2900 BCE) instances of Y-haplogroup R1a

Below is a map showing the global distribution of Y-chromosome haplogroup R1a prior to the expansions of the R1a-rich Corded Ware culture (CWC) people and their descendants across Europe and Asia from around 2900 BCE. I'll be updating this map regularly and using it to help me narrow down the options for the place of origin of R1a, and also to counter the misinformation about this topic that has appeared in print and online over the years, including in many scientific publications and popular websites such as Wikipedia.

Incredibly, as far as I know, there are just six reliably called instances of R1a in the now ample Eurasian ancient DNA record dating to the pre-CWC period. To put this into perspective, consider that R1a is today the most common Y-haplogroup in much of Europe and Asia. How did that happen, I wonder? Please note, however, that I chose to base the map only on samples sequenced with the capture and shotgun methods, rather than the PCR method, which is susceptible to producing contaminated results and is no longer used in major ancient DNA studies.

Monday, March 25, 2019

Celtic probably not from the west

The term "Celtic from the west" is the catchphrase for a working theory, offered in a couple of recent books, positing that the earliest speakers of Celtic languages lived in Atlantic Europe during the Bronze Age or even earlier. It'll be interesting to see how this theory holds up against increasing numbers of ancient samples from attested early Celtic-speaking populations.

More popular and long-standing theories postulate that the Proto-Celts are associated with the Urnfield and/or Hallstatt archeological cultures of Late Bronze Age and Iron Age Central Europe. I'm inclined to agree with these more mainstream views when looking at my qpAdm mixture models below of three Celtiberians from what is now La Hoya, northern Spain, from the recent Olalde et al. paper on the genomic history of Iberia.

Halberstadt_LBA 0.207±0.077
Pre-Celtiberian_LaHoya 0.793±0.077

chisq 15.031
tail prob 0.522396
Full output

Halberstadt_LBA 0.196±0.074
Non-Celtic_Iberian 0.804±0.074

chisq 17.366
tail prob 0.362297
Full output

The Celtiberians show a stronger signal of (Urnfield-related?) ancestry from the northeast than their Bronze Age predecessors in northern Iberia (Pre-Celtiberian_LaHoya) as well as their Iron Age contemporaries from eastern Iberia (Non-Celtic_Iberian). The latter group very likely spoke the non-Indo-European Iberian language. It's not clear what the Bronze Age northern Iberians spoke, but it may have been a language related to Basque, which is also non-Indo-European.

Of course, the fact that the Celtiberians harbored more northern Bell Beaker-related ancestry than basically all earlier Iberian groups was already reported in the Olalde et al. paper (on page 2), but I just wanted to see if I could flesh out some more details in regards to this observation by using chronologically and archeologically more proximate reference populations.
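
As a rough sanity check on the two models above, it can help to see how many standard errors the Halberstadt_LBA coefficient sits above zero in each. The sketch below just divides each coefficient by its standard error. It's a back-of-the-envelope heuristic of my own, not a test that qpAdm reports, and the function name is hypothetical.

```python
# Coefficients and standard errors copied from the two Celtiberian models above.
estimates = {
    "Halberstadt_LBA (with Pre-Celtiberian_LaHoya)": (0.207, 0.077),
    "Halberstadt_LBA (with Non-Celtic_Iberian)": (0.196, 0.074),
}


def z_score(coefficient: float, standard_error: float) -> float:
    """Number of standard errors separating the coefficient from zero."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return coefficient / standard_error


z_scores = {label: z_score(c, se) for label, (c, se) in estimates.items()}
for label, z in z_scores.items():
    print(f"{label}: {z:.2f} standard errors above zero")
```

Both coefficients come out more than two standard errors above zero, which fits with the northeastern signal being real but modest.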

Saturday, March 23, 2019

How did Y-haplogroup J2b get to Europe?

Y-haplogroup J2b, defined by the L282 mutation, is found throughout Europe and reaches relatively high frequencies in the southeastern part of the continent. But the question of how and when it got to Europe is still wide open.

It's certainly native to the Near East, where all of the main subclades of Y-haplogroup J2 show more structure than anywhere else. Indeed, it's first attested in the ancient DNA record in an Early Neolithic sample from the Zagros Mountains, in what is now western Iran, dating to ~8,000 calBCE.

It doesn't appear outside of this region until a few thousand years later, when it's recorded in an Early Bronze Age sample dating to ~2,300 calBCE from a site near the Mediterranean Sea in present-day Jordan.

In Europe, it's first attested in a Middle Bronze Age sample from the Caucasus Mountains, in what is now southern Russia, dating to ~1900 calBCE. However, this individual's burial site is practically in the Near East, and, in fact, in terms of ancestry and archeology he is best described as Near Eastern. Importantly, he's also not directly associated with any population that contributed to the genetic structure of Europeans (for instance, see here).

J2b first appears deep in Europe a little later during the Middle Bronze Age, in several samples from sites near the Mediterranean coast in what are now Croatia and Sardinia. This is obviously nowhere near the Caucasus, but it is in a part of Europe that was linked to the Near East at the time via extensive maritime trade networks. Interestingly, however, all of these individuals are genetically very typical of where and when they lived, in that they don't show any obvious recent foreign admixture.

So how did Y-haplogroup J2b get to Europe? My view for now is that it mostly arrived with a few sailors from the Near East during the Early to Middle Bronze Age. This is just about the only plausible theory that I can come up with when looking at this map.

The idea that J2b moved deep into Europe along with the population movements of early pastoralists from the Pontic-Caspian steppe seems to be fairly popular online. However, it currently has no support from ancient DNA. In fact, it's downright contradicted by ancient DNA, because J2b is missing in tens of samples from a wide range of archeological cultures associated with these population movements. If anyone out there disagrees, then please show me a single instance of J2b in samples from the Khvalynsk, Sredny Stog, Yamnaya, Poltavka, Corded Ware, Bell Beaker, Catacomb, Srubnaya and other closely related ancient European steppe and steppe-derived cultures.

Thursday, March 21, 2019

Ancient island hopping in the western Mediterranean (Fernandes et al. 2019 preprint)

Over at bioRxiv at this LINK. Here's the abstract, emphasis is mine:
A series of studies have documented how Steppe pastoralist-related ancestry reached central Europe by at least 2500 BCE, while Iranian farmer-related ancestry was present in Aegean Europe by at least 1900 BCE. However, the spread of these ancestries into the western Mediterranean where they have contributed to many populations living today remains poorly understood. We generated genome-wide ancient DNA from the Balearic Islands, Sicily, and Sardinia, increasing the number of individuals with reported data from these islands from 3 to 52. We obtained data from the oldest skeleton excavated from the Balearic islands (dating to ~2400 BCE), and show that this individual had substantial Steppe pastoralist-derived ancestry; however, later Balearic individuals had less Steppe heritage reflecting geographic heterogeneity or immigration from groups with more European first farmer-related ancestry. In Sicily, Steppe pastoralist ancestry arrived by ~2200 BCE and likely came at least in part from Spain as it was associated with Iberian-specific Y chromosomes. In Sicily, Iranian-related ancestry also arrived by the Middle Bronze Age, thus revealing that this ancestry type, which was ubiquitous in the Aegean by this time, also spread further west prior to the classical period of Greek expansion. In Sardinia, we find no evidence of either eastern ancestry type in the Nuragic Bronze Age, but show that Iranian-related ancestry arrived by at least ~300 BCE and Steppe ancestry arrived by ~300 CE, joined at that time or later by North African ancestry. These results falsify the view that the people of Sardinia are isolated descendants of Europe's first farmers. Instead, our results show that the island's admixture history since the Bronze Age is as complex as that in many other parts of Europe.

Fernandes et al., The Arrival of Steppe and Iranian Related Ancestry in the Islands of the Western Mediterranean, bioRxiv, posted March 21, 2019, doi:

Update: Another preprint on a similar theme by Marcus et al. has appeared at bioRxiv (see here).

Abstract: Recent ancient DNA studies of western Eurasia have revealed a dynamic history of admixture, with evidence for major migrations during the Neolithic and Bronze Age. The population of the Mediterranean island of Sardinia has been notable in these studies -- Neolithic individuals from mainland Europe cluster more closely with Sardinian individuals than with all other present-day Europeans. The current model to explain this result is that Sardinia received an initial influx of Neolithic ancestry and then remained relatively isolated from expansions in the later Neolithic and Bronze Age that took place in continental Europe. To test this model, we generated genome-wide capture data (approximately 1.2 million variants) for 43 ancient Sardinian individuals spanning the Neolithic through the Bronze Age, including individuals from Sardinia's Nuragic culture, which is known for the construction of numerous large stone towers throughout the island. We analyze these new samples in the context of previously generated genome-wide ancient DNA data from 972 ancient individuals across western Eurasia and whole-genome sequence data from approximately 1,500 modern individuals from Sardinia. The ancient Sardinian individuals show a strong affinity to western Mediterranean Neolithic populations and we infer a high degree of genetic continuity on the island from the Neolithic (around fifth millennium BCE) through the Nuragic period (second millennium BCE). In particular, during the Bronze Age in Sardinia, we do not find significant levels of the "Steppe" ancestry that was spreading in many other parts of Europe at that time. We also characterize subsequent genetic influx between the Nuragic period and the present. We detect novel, modest signals of admixture between 1,000 BCE and present-day, from ancestry sources in the eastern and northern Mediterranean. 
Within Sardinia, we confirm that populations from the more geographically isolated mountainous provinces have experienced elevated levels of genetic drift and that northern and southwestern regions of the island received more gene flow from outside Sardinia. Overall, our genetic analysis sheds new light on the origin of Neolithic settlement on Sardinia, reinforces models of genetic continuity on the island, and provides enhanced power to detect post-Bronze-Age gene flow. Together, these findings offer a refined demographic model for future medical genetic studies in Sardinia.

Marcus et al., Population history from the Neolithic to present on the Mediterranean island of Sardinia: An ancient DNA perspective, bioRxiv, posted March 21, 2019, doi:

Monday, March 18, 2019

Open thread: What are the linguistic implications of Olalde et al. 2019?

I was going to write a huge post on the linguistic implications of the latest batch of ancient DNA from Iberia courtesy of Olalde et al. 2019, and then I thought better of it. Admittedly, I don't know enough about the languages of prehistoric Iberia to say anything really useful on the topic. So instead here's an open thread to bounce around a few ideas in the comments.

Just briefly, this is what Olalde et al. say in the abstract of their paper about the relationship between ancestry from the Pontic-Caspian steppe and languages in Iron Age Iberia:

We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia.

However, in the paper it's revealed that "Indo-European regions" actually refers to a Celtic-speaking part of northern Iberia. And it's quite possible that Celts moved into this area from outside of Iberia only during the Iron Age. In other words, the speakers of Indo-European languages here may not have been the descendants of any of the people with steppe ancestry who came to Iberia by ~2000 BCE.

So I'm probably not alone in thinking that the question of the linguistic affinities of these early migrants with steppe ancestry to Iberia (mostly associated with the Bell Beaker culture or BBC) remains open, especially since they evidently had such a profound genetic impact on the later non Indo-European-speaking populations of eastern and southern Iberia. Could they have been the speakers of unattested Indo-European languages, as well as Proto-Iberian and Proto-Basque? If not, why not?

Below is a Principal Component Analysis (PCA) of West Eurasian genetic variation. I highlighted some of the ancient samples from Olalde et al., as well as Basques and other present-day Iberians. The Basques form a tight cluster with most of the Copper, Bronze and Iron Age Iberians, and, unlike the other present-day Iberians, they basically look like an Iberian population from the metal ages. The relevant datasheet is available here.

This is nothing new and very much in line with the results in Olalde et al., but I wanted to emphasize the point that Basques were not just a group that experienced an extreme founder effect in R1b-P312, which is a Beaker-specific Y-chromosome lineage. Rather, they're still very similar to Iberian Beakers in terms of overall genetic structure. So where did they get their language?

Saturday, March 16, 2019

Let's try a formal heuristic approach

I created a massive outgroup f3-statistics matrix, featuring almost 300 ancient and present-day populations and individuals, for the purpose of running unsupervised, or at least semi-supervised, fine scale mixture tests with nMonte. Most of the stats were computed with 400-900K SNPs, which is a lot and should provide plenty of power. The matrix is available in a zip file here.

The results I'm getting with this new setup are very similar to those obtained with the Global25. The main differences, as far as I can see for now, are that the f3 data produce more stable results when modeling very deep ancestry, while the Global25 provides more accuracy when modeling fine scale recent ancestry (probably because it's better at picking up more recent genetic drift).

Let's investigate some pertinent issues with the new data using nMonte and PAST. How about we start with these?

- where did Bell Beakers get their steppe ancestry from?

- which Steppe_MLBA group did Indians get their steppe ancestry from?

- do the present-day Irish have any Hallstatt ancestry?

- what is the origin of present-day Basques?

- what is the precise ancestry of Armenia_ChL?

- do the Swat Iron Age samples really lack BMAC ancestry?

- does Anatolia_MLBA really lack steppe ancestry?

Note that the f3 matrix includes the ancients from the new Olalde et al. paper on the genomic history of Iberia (see here). I've also updated the Global25 datasheets with most of these samples.

Global 25 datasheet (scaled)

Global 25 pop averages (scaled)

Global 25 datasheet

Global 25 pop averages

By the way, Hajji_Firuz_ChL I2327, from Narasimhan et al. 2018, is now labeled Hajji_Firuz_IA in the above datasheets, because my understanding is that he's actually from the Iron Age rather than the Chalcolithic period. For background reading about this controversial sample see here and here. I don't have any more info on this topic; we'll just have to wait for the formal publication of the Narasimhan et al. manuscript to get all the details. Apparently it's coming very soon.

Thursday, March 14, 2019

Two new papers on ancient Iberia

Olalde et al. 2019 (Science) at this LINK...

Abstract: We assembled genome-wide data from 271 ancient Iberians, of whom 176 are from the largely unsampled period after 2000 BCE, thereby providing a high-resolution time transect of the Iberian Peninsula. We document high genetic substructure between northwestern and southeastern hunter-gatherers before the spread of farming. We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non-Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia. Additionally, we document how, beginning at least in the Roman period, the ancestry of the peninsula was transformed by gene flow from North Africa and the eastern Mediterranean. DOI: 10.1126/science.aav4040

Villalba-Mouco et al. 2019 (Current Biology) at this LINK...

Summary: The Iberian Peninsula in southwestern Europe represents an important test case for the study of human population movements during prehistoric periods. During the Last Glacial Maximum (LGM), the peninsula formed a periglacial refugium [1] for hunter-gatherers (HGs) and thus served as a potential source for the re-peopling of northern latitudes [2]. The post-LGM genetic signature was previously described as a cline from Western HG (WHG) to Eastern HG (EHG), further shaped by later Holocene expansions from the Near East and the North Pontic steppes [3, 4, 5, 6, 7, 8, 9]. Western and central Europe were dominated by ancestry associated with the ∼14,000-year-old individual from Villabruna, Italy, which had largely replaced earlier genetic ancestry, represented by 19,000–15,000-year-old individuals associated with the Magdalenian culture [2]. However, little is known about the genetic diversity in southern European refugia, the presence of distinct genetic clusters, and correspondence with geography. Here, we report new genome-wide data from 11 HGs and Neolithic individuals that highlight the late survival of Paleolithic ancestry in Iberia, reported previously in Magdalenian-associated individuals. We show that all Iberian HGs, including the oldest, a ∼19,000-year-old individual from El Mirón in Spain, carry dual ancestry from both Villabruna and the Magdalenian-related individuals. Thus, our results suggest an early connection between two potential refugia, resulting in a genetic ancestry that survived in later Iberian HGs. Our new genomic data from Iberian Early and Middle Neolithic individuals show that the dual Iberian HG genomic legacy pertains in the peninsula, suggesting that expanding farmers mixed with local HGs. DOI:

Thursday, March 7, 2019

A challenge

The datasheets below contain outgroup f3-statistics for a wide range of ancient and present-day populations. Five of the ancient groups and individuals are labeled "Unknown". In fact, I do know what they are, but I'd like you to try and work out whether they were the speakers of Indo-European or non-Indo-European languages by analyzing the datasheets with, say, PAST or nMonte.



I'll reveal the identities and likely languages of the mystery ancients in a couple of days. It'll be interesting to see if any of you nail this challenge. It shouldn't be too difficult, but to help things along, I color coded the populations in the datasheets (black = Indo-European, blue = Uralic, and grey = neither). If you haven't done this sort of thing before, these blog posts might be useful as background reading.

Maykop: a multi-ethnic layer cake?

Global25 PAST-compatible datasheets

D-stats/nMonte open thread

Update 09/03/2019: Samuel nailed the challenge in the first post below. And then Matt almost figured out the precise identities of the mystery ancients here. In hindsight I should've made this more difficult. Here are the answers:

Unknown1 = England_Anglo-Saxon (Indo-European) > more here
Unknown2 = Levanluhta_IA (non-Indo-European) > more here
Unknown3 = Minoan_Lasithi (non-Indo-European) > more here
Unknown4 = Slavic_Bohemia (Indo-European) > more here
Unknown5 = Turkmenistan_IA (Indo-European) > more here

Monday, March 4, 2019

An exceptional burial indeed, but not that of an Indo-European

Not too many people have been buried sitting on wagons. The most famous case is that of an Early Bronze Age man who, considering his injuries, may have died in a high-speed crash - high-speed for its time anyway - on the Pontic-Caspian steppe in Eastern Europe.

It's likely that this guy was one of the very first wagon-drivers in human history, because his four-wheeled wooden model is dated to 3336-3105 calBCE, which makes it the oldest wagon discovered thus far. His genotype data, under the label Steppe Maykop SA6004, were published recently along with Wang et al. 2019.

Early wagons are very important for a couple of reasons: they revolutionized human transport and warfare, and they're often closely associated with the prehistoric expansions of Indo-European languages.

So I'm pretty sure that many of you must be thinking right now that wagon-driver SA6004 was an early Indo-European, or even a Proto-Indo-European! I bet that's what Wang et al. thought too, considering the conclusion in their paper. But, alas, the chances of this are slim to none.

Steppe Maykop samples show rather peculiar genetic structure considering their geographic origin, with a large proportion of their ancestry deriving from a source closely related to western Siberian hunter-gatherers (aka West_Siberia_N in the ancient DNA record). Indeed, SA6004 basically looks like a 50/50 mix between West_Siberia_N and Piedmont_Eneolithic. Here's a map with all of the relevant details.

Thus, clearly, the Steppe Maykop population wasn't ancestral or even directly related to the steppe and steppe-derived groups generally regarded to have been Indo-European speaking, such as those associated with the Yamnaya, Corded Ware, and Bell Beaker cultures. That's because these groups lack any discernible West_Siberia_N-related ancestry.

It also wasn't ancestral or directly related to any present-day or currently sampled ancient Indo-European speaking populations, again because these populations basically lack West_Siberia_N-related ancestry.

On the other hand, Yamnaya, Corded Ware and other closely related groups show an exceptionally strong genetic relationship with Indo-European speakers, especially those from across Northern Europe, which experienced massive migrations from the Pontic-Caspian steppe during the late Neolithic period, and hardly anything from elsewhere since then.

Case in point, the samples from Wang et al. labeled Yamnaya Caucasus were recovered from the same area of the Pontic-Caspian as their Steppe Maykop samples, and yet, take a look at this linear model based on outgroup f3-statistics. Steppe Maykop does show high genetic affinity to Indo-European speakers (no doubt mediated via its Piedmont_Eneolithic-related ancestry), but, unlike Yamnaya Caucasus, it also shows unusually high affinity for a West Eurasian population to Native Americans and Siberians. The relevant datasheet is available here.

So the only way the Steppe Maykop population could have been Indo-European-speaking is if it inherited its Indo-European speech from its Piedmont_Eneolithic-related ancestors. And even if it was Indo-European-speaking, it probably spoke an extinct Indo-European language not closely related to any extant Indo-European languages. In other words, the possibility that Steppe Maykop passed on its language to Yamnaya, along with its wagons, is close to zero. More likely, Yamnaya stole a few wagons from Steppe Maykop, and the rest is history.

Saturday, March 2, 2019

Maykop: a multi-ethnic layer cake?

Let's speculate about the linguistic affinities of the currently available ancient populations from the Caucasus and surrounds. I put together a series of outgroup f3-stats to help things along. They're available for download here.

Georgian 0.258224
Abkhasian 0.257899
Latvian 0.257376
Swedish 0.257301
Turkish_Trabzon 0.256996
Basque_Spanish 0.256589
Chechen 0.256514
Icelandic 0.256418
Norwegian 0.256325
Lezgin 0.256272
Irish 0.256227
Tabasaran 0.256092
Italian_Bergamo 0.25605
English_Cornwall 0.256032
Polish_East 0.255991
Scottish 0.255955
Adygei 0.255913

Latvian 0.261845
Russian_North 0.26145
Estonian 0.260355
Finnish 0.260211
Lithuanian 0.260072
Udmurd 0.259804
Ingrian 0.259663
Surui 0.259637
Vepsa 0.259608
Karelian 0.259532
Karitiana 0.259482
Russian_West 0.259397
Russian_Central 0.259274
Wichi 0.259106
Saami 0.258982
Komi 0.258945
Icelandic 0.258854
Swedish 0.258814
Mordovian 0.258604
Irish 0.25859
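
As a rough reminder of what values like these measure, here is a minimal Python sketch of an outgroup f3-statistic computed from allele frequencies. It uses the textbook form f3(O; A, B) = mean over SNPs of (o - a)(o - b), where O is the outgroup. The allele frequencies are invented for illustration, the function name is hypothetical, and real tools such as qp3Pop also apply sample-size corrections and estimate standard errors, which this sketch skips.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Average of (o - a) * (o - b) across SNPs (no bias correction)."""
    if not (len(outgroup) == len(pop_a) == len(pop_b)):
        raise ValueError("frequency lists must have the same length")
    products = [(o - a) * (o - b) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)


# Toy allele frequencies at four SNPs (invented, not real data).
outgroup = [0.0, 0.0, 1.0, 0.5]
ancient = [0.5, 0.2, 0.5, 0.5]
modern_close = [0.5, 0.1, 0.4, 0.5]  # drifted together with the ancient group
modern_far = [0.1, 0.0, 0.9, 0.5]    # stayed close to the outgroup

f3_close = outgroup_f3(outgroup, ancient, modern_close)
f3_far = outgroup_f3(outgroup, ancient, modern_far)
print(f3_close, f3_far)
```

A higher value means more drift shared by the two test populations since they split from the outgroup, which is why the lists above are sorted from high to low.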

Eyeballing the stats might be enough to get a general impression about what they mean, but to understand them properly it's necessary to get technical with something like PAST3 (see here). That's because f3-stats pick up shared genetic drift from all drift paths, and don't especially focus on more recently shared ancestry. This can often lead to confusing outcomes.
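
The basic idea behind such a linear model can be sketched in a few lines of Python. This is only an illustration, not the PAST3 workflow: it fits an ordinary least-squares line through the four populations that happen to appear in both lists above (Latvian, Swedish, Icelandic and Irish), treating the first list as x and the second as y, and then reports each population's residual. The function and variable names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# f3 values copied from the two lists above, for populations present in both.
f3_first = {"Latvian": 0.257376, "Swedish": 0.257301,
            "Icelandic": 0.256418, "Irish": 0.256227}
f3_second = {"Latvian": 0.261845, "Swedish": 0.258814,
             "Icelandic": 0.258854, "Irish": 0.25859}

pops = sorted(f3_first)
xs = [f3_first[p] for p in pops]
ys = [f3_second[p] for p in pops]
slope, intercept = fit_line(xs, ys)

# Residual = observed y minus the value predicted by the fitted line.
residuals = {p: f3_second[p] - (slope * f3_first[p] + intercept) for p in pops}
for pop, res in sorted(residuals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pop:10s} {res:+.6f}")
```

Populations with positive residuals sit above the line, meaning they share more drift with the second ancient group than their affinity to the first would predict. Four points are far too few to draw conclusions from, so the full datasheets are needed for anything beyond a demonstration.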

Below are a few examples of linear models based on my f3-stats. Note that many Indo-European speakers, especially from Northern Europe, are foremost attracted to ancient samples from the Pontic-Caspian steppe. On the other hand, non-Indo-European speakers, from such far-flung locations as the Caucasus and Iberia, show relatively stronger affinity to ancient samples from Anatolia and the Caucasus. Moreover, Uralic speakers show elevated affinity to ancient hunter-gatherer samples from Eastern Europe and Siberia. Makes sense, right?

Based on these and other data, I'd say that Maykop and the culturally related Steppe Maykop were something of a multi-ethnic polity, whose people spoke many closely and distantly related languages, including perhaps Kartvelian, Northwest Caucasian, Yeniseian and Indo-European. But it seems to me that Proto-Indo-European was spoken by steppe foragers turned pastoralists just outside of the Maykop zone. And I'm quite sure that after the Maykop collapse various early Indo-European groups pushed across the Caucasus and deep into the Near East. Just take a look at the f3-stats and linear model for Hajji_Firuz_BA to see what I mean.

