Thursday, April 25, 2019

Some myths die hard

Ancient DNA tells us that the Bronze Age wasn't kind to the indigenous populations of Central Asia. It seems to have wiped them out totally. Indeed, Central Asia might well be the only major world region in which native hunter-gatherers failed to make a perceptible impact on the genetics of any extant populations.

Before the Neolithic transition, much of Central Asia was home to hunter-gatherers closely related to those of nearby western Siberia. During the Neolithic, agriculturalists and pastoralists from the Near East gradually moved into the more arable parts of southern and eastern Central Asia, eventually giving rise to the Bactria Margiana Archaeological Complex, or BMAC, and other similar communities.

It's not clear what their relationship was like with the native hunter-gatherers in these areas. But they did mix with them in varying degrees. This is obvious because genome-wide genetic ancestry characteristic of the Botai people, who hunted and eventually domesticated horses on the Kazakh steppe during the 4th millennium BCE, and were probably the archetypal Central Asians for their time, is found at significant levels in a number of later samples from Central Asian farmer and pastoralist sites, such as Dali, Gonur Tepe and Sarazm.

Thus, even though the Neolithic transition did have a big impact on Central Asia, and clearly led to large-scale population replacements in some parts of the region, this was just the beginning of these population shifts. Moreover, in some cases the expanding farmer and pastoralist populations seem to have acquired significant indigenous Central Asian ancestry and spread it with them.

The precise geographic extent of the relatively unique Botai-related ancestry in prehistoric Eurasia is still something of a mystery. But to give you a general picture of where it was found from around 6,000 BCE to 2,000 BCE, here's a map with info about samples with significant levels of this type of ancestry from a wide range of sites in space and time.

Going by this map, I'd say it's safe to infer that the Botai-related ancestry was a major feature of practically all forager populations living between the Caspian Sea and the Altai Mountains. It was also present in the Early Bronze Age (EBA) pastoralist population associated with the Steppe Maykop archeological culture of Eastern Europe, so it may have already been in Europe as early as 3,800 BCE, because that's when the Steppe Maykop culture first appeared.

It's an interesting question where the ancestors of the Steppe Maykop herders came from. I once simply assumed that they were closely related to the Maykop people who lived in the Caucasus Mountains. But it's now clear that the populations associated with these two similar cultures were starkly different, with the Maykop people being basically of Near Eastern origin and lacking any discernible Botai-like ancestry. My guess for now is that the Steppe Maykop herders were in large part the descendants of the Kelteminar culture population from just east of the Caspian Sea, but we'll see about that when more ancient DNA comes in.

The other great mystery is what eventually happened to the Steppe Maykop people. Around 3,000 BCE, their culture vanished from the archeological record and their particular genetic signature disappeared from the steppe ancient DNA record. Where did they go? Did they migrate back east?

I don't know, but at about that time other Eastern European steppe herders, those associated with the Yamnaya and Corded Ware archeological cultures, began to stir and migrate in big numbers in basically all directions, including into Steppe Maykop territory. Indeed, unlike the Steppe Maykop population, these groups weren't closely related to any contemporaneous or earlier Central Asians. But they ended up moving into Central Asia, and in a big way too.

Their impact all the way from the Ural Mountains to what are now China and India was profound. For instance, not only did they end up totally replacing the Botai people, but also their horses. For more details on this topic check out the YouTube clip here. I have a strong suspicion that the same sort of thing happened to the aforementioned Steppe Maykop people. In other words, they may have been forced out of the Eastern European steppe, and perhaps sought shelter in the Caucasus Mountains?

Admittedly, I'm not offering anything new here. I just wanted to emphasize a few key points, because I'm still seeing some confusion online about the population history of Central Asia, and especially how it relates to the population history of Europe, and also the Proto-Indo-European homeland question. Make no mistake, thanks to the ancient DNA already available from Central Asia, we can confidently infer the following:

- the chance that the ancient European populations associated with the Yamnaya, Corded Ware and other closely related archeological cultures formed as a result of migrations from Central Asia is zero

- the chance that the Proto-Indo-European homeland was located in Central Asia is zero

- the chance that present-day Europeans, by and large, derive from any ancient Central Asian populations is zero

See also...

Central Asia as the PIE urheimat? Forget it

The Steppe Maykop enigma

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Monday, April 22, 2019

R1b-M269 in the Bronze Age Levant

The new Harvard genotype datasets that I blogged about recently include a couple of potentially very useful samples from the Levant dated to 1400-1100 BCE. Search for IDs I2062 and I1934 in the anno files here. They're both from an archeological paper about a Late Bronze Age (LBA) burial site in what is now Israel that was published back in 2017 (see here).

Surprisingly, individual I2062 is listed in the anno files as belonging to Y-haplogroup R1b1a1a2, which is also known as R1b-M269. This is a surprise to me because R1b-M269 is closely associated with the Bronze Age expansions of pastoralists from the Pontic-Caspian steppe in Eastern Europe, and these expansions didn't impact the Levant in any direct or significant way.

The Y-haplogroup assignment may or may not be correct. Sometimes the Y-haplogroups in these sorts of datasheets are indeed wrong. Unfortunately, as far as I know, the BAM file for I2062 isn't available anywhere online, so I can't check whether he does really belong to R1b-M269. But, intriguingly, his autosomes do show a subtle signal of Yamnaya-related ancestry from the Pontic-Caspian steppe that is missing in earlier ancients from the Levant.

To characterize his genome-wide ancestry, I first ran a series of unsupervised and supervised analyses with the Global25/nMonte3 method (using this datasheet). For the sake of simplicity, I narrowed things down to the mixture models below based on three reference populations each. Levant_ISR_C is made up of Chalcolithic samples from Israel. The identities of the other reference sets should be obvious to most readers. If confused, feel free to ask for more details in the comments below.


[1] distance%=1.8905


[1] distance%=2.0856


[1] distance%=2.1738
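For readers who haven't used this sort of tool, here's a rough sketch of what a distance-based mixture fit does. It is not the nMonte3 code. It's a plain grid search that stands in for nMonte's Monte Carlo sampling, and it works on three made-up coordinates instead of the 25 in a Global25 datasheet. The source names and all coordinate values are hypothetical, and it returns a raw Euclidean distance rather than nMonte's distance% figure.

```python
import math

# Hypothetical, invented coordinates (3 dimensions instead of Global25's 25).
SOURCES = {
    "source_A": [0.10, 0.02, -0.03],
    "source_B": [0.02, -0.08, 0.05],
    "source_C": [0.12, 0.06, 0.01],
}


def mix(sources, weights):
    """Weighted average of the source coordinates."""
    dims = len(next(iter(sources.values())))
    return [
        sum(w * sources[name][i] for name, w in weights.items())
        for i in range(dims)
    ]


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_three_way(target, sources):
    """Try every three-way split in 1% steps and keep the closest fit."""
    names = list(sources)
    if len(names) != 3:
        raise ValueError("this sketch only handles exactly three sources")
    best = None
    for a in range(0, 101):
        for b in range(0, 101 - a):
            c = 100 - a - b
            weights = dict(zip(names, (a / 100, b / 100, c / 100)))
            d = distance(target, mix(sources, weights))
            if best is None or d < best[0]:
                best = (d, weights)
    return best


# Build a toy target that really is a 70/20/10 mix, then try to recover it.
true_weights = {"source_A": 0.70, "source_B": 0.20, "source_C": 0.10}
target = mix(SOURCES, true_weights)
fit_distance, fit_weights = best_three_way(target, SOURCES)
print(fit_weights, fit_distance)
```

The smaller the distance, the closer the modeled mixture sits to the target in the coordinate space, which is the sense in which the three models above are ranked.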

To further confirm the reliability of my models, I tested them with the formal statistics-based qpAdm software. As far as I can tell, the output from qpAdm looks very solid across the board.

IRN_Seh_Gabi_C 0.193±0.052
Levant_ISR_C 0.710±0.038
Yamnaya_RUS_Samara 0.098±0.026

chisq 9.304
tail prob 0.67676
Full output

Kura-Araxes_ARM_Kaps 0.249±0.076
Levant_ISR_C 0.681±0.051
Yamnaya_RUS_Samara 0.071±0.035

chisq 11.101
tail prob 0.52032
Full output

Levant_ISR_C 0.661±0.042
Kura-Araxes_RUS_Velikent 0.339±0.042

chisq 7.979
tail prob 0.844942
Full output
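For anyone who wants to sanity check the tail prob figures, they behave like the upper tail of a chi-square distribution. Here's a minimal sketch in plain Python. The degrees of freedom aren't shown in the excerpts above, so the values used here (12 for the three-way models, 13 for the two-way model) are my assumption. They were picked because they come out very close to the listed tail probs.

```python
import math


def chi2_tail_prob(chisq, df):
    """Upper-tail probability of a chi-square statistic with integer df.

    Uses the recurrence Q(x, df + 2) = Q(x, df) + (x/2)^(df/2) e^(-x/2) / Gamma(df/2 + 1),
    starting from Q(x, 1) = erfc(sqrt(x/2)) or Q(x, 2) = exp(-x/2).
    """
    if chisq < 0 or df < 1:
        raise ValueError("chisq must be >= 0 and df must be >= 1")
    half = chisq / 2.0
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# chisq values from the models above; df values are assumed, not from the output.
three_way = chi2_tail_prob(9.304, 12)
two_way = chi2_tail_prob(7.979, 13)
print(round(three_way, 4), round(two_way, 4))
```

A higher tail prob means the model is harder to reject given the outgroups used, which is why all three models above count as comfortable fits.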

Admittedly, even though I2062 can be modeled with Yamnaya-related admixture, he doesn't need to be. Indeed, his ratio of this type of ancestry varies significantly between the models, from around 10% to nothing. This appears to be dependent on the geography of the non-Levant and non-Yamnaya reference populations; the closer they are to the Pontic-Caspian steppe, the smaller the ratio of Yamnaya-related ancestry in I2062. I'd describe this as an artifact of the isolation-by-distance phenomenon, and it totally makes sense, but it prevents me from confirming beyond any doubt that I2062 does harbor genome-wide steppe ancestry. Unfortunately, individual I1934 doesn't offer enough data to be analyzed with the same methods.
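One rough way to see how firmly each model supports the Yamnaya-related component is to divide the estimate by its standard error. The sketch below does that for the two models above that include Yamnaya_RUS_Samara. Reading the ratio as a Z-score is my own shorthand here, not something taken from the qpAdm output, and where to draw the line for "significant" is a judgment call.

```python
def z_score(estimate, std_err):
    """Estimate divided by its standard error."""
    return estimate / std_err


# Yamnaya_RUS_Samara coefficients and standard errors from the two models above,
# keyed by the other eastern reference used in each model.
yamnaya = {
    "with IRN_Seh_Gabi_C": (0.098, 0.026),
    "with Kura-Araxes_ARM_Kaps": (0.071, 0.035),
}

z_values = {label: z_score(est, se) for label, (est, se) in yamnaya.items()}
for label, z in z_values.items():
    print(label, round(z, 2))
```

The ratio drops when the Kura-Araxes reference from Armenia is used, which is in line with the pattern described above.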

Samples associated with the Kura-Araxes or Early Transcaucasian culture are particularly strong references for the eastern ancestry in I2062. This probably isn't a coincidence, and it might also explain his Y-haplogroup, because, at its maximum extent, the territory occupied by the Kura-Araxes culture stretched all the way from the Pontic-Caspian steppe to the southern Levant. The map below is from Wilkinson 2014.

By the way, what's the chance that I2062 is an awesome proxy for the earliest Jews? I reckon it's pretty good, considering that Samaritans from Israel are his closest present-day population in terms of genome-wide affinity. Who wants to test this theory with the Global25? If I see some good stuff in the comments I'll post it here in an update.

See also...

Downloadable genotypes of present-day and ancient DNA data

Early chariot riders of Transcaucasia came from...

R-V1636: Eneolithic steppe > Kura-Araxes?

Thursday, April 18, 2019

Early chariot riders of Transcaucasia came from...

I'm finding it increasingly difficult nowadays to fully appreciate all of the ancient DNA samples that are accumulating in my dataset. But it's not entirely my fault.

Among the hundreds of ancient samples published last year there were a couple of Middle Bronze Age (MBA) individuals from what is now Armenia labeled "Lchashen Metsamor" (see here). I wasn't planning to do much with these samples because, even after reading the Nature paper that they came with a couple of times over, I didn't have a clue what they were about. But after some digging around, I now know that their people, those associated with the Lchashen Metsamor archeological culture, were among the earliest in Transcaucasia, and indeed the Near East, to use the revolutionary spoked-wheel horse chariot. How awesome is that?

The invention of the spoked-wheel chariot is generally credited to the Middle Bronze Age Sintashta culture of the Trans-Ural steppe in Central Asia, and its rapid spread is often associated with the early expansions of Indo-European languages deep into Asia. On the other hand, some have argued that this type of chariot was first developed in the Near East, and directly derived from solid-wheeled wagons pulled by donkeys.

It's now obvious, thanks to ancient DNA, that the Sintashta people were by and large migrants to Central Asia from somewhere in Eastern Europe, and that they didn't harbor any recent ancestry from the Near East. So if chariot technology spread into the steppes from the Near East, then it did so without any accompanying gene flow, which is possible but not entirely convincing. This raises the question of whether the Lchashen Metsamor population was of Sintashta-related origin, because if it was, then this would corroborate the consensus that spoked-wheel chariots were introduced into Transcaucasia from the steppes to the north.

Below is a Principal Component Analysis (PCA) of West Eurasian genetic variation. It does suggest that the Lchashen Metsamor pair (labeled Armenia_MBA_Lchashen), as well as most of the other currently available samples from what is now Armenia dating to the Middle to Late Bronze Age (MLBA), harbor some steppe ancestry. That's because they appear to form a cline between samples associated with the Sintashta and Kura-Araxes cultures. Of course, the Kura-Araxes culture was a major Early Bronze Age (EBA) archeological phenomenon centered on Transcaucasia and surrounds, so its population can be reasonably assumed to have formed the genetic base of most subsequent populations in the region. The relevant PCA datasheet is available here.

To investigate the possibility of Sintashta-related admixture in Lchashen Metsamor with formal methods, I ran a series of mixture models with the qpAdm software. Here are the three statistically most sound outcomes that I was able to come up with for Lchashen Metsamor:

CWC_Kuyavia 0.183±0.036
Kura-Araxes_Kaps 0.817±0.036

chisq 13.941
tail prob 0.378021
Full output

Balkans_BA_I2163 0.193±0.045
Kura-Araxes_Kaps 0.807±0.045

chisq 14.780
tail prob 0.321267
Full output

Kura-Araxes_Kaps 0.788±0.043
Sintashta_MLBA 0.212±0.043

chisq 14.871
tail prob 0.315451
Full output

I sorted the output by "tail prob", but the fact that Sintashta_MLBA is in third place isn't a problem because the stats in all of these models are basically identical. Indeed, CWC_Kuyavia (Corded Ware culture samples from present-day Kuyavia, North-Central Poland) and Balkans_BA_I2163 (a Bronze Age singleton from what is now Bulgaria) are both very similar and probably closely related to each other and to the Sintashta samples.
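If you'd like to do the same sort of ranking yourself, here's a small sketch that parses the three summaries above and sorts them by tail prob. The numbers are transcribed from this post, but the text layout (one blank-line-separated block per model) and the function name are just my choices for the example. This isn't a parser for real qpAdm log files.

```python
import re

# Summaries transcribed from the three Lchashen Metsamor models above.
RAW = """\
CWC_Kuyavia 0.183±0.036
Kura-Araxes_Kaps 0.817±0.036
chisq 13.941
tail prob 0.378021

Balkans_BA_I2163 0.193±0.045
Kura-Araxes_Kaps 0.807±0.045
chisq 14.780
tail prob 0.321267

Kura-Araxes_Kaps 0.788±0.043
Sintashta_MLBA 0.212±0.043
chisq 14.871
tail prob 0.315451
"""


def parse_models(text):
    """Turn blank-line-separated summaries into a list of dicts."""
    models = []
    for block in re.split(r"\n\s*\n", text.strip()):
        model = {"sources": {}, "chisq": None, "tail_prob": None}
        for line in block.splitlines():
            line = line.strip()
            coeff = re.fullmatch(r"(\S+)\s+([0-9.]+)±([0-9.]+)", line)
            if coeff:
                # Store (mixture proportion, standard error) per source.
                model["sources"][coeff.group(1)] = (
                    float(coeff.group(2)),
                    float(coeff.group(3)),
                )
            elif line.startswith("chisq"):
                model["chisq"] = float(line.split()[1])
            elif line.startswith("tail prob"):
                model["tail_prob"] = float(line.split()[2])
        models.append(model)
    return models


models = parse_models(RAW)
ranked = sorted(models, key=lambda m: m["tail_prob"], reverse=True)
for m in ranked:
    print(m["tail_prob"], m["chisq"], sorted(m["sources"]))
```

Note how small the gaps between the three tail probs are, which is the reason the order shouldn't be over-interpreted.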

Interestingly, and, I'd say, importantly, ancients from the steppe that are closest to Lchashen Metsamor in both space and time, but not particularly closely related to the Sintashta people, don't work too well as a mixture source in such models.

Kubano-Tersk 0.184±0.046
Kura-Araxes_Kaps 0.816±0.046

chisq 22.179
tail prob 0.0526526
Full output

A couple of months ago I suggested that populations associated with the Early to Middle Bronze Age (EMBA) Catacomb culture were the vector for the spread of steppe ancestry into what is now Armenia during the MLBA (see here). After taking a closer look at the Lchashen Metsamor samples, I now think that the peoples of the Sintashta and related cultures were also important in this process. If so, they may have moved from the steppe into Transcaucasia both from the west via the Balkans and the east via Central Asia, and brought with them spoked-wheel chariots. I don't have a clue what language they spoke, but I'm guessing that it may have been something Indo-European.

See also...

The mystery of the Sintashta people

A potentially violent end to the Kura-Araxes Culture (Alizadeh et al. 2018)

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, April 12, 2019

Armenians vs Georgians

Armenians and Georgians are ethnic groups that live side by side in the south Caucasus, or Transcaucasia. By all accounts, they've both been there since prehistoric times and they're very similar in terms of overall genetic structure.

However, they speak languages from totally unrelated families: Indo-European and Kartvelian, respectively. How did this happen and might the answer lie in the small genetic differences that do exist between them?

To investigate this issue, I ran a series of qpAdm formal mixture models of present-day Armenians and Georgians using dozens of ancient reference populations. To come up with results that were as straightforward and meaningful as possible, I restricted myself to two-way models. I then discarded the runs that produced "tail probs" under 0.1 or retained fewer than 400K SNPs. Only a handful of models passed muster, including these two:

Mycenaeans_&_Empuries2 0.233±0.041
Kura-Araxes_Kaps 0.767±0.041

chisq 18.422
tail prob 0.142151
Full output

Globular_Amphora 0.071±0.025
Kura-Araxes_Kaps 0.929±0.025

chisq 18.419
tail prob 0.142266
Full output
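The filtering step described above can be sketched in a few lines. The two thresholds (tail prob of at least 0.1, at least 400K SNPs) come from this post. The record layout, the field names and every run listed are hypothetical placeholders, because the SNP counts for the real runs aren't shown here.

```python
MIN_TAIL_PROB = 0.1   # threshold from the post
MIN_SNPS = 400_000    # threshold from the post


def passes(run):
    """Keep a run only if it clears both thresholds."""
    return run["tail_prob"] >= MIN_TAIL_PROB and run["snps"] >= MIN_SNPS


# Hypothetical runs, invented purely to exercise the filter.
runs = [
    {"model": "run_1", "tail_prob": 0.14, "snps": 450_000},
    {"model": "run_2", "tail_prob": 0.03, "snps": 520_000},  # tail prob too low
    {"model": "run_3", "tail_prob": 0.40, "snps": 310_000},  # too few SNPs
    {"model": "run_4", "tail_prob": 0.10, "snps": 400_000},  # exactly on both limits
]

kept = [run["model"] for run in runs if passes(run)]
print(kept)
```

Whether a run sitting exactly on a threshold should be kept is a design choice. This sketch keeps it.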

At the most basic level, the results suggest that both Armenians and Georgians are overwhelmingly derived from populations of Bronze Age Transcaucasia associated with the Kura-Araxes archeological culture, albeit with minor ancestries from somewhat different sources from the west. As far as I can see, when using more than 400K SNPs and a wide range and large number of outgroups (or right pops), neither Armenians nor Georgians can pass perfectly for any one ancient population in my dataset.

The best proxies for the minor but significant western ancestry in Armenians are Mycenaeans of the Bronze Age Aegean region and Greek colonists from Iron Age Iberia (Empuries2). Obviously, and perhaps importantly, these are both attested Indo-European-speaking groups. On the other hand, the very minor western ancestry in Georgians is best characterized as gene flow from Middle to Late Neolithic European farmers rich in indigenous European forager ancestry. It's practically impossible to say what language or languages these farmers spoke. How about something Kartvelian?

In any case, for me, the perplexing thing about present-day Armenians is that they harbor very little steppe ancestry. By and large, no more than a few per cent. Compare that to the currently available samples from what is now Armenia dating to the Middle to Late Bronze Age, which show ratios of steppe ancestry of up to 25%. For now, I'm guessing that what we're dealing with here is the classic bounce back of older ancestry layers that has been documented for different parts and periods of prehistoric Europe.

See also...

Early chariot riders of Transcaucasia came from...

Catacomb > Armenia_MLBA

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Sunday, April 7, 2019

On the association between Uralic expansions and Y-haplogroup N

Almost all present-day populations speaking Uralic languages show moderate to high frequencies of Y-chromosome haplogroup N. I reckon there are two likely explanations for this:

- the speakers of Proto-Uralic were rich in N because they lived in an area, probably somewhere around the Ural Mountains, where it was common, and they spread it with them as they expanded from their homeland

- Uralic languages often came to be spoken in areas of North Eurasia where N was already found at moderate to high frequencies

The major exception to this rule is Hungarians, whose language belongs to the Ugric branch of Uralic. Their frequency of N is close to zero and they don't differ much in terms of overall genetic structure from their Indo-European-speaking neighbors in East Central Europe.

This is an issue that has generated much debate over the years about the nature of Uralic expansions, who the Hungarians really were, and how the Hungarian language came to be spoken in the heart of Europe (for instance, see here).

But I never understood what the fuss was about, because based on historical sources alone it seemed rather obvious that Hungarian was introduced into the Carpathian Basin during the Middle Ages by a relatively small number of invaders from the east, probably from somewhere around the Ural Mountains, who imposed it on local Indo-European-speaking populations.

As far as I can remember, this has always been the academic consensus, and the results from one of the first ancient DNA studies of human remains soundly corroborated it. Back in 2008, Csányi et al. reported that two out of four skeletons from elite Hungarian conqueror graves dating to the 10th century carried the Tat C allele, which meant that they belonged to Y-haplogroup N (see here).

We've since had to wait over a decade to get a more comprehensive look at the Y-chromosome haplogroups of medieval Hungarians. The most useful effort to date, a manuscript courtesy of Neparáczki et al., was posted this week at bioRxiv (see here).

The results in the preprint suggest a much more complex picture than simply a migration of an obviously Uralic-speaking population rich in Y-haplogroup N into the medieval Carpathian Basin. But they do confirm the presence of N in Hungarian conqueror elites, and, in fact, of very specific subclades of N that link them to the present-day speakers of Uralic languages from around the Ural Mountains. Here are some pertinent quotes from the preprint:

Three Conqueror samples belonged to Hg N1a1a1a1a2-Z1936, the Finno-Permic N1a branch, being most frequent among northeastern European Saami, Finns, Karelians, as well as Komis, Volga Tatars and Bashkirs of the Volga-Ural region. Nevertheless this Hg is also present with lower frequency among Karanogays, Siberian Nenets, Khantys, Mansis, Dolgans, Nganasans, and Siberian Tatars [23].


It is generally accepted that the Hungarian language was brought to the Carpathian Basin by the Conquerors. Uralic speaking populations are characterized by a high frequency of Y-Hg N, which have often been interpreted as a genetic signal of shared ancestry. Indeed, recently a distinct shared ancestry component of likely Siberian origin was identified at the genomic level in these populations, modern Hungarians being a puzzling exception [36]. The Conqueror elite had a significant proportion of N Hgs, 7% of them carrying N1a1a1a1a4-M2118 and 10% N1a1a1a1a2-Z1936, both of which are present in Ugric speaking Khantys and Mansis [23].


Population genetic data rather position the Conqueror elite among Turkic groups, Bashkirs and Volga Tatars, in agreement with contemporary historical accounts which denominated the Conquerors as “Turks” [38]. This does not exclude the possibility that the Hungarian language could also have been present in the obviously very heterogeneous, probably multiethnic Conqueror tribal alliance.

Indeed, a large proportion of the 44 males from elite Hun, Avar and Hungarian conqueror burials analyzed in the study belonged to Y-haplogroups that can't be plausibly associated with the earliest Uralic speakers, but rather with those of various Indo-European languages, such as I1 and R1b-U106 (these are Germanic-specific markers), I2a-L621 and R1a-CTS1211 (obviously Slavic) and R1a-Z2124 (largely Eastern Iranian).

If most of these results aren't due to contamination, then it's likely that both the early Hungarian commoners and elites were, by and large, derived from Indo-European-speaking populations. No wonder then, that present-day Hungarians are basically indistinguishable genetically from their Indo-European-speaking neighbors and, like them, show hardly any Y-haplogroup N.

See also...

Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

Corded Ware people =/= Proto-Uralics (Tambets et al. 2018)

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, April 4, 2019

Downloadable genotypes of present-day and ancient DNA data

They're freely available via the Harvard Medical School at this LINK. The linked web page includes this message:

We would be grateful if users of this dataset could alert us to any errors they detect and help us to fill in missing data. This could include: (1) errors or missing information for location, latitude, longitude, archaeological context, date, and group label, (2) concerns about Y chromosome or mitochondrial DNA haplogroup determinations, and (3) evidence for other problems in the data or annotations for individuals. Please write to Swapan 'Shop' Mallick and David Reich with any suggestions. We would also be grateful if members of the community could suggest additional content that would be helpful to add to this page to make it maximally useful. Finally, please let us know if there is any ancient DNA data we should be including that we have missed.

By the way, I've updated my Global25 datasheets with many of the samples from this new Harvard release. Same links as always...

Global 25 datasheet (scaled)

Global 25 pop averages (scaled)

Global 25 datasheet

Global 25 pop averages

See also...

New release of ADMIXTOOLS with two additional programs

Modeling genetic ancestry with Davidski: step by step

Unleash the power: Global 25 test drive thread