
Tuesday, June 19, 2018

An exploration of distance-based models of language relationships with a special focus on Indo-European (Kozintsev 2018)

The latest edition of the Journal of Indo-European Studies includes an interesting methodological paper by Alexander Kozintsev, in which the author tests the relationship between Indo-European and other language families using lexicostatistical data and a wide range of distance-based models (see here). My impression, after reading the paper a couple of times, is that we probably have a long way to go before someone comes up with a robust enough way to study languages with these sorts of methods, which are more widely used for the classification of living things.

However, note that Kozintsev's results are very consistent in placing Indo-European, including Hittite (HIT in the figure below), significantly closer to Uralic than to any of the language families south of the Caucasus. This is in line with the general consensus amongst historical linguists working with more traditional methods of studying languages, and, if true, has significant implications for the search for the Proto-Indo-European (PIE) homeland. Why? Because it's very difficult to imagine the PIE homeland being located anywhere south of the Caucasus considering the present-day distribution and likely homeland of Uralic languages well to the north of this region. Emphasis is mine:

The paper explores the informative potential of various distance-based methods of language classification such as cluster analysis, networks, and two-dimensional projections, using lexicostatistical data on 41 languages belonging to seven families (IE, Uralic, Altaic, Yupik-Chukchee, Kartvelian, Semitic, and North Caucasian) represented in the STARLING database. Rooting and weighting are of critical importance, radically affecting the graphic models. Special focus is made on two-dimensional charts generated by the multidimensional scaling and on the little-used minimum spanning tree method. The latter two techniques are employed to test the hybridization/ Sprachbund theory of Indo-European origins. The “Semitic” tendency of IE relative to Uralic is significant whereas neither the “Kartvelian” tendency nor the North Caucasian substratum hypothesis are supported by the two-dimensional models.


Finally, having come full circle, we return to our working hypothesis––that IE is closer to Uralic than to any of the “southern” families. I did not test this assumption because it appeared almost self-evident; now it can be easily tested by the same analysis. But, in fact, even statistical testing is unnecessary, because the triangle data cited above speak for themselves. IE, according to these data, is 20.8% closer to Uralic than to West Caucasian; 18.4% closer to Uralic than to East Caucasian; 13.7% closer to Uralic than to Kartvelian; and 16.9% closer to Uralic than to Semitic. Given the statistical reliability of a 5.6% difference (see above), all these values are highly significant a fortiori.

Kozintsev, Alexander, On Certain Aspects of Distance-based Models of Language Relationships, with Reference to the Position of Indo-European among other Language Families, Journal of Indo-European Studies, Vol. 46, 2018, No. 1 & 2, pp. 1-264
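For readers who haven't come across the minimum spanning tree method mentioned in the abstract, here's a minimal sketch of how such a tree can be built from a matrix of pairwise distances. It uses Prim's algorithm, which is my choice for illustration and not necessarily what Kozintsev used. The labels and distances are invented placeholders, not values from the paper or from STARLING.

```python
def minimum_spanning_tree(labels, dist):
    """Prim's algorithm on a symmetric distance matrix.

    Returns a list of (label_a, label_b, distance) edges.
    """
    n = len(labels)
    in_tree = {0}  # start growing the tree from the first item
    edges = []
    while len(in_tree) < n:
        best = None
        # find the shortest edge linking the tree to an item outside it
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or dist[i][j] < best[2]:
                    best = (i, j, dist[i][j])
        i, j, d = best
        in_tree.add(j)
        edges.append((labels[i], labels[j], d))
    return edges


# Hypothetical lexicostatistical distances, made up for this example only
labels = ["family_A", "family_B", "family_C", "family_D"]
dist = [
    [0.00, 0.70, 0.85, 0.88],
    [0.70, 0.00, 0.90, 0.93],
    [0.85, 0.90, 0.00, 0.89],
    [0.88, 0.93, 0.89, 0.00],
]

mst_edges = minimum_spanning_tree(labels, dist)
for a, b, d in mst_edges:
    print(a, b, d)
```

The tree keeps only the shortest links needed to connect every item, so which family each one attaches to gives a quick, if crude, picture of its nearest relative in the data.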

Saturday, June 16, 2018

Yamnaya isn't from Iran just like R1a isn't from India

A strange thing sometimes happens in population genetics: highly capable and experienced researchers come up with stupid ideas and push them so hard that, despite all the evidence to the contrary, they become accepted as truths. At least for a little while.

It's obvious now, thanks to full genome sequencing and ancient DNA, that Y-chromosome haplogroup R1a cannot be native to India. It arrived there rather recently from the Eurasian steppe, in all likelihood during the Bronze Age, probably as the Indus Valley Civilization (IVC) was collapsing or, perhaps, just after it had collapsed.

But for quite a few years this was something of a taboo, even politically incorrect, narrative, and it was vehemently rubbished by many Indians, including Indian scientists, and their western academic sympathizers.

Indeed, a whole series of papers came out, often in highbrow scientific journals, claiming that R1a originated in South Asia, and that it spread from there to Europe. This, it was also claimed, was the final nail in the coffin of the so-called Aryan Invasion Theory (AIT), because R1a was often described as the "Aryan" haplogroup.

I wasn't impressed by any of this nonsense. I said so here and elsewhere, to the great annoyance of those who believed, against all reason and logic, that the Indo-Aryans, and even Indo-Europeans, were indigenous to India. Here's a taste of some of my work on the topic going back to 2013.

South Asian R1a in the 1000 Genomes Project

Children of the Divine Twins

The Poltavka outlier

Looking back, it's all a bit rough, but very cool nonetheless. However, I was often accused of being biased, unscientific and even bigoted and racist as a result of offering such commentary and research. Make no mistake, my detractors were seething that I would dare to question what was apparently a scientific reality, and they wanted to shut me up. It was a nasty experience, but it now feels great to be vindicated.

Certainly, nowadays, no objective person who, more or less, knows their stuff would argue that the vast majority of the R1a in India doesn't ultimately derive from the Pontic-Caspian steppe in Eastern Europe.

But otherwise things haven't changed all that much since then. For instance, despite a whole heap of ancient DNA data being available from Eastern Europe and West Asia, there's a widely accepted idea that the Early Bronze Age (EBA) Yamnaya culture formed on the Pontic-Caspian steppe as a result of migrations from what is now Iran.

This is not true. It can't be true, because it's contradicted by all of the data. I've tried to explain this on several occasions, but generally to no avail.

Yamnaya =/= Eastern Hunter-Gatherers + Iran Chalcolithic

Another look at the genetic structure of Yamnaya

Likely Yamnaya incursion(s) into Northwestern Iran

Thus, the Yamnaya people and culture were indigenous to Eastern Europe, and basically formed as a result of the amalgamation of at least three different populations closely related to Eastern European Hunter-Gatherers (EHG), Caucasus Hunter-Gatherers (CHG), Early European Farmers (EEF) and Western European Hunter-Gatherers (WHG). They did not harbor any significant ancestry from what is now Iran; at least not from within any reasonable time frame.

However, my communicating this fact has resulted in some rather strange and unsavory reactions from a number of individuals who appear to have a big emotional investment in this issue. They become frustrated and even angry when I try to explain to them that there's no sense in looking for the genetic origins of Yamnaya in Iran, much like the people who argued with me when I tried to reason with them that R1a wasn't native to India. Here's an example from a recent blog post (for the full conversation scroll down to the comments here).

Heh, here we go again with the accusations of bias, scientific impropriety and whatnot. Ironically, the poor chap just couldn't comprehend that he never had an argument to begin with, quite obviously due to his own bias in regards to this topic. Well, at least he didn't call me a racist.

In a recent preprint, Wang et al. correctly characterized Yamnaya as, by and large, a mixture of populations closely related to EHG, CHG, EEF and WHG (see here), with no obvious input from what is now Iran. Sounds familiar, right?

They also discovered that, during the Chalcolithic and Bronze Age, the Caucasus and nearby steppes were mainly home to three quite distinct populations: 1) Steppe groups, including Eneolithic steppe and Caucasus Yamnaya, 2) Caucasus groups, including Kura-Araxes and Maykop, and 3) Steppe Maykop, which they classified as part of 1. These populations were all separated by clear genetic and cultural borders, with significant and unambiguous mixture from the Caucasus cluster only in a couple of Steppe Maykop outliers and one Yamnaya outlier from what is now Ukraine.

Clearly, this leaves no room for any migrations from what is now Iran to the steppe that would potentially give rise to Yamnaya. In other words, the main genetic ingredients for what was to become Yamnaya were already on the steppe well before Yamnaya, during the Eneolithic, and it's quite likely that they were indigenous to the region.

However, interestingly, Wang et al. did appear to try to save the link between Yamnaya and Iran by referring to the CHG-related ancestry in Yamnaya as "CHG/Iranian". I'm not surprised because most of these authors are associated with the Max Planck Institute for the Science of Human History (MPI-SHH), which is currently pushing a proposal that the Proto-Indo-European (PIE) homeland was located in what is now Iran and surrounds (see here). So, obviously, they need to somehow show a relationship between Yamnaya and Iran, because Yamnaya and the closely related Corded Ware archaeological complex are generally seen as early Indo-European cultural horizons. Good luck with that.

Actually, let me make it clear once and for all that I couldn't care less where the very first Indo-European words were uttered. It's just something that I find interesting. I rather doubt that this was within the borders of present-day Iran, and I explained in some detail why in a post almost two years ago (see here). But if someone manages to prove that the PIE homeland was indeed located partly or wholly within what is now Iran, that's OK. I won't be emotionally traumatized as a result.

However, obviously, this will have to be done with the assumption in mind that Yamnaya and Corded Ware became Indo-European-speaking almost purely via a linguistic transmission, with hardly any associated gene flow. It's possible, I guess. But then there's almost 200 years of scholarship based on linguistics and archaeological data that generally agrees in favor of the Pontic-Caspian steppe as the PIE homeland.

On a related note, I also couldn't care less whether the Aryan Invasion Theory (AIT) reflects what really happened during the Indo-Europeanization of South Asia, or if it's more appropriate to call it the Aryan Migration Theory (AMT). I'll accept whatever an objective analysis of all of the relevant data shows when we have enough of it to make an informed decision.

However, currently, I see nothing in the data that would prevent the AIT from being true. To me, the profound impact that the Bronze Age steppe peoples obviously had on South Asia, and especially on the Indo-European-speaking Indian upper castes, suggests that, overall, an invasion-like scenario is quite plausible. But I might be wrong, and so what if I am?

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Tuesday, June 12, 2018

Dali_EBA and West_Siberia_N in qpGraph

Below is a qpGraph tree that I've been working on for a while. I'll be posting more output from random analyses like this from now on. The relevant graph file is available in the zip folder here. Any ideas what else can be done with this topology?

See also...

Graeco-Aryan parallels

Friday, June 8, 2018

Of horses and men

Y-HT-1 is today by far the most common Y-chromosome haplogroup in domesticated horse breeds. According to Wutke et al. 2018, this is probably the result of artificial, human induced selection for this lineage, initially on the Eurasian steppe during the Iron Age, and then subsequently in Europe during the Roman period (see here).

However, during the Bronze and Iron Ages, before Y-HT-1 reached fixation, another very important Y-haplogroup in domesticated horses was its older sister clade Y-HT-4.

Indeed, it's likely that both Y-HT-1 and Y-HT-4 first dominated the domesticated horse gene pool during the Bronze Age, probably because they happened to have been present in the horse population exploited by the early Indo-Europeans. This was missed, or at least not directly discussed, by Wutke et al., but I'd say it's a fairly obvious conclusion that can be drawn from their data, especially if we consider the fact that horses are the most important animal in the Indo-European pantheon.

Thus, the story of Y-HT-1 and, up to a point, Y-HT-4 is probably very similar to that of two human Y-haplogroups, R1a-M417 and R1b-M269. Both of these lineages also rose to prominence rather suddenly during the Eneolithic and Bronze Age, in all likelihood because they were present amongst early Indo-European-speaking males (see here).

Below is a map of the earliest reliably called and dated instances of Y-HT-1, Y-HT-4, R1a-M417 and R1b-M269 in the ancient DNA record. Not surprisingly, all of the points on the map are located on or very close to the Pontic-Caspian steppe, which is generally accepted to have been the Proto-Indo-European homeland. Fascinating stuff.

See also...

Central Asia as the PIE urheimat? Forget it

Cultural hitchhiking and competition between patrilineal kin groups may have led to the post-Neolithic Y-chromosome bottleneck (Zeng et al. 2018)

Was Ukraine_Eneolithic I6561 a Proto-Indo-European?

Thursday, May 31, 2018

What's Maykop (or Iran) got to do with it? #2

For the past few days I've been trying to copy and also improve on the qpGraph tree in the Wang et al. preprint (see here). I've managed to come up with a new version of my model that not only offers a better statistical fit, but, in my opinion, also a much more sensible solution. For instance, the Eastern Hunter-Gatherer node now shows 73% MA1-related admixture, which, I'd say, makes more sense than the 10% in the previous version. The relevant graph file is available here.

Samara Yamnaya can be perfectly substituted in this graph by early Corded Ware samples from the Baltic region (CWC_Baltic_early) and a pair of Yamnaya individuals from what is now Ukraine. This is hardly surprising, considering how similar all of these samples are to each other in other analyses, but it's nice to see nonetheless, because I think it helps to confirm the reliability of my model.

And yes, I have tested all sorts of other Yamnaya-related ancient and present-day populations with this tree. They usually pushed the worst Z score to +/- 3 and well beyond, probably because they weren't similar enough to Yamnaya. But, perhaps surprisingly, Bell Beakers from Britain produced a decent result (see here).
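To make the "worst Z score" criterion a bit more concrete, here's a minimal sketch of how it can be worked out from a list of f-statistics. I'm assuming that each Z score is the fitted value minus the observed value, divided by the standard error, and that an absolute value of 3 or more signals a poor fit, as implied above. The statistic labels and numbers are invented for illustration and don't come from my graph or from qpGraph output.

```python
def worst_z(fstats):
    """Return (label, z) for the f-statistic with the largest absolute Z score.

    Each entry is (label, observed, fitted, standard_error).
    Assumes Z = (fitted - observed) / standard_error.
    """
    scored = []
    for label, observed, fitted, std_err in fstats:
        z = (fitted - observed) / std_err
        scored.append((label, z))
    return max(scored, key=lambda item: abs(item[1]))


# Made-up values, for illustration only
fstats = [
    ("f4(A,B;C,D)", 0.0012, 0.0010, 0.0004),
    ("f4(A,C;B,D)", -0.0030, -0.0010, 0.0005),
    ("f4(A,D;B,C)", 0.0007, 0.0008, 0.0002),
]

worst_label, worst_value = worst_z(fstats)
verdict = "poor fit" if abs(worst_value) >= 3 else "acceptable"
print(worst_label, round(worst_value, 2), verdict)
```

A single badly fitting statistic is enough to sink a topology under this criterion, which is why swapping in a population that isn't similar enough to Yamnaya shows up so quickly.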

See also...

On the genetic prehistory of the Greater Caucasus (Wang et al. 2018 preprint)

Another look at the genetic structure of Yamnaya

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, May 25, 2018

Cultural hitchhiking and competition between patrilineal kin groups may have led to the post-Neolithic Y-chromosome bottleneck (Zeng et al. 2018)

A very interesting paper has just appeared at Nature Communications that potentially offers an explanation for the well documented explosions of certain Y-chromosome lineages in the Old World after the Neolithic, such as those that led to most European males today belonging to Y-haplogroups R1a and R1b (LINK). I might have more to say about this paper in the comments below after I've read it a couple of times. Emphasis is mine:

In human populations, changes in genetic variation are driven not only by genetic processes, but can also arise from cultural or social changes. An abrupt population bottleneck specific to human males has been inferred across several Old World (Africa, Europe, Asia) populations 5000–7000 BP. Here, bringing together anthropological theory, recent population genomic studies and mathematical models, we propose a sociocultural hypothesis, involving the formation of patrilineal kin groups and intergroup competition among these groups. Our analysis shows that this sociocultural hypothesis can explain the inference of a population bottleneck. We also show that our hypothesis is consistent with current findings from the archaeogenetics of Old World Eurasia, and is important for conceptions of cultural and social evolution in prehistory.


If the primary unit of sociopolitical competition is the patrilineal corporate kin group, deaths from intergroup competition, whether in feuds or open warfare, are not randomly distributed, but tend to cluster on the genealogical tree of males. In other words, cultural factors cause biases in the usually random process of transmission of Y-chromosomes, increasing the rate of loss of Y-chromosomal lineages and accelerating genetic drift. Extinction of whole patrilineal groups with common descent would translate to the loss of clades of Y-chromosomes. Furthermore, as success in intergroup competition is associated with group size, borne out empirically in wars [43] as ‘increasing returns at all scales’ [44], and as larger group size may even be associated with increased conflict initiation, borne out in data on feuds [45], there may have been positive returns to lineage size. This would accelerate the loss of minor lineages and promote the spread of major ones, further increasing the speed of genetic drift.

In addition, the assimilation of women from groups that are disrupted or extirpated through intergroup competition into remaining groups is a common result of warfare in small-scale societies [46]. This, together with female exogamy, would tend to limit the impact of intergroup competition to Y-chromosomes.


Figure 6 shows a striking pattern of differences in shallowness of coalescence in samples from hunter-gatherer, farmer and pastoralist cultures. While hunter-gatherer Y-chromosomes from the same culture, and often the same sites, commonly divide into haplotypes that coalesce in multiple millennia, Y-chromosomes of samples from farmer and pastoralist cultures are more homogeneous and have more recent coalescences. The Bell Beaker culture has a high proportion of sampled males (81%) from a large geographical area (Iberia to Hungary) who belong to an identical Y-chromosomal haplogroup (R1b-S116), implying common descent from a kin group that existed quite recently. Some groups of males share even more recent descent, on the order of ten generations or fewer [64]. Such recent common descent may even be retained in cultural memory via oral genealogies, such as among descent groups in Northern and Western Africa, whose members can trace descent relationships up to three to four centuries before the generation currently living [40]. Likewise, from Germany to Estonia, the Y-chromosomes of all Corded Ware individuals sampled, except one, belong to a single clade within haplogroup R1a (R1a-M417) and appear to coalesce shortly before sample deposition.

Thus, groups of males in European post-Neolithic agropastoralist cultures appear to descend patrilineally from a comparatively smaller number of progenitors when compared to hunter gatherers, and this pattern is especially pronounced among pastoralists. Our hypothesis would predict that post-Neolithic societies, despite their larger population size, have difficulty retaining ancestral diversity of Y-chromosomes due to mechanisms that accelerate their genetic drift, which is certainly in accord with the data. The tendency of pastoralist cultures to show the lowest Y-chromosomal diversity and the shallowest coalescence would also be explained, as they may have experienced the social conditions that characterized cultures of the Central Asian steppes [42]. Indeed, the Corded Ware pastoralists may have been organized into segmentary lineages [65], an extremely common tribal system among pastoralist cultures, including those of historical Central Asia [66].

Zeng et al., Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck, Nature Communications volume 9, Article number: 2077 (2018) doi:10.1038/s41467-018-04375-6
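The basic intuition in the quotes above, that wiping out whole patrilineal groups removes Y-chromosome lineages faster than random drift alone, can be played with in a toy simulation. To be clear, this is not the model from Zeng et al. It's a bare-bones Wright-Fisher-style sketch of my own, in which every parameter (population size, number of lineages, wipeout probability) is an arbitrary assumption, and the positive returns to lineage size described in the paper aren't modeled at all.

```python
import random


def simulate(n_males=200, n_lineages=20, generations=50,
             wipeout_prob=0.0, seed=1):
    """Count Y-lineages left after a number of generations.

    Each generation, with probability wipeout_prob, one whole lineage
    (standing in for a patrilineal kin group) is removed before the
    next generation of males picks fathers at random.
    """
    rng = random.Random(seed)
    # start with equally sized lineages
    males = [i % n_lineages for i in range(n_males)]
    for _ in range(generations):
        if rng.random() < wipeout_prob:
            surviving = sorted(set(males))
            if len(surviving) > 1:  # never remove the last lineage
                loser = rng.choice(surviving)
                males = [m for m in males if m != loser]
        # neutral drift: every son picks a father uniformly at random
        males = [rng.choice(males) for _ in range(n_males)]
    return len(set(males))


neutral = simulate(wipeout_prob=0.0)
competitive = simulate(wipeout_prob=0.3)
print("lineages left, drift only:", neutral)
print("lineages left, with group wipeouts:", competitive)
```

Running it with different seeds and wipeout probabilities gives a rough feel for how much clustered deaths can speed up the loss of lineages, even though total population size stays the same.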

Update 30/05/2018: For those clued in, here's an awesome quote from the relevant press release.

The outlines of that idea came to Tian Chen Zeng, a Stanford undergraduate in sociology, after spending hours reading blog posts that speculated - unconvincingly, Zeng thought - on the origins of the "Neolithic Y-chromosome bottleneck," as the event is known.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, May 24, 2018

What's Maykop (or Iran) got to do with it?

I had a go at imitating this qpGraph tree, from the recent Wang et al. preprint on the genetic prehistory of the Caucasus, using the ancient samples that were available to me. I'm very happy with the outcome, because everything makes good sense, more or less. The real populations and singleton individuals, ten in all, are marked in red. The rest of the labels refer to groups inferred from the data.

However, this is still a work in progress, and, if possible, I'd like to simplify the model and also get the worst Z score much closer to zero. If anyone wants to help out, the graph file is available HERE. Feel free to post your own versions in the comments, and I'll run them for you as soon as I can.

Update 31/05/2018: I've managed to come up with a new version of my model that not only offers a better statistical fit, but, in my opinion, also a much more sensible solution. For instance, the Eastern Hunter-Gatherer node now shows 73% MA1-related admixture, which, I'd say, makes more sense than the 10% in the previous version. The relevant graph file is available here.

For more details and a discussion about the updated model, including additional trees with Baltic Corded Ware and British Beaker samples, please check out my new thread on the topic at the link below.

What's Maykop (or Iran) got to do with it? #2


Wang et al., The genetic prehistory of the Greater Caucasus, bioRxiv, posted May 16, 2018, doi:

See also...

On the genetic prehistory of the Greater Caucasus (Wang et al. 2018 preprint)

Another look at the genetic structure of Yamnaya

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, May 23, 2018

More Botai genomes (Jeong et al. 2018 preprint)

Over at bioRxiv at this LINK. Actually, these may or may not be the same Botai genomes that have already been published along with Damgaard et al. 2018 (see comments below for the discussion about that). Here's the abstract. Emphasis is mine:

The indigenous populations of inner Eurasia, a huge geographic region covering the central Eurasian steppe and the northern Eurasian taiga and tundra, harbor tremendous diversity in their genes, cultures and languages. In this study, we report novel genome-wide data for 763 individuals from Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine, and Uzbekistan. We furthermore report genome-wide data of two Eneolithic individuals (~5,400 years before present) associated with the Botai culture in northern Kazakhstan. We find that inner Eurasian populations are structured into three distinct admixture clines stretching between various western and eastern Eurasian ancestries. This genetic separation is well mirrored by geography. The ancient Botai genomes suggest yet another layer of admixture in inner Eurasia that involves Mesolithic hunter-gatherers in Europe, the Upper Paleolithic southern Siberians and East Asians. Admixture modeling of ancient and modern populations suggests an overwriting of this ancient structure in the Altai-Sayan region by migrations of western steppe herders, but partial retaining of this ancient North Eurasian-related cline further to the North. Finally, the genetic structure of Caucasus populations highlights a role of the Caucasus Mountains as a barrier to gene flow and suggests a post-Neolithic gene flow into North Caucasus populations from the steppe.

Jeong et al., Characterizing the genetic history of admixture across inner Eurasia, Posted May 23, 2018, doi:

See also...

New PCA featuring Botai horse tamers, Hun and Saka warriors, and many more...

Global25 workshop 2: intra-European variation

Even though the Global25 focuses on world-wide human genetic diversity, it can also reveal a lot of information about genetic substructures within continental regions.

Several of the dimensions, for instance, reflect Balto-Slavic-specific genetic drift. I ensured that this would be the case by running a lot of Slavic groups in the analysis. A useful by-product of this strategy is that the Global25 is very good at exposing relatively recent intra-European genetic variation.

To see this for yourself, download the datasheet below and plug it into the PAST program, which is freely available here. Then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components.


You should end up with the plot below. Note that to see the group labels and outlines, you need to tick the appropriate boxes in the panel to the right of the image. To improve the experience, it might also be useful to color-code different parts of Europe, and you can do that by choosing Edit > Row colors/symbols. Of course, if you have Global25 coordinates you can add yourself to the datasheet to see where you plot.

Components 1 and 2 pack the most information and, more or less, recapitulate the geographic structure of Europe. However, many details can only be seen by plotting the less significant components. For instance, a plot of components 1 and 3 almost perfectly separates Northeastern Europe into two distinct clusters made up of the speakers of Indo-European and Finno-Ugric languages.
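If you're curious what a Principal Components run does with the datasheet, here's a minimal sketch that recovers the first component from a small table of coordinates. It assumes a covariance-based analysis and uses simple power iteration, which I picked to keep the example free of external libraries. PAST has its own routine and options, so treat this as an illustration of the idea and not as a description of the program's internals. The input rows are invented.

```python
import math


def first_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: list of equal-length coordinate lists, one per sample.
    Uses the sample covariance matrix and power iteration.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1)
            for b in range(d)] for a in range(d)]
    vec = [1.0] * d  # arbitrary starting direction
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # project each centered sample onto the component
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores


# Made-up coordinates lying on a straight line, for illustration only
rows = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
loadings, scores = first_component(rows)
print(loadings)
print(scores)
```

Further components can be obtained the same way after removing the variation explained by the first one, which is why the later components pick up progressively subtler structure.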

This plot might also be useful for exploring potential Jewish ancestry, because Ashkenazi, Italian and Sephardi Jews appear to be relatively distinct in this space. Thus, people with significant European Jewish ancestry will "pull" towards the lower left corner of the plot. For example, someone who is half Ashkenazi and half German will probably land in the empty space between the Northwest Europeans and Jews.
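The half and half example can be approximated by averaging the coordinates of the two source populations. Here's a minimal sketch, assuming that mixed ancestry behaves more or less linearly in this space, which is a simplification. The population names and three-dimensional coordinates are placeholders that I made up, not real Global25 values.

```python
def mix(coords_a, coords_b, weight_a=0.5):
    """Weighted average of two coordinate vectors.

    A linear approximation of where a person with weight_a ancestry
    from population A and the rest from population B might plot.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate vectors must have the same length")
    return [weight_a * a + (1 - weight_a) * b
            for a, b in zip(coords_a, coords_b)]


# Hypothetical coordinates, for illustration only
pop_a = [0.10, -0.02, 0.04]
pop_b = [0.12, 0.06, -0.02]

half_and_half = mix(pop_a, pop_b)
three_quarters_a = mix(pop_a, pop_b, weight_a=0.75)
print(half_and_half)
print(three_quarters_a)
```

Adding the resulting row to the datasheet shows roughly where such a person would be expected to land between the two clusters.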

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 PAST-compatible datasheets

Monday, May 21, 2018

Global25 workshop 1: that classic West Eurasian plot

In this Global25 workshop I'm going to show how to reproduce, more or less, that classic plot of West Eurasian genetic diversity seen regularly in ancient DNA papers and at this blog (for instance, here). To do this you'll need the datasheet below, which I'll be updating regularly, and the PAST program, which is freely available here.


This is what you'll get if you follow my instructions to the letter. Note the fairly strong correlation with geography. I think this is impressive for so many reasons.

OK, so, download the said datasheet, plug it into PAST, select columns 1 to 8, and go to Multivariate > Ordination > Principal Components. Here's a screen cap of me doing it:

The initial output won't resemble my plot above. So you'll need to place PC2 on the X axis, PC1 on the Y axis, and set the image size to 1206x706. After doing that, you should end up with exactly this:

Then, export the image, flip it horizontally with whatever imaging software can do the job, and that's it, unless you want to add some labels like I did. Feel free to ask questions and make suggestions in the comments below.
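If you'd rather do the axis swap and the flip on the numbers instead of on the image, the two steps boil down to a simple transformation of each sample's PC1 and PC2 scores. Here's a minimal sketch, assuming that mirroring the image horizontally is equivalent to negating the X values. The sample names and scores are invented.

```python
def to_plot_xy(pc1, pc2):
    """Put PC2 on the X axis and PC1 on the Y axis, then mirror
    the plot horizontally by negating X."""
    return (-pc2, pc1)


# Hypothetical PC scores, for illustration only
samples = {
    "Sample_1": (0.10, -0.30),
    "Sample_2": (-0.05, 0.20),
}

plot_points = {name: to_plot_xy(pc1, pc2)
               for name, (pc1, pc2) in samples.items()}
for name, (x, y) in plot_points.items():
    print(name, x, y)
```

The resulting X and Y values can then be plotted directly with any charting tool, without the export and flip step.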

See also...

Global25 workshop 2: intra-European variation

Global25 PAST-compatible datasheets

Saturday, May 19, 2018

Global25 PAST-compatible datasheets

I'm planning to run regular workshops over the next few months on how to get the most out of Global25 data with various programs, and especially PAST (see here). So if you have Global25 coordinates, please stay tuned.

To that end, I've put together four color-coded, PAST-compatible Global25 datasheets with thousands of present-day and ancient samples, available at the links below:





PAST is an awesome little statistical program and simple to use. The manual is available here. To kick things off, here's a quick guide on how to run a Neighbor Joining tree on your Global25 coordinates:

- download the Global_25_PCA_pop_averages_scaled.dat from the last link above

- open the dat file with something a little more advanced than Windows notepad, like, say, TextPad (see here)

- stick your scaled coordinates at the bottom of the sheet, so that they look exactly like those of the other samples, except give yourself an original symbol, like, say, a black star

- open the edited dat file with PAST and choose all of the columns and rows by clicking the empty tab above the labels

- then, at the top, go to Multivariate > Clustering > Neighbor joining

After a few seconds you should see a nice, color-coded tree like the one below, except you'll also be on it, in black text. I'm very happy with these results, by the way. As far as I can see, all of the populations and individuals cluster exactly where they should.
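A tree like this is built from the pairwise distances between the rows of the datasheet, so a quick way to sanity check where you end up is to list your nearest populations directly. Here's a minimal sketch, assuming Euclidean distance and a simplified comma-separated layout of "label,value,value,...". That layout is my own stand-in for the example and isn't the actual PAST dat format. The labels and coordinates are invented.

```python
import math


def parse_rows(text):
    """Parse 'label,v1,v2,...' lines into a {label: [floats]} dict."""
    rows = {}
    for line in text.strip().splitlines():
        parts = line.split(",")
        rows[parts[0]] = [float(v) for v in parts[1:]]
    return rows


def nearest(rows, target, k=3):
    """Return the k rows closest to target as (distance, label) pairs."""
    ref = rows[target]
    dists = [(math.dist(ref, vec), label)
             for label, vec in rows.items() if label != target]
    return sorted(dists)[:k]


# Hypothetical two-dimensional coordinates, for illustration only
sample_text = """
Pop_A,0.10,0.02
Pop_B,0.11,0.03
Pop_C,-0.20,0.15
Me,0.105,0.024
"""

rows = parse_rows(sample_text)
closest = nearest(rows, "Me", k=2)
for distance, label in closest:
    print(label, round(distance, 4))
```

If the populations at the top of this list aren't your neighbors on the tree, it's worth double checking that your coordinates were pasted in the same scaled format as the rest of the sheet.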

Those of you who are already very proficient in using PAST, feel free to go nuts with these new datasheets and show us the results in the comments below. I'll try to put together a workshop for beginners within the next couple of weeks.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Wednesday, May 16, 2018

On the genetic prehistory of the Greater Caucasus (Wang et al. 2018 preprint)

Finally, the focus shifts to the Eneolithic/Bronze Age North Caucasus. In a new manuscript at bioRxiv, Wang et al. present genome-wide SNP data for 45 prehistoric individuals from the region along a 3000-year temporal transect (see here). From the preprint (emphasis is mine):

Based on PCA and ADMIXTURE plots we observe two distinct genetic clusters: one cluster falls with previously published ancient individuals from the West Eurasian steppe (hence termed ‘Steppe’), and the second clusters with present-day southern Caucasian populations and ancient Bronze Age individuals from today’s Armenia (henceforth called ‘Caucasus’), while a few individuals take on intermediate positions between the two. The stark distinction seen in our temporal transect is also visible in the Y-chromosome haplogroup distribution, with R1/R1b1 and Q1a2 types in the Steppe and L, J, and G2 types in the Caucasus cluster (Fig. 3A, Supplementary Data 1). In contrast, the mitochondrial haplogroup distribution is more diverse and almost identical in both groups (Fig. 3B, Supplementary Data 1).

Thus, the most important "Indo-European" Y-haplogroups today, R1a-M417 and R1b-M269, did not arrive in Europe from the Caucasus or Near East. They're native to Europe. Hence, it appears that Eneolithic/Bronze Age Eastern Europeans mostly acquired their Near Eastern-related ancestry via female exogamy from populations in the Caucasus. That's basically what I've been arguing for a few years now. It feels good to be vindicated, especially considering the unfair criticism that I was subjected to here and elsewhere because of expressing this opinion (for instance, see here).

However, as far as I can see, based on the samples in this preprint, neither the Caucasus Maykop nor steppe Maykop appear to be unambiguous sources of this southern admixture in ancient Eastern Europe. That's because the Caucasus Maykop mtDNA profile still looks somewhat off in this context, while steppe Maykop harbors West Siberian forager-related genome-wide ancestry that is practically absent in the Yamnaya and all other closely related peoples.

In any case, please note the happy coincidence that academia has finally caught up to this blog and managed to find European farmer-derived ancestry in Yamnaya:

Importantly, our results show a subtle contribution of both Anatolian farmer-related ancestry and WHG-related ancestry (Fig.4; Supplementary Tables 13 and 14), which was likely contributed through Middle and Late Neolithic farming groups from adjacent regions in the West. A direct source of Anatolian farmer-related ancestry can be ruled out (Supplementary Table 15). At present, due to the limits of our resolution, we cannot identify a single best source population. However, geographically proximal and contemporaneous groups such as Globular Amphora and Eneolithic groups from the Black Sea area (Ukraine and Bulgaria), which represent all four distal sources (CHG, EHG, WHG, and Anatolian_Neolithic) are among the best supported candidates (Fig. 4; Supplementary Tables 13,14 and 15).

Check out what I had to say about this issue exactly two years ago: Yamnaya = Khvalynsk + extra CHG + maybe something else. Not bragging, just making a point that I do know what I'm doing here, most of the time anyway.

Wang et al. conclude their preprint with, unfortunately I have to say, some downright bizarre comments with regard to the Proto-Indo-European (PIE) homeland debate. But I'll get back to that later, when the ancient data from this and forthcoming related papers are released online.


Wang et al., The genetic prehistory of the Greater Caucasus, bioRxiv, posted May 16, 2018, doi:

See also...

What's Maykop (or Iran) got to do with it?

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

New PCA featuring Botai horse tamers, Hun and Saka warriors, and many more...

Just in case anyone's wondering how the ancient samples from the two recent archaeogenetic papers by Damgaard et al. (Nature and Science) behave in my two main Principal Component Analyses (PCA), here you go:

The relevant datasheet is available here. Over 90 of the new samples made it onto this plot, but to keep things simple I only highlighted a few of them. To see the positions of any or all of the rest, plug the datasheet into, say, PAST (freely available here) and create your own version of the plot. Also, below are links to updated Global25 datasheets, featuring coordinates for almost all of the new samples (available separately here).

Global 25 datasheet

Global 25 datasheet (scaled)

Global 25 pop averages

Global 25 pop averages (scaled)
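
For anyone who'd rather script this than use PAST, here's a minimal Python sketch of reading a datasheet and pulling out two components for a scatter plot. It assumes a comma-separated file with a header row, the sample label in the first column and the coordinates after it; check the actual files before relying on that. The function names and the toy rows are hypothetical and the numbers are made up for illustration.

```python
import csv
import io


def load_datasheet(text):
    """Parse 'label,PC1,PC2,...' rows into {label: [coordinates]}."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header row (assumed to be present)
    return {row[0]: [float(v) for v in row[1:]] for row in rows if row}


def pick_axes(samples, x_axis=0, y_axis=1):
    """Return (label, x, y) tuples for a two-dimensional scatter plot."""
    return [(label, pcs[x_axis], pcs[y_axis]) for label, pcs in samples.items()]


# Hypothetical rows for illustration only; these are not real coordinates.
toy = """sample,PC1,PC2,PC3
Sample_A,0.12,-0.03,0.01
Sample_B,0.10,0.05,-0.02
"""

samples = load_datasheet(toy)
points = pick_axes(samples)  # feed these to the plotting tool of your choice
```

To read a real file, pass its contents to load_datasheet(), for example via open(path).read(), where path is wherever you saved the datasheet.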

The interesting thing about those Tien Shan nomads, especially the Kangju people, is that they're much more West Eurasian (European + West Asian) than the Asian Scythians sampled to date. However, despite this, they're still no good for modeling the West Eurasian ancestry of most South Asian populations. I've looked at this closely, and the Steppe_MLBA cluster is still the one to beat in this respect.

See also...

Genetic ancestry online store (to be updated regularly)

Sunday, May 13, 2018

Hittite-era Anatolians in qpAdm

The apparent lack of steppe ancestry in five Hittite-era, perhaps Indo-European-speaking, Anatolians was interpreted in Damgaard et al. 2018 as a major discovery with profound implications for the origin of the Anatolian branch of Indo-European languages.

But I disagree with this assessment, simply because none of these Hittite-era individuals are from royal Hittite, or Nes, burials. Hence, there's a very good chance that they were Hattians, who were not of Indo-European origin, even if they spoke the Indo-European Hittite language because it was imposed on them.

Moreover, I am actually seeing a minor, but persistent, signal of steppe ancestry in one of the two Old-Hittite Period (~1750–1500 BCE) samples: Anatolia_MLBA MA2203. Indeed, I can put together very coherent, chronologically sound models using a couple of different methods to demonstrate this. Below is a fairly decent qpAdm model.

Anatolia_EBA 0.794±0.073
Ukraine_Eneolithic_I6561 0.206±0.073
tail: 0.400704
Full output

Obviously, these numbers aren't exactly impressive. But if the signal is real, then it might be an indication of things to come when someone manages to sequence at least a few genomes from confirmed Hittite remains. None of the other Anatolia_MLBA individuals, three of whom are from the Assyrian Colony Period (~2000–1750 BCE), show such obvious steppe ancestry.

Anatolia_EBA 1.000
Ukraine_Eneolithic_I6561 0.000
tail: 0.449485
Full output

Anatolia_EBA 0.983±0.069
Ukraine_Eneolithic_I6561 0.017±0.069
tail: 0.618499
Full output

Anatolia_EBA 0.868±0.089
Ukraine_Eneolithic_I6561 0.132±0.089
tail: 0.708811
Full output

Anatolia_MLBA w/o MA2203
Anatolia_EBA 1.000
Ukraine_Eneolithic_I6561 0.000
tail: 0.286377
Full output
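
A rough way to read the ± values in the models above is to divide each steppe coefficient by its standard error. The sketch below does just that for the three models that report a standard error, using the numbers exactly as listed. The |Z| of about 3 cutoff is a commonly used rule of thumb and an assumption here, not something taken from the qpAdm output, and the labels other than MA2203 are placeholders based on the order of the models.

```python
def z_score(coefficient, standard_error):
    """Coefficient divided by its standard error."""
    return coefficient / standard_error


# (label, Ukraine_Eneolithic_I6561 coefficient, standard error), copied from
# the qpAdm models above. Models fixed at 0.000 report no standard error and
# are left out. Labels after the first are placeholders (order of appearance).
steppe_estimates = [
    ("Anatolia_MLBA MA2203", 0.206, 0.073),
    ("third model", 0.017, 0.069),
    ("fourth model", 0.132, 0.089),
]

RULE_OF_THUMB = 3.0  # assumed cutoff, not part of the qpAdm output

for label, coefficient, standard_error in steppe_estimates:
    z = z_score(coefficient, standard_error)
    verdict = "above" if abs(z) >= RULE_OF_THUMB else "below"
    print(f"{label}: Z = {z:.2f} ({verdict} the assumed cutoff)")
```

MA2203 comes out highest of the three, at a little under 3, which is consistent with a minor but not yet decisive signal.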

In any case, apart from all of that, Damgaard et al. do take a measured and sober approach to interpreting their archaeogenetic data in the context of the Indo-European homeland debate. The paper also includes a very thorough linguistic supplement, freely available here, which reveals that there is Eastern European Hunter-Gatherer (EHG) ancestry in soon to be published Maykop culture samples. From the supplement (emphasis is mine):

Despite a general agreement on a Pontic-Caspian origin of the Anatolian Indo-European language family, it is currently impossible to determine on linguistic grounds whether the language reached Anatolia through the Balkans in the West (Anthony 2007; Mallory 1989: 30; Melchert 2003; Steiner 1990; Watkins 2006: 50) or through the Caucasus in the East (Kristiansen 2005: 77; Stefanini 2002; Winn 1981). From their earliest attestations, the Anatolian languages are clustered in Anatolia, and if the distribution reflects a prehistoric linguistic speciation event (as argued by Oettinger 2002: 52), then it may be taken as an indication that the arrival and disintegration of Proto-Anatolian language took place in the same area (Steiner 1981: 169). However, others have reasoned that the estimated period between the dissolution of the Proto-Anatolian language and the attestation of the individual daughter languages is extensive enough to allow for prehistoric mobility within Anatolia, theoretically leaving plenty of time for secondary East-to-West dispersals (cf. Melchert 2003: 25).

Whatever the case may be, there are no linguistic indications for any mass migration of steppe-derived Anatolian speakers dominating or replacing local populations. Rather, the Anatolian Indo-European languages appear in history as an organically integrated part of the linguistic landscape. In lexicon, syntax, and phonology, the second millennium languages of Anatolia formed a convergent, diffusional linguistic area (Watkins 2001: 54). Though the presence of an Indo-European language itself demonstrates that a certain number of speakers must have entered the area, the establishment of the Anatolian Indo-European branch in Anatolia is likely to have happened through a long-term process of infiltration and acculturalization rather than through mass immigration or elite dominance (Melchert 2003: 25). Furthermore, the genetic results presented in Damgaard et al. 2018 show no indication of a large-scale intrusion of a steppe population. The EHG ancestry detected in individuals associated with both Yamnaya (3000–2400 BCE) and the Maykop culture (3700–3000 BCE) (in prep.) is absent from our Anatolian specimens, suggesting that neither archaeological horizon constitutes a suitable candidate for a “homeland” or “stepping stone” for the origin or spread of Anatolian Indo-European speakers to Anatolia. However, with the archaeological and genetic data presented here, we cannot reject a continuous small-scale influx of mixed groups from the direction of the Caucasus during the Chalcolithic period of the 4th millennium BCE.


Under the “Steppe Hypothesis,” the Indo-Iranian languages are not seen as indigenous to South Asia but rather as an intrusive branch from the northern steppe zone (cf. Anthony 2007: 408–411; Mallory 1989: 35–56; Parpola 1995; Witzel 1999, 2001). Important clues to the original location and dispersal of the Indo-Iranians into South and Southwest Asia are provided by the Indo-Iranian languages themselves.

The Indo-Aryan and Iranian languages share a common set of etymologically related terms related to equestrianism and chariotry (Malandra 1991). Since it can be shown that this terminology was inherited from their Proto-Indo-Iranian ancestor, rather than independently borrowed from a third language, the split of this ancestor into Indo-Aryan and Iranian languages must postdate these technological innovations. The earliest available archaeological evidence of two-wheeled chariots is dated to approximately 2000 BCE (Anthony 1995; Anthony and Ringe 2015; Kuznetsov 2006: 638–645; Teufer 2012: 282). This offers the earliest possible date so far for the end of Proto-Indo-Iranian as a linguistic unity. The reference to a mariannu in a text from Tell Leilān in Syria discussed below pushes the latest possible period of Indo-Iranian linguistic unity to the 18th century BCE.


The traces of early Indo-Aryan speakers in Northern Syria positions the oldest Indo-Iranian speakers somewhere between Western Asia and the Greater Punjab, where the earliest Vedic text is thought to have been composed during the Late Bronze Age (cf. Witzel 1999: 3). In addition, a northern connection is suggested by contacts between the Indo-Iranian and the Finno-Ugric languages. Speakers of the Finno-Ugric family, whose antecedent is commonly sought in the vicinity of the Ural Mountains, followed an east-to-west trajectory through the forest zone north and directly adjacent to the steppes, producing languages across to the Baltic Sea. In the languages that split off along this trajectory, loanwords from various stages in the development of the Indo-Iranian languages can be distinguished: 1) Pre-Proto-Indo-Iranian (Proto-Finno-Ugric *kekrä (cycle), *kesträ (spindle), and *-teksä (ten) are borrowed from early preforms of Sanskrit cakrá- (wheel, cycle), cattra- (spindle), and daśa- (10); Koivulehto 2001), 2) Proto-Indo-Iranian (Proto-Finno-Ugric *śata (one hundred) is borrowed from a form close to Sanskrit śatám (one hundred)), 3) Pre-Proto-Indo-Aryan (Proto-Finno-Ugric *ora (awl), *reśmä (rope), and *ant- (young grass) are borrowed from preforms of Sanskrit ā́rā- (awl), raśmí- (rein), and ándhas- (grass); Koivulehto 2001: 250; Lubotsky 2001: 308), and 4) loanwords from later stages of Iranian (Koivulehto 2001; Korenchy 1972). The period of prehistoric language contact with Finno-Ugric thus covers the entire evolution of Pre-Proto-Indo-Iranian into Proto-Indo-Iranian, as well as the dissolution of the latter into Proto-Indo-Aryan and Proto-Iranian. As such, it situates the prehistoric location of the Indo-Iranian branch around the southern Urals (Kuz’mina 2001).


Guus Kroonen, Gojko Barjamovic, & Michaël Peyrot. (2018). Linguistic supplement to Damgaard et al. 2018: Early Indo-European languages, Anatolian, Tocharian and Indo-Iranian.

Update 14/05/2018: I managed to, more or less, reproduce my qpAdm models with qpGraph. This is never a simple and easy task, so I'm now more confident that Anatolia_MLBA MA2203 really does harbor ancestry from the steppe.

See also...

Likely Yamnaya incursion(s) into Northwestern Iran

Graeco-Aryan parallels

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, May 10, 2018

Graeco-Aryan parallels

The clearly non-local admixture in the geographically and genetically disparate, but Indo-European-speaking, ancient Mycenaeans and present-day North Indian Brahmins is very similar. So similar, in fact, that it could derive from practically the same population in space and time. The most plausible source for this admixture are the Bronze Age herders of the Pontic-Caspian steppe and their immediate descendants, such as those belonging to the Sintashta and other closely related archaeological cultures.

To prove and simultaneously illustrate this point, below are a couple of Admixture graph or qpGraph analyses. Note that I was also able to add Balkans_BA I2163 to the Mycenaean model. This is a Srubnaya-like ancient sample from the southern Balkans dating to the early Mycenaean period. Not only does Balkans_BA I2163 help to further constrain the model, but it also suggests a proximate source of steppe-related admixture into the population that potentially gave rise to the Mycenaeans. The relevant graph files are available here.

Considering that the Bronze Age peoples of the Pontic-Caspian steppe are the only obvious and direct, and, hence, most plausible link between the Mycenaeans and Brahmins, it follows that they are also the most likely vector for the spread of Indo-European speech to ancient Greece and South Asia. Or not? But if not, then what are the alternatives, and I mean real alternatives, not just excuses? If you think that you can offer a genuine alternative then feel free to do so in the comments below. However, be warned, stupid sh*t won't be tolerated.

See also...

Main candidates for the precursors of the proto-Greeks in the ancient DNA record to date

On the doorstep of India

Steppe admixture in Mycenaeans, lots of Caucasus admixture already in Minoans (Lazaridis et al. 2017)

Monday, May 7, 2018

Protohistoric Swat Valley peoples in qpGraph #2

Three options. Just one passes muster; the one with Sintashta. Coincidence? I think not. Who still wants to claim that there's no Sintashta-related steppe stuff in these Iron Age SPGT South Asians? The relevant graph files are available here. Any ideas for better models?

Update 08/05/2018: The reason that I chose Dzharkutan1_BA, from what is now Uzbekistan, as the BMAC proxy in the above graphs was because it's geographically a proximate choice for SPGT. However, I've since discovered that Gonur1_BA, from what is now Turkmenistan, does a somewhat better job in these models. The additional graph files are available at the same link as above here.

See also...

Protohistoric Swat Valley peoples in qpGraph

The protohistoric Swat Valley "Indo-Aryans" might not be exactly what we think they are

Friday, May 4, 2018

The protohistoric Swat Valley "Indo-Aryans" might not be exactly what we think they are

I need some help interpreting these linear models of ancient and present-day South Asian populations. Overall, the Iron Age groups from the Swat Valley, or SPGT, look like rather obvious outliers. The relevant datasheet is available here.

This might be because of significant bidirectional gene flow and/or continuity between Central Asia and the northern parts of South Asia before Sintashta-related steppe herders showed up in the region, and even before the Bactria Margiana Archaeological Complex (BMAC) got going. Note that Dzharkutan1_BA is a BMAC sub-population from near South Asia, but it doesn't quite have the same effect on those Swat Valley samples as the pre-BMAC Shahr_I_Sokhta_BA1 from the present-day Iranian/Afghan border.

If true, it probably means that most of the Iron Age peoples of the Swat Valley shouldn't be modeled as simply a mixture of Indus_Periphery and Steppe_MLBA. That's because they appear to be in part of the same or similar type of ancestry as Shahr_I_Sokhta_BA1. And indeed, qpAdm also suggests that they are.

Indus_Periphery 0.692±0.042
Shahr_I_Sokhta_BA1 0.104±0.045
Sintashta_MLBA 0.204±0.015
Tail: 0.659609
Full output

I'm trying to incorporate this new information into my Admixture graph models of the SPGT groups (see here). If I manage to come up with something useful I'll update this post with the results.

Update 08/05/2018: see here.

Wednesday, May 2, 2018

Open analysis thread: genetic distance (Fst) matrix focusing on ancient Central and South Asia

I'm hoping that we can learn something new about the genomic prehistory of Eurasia, and especially Central and South Asia, based on this massive new Fst matrix:

Ancient Central and South Asia genetic distance (Fst) matrix

Hint: it's probably easiest to initially explore this format with a program called PAST. Indeed, if you'd like to model fine scale ancestry proportions based on these data, it might be a good idea to use PAST to first turn the matrix into a principal coordinates (PCoA) datasheet (like this).
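
For those who prefer scripting to PAST, the matrix-to-PCoA step can be sketched in Python with NumPy. This is textbook classical PCoA (double-centring the squared distances, then an eigendecomposition), which may differ in detail from PAST's defaults, and whether Fst should be transformed first is a separate choice that this sketch doesn't make. The function name and the three-point toy matrix are hypothetical.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinates analysis of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances (Gower's transformation).
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_axes]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Toy matrix for illustration only: three points on a line at 0, 3 and 7.
toy = [
    [0.0, 3.0, 7.0],
    [3.0, 0.0, 4.0],
    [7.0, 4.0, 0.0],
]
coords = pcoa(toy, n_axes=2)  # one row per population, one column per axis
```

Each row of coords can then be saved as a datasheet line, with the population label in front, and used the same way as the PCA datasheets linked in other posts.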

On a related note, as I was typing this, commentator Chetan alerted me to a post at the Molgen forum claiming that Y-haplogroup R1b-L51 has turned up in Eneolithic remains from the Pontic-Caspian steppe (see here). If true, then it's a big deal, because it's the best evidence yet that L51 expanded into Central and Western Europe from the steppes. This is the Google translation of the post. Emphasis is mine.

Hello. Today, the XIV Samara Archeological Conference was held. The following reports were heard. Khokhlov AA Preliminary results of anthropological and genetic studies of materials of the Volga-Ural region of the Neolithic-Early Bronze Age by an international group of scientists. In his report, AA Khokhlov. introduced into scientific circulation until the unpublished data of the new Eneolithic burial ground Ekatirinovsky cape, which combines both the Mariupol and Khvalyn features, and refers to the fourth quarter of the V millennium BC. All samples analyzed had a uraloid anthropological type, the chromosome of all the samples belonged to the haplogroup R1b1a2 (R-P 312 / S 116), and the haplogroup R1b1a1a2a1a1c2b2b1a2. Mito to haplogroups U2, U4, U5. In the Khvalyn burial grounds (1 half of the 4th millennium BC), the anthropological material differs in a greater variety. In addition to the uraloid substratum, European broad-leaved and southern-European variants are recorded. To the game haplogroup R1a1, O1a1, I2a2 are added to mito T2a1b, H2a1.

I'd say that this information sounds legit. But let's wait and see if the results are backed up by one of the major ancient DNA labs in the west, like the Reich Lab.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Monday, April 30, 2018

Zoroastrian genetic origins revisited

About a year ago I found that the ancestry of present-day Iranians was best explained as largely a mixture between early Anatolian and Iranian farmers and Sarmatians from the Pontic-Caspian steppe (see here).

Things have now changed somewhat after the release of several hundred ancient samples from across Eurasia. Below are the best qpAdm models that I was able to find for various Iranian ethnic/regional populations based on my new dataset.

Ganj_Dareh_N 0.363±0.031
Hajji_Firuz_ChL 0.481±0.029
Karagash_MLBA 0.156±0.019
tail: 0.753635
Full output

Ganj_Dareh_N 0.056±0.042
Hajji_Firuz_ChL 0.883±0.039
Karagash_MLBA 0.061±0.027
tail: 0.862141
Full output

Ganj_Dareh_N 0.598±0.048
Hajji_Firuz_ChL 0.244±0.045
Karagash_MLBA 0.158±0.030
tail: 0.604908
Full output

Dashti_Kozy_BA 0.143±0.025
Ganj_Dareh_N 0.286±0.034
Hajji_Firuz_ChL 0.571±0.029
tail: 0.994129
Full output

Ganj_Dareh_N 0.309±0.035
Hajji_Firuz_ChL 0.556±0.029
Yamnaya_Samara 0.134±0.019
tail: 0.383344
Full output

Ganj_Dareh_N 0.279±0.045
Hajji_Firuz_ChL 0.600±0.048
Yamnaya_Samara 0.073±0.048
West_Siberia_N 0.048±0.033
tail: 0.413456
Full output

Ganj_Dareh_N 0.417±0.033
Hajji_Firuz_ChL 0.464±0.031
Karagash_MLBA 0.120±0.020
tail: 0.777933
Full output

Bustan_BA 0.352±0.053
Dashti_Kozy_BA 0.168±0.031
Hajji_Firuz_ChL 0.480±0.036
tail: 0.921955
Full output

However, all of the Iranian groups are still scoring a fair amount of ancient steppe ancestry, with the Zoroastrians ahead of the rest, which is potentially important, because they're basically a population relict from pre-Islamic Persia. Hence, this might be betraying their stronger ties to pre-Turkic, early Indo-Iranian Central Asia relative to the other Iranians. Also worth noting:

- As far as I can see, the Zoroastrians are the only Iranians in this analysis that really benefit from the addition of a Bactria Margiana Archaeological Complex (BMAC) reference population to their model, which might also be important, for the same reason outlined above

- There's no point modeling most of the Iranian groups as partly of Western Siberian forager (West_Siberia_N) origin, except perhaps the Mazandarani Iranians

- Indeed, Mazandarani Iranians are also the only group better modeled as part Yamnaya rather than Steppe_MLBA, which might be explained by Yamnaya-related incursions into what is now Northwestern Iran during the Early Bronze Age (see here)

- No matter what, I can't find a working model (P-value >0.05) for the Bandari Iranians using the new set of right pops aka outgroups, probably because the Bandaris harbor recent admixture from outside of Iran, including from Africa
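
The "working model" test in the last point can be written down as a small Python check. The P-value > 0.05 cutoff is the one quoted above. The extra requirements that every coefficient lies between 0 and 1 and that they sum to roughly 1 are my assumptions about what a sensible model looks like, and the function name is hypothetical. The first two inputs are copied from the models listed earlier in this post, while the failing one is invented purely for illustration.

```python
def is_working_model(coefficients, tail_prob, alpha=0.05, tolerance=0.01):
    """True if the tail probability exceeds alpha and the mix is feasible."""
    passes_fit = tail_prob > alpha
    # Assumed sanity checks: no negative or >1 ancestry proportions,
    # and the proportions add up to about 1 (allowing for rounding).
    in_range = all(0.0 <= c <= 1.0 for c in coefficients)
    sums_to_one = abs(sum(coefficients) - 1.0) <= tolerance
    return passes_fit and in_range and sums_to_one


# Coefficients and tail probabilities copied from two of the models above.
first_model = ([0.363, 0.481, 0.156], 0.753635)
four_way_model = ([0.279, 0.600, 0.073, 0.048], 0.413456)

# Invented numbers, only to show what a rejected model looks like.
hypothetical_failure = ([0.700, 0.300], 0.012)

for coefficients, tail_prob in (first_model, four_way_model, hypothetical_failure):
    print(coefficients, tail_prob, is_working_model(coefficients, tail_prob))
```

A passing tail probability doesn't make a model correct, of course; it just means the model isn't rejected with this set of right pops.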

On a related note, there's yet another feature in the Indian media about the impending publication of ancient DNA from the Harappan burial site at Rakhigarhi (see here). I've lost count of how many articles like this I've read over the last few years. But unlike the rest, this one actually reveals some specific information about the results: no Y-haplogroup R1a and no steppe ancestry in the Harappan sample or samples. So this time, I'd say that we're only days or weeks away from the publication of the relevant paper.

My final prediction in this context is that we'll see an ancient genome, or, hopefully, genomes, basically identical to the Indus_Periphery samples from Narasimhan et al. 2018 (see here). And then, apart from a few crazy people still shouting online that we need many more Harappan genomes because almost anything is yet possible, it'll be game over.

See also...

The mystery of the Sintashta people

On the doorstep of India

Indian smoke and mirrors

Friday, April 27, 2018

The mystery of the Sintashta people

During the Middle to Late Bronze Age, the steppes southeast of the Ural Mountains, in what is now Russia, were home to communities of metallurgists who buried their warriors with horses and the earliest examples of the spoked-wheel battle chariot.

We don't know what they called themselves, because they didn't leave any written texts, but their archaeological culture is commonly known as Sintashta. It was named after a river near one of their main settlements; an elaborate fortified town that has also been described as an ancient metallurgical industrial center. Another of their well known settlements, very similar to Sintashta, is Arkaim, pictured below courtesy of Wikipedia.

Sintashta is arguably one of the coolest ancient cultures ever discovered by archaeologists. It's also generally accepted to be the Proto-Indo-Iranian culture, and thus linguistically ancestral to a myriad of present-day peoples of Asia, including Indo-Aryans and Persians. No wonder then, that its origin, and that of its population, have been hotly debated issues.

The leading hypothesis based on archaeological data is that Sintashta is largely derived from the more westerly and warlike Abashevo culture, which occupied much of the forest steppe north of the Black and Caspian Seas. In turn, Abashevo is usually described as an eastern offshoot of the Late Neolithic Corded Ware Culture (CWC), which is generally seen as the first Indo-European archaeological culture in Northern Europe (see here).

Below is a Principal Component Analysis (PCA) featuring 38 Sintashta individuals from the recent Narasimhan et al. 2018 preprint. Note that the main Sintashta cluster overlaps almost perfectly with the main CWC cluster. The relevant datasheet is available here.

Moreover, many ancient and present-day South and Central Asians, particularly those identified with or speaking Indo-Iranian languages, appear to be strongly attracted to the main Sintashta cluster, forming an almost perfect cline between this cluster and the likely Indus Valley diaspora individuals who show no evidence of steppe ancestry.

This is in line with mixture models based on formal statistics showing significant Sintashta-related ancestry in Indo-Iranian-speakers (for instance, see here), and high frequencies of Y-haplogroup R1a-Z93 in both the Sintashta and many Indo-Iranian-speaking populations.

Some of the Sintashta samples are outliers from the main Sintashta cluster, and that's because they harbor elevated levels of ancestry related to the Mesolithic and Neolithic foragers of Eastern Europe and/or Western Siberia. This is especially true of a pair of individuals who belong to Y-haplogroup Q. However, this doesn't contradict archaeological data, which suggest that the Sintashta community may have been multi-cultural and multi-lingual. Indeed, it's generally accepted based on historical linguistics data that there were fairly intense contacts in North Eurasia between the speakers of Proto-Indo-Iranian, Proto-Uralic and Yeniseian languages.

Thus, it appears that there's not much left to debate because ancient DNA has seemingly backed up the most widely accepted hypotheses about the origin of Sintashta and its people, and their identification mainly as Proto-Indo-Iranian-speakers.

However, a sample from a Sredny Stog II culture burial on the North Pontic steppe, in what is now eastern Ukraine, has complicated matters somewhat. This individual, known as Ukraine_Eneolithic I6561, not only clusters very strongly with the most typical Sintashta samples, but also belongs to Y-haplogroup R1a-Z93. On the other hand, none of the CWC remains sequenced to date belong to this particular subclade of R1a (although, obviously, they do belong to a host of near and far related R1a subclades).

I've never seen anyone worth reading propose that Sintashta might derive from Sredny Stog II instead of Abashevo. And no wonder, because Sredny Stog II was long gone when Sintashta appeared in the archaeological record.

However, if CWC remains continue to fail to produce R1a-Z93, while, at the same time, the steppes of eastern Ukraine and surrounds are shown to be a hotbed of R1a-Z93 from the Sredny Stog to the Sintashta periods, which I think is possible, then ancient DNA might well force a serious re-examination of how the awesome Sintashta culture and people came to be.

See also...

On the doorstep of India

The beast among Y-haplogroups

Sunday, April 22, 2018

Likely Yamnaya incursion(s) into Northwestern Iran

Despite being stratigraphically dated to 5900-5500 BCE (i.e. the Chalcolithic period), ancient sample Hajji_Firuz I2327 from Narasimhan et al. 2018 belongs to Y-haplogroup R1b-Z2103 and shows minor, but unambiguous, Yamnaya-related ancestry on the autosomes. Why is this a problem? Because both R1b-Z2103 and the Yamnaya culture are dated to the Bronze Age, and Yamnaya samples from Kalmykia and Samara are exceptionally rich in R1b-Z2103.

Hence, pending a successful radiocarbon (C14) dating analysis, it seems unlikely that Hajji_Firuz I2327 was alive during the Chalcolithic. Rather, it appears that he's partly of Yamnaya origin and has been wrongly dated. His remains are likely to be from a secondary burial from the Bronze Age that collapsed into the layer below, right into a Chalcolithic bin ossuary burial full of much older bones.

This scenario is strongly corroborated by data from two other ancient individuals from what is now Northwestern Iran:

- Hajji_Firuz_BA I4243 (also from Narasimhan et al. 2018 and from the same site as Hajji_Firuz I2327) was initially also stratigraphically dated to the Chalcolithic, but is now labeled as a Bronze Age sample after a radiocarbon (C14) analysis of the remains revealed a date of 2465-2286 calBCE. Moreover, this individual packs around 50% Yamnaya-related ancestry.

- Iran_IA F38 (from Broushaki et al. 2016) from an Iron Age burial at Tepe Hasanlu, which is just a few miles from Hajji Firuz, also belongs to Y-haplogroup R1b-Z2103 and harbors some sort of steppe ancestry on the autosomes (see here).

Below is a Principal Component Analysis (PCA) showing how this trio compare in terms of genome-wide ancestry to C14-dated Chalcolithic samples from Hajji Firuz and the nearby Seh Gabi. The relevant datasheet is available here.

Clearly, they're shifted "north" relative to the Chalcolithic group and thus closer to the Eneolithic/Bronze Age steppe cluster, suggesting that they carry steppe ancestry that was missing, or at least much less pronounced, in the region before the Bronze Age. I can use qpAdm and Global25/nMonte to double check this and also estimate more precisely their levels of Yamnaya-related admixture.

Afanasievo 0.172±0.033
Hajji_Firuz_ChL 0.313±0.156
Seh_Gabi_ChL 0.515±0.158
tail: 0.668410201 (full output)

Hajji_Firuz_ChL 0.484±0.033
Yamnaya_Samara 0.516±0.033
tail: 0.26511852 (full output)




Considering the standard errors and statistical fits, qpAdm and Global25/nMonte have produced very similar results for both samples, which cannot be explained away as coincidental outcomes. I think these are signals of a population movement or movements from the Pontic-Caspian steppe into the South Caspian region, probably across the Caucasus, and most likely during the Bronze Age rather than the Chalcolithic.

I don't have a clue who these people were. It's rather unlikely that they were the early Iranians, who probably arrived in the region from Central Asia during the Late Bronze Age or even Iron Age (for instance, see here). Perhaps they were the Hittites? Indeed, in his book In Search of the Indo-Europeans, archaeologist James Mallory suggested that the ancestors of the Hittites and other Anatolian-speakers entered the Near East via the Caucasus route:

Most arguments for an Indo-European invasion from the northeast concern the appearance of a new burial rite at the end of the fourth and through the third millennium BC. At that time, both north of the Black Sea and the Caucasus, burials on the Russian-Ukrainian steppe were typically placed in an underground shaft and covered with a mound (kurgan in Russian). Before 3000 BC there begin to appear in the territory of the indigenous Transcaucasian (Kuro-Araxes) culture somewhat similar burials such as the royal tomb of Uch-Tepe on the Milska steppe. As tumulus burials are previously unknown in this region, some would explain their appearance by an intrusion of steppe pastoralists who migrated through the Caucasus and subjugated the local Early Bronze Age culture. More importantly, a status burial inserted into a mound at the site of Korucu Tepe in eastern Anatolia has been compared with somewhat similar burials both in the Caucasus and the Russian steppe. The discovery of horse bones on several sites of east Anatolia such as Norsun Tepe and Tepecik are seen to confirm a steppe intrusion since, as mentioned earlier, the horse, long known in the Ukraine and south Russia, is not attested in Anatolia prior to the Bronze Age.

Another option, however, is that they belonged to some other extinct Indo-European group, such as the Gutians (see here). In any case, keep an eye out for more Bronze Age samples from this part of the world. I have a strong feeling that, unlike their Neolithic and Chalcolithic predecessors, they will be rich in steppe ancestry and R1b-Z2103.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, April 18, 2018

Protohistoric Swat Valley peoples in qpGraph

If I was to add one thing to the Narasimhan et al. 2018 preprint, it'd be a series of uncomplicated qpGraph trees that back up, very simply and directly, the main conclusions in the manuscript. Such as this:

If some of you think that it's possible to show pretty much anything in these sorts of graphs, then you're wrong. For instance, it's not possible to swap West_Siberia_N for Sintashta, because the highest Z score usually blows out from almost nothing to well over five. And it's not possible to push Sintashta-related ancestry into Dravidian-speakers from South India. But if you think it is, then, by all means, have a go. The graph file is here.

See also...

Protohistoric Swat Valley peoples in qpGraph #2