
Saturday, June 16, 2018

Yamnaya isn't from Iran just like R1a isn't from India


A strange thing sometimes happens in population genetics: highly capable and experienced researchers come up with stupid ideas and push them so hard that, despite all the evidence to the contrary, they become accepted as truths. At least for a little while.

It's obvious now, thanks to full genome sequencing and ancient DNA, that Y-chromosome haplogroup R1a cannot be native to India. It arrived there rather recently from the Eurasian steppe, in all likelihood during the Bronze Age, probably as the Indus Valley Civilization (IVC) was collapsing or, perhaps, just after it had collapsed.

But for quite a few years this was something of a taboo, even politically incorrect, narrative, and it was vehemently rubbished by many Indians, including Indian scientists, and their western academic sympathizers.

Indeed, a whole series of papers came out, often in high brow scientific journals, claiming that R1a originated in South Asia, and that it spread from there to Europe. This, it was also claimed, was the final nail in the coffin of the so called Aryan Invasion Theory (AIT), because R1a was often described as the "Aryan" haplogroup.

I wasn't impressed by any of this nonsense. I said so here and elsewhere, to the great annoyance of those who believed, against all reason and logic, that the Indo-Aryans, and even Indo-Europeans, were indigenous to India. Here's a taste of some of my work on the topic going back to 2013.

South Asian R1a in the 1000 Genomes Project

Children of the Divine Twins

The Poltavka outlier

Looking back, it's all a bit rough, but very cool nonetheless. However, I was often accused of being biased, unscientific and even bigoted and racist as a result of offering such commentary and research. Make no mistake, my detractors were seething that I would dare to question what was apparently a scientific reality, and they wanted to shut me up. It was a nasty experience, but it now feels great to be vindicated.

Certainly, nowadays, no objective person who, more or less, knows their stuff would argue that the vast majority of the R1a in India doesn't ultimately derive from the Pontic-Caspian steppe in Eastern Europe.

But otherwise things haven't changed all that much in the last few years. For instance, despite a whole heap of ancient DNA data being available from Eastern Europe and West Asia, there's a widely accepted idea that the Early Bronze Age (EBA) Yamnaya culture formed on the Pontic-Caspian steppe as a result of migrations from what is now Iran.

This is not true. It can't be true, because it's contradicted by all of the data. I've tried to explain this on several occasions, but generally to no avail.

Yamnaya =/= Eastern Hunter-Gatherers + Iran Chalcolithic

Another look at the genetic structure of Yamnaya

Likely Yamnaya incursion(s) into Northwestern Iran

Thus, the Yamnaya people and culture were indigenous to Eastern Europe, and basically formed as a result of the amalgamation of at least three different populations closely related to Eastern European Hunter-Gatherers (EHG), Caucasus Hunter-Gatherers (CHG), Early European Farmers (EEF) and Western European Hunter-Gatherers (WHG). They did not harbor any significant ancestry from what is now Iran; at least not from within any reasonable time frame.

However, my communicating this fact has resulted in some rather strange and unsavory reactions from a number of individuals who appear to have a big emotional investment in this issue. They become frustrated and even angry when I try to explain to them that there's no sense in looking for the genetic origins of Yamnaya in Iran, much like the people who argued with me when I tried to reason with them that R1a wasn't native to India. Here's an example from a recent blog post (for the full conversation scroll down to the comments here).


Heh, here we go again with the accusations of bias, scientific impropriety and whatnot. Ironically, the poor chap just couldn't comprehend that he never had an argument to begin with, quite obviously due to his own bias in regards to this topic. Well, at least he didn't call me a racist.

In a recent preprint, Wang et al. correctly characterized Yamnaya as, by and large, a mixture of populations closely related to EHG, CHG, EEF and WHG (see here), with no obvious input from what is now Iran. Sounds familiar, right?

They also discovered that, during the Chalcolithic and Bronze Age, the Caucasus and nearby steppes were mainly home to three quite distinct populations: 1) Steppe groups, including Eneolithic steppe and Caucasus Yamnaya, 2) Caucasus groups, including Kura-Araxes and Maykop, and 3) Steppe Maykop, which they classified as part of 1. These populations were all separated by clear genetic and cultural borders, with significant and unambiguous mixture from the Caucasus cluster only in a couple of Steppe Maykop outliers and one Yamnaya outlier from what is now Ukraine.

Clearly, this leaves no room for any migrations from what is now Iran to the steppe that would potentially give rise to Yamnaya. In other words, the main genetic ingredients for what was to become Yamnaya were already on the steppe well before Yamnaya, during the Eneolithic, and it's quite likely that they were indigenous to the region.

However, interestingly, Wang et al. did appear to try to save the link between Yamnaya and Iran by referring to the CHG-related ancestry in Yamnaya as "CHG/Iranian". I'm not surprised because most of these authors are associated with the Max Planck Institute for the Science of Human History (MPI-SHH), which is currently pushing a proposal that the Proto-Indo-European (PIE) homeland was located in what is now Iran and surrounds (see here). So, obviously, they need to somehow show a relationship between Yamnaya and Iran, because Yamnaya and the closely related Corded Ware archaeological complex are generally seen as early Indo-European cultural horizons. Good luck with that.

Actually, let me make it clear once and for all that I couldn't care less where the very first Indo-European words were uttered. It's just something that I find interesting. I rather doubt that this was within the borders of present-day Iran, and I explained in some detail why in a post almost two years ago (see here). But if someone manages to prove that the PIE homeland was indeed located partly or wholly within what is now Iran, that's OK. I won't be emotionally traumatized as a result.

However, obviously, this will have to be done with the assumption in mind that Yamnaya and Corded Ware became Indo-European-speaking almost purely via linguistic transmission, with hardly any associated gene flow. It's possible, I guess. But then there's almost 200 years of scholarship based on linguistics and archaeological data that generally agrees in favor of the Pontic-Caspian steppe as the PIE homeland.

On a related note, I also couldn't care less whether the Aryan Invasion Theory (AIT) reflects what really happened during the Indo-Europeanization of South Asia, or if it's more appropriate to call it the Aryan Migration Theory (AMT). I'll accept whatever an objective analysis of all of the relevant data shows when we have enough of it to make an informed decision.

However, currently, I see nothing in the data that would prevent the AIT from being true. To me, the profound impact that the Bronze Age steppe peoples obviously had on South Asia, and especially on the Indo-European-speaking Indian upper castes, suggests that, overall, an invasion-like scenario is quite plausible. But I might be wrong, and so what if I am?

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Tuesday, June 12, 2018

Dali_EBA and West_Siberia_N in qpGraph


Below is a qpGraph tree that I've been working on for a while. I'll be posting more output from random analyses like this from now on. The relevant graph file is available in the zip folder here. Any ideas what else can be done with this topology?

See also...

Graeco-Aryan parallels

Friday, June 8, 2018

Of horses and men


Y-HT-1 is today by far the most common Y-chromosome haplogroup in domesticated horse breeds. According to Wutke et al. 2018, this is probably the result of artificial, human induced selection for this lineage, initially on the Eurasian steppe during the Iron Age, and then subsequently in Europe during the Roman period (see here).

However, during the Bronze and Iron Ages, before Y-HT-1 reached fixation, another very important Y-haplogroup in domesticated horses was its older sister clade Y-HT-4.

Indeed, it's likely that both Y-HT-1 and Y-HT-4 first dominated the domesticated horse gene pool during the Bronze Age, probably because they happened to have been present in the horse population exploited by the early Indo-Europeans. This was missed, or at least not directly discussed, by Wutke et al., but I'd say it's a fairly obvious conclusion that can be drawn from their data, especially if we consider the fact that horses are the most important animal in the Indo-European pantheon.

Thus, the story of Y-HT-1 and, up to a point, Y-HT-4 is probably very similar to that of two human Y-haplogroups, R1a-M417 and R1b-M269. Both of these lineages also rose to prominence rather suddenly during the Eneolithic and Bronze Age, in all likelihood because they were present amongst early Indo-European-speaking males (see here).

Below is a map of the earliest reliably called and dated instances of Y-HT-1, Y-HT-4, R1a-M417 and R1b-M269 in the ancient DNA record. Not surprisingly, all of the points on the map are located on or very close to the Pontic-Caspian steppe, which is generally accepted to have been the Proto-Indo-European homeland. Fascinating stuff.


See also...

Central Asia as the PIE urheimat? Forget it

Cultural hitchhiking and competition between patrilineal kin groups may have led to the post-Neolithic Y-chromosome bottleneck (Zeng et al. 2018)

Was Ukraine_Eneolithic I6561 a Proto-Indo-European?

Thursday, May 31, 2018

What's Maykop (or Iran) got to do with it? #2


For the past few days I've been trying to copy and also improve on the qpGraph tree in the Wang et al. preprint (see here). I've managed to come up with a new version of my model that not only offers a better statistical fit, but, in my opinion, also a much more sensible solution. For instance, the Eastern Hunter-Gatherer node now shows 73% MA1-related admixture, which, I'd say, makes more sense than the 10% in the previous version. The relevant graph file is available here.


Samara Yamnaya can be perfectly substituted in this graph by early Corded Ware samples from the Baltic region (CWC_Baltic_early) and a pair of Yamnaya individuals from what is now Ukraine. This is hardly surprising, considering how similar all of these samples are to each other in other analyses, but it's nice to see nonetheless, because I think it helps to confirm the reliability of my model.



And yes, I have tested all sorts of other Yamnaya-related ancient and present-day populations with this tree. They usually pushed the worst Z score to +/- 3 and well beyond, probably because they weren't similar enough to Yamnaya. But, perhaps surprisingly, Bell Beakers from Britain produced a decent result (see here).
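For anyone unsure what I mean by the "worst Z score", here's a rough Python sketch of the idea. It assumes that each f-statistic in the graph has a Z score measuring how far the fitted value is from the observed one, and that an absolute value of 3 or more is treated as a poor fit. The function names and the numbers are made up for illustration; they aren't qpGraph output.

```python
def worst_z(z_scores):
    """Return the Z score with the largest absolute value."""
    return max(z_scores, key=abs)


def fits_reasonably(z_scores, threshold=3.0):
    """True if even the worst residual stays under the chosen threshold.

    The threshold of 3 is the rule of thumb assumed here, not a hard law.
    """
    return abs(worst_z(z_scores)) < threshold


# Hypothetical residual Z scores for two candidate graphs.
graph_a = [0.4, -1.9, 2.2, -0.7]
graph_b = [0.4, -3.6, 2.2, -0.7]

print(worst_z(graph_a), fits_reasonably(graph_a))
print(worst_z(graph_b), fits_reasonably(graph_b))
```

A tree where a substituted population drags one residual past the threshold, as in the second list, is the kind of result I'd treat as a failed model.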

See also...

On the genetic prehistory of the Greater Caucasus (Wang et al. 2018 preprint)

Another look at the genetic structure of Yamnaya

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, May 25, 2018

Cultural hitchhiking and competition between patrilineal kin groups may have led to the post-Neolithic Y-chromosome bottleneck (Zeng et al. 2018)


A very interesting paper has just appeared at Nature Communications that potentially offers an explanation for the well documented explosions of certain Y-chromosome lineages in the Old World after the Neolithic, such as those that led to most European males today belonging to Y-haplogroups R1a and R1b (LINK). I might have more to say about this paper in the comments below after I've read it a couple of times. Emphasis is mine:

In human populations, changes in genetic variation are driven not only by genetic processes, but can also arise from cultural or social changes. An abrupt population bottleneck specific to human males has been inferred across several Old World (Africa, Europe, Asia) populations 5000–7000 BP. Here, bringing together anthropological theory, recent population genomic studies and mathematical models, we propose a sociocultural hypothesis, involving the formation of patrilineal kin groups and intergroup competition among these groups. Our analysis shows that this sociocultural hypothesis can explain the inference of a population bottleneck. We also show that our hypothesis is consistent with current findings from the archaeogenetics of Old World Eurasia, and is important for conceptions of cultural and social evolution in prehistory.

...

If the primary unit of sociopolitical competition is the patrilineal corporate kin group, deaths from intergroup competition, whether in feuds or open warfare, are not randomly distributed, but tend to cluster on the genealogical tree of males. In other words, cultural factors cause biases in the usually random process of transmission of Y-chromosomes, increasing the rate of loss of Y-chromosomal lineages and accelerating genetic drift. Extinction of whole patrilineal groups with common descent would translate to the loss of clades of Y-chromosomes. Furthermore, as success in intergroup competition is associated with group size, borne out empirically in wars [43] as ‘increasing returns at all scales’ [44], and as larger group size may even be associated with increased conflict initiation, borne out in data on feuds [45], there may have been positive returns to lineage size. This would accelerate the loss of minor lineages and promote the spread of major ones, further increasing the speed of genetic drift.

In addition, the assimilation of women from groups that are disrupted or extirpated through intergroup competition into remaining groups is a common result of warfare in small-scale societies [46]. This, together with female exogamy, would tend to limit the impact of intergroup competition to Y-chromosomes.

...

Figure 6 shows a striking pattern of differences in shallowness of coalescence in samples from hunter-gatherer, farmer and pastoralist cultures. While hunter-gatherer Y-chromosomes from the same culture, and often the same sites, commonly divide into haplotypes that coalesce in multiple millennia, Y-chromosomes of samples from farmer and pastoralist cultures are more homogeneous and have more recent coalescences. The Bell Beaker culture has a high proportion of sampled males (81%) from a large geographical area (Iberia to Hungary) who belong to an identical Y-chromosomal haplogroup (R1b-S116), implying common descent from a kin group that existed quite recently. Some groups of males share even more recent descent, on the order of ten generations or fewer [64]. Such recent common descent may even be retained in cultural memory via oral genealogies, such as among descent groups in Northern and Western Africa, whose members can trace descent relationships up to three to four centuries before the generation currently living [40]. Likewise, from Germany to Estonia, the Y-chromosomes of all Corded Ware individuals sampled, except one, belong to a single clade within haplogroup R1a (R1a-M417) and appear to coalesce shortly before sample deposition.


Thus, groups of males in European post-Neolithic agropastoralist cultures appear to descend patrilineally from a comparatively smaller number of progenitors when compared to hunter gatherers, and this pattern is especially pronounced among pastoralists. Our hypothesis would predict that post-Neolithic societies, despite their larger population size, have difficulty retaining ancestral diversity of Y-chromosomes due to mechanisms that accelerate their genetic drift, which is certainly in accord with the data. The tendency of pastoralist cultures to show the lowest Y-chromosomal diversity and the shallowest coalescence would also be explained, as they may have experienced the social conditions that characterized cultures of the Central Asian steppes [42]. Indeed, the Corded Ware pastoralists may have been organized into segmentary lineages [65], an extremely common tribal system among pastoralist cultures, including those of historical Central Asia [66].
Citation...

Zeng et al., Cultural hitchhiking and competition between patrilineal kin groups explain the post-Neolithic Y-chromosome bottleneck, Nature Communications volume 9, Article number: 2077 (2018) doi:10.1038/s41467-018-04375-6

Update 30/05/2018: For those clued in, here's an awesome quote from the relevant press release.

The outlines of that idea came to Tian Chen Zeng, a Stanford undergraduate in sociology, after spending hours reading blog posts that speculated - unconvincingly, Zeng thought - on the origins of the "Neolithic Y-chromosome bottleneck," as the event is known.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, May 24, 2018

What's Maykop (or Iran) got to do with it?


I had a go at imitating this qpGraph tree, from the recent Wang et al. preprint on the genetic prehistory of the Caucasus, using the ancient samples that were available to me. I'm very happy with the outcome, because everything makes good sense, more or less. The real populations and singleton individuals, ten in all, are marked in red. The rest of the labels refer to groups inferred from the data.


However, this is still a work in progress, and, if possible, I'd like to simplify the model and also get the worst Z score much closer to zero. If anyone wants to help out, the graph file is available HERE. Feel free to post your own versions in the comments, and I'll run them for you as soon as I can.

Update 31/05/2018: I've managed to come up with a new version of my model that not only offers a better statistical fit, but, in my opinion, also a much more sensible solution. For instance, the Eastern Hunter-Gatherer node now shows 73% MA1-related admixture, which, I'd say, makes more sense than the 10% in the previous version. The relevant graph file is available here.


For more details and a discussion about the updated model, including additional trees with Baltic Corded Ware and British Beaker samples, please check out my new thread on the topic at the link below.

What's Maykop (or Iran) got to do with it? #2

Citation...

Wang et al., The genetic prehistory of the Greater Caucasus, bioRxiv, posted May 16, 2018, doi: https://doi.org/10.1101/322347

See also...

On the genetic prehistory of the Greater Caucasus (Wang et al. 2018 preprint)

Another look at the genetic structure of Yamnaya

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, May 23, 2018

More Botai genomes (Jeong et al. 2018 preprint)


Over at bioRxiv at this LINK. Actually, these may or may not be the same Botai genomes that have already been published along with Damgaard et al. 2018 (see comments below for the discussion about that). Here's the abstract. Emphasis is mine:

The indigenous populations of inner Eurasia, a huge geographic region covering the central Eurasian steppe and the northern Eurasian taiga and tundra, harbor tremendous diversity in their genes, cultures and languages. In this study, we report novel genome-wide data for 763 individuals from Armenia, Georgia, Kazakhstan, Moldova, Mongolia, Russia, Tajikistan, Ukraine, and Uzbekistan. We furthermore report genome-wide data of two Eneolithic individuals (~5,400 years before present) associated with the Botai culture in northern Kazakhstan. We find that inner Eurasian populations are structured into three distinct admixture clines stretching between various western and eastern Eurasian ancestries. This genetic separation is well mirrored by geography. The ancient Botai genomes suggest yet another layer of admixture in inner Eurasia that involves Mesolithic hunter-gatherers in Europe, the Upper Paleolithic southern Siberians and East Asians. Admixture modeling of ancient and modern populations suggests an overwriting of this ancient structure in the Altai-Sayan region by migrations of western steppe herders, but partial retaining of this ancient North Eurasian-related cline further to the North. Finally, the genetic structure of Caucasus populations highlights a role of the Caucasus Mountains as a barrier to gene flow and suggests a post-Neolithic gene flow into North Caucasus populations from the steppe.


Jeong et al., Characterizing the genetic history of admixture across inner Eurasia, Posted May 23, 2018, doi: https://doi.org/10.1101/327122

See also...

New PCA featuring Botai horse tamers, Hun and Saka warriors, and many more...

Global25 workshop 2: intra-European variation


Even though the Global25 focuses on world-wide human genetic diversity, it can also reveal a lot of information about genetic substructures within continental regions.

Several of the dimensions, for instance, reflect Balto-Slavic-specific genetic drift. I ensured that this would be the case by running a lot of Slavic groups in the analysis. A useful by-product of this strategy is that the Global25 is very good at exposing relatively recent intra-European genetic variation.

To see this for yourself, download the datasheet below and plug it into the PAST program, which is freely available here. Then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components.

G25_Europe_scaled.dat

You should end up with the plot below. Note that to see the group labels and outlines, you need to tick the appropriate boxes in the panel to the right of the image. To improve the experience, it might also be useful to color-code different parts of Europe, and you can do that by choosing Edit > Row colors/symbols. Of course, if you have Global25 coordinates you can add yourself to the datasheet to see where you plot.
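If you'd like to see roughly what the Principal Components option does under the hood, here's a minimal Python sketch. It assumes the coordinates have already been read into a list of numeric rows (the parsing of the datasheet isn't shown, because the file layout isn't described in this post), and that the analysis runs on the variance-covariance matrix. It uses plain power iteration rather than whatever routine PAST actually uses, so treat it as an illustration only. The sample values are invented.

```python
import math
import random


def pca(rows, n_components=2, iterations=500):
    """Return (eigenvalues, axes, scores) for the leading components."""
    n, dims = len(rows), len(rows[0])
    # Center every column on its mean.
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    # Sample variance-covariance matrix.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dims)]
           for a in range(dims)]
    rng = random.Random(0)  # fixed seed so the result is repeatable
    eigenvalues, axes = [], []
    for _ in range(n_components):
        # Power iteration from a random unit vector.
        v = [rng.random() + 0.1 for _ in range(dims)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b]
                  for a in range(dims) for b in range(dims))
        eigenvalues.append(lam)
        axes.append(v)
        # Deflate: remove the component just found before the next round.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(dims)]
               for a in range(dims)]
    scores = [[sum(r[d] * axis[d] for d in range(dims)) for axis in axes]
              for r in centered]
    return eigenvalues, axes, scores


# Invented three-dimensional coordinates for four samples.
demo = [[0.10, 0.02, 0.01], [0.12, 0.03, 0.00],
        [-0.05, 0.08, 0.02], [-0.07, 0.09, 0.03]]
values, _, sample_scores = pca(demo)
print(values)
print(sample_scores)
```

The first two columns of the scores are what end up on the X and Y axes of the plot. The sign of each component is arbitrary, which is why a plot can come out mirrored compared to someone else's.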


Components 1 and 2 pack the most information and, more or less, recapitulate the geographic structure of Europe. However, many details can only be seen by plotting the less significant components. For instance, a plot of components 1 and 3 almost perfectly separates Northeastern Europe into two distinct clusters made up of the speakers of Indo-European and Finno-Ugric languages.


This plot might also be useful for exploring potential Jewish ancestry, because Ashkenazi, Italian and Sephardi Jews appear to be relatively distinct in this space. Thus, people with significant European Jewish ancestry will "pull" towards the lower left corner of the plot. For example, someone who is half Ashkenazi and half German will probably land in the empty space between the Northwest Europeans and Jews.
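As a back-of-the-envelope check, the expected position of a mixed individual can be approximated by a weighted average of the two source populations' coordinates. The Python sketch below assumes that this linear approximation holds well enough in this space, and the coordinates in it are invented placeholders, not real Global25 values.

```python
def mix(coords_a, coords_b, fraction_a=0.5):
    """Weighted average of two coordinate vectors of equal length."""
    return [fraction_a * a + (1.0 - fraction_a) * b
            for a, b in zip(coords_a, coords_b)]


# Placeholder coordinates for two population averages (not real data).
population_a = [0.10, -0.04, 0.02]
population_b = [0.13, 0.06, -0.01]

print(mix(population_a, population_b))        # half and half
print(mix(population_a, population_b, 0.25))  # one quarter from population_a
```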

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 PAST-compatible datasheets

Monday, May 21, 2018

Global25 workshop 1: that classic West Eurasian plot


In this Global25 workshop I'm going to show how to reproduce, more or less, that classic plot of West Eurasian genetic diversity seen regularly in ancient DNA papers and at this blog (for instance, here). To do this you'll need the datasheet below, which I'll be updating regularly, and the PAST program, which is freely available here.

G25_West_Eurasia_scaled.dat

This is what you'll get if you follow my instructions to the letter. Note the fairly strong correlation with geography. I think this is impressive for so many reasons.


OK, so, download the said datasheet, plug it into PAST, select columns 1 to 8, and go to Multivariate > Ordination > Principal Components. Here's a screen cap of me doing it:


The initial output won't resemble my plot above. So you'll need to place PC2 on the X axis, PC1 on the Y axis, and set the image size to 1206x706. After doing that, you should end up with exactly this:


Then, export the image, flip it horizontally with any imaging software that can do the job, and that's it, unless you want to add some labels like I did. Feel free to ask questions and make suggestions in the comments below.
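If you're plotting the scores yourself rather than exporting an image, the axis swap and the horizontal flip can be done on the numbers directly. Here's a small Python sketch, assuming you already have PC1 and PC2 scores per sample. Negating X mirrors the plot left to right, which matches an image flip apart from where the axis ticks sit. The function name and the sample values are hypothetical.

```python
def to_plot_coords(samples):
    """Map (label, pc1, pc2) to (label, x, y).

    PC2 goes on the X axis, PC1 on the Y axis, and X is negated
    to mirror the plot horizontally.
    """
    return [(label, -pc2, pc1) for label, pc1, pc2 in samples]


# Invented scores for two samples.
scores = [("sample_1", 0.12, -0.03), ("sample_2", -0.08, 0.05)]
for label, x, y in to_plot_coords(scores):
    print(label, x, y)
```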

See also...

Global25 workshop 2: intra-European variation

Global25 PAST-compatible datasheets

Saturday, May 19, 2018

Global25 PAST-compatible datasheets


I'm planning to run regular workshops over the next few months on how to get the most out of Global25 data with various programs, and especially PAST (see here). So if you have Global25 coordinates, please stay tuned.

To that end, I've put together four color-coded, PAST-compatible Global25 datasheets with thousands of present-day and ancient samples, available at the links below:

Global_25_PCA.dat

Global_25_PCA_pop_averages.dat

Global_25_PCA_scaled.dat

Global_25_PCA_pop_averages_scaled.dat

PAST is an awesome little statistical program and simple to use. The manual is available here. To kick things off, here's a quick guide on how to run a Neighbor Joining tree on your Global25 coordinates:

- download the Global_25_PCA_pop_averages_scaled.dat from the last link above

- open the dat file with something a little more advanced than Windows Notepad, like, say, TextPad (see here)

- stick your scaled coordinates at the bottom of the sheet, so that they look exactly like those of the other samples, except give yourself an original symbol, like, say, a black star

- open the edited dat file with PAST and choose all of the columns and rows by clicking the empty tab above the labels

- then, at the top, go to Multivariate > Clustering > Neighbor joining

After a few seconds you should see a nice, color-coded tree like the one below, except you'll also be on it, in black text. I'm very happy with these results, by the way. As far as I can see, all of the populations and individuals cluster exactly where they should.
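For those curious about what the Neighbor joining option is doing, the steps above boil down to building a distance matrix from the coordinates and then repeatedly joining the pair of nodes that minimizes the standard neighbor joining criterion. The Python sketch below is a minimal version of that, assuming Euclidean distances between coordinate rows (PAST lets you pick the distance measure, so this is my assumption). The labels, the coordinates and the `node0`, `node1` naming are invented for illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbor_joining(labels, matrix):
    """Return (joins, last_edge).

    joins: list of (left, right, new_node, left_length, right_length).
    last_edge: (node_a, node_b, length) linking the final two nodes.
    """
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node.
        len_a = 0.5 * d[a][b] + (totals[a] - totals[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        parent = "node%d" % len(joins)
        d[parent] = {parent: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[parent][c] = dist
                d[c][parent] = dist
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
        joins.append((a, b, parent, len_a, len_b))
    a, b = nodes
    return joins, (a, b, d[a][b])


# Invented coordinates for four labels.
coords = {"Pop_A": [0.10, 0.02], "Pop_B": [0.11, 0.03],
          "Pop_C": [-0.05, 0.08], "You": [-0.04, 0.07]}
names = list(coords)
distances = [[euclidean(coords[a], coords[b]) for b in names] for a in names]
print(neighbor_joining(names, distances))
```

Each join pairs two nodes under a new internal node, and reading the joins in order reconstructs the unrooted tree. PAST's drawing, rooting and coloring of the tree aren't reproduced here.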


Those of you who are already very proficient in using PAST, feel free to go nuts with these new datasheets and show us the results in the comments below. I'll try to put together a workshop for beginners within the next couple of weeks.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation