We're probably not too far away from unlocking the secrets of how Europe was populated in prehistoric times, mostly thanks to huge advances in ancient DNA technology. So I thought I'd throw my hat in the ring just before that happens, and try to predict where two of Europe's major Y-DNA haplogroups originated, and how they spread across the continent. Obviously, I'm referring here to R1a and R1b. But instead of just writing an opinion piece, I've prepared an analysis that uses autosomal SNPs for the job.
After many months of playing around with a wide variety of samples from across Europe, I've noticed some interesting things about European genetic substructures. One of the most important, I believe, is the very specific difference between Western European groups high in R1b, and Northeastern European groups high in R1a. They're actually very similar at the genome-wide autosomal level, but as I say, there is an important dissimilarity, which seems to mimic the stark contrast in R1b vs. R1a frequencies in these populations. Basically, they apparently show different influences from Asia in many ADMIXTURE analyses.
Generally speaking, Western Europeans come out more Caucasus, Middle Eastern and Mediterranean, while Northeastern Europeans seemingly come out more South Central Asian (as opposed to South Asian). This can't be a coincidence, especially as the results are somewhat out of whack with geography. For instance, Western Europeans appear to share more in common with Caucasus groups than Eastern Europeans do, despite the fact that the latter are much closer to the Caucasus Mountains. Below is an example of what I mean, featuring 890 samples in a K=9 West Eurasian ADMIXTURE run. Note the small numbers of European samples, which was a deliberate ploy to prevent them from triggering more specific regional clusters, and thus possibly concealing various admixtures.

Key: Red = Middle Eastern, Orange = Northeast African, Light Green = East Central Asian, Green = ASI-like, Aqua Green = ANI-like, Aqua Blue = East Asian, Blue = Sub-Saharan African, Purple = North/Central/East European, Pink = Caucasus. See spreadsheet for details.

For those who aren't familiar with the terms ASI and ANI, they stand for Ancestral South Indian and Ancestral North Indian respectively, and were coined by Reich et al. a few years ago (see here). Thus, my ASI-like cluster describes a set of allele frequencies associated with South Asian ancestry, while the ANI-like cluster is basically referring to various intrusive West Eurasian influences in South Asia. The latter peaks in the Pathans from the HGDP, but based on further tests, it's roughly 90% West Asian and 10% Northeast European. For instance, I made a small number of synthetic samples from the ANI-like cluster, and then ran another ADMIXTURE test with one of these individuals to see how he scored. The results of this K=6 run are in the spreadsheet here.
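As a rough illustration of what making a synthetic sample from a cluster can involve, here is a minimal Python sketch. It assumes (this is an assumption for illustration, not a description of the exact procedure used for the spreadsheet) that each genotype is drawn independently per SNP from the cluster's allele frequencies, with two draws per marker. The function name and the frequency values are hypothetical placeholders; in practice the frequencies would be the per-cluster allele frequencies estimated in the ADMIXTURE run.

```python
import random


def synthetic_genotypes(allele_freqs, rng):
    """Draw one synthetic diploid individual from per-SNP allele frequencies.

    Each genotype is the number of copies (0, 1 or 2) of the counted allele,
    obtained from two independent draws per SNP. Treating SNPs as independent
    of each other is an assumption of this sketch.
    """
    genotypes = []
    for p in allele_freqs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"allele frequency out of range: {p}")
        # Two chromosomes, each carrying the allele with probability p.
        copies = sum(1 for _ in range(2) if rng.random() < p)
        genotypes.append(copies)
    return genotypes


# Hypothetical frequencies for five SNPs in one cluster (placeholders only).
example_freqs = [0.0, 0.12, 0.5, 0.87, 1.0]
rng = random.Random(42)  # fixed seed so the draw is repeatable
individual = synthetic_genotypes(example_freqs, rng)
print(individual)
```

An individual built this way can then be added to a dataset and run through ADMIXTURE like any real sample, which is the idea behind the K=6 test mentioned above.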
I think these results mean the ANI-like cluster and R1a originally came from West Asia – but from the Near East rather than from the Caucasus. Indeed, I believe they more or less spread together to Europe, the Caucasus and South Asia during the Neolithic. Further population movements from the west to the east, including possibly the Indo-European expansion from Europe, then strengthened the genetic relationships between these regions. The map below shows my suppositions in more detail. However, to keep things reasonably tidy, I left out the Turkic migrations of the early Middle Ages and later, which most likely carried R1a-Z93 from Central Asia to the North Caucasus and present-day Ukraine. Also, please note the multiple entry points of implied migrations from west to east into India, all of which probably had a role in the make-up of the modern ANI-like cluster.

Now, here's a map showing the expansions of R1b, as well as the pink “Caucasus” and red “Middle Eastern” components. For the sake of good taste, I didn't include any pink arrows. Obviously, the theorized point of origin of R1b is very close to that of R1a - both are very rough estimates. But I suspect that, for one reason or another, the former set off on its European journey much later (and this is important, because genetic substructures are created by both distance and time). By the way, Africa isn't really my forte, so I declined to show the potential trail of R1b into Cameroon, where it reaches very high frequencies.

So that's pretty much it. I guess we'll soon see if I'm right. In any case, I think the scenarios outlined here explain fairly well a number of facts that have come to light in recent years. These include:
- The discovery of Bronze and Iron Age mummies in South Siberia and the Tarim Basin carrying R1a, European-specific mtDNA lineages, and Central European-like cranial characteristics.
- The lack of any R1b in these mummies. Also the lack of any apparent associations between R1b and prehistoric Eastern European, and potentially proto-Indo-European, cultural horizons, like the Corded Ware and Yamnaya.
- The presence near the Altai of people with significant European (as opposed to West Asian) autosomal admixture (see here).
- High variance of Indian R1a in terms of STRs, but low variance in terms of SNPs, and the lack of the paragroup R1a* anywhere on the sub-continent in samples taken to date.
- The scarcity of R1b in India.
- The high frequencies of R1a-Z93 in West, Central and South Asia as a proportion of total R1a, but its almost complete absence in ethnic Eastern Europeans (i.e. not of Turkic or Ashkenazi descent).
- Very limited direct genetic contacts between populations of the North Caucasus and Eastern Europe.
- Maximum peaks in European-specific and North European-specific ADMIXTURE clusters in Northeastern Europeans – especially in groups from around the Eastern Baltic.
Also, I've been involved in some online discussions of late about whether or not the Indo-Iranian speaking Kalash carry Northeast European admixture. I think they do, and I included a Kalash individual from the HGDP in my K=9 analysis above to prove the point (ID code HGDP00302). The reason I used only one sample from this group was to avoid creating a Kalash-specific composite cluster, which would very likely hide almost all admixtures. This tends to happen a lot with the Kalash, and it's probably due to genetic isolation and drift. Anyway, this individual scored 4% in the North/Central/East European cluster, and also 58% in the ANI-like cluster, which, as mentioned above, is most likely of partly European origin. I also had the aforementioned Kalash tested for Northeast European-specific segments with LAMP, and a solid signal of admixture was located on chromosome 1, from rs10429857 (bp. 33315639) to rs9960 (bp. 43090380).
Admittedly, that's not much, but there's definitely something there, and it shows up in several Pakistani groups. What I find interesting is that many Northwest Indians score more of such European admixture than the Pakistanis. I have no idea why that's the case, but it's not just showing up in my analyses. I think it's high time for someone to take it upon themselves to study this issue in a formal way. I'm surprised it hasn't happened yet, because it looks to be a really fascinating phenomenon, with huge implications for the history of South Asia and beyond. I hope politics isn't the reason for the lack of interest.
Here are the unsupervised ADMIXTURE results of all the 23 Kalash individuals from:
http://dodecad.blogspot.com/2011/10/origin-of-kalash-inferred-with.html (UPDATE II)
922 HGDP00279 0.007 0.081 0.361 0 0 0.031 0.521
911 HGDP00307 0.004 0.059 0.336 0 0 0.018 0.583
916 HGDP00315 0.019 0.036 0.338 0 0 0.000 0.606
908 HGDP00302 0.006 0.030 0.337 0 0 0.020 0.608
925 HGDP00311 0.014 0.029 0.325 0 0 0.021 0.611
913 HGDP00285 0.000 0.027 0.319 0 0 0.019 0.635
912 HGDP00333 0.000 0.020 0.324 0 0 0.018 0.638
910 HGDP00277 0.000 0.016 0.334 0 0 0.021 0.630
905 HGDP00298 0.012 0.016 0.325 0 0 0.016 0.631
924 HGDP00281 0.011 0.015 0.332 0 0 0.010 0.633
920 HGDP00304 0.007 0.012 0.329 0 0 0.013 0.638
903 HGDP00290 0.007 0.010 0.325 0 0 0.021 0.637
919 HGDP00274 0.007 0.004 0.341 0 0 0.013 0.635
923 HGDP00309 0.007 0.000 0.317 0 0 0.019 0.656
921 HGDP00330 0.000 0.000 0.335 0 0 0.026 0.639
918 HGDP00319 0.011 0.000 0.328 0 0 0.010 0.651
917 HGDP00288 0.004 0.000 0.339 0 0 0.013 0.644
915 HGDP00286 0.000 0.000 0.329 0 0 0.018 0.653
914 HGDP00313 0.000 0.000 0.351 0 0 0.015 0.634
909 HGDP00328 0.000 0.000 0.310 0 0 0.023 0.667
907 HGDP00267 0.000 0.000 0.332 0 0 0.022 0.647
906 HGDP00326 0.000 0.000 0.307 0 0 0.030 0.663
904 HGDP00323 0.002 0.000 0.304 0 0 0.013 0.680
Individual HGDP00302 has 3% Atlantic_Baltic, which is reasonably close to the value obtained here (4%). He is 4th of 23 in terms of the Atlantic_Baltic component, with the average Kalash having 1.5% (sd=2.1%, median=1%).
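The summary figures quoted above can be checked directly against the table. The short Python sketch below assumes that the second numeric column is the Atlantic_Baltic component (inferred from the 3% quoted for HGDP00302; the table itself carries no header) and uses the sample standard deviation.

```python
from statistics import mean, median, stdev

# Second numeric column of the table, keyed by HGDP ID.
# Reading this column as Atlantic_Baltic is an assumption based on the
# 3% value quoted for HGDP00302.
atlantic_baltic = {
    "HGDP00279": 0.081, "HGDP00307": 0.059, "HGDP00315": 0.036,
    "HGDP00302": 0.030, "HGDP00311": 0.029, "HGDP00285": 0.027,
    "HGDP00333": 0.020, "HGDP00277": 0.016, "HGDP00298": 0.016,
    "HGDP00281": 0.015, "HGDP00304": 0.012, "HGDP00290": 0.010,
    "HGDP00274": 0.004, "HGDP00309": 0.000, "HGDP00330": 0.000,
    "HGDP00319": 0.000, "HGDP00288": 0.000, "HGDP00286": 0.000,
    "HGDP00313": 0.000, "HGDP00328": 0.000, "HGDP00267": 0.000,
    "HGDP00326": 0.000, "HGDP00323": 0.000,
}

values = list(atlantic_baltic.values())

# Summary statistics as percentages, rounded to one decimal place.
mean_pct = round(mean(values) * 100, 1)
sd_pct = round(stdev(values) * 100, 1)      # sample standard deviation
median_pct = round(median(values) * 100, 1)

# Rank of HGDP00302 when individuals are sorted from highest to lowest.
ranked = sorted(atlantic_baltic, key=atlantic_baltic.get, reverse=True)
rank = ranked.index("HGDP00302") + 1

print(f"n={len(values)} mean={mean_pct}% sd={sd_pct}% "
      f"median={median_pct}% rank={rank}")
```

Under that column assumption, the 23 values give a mean of 1.5%, a standard deviation of 2.1%, a median of 1% and a rank of 4th for HGDP00302, which matches the figures quoted.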
David,
Glad you took a stab at this. So basically R1a is from Mesopotamia and R1b is from the Levant?
Do you see either of these groups as being the bearers of Indo-European, or both of them? As Indo-European contains many agricultural words with Semitic cognates, it certainly has to have been somewhere close to where Semitic was spoken.
What about R1b crossing North Africa or the Southern Mediterranean and reaching Iberia? AFAIK there is R1a in Egypt, but very little in Western Iran? There is some R1a in the Gulf States and Oman, so yes that's a possible source to India at an earlier time.
Interesting, so you think R-M420 and R-M343 both originated in the Fertile Crescent and their subclade variation subsequently diversified rapidly in Europe and Asia? That makes sense: blonde and red hair and blue and green eyes are still found in some Assyrians with basically no recent north European admixture, implying the northern Middle East as a parent population to the proto-Indo-Europeans. So the Fertile Crescent is a good bet for where R1a and R1b might have originated.
Do you know the R1a migration map from Anatole Klyosov: http://secher.bernard.free.fr/DNA/R1a_migrations.jpeg ?
What do you think about it? Klyosov shares with you the idea of R1a migration from west to east.
Hi,
Yes, I think R1a1a moved in a major way from Central Europe across Eastern Europe and past the Urals. This probably happened during the Chalcolithic and Bronze Age.
But I don't think R1a originated in Siberia. It most likely comes from the Middle East.