Showing posts with label Nick Patterson. Show all posts
Saturday, April 8, 2023
Dear Harald...
I've started analyzing the Identity-by-Descent (IBD) data from the recent Ringbauer et al. preprint (see here). Unfortunately, it'll take me a few weeks to do this properly, so I won't be able to write anything detailed on the topic for a while.
In the meantime, this is the comment that I left for the authors at bioRxiv (at the time of writing it's still awaiting approval, but it should appear there within a day or so, possibly along with a reply from the authors):
Hello authors,
Thanks for the interesting preprint and data. However, I'd like to see you address a couple of technical issues and perhaps one theoretical issue in the final manuscript:
- the output you posted shows some unusual results, possibly false positives, that appear to be concentrated among the shotgun and noUDG samples. I'm guessing that this is due to the same types of ancient DNA damage creating IBD-like patterns in these samples. If so, isn't there a risk that many or even most of the individuals in your analysis are affected by this problem to some degree, which might be skewing your estimates of genealogical relatedness between them?
- many individuals from groups that have experienced founder effects, such as Ashkenazi Jews, appear to be close genetic cousins, even though they're not genealogical cousins. Basically, the reason for this is reduced haplotype diversity in such populations. Have you considered the possibility that at least some of the close relationships that you're seeing between individuals and populations might be exaggerated by founder effects?
- thanks to ancient DNA, we've learned that the Yamnaya phenomenon isn't just an archeological horizon, but also a closely related and genetically very similar group of people. Indeed, in my mind, ancient DNA has helped to redefine the Yamnaya concept, with Y-chromosome haplogroup R1b-Z2103 now being one of the key traits of the Yamnaya identity. So considering that the Corded Ware people are not rich in R1b-Z2103, and even the earliest Corded Ware individuals are somewhat different from the Yamnaya people in terms of genome-wide genetic structure, it doesn't seem right to keep claiming that the Corded Ware population is derived from Yamnaya. I can't see anything in your IBD data that would preclude the idea that the Corded Ware and Yamnaya peoples were different populations derived from the same as yet unsampled pre-Yamnaya/post-Sredny Stog steppe group.
See also...
Dear Harald #2
On the origin of the Corded Ware people
Labels:
ancIBD,
ancient ancestry,
ancient DNA,
Corded Ware Culture,
CWC,
David Reich,
Eastern Europe,
haplotype,
Harald Ringbauer,
IBD,
Identity-by-Descent,
Nick Patterson,
Pontic-Caspian steppe,
Yamnaya
Monday, February 13, 2023
Dear David, Nick, Iosif...let me tell you about Yamnaya
Lazaridis, Alpaslan-Roodenberg et al. recently claimed that the Yamnaya people of the Pontic-Caspian (PC) steppe carried "substantial" ancestry from what is now Armenia or surrounds.
However, this claim is essentially false.
Only one individual associated with the Yamnaya culture shows an unambiguous signal of such ancestry. This is a female usually labeled Ukraine_Yamnaya_Ozera_o:I1917. The "o" suffix indicates that she is an outlier from the main Yamnaya genetic cluster.
Unlike I1917, typical Yamnaya individuals carry a few per cent of ancient European farmer admixture. This ancestry is only very distantly Armenian-related via Neolithic Anatolia (see here).
It's difficult for me to understand how Lazaridis, Alpaslan-Roodenberg et al. missed this. I suspect that they relied too heavily on formal statistics and overinterpreted their results.
Formal statistics are a very useful tool in ancient DNA work. Unfortunately, they're also a relatively blunt tool that often has problems distinguishing between similar sources of gene flow.
There are arguably better methods for studying fine-scale ancestry, such as Principal Component Analysis (PCA).
Below is a somewhat special PCA featuring a wide range of ancient populations that plausibly might be relevant to the genetic origins of the Yamnaya people. Unlike most PCAs that include ancient samples, this PCA doesn't rely on any sort of projection (such as projecting ancient samples onto axes computed from modern genomes), so all of the actors interact with each other and directly affect the outcome.
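The basic idea of a PCA in which every sample, ancient or modern, enters the same decomposition, rather than being projected onto precomputed axes, can be sketched in plain Python. This is a minimal illustration under loud assumptions: genotypes are coded as 0/1/2 allele counts with no missing data, there is no per-SNP normalization, LD pruning or handling of pseudo-haploid ancient calls, and the function names are hypothetical. It is not the pipeline behind the plots in this post, nor the code of the Vahaduo tool.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each marker's mean so the PCA captures differences between samples."""
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_samples for j in range(n_markers)]
    return [[row[j] - means[j] for j in range(n_markers)] for row in genotypes]


def sample_gram_matrix(centered):
    """Sample-by-sample matrix X X^T / n_markers; its eigenvectors give PC coordinates."""
    n_markers = len(centered[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / n_markers for rj in centered]
            for ri in centered]


def top_eigenvectors(sym, k, iterations=300, seed=1):
    """Leading k eigenpairs of a symmetric matrix by power iteration,
    re-orthogonalizing against the eigenvectors already found."""
    rng = random.Random(seed)
    n = len(sym)
    values, vectors = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in vectors:  # remove components along earlier PCs
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left in this direction (k exceeds the rank)
                break
            v = [a / norm for a in w]
        av = [sum(sym[i][j] * v[j] for j in range(n)) for i in range(n)]
        values.append(sum(a * b for a, b in zip(v, av)))  # Rayleigh quotient
        vectors.append(v)
    return values, vectors


def pca_all_samples(genotypes, n_components=2):
    """All samples go into one decomposition: no projection step."""
    centered = center_columns(genotypes)
    _, vectors = top_eigenvectors(sample_gram_matrix(centered), n_components)
    # The coordinates of sample i are the i-th entries of the eigenvectors.
    return [tuple(vec[i] for vec in vectors) for i in range(len(genotypes))]


# Toy genotypes (hypothetical): two pairs of samples from two diverged groups.
toy = [[2, 2, 2, 0, 0, 0],
       [2, 2, 1, 0, 0, 0],
       [0, 0, 0, 2, 2, 2],
       [0, 0, 1, 2, 2, 2]]
coords = pca_all_samples(toy, n_components=2)
```

Because every sample contributes to the axes, adding or removing a population can move everyone else, which is the trade-off of skipping projection. The pure-Python loops are fine for toy inputs; real genome-wide data calls for a proper linear-algebra library.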
Here's another version of the same plot with a less complicated labeling system. Note that I designed this PCA specifically to differentiate between European populations and those from the Armenian highlands, the Iranian plateau and surrounds.
And here's a close up of the part of the plot that shows the Yamnaya cluster. This cluster is made up of samples associated with the Afanasievo, Catacomb, Poltavka and Yamnaya cultures. All of the individuals in this part of the plot are closely related, which is why they're so tightly packed together. The differentiation between them is caused by admixture from different groups mostly from outside of the PC steppe.
The Yamnaya cluster can be broadly characterized as a population that formed along the genetic continuum between the Eneolithic groups of the Progress region and Neolithic foragers from the Dnieper River valley (Progress_Eneolithic and Ukraine_N, respectively). However, this cluster also shows a slight western shift that is even more pronounced in the Corded Ware samples. This shift is due to the aforementioned admixture from early European farmers.
Indeed, the plot reveals two parallel clines extending west from the Progress samples. One of the clines is made up of the Yamnaya cluster and the Corded Ware samples, and pulls towards the ancient European farmers. The other cline includes Ukraine_Yamnaya_Ozera_o:I1917 and pulls towards samples from the Armenian highlands and surrounds.
Being aware of these two clines and knowing how they came about is important to understanding the genetic prehistory of the PC steppe and indeed of much of Eurasia.
At some point, probably during the late Eneolithic, a Progress-related group experienced gene flow from the west and became the Yamnaya and Corded Ware populations. Sporadically, admixture from the Armenian highlands and the Iranian plateau also entered the PC steppe, giving rise to people like the Steppe Maykop outliers and Ukraine_Yamnaya_Ozera_o:I1917.
Unfortunately, this sort of PCA doesn't offer output suitable for mixture modeling, basically because the recent genetic drift shared by many of the samples creates significant noise.
However, to check that my inferences based on the plot are correct I can create composites with specific ancestry proportions to see how they behave. In the plot below Mix1 is 80% Progress_Eneolithic and 20% Iran_Hajji_Firuz_N, Mix2 is 80% Progress_Eneolithic and 20% Armenia_EBA_Kura_Araxes, while Mix3 is 80% Progress_Eneolithic, 15% Ukraine_N and 5% Hungary_MN_Vinca (Middle Neolithic farmers from the Carpathian Basin).
Obviously, we can't get Yamnaya by mixing Progress_Eneolithic with any ancients from the Armenian highlands or the Iranian plateau. On the other hand, Mix3 works quite well, at least in the first two dimensions. In some of the other dimensions genetic drift specific to Ukraine_N pulls it away from the Yamnaya cluster, but this is to be expected.
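A composite such as Mix3 can be thought of as a weighted average of population-level vectors. The sketch below assumes that each population is summarized by a mean coordinate vector and that a composite is the weighted average of those vectors, with weights summing to one. That is an assumption about how such mixes are usually built, not a description of exactly how Mix1 to Mix3 were made. Because PC scores are an affine function of the input coordinates, such a composite lands at the same weighted average of PC scores as long as the axes themselves don't change. The coordinates in the example are made-up placeholders, not real values.

```python
import math


def make_composite(sources, weights):
    """Weighted average of population mean vectors; weights must sum to 1."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, not 1")
    dims = len(next(iter(sources.values())))
    composite = [0.0] * dims
    for name, weight in weights.items():
        vector = sources[name]
        if len(vector) != dims:
            raise ValueError(f"{name} has {len(vector)} dimensions, expected {dims}")
        for i, value in enumerate(vector):
            composite[i] += weight * value
    return composite


def distance(a, b, n_dims=None):
    """Euclidean distance, optionally over only the first n_dims coordinates."""
    n = len(a) if n_dims is None else n_dims
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(n)))


# Toy 3-D coordinates, invented for illustration only: NOT real PCA values.
sources = {
    "Progress_Eneolithic": [1.0, 0.0, 0.0],
    "Ukraine_N": [0.0, 1.0, 0.0],
    "Hungary_MN_Vinca": [-1.0, 0.0, 1.0],
}
# Same proportions as Mix3 in the text: 80% / 15% / 5%.
mix3 = make_composite(sources, {"Progress_Eneolithic": 0.80,
                                "Ukraine_N": 0.15,
                                "Hungary_MN_Vinca": 0.05})
```

The `n_dims` option mirrors the point that a composite can fit well in the first two dimensions while drifting away in others.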
By the way, the plots were created with the excellent Vahaduo Custom PCA tool freely available here. It's well worth trying the interactive 3D option using my PCA data. The relevant datasheet is available here.
See also...
Dear David, Nick, Iosif...let's set the record straight
The Caucasus is a semipermeable barrier to gene flow
Friday, January 13, 2023
Dear David, Nick, Iosif...let's set the record straight
Almost a decade ago scientists at the David Reich Lab extracted DNA from the remains of three men from the Khvalynsk II cemetery at the northern end of the Pontic-Caspian (PC) steppe.
These Eneolithic Eastern Europeans showed significant genetic heterogeneity, with highly variable levels of Eastern Hunter-Gatherer (EHG) and Near Eastern-related ancestry components.
As a result, the people at the David Reich Lab concluded that the Eneolithic populations of the PC steppe formed from a relatively recent admixture between local hunter-gatherers and Near Eastern migrants.
Unfortunately, this view has since become the consensus among scientists working with ancient DNA.
I say unfortunately because there's a more straightforward and indeed obvious explanation for the genetic heterogeneity among the samples from Khvalynsk II. It's also the only correct explanation, and it doesn't involve any recent gene flow from the Near East.
Here it is, in point form, as simply as I can put it:
- EHG is best represented by samples from Karelia and Lebyazhinka, which are modern-day Russian localities in the forest zone and on the border between the steppe and the forest-steppe, respectively
- Khvalynsk II is also located on the boundary between the steppe and the forest-steppe, and very far from the Near East
- so the genetic structure of the people buried at Khvalynsk II does represent an admixture event
- however, this admixture event simply involved an EHG population from the forest-steppe and a very distantly Near Eastern-related group native to the steppe (that is, two different Eastern European populations).
I've written this blog post because I think David Reich, Nick Patterson, Iosif Lazaridis and colleagues should finally admit that they didn't quite get this right. And it'd be nice if they could put out a paper sometime soon in which they set the record straight.
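How a single two-way admixture event between two local sources can still yield individuals with highly variable ancestry levels can be illustrated with a toy least-squares estimator. It assumes the target's allele frequencies are modeled as alpha times source A plus (1 - alpha) times source B. This is a generic textbook estimator, not qpAdm and not the method used in the Khvalynsk papers. The source frequencies and the names `forest_ehg` and `steppe_local` are hypothetical.

```python
def admixture_proportion(target, source_a, source_b):
    """Least-squares alpha in: target ~ alpha * source_a + (1 - alpha) * source_b."""
    num = sum((t - b) * (a - b) for t, a, b in zip(target, source_a, source_b))
    den = sum((a - b) ** 2 for a, b in zip(source_a, source_b))
    if den == 0:
        raise ValueError("sources are identical at every SNP")
    return num / den


def mix(alpha, source_a, source_b):
    """Expected frequencies of an individual with ancestry fraction alpha from source_a."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(source_a, source_b)]


# Hypothetical allele frequencies for two Eastern European sources.
forest_ehg = [0.9, 0.1, 0.6, 0.3, 0.8]
steppe_local = [0.2, 0.7, 0.5, 0.8, 0.1]

# Individuals descended from the same admixture event, at different proportions.
true_alphas = [0.25, 0.5, 0.75]
estimates = [admixture_proportion(mix(a, forest_ehg, steppe_local), forest_ehg, steppe_local)
             for a in true_alphas]
```

With noise-free toy frequencies the estimator recovers each proportion exactly; real low-coverage genotypes would add substantial sampling noise.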
Thursday, December 23, 2021
When did Celtic languages arrive in Britain?
A new paper at Nature by Patterson et al. argues that Celtic languages spread into Britain during the Bronze Age rather than the Iron Age [LINK]. This argument is based on the observation that there was a large-scale shift in deep ancestry proportions in Britain during the Bronze Age.
In particular, the proportion of Early European Farmer (EEF) ancestry increased significantly in what is now England during the Late Bronze Age (LBA). On the other hand, the English Iron Age was a much more stable period in this respect.
I don't have any strong opinions about the spread of Celtic languages into Britain, and Patterson et al. might well be correct, but their argument is potentially flawed because:
- significant population shifts need not result in any noticeable changes in ancient ancestry proportions
- ancient ancestry proportions can shift because of cryptic population substructure, without any significant migration from afar
- large-scale population shifts need not result in language shifts, especially if they're gradual
- small-scale population shifts can result in language shifts, especially if they're sudden.
Indeed, when I plot some of the key ancient samples from the paper in my ultra-fine-scale Principal Component Analyses (PCA) of Northern and Western Europe, it appears that only the Early Iron Age (EIA) population from England overlaps significantly with a roughly contemporaneous group from nearby Celtic-speaking continental Europe. The relevant PCA data are available here and here, respectively.
See also...
Celtic vs Germanic Europe
Avalon vs Valhalla revisited
R1a vs R1b in third millennium BCE central Europe
Monday, June 28, 2021
The PIE homeland controversy: June 2021 status report
Archeologist David Anthony has made several appearances online recently to promote his theories about the origins of the Corded Ware and Yamnaya cultures and peoples.
In a clip on YouTube he reiterated his theory that the so-called Iranian-related ancestry in the Yamnaya people actually came from what is now Iran and, more precisely, that it was carried by hunter-gatherers who travelled relatively rapidly from the South Caspian region into the Volga Delta in what is now Russia.
It's still a complete mystery to me as to why a group of hunter-gatherers from the South Caspian would undertake such a migration, instead of, say, expanding their range gradually over thousands of years, first into the Caucasus and eventually into Eastern Europe.
But there's a more serious problem with Anthony's theory: it contradicts the currently available ancient DNA. That's because the so-called Iranian-related ancestry in the Yamnaya people is most closely related to the Kotias and Satsurblia hunter-gatherers from what is now Georgia, and these hunter-gatherers form a separate clade from the earliest samples from what is now Iran. For instance, see here and here.
Also, in a podcast on Razib's blog, Anthony doubled down on his theory that Y-chromosome haplogroup R1a was closely associated with Yamnaya plebs who were excluded from Kurgan burials, and, as a result, their remains haven't yet been sampled.
At least this theory isn't yet contradicted by ancient DNA, but it's more complicated and less parsimonious than my theory, which posits that R1a, or rather R1a-M417, was simply a very rare lineage in the Yamnaya population, and that it only became a common and widespread marker thanks to the Corded Ware expansion (see here).
Intriguingly, my understanding is that there are several unpublished R1a samples from the Caspian and Volga steppes at Harvard's David Reich Lab that have been classified by its scientists as Yamnaya outliers. Of course, Anthony is collaborating on at least one major paper with this lab (see here).
Ergo, I strongly suspect that Anthony's theory is in part based on these Yamnaya outliers. However, I also believe that these samples are wrongly dated and probably represent Scythians and/or Sarmatians. I'll be able to look into that if they're ever published.
Speaking of the David Reich Lab, its leading scientists, David Reich and Nick Patterson, have also made appearances online recently, on YouTube and Razib's blog, respectively, to reveal that the Corded Ware and Yamnaya peoples aren't just very similar genetically, but in fact close cousins.
This is a very interesting finding. Apparently it's based on a relatively high level of Identity-by-Descent (IBD) segment sharing between Corded Ware and Yamnaya samples, but that's all I know. I'm guessing that the relevant paper is coming soon (that is, within the next five years).
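Why does a high level of IBD segment sharing point to close genealogical ties? Under a standard approximation (Haldane map, no crossover interference), each meiosis can break up a shared segment, so segment lengths in cM are roughly exponential with mean 100/m, where m is the number of meioses separating the two haplotypes through their common ancestor. First cousins, for example, are separated by four meioses. The sketch below uses hypothetical function names and is not how any particular IBD caller, ancIBD included, infers relatedness. Since callers only report segments above a length threshold, it also uses the memoryless property: the mean length of detected segments is the threshold plus 100/m.

```python
import math


def mean_segment_length_cm(meioses):
    """Expected IBD segment length (cM) for m meioses, Haldane-map approximation."""
    if meioses < 1:
        raise ValueError("need at least one meiosis")
    return 100.0 / meioses


def prob_segment_longer_than(length_cm, meioses):
    """P(segment > length_cm) for exponential lengths with rate m / 100 per cM."""
    return math.exp(-length_cm * meioses / 100.0)


def mean_length_above(threshold_cm, meioses):
    """Mean length of segments exceeding a detection threshold (memorylessness)."""
    return threshold_cm + mean_segment_length_cm(meioses)


# First cousins (4 meioses) vs. a common ancestor about 10 generations back
# on both lines (20 meioses), with an 8 cM detection threshold.
for m in (4, 20):
    print(m, mean_segment_length_cm(m), prob_segment_longer_than(8, m))
```

Under this model many long segments point to recent shared ancestors. Populations that went through founder effects, however, can accumulate extra detectable segments from many more distant ancestors.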
However, the long-standing question that the readers of this blog want to see answered is not whether the Corded Ware and Yamnaya peoples are close cousins, but whether Yamnaya migrants founded the Corded Ware culture. The obvious way to prove that they did is to find at least one ancient population unambiguously classified as part of the Yamnaya horizon that is rich in the typically Corded Ware Y-haplogroups R1a-M417 and R1b-L151.
See also...
On the origin of the Corded Ware people
The PIE homeland controversy: January 2019 status report
The PIE homeland controversy: August 2019 status report
Labels:
ancient DNA,
Corded Ware,
CWC,
David Anthony,
David Reich,
David Reich Lab,
Eastern Europe,
Indo-European,
Iran,
Nick Patterson,
Proto-Indo-European,
R1a-M417,
R1b-L151,
R1b-L51,
Yamna,
Yamnaya
Saturday, June 27, 2020
Major updates to ADMIXTOOLS
An important message from Nick Patterson:
Dear Eurogenes bloggers,
Many of you use ADMIXTOOLS and you might like to know that there is a new release on github [LINK] with some important enhancements. From the README:
*** NEW ***
1) Version 7.0 has numerous upgrades.
a) Two new executables --qpfstats qpfmv allow precomputation of f-statistic basis. This can greatly reduce computation costs.
b) qpAdm, qpWave, qpGraph support qpfstats output as input. *** This is a much improved way of running with allsnps: YES. ***
c) A new experimental feature of qpGraph (halfscore: YES) allows comparison of 2 phylogenies + a (weak) goodness of fit score. Be careful if running with a large number of populations and consider reducing block size say blgsize: .005
2) Note that several of the new ideas implemented in version 7.0 were developed collaboratively with Robert Maier, who has implemented them along with the great majority of other ADMIXTOOLS functionality in R: See https://github.com/uqrmaie1/admixtools
Executables run fast, and it has features not available in this C version, such as interactive exploration of graph phylogenies. A manuscript describing the algorithmic ideas and providing documentation of the methods is in preparation.
qpfstats is the most important new executable. This estimates f-statistics and covariance on a basis.
a) This can be passed into other programs of the package without having to reaccess the genotype files, greatly speeding the computations.
b) In allsnps: YES mode a new computation is carried out (explained in qpfs.pdf) that is much more logical when there is a lot of missing data. Sometimes standard errors are greatly reduced.
qpfstats can be used with up to 30 populations. Much beyond that the output files become large.
As usual there may be bugs...
Nick Patterson 6/27/2020
Update 29/06/2020: As pointed out above, qpfstats is the most important new executable. Indeed, Nick Patterson now recommends that qpAdm analyses run with the allsnps: YES flag should be based on qpfstats output.
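Why can f-statistics be precomputed "on a basis" and then reused? One way to see it: f3 and f4 are linear combinations of f2, so a compact set of basic statistics determines the rest. The sketch below works on raw population allele frequencies with no finite-sample bias correction, missing-data handling or block jackknife. It is a hypothetical illustration of the identities, not the estimator that qpfstats implements (that one is described in qpfs.pdf).

```python
import random


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def f2(p, q):
    """Mean squared allele-frequency difference between populations p and q."""
    return _mean((x - y) ** 2 for x, y in zip(p, q))


def f3(c, a, b):
    """f3(C; A, B) = mean of (c - a)(c - b) across SNPs."""
    return _mean((z - x) * (z - y) for z, x, y in zip(c, a, b))


def f4(a, b, c, d):
    """f4(A, B; C, D) = mean of (a - b)(c - d) across SNPs."""
    return _mean((w - x) * (y - z) for w, x, y, z in zip(a, b, c, d))


def f3_from_f2(c, a, b):
    """Same f3, rebuilt from f2 values only."""
    return (f2(c, a) + f2(c, b) - f2(a, b)) / 2


def f4_from_f2(a, b, c, d):
    """Same f4, rebuilt from f2 values only."""
    return (f2(a, d) + f2(b, c) - f2(a, c) - f2(b, d)) / 2


# Random toy frequencies for four hypothetical populations at 1,000 SNPs.
rng = random.Random(42)
pops = {name: [rng.random() for _ in range(1000)] for name in "ABCD"}
A, B, C, D = (pops[k] for k in "ABCD")
direct_f4, basis_f4 = f4(A, B, C, D), f4_from_f2(A, B, C, D)
direct_f3, basis_f3 = f3(C, A, B), f3_from_f2(C, A, B)
```

The direct and rebuilt values agree up to floating-point rounding, so once the basic statistics and their covariance are stored, the genotype files don't need to be read again.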
Several of my recent blog posts featured qpAdm models run with the allsnps: YES flag, but they were based on genotype data because obviously I didn't know anything about qpfstats at the time. So I went back and ran some of these models again, just to make sure that they were still relevant. Below are three examples which you can compare to the original analyses here, here and here, respectively.
TUR_Arslantepe_LC_Maykop
RUS_Maykop_Novosvobodnaya 0.281±0.042
TUR_Arslantepe_LC 0.719±0.042
chisq 10.923 tail prob 0.449752
Full output

TUR_Barcin_C
RUS_Vonyuchka_En 0.137±0.031
TUR_Buyukkaya_EC 0.863±0.031
chisq 15.074 tail prob 0.0889099
Full output

UKR_N_admixed
RUS_Progress_En 0.083±0.020
UKR_N 0.917±0.020
chisq 6.825 tail prob 0.65538
Full output

As far as I can tell, they're very similar to the original runs, which is a relief, because it means that the conclusions in my blog posts still make sense.
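Each qpAdm summary pairs a chi-square statistic with its tail probability. The degrees of freedom aren't shown in these excerpts. As a sanity check, here's a minimal standard-library sketch of the chi-square upper-tail probability, using closed forms of the regularized upper incomplete gamma function for integer degrees of freedom. The df values in the usage example (11, 9 and 9) are inferred here because they reproduce the reported tail probabilities. They are not numbers taken from the qpAdm output.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{j<m} y^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df = 2n + 1: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j<n} y^(j+1/2) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for j in range((df - 1) // 2):
        total += math.exp(-y) * term
        term *= y / (j + 1.5)  # advance to y^(j+3/2) / Gamma(j + 5/2)
    return total


for chisq, df in [(10.923, 11), (15.074, 9), (6.825, 9)]:
    print(chisq, df, round(chi2_sf(chisq, df), 4))
```

The printed values should come out close to the reported 0.4498, 0.0889 and 0.6554. The term-by-term sums are fine for statistics of this size but aren't meant for very large chi-square values.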