In the debate over the location of the Proto-Indo-European Urheimat, Colin Renfrew's Anatolian hypothesis is usually mentioned as the most viable alternative to the steppe or Kurgan hypothesis. But probably not for much longer.
Below is a Principal Component Analysis (PCA) plot featuring extant Indo-European and non-Indo-European groups from West Eurasia, a couple of typical early Neolithic farmers from Central Europe, a typical Western Hunter-Gatherer, also from Central Europe, and the Iceman from the Copper Age Tyrolean Alps, again typical of his time and place.*
It's just a taste of the ancient genomic data we have available from prehistoric Europe, but it has almost everything that is pertinent to the issue at hand.
You don't need to be familiar with PCA methodology to be able to read the plot. Basically, it shows that the present-day European population structure is the result of two main events:
- the arrival of early farmers from Anatolia during the Neolithic transition, which eventually caused the extinction of people like the Western Hunter-Gatherer, who is the most obvious outlier on the plot
- the expansion of Kurgan groups such as the Yamnaya, which led to the formation of the Corded Ware horizon across much of Europe and shifted the genetic structure of almost all Europeans to the east, away from the Neolithic and Copper Age samples.
These were massive population turnovers, and, as a rule, massive population turnovers are accompanied by language change. So it's highly unlikely that any Europeans today speak languages derived from those of the Western Hunter-Gatherers or the early Neolithic farmers of Central Europe (i.e., according to Renfrew, the ancestors of the Celts, Germanic peoples and other Indo-Europeans). Moreover, consider this:
- most present-day Indo-European-speaking Europeans form an elongated cluster between the Neolithic farmers and the Corded Ware sample, pointing to the steppe-derived Corded Ware Culture as the proximate agent of the Indo-European expansion in much of Europe
- the only present-day Europeans who closely resemble Neolithic farmers are some Sardinians (the small Romance cluster just above the two Neolithic samples), but Sardinians spoke Paleo-Sardinian or Nuragic languages until they adopted Indo-European speech, in the form of Latin, from the Romans (see page 118 here).
Also, although this isn't shown on the plot, the dominant Y-chromosome haplogroup of early Neolithic farmers is G2a, which is a low-frequency marker in Europe today. The two most common Y-chromosome haplogroups among present-day Europeans are R-M198 and R-M269, which are also typical of Corded Ware and Yamnaya males, respectively, and probably originally from the steppe.
So is there any way to rework the Anatolian hypothesis so that it can be salvaged? I doubt it. Even making the steppe a homeland for all of the main Indo-European branches apart from Anatolian and Armenian probably won't help.
It is true that the Yamnaya nomads carried Near Eastern-related ancestry which may represent Proto-Indo-European admixture from outside of the steppe. But there's no evidence that it came from Anatolia.
In fact, if Neolithic Anatolians were basically identical to early Neolithic European farmers, which seems to be the case (see here and here), then it's unlikely that it did, because the latter carried a peculiar genome-wide signal that is missing in Yamnaya genomes (orange cluster in the ADMIXTURE bar graph below).** Heck, even the early Corded Ware genomes from Germany barely show any of it.
Also worth a look is The Indo-European Controversy: Facts and Fallacies in Historical Linguistics, by Asya Pereltsvaig and Martin W. Lewis. I haven't read it yet, so I'd welcome opinions from those who have. I did, however, read many of the online articles on which the book is based, and as far as I know most of them are still available here and here.
*Another version of the same PCA, with the samples labeled individually, is available here. All possible combinations of dimensions 1 to 4 are shown here. The samples are listed here. All of the samples are from Haak et al. and Allentoft et al. The PCA was run using ~56K high-confidence SNPs, listed here.
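For readers curious about what a PCA like this does under the hood, here is a minimal pure-Python sketch. It assumes genotypes coded as 0/1/2 allele counts with no missing calls, centers each SNP, builds a sample-by-sample covariance matrix and pulls out the first principal component by power iteration. It is only an illustration: the post doesn't say which software produced the plot, and tools such as smartpca (EIGENSOFT) or PLINK typically also normalize each SNP and handle missing data, which this sketch skips. The function name and toy genotypes are hypothetical.

```python
import math
import random


def pc1_scores(genotypes, iterations=200, seed=0):
    """Return each sample's coordinate on the first principal component.

    genotypes: one list per sample of 0/1/2 allele counts, one entry per SNP,
    assumed complete (no missing calls). The result is a unit-length vector,
    so its sign and scale are arbitrary, as PCA coordinates always are.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Center every SNP on its mean allele count across samples.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample covariance (Gram) matrix, averaged over SNPs.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k])) / m
             for k in range(n)] for i in range(n)]
    # Power iteration: repeated multiplication converges on the leading
    # eigenvector, whose entries are proportional to the PC1 scores.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variation at all in the data
            break
        v = [x / norm for x in w]
    return v


# Toy data: two made-up "populations" of two samples each, four SNPs.
toy = [
    [0, 0, 0, 2],
    [0, 1, 0, 2],
    [2, 2, 2, 0],
    [2, 1, 2, 0],
]
print(pc1_scores(toy))  # the two pairs land on opposite sides of zero
```

Samples that share ancestry end up with similar coordinates, which is why mixed populations tend to fall on clines between their sources on plots like the one above.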
The Corded Ware sample is a composite of Corded Ware sequences from Germany, Scandinavia, Estonia and Poland. The Yamnaya sample is a composite of Yamnaya sequences from the Kalmykia and Samara regions of Russia.
I chose to use these composites instead of individual sequences because I didn't want to run any samples with genotype rates of less than 98%.
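The 98% threshold is a simple per-sample filter on genotyping rate. The sketch below shows that filter, plus one hypothetical way of pooling low-coverage individuals into a composite by taking the first available call at each SNP. The post doesn't say how its composites were actually built, so treat the pooling rule, the function names and the toy data as assumptions. On real data, PLINK's `--mind 0.02` (exclude samples with more than 2% missing calls) is the equivalent of the 98% cut.

```python
def call_rate(calls):
    """Fraction of SNPs with a genotype call; None marks a missing call."""
    return sum(1 for g in calls if g is not None) / len(calls)


def composite(individuals):
    """Hypothetical pooling: at each SNP keep the first non-missing call."""
    return [next((g for g in snp if g is not None), None)
            for snp in zip(*individuals)]


def passing_samples(samples, min_rate=0.98):
    """Keep only samples (name -> calls) genotyped at >= min_rate of SNPs."""
    return {name: calls for name, calls in samples.items()
            if call_rate(calls) >= min_rate}


# Two made-up low-coverage individuals; each fails the 98% cut on its own.
ind_a = [0, None, 2, 1] * 25   # 75% of SNPs called
ind_b = [0, 1, None, 1] * 25   # 75% of SNPs called
pooled = composite([ind_a, ind_b])  # every SNP now called at least once
print(call_rate(ind_a), call_rate(pooled))  # 0.75 1.0
print(sorted(passing_samples({"a": ind_a, "b": ind_b, "composite": pooled})))
# ['composite']
```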
** For a more detailed ADMIXTURE analysis comparing early Neolithic farmers to Yamnaya, refer to Haak et al., Supplementary Information 6. Note the minimal sharing of components between the early Neolithic farmers and Yamnaya at higher values of K, especially at K=16, which has the lowest median cross-validation (CV) error. This is in agreement with the PCA above.
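Choosing K by the lowest median CV error comes down to grouping the CV errors from repeated ADMIXTURE runs by K and taking the minimum of the medians. This is a minimal sketch, assuming the logs come from runs with ADMIXTURE's `--cv` flag and contain lines of the form `CV error (K=2): 0.52537`. The function names are hypothetical, and the log lines in the example are made-up values, not figures from Haak et al.

```python
import re
from statistics import median

# Assumed log format, as printed by ADMIXTURE when run with --cv:
#   CV error (K=2): 0.52537
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def collect_cv_errors(log_lines):
    """Group CV errors by K across the logs of repeated runs."""
    errors = {}
    for line in log_lines:
        match = CV_LINE.search(line)
        if match:
            k, err = int(match.group(1)), float(match.group(2))
            errors.setdefault(k, []).append(err)
    return errors


def best_k(cv_errors):
    """Return the K whose runs have the lowest median CV error."""
    return min(cv_errors, key=lambda k: median(cv_errors[k]))


# Made-up log lines from three runs each at K=2..4 (illustrative values only).
logs = [
    "CV error (K=2): 0.540", "CV error (K=2): 0.542", "CV error (K=2): 0.541",
    "CV error (K=3): 0.521", "CV error (K=3): 0.525", "CV error (K=3): 0.519",
    "CV error (K=4): 0.523", "CV error (K=4): 0.522", "CV error (K=4): 0.530",
]
print(best_k(collect_cv_errors(logs)))  # prints 3
```

Using the median rather than a single run makes the choice less sensitive to an occasional run that lands in a poor local optimum.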
See also: Population genomics of Early Bronze Age Europe in three simple graphs