
Friday, February 2, 2018

Early Baltic Corded Ware form a genetic clade with Yamnaya, but...

This is what Mittnik et al. 2018 say about a couple of their Corded Ware, or Baltic Late Neolithic (Baltic_LN), samples from what is now Lithuania:

Computing D-statistics for each individual of the form D(Baltic LN, Yamnaya; X, Mbuti), we find that the two individuals from the early phase of the LN (Plinkaigalis242 and Gyvakarai1, dating to ca. 3200–2600 calBCE) form a clade with Yamnaya (Supplementary Table 7), consistent with the absence of the farmer-associated component in ADMIXTURE (Fig. 2b). Younger individuals share more alleles with Anatolian and European farmers (Supplementary Table 7) as also observed in contemporaneous Central European CWC individuals [2].
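As a rough illustration of what a statistic of the form D(Baltic LN, Yamnaya; X, Mbuti) measures, here is a minimal Python sketch using the usual allele-frequency form of the D-statistic. The frequencies are invented toy values, not data from the paper, and the function name `d_statistic` and the population variables are hypothetical. Real analyses use hundreds of thousands of SNPs and a block jackknife to get standard errors and Z-scores, which this sketch leaves out.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (toy version, no jackknife)."""
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in zip(w, x, y, z))
    return num / den

# Toy allele frequencies at four SNPs (invented for illustration only).
yamnaya = [0.2, 0.5, 0.8, 0.4]
farmer = [0.6, 0.1, 0.9, 0.7]    # stands in for population X
outgroup = [0.1, 0.3, 0.5, 0.2]  # stands in for Mbuti

# An "early" individual identical to Yamnaya, and a "late" one
# modelled as 80% Yamnaya plus 20% farmer.
early_ln = list(yamnaya)
late_ln = [0.8 * a + 0.2 * f for a, f in zip(yamnaya, farmer)]

d_early = d_statistic(early_ln, yamnaya, farmer, outgroup)
d_late = d_statistic(late_ln, yamnaya, farmer, outgroup)
print(d_early, d_late)
```

In this toy setup a value at zero is what "forming a clade with Yamnaya" looks like, while a positive value corresponds to an individual sharing more alleles with X than Yamnaya does.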

We can add a third early Baltic Corded Ware sample, Latvia_LN1, to this list, because this individual was also shown to lack the above mentioned farmer-associated component in ADMIXTURE by Jones et al. 2017.

However, in my Principal Component Analysis (PCA) of ancient West Eurasia, all three samples fall just "northwest" of Yamnaya, along with one German Corded Ware outlier, and form a separate cluster that is shifted slightly closer to European hunter-gatherers and farmers. Hence, Plinkaigalis242 and Gyvakarai1 only form a clade with Yamnaya to the limit of the resolution in the analysis by Mittnik et al., but aren't exactly identical to Yamnaya. The relevant datasheet is available here.

So what might this mean? Possibly that the ancestors of this Corded Ware trio "absorbed" trace forager and/or farmer admixture as they migrated from the Pontic-Caspian steppe to the East Baltic. Or it could mean that they came from a more westerly part of the Pontic-Caspian steppe where people harbored slightly elevated forager and/or farmer ancestry relative to Yamnaya.

More sampling of Eneolithic and Early Bronze Age (EBA) burial sites on the Pontic-Caspian steppe, particularly north of the Black Sea, will probably solve this mystery. Please note, however, that we already have an Eneolithic sample from the Pontic-Caspian steppe that not only packs extra farmer admixture over Yamnaya, but also belongs to Y-haplogroup R1a-M417, which is a marker intimately associated with the Corded Ware expansion (see here).

By the way, this is how the Corded Ware set from Mittnik et al. behaves in another of my PCAs, which is designed to focus on ethno-linguistic-specific genetic drift in Northern Europe. I don't usually run samples older than the Bronze Age in this analysis, the reason being that they often don't share enough genetic drift with modern-day Europeans to produce meaningful output. And to be honest, I'm not quite sure what to make of these results. But it's probably not a coincidence that the Scandinavian Corded Ware (CWC_Battle_Axe) individual clusters so strongly with the Nordic Iron Age and modern-day Scandinavian samples. The relevant datasheet is here.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Modern-day Poles vs Bronze Age peoples of the East Baltic

The genetic history of Northern Europe (or rather the South Baltic)


Matt said...

Interesting stuff; it looks like on the Northern European PCA, the Spiginas2 CWC Baltic sample is distinct in sitting with the Baltic BA, while the CWC_Early overlap with Yamnaya and then the other CWC Baltic are pretty "typical" folk who sit with the CWC Germany and Steppe_MLBA:

This also looks to be the case in the West Eurasia PCA as well:

As they say in the paper: "The individual Spiginas2, dated to a very late period of the LN (2130–1750 calBCE), stands out in that it shares an excess of alleles with European forager groups when compared to the Yamnaya populations, with the top hits being Switzerland_HG, WHG, Baltic Mesolithic and Baltic EMN Narva (Supplementary Table 7)."

So perhaps this is kind of like the earliest sample of the Baltic_BA proper, who are sampled later in history and who, along with the Welzin / Tollense BA and Hungary_BA samples (which both collectively have more Central European affinities), begin to demonstrate affinities to Slavic populations. (The so far unsampled Trzciniec will, as some of your commentators like Arza have said, probably start to fill this out more, and we will likely see that this culture contains individuals with similar affinities.)

Ryan said...

"And to be honest, I'm not quite sure what to make of these results."

Are the two on the right just pre-WHG admixture and the one on the left post? (With WHG ancestry continuing to increase even into the Bronze Age?)

Davidski said...


WHG admixture can't be the cause of the shift to the left, because there are many samples with higher WHG sitting right of those with lower WHG.

For instance, the early Baltic CWC have less WHG than many ancient and modern samples sitting to the right of them.

Arza said...

The difference between two new CWC_early samples is... interesting:

CWC_Baltic_early:Plinkaigalis242 83.9
Sunghir:SIV 16.1
distance % = 0.468 / distance= 0.00468

Initial distance between both samples: 0.011221
Artefact or something real?

Matt said...

@Davidski: WHG admixture can't be the cause of the shift to the left, because there are many samples with higher WHG sitting right of those with lower WHG.

Though the example of Spiginas2 seems to me to indicate that, although the WHG clade in general probably doesn't have much to do with position on the North European-specific PCA, admixture specifically from the Narva culture, or whichever HG contributed to Spiginas2, probably does.

If that wasn't the case, and it was due to later drift, I would guess that we'd see that Spiginas2 would be sort of central (near Yamnaya), and then only populations that pick up later drift would be near the Baltic-Slavic populations. But instead Spiginas2 seems to be close here to the Baltic BA *only* from forager admixture.

(In a way it seems more likely that Baltic-Slavic populations got their unique genetic drift from slightly different admixture, as I don't think models have ever shown a population bottleneck? E.g. the general populations look about as heterozygous and genetically diverse as other Europeans, not like they have gone through a small Ashkenazi or even Orcadian style bottleneck?)

Of course, it's possible that Spiginas2 had already gone through a population bottleneck, but that seems a bit less parsimonious, and I think we can safely say that a late population bottleneck and expansion after 0 AD was not important.

Alberto said...

Thanks for adding these samples to Global 10. Results are interesting. Doing a simple 4 way mixture (WHG-AN-EHG-CHG) all for averages per population:

Yamnaya_Samara: 12.8 - 0 - 42.2 - 45
CWC_Baltic_Early: 29 - 0 - 27.8 - 43.2
CWC_Baltic: 39 - 12.2 - 13.4 - 35.4
CWC_Germany: 27.2 - 14 - 24.2 - 34.6
Baltic_BA: 58.2 - 5 - 4.8 - 32
Lithuanian: 42.2 - 20.8 - 13.8 - 23.2

So the early CWC_Baltic samples are quite Yamnaya-like, but they seem to have input from a more western HG than Yamnaya. If proto-CWC formed somewhere around the lower Don and proto-Yamnaya somewhere around the lower Volga, that could explain it.

Comparing CWC_Baltic (not early, but later samples) with CWC_Germany, they seem a bit different in the same way, as if CWC_Baltic derives from the western type of Yamnaya-like population, but the CWC_Germany came from a more eastern (Yamnaya proper-like) population.

More perplexing are the results for the Baltic_BA. Usually I would think it's a quality issue, but there are enough samples giving similar results. Very high WHG and CHG, very low AN and EHG. Strange results.
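For readers unfamiliar with this kind of mixture modelling on PCA coordinates, the general idea can be sketched in a few lines of Python. This is not the nMonte script itself (that is an R program, and its details are not given in this thread); it is a toy Monte Carlo fit under simple assumptions: the target is approximated by the average of a fixed number of "slots", each holding one source population, and random swaps are kept only when they reduce the Euclidean distance. The function names, the three sources and all coordinates are hypothetical.

```python
import math
import random

def euclid(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def fit_mixture(target, sources, slots=100, cycles=20000, seed=1):
    """Toy Monte Carlo mixture fit; returns (percent per source, final distance)."""
    rng = random.Random(seed)
    names = sorted(sources)
    dims = range(len(target))
    # Start from a random assignment of sources to slots.
    assign = [rng.choice(names) for _ in range(slots)]
    total = [sum(sources[n][d] for n in assign) for d in dims]
    best = euclid([t / slots for t in total], target)
    for _ in range(cycles):
        i = rng.randrange(slots)
        new = rng.choice(names)
        old = assign[i]
        if new == old:
            continue
        # Coordinates of the mixture if slot i switched to the new source.
        cand = [total[d] - sources[old][d] + sources[new][d] for d in dims]
        dist = euclid([c / slots for c in cand], target)
        if dist < best:  # keep the swap only if the fit improves
            assign[i], total, best = new, cand, dist
    shares = {n: 100.0 * assign.count(n) / slots for n in names}
    return shares, best

# Invented 2D coordinates: the target sits halfway between A and B.
sources = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}
target = (0.5, 0.0)
shares, dist = fit_mixture(target, sources)
print(shares, round(dist, 6))
```

A fit like this only says which combination of the supplied sources lands closest to the target in the PCA space; it does not by itself show that those sources were the real admixing populations.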

Here a spreadsheet with individual results:

A bit off topic, but somehow related and probably interesting for many here, a nice video with maps of European cultures/Empires/Nations... each year, from 400 BCE to 2017 AD, with population estimates for each culture.

Arza said...

Re: origin of Spiginas2 and Baltic BA

In the preprint in supplementary information (page 8) some qpWave models were shown.

Test                 | Sources                    | p-values             | Result
Baltic_BA            | Baltic_LN                  | 1.96E-11             | rejected
Baltic_BA            | Baltic_LN, Narva           | 1.18E-255, 1.14E-02  | rejected, rejected
Baltic_BA            | Baltic_LN, CordedWare_Central | 1.63E-13, 2.03E-02 | rejected, rejected

Baltic_LN_Spiginas2  | CordedWare_Central         | 8.08E-07             | rejected
Baltic_LN_Spiginas2  | Steppe_EMBA                | 4.68E-16             | rejected
Baltic_LN_Spiginas2  | Narva                      | 8.44E-68             | rejected
Baltic_LN_Spiginas2  | CordedWare_Central, Narva  | 1.38E-242, 1.80E-01  | rejected, not rejected

It's strange that it disappeared from the final version and that they didn't try to model Baltic BA the same way as Spiginas2.

Also, they've included Turlojiske4/RISE598 in the Baltic BA cluster, even though he is strikingly different, with zero or close to zero Narva admixture. This also could've messed up the stats (or not?).

Here is a map of the Trzciniec horizon (Trzciniec proper, East Trzciniec, Komarov and Sosnica) with overlaid maps of archaeological sites and the Narva Culture range from "Farming in Estonia":

Alberto said...

Taking another look at the Baltic_BA, I found in the paper the f3 stats for signals of admixture. Curiously the lowest Z score is between Baltic_EMN_Narva and Iran_ChL, followed by Baltic_Mesolithic and Levant_BA. So I added those and many other samples (including the preceding 2 higher coverage CWC_Baltic) to the source pops and ran them against the 2 high coverage Baltic_BA samples:

Narva_Lithuania:Kretuonas4 62.85 %
Armenia_EBA:I1658 23.7 %
CWC_Baltic_early:Gyvakarai1 12.85 %
Barcin_N:I0707 0.6 %
Loschbour:Loschbour 0 %
EHG:I0061 0 %
Kotias:KK1 0 %
Levant_BA:I1705 0 %
Baltic_HG:Spiginas4 0 %
Iran_ChL:I1661 0 %
Germany_MN_average 0 %
ALPc_MN 0 %
Iberia_ChL 0 %
Iberia_EN 0 %
Iberia_MN 0 %
Koros_HG:I1507 0 %
LBK_EN 0 %
CWC_Baltic:Spiginas2 0 %

Distance 0.002408

Narva_Lithuania:Kretuonas4 69.25 %
Armenia_EBA:I1658 16.55 %
Kotias:KK1 9.2 %
CWC_Baltic_early:Gyvakarai1 5 %
Loschbour:Loschbour 0 %
Barcin_N:I0707 0 %
EHG:I0061 0 %
Levant_BA:I1705 0 %
Baltic_HG:Spiginas4 0 %
Iran_ChL:I1661 0 %
Germany_MN_average 0 %
ALPc_MN 0 %
Iberia_ChL 0 %
Iberia_EN 0 %
Iberia_MN 0 %
Koros_HG:I1507 0 %
LBK_EN 0 %
CWC_Baltic:Spiginas2 0 %

Distance 0.002987

So, strangely, it does seem like they're a mix of local HG with some CHG-rich population close to Armenia_EBA, with little continuity with the preceding CWC. No idea if it's some artefact or what it means.

Matt said...

Re: lower f3 admixture statistics with Iran_CHL with Narva, those will be a combination of:

a) The offset of Levant farmer-related ancestry into Iran_CHL, compared to CHG. These are the same reasons Yamnaya got those low admixture f3 stats with Iran_CHL, though it's unlikely to have actual Iran_CHL rather than some other Western farmer-related ancestry. In this case, though, Baltic_BA is likely to have even more Western farmer-related ancestry...

b) Admixture f3 is a test of how strongly the third population (e.g. Baltic_BA) violates forming a clade with / coalescing with either one of the other two populations (e.g. Narva, Iran_CHL).

This will be strongly weighted towards populations that have a balance of being extreme on trees, and most clearly form a strong divergent tree phylogeny, even if they are not the actual admixing populations (Yamnaya would not form such a divergent tree phylogeny with Narva so its f3 admixture statistics would not be as strong).

(Worth reading, though I find it hard to take in, you can get this out of that, if I'm not wrong)

If we took models a la Lazaridis 2017 and simulated a Baltic_BA via triples of either A: (Yamnaya, Anatolia_EN, Narva) or B: (Iran_CHL, Anatolia_EN, Narva), we'd pretty quickly find that (A,B;Real_Baltic_BA,outgroup) were always positive (that is, more drift is shared between the best fitting triple A and Real_Baltic_BA than between the best fitting triple B and Real_Baltic_BA).

(We'd also find that simple f2 branch length indicates the simulated A were always closer to Real Baltic_BA than B, and Fst too!).

tl;dr most negative f3 is not the best measure of the best real admixing populations...
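The point about admixture f3 can be illustrated with a small Python sketch. It uses the simple allele-frequency form f3(C; A, B) = mean over SNPs of (c - a)(c - b), without the sample-size correction or standard errors that real software applies, and the frequencies and population names are invented for illustration. The target is built as an exact 50/50 mix of `source_a` and `source_b`; `proxy_a` is a hypothetical population that is more divergent from `source_b` than the real source is.

```python
def f3(c, a, b):
    """Uncorrected f3(C; A, B): mean over SNPs of (c - a) * (c - b)."""
    return sum((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b)) / len(c)

# Toy allele frequencies at four SNPs (invented for illustration only).
source_a = [0.5, 0.6, 0.5, 0.6]   # a true admixing population
source_b = [0.1, 0.2, 0.3, 0.4]   # the other true admixing population
proxy_a = [0.7, 0.8, 0.6, 0.7]    # more divergent than source_a, not a real source

# The target is an exact 50/50 mixture of the two true sources.
target = [0.5 * a + 0.5 * b for a, b in zip(source_a, source_b)]

f3_true = f3(target, source_a, source_b)
f3_proxy = f3(target, proxy_a, source_b)
print(f3_true, f3_proxy)
```

Both values come out negative, but the pair with the more divergent proxy gives the more negative one, even though the proxy did not contribute to the target in this toy setup.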

Alberto said...


Yes, indeed that's correct and I didn't mean those f3 stats as proof of anything. I was trying to find D-stats that could be helpful, but I didn't (and it's not so easy due to high WHG increasing affinity to EHG, etc...). And came across those f3 stats that looked worth testing with real models given the previous results.

I do find the results strange, but can't think of a specific reason why they should be wrong (technically speaking), so it's maybe worth further testing with other methods (though as always, it's only more sampling that can give us better answers...).

Rob said...

Lol Arza
Where'd you find that map? I made it ~20 years ago

Arza said...

This one? :)

Alberto said...

OK, so I kept adding populations until something that makes sense came out:

CWC_Germany 47.95 %
Narva_Lithuania:Kretuonas4 43.1 %
Salzmuende_MN 4.85 %
Kotias:KK1 2.4 %
Barcin_N:I0707 1.7 %

CWC_Germany 53.6 %
Narva_Lithuania:Kretuonas4 42.95 %
Salzmuende_MN 3.3 %
Koros_HG:I1507 0.15 %

A bit counter intuitive given the first results, but certainly more parsimonious than the other models.

Samuel Andrews said...

I might be the only one who cares. But one Baltic BA individual, Kivutkalns 42, was a redhead. And Kivutkalns 209 may have been a redhead.

Kivutkalns 42
rs1805008 TT

Kivutkalns 209
rs1805008 CT
rs1110400 AC

The German BA individual with R1a-Z280 had red hair. Also, one of the Sintashta individuals carried rs1805008 CT. It looks like Bronze Age R1a-M417 groups carried the red hair mutation at appreciable frequencies.

Ric Hern said...

@ Samuel Andrews

Interesting, thanks. I wonder if the oldest Yamnaya-like individual in that area dates to exactly 3200 calBCE? Would this mean that Corded Ware spread a few hundred years earlier than previously thought (2900 calBCE)?

Matt said...

@Alberto, sure, my comment post up there was mainly me trying to think aloud through the theory for my own benefit about why the lowest f3 admixture stat doesn't imply best fitting admixture model, or a better fitting 2pop admixture model than two populations that don't get such a low f3 admixture stat.

(low f3 admixture stat comes about through a mix of factors that don't actually imply the two pops with the lowest stat are the best fitting real two populations! this is probably underappreciated in most papers that use the methodology, I'd say.)

I don't find the results on the admixture f3 so strange at all, for the reasons in my post. As for the nMonte, Ger has discussed why long distance population mixtures are often favoured in nMonte over close matches, and tried to include a fix in nMonte3 through distance penalization (as a means of regularization). But this has the effect that it can also cause problems for real long distance admixture, so it can only be used with human judgment (which gets us back to questions of our preferences, which we're trying to keep from having any influence, and so on!)... I think this stuff will be solved, and it's mainly a question of getting higher data quality out of ancients, to distinguish effectively between scenarios with relatively fine divergences in the grand scale.

Alberto said...


"I think this stuff will be solved and its mainly a question of getting higher data quality out of ancients, to distinguish effectively between scenarios with relatively fine divergences in the grand scale."

Yes, in the end this is the only way of knowing with certainty. All other methods are approximations that sometimes can be quite accurate and others not so much (notable is the case of SC Asia in this regard, where the lack of samples forces us into very speculative terrain).

Anyway, I'm now quite sure that my initial findings were a strange artefact of some kind. Further testing has made me settle for this model for the Baltic_BA samples:

CWC_Germany 52.4 %
Narva_Lithuania 34 %
Hungary_BA 13.6 %

Distance 0.001912

Maybe we also need to give more consideration to the distances, though that can only be estimated when you're using the same dataset for a long time. In the case of Global 10 (undoing the normalization by reintroducing the eigenvalue data - if this way of expressing it finally helps everyone), it might be good to consider as rejected something like > 0.0025, while good fits should be below 0.002 and excellent ones below 0.0015 (though I'd need to run many more models paying attention to it to really know how to judge the distance - also considering that some models are going to be inherently worse than others due to their intrinsic nature, so the figures should not be fixed for every kind of model).

Rob said...

Matt / Alberto

I had begun dropping hunter-gatherers because they produced overly inflated WHG levels. Is that only an issue in analysing moderns?

Joukowski Transform said...

Samuel Andrews, the rs1805008/R160W variant you mention is still prevalent in those areas around the Baltic region and Germany.

Anthro Survey said...


I generally consider rejecting fits if distance is >.007. This tends to be slightly greater than the average distance between a given average and corresponding individual samples across the (Global10) board. So, it's kind of a ballpark principle, which I also use when modeling using the 20 PCA World dataset (don't remember the distance range there off the top of my head).

Though, I think the quality of the fit should be evaluated on a case-by-case basis. For example, using one approach and an alpha of 5%, it comes out that fit distances greater than ~.007 should be rejected for Tisza and ~.004 for Unetice on Global10.

Matt said...

@Alberto & Anthro Survey: Think I agree with Anthro Survey that we should definitely be considering nMonte distance normalized based on intra-population squared Euclidean distances in the other populations in the panel.

Otherwise they don't mean anything (e.g. it would be a mistake to say that a model based on one PCA1 is good because it has distance X, while a model based on another PCA2 with distance 10*X is bad, if distances in PCA2 are just typically 10x PCA1 in every instance).
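One way to picture this kind of normalization is the Python sketch below. It is only an illustration under stated assumptions: the fit distance is divided by the average distance between individuals and their own population mean (plain Euclidean distance here, rather than the squared distances mentioned above), and the function names, populations and coordinates are all invented. The second toy PCA is simply the first one with every coordinate multiplied by 10.

```python
import math

def euclid(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def centroid(points):
    """Mean coordinate vector of a list of individuals."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def mean_spread(points):
    """Average distance from individuals to their population mean."""
    c = centroid(points)
    return sum(euclid(p, c) for p in points) / len(points)

def normalized_fit(fit_distance, reference_pops):
    """Fit distance expressed in units of typical within-population spread."""
    spreads = [mean_spread(pts) for pts in reference_pops.values()]
    return fit_distance / (sum(spreads) / len(spreads))

# Toy PCA 1 (invented coordinates), and toy PCA 2 = the same data scaled by 10.
pca1 = {
    "PopA": [(0.010, 0.000), (0.014, 0.000)],
    "PopB": [(0.000, 0.020), (0.000, 0.026)],
}
pca2 = {name: [tuple(10 * v for v in p) for p in pts] for name, pts in pca1.items()}

raw1, raw2 = 0.002, 0.02  # the same model's raw fit distance in each PCA
norm1 = normalized_fit(raw1, pca1)
norm2 = normalized_fit(raw2, pca2)
print(norm1, norm2)
```

The raw distances differ tenfold, but the normalized values agree, which is the sense in which a raw distance threshold only means something within one particular PCA.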

I don't have anything to add to AS's comment about what a specific good distance should be tho, that said.

@Rob, I'm not sure. Unfortunately, it depends on the PCA, and how accurately the PCA is representing distances between HG and moderns or other ancients. Generally if ancients are overlapping moderns and the PCA is underdescribing differences between moderns-HG, then it should be a problem for overlapping ancients as well?

Alberto said...


Yes, in no way can we compare distances from different PCAs. I meant that when using one that becomes more or less standard (as Global 10 is here, because Davidski is constantly updating it with new samples and it's what most of us use for modelling), maybe having a ballpark of what's acceptable or not could help (just mostly thinking aloud based on the problem I saw with these Baltic_BA samples, where adding many populations, including CWC_Baltic and Germany_MN, produced a weird model preferring Armenia_EBA and Kotias with local HGs, whereas adding CWC_Germany and Salzmuende_MN "fixed" it, for no apparent reason).

@Anthro Survey

If I remember correctly, you're using the Global 10 datasheet as provided by Davidski, without any (un)scaling? Because in that case, the distances are going to be much larger than in the one with eigenvalue (un)scaling. A distance of 0.007 in the latter version is really poor fit (you can compare distances by running the same models I did above and checking the ones you get).

But otherwise, yes, I agree that mostly it should be on a case-by-case basis. Trying to model Lithuanians based on WHG-AN-EHG-CHG is not the same as doing it using CWC, Hungary_BA, etc...


I think it's probably a good practice whenever possible. That is, when we have enough populations that are less distant (in time, mostly) than HGs. In this specific case of Baltic_BA, it would not be possible to model them without HGs, because they have more HG ancestry than any other population. It also depends on the purpose of the model (my first model based on WHG-AN-EHG-CHG was meant to get the ballpark of each component, rather than to find the sources of real admixture. And usually that 4 way model is more or less balanced to at least get an idea - though as I've learned from this case, it can fail quite miserably in a few cases).

Davidski said...


There's a Global 25 coming this week. Hopefully it'll be more accurate than the Global 10 in several aspects, including estimating forager ancestry proportions. But this is a tricky area, so not easy to get right.

Samuel Andrews said...


Any news on when Olalde and Mathieson will finally be published? I have a bad feeling it won't be till next fall because it took Mittnik over 1 year to publish.

Seinundzeit said...


Sounds very exciting (looking forward to it)!

Will you be using a PCA software that can output dimensions with eigenvalue scaling?

Anthro Survey said...


That's correct--I work with the raw coordinates. So, when I ran your last Baltic_BA model, got d of 0.005. So, indeed, I would imagine .007 would be a pretty poor fit with scaling.
For the Baltic_BA average, I got a d of 0.0056 (alpha of 5%) as a rejection threshold. So, I think your model isn't far from the truth.

Alberto said...


Good news, looking forward to it.

I agree with Seinundzeit that if it's possible to output the data for each dimension keeping the variance according to their respective eigenvalues, it would be quite good, so we all use the same values (in Global 10 it was important for some corner cases to apply the correction, but I expect that with 25 dimensions it will be a must to have the dimensions' variance in accordance with their eigenvalues, to keep those very high dimensions from messing up the main information in the lower ones).

(It's easy enough to do this after, but it's mostly to avoid having 2 datasets being used by different users giving different results).

Matt said...

Re:Global_25, nice news, be really interested to see what this covers that Global10 didn't, and how they differ.

With the Martiniano Ancient67 World PCA at 20 dimensions, that run showed some extra differentiation between West Eurasian populations that didn't really show up as well in Global10. I thought that was mainly the product of the lack of projection, but it would be cool if it could show up just due to higher dimensionality in Global25 (though I imagine if the dataset is more diverse on world populations, likely some of the extra dimensionality will relate to other distinctions).

I'm hoping this can provide some of the benefits of the Ancient67 in discerning finer structure, without losing the high ancient sample coverage of Global10.

Re:scaling (or whatever we're calling it), my impression is that this is most important to distinguish between genuinely distant and genuinely close populations. Like, if you have one dimension which packs almost all the differentiation between Africans and Eurasians, then nineteen dimensions splitting Eurasians apart with Africans at 0, then it's quite possible for relatively close populations to be more different than far ones, if they're all treated at 0.

In practice I think this doesn't always happen as much because of how dimensions are constituted, with similar populations tending to be closer in most of them (e.g. I believe in Global10, Europeans are at the same position in most dimensions distinguishing intra-East Asian, intra-African, etc. differentiation). But certainly this seems to be why nMonte fits can find inflated African / Asian etc in West Eurasian "unscaled" models (likely vice versa if modeling African / Asian populations), and as Alberto notes, would probably increase as a problem where you have high dimensions splitting apart local populations.
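The effect described here can be shown with a two-dimensional toy example in Python. It assumes that "scaling" means multiplying each dimension by the square root of its eigenvalue; the thread does not spell out the exact operation, so treat that as an assumption. The eigenvalues, population names and coordinates are invented.

```python
import math

def euclid(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def scale(point, eigenvalues):
    """Weight each dimension by sqrt(eigenvalue) (assumed form of the scaling)."""
    return tuple(v * math.sqrt(e) for v, e in zip(point, eigenvalues))

# Hypothetical eigenvalues: dimension 1 carries far more variance than dimension 2.
eigenvalues = (100.0, 1.0)

# Unscaled coordinates, where every dimension has a similar numeric range.
african = (0.5, 0.0)      # separated from the others only on dimension 1
european_a = (0.0, 0.5)   # the two "close" populations differ only on dimension 2
european_b = (0.0, -0.5)

unscaled_close = euclid(european_a, european_b)
unscaled_far = euclid(european_a, african)

scaled_close = euclid(scale(european_a, eigenvalues), scale(european_b, eigenvalues))
scaled_far = euclid(scale(european_a, eigenvalues), scale(african, eigenvalues))
print(unscaled_close, unscaled_far, scaled_close, scaled_far)
```

Without the weighting, the two closely related toy populations look further apart than the genuinely distant pair; with it, the ordering flips to what the eigenvalues imply.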

Alberto said...

By the way, something that I always kept wondering and seems to be quite on topic now.

We have those early CWC samples from the East Baltic region that look close to Yamnaya (with some small shift from EHG to WHG), and almost no European Neolithic admixture (or none at all).

Ok, so let's suppose that the CWC came from Ukraine (or near) moving north and from there expanding west and east. The eastern early CW-related cultures would be Fatyanovo-Balanovo. I guess we could tentatively speculate that R1a-Z645 split somewhere near the Baltic Sea, with Z283 moving west and Z93 moving east. Then from the Fatyanovo-Balanovo cultures, the Abashevo culture emerged, and from it the Sintashta culture and then the Andronovo culture.

Now here's the thing that always looked strange to me. In the above scenario we should expect that the CW-related cultures moving east into the forest steppe to the Urals and beyond, the population would be getting admixture from either HGs (mostly Eastern, or at least Baltic-like), or even from Yamnaya-type of people. But not from European Neolithic people that never existed in those areas.

But quite surprisingly, Sintashta turns out to have some 40% European Neolithic admixture, being closer to the Bell Beakers from Central Europe than to Yamnaya (and even closer to Bell Beakers than to CWC_Germany, in Central Europe!). How is this possible?

One explanation could be that Sintashta does not derive from Abashevo (nor Poltavka, as we have samples from this latter culture), and not from Fatyanovo-Balanovo. But rather that they are recent migrants from Central Europe or near.

I'm not sure that makes a lot of sense, but it's possible. But also in this case I guess we should not expect Fatyanovo-Balanovo or Abashevo to be R1a-Z93, since Z93 would have been born in Central Europe (or near). That's also a bit strange, given its absence there so far. (Moreover, Sintashta is already Z2124+, so we'd expect to find its sister clade L657 in Central-Eastern Europe, I'd guess).

So does someone have a parsimonious understanding of the situation here? Or do you agree that there's something quite strange going on there?

(Does someone know if there are any Fatyanovo-Balanovo-Abashevo samples coming?)

Matt said...

@Alberto, generally, yeah, only commenting on the autosomal, but I can't fault that thinking. I would just assume that these cultures (and I guess particularly Sintashta) do derive from Central European cultures in some respect, despite this not being what the mainstream archaeological model has been (from what I understand).

Also it appears so far from the clines of modern Northern European ancestry and the clines on the West Eurasia PCA that whichever groups Sintashta / Andronovo / Srubnaya / Potapovka take ancestry from:

A) did not share in an enriched level of WHG / Villabruna HG related ancestry (e.g. we can draw a cline more or less exactly from Yamnaya to where Globular Amphora or Iberia_Chal would be)

B) nor did they pick up on whatever ancestry seems to be distinct to Baltic BA, Welzin BA, Hungary BA and is found in modern day Baltic-Slavic-Hungarian groups (which I'm presuming is mainly somewhat related to locally distinct hunter gatherers, since Spiginas2 seems to get it "immediately", from what we can tell so far from the sparse sequence).

(Hence all why the neighbour joining seems to be going Steppe_MLBA -> Corded_Ware Germany -> Unetice / Bell Beaker so far, and still not Steppe_MLBA -> Baltic_BA or anything like that).

Matt said...

(Also hence why it is difficult in most PCA for nMonte with any degree of clarity to select the North Central LNBA populations we might expect as ancestors for present day Western European populations, over Steppe MLBA, because they are not clearly that distinguished from the same steppe-Europe continuum. There's more distinction possible in the North Europe specific PCA but still hard to distinguish them from Nordic LN/BA/IA samples, and only slightly easier to diverge from Ireland_EBA).

epoch2013 said...


In this paper Mallory states that Asian and European IE languages share "a substantial amount of shared agricultural vocabulary" pointing to "an economy based on domesticated livestock and domestic cereals". He states that all homeland models have to "be able to explain how we can recover cognate terms associated with farming from Ireland to India."

The Pontic Steppe Homeland theory has one big issue in this respect:

"The critical issue for these models is that while any and all of them could explain the distribution of domestic animal names, there are serious problems involved with the spread of arable agriculture. As Anthony remarks in this symposium, there is really no serious evidence for arable agriculture (domestic cereas) east of the Dnieper until after circa 2000 BCE (see also Ryabogina & Ivanov 2011; Mallory, in press:a). This means that there is also no evidence for domestic cereals in the Asiatic steppe until the Late Bronze Age (Andronovo etc). From the perspective of the Pontic-Caspian model, the ancestors of the Indo-Iranians and Tokharians should not cross the Ural before c 2000 BCE at the very earliest. Hypotheses linking the Tokharians to earlier eastward steppe expansions associated with the Afanasievo or Okunevo cultures of the Yenisei or Altai (Mallory and Mair 2000) become very difficult if not impossible to sustain (as long as there is no evidence of arable agriculture in these cultures) as Tokharian retains elements of the Indo-European agricultural vocabulary."

And that is where that surprising 40% European Neolithic admixture might come in. While the paper states that there may not be much proof, if any at all, of agriculture east of the Dnieper, there is at least some evidence of settlement by farmers.

Mind you, these are mere thoughts; I'm not arguing anything.

epoch2013 said...


So the idea could be that Western migration brought the names for the cereals (back?) to the east. At least I'd say it's food for thought.

Alberto said...

Yes, the hypothesis pointing to a recent migration from Europe makes sense autosomally (as Matt says) and it could help in solving the linguistic problem too (as epoch2013 says). Though if I remember correctly, another problem with Tocharian would be the word for pig, absent in the eastern steppe even in the Sintashta/Andronovo phases?

So if this turns out to be correct, when we get samples from the forest steppe EBA cultures cited above we won't find the ancestors of Sintashta, but rather some Yamnaya looking people (maybe with extra HG) and with some sort of R1a-M417, but not Z93?

And probably the biggest problem with this is precisely the Y-DNA. If Z93 appeared in Central Europe first, and Sintashta brought a subclade of it to Central Asia, the sister clade L657 would have been left behind, in Central-Eastern Europe. But instead, we find it in India.

So things seem a bit complicated. It will really be interesting to see those forest steppe samples and figure out the origin of Sintashta more precisely.

Kristiina said...

Volga Uralics are high in Yamnaya ancestry and they carry a high amount of R1a1-Z283 haplotypes but very little Z93.

Maris and Udmurts carry c. 29% and 17% of R1a1, but only c. 2% of their yDNA is Z93 and it is probably Turk-related as, linguistically, "Mari has a large number of Turkic loanwords, a number of copied bound morphemes, verb-final word order, extensive use of nonfinite verb forms, postverbial constructions to express actional modifications, an interrogative particle mo etc. Largescale copying has endowed Mari with a typological habitus similar to that of a Turkic language".

I studied the STRs of the recent Sargat R1a1 from West Siberia but it seems that on the basis of the STR data included in that paper, it is impossible to decide if it is more probably Z283 or Z93. Is anybody more knowledgeable about this?

Alberto said...


Thanks, that's interesting. It could mean that forest steppe cultures like Fatyanovo could be Z283 and not Z93, though of course that's assuming that modern Uralics' R1a from the area descend from those cultures, which is not known with certainty.

However, if we assume that this was the case, how did Z93 get to Central Asia? The steppe itself was R1b-Z2103, and if the forest steppe was R1a-Z283, it's like a small group carrying Z93 travelled fast from Europe to Central Asia leaving no trace behind.

Time to wait for more aDNA to figure this out...

Davidski said...

M417 is from the North Pontic steppe, and obviously it's ancestral to the "Northwest European" L664 and the "Euro-Asiatic" Z645, which splits into the "European" Z282 and "Asian" Z93.

So, this whole part of the R1a tree is rooted in Europe, and only partly spills out into Asia.

This was clear from modern-day Y-DNA sequences years ago, and now it's being neatly backed up by ancient DNA from Eastern Europe. This will continue to happen, and it'll be shown that all Asian Z93 derives from a Bronze Age population that lived somewhere in Eastern Europe.

By the way, there's Z93* in Poland and western Russia, and these lineages don't look typically Turkic in the least. There's also a Cossack from Ukraine who belongs to a mutation just above L657; might be a clue as to where we'll find the earliest L657 or pre-L657 in ancient remains.

Rob said...

Perhaps the EEF admixture was chanced upon from late CT groups in forest steppe ?

Anthro Survey said...


We also can't rule out Slavic admixture into the groups you've mentioned.

After all, "contact groups" in the south like Albanians, Greeks, Vlachs, not to mention Romanians/Moldovans, all pack it. I.e., in the Balkans, we can see that South Slavs have a paleo-Balkan substrate, while non-Slavic groups have a significant Slavic superstrate. The situation in Russia may have been similarly bidirectional. Just a thought.

Btw, modern Lithuanians definitely pack a good amount, and it likely accounts, in large part, for their difference from Baltic_BA.