Friday, August 31, 2018

Focus on Hittite Anatolia

I computed a series of D-statistics on most of the currently available ancient samples from Central Anatolia - dating from close to the Epipaleolithic (Boncuklu_N) to the Hittite era (Anatolia_MLBA) - to try and get a better idea of who the Indo-European-speaking Hittites may have been. The full output as well as details about the key ancient samples used in this analysis are available here. See anything interesting? The most noteworthy statistics, I suppose, are those listed below, because they're significant (|Z|≥3) and organized chronologically.

However, the thing to keep in mind about D-statistics, and the very similar f4-statistics, when looking for signals of mixture is that they may or may not produce significant Z scores for several reasons, such as the choice of outgroup, the choice of reference samples and the phylogenetic relationships between them, or even the type, quality and density of the data being used.
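For anyone who wants to see the moving parts, here is a minimal sketch of how a D-statistic of the form D(W,X;Y,Z) and its Z score can be computed from per-SNP allele frequencies. It assumes the usual frequency-based formula, and it swaps in a plain unweighted delete-one-block jackknife for the weighted version used by ADMIXTOOLS. The function names and the toy frequencies are made up for illustration, and this is not the code or data behind the statistics in this post.

```python
import math

def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (floats in [0, 1])."""
    num = sum((a - b) * (c - d) for a, b, c, d in zip(w, x, y, z))
    den = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
              for a, b, c, d in zip(w, x, y, z))
    return num / den

def jackknife_se(w, x, y, z, n_blocks):
    """Standard error from an unweighted delete-one-block jackknife.

    SNPs are split into n_blocks contiguous blocks (a stand-in for
    contiguous chunks of the genome); each block is dropped in turn.
    """
    n = len(w)
    block_of = [i * n_blocks // n for i in range(n)]
    estimates = []
    for b in range(n_blocks):
        keep = [i for i in range(n) if block_of[i] != b]
        estimates.append(
            d_statistic(*([f[i] for i in keep] for f in (w, x, y, z)))
        )
    mean = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return math.sqrt(variance)

def d_with_z(w, x, y, z, n_blocks):
    """Return (D, standard error, Z score); Z is NaN if the error is zero."""
    d = d_statistic(w, x, y, z)
    se = jackknife_se(w, x, y, z, n_blocks)
    z_score = d / se if se > 0 else float("nan")
    return d, se, z_score

# Toy frequencies only (hypothetical, four SNPs, W fixed for the ancestral allele).
toy_w = [0.0, 0.0, 0.0, 0.0]
toy_x = [1.0, 1.0, 1.0, 1.0]
toy_y = [0.0, 0.0, 1.0, 0.0]
toy_z = [1.0, 1.0, 0.0, 1.0]
print(d_with_z(toy_w, toy_x, toy_y, toy_z, n_blocks=4))
```

With this sign convention, and W as the outgroup, a positive D means that X shares more alleles with Z than with Y. The Z score depends directly on the jackknife error, which is why the number and quality of the SNPs can push an otherwise real signal below the significance threshold.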

Perhaps ironically, the D-statistics above suggest that the Neolithic Central Anatolians (Boncuklu_N and Tepecik_Ciftlik_N) were more European-like than those from the Bronze Age, and I suspect that this is one of the main reasons why the idea of Eastern European admixture (from the Pontic-Caspian steppe and/or Balkans) in Hittites is currently being rejected by the geneticists working on the problem. But this dilemma is easy to explain away by the fact that the Neolithic samples carry much higher ratios of Anatolian Epipaleolithic hunter-gatherer admixture and also other types of ancestry shared with and/or closely related to European hunter-gatherers and early farmers.

In other words, I'd say that most of the statistics are being confounded by deep phylogenetic relationships, and thus aren't very useful for solving the Hittite problem. Interestingly, though, that relationship to Europe is reversed somewhat in the D-statistics involving Anatolia_EBA and Anatolia_MLBA, with the latter showing significantly higher affinity to Eastern European Hunter-Gatherers (EHG) and Minoans.

Thus, in my opinion, to get a more complete picture it's also useful to look for patterns in the statistics, even those that, strictly speaking, don't reach significance. One way to do that is with linear models. So here are a few linear models based on some of my D-statistics. The relevant datasheet is available here.
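As a rough illustration of the idea, and not the actual procedure or data behind the graphs, a linear model of this sort can be sketched as an ordinary least-squares fit between two sets of D-statistics computed across the same reference populations X. Populations that land well above or below the fitted line are the ones with an unusual affinity to one of the two test groups. The function names, population labels and values below are all hypothetical placeholders.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys):
    """Vertical distance of each point from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Placeholder D-statistics for the same reference populations in two tests.
# These numbers are invented purely to make the sketch runnable.
stat_a = {"pop_1": -0.010, "pop_2": -0.004, "pop_3": 0.002, "pop_4": 0.009}
stat_b = {"pop_1": -0.008, "pop_2": 0.001, "pop_3": 0.000, "pop_4": 0.012}

pops = sorted(stat_a)
xs = [stat_a[p] for p in pops]
ys = [stat_b[p] for p in pops]

# Rank populations by how far above the line they sit.
ranked = sorted(zip(pops, residuals(xs, ys)), key=lambda item: item[1], reverse=True)
for pop, res in ranked:
    print(f"{pop}\t{res:+.4f}")
```

The appeal of this approach is that it uses the overall pattern across many statistics, so individual statistics don't need to reach significance on their own to contribute to the picture.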

Arguably, the most striking thing about these models is the position at the top of the graphs of the ancient populations from Central Asia and what is now Iran, and the gradually lower position of populations with progressively less of this type of ancestry. The most plausible explanation for this phenomenon is post-Boncuklu_N gene flow into Central Anatolia from the east, possibly as a continuation of something that was happening already since the Epipaleolithic, but becoming more intense during the Neolithic revolution, probably as a result of rapid population growth in and around the Fertile Crescent.

Indeed, I strongly suspect that one of the main reasons why we've been hearing so much lately about Iran as a likely candidate for the Indo-European homeland is this strong eastern signal in Bronze Age Anatolian DNA. If so, then this is likely to be a misunderstanding, because there are better explanations for it than the Indo-Europeans, such as the Hattians and Hurrians.

Another rather obvious outcome in my graphs is the relatively stronger affinity between the Bronze Age Anatolians and the ancient populations from Eastern Europe, including, and especially, those from the Pontic-Caspian steppe, compared to Tepecik_Ciftlik_N. In fact, looking at the Anatolia_EBA vs Tepecik_Ciftlik_N graph, I'd say that steppe admixture was already seeping into Central Anatolia during the Early Bronze Age.

If so, this is an important point that should be taken into account when modeling the ancestry of the Hittite era Anatolians. That's because if Anatolia_EBA already harbored some steppe ancestry, then we'd be shooting ourselves in the proverbial foot if we were to use it as the supposedly unadmixed reference population to try and determine whether Anatolia_MLBA was partly of steppe origin. Hence, to model the ancestry of Anatolia_MLBA, at least in the context of possible migrations from the steppe to Anatolia during the Bronze Age, it might be more useful to use Tepecik_Ciftlik_N as the likely unadmixed reference population.

Let's try that with qpAdm, first on the whole Anatolia_MLBA set, and then on one individual labeled MA2203, who, as far as I can tell, shows an elevated level of steppe ancestry in several different types of analyses. I chose Yamnaya_Kalmykia as the potential mixture source from the steppe because it's likely to be the closest available population in my dataset to the Eneolithic groups of the southern region of the Pontic-Caspian steppe.

Anatolia_MLBA
Seh_Gabi_ChL 0.200±0.043
Tepecik_Ciftlik_N 0.659±0.033
Yamnaya_Kalmykia 0.141±0.022
chisq: 11.425
tail prob: 0.408405
Full output

MA2203
Seh_Gabi_ChL 0.179±0.065
Tepecik_Ciftlik_N 0.622±0.049
Yamnaya_Kalmykia 0.199±0.036
chisq: 12.914
tail prob: 0.299004
Full output
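For readers not used to qpAdm output, the two models above can be summarized with a few lines of arithmetic. The sketch below takes the coefficients and standard errors exactly as printed, with the first block taken to be the whole Anatolia_MLBA set and the second MA2203, following the order given in the text. It checks that the coefficients sum to one, divides each coefficient by its standard error, and flags whether the tail probability is at or above 0.05. That 0.05 cutoff is a common convention that I'm assuming here, not something taken from the output, and the dictionary layout and function name are made up for this example.

```python
# Coefficients and standard errors copied from the qpAdm output above.
MODELS = {
    "Anatolia_MLBA": {
        "coefficients": {
            "Seh_Gabi_ChL": (0.200, 0.043),
            "Tepecik_Ciftlik_N": (0.659, 0.033),
            "Yamnaya_Kalmykia": (0.141, 0.022),
        },
        "tail_prob": 0.408405,
    },
    "MA2203": {
        "coefficients": {
            "Seh_Gabi_ChL": (0.179, 0.065),
            "Tepecik_Ciftlik_N": (0.622, 0.049),
            "Yamnaya_Kalmykia": (0.199, 0.036),
        },
        "tail_prob": 0.299004,
    },
}

def summarize(model, alpha=0.05):
    """Summarize one model; alpha is an assumed conventional cutoff."""
    coeffs = model["coefficients"]
    total = sum(estimate for estimate, _ in coeffs.values())
    # Coefficient divided by its standard error: a rough gauge of how
    # far each mixture proportion is from zero.
    ratios = {source: estimate / se for source, (estimate, se) in coeffs.items()}
    return {
        "sum": total,
        "ratios": ratios,
        "not_rejected": model["tail_prob"] >= alpha,
    }

for target, model in MODELS.items():
    summary = summarize(model)
    print(target, "sum of coefficients:", round(summary["sum"], 3),
          "not rejected:", summary["not_rejected"])
    for source, ratio in summary["ratios"].items():
        print(f"  {source}: coefficient / SE = {ratio:.1f}")
```

A model that passes these checks is statistically acceptable given the chosen references and outgroups, which is not the same thing as showing that the named source population was the real source of the ancestry.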

Please note, however, that these mixture models are based on f4-statistics. So, obviously, they're going to be affected by the same factors as described above that affect f4-statistics. Hence, despite the seemingly statistically sound output, the steppe admixture that you see there might not actually be admixture from the steppe.

In fact, there's a good reason why I'm not shouting from the rooftops that I've just uncovered the presence of steppe ancestry in Bronze Age Anatolia, and thus confirmed the steppe or kurgan hypothesis positing that the Hittite and indeed Indo-European homeland was located in the Pontic-Caspian steppe. That's because I used a mixed bag of UDG-treated capture data and non-UDG-treated shotgun data. This is known to be a serious problem, which can skew the results of even the most robust analyses, and produce spurious statistics and Z scores.

Nevertheless, I'm reasonably confident that my findings will eventually be confirmed with more and higher quality data from ancient Anatolia. Let's wait and see.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...


Ned said...

Hi - not a comment on this (I assume you will moderate it) but have you read Dr Kortlandt's recent article on the expansion of the Indo-European languages?

Chad said...

Gotta include Anatolia ChL and Armenia ChL to cover the ChL cline that preceded the BA and Steppe pretty much disappears though.

Davidski said...

No can do.

Anatolia_ChL and Armenia_ChL aren't from Central Anatolia, so I can't include them in this transect of time.

George Okromchedlishvili said...

So far I think it's safe to assume that some population like Bronze Age Anatolians moved to the Caucasus and contributed to the formation of Georgians (and probably Abkhaz, Ossetians and Armenians too).

Davidski said...

I'm still wondering what the hell Armenia_ChL really is. It's just a weird outlier that never seemed to fit into the whole picture in that area.

Although I guess these sorts of isolates are bound to pop up from time to time, especially in mountainous regions.

Chad said...

Seh Gabi and Yamnaya aren't from Anatolia either. However, a cline from NW Anatolia to Armenia is seen. Gotta cover regional variability before going outside. Much more proximate potential sources for EHG like increase.

Matt said...

Any possibility of running D-stat(Chimp, X;Boncuklu_N, Barcin_N), (Chimp, X;Boncuklu_N, Koros_N) and (Chimp, X;Boncuklu_N, Iberia_EN) for the same data and adding the WHG as a row? Just to get a look at things really, kind of not really related to the main topic of the blog post.

Matt said...

Having the results for (Chimp, X;Tepecik_Ciftlik_N, Barcin_N), (Chimp, X;Tepecik_Ciftlik_N, Koros_N), (Chimp, X;Tepecik_Ciftlik_N, Iberia_EN) would also be interesting, if you don't mind.

Matt said...

Looking at a negatively correlated pair of stats (reversing the (Chimp, X;Anatolia_MLBA, Tepecik_Ciftlik_N) stat to do so):

Comparing stat A: D(Chimp,X;Tepecik_Ciftlik,Boncuklu_N) against stat B: D(Chimp,X;Tepecik_Ciftlik,Anatolia_MLBA), we find that they are mildly negatively correlated.

The negative correlation is weak, though, as the differences seem largely orthogonal.

Stat A finds that the most Boncuklu like and least Tepecik like ancient populations are northern and European, whether they are east or west; the most positive (Boncuklu) are Iron Gates HG, EHG, UkraineN, while the EEF descended Europeans with HG admixture overlap with Baltic_BA, Sintashta and Ukraine_Eneolithic.

The most negative populations, which are most Tepecik like and least Boncuklu are Tepe_Hissar_Chl, Dzharkutan1_BA, Gonur1_BA, southern, but not especially eastern.

Stat B finds that the most Anatolia_MLBA like and least Tepecik like ancient populations are eastern, whether northern or not. The Anatolia_MLBA end contains Yamnaya, EHG, CHG, Sarazm Eneolithic, without much distinction whether they are northern or southern. The Tepecik end contains mostly EEF rich European populations, but Ukraine_N and Iron_Gates_HG are non-significant around zero.

That suggests that Boncuklu has more European HG ancestry relative to Tepecik_Ciftlik, though probably of a form we don't have an exact proxy for today (no sampled European HG populations are extremely above the line). Anatolia_MLBA, meanwhile, doesn't have more HG related ancestry than Tepecik, or if it does, it has less extra than Boncuklu does, but it definitely has more eastern ancestry than Tepecik.

So these shifts are mildly negatively correlated, but mostly orthogonal, in wholly different directions.

Of course, if you consider the qpAdm results in the main post they're fairly consistent! In those Seh_Gabi_Chl (more southern than Tepecik) masks out a signal of extra Euro HG (northernness) from Yamnaya, so the main result is the Anatolia BAs having east shift without a north or south shift.

Definitely seems plausible as a model, but is that a more historically parsimonious model than just using Armenian / Caucasus_Eneolithic samples in a two way model? I don't know; I'd have to understand the population shifts in more detail than I do (and archaeology is often speculative!).

Davidski said...


You're making the assumption that populations like Anatolia_ChL and Armenia_ChL existed in Central Anatolia at some point, but I can't see any evidence of that.

Did they exist there between the time of Tepecik_Ciftlik_N and Anatolia_EBA, and then somehow contributed to Anatolia_MLBA? I don't think this is possible.

On the other hand, there is evidence, from the D-stats, of gradual gene flow into Central Anatolia from eastern populations. So I need a proxy for that in my analysis. And since I'm testing the hypothesis that the Hittites came from the steppe, then I also need a good proxy for steppe admixture.

Davidski said...

In other words, Anatolia_ChL is from Western Anatolia, and there's absolutely no evidence that anyone like this migrated through Central Anatolia to get there.

The most parsimonious assumption is that eastern gene flow into a Boncuklu_N-like substrate created populations like Anatolia_ChL in Western Anatolia and Tepecik_Ciftlik_N/Anatolia_EBA in Central Anatolia.

velvetgunther said...

"Utter crud!" says David.
Also an article by Razib

Davidski said...

Actually, yeah, let me edit my earlier comment...

The most parsimonious assumption is that eastern and southern gene flow into a Boncuklu_N-like substrate created populations like Anatolia_ChL in Western Anatolia and Tepecik_Ciftlik_N/Anatolia_EBA in Central Anatolia.

Unknown said...

Luca Cavalli-Sforza just died.

Aniasi said...

@David, you've been quoted in India today!

PF said...

It looks a lot more "southern" than "eastern." In fact I'm having a hard time recreating the D-stat result suggesting increased EHG in Anatolia_MLBA relative to Anatolia_EBA (using G25 pop averages and nMonte).

Anatolia_Chl is relevant, not necessarily because it directly contributed to later central Anatolian populations, but because it’s already quite similar to Anatolia_EBA and MLBA. Whatever traces of EHG exist in MLBA are already there in Chl and generally they seem very similar overall.

Testing some possible southern and eastern inputs shows no preference to EHG/Steppe (at least using G25/nMonte). E.g., when trying to recreate the qpAdm model above for Anatolia_MLBA I get:

[1] "distance%=3.3812"



Replacing Yamnaya with Levant_Chl doesn’t really change the fit even though the Tepecik_Ciftlik percentage stays exactly the same:

[1] "distance%=3.4069"



Replacing Seh_Gabi_LN with CHG does improve it:

[1] "distance%=2.7668"



And using Anatolia_Chl instead of Tepecik_Ciftlik improves it further and absorbs most of the CHG:

[1] "distance%=2.5473"



Adding EHG or Yamnaya to the above gets rejected.

All these Anatolians just seem “East Med” — which I think is basically original Anatolian_Ns + Levantine_N + CHG/Iranian-related stuff. The change starts appearing in the Neolithic and increases through the Chalcolithic, but once it’s there, I’m not seeing much extra of anything during the Bronze Age.

Davidski said...


I think what you're seeing there are the confounding factors that I talked about, which are tied to the European-related ancestry in these Anatolian groups. And by adding CHG to your models, you're basically creating an excellent steppe effect, at least in G25/nMonte.

However, two of your models are rejected in qpAdm, which suggests that by adding CHG you're overfitting your models with a reference population that is relevant, but not directly relevant.

The Anatolia_ChL model does work well, but like I said, this can't be what really happened. Rather, it seems like Anatolia_ChL and Anatolia_MLBA are products of very similar processes in different parts of the region.

Following on from that, check this out...

Davidski said...

Hmmm...actually, if an Anatolia_ChL-like population migrated to Central Anatolia from the west and gave rise to the Hittite era Anatolia_MLBA, then that's pretty much in line with the steppe hypothesis anyway, except the steppe ancestry that we see in Anatolia_MLBA isn't actually from the steppe, which makes no difference for a linguistic model anyway.

Open Genomes said...

Restricted nMonte3 for MA2208 Population: Anatolia_MLBA_low_res Bronze Age Anatolia excluding Mycenaeans

@David, what do you make of the 7.2% Afanasievo in this sample?

What's really striking here is the 16.8% BMAC (-related) Tepe Hissar Chalcolithic, similar to the Tepe Hissar / BMAC component in the roughly contemporary Middle-Late Bronze Age Levantines from Sidon.

This individual has a 24% Levantine Bronze Age north component that has in other runs been absent from the other Bronze Age Anatolians. I think that this has something to do with Kalehoyuk being an Assyrian trading colony before the presumed Hittite sack and conquest in 1750 BCE. It's interesting that MA2208 shares ancestry with the likely Amorites of Sidon at a time when Babylonia and Assyria were under the domination of the Amorites.

To me he seems to be part-Assyrian and part Indo-European Anatolian, with the Anatolian including a good portion of native Anatolian Neolithic ancestry (which is very similar to the Peloponnese Neolithic and LBK).

So what is this very prominent "BMAC" Tepe Hissar component we see both in MA2208 and the Bronze Age Levantines all about?

Why is the steppe ancestry Afanasievo, rather than say Yamnaya? Is Afanasievo more a reflection of an immediate pre-Yamnaya (PIE including Anatolian) group, that perhaps has less Caucasus-related ancestry?

There's also a relationship to the Mycenaeans, but MA2208 has some additional Afanasievo lacking in the Mycenaeans:

Restricted nMonte3 for MA2208 Anatolia_MLBA_low_res Bronze Age Anatolia with Mycenaeans

Open Genomes said...


MA2206 comes from the same context as MA2208. This female was found underneath MA2208, dead with others in the attack on the public building at Kalehoyuk in 1750 BCE.

Restricted nMonte3 for MA2206 Anatolia_MLBA Bronze Age Anatolia

Unlike her companion MA2208, she doesn't have any Steppe ancestry at all. She has some Levant Bronze Age North (3.0%), but is much more like the earlier Levantines. She has substantial ancestry from Hajji Firuz Chalcolithic in northwest Iran, and she also has ancestry from BMAC (Parkhai). This western to eastern Chalcolithic Iran is much more oriented to the west than the Levantine Bronze Age. The Dzharkutan2 female that she matches is one of those 100% Maykop - Caucasus women found in BMAC, so she is also much more Caucasian.

Open Genomes said...

What's important about MA2208 is that he has a very clear Steppe mtDNA, H6a1b2e. H6a1b was found in Yamnaya Samara, and H6a1b2e today is only found in Denmark and Ireland.

H6a1b and H6a1b2e modern distribution

Combining this evidence and the autosomal results, as I said, we can say that MA2208 is a mix of Amorite-related Assyrian and Indo-European proto-Hittite Anatolian.

He's not R1b-L23. His Y is G-M406, which is mostly Near Eastern and Mediterranean, and hasn't been seen among Steppe peoples and their descendants. However, it's his mtDNA that is clearly Steppe-related, and his autosomal ancestry seems to show this Steppe admixture as well.

Double 7.4% Afanasievo and you get 14.8% Afanasievo for a typical Middle Bronze Age Indo-European Anatolian.

Looking more closely at the pre-Bronze Age roots of MA2208 shows that his Steppe likely comes from a population similar to that of the Varna outlier AN163. This is pre-Yamnaya:

restricted nMonte3 for MA2208 Anatolia_MLBA_low_res Bronze Age Anatolia Chalcolithic and earlier

Perhaps this is what we should expect from Indo-European Anatolians, a very early pre-Yamnaya Steppe-related population, who arrived in one way or another around the Black Sea (likely in this case, to the west) with substantial admixture along the way.

Davidski said...


I missed this, but you're right. H6a1b2e looks like a maternal Bronze Age steppe marker.

Ric Hern said...

@ Open Genomes

Very interesting. But how can it be Afanasievo when Afanasievo was Post-Yamnaya ? Suvorovo Maybe ?

Ric Hern said...

This makes me wonder if Novodanilovka contributed to the formation of Afanasievo ?

Open Genomes said...

@David, thank you. I think maybe MA2208 is worth a closer look on its own, and a post by itself, right?

@Ric Hern
Of course it can't be Afanasievo itself. However, the pure "Steppe component" of MA2208 has to be something "pre-Yamnaya" around the time that Afanasievo packed up and left for the East, but shortly after Khvalynsk Eneolithic. (His mtDNA H6a1b however was found in Khvalynsk.) Are there any individuals that fit the bill?
What is it that's lacking here in Yamnaya, additional Caucasus ancestry?
Or could this be something like early Steppe Maykop?

PF said...

@Davidski Thanks for the links/analysis. No doubt CHG isn’t a great reference, but neither is Yamnaya considering it post-dates Anatolia_Chl. Obviously whatever Yamnaya-related stuff had to come from somewhere, but not Yamnaya themselves… so the hunt continues.

Looking west, it’s interesting to consider Malak_Preslavets. It predates Anatolia_Chl by ~1500 years and is about as far from Anatolia_Chl as Anatolia_Chl is from the later central Anatolians. Pretty close. Malak definitely has significant EHG but lacks CHG and Iranian-related stuff entirely.

What this tells me is that 1) EHG-related people were likely all around the Black Sea for a while and 2) whatever happened in Anatolia by ~4000 BC is unrelated to the Balkans. Going the other direction east, Armenia_Chl does look highly related to Anatolia_Chl… the latter basically a version of the former mixed with a lot more local Anatolia_N ancestry.

Considering all this, it’s most parsimonious to guess that proto-IE existed somewhere around eastern Anatolia / southern Caucasus, that proto-Anatolian was spoken before 4,000 BC, and that every other branch was spread by Yamnaya-related migrations later on. This is just my naive reading of the genetic data with next to zero actual consideration for archeology and linguistics…

[1] "distance%=4.3589"



[1] "distance%=3.132"



But what exactly is Armenia_Chl? On one hand it’s an isolate, on the other it seems relevant. Clearly it has all the possible steppe-related components, but ultimately I feel we’re missing a key ancient sample to make full sense of it?

(PS, perhaps you might not be giving your own methods enough credit. :) G25 + nMonte has been solid at replicating stuff we know with more certainty, and uncovering surprises that later made more sense, so it can’t just be dismissed easily. Though I wish to dig deeper into the math of qpAdm and related Reich lab tools if I ever get the time...)

Davidski said...


Sredny Stog is way earlier than Yamnaya.

Hittite era Anatolians in qpAdm

And a map here...

A Corded Ware-related Proto-Greek from the Pontic-Caspian steppe?

PF said...


Thanks for the analysis of MA2208. Being G-M406 myself, and this being the first and only(?) G-M406 ancient sample, I second the request for a more detailed post about him. :)

Nike81 said...

I recently did MyTrueAncestry ‘deep dive’ and it states that I am related to MA2208 Hittite Anatolian & I9010 Mycenaean

MA2208 Hittite
Total cM=22.14
Largest segment=3.0 cM (11 shared. Sample quality: 7)

I9010 Mycenaean
Total cM=3.38
Largest segment=2.01 cM (2 shared. Sample quality: 9)

Nike81 said...

I am Greek J2a1i (L88) (L198) and MyTrueAncestry states I am related to I9010

• Galatas19 (I9010): Female without an osteological age estimate, LH IIB to LH IIIC (15th to early 12th century BCE)

• Galatas4 (I9041): Male without an osteological age estimate, LH IIB to LH IIIC (15th to early 12th century BCE).

I9041 (Mycenaean from Galatas Apatheia in the Peloponnese)

This individual was derived for mutations

L26:22942897T->C and F4326:23021978A->G (J2a1) as well as upstream mutations M410:2751678A->G, L559:21674327A->G, L152:22243566C->T, L212:22711465T->C (J2a).

He was ancestral for M322:15469740C->A (J2a1a), M260:15025506G->A and M92:21904023T->C (J2a1b1), M166:21764694C->T (J2a1b2), L210:16492197A->T (J2a1b3), M68:21878700A->G (J2a1c), M339:2881367T->G (J2a1e), P81:6739856G->A (J2a1g), L207.1:6753448A->G and L24:14286528G->A (J2a1h), L88.2:17595842T->C and L198:17595861A->C (J2a1i).

He could thus be designated as J2a1x(J2a1a, J2a1b1, J2a1b2, J2a1c, J2a1e, J2a1g, J2a1h, J2a1i).