
Friday, January 1, 2016

Kum6: Sardinian-like genome from Late Neolithic western Anatolia

Behind a paywall at Current Biology:

Summary: Anatolia and the Near East have long been recognized as the epicenter of the Neolithic expansion through archaeological evidence. Recent archaeogenetic studies on Neolithic European human remains have shown that the Neolithic expansion in Europe was driven westward and northward by migration from a supposed Near Eastern origin [ 1–5 ]. However, this expansion and the establishment of numerous culture complexes in the Aegean and Balkans did not occur until 8,500 before present (BP), over 2,000 years after the initial settlements in the Neolithic core area [ 6–9 ]. We present ancient genome-wide sequence data from 6,700-year-old human remains excavated from a Neolithic context in Kumtepe, located in northwestern Anatolia near the well-known (and younger) site Troy [ 10 ]. Kumtepe is one of the settlements that emerged around 7,000 BP, after the initial expansion wave brought Neolithic practices to Europe. We show that this individual displays genetic similarities to the early European Neolithic gene pool and modern-day Sardinians, as well as a genetic affinity to modern-day populations from the Near East and the Caucasus. Furthermore, modern-day Anatolians carry signatures of several admixture events from different populations that have diluted this early Neolithic farmer component, explaining why modern-day Sardinian populations, instead of modern-day Anatolian populations, are genetically more similar to the people that drove the Neolithic expansion into Europe. Anatolia’s central geographic location appears to have served as a connecting point, allowing a complex contact network with other areas of the Near East and Europe throughout, and after, the Neolithic.

Omrak et al., Genomic Evidence Establishes Anatolia as the Source of the European Neolithic Gene Pool, Current Biology, DOI:


Nomen Cognomen said...

Following up on Open Genomes' link

According to this, none of the European countries have the non-steppe component in Yamna. Does that mean that Yamna didn't really have an input in Europeans, but that it was rather Yamna-like people (West Yamna?) who did?

Davidski said...

Yamnaya is a mixture of EHG and CHG (maybe with a little WHG too). This is so obvious in formal tests that there's no point debating it.

ADMIXTURE shows all sorts of things depending on the data that it's fed, and most of it is irrelevant.

Central Asians have a lot of CHG and some EHG, and with strong local drift they create their own clusters, and then pull both ancient and modern Europeans into these clusters.

Alberto said...

This does look like some amount of CHG admixture to me, though they lacked the CHG genomes to test directly. We'll see what it is exactly when the genomes are available, but these stats are pretty significant (NE1 is almost exactly an Anatolian Neolithic):

Denisovan Armenia_BA Kum6 ne1 -0.0574 -3.594
Denisovan Sintashta Kum6 ne1 -0.0353 -3.016

Dospaises said...


The files are already available at

Roy King said...

@Chad and @Krefter,
I'm curious. If Kum6 lacks CHG, then how do you explain that (using the D statistics from the paper itself) Kum6 and many Bronze Age samples from Armenia and Russia share more than these same Bronze Age samples do with European Neolithic samples like NE1? My guess is that Kotias would likewise show increased sharing, so Kum6 would be associated with Kotias more than the European Neolithic samples are. This analysis is separate from the K12 admixture and the PC plots.

ryukendo kendow said...

@ Royking

Roy, I suspect that the stats that clear up the picture are these:
Denisovan Armenia_BA Kum6 Iceman -0.0162 -0.92
Denisovan Karasuk Kum6 Iceman 0.0047 0.419
Denisovan Sintashta Kum6 Iceman 0.027 2.097
Denisovan Mezhovskaya Kum6 Iceman 0.0269 2.122
Denisovan Yamnaya_RISE Kum6 Iceman 0.0299 2.353
Denisovan Andronovo Kum6 Iceman 0.0319 2.861

It is not the case that all LNBA genomes favour Kum6. In comparison to Iceman, which is a genome with no especial similarity to CHG, all the genomes favour Iceman, except for BA Armenia which continues to favour Kumtepe as a proximal source of its EEF ancestry which still makes sense based on geography and history.
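As a rough aid to reading the six statistics above, here is a minimal Python sketch that parses them and reports which way each test population leans. It assumes the sign convention used in this thread (negative D means the test population shares more with Kum6, positive means more with Iceman). The |Z| cutoff of 2 and the names `classify`, `parse` and `z_cutoff` are assumptions made for illustration, not values or methods from the paper.

```python
# The six statistics quoted above, in the form D(Denisovan, X; Kum6, Iceman).
STATS = """\
Denisovan Armenia_BA Kum6 Iceman -0.0162 -0.92
Denisovan Karasuk Kum6 Iceman 0.0047 0.419
Denisovan Sintashta Kum6 Iceman 0.027 2.097
Denisovan Mezhovskaya Kum6 Iceman 0.0269 2.122
Denisovan Yamnaya_RISE Kum6 Iceman 0.0299 2.353
Denisovan Andronovo Kum6 Iceman 0.0319 2.861
"""


def classify(d, z, z_cutoff=2.0):
    """Say which of Kum6 / Iceman the test population X leans towards.

    Negative D: X shares more with Kum6. Positive D: more with Iceman.
    z_cutoff is an assumed threshold, not one taken from the paper.
    """
    if abs(z) < z_cutoff:
        return "no clear preference"
    return "Kum6" if d < 0 else "Iceman"


def parse(text):
    """Map each test population to (D, Z, call)."""
    results = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _outgroup, x, _pop3, _pop4, d, z = line.split()
        d, z = float(d), float(z)
        results[x] = (d, z, classify(d, z))
    return results


if __name__ == "__main__":
    for pop, (d, z, call) in parse(STATS).items():
        print(f"{pop:13s} D={d:+.4f} Z={z:+.3f} -> {call}")
```

With this assumed cutoff, Armenia_BA is the only population whose D is negative, and neither it nor Karasuk clears the cutoff, so their direction rests on the sign alone.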

Using figure 4A from the paper, we have the following list of the similarity of genomes to Kum6:

Less similar than all other Neolithics to Kumtepe.

More similar than all other Neolithics to Kumtepe.

More than Gok2 and Ne1
Less than Iceman, CO1, ATP2

More similar than all but Iceman

Less than all but Gok2

More than Gok2, Stuttgart, NE1
Less than Iceman and CO1

It is a striking testament to the sensitivity of D stats that we can rearrange these into an order, of decreasing similarity with Kum6, in which every single one of the previous stats can be made to cohere:


If it is indeed true that Iceman is the neolithic genome most similar to Kum6, (there is corroboration of that from Treemix in Fig 4b as well) and that LNBA genomes with neolithic ancestry share drift with Iceman to the exclusion of other neolithic genomes, then LNBA genomes will also share drift with Kumtepe to the exclusion of other neolithic genomes, even if Kumtepe carries no CHG.

I wouldn't say that this is proven; there is still some possibility that Kum6 has CHG. But the fact that Iceman is favoured over Kumtepe, and Iceman has no CHG, makes that somewhat unlikely, especially as CHG split from EEF 24 kya. It's not a recent ancestral population, so even a little CHG shared between Kum6 and LNBA should be expected to bias the statistic very strongly against Iceman.

Alberto said...


If I understand you correctly, you mean that the increased shared drift between Kum6 and LNBA populations is not due to Kum6 sharing drift with CHG, but to it sharing drift with Iceman. This is certainly consistent with the stats, but it leaves 2 good questions:

- Why does Iceman share much more drift with LNBA (including Armenia BA, Yamnaya or Andronovo) than Early European/Anatolian farmers (or even MN ones like Gok2 with high WHG admixture)?

- And why does Kum6 share much more drift with Iceman than the early Anatolian farmers do?

An explanation could be that:

- Kum6 is clearly different from Early Anatolian farmers, but for an unknown reason (since it's not CHG/EHG admixture).
- Kum6 had direct input into Iceman's ancestors.
- Iceman-related populations had direct input into LNBA populations (including Yamnaya).

A bit convoluted, but possible both by time and geography. We'd still need to figure out what's the exact difference between Kum6 and early Anatolian farmers, though.


Alternatively:

- Kum6 has CHG/EHG admixture
- Iceman has increased affinity to CHG/EHG admixed populations compared to EEF, but not by admixture (reason unknown).

(When I refer to Iceman above, it also applies to CO1, but to a lesser extent).

I've seen Iceman behave strangely in some Dstats, like sharing a lot of drift with the Kalash, so I'm inclined to go for the second alternative. But who knows, maybe that increased affinity of Iceman/CO1/Kum6 to LNBA is real without any kind of (known) admixture in them.

Chad Rohlfsen said...

Balkan LN flow back into Anatolia?

Chad Rohlfsen said...

NE1 isn't the same as Anatolians either. NE1 is like Stuttgart.

Kristiina said...

I have also been wondering whether in Europe there was not only WHG but also, somewhere in the Mediterranean area (an Italian Ice Age refuge?), another autochthonous component that started to spread northwards with the Neolithic expansion. However, this European component would also have spread to the Near East and North Africa while the Anatolian and Near Eastern component travelled to Europe.

Alberto said...


"Balkan LN flow back into Anatolia?"

Kum6 is from 4700 BC. Iceman from 3200 BC. In the absence of further evidence, I would suggest Anatolian LN flow into the Balkans.

"NE1 isn't the same as Anatolians either. NE1 is like Stuttgart."

I said almost, and that's quite precise. Don't nitpick when you were telling people to look at the stats but didn't take a look at them yourself. Kum6 is not a typical Anatolian Neolithic farmer, at least not stats-wise.

Chad Rohlfsen said...

I am seeing a common pattern though. Affinity to those low quality genomes and those not fixed, when it comes to deamination. Those BA Armenians show SSA admixture in the 2-3% range, which isn't real. Iceman is the same story, and CO1, to a lesser extent. The Yamnaya Rise samples are the same story as well. I bet once this is run through, this genome will show ridiculous scores, not just because of low SNP count, but also damage/deamination giving high SSA scores. Wait until we have a decent quality genome from the area, at the same time. It will likely cluster with the Anatolians.

Arch Hades said...

So does this mean CHG was making its way into Anatolia? It sounds from the abstract like we have standard EEFs/ENFs but with a sprinkling of CHG.

Chad Rohlfsen said...

Alberto. Copper working in the Northern Balkans pre-dates Western Anatolia.

Gaspar said...

With Kum6 showing no CHG and living on the Scamander river in NW Turkey (Troad region), and with Hatti (2000 BC) and Hittite (1700 BC) texts showing zero Semitic language in Anatolia, one must revisit the old fables that the Hatti and Hittites came from coastal Black Sea Bulgaria.
Kum6 could also have come from there.

What relation do Barcin and Kum6 have?

Rob said...

I don't think copper working in SEE per se predates that in the near east, it's just that it became more advanced (independently) in the Balkans during the M5.

Rob said...


"- Kum6 is clearly different from Early Anatolian farmers, but for an unknown reason (since it's not CHG/EHG admixture).
- Kum6 had direct input into Iceman's ancestors.
- Iceman-related populations had direct input into LNBA populations (including Yamnaya).

A bit convoluted, but possible both by time and geography. We'd still need to figure out what's the exact difference between Kum6 and early Anatolian farmers, though.


- Kum6 has CHG/EHG admixture
- Iceman has increased affinity to CHG/EHG admixed populations compared to EEF, but not by admixture (reason unknown)."

Wasn't Kurd from Anthrogenica finding some "odd" results with Oetzi? Or is it due to the genome quality being poor?

Alberto said...


It's not about where copper working started first. It's about a genetic change. Are you suggesting that because Late Neolithic people from the Balkans started to work with copper they suddenly developed a CHG affinity?


Yes, I saw those strange results with Ötzi, that's why I prefer the more simple explanation that Kum6 might have CHG (and/or ANE/EHG) admixture than the one that some mysterious drift is increasing the affinity of him and Ötzi to CHG/EHG admixed populations.

But as Krefter said, let's just wait for people to get the genome and test it. That should give us the answer.

truth said...

The ADMIXTURE in this study is quite crappy. How come the EN and MN from Spain have the exact same component? Also, WHG and SHG (Loschbour and Motala) show the exact same component, when we know they are not the same; the latter are pulled towards MA1.
Also, having an MN component in Europeans is hiding much of their Mesolithic ancestry.

ryukendo kendow said...

@ Alberto

Alberto, this is not that convoluted really. If A and B form a clade (A, B), then if some population has input from A, its similarity to B will also be increased. Similarly, populations with input from Han will have increased shared drift with Onge, because Onge and Han share drift. In this case A and B are Kum6 and Iceman, and something Iceman-like seems to contribute to LNBA genomes.
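The clade argument can be checked with a toy simulation. The Python sketch below is only an illustration under stated assumptions: the tree, the branch lengths, the Gaussian approximation of drift (which ignores the 0 to 1 bounds on allele frequencies) and the function name `simulate_f4` are all hypothetical, and it estimates the f4 numerator rather than a normalised D. In this toy model the expected value is minus alpha times the variance of the branch shared by Kum6 and Iceman, so any Iceman-like input into X pushes the statistic negative, which is the direction read as "X favours Kum6" in this thread, even though X received nothing from Kum6 itself.

```python
import random


def simulate_f4(alpha, n_snps=20000, branch_sd=0.1, seed=1):
    """Estimate f4(Outgroup, X; Kum6, OtherNeo) = mean((o - x) * (k - n)).

    Hypothetical toy tree: Kum6 and Iceman share one branch of drift after
    splitting from another Neolithic farmer. X takes a fraction alpha of its
    ancestry from Iceman and the rest from an unrelated source.
    Frequencies are measured relative to the root, which cancels out.
    """
    rng = random.Random(seed)

    def drift():
        # One branch of drift, as a Gaussian change in allele frequency.
        return rng.gauss(0.0, branch_sd)

    total = 0.0
    for _ in range(n_snps):
        outgroup = drift()
        farmer = drift()              # ancestor of all three farmers
        other_neo = farmer + drift()
        clade = farmer + drift()      # drift shared by Kum6 and Iceman only
        kum6 = clade + drift()
        iceman = clade + drift()
        unrelated = drift()
        x = alpha * iceman + (1 - alpha) * unrelated + drift()
        total += (outgroup - x) * (kum6 - other_neo)
    return total / n_snps


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5):
        print(alpha, round(simulate_f4(alpha), 4))
```

With alpha set to 0 the estimate hovers around zero, and it becomes more negative as alpha grows.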

Also, the most CHG LNBA genome, Yamnaya RISE, is the least Kum6 shifted whichever the other Neolithic genome is, while those LNBAs with EEF ancestry, including the central European derived LNBAs and Armenian BA, are very Kum6 shifted, so I expect that when the genomes arrive, the connection will probably be revealed to be mediated via some sort of 'Late Balkan EEF', instead of CHG. There might still be some CHG in Kum6, but the majority of the effect must come from elsewhere.

Those 'weird stats with Oetzi' cannot be used to draw robust conclusions, because they were derived from an unstable methodology, where stats of the form

Primate outgroup, X, Y, Human outgroup

Were lined up from largest to smallest for D scores. This relies on minute discrimination of the magnitude of D-score between stats involving different X and Y, and we should know by now that the size of the D score is affected by all kinds of phenomena, while it is the sign of the D score that is very robust. We should just do direct comparisons for Kalash and Oetzi versus other populations, in which case Kalash will be exposed as much more similar to a whole host of other populations than to Oetzi, contradicting the list.

For example, the scientists, to figure out which Neolithic population is closest to Kum6, compared every single population pair directly:

Primate outgroup, Kum6, Neolithic 1, Neolithic 2

They did not line up a list of the stats:

Primate outgroup, Kum6, Neolithic, Human outgroup

After I raised this issue, Kurd emailed Patterson about the double-outgroup methodology, and Nick agrees that it faces problems.

ryukendo kendow said...

Previously, you also mentioned:

"But in any case, why do Dstats behave differently to IBS? For example, a population that shares a large amount of ancestry with EEF but has a small amount of SSA admixture would appear high in a list of populations by IBS sharing with an EEF but would show low affinity to it in Dstats. This is basically the phenomenon I'm talking about, and it's clearly seen in many stats. Whether it's a bug, a limitation, a design choice or whatever I don't know. But it certainly give strange results when comparing populations with different kind of admixtures."

Actually, D stats indeed behave precisely as you describe--Assyrian will indeed be closer to Loschbour than to Palestinian, for example--but it is a feature, not a bug, because D stats are drift-based, and therefore how divergent the ancestry contribution is (i.e. how long the drift path is to the divergent ancestry) is weighed into the stats as well, not just the proportion of divergent ancestry. D stats capitalise on the 'branching tree' nature of changes in populations, to derive a shared drift calculation based on shared branch lengths.
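For readers who want to see what "drift-based" means in practice, here is a minimal Python sketch of the frequency-based D statistic as defined in Patterson et al.'s "Ancient Admixture in Human History": the sum over SNPs of (w - x)(y - z), divided by the sum of (w + x - 2wx)(y + z - 2yz). The function name and the toy frequencies are assumptions for illustration, and the block-jackknife standard error that produces the Z score is left out.

```python
def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies in four populations.

    Each argument is a list of frequencies (0..1) for the same SNPs.
    With W as the outgroup, a negative value means X shares more drift
    with Y, and a positive value means X shares more drift with Z.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all frequency lists must have the same length")
    numerator = 0.0
    denominator = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        numerator += (wi - xi) * (yi - zi)
        denominator += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


if __name__ == "__main__":
    # Toy data: X matches Y at two informative SNPs and Z at one.
    outgroup = [0.0, 0.0, 0.0, 0.0]
    x = [1.0, 1.0, 0.0, 1.0]
    y = [1.0, 1.0, 0.0, 0.0]
    z = [0.0, 0.0, 1.0, 1.0]
    print(d_statistic(outgroup, x, y, z))  # negative: X leans towards Y
```

Because each SNP contributes a product of frequency differences, a deeply divergent ancestry component moves the statistic more than the same proportion of a closely related one.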

Say we have a comparison:

D Gorilla Assyrian Palestinian Loschbour

Assyrian and Palestinian share extra Basal Eurasian to the exclusion of Loschbour, while Assyrian and Loschbour share extra Eurasian ancestry to the exclusion of Palestinian (as Palestinian has some African ancestry).

The negative term is:
(Length of shared drift path leading to (Basal in Assyrian, Basal in Palestinian)) * (Proportion of extra Basal in Assyrian and Palestinian to the exclusion of Loschbour)

The positive term is:
(Length of shared drift path leading to (Eurasian in Assyrian, Eurasian in Loschbour)) * (Proportion of extra Eurasian in Assyrian and Loschbour to the exclusion of Palestinian)

We know that comparisons involving fully Eurasian and fully African genomes routinely generate D and Z scores 10, 20 or 30 times larger than comparisons involving very Basal and very Crown Eurasian genomes. This makes sense, as the length of the shared drift path leading to Eurasians is measured from the break point of Eurasian with African, dating back to OoA, while the length of the shared drift path to Basal is measured from the break point of Basal Eurasian with Crown Eurasian. So we can already tell that the drift path leading to Eurasians to the exclusion of Africans is *incredibly* long, probably around an order of magnitude longer than intra-Eurasian drift paths. This implies that the first item in the positive term is really large, so that African admixture in the single digits in Palestinians is sufficient to make them less similar to all other Eurasians.
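The two terms can be put into a small worked example. The Python sketch below is purely illustrative: the path lengths are in arbitrary units, the 10:1 ratio follows the "order of magnitude" estimate above, and the 20% extra Basal and 5% extra Eurasian proportions are made-up numbers, not estimates for any real population. The function name `net_signal` is likewise hypothetical.

```python
def net_signal(path_basal, extra_basal, path_eurasian, extra_eurasian):
    """Positive term minus negative term for D(Gorilla, Assyrian; Palestinian, Loschbour).

    Each term is (length of shared drift path) * (proportion of extra ancestry).
    A positive result means Assyrian comes out closer to Loschbour.
    """
    negative_term = path_basal * extra_basal
    positive_term = path_eurasian * extra_eurasian
    return positive_term - negative_term


if __name__ == "__main__":
    # Hypothetical inputs: Eurasian path 10x the Basal path,
    # 20% extra shared Basal vs only 5% extra shared Eurasian.
    result = net_signal(path_basal=1.0, extra_basal=0.20,
                        path_eurasian=10.0, extra_eurasian=0.05)
    print(round(result, 3))
```

Under these assumed numbers the long Eurasian path outweighs the larger Basal proportion, so the net value is positive despite only 5% divergent ancestry.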

Verbal explanations are very taxing, while reading Patterson's paper "Ancient Admixture in Human History" is more than sufficient to clear up all these questions, and allow us to interpret these statistics with more confidence. I highly recommend reading that paper.

You are of course right that IBD and Chromopainter are not so affected by those 'other' ancestries other than those directly compared, and therefore give Egyptians high chunk sharing with Anatolian Farmer; the African chunks in Egyptian are irrelevant to this measure. But then Chromopainter output is influenced by inbreeding, which is going to affect results for a vast swathe of the Middle East and SC Asia, while the same is not the case for f4 and D stats, so a combination of methods would be very useful.

Alberto said...


Thanks for the detailed explanations.

I have observed this behaviour of the Dstats for a long time, but wasn't sure if it was a bug or a feature. If it's a feature then it's ok, but people should understand this "feature" because it can give unexpected results that can easily be misinterpreted (and often are). I'll read that paper from N. Patterson about it.

As for the Kum6 affinity to Bronze Age populations, yes, it might well be exactly what you explained. But it still leaves us the question of why this population from West Anatolia differs so much in its relationship to BA populations compared to the previous one in that same area less than 2000 years before. If it's not admixture, what is it? Drift in an already large enough population within a relatively short time span could probably not explain the big difference, so it's probably a different population that arrived there. A population that carries the same admixture components (AFAWK) but behaves differently in stats (just like Iceman). We'll need to figure out exactly why.

Shaikorth said...

IBS, being a pairwise similarity comparison, will also give estimates on how divergent non-shared ancestry is. This tends to differ a bit from D-stats though, David has posted both IBS similarity and D-stats with the Scythian below and we can see that the IBS comparison will place North Africans closer to Scythian than East Asians are, but with D-stats it's the opposite. Both are valid comparisons though, IBS for raw similarity, and d-stat for let's say how divergent ancestry a population has compared to Scythian.

Anonymous said...

@Roy King

"I'm curious. If Kum6 lacks CHG, then how do you explain that (using the D statistics from the paper itself) Kum6 and many Bronze Age samples from Armenia and Russia share more than these same Bronze Age samples with the European Neolithic like NE1."

It could be not so much sharing as lacking. So NE1 could have something the others don't. The next topic already shows substantial IBD sharing between Loschbour and NE1.