Thursday, March 7, 2019

A challenge

The datasheets below contain outgroup f3-statistics for a wide range of ancient and present-day populations. Five of the ancient groups and individuals are labeled "Unknown". In fact, I do know what they are, but I'd like you to try and work out whether they were the speakers of Indo-European or non-Indo-European languages by analyzing the datasheets with, say, PAST or nMonte.



I'll reveal the identities and likely languages of the mystery ancients in a couple of days. It'll be interesting to see if any of you nail this challenge. It shouldn't be too difficult, but to help things along, I color coded the populations in the datasheets (black = Indo-European, blue = Uralic, and grey = neither). If you haven't done this sort of thing before, these blog posts might be useful as background reading.

Maykop: a multi-ethnic layer cake?

D-stats/nMonte open thread

Update 09/03/2019: Samuel nailed the challenge in the first post below. And then Matt almost figured out the precise identities of the mystery ancients here. In hindsight I should've made this more difficult. Here are the answers:

Unknown1 = England_Anglo-Saxon (Indo-European)
Unknown2 = Levanluhta_IA (non-Indo-European)
Unknown3 = Minoan_Lasithi (non-Indo-European)
Unknown4 = Slavic_Bohemia (Indo-European)
Unknown5 = Turkmenistan_IA (Indo-European)


Ric Hern said...

As I see it, Proto-Baltic-Germanic-Celtic-Italic-Lusitanian (for lack of a better word) was spoken where the Notec River splits from the Vistula. Then Proto-Germanic-Celtic-Italic-Lusitanian was spoken in Central Northern Germany and Denmark. Then Proto-Celtic-Italic-Lusitanian was spoken in Northwest Germany and the Netherlands. Then Proto-Italic-Lusitanian was spoken along the Rhone River. Dialects which stayed behind evolved into the later Proto-Baltic, Proto-Germanic and Proto-Celtic.

Aram said...


In your Global25 there are three Anatolian Greeks. I wanted to know whether they are from Cappadocia or from a more western place, like plain Anatolia.
I ask this because I want to compare the Anatolia_MLBA samples to the modern pre-Turkish people who lived in that region. It seems there was an extra, significant migration from the East toward Anatolia after the LBA, during the Early Iron Age. I have seen some Greek samples from Kayseri on GEDmatch, and they have much more CHG than the Anatolia_MLBA samples.

This serious migration is corroborated by archaeology. In the Early Iron Age, a new pottery called grooved or groovy appears at many sites in the Cappadocian part of Anatolia. Some scholars have suggested a South Caucasian (or Northeast Anatolian) origin for this ware, and Sevin linked it to the Indo-European Mushki people, who are usually linked to proto-Armenians.

Arza said...

Survival of Late Pleistocene Hunter-Gatherer Ancestry in the Iberian Peninsula

Max Planck Institute for the Science of Human History

The Iberian Peninsula in southwestern Europe represents an important test case for the study of human population movements during prehistoric periods. During the Last Glacial Maximum (LGM) the peninsula formed a periglacial refugium [1] for hunter-gatherers (HG) and thus served as a potential source for the re-peopling of northern latitudes [2]. The post-LGM genetic signature was previously described as a cline from Western HG (WHG) to Eastern HG (EHG), further shaped by later Holocene expansions from the Near East and the North Pontic steppes [3–9]. Western and central Europe were dominated by ancestry associated with the ~14,000-year-old individual from Villabruna, Italy, which had largely replaced earlier genetic ancestry, represented by 19,000-15,000-year-old individuals associated with the Magdalenian culture [2]. However, little is known about the genetic diversity in southern European refugia, the presence of distinct genetic clusters and correspondence with geography. Here, we report new genome-wide data from eleven HG and Neolithic individuals that highlight the late survival of Paleolithic ancestry in Iberia, reported previously in Magdalenian-associated individuals. We show that all Iberian HG, including the oldest ~19,000-year-old individual from El Mirón in Spain, carry dual ancestry from both Villabruna and the Magdalenian-related individuals. Thus, our results suggest an early connection between two potential refugia resulting in a genetic ancestry that survived in later Iberian HG. Our new genomic data from Iberian Early and Middle Neolithic individuals show that the dual Iberian HG genomic legacy pertains in the peninsula, suggesting that expanding farmers mixed with local HG.

ENA-LAST-UPDATE 2019-01-28

Arza said...

Whole-genome sequence analysis of a Pan African set of samples reveals archaic gene flow from an extinct basal population of modern humans into sub-Saharan populations


Population demography and gene flow among African groups, as well as the putative archaic introgression of ancient hominins, have been poorly explored at the genome level. Here, we examine 15 African populations covering all major continental linguistic groups, ecosystems, and lifestyles within Africa through analysis of whole-genome sequence data of 21 individuals sequenced at deep coverage. We observe a remarkable correlation among genetic diversity and geographic distance, with the hunter-gatherer groups being more genetically differentiated and having larger effective population sizes throughout most modern-human history. Admixture signals are found between neighbor populations from both hunter-gatherer and agriculturalists groups, whereas North African individuals are closely related to Eurasian populations. Regarding archaic gene flow, we test six complex demographic models that consider recent admixture as well as archaic introgression. We identify the fingerprint of an archaic introgression event in the sub-Saharan populations included in the models (~4.0% in Khoisan, ~4.3% in Mbuti Pygmies, and ~5.8% in Mandenka) from an early divergent and currently extinct ghost modern human lineage. The present study represents an in-depth genomic analysis of a Pan African set of individuals, which emphasizes their complex relationships and demographic history at population level.

ENA-LAST-UPDATE 2019-03-10

Gaska said...

Interesting, thank you Arza, but I cannot open that link. Do you know how to access the paper?

Arza said...

It's not published yet. But the BAM files are in the repository already, so it should be published soon.

Gaska said...


Also interesting is the new paper on Anatolia. It clarifies the origin of some mitochondrial haplogroups. Some are seen later in Central Europe and others in Ukraine, confirming some relationship between Anatolia and the steppes.

Ancient Mitochondrial Genomes Reveal the Absence of Maternal Kinship in the Burials of Çatalhöyük People and Their Genetic Affinities

Andrzejewski said...

@Arza But Basques are at least 75% farmer aDNA, correct?

So their language must be more farmer shifted than WHG shifted, I would surmise.

Ryan said...

@Andrzejewski - "@Arza But Basques are at least 75% farmer aDNA, correct? So their language must be more farmer shifted than WHG shifted, I would surmise."

Their Y-DNA is more than 90% WHG or EHG though, and Y-DNA seems to go with language more than aDNA.

Now it's possible that the other 10% subjugated and assimilated an early IE substrate. It just seems odd.

Ryan said...

@Ric - Google wave vs tree model. Language classification is based on a tree model. Structure and basic words are more conservative than more extended vocabulary.

For example, most of English vocabulary is of French/Latin origin, but the basic words and structure are Germanic. Hence a Germanic language.

For example, in this sentence the origin of each word is as follows:

For - Germanic
Example - French
In - Germanic
This - Germanic
Sentence - French
The - Germanic
Origin - French
Of - Germanic
Each - Germanic
Word - Germanic
Is - Germanic
As - Germanic
Follows - Germanic

Do you see what I mean? We use the Germanic words a lot more than the French/Latin ones.
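
To put a number on it, here is a minimal Python sketch that tallies the origins exactly as I listed them above. The names `WORD_ORIGINS` and `origin_shares` are just made up for this illustration, and one short sentence is of course not a proper sample of English.

```python
from collections import Counter

# Word origins exactly as listed in the comment above.
WORD_ORIGINS = [
    ("For", "Germanic"), ("Example", "French"), ("In", "Germanic"),
    ("This", "Germanic"), ("Sentence", "French"), ("The", "Germanic"),
    ("Origin", "French"), ("Of", "Germanic"), ("Each", "Germanic"),
    ("Word", "Germanic"), ("Is", "Germanic"), ("As", "Germanic"),
    ("Follows", "Germanic"),
]


def origin_shares(word_origins):
    """Return {origin: (count, share)} for a list of (word, origin) pairs."""
    counts = Counter(origin for _, origin in word_origins)
    total = sum(counts.values())
    return {origin: (n, n / total) for origin, n in counts.items()}


shares = origin_shares(WORD_ORIGINS)
for origin, (n, share) in sorted(shares.items()):
    print(f"{origin}: {n} of {len(WORD_ORIGINS)} words ({share:.0%})")
```

That comes to 10 Germanic words out of 13, against 3 of French origin.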

JuanRivera said...


H2a and J1b are examples of EEF clades of ultimate Anatolian origin that entered the steppe mtDNA pool.

Matt said...

Double Iberia papers today huh? The rivalry strikes again.

Ric Hern said...

That Ghost Archaic Modern Human Population introgression seems like interesting stuff... I still wonder about the lack of Archaic Humans in Europe between 300 000 and 250 000 years ago...

Drago said...


Is Olalde out yet too?

rozenblatt said...

Olalde's paper is out and is available at Reich's website:

Samuel Andrews said...

Btw, the almighty G25 PCA picks up Magdalenian ancestry in Iberian farmers. You have to use the right farmer reference for it to show. Some Iberian farmers have more than others. Iberia_MN has 9% ElMiron stuff. It might be possible to track how much modern Iberians have (a few percent).

JuanRivera said...

This year's gonna be very interesting.

Mouthful said...

Seems like Iberia paper is out.

Gaska said...

Olalde has finally arrived, good news for science in Spain and for all those interested in European genetics

Mouthful said...

Supplementary info.

JuanRivera said...

Interesting how Y-DNA in Iberia changes drastically from C1a2, I2, G2 and H2 to R1b. Coincides with the arrival of steppe ancestry. Also interesting is how mtDNA C4 shows up in the Visigothic period.

Samuel Andrews said...

Unbelievable amount of samples. mtDNA haplogroup H was still at 25% in the Bronze Age/Iron Age. It reaches 40% in the Middle Ages. That looks like natural selection. I've been saying some kind of natural selection explains high H frequencies in Europe.

JuanRivera said...

I suppose having all those Iberians, Siberians, Dzudzuana, Mesolithic Anatolians and Proto-Indo-Iranians will improve modelling by a lot.

JuanRivera said...

And Indus Periphery. And a lot of other samples.

Gaska said...

CHA002 was assigned to haplogroup R1b-M343, which together with an EN individual from Cova de Els Trocs (R1b1a) confirms the presence of R1b in Western Europe prior to the expansion of steppe pastoralists that established a related male lineage in Bronze Age Europe

It seems that the earthquake has arrived. Let's see what happens, but at the moment it's very good news.

Drago said...

I1 in Mesolithic Iberia! (MPI paper)

a said...

IMO, the anti-steppe crowd will double down on their position [no steppe], even though R1b in Iberia clearly shows steppe.

Drago said...

@ Sam
But weren’t you suggesting mtDNA H expanded from Iberia?
What are your current thoughts ?

Gaska said...

The anti-steppe crowd was right: R1b has been in Western Europe at least since Villabruna. The steppe ancestry is the least of it.

Gaska said...

@Sam - "Unbelievable amount of samples. mtDNA haplogroup H was still at 25% in the Bronze Age/Iron Age. It reaches 40% in the Middle Ages. That looks like natural selection. I've been saying some kind of natural selection explains high H frequencies in Europe."
Mitochondrial haplogroups H, H1, H3, H6 and H7 are documented in Iberia since the Paleolithic. You are right about natural selection, because they prevailed over all the Neolithic lineages.

JuanRivera said...

Pre-steppe R1b only V88 so far. Spike in all other R1b coincident with arrival of steppe ancestry.

JuanRivera said...

Frequencies definitely rose when steppe ancestry arrived. Local phenomena aren't responsible, as both known WHG and EEF decreased, while steppe groups were overwhelmingly R1 (with dashes of I2a, Q1a, J* and G2a).

Andrzejewski said...

@JuanRivera "H2a and J1b are examples of EEF clades of ultimate Anatolian origin that entered the steppe mtDNA pool."

No. It was CHG.

Simon_W said...

"you completely, utterly and totally fail to mention the later Ostrogothic and especially Longobardi conquest which altered the gene pool substantially in Northern Italy."

I was speaking of pre-Roman Northern Italy; those Germanic tribes you mention arrived later. And BTW the Ostrogoths were Arian Christians, they were not allowed to marry Catholics, hence they cannot have had a big impact.

Simon_W said...

And, @Andrzejewski

As for the Longobards, in case you're drawing upon the evidence from Collegno, I would be cautious. The locals were indeed rather south Italian- and Greek-like, but they are just a handful and from just one site. Importantly, modern North Italians cannot be properly modeled as a two-way mixture between Germanics and South Italians/Greeks. Because they have more West Anatolian Farmer ancestry than either of these populations. So there's also an Iberian-like element in the mix. Now, CL94 and CL23 have quite a lot of Iberian-like ancestry. They were non-local to Collegno, but who knows where else they came from, they may well have been from somewhere in Northern Italy. In Matt's PCA CL94 is close to modern people from the Aosta Valley and CL23 to modern Lombards. Moreover, there's also the very North Italian-like CL36. He's from a slightly later occupation phase of Collegno, but he doesn't look like a Germanic-South Italian mix. See for instance this Global25/nMonte model:

[1] "distance%=2.4045"



Rather looks like a Celtic/Ligurian/South Italian mix.
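
For anyone who hasn't run this sort of model: as far as I understand it, nMonte looks for the mixture proportions of the reference populations that minimize the distance to the target in Global25 coordinates. Below is a toy two-source Python sketch of that idea. It is not nMonte's actual algorithm (it uses a plain grid search rather than a Monte Carlo search), the coordinates are invented two-dimensional values rather than real Global25 data, and the names `SOURCE_A`, `SOURCE_B`, `TARGET`, `mix` and `best_two_way_fit` are hypothetical.

```python
import math

# Toy 2-D coordinates, invented for illustration only (not real Global25 data).
SOURCE_A = (0.10, 0.02)
SOURCE_B = (0.00, -0.04)
TARGET = (0.03, -0.022)


def mix(weight_a, a, b):
    """Coordinates of a two-way mixture: weight_a from a, the rest from b."""
    return tuple(weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b))


def best_two_way_fit(target, a, b, steps=100):
    """Grid-search the share of source a that minimizes Euclidean distance."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        d = math.dist(target, mix(w, a, b))
        if best is None or d < best[1]:
            best = (w, d)
    return best


weight_a, distance = best_two_way_fit(TARGET, SOURCE_A, SOURCE_B)
print(f"source A: {weight_a:.0%}, source B: {1 - weight_a:.0%}, "
      f"distance: {distance:.4f}")
```

In this toy case the target was built as a 30/70 mix of the two sources, so the search recovers those shares with a distance of essentially zero. With real samples the fit is never perfect, which is why the distance figure matters.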

