Thursday, March 21, 2019

Ancient island hopping in the western Mediterranean (Fernandes et al. 2019 preprint)


Over at bioRxiv at this LINK. Here's the abstract, emphasis is mine:
A series of studies have documented how Steppe pastoralist-related ancestry reached central Europe by at least 2500 BCE, while Iranian farmer-related ancestry was present in Aegean Europe by at least 1900 BCE. However, the spread of these ancestries into the western Mediterranean where they have contributed to many populations living today remains poorly understood. We generated genome-wide ancient DNA from the Balearic Islands, Sicily, and Sardinia, increasing the number of individuals with reported data from these islands from 3 to 52. We obtained data from the oldest skeleton excavated from the Balearic islands (dating to ~2400 BCE), and show that this individual had substantial Steppe pastoralist-derived ancestry; however, later Balearic individuals had less Steppe heritage reflecting geographic heterogeneity or immigration from groups with more European first farmer-related ancestry. In Sicily, Steppe pastoralist ancestry arrived by ~2200 BCE and likely came at least in part from Spain as it was associated with Iberian-specific Y chromosomes. In Sicily, Iranian-related ancestry also arrived by the Middle Bronze Age, thus revealing that this ancestry type, which was ubiquitous in the Aegean by this time, also spread further west prior to the classical period of Greek expansion. In Sardinia, we find no evidence of either eastern ancestry type in the Nuragic Bronze Age, but show that Iranian-related ancestry arrived by at least ~300 BCE and Steppe ancestry arrived by ~300 CE, joined at that time or later by North African ancestry. These results falsify the view that the people of Sardinia are isolated descendants of Europe's first farmers. Instead, our results show that the island's admixture history since the Bronze Age is as complex as that in many other parts of Europe.

Fernandes et al., The Arrival of Steppe and Iranian Related Ancestry in the Islands of the Western Mediterranean, bioRxiv, posted March 21, 2019, doi: https://doi.org/10.1101/584714

Monday, March 18, 2019

Open thread: What are the linguistic implications of Olalde et al. 2019?


I was going to write a huge post on the linguistic implications of the latest batch of ancient DNA from Iberia courtesy of Olalde et al. 2019, and then I thought better of it. Admittedly, I don't know enough about the languages of prehistoric Iberia to say anything really useful on the topic. So instead here's an open thread to bounce around a few ideas in the comments.

Just briefly, this is what Olalde et al. say in the abstract of their paper about the relationship between ancestry from the Pontic-Caspian steppe and languages in Iron Age Iberia:

We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia.

However, in the paper it's revealed that "Indo-European regions" actually refers to a Celtic-speaking part of northern Iberia. And it's quite possible that Celts moved into this area from outside of Iberia only during the Iron Age. In other words, the speakers of Indo-European languages here may not have been the descendants of any of the people with steppe ancestry who came to Iberia by ~2000 BCE.

So I'm probably not alone in thinking that the question of the linguistic affinities of these early migrants with steppe ancestry to Iberia (mostly associated with the Bell Beaker culture or BBC) remains open, especially since they evidently had such a profound genetic impact on the later non Indo-European-speaking populations of southern Iberia. Could they have been the speakers of unattested Indo-European languages, as well as Proto-Iberian and Proto-Basque? If not, why not?

Below is a Principal Component Analysis (PCA) of West Eurasian genetic variation. I highlighted some of the ancient samples from Olalde et al., as well as Basques and other present-day Iberians. The Basques form a tight cluster with most of the Copper, Bronze and Iron Age Iberians, and, unlike the other present-day Iberians, they basically look like an Iberian population from the metal ages. The relevant datasheet is available here.


This is nothing new and very much in line with the results in Olalde et al., but I wanted to emphasize the point that Basques were not just a group that experienced an extreme founder effect in R1b-P312, which is a Beaker-specific Y-chromosome lineage. Rather, they're still very similar to Iberian Beakers in terms of overall genetic structure. So where did they get their language?

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Saturday, March 16, 2019

Let's try a formal heuristic approach


I created a massive outgroup f3-statistics matrix, featuring almost 300 ancient and present-day populations and individuals, for the purpose of running unsupervised, or at least semi-supervised, fine scale mixture tests with nMonte. Most of the stats were computed with 400-900K SNPs, which is a lot and should provide plenty of power. The matrix is available in a zip file here.

The results I'm getting with this new setup are very similar to those obtained with the Global25. The main differences, as far as I can see for now, are that the f3 data produce more stable results when modeling very deep ancestry, while the Global25 provides more accuracy when modeling fine scale recent ancestry (probably because it's better at picking up more recent genetic drift).
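
For anyone who hasn't run this sort of test before, here's a minimal sketch of the general idea behind fitting a mixture model to rows of a matrix like this one. To be clear, this is not nMonte (which is a separate R script) and I'm not claiming it reproduces its algorithm. It's just a simple random hill-climb that looks for non-negative source weights, summing to one, that bring the weighted source average as close as possible to the target. The population labels, the two-column "matrix" and the function names are all made up for illustration.

```python
import math
import random


def mix_distance(target, sources, counts, slots):
    """Euclidean distance between the target and the count-weighted source average."""
    mixed = [0.0] * len(target)
    for name, n in counts.items():
        for d, value in enumerate(sources[name]):
            mixed[d] += value * n / slots
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mixed)))


def fit_mixture(target, sources, slots=100, iterations=20000, seed=0):
    """Random hill-climb: reassign one slot at a time, keep moves that don't hurt."""
    rng = random.Random(seed)
    names = sorted(sources)
    # Start from a near-equal split of the slots across all candidate sources.
    assignment = [names[i % len(names)] for i in range(slots)]
    counts = {name: assignment.count(name) for name in names}
    best = mix_distance(target, sources, counts, slots)
    for _ in range(iterations):
        i = rng.randrange(slots)
        old, new = assignment[i], rng.choice(names)
        if new == old:
            continue
        counts[old] -= 1
        counts[new] += 1
        trial = mix_distance(target, sources, counts, slots)
        if trial <= best:
            assignment[i], best = new, trial
        else:
            # Undo the move if it made the fit worse.
            counts[old] += 1
            counts[new] -= 1
    weights = {name: counts[name] / slots for name in names}
    return weights, best


# Toy data (invented): the target is an exact 50/50 mix of SourceA and SourceB.
toy_sources = {
    "SourceA": [0.0, 0.0],
    "SourceB": [1.0, 0.0],
    "SourceC": [0.0, 1.0],
}
toy_target = [0.5, 0.0]

weights, distance = fit_mixture(toy_target, toy_sources)
print(weights, round(distance, 4))
```

With real data each row would hold hundreds of f3 values (or 25 Global25 coordinates) rather than two toy numbers, but the logic is the same: the smaller the final distance, the better the fit.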

Let's investigate some pertinent issues with the new data using nMonte and PAST. How about we start with these?

- where did Bell Beakers get their steppe ancestry from?

- which Steppe_MLBA group did Indians get their steppe ancestry from?

- do the present-day Irish have any Hallstatt ancestry?

- what is the origin of present-day Basques?

- what is the precise ancestry of Armenia_ChL?

- do the Swat Iron Age samples really lack BMAC ancestry?

- does Anatolia_MLBA really lack steppe ancestry?

Note that the f3 matrix includes the ancients from the new Olalde et al. paper on the genomic history of Iberia (see here). I've also updated the Global25 datasheets with most of these samples.

Global 25 datasheet (scaled)

Global 25 pop averages (scaled)

Global 25 datasheet

Global 25 pop averages

By the way, Hajji_Firuz_ChL I2327, from Narasimhan et al. 2018, is now labeled Hajji_Firuz_IA in the above datasheets, because my understanding is that he's actually from the Iron Age rather than the Chalcolithic period. For background reading about this controversial sample see here and here. I don't have any more info on this topic; we'll just have to wait for the formal publication of the Narasimhan et al. manuscript to get all the details. Apparently it's coming very soon.

See also...

An exceptional burial indeed, but not that of an Indo-European

Maykop: a multi-ethnic layer cake?

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Thursday, March 14, 2019

Two new papers on ancient Iberia


Olalde et al. 2019 (Science) at this LINK...

Abstract: We assembled genome-wide data from 271 ancient Iberians, of whom 176 are from the largely unsampled period after 2000 BCE, thereby providing a high-resolution time transect of the Iberian Peninsula. We document high genetic substructure between northwestern and southeastern hunter-gatherers before the spread of farming. We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non-Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia. Additionally, we document how, beginning at least in the Roman period, the ancestry of the peninsula was transformed by gene flow from North Africa and the eastern Mediterranean. DOI: 10.1126/science.aav4040

Villalba-Mouco et al. 2019 (Current Biology) at this LINK...

Summary: The Iberian Peninsula in southwestern Europe represents an important test case for the study of human population movements during prehistoric periods. During the Last Glacial Maximum (LGM), the peninsula formed a periglacial refugium [1] for hunter-gatherers (HGs) and thus served as a potential source for the re-peopling of northern latitudes [2]. The post-LGM genetic signature was previously described as a cline from Western HG (WHG) to Eastern HG (EHG), further shaped by later Holocene expansions from the Near East and the North Pontic steppes [3, 4, 5, 6, 7, 8, 9]. Western and central Europe were dominated by ancestry associated with the ∼14,000-year-old individual from Villabruna, Italy, which had largely replaced earlier genetic ancestry, represented by 19,000–15,000-year-old individuals associated with the Magdalenian culture [2]. However, little is known about the genetic diversity in southern European refugia, the presence of distinct genetic clusters, and correspondence with geography. Here, we report new genome-wide data from 11 HGs and Neolithic individuals that highlight the late survival of Paleolithic ancestry in Iberia, reported previously in Magdalenian-associated individuals. We show that all Iberian HGs, including the oldest, a ∼19,000-year-old individual from El Mirón in Spain, carry dual ancestry from both Villabruna and the Magdalenian-related individuals. Thus, our results suggest an early connection between two potential refugia, resulting in a genetic ancestry that survived in later Iberian HGs. Our new genomic data from Iberian Early and Middle Neolithic individuals show that the dual Iberian HG genomic legacy pertains in the peninsula, suggesting that expanding farmers mixed with local HGs. DOI:https://doi.org/10.1016/j.cub.2019.02.006

See also...

Migration of the Bell Beakers—but not from Iberia (Olalde et al. 2018)

Single Grave > Bell Beakers

CHG or no CHG in Bronze Age western Iberia?

Thursday, March 7, 2019

A challenge


The datasheets below contain outgroup f3-statistics for a wide range of ancient and present-day populations. Five of the ancient groups and individuals are labeled "Unknown". In fact, I do know what they are, but I'd like you to try and work out whether they were the speakers of Indo-European or non-Indo-European languages by analyzing the datasheets with, say, PAST or nMonte.

f3-stats_language_challenge1.dat

f3-stats_language_challenge2.dat

I'll reveal the identities and likely languages of the mystery ancients in a couple of days. It'll be interesting to see if any of you nail this challenge. It shouldn't be too difficult, but to help things along, I color coded the populations in the datasheets (black = Indo-European, blue = Uralic, and grey = neither). If you haven't done this sort of thing before, these blog posts might be useful as background reading.

Maykop: a multi-ethnic layer cake?

Global25 PAST-compatible datasheets

D-stats/nMonte open thread

Update 09/03/2019: Samuel nailed the challenge in the first post below. And then Matt almost figured out the precise identities of the mystery ancients here. In hindsight I should've made this more difficult. Here are the answers:

Unknown1 = England_Anglo-Saxon (Indo-European) > more here
Unknown2 = Levanluhta_IA (non-Indo-European) > more here
Unknown3 = Minoan_Lasithi (non-Indo-European) > more here
Unknown4 = Slavic_Bohemia (Indo-European) > more here
Unknown5 = Turkmenistan_IA (Indo-European) > more here

Monday, March 4, 2019

An exceptional burial indeed, but not that of an Indo-European


Not too many people have been buried sitting on wagons. The most famous case is that of an Early Bronze Age man who, considering his injuries, may have died in a high-speed crash - high-speed for its time anyway - on the Pontic-Caspian steppe in Eastern Europe.

It's likely that this guy was one of the very first wagon-drivers in human history, because his four-wheeled wooden model is dated to 3336-3105 calBCE, which makes it the oldest wagon discovered thus far. His genotype data, under the label Steppe Maykop SA6004, were published recently along with Wang et al. 2019.

Early wagons are very important for a couple of reasons: they revolutionized human transport and warfare, and they're often closely associated with the prehistoric expansions of Indo-European languages.

So I'm pretty sure that many of you must be thinking right now that wagon-driver SA6004 was an early Indo-European, or even a Proto-Indo-European! I bet that's what Wang et al. thought too, considering the conclusion in their paper. But, alas, the chances of this are slim to none.

Steppe Maykop samples show rather peculiar genetic structure considering their geographic origin, with a large proportion of their ancestry deriving from a source closely related to western Siberian hunter-gatherers (aka West_Siberia_N in the ancient DNA record). Indeed, SA6004 basically looks like a 50/50 mix between West_Siberia_N and Piedmont_Eneolithic. Here's a map with all of the relevant details.


Thus, clearly, the Steppe Maykop population wasn't ancestral or even directly related to the steppe and steppe-derived groups generally regarded to have been Indo-European speaking, such as those associated with the Yamnaya, Corded Ware, and Bell Beaker cultures. That's because these groups lack any discernible West_Siberia_N-related ancestry.

It also wasn't ancestral or directly related to any present-day or currently sampled ancient Indo-European speaking populations, again because these populations basically lack West_Siberia_N-related ancestry.

On the other hand, Yamnaya, Corded Ware and other closely related groups show an exceptionally strong genetic relationship with Indo-European speakers, especially those from across Northern Europe, which experienced massive migrations from the Pontic-Caspian steppe during the late Neolithic period, and hardly anything from elsewhere since then.

Case in point, the samples from Wang et al. labeled Yamnaya Caucasus were recovered from the same area of the Pontic-Caspian as their Steppe Maykop samples, and yet, take a look at this linear model based on outgroup f3-statistics. Steppe Maykop does show high genetic affinity to Indo-European speakers (no doubt mediated via its Piedmont_Eneolithic-related ancestry), but, unlike Yamnaya Caucasus, it also shows unusually high affinity for a West Eurasian population to Native Americans and Siberians. The relevant datasheet is available here.

So the only way the Steppe Maykop population could have been Indo-European-speaking is if it inherited its Indo-European speech from its Piedmont_Eneolithic-related ancestors. And even if it was Indo-European-speaking, it probably spoke an extinct Indo-European language not closely related to any extant Indo-European languages. In other words, the possibility that Steppe Maykop passed on its language to Yamnaya, along with its wagons, is close to zero. More likely, Yamnaya stole a few wagons from Steppe Maykop, and the rest is history.
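
The general idea behind a linear model of this kind can be sketched as follows. I'm assuming plain ordinary least squares here: regress the f3 values of one ancient group against those of another across a panel of test populations, then look at which test populations sit well above the fitted line. The labels and numbers below are invented toy values, not taken from the datasheet, and the function name is just for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy outgroup f3 values (invented): (f3 with ancient group X, f3 with ancient group Y)
toy = {
    "NorthEuro_1": (0.260, 0.262),
    "NorthEuro_2": (0.258, 0.260),
    "Caucasus_1": (0.254, 0.256),
    "EastAsia_1": (0.240, 0.242),
    "EastAsia_2": (0.242, 0.244),
    "NativeAm_1": (0.246, 0.251),  # deliberately shifted towards group Y
}

names = list(toy)
xs = [toy[name][0] for name in names]
ys = [toy[name][1] for name in names]
slope, intercept = fit_line(xs, ys)

# A positive residual means more shared drift with group Y than the line predicts.
residuals = {name: toy[name][1] - (intercept + slope * toy[name][0]) for name in names}
ranked = sorted(residuals.items(), key=lambda item: item[1], reverse=True)

for name, resid in ranked:
    print(f"{name:12s} {resid:+.5f}")
```

In this toy setup the population that was deliberately shifted comes out on top of the residual ranking, which is the same sort of signal as the excess Native American and Siberian affinity described above.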

See also...

The Steppe Maykop enigma

On Maykop ancestry in Yamnaya

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Saturday, March 2, 2019

Maykop: a multi-ethnic layer cake?


Let's speculate about the linguistic affinities of the currently available ancient populations from the Caucasus and surrounds. I put together a series of outgroup f3-stats to help things along. They're available for download here.

Maykop
Georgian 0.258224
Abkhasian 0.257899
Latvian 0.257376
Swedish 0.257301
Turkish_Trabzon 0.256996
Basque_Spanish 0.256589
Chechen 0.256514
Icelandic 0.256418
Norwegian 0.256325
Lezgin 0.256272
Irish 0.256227
Tabasaran 0.256092
Italian_Bergamo 0.25605
English_Cornwall 0.256032
Polish_East 0.255991
Scottish 0.255955
Adygei 0.255913

Steppe_Maykop
Latvian 0.261845
Russian_North 0.26145
Estonian 0.260355
Finnish 0.260211
Lithuanian 0.260072
Udmurd 0.259804
Ingrian 0.259663
Surui 0.259637
Vepsa 0.259608
Karelian 0.259532
Karitiana 0.259482
Russian_West 0.259397
Russian_Central 0.259274
Wichi 0.259106
Saami 0.258982
Komi 0.258945
Icelandic 0.258854
Swedish 0.258814
Mordovian 0.258604
Irish 0.25859

Eyeballing the stats might be enough to get a general impression about what they mean, but to understand them properly it's necessary to get technical with something like PAST3 (see here). That's because f3-stats pick up shared genetic drift from all drift paths, and don't especially focus on more recently shared ancestry. This can often lead to confusing outcomes.
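
For reference, the outgroup f3-statistic for a pair of populations A and B with outgroup O is usually written as below. The notation is my own shorthand: p stands for the allele frequency at a SNP in the given population, and the expectation is an average across SNPs.

```latex
f_3(O; A, B) = \mathbb{E}\big[(p_O - p_A)(p_O - p_B)\big]
```

When O is a true outgroup to both A and B, this value reflects the genetic drift along the branch that A and B share after splitting from O, so higher values mean more shared drift. Since that shared branch can include very old drift, two populations can score highly together without having much recent common ancestry.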

Below are a few examples of linear models based on my f3-stats. Note that many Indo-European speakers, especially from Northern Europe, are foremost attracted to ancient samples from the Pontic-Caspian steppe. On the other hand, non-Indo-European speakers, from such far flung locations as the Caucasus and Iberia, show relatively stronger affinity to ancient samples from Anatolia and the Caucasus. Moreover, Uralic speakers show elevated affinity to ancient hunter-gatherer samples from Eastern Europe and Siberia. Makes sense, right?

Based on these and other data, I'd say that Maykop and the culturally related Steppe Maykop were something of a multi-ethnic polity, with many closely and distantly related languages spoken by its people, including perhaps Kartvelian, Northwest Caucasian, Yeniseian and Indo-European. But it seems to me that Proto-Indo-European was spoken by steppe foragers turned pastoralists just outside of the Maykop zone. And I'm quite sure that after the Maykop collapse various early Indo-European groups pushed across the Caucasus and deep into the Near East. Just take a look at the f3-stats and linear model for Hajji_Firuz_BA to see what I mean.

See also...

An exceptional burial indeed, but not that of an Indo-European

The Steppe Maykop enigma

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, February 27, 2019

The Steppe Maykop enigma


Who were the Steppe Maykop people exactly? Their ancestry must surely rank as one of the biggest surprises served up by ancient DNA to date.

I always thought that they'd turn out roughly like a mixture between populations associated with the Kura-Araxes and Yamnaya cultures (mostly because their territory was located sort of in between them). Nope, that wasn't even close. This is where they cluster compared to Kura-Araxes and Yamnaya samples in my Principal Component Analysis (PCA) of world-wide genetic variation: the Global25.

To explore the ancestry of the Steppe Maykop people in more detail I ran a couple of unsupervised Global25/nMonte tests, using basically every ancient population in the (scaled) Global25 datasheet that seemed chronologically sensible and even remotely relevant. I narrowed things down to these two mixture models.

Steppe_Maykop
Geoksiur_Eneolithic,11.2
Piedmont_Eneolithic,44.4
West_Siberia_N,44.4
distance%=1.5161

Steppe_Maykop
Piedmont_Eneolithic,46.6
Sarazm_Eneolithic,10.4
West_Siberia_N,43
distance%=1.6408

But, you might say, Global25/nMonte isn't a published analytical method and it doesn't run on formal statistics, the meat and potatoes of ancient DNA papers. OK then, let's try the same models with the qpAdm software, which is a published method and does run on formal statistics, using exactly the same samples.

Steppe_Maykop
Geoksiur_Eneolithic 0.100±0.032
Piedmont_Eneolithic 0.433±0.053
West_Siberia_N 0.467±0.028
chisq 19.155
tail prob 0.159096
Full output

Steppe_Maykop
Piedmont_Eneolithic 0.429±0.051
Sarazm_Eneolithic 0.119±0.033
West_Siberia_N 0.452±0.026
chisq 18.090
tail prob 0.202699
Full output

They're basically identical. Importantly, my models must reflect reality at some level, because otherwise I wouldn't be able to produce a pair of essentially identical results using such vastly different statistical methods. So the pertinent question is what do these results actually mean?
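
A quick way to put a number on "basically identical" is to express each nMonte percentage as a difference from the matching qpAdm coefficient, in units of the qpAdm standard error. This is only a rough consistency check that I'm sketching here, not a formal test, because nMonte doesn't report standard errors of its own. The figures are the ones listed above; the variable and function names are just for illustration.

```python
# source: (nMonte %, qpAdm coefficient, qpAdm standard error), as reported above
models = {
    "model_1": {
        "Geoksiur_Eneolithic": (11.2, 0.100, 0.032),
        "Piedmont_Eneolithic": (44.4, 0.433, 0.053),
        "West_Siberia_N": (44.4, 0.467, 0.028),
    },
    "model_2": {
        "Piedmont_Eneolithic": (46.6, 0.429, 0.051),
        "Sarazm_Eneolithic": (10.4, 0.119, 0.033),
        "West_Siberia_N": (43.0, 0.452, 0.026),
    },
}


def compare(model):
    """Return (source, difference, difference in qpAdm standard errors) per source."""
    rows = []
    for source, (nmonte_pct, qpadm_coef, qpadm_se) in model.items():
        diff = nmonte_pct / 100 - qpadm_coef
        rows.append((source, diff, diff / qpadm_se))
    return rows


results = {name: compare(model) for name, model in models.items()}

for name, rows in results.items():
    print(name)
    for source, diff, in_se in rows:
        print(f"  {source:20s} diff={diff:+.3f}  ({in_se:+.2f} SE)")
```

Every nMonte estimate in both models lands within one qpAdm standard error of its counterpart.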

It seems unlikely to me that we're dealing here with a highly complex three-way mixture process, involving populations from such far flung locations as western Siberia and southern Central Asia. Rather, I suspect that Steppe Maykop was the result of a two-way mixture between Piedmont_Eneolithic (the population that lived before it on the steppe north of the Caucasus) and someone just a little bit more easterly. I'm guessing that the latter was the (as yet unsampled) population associated with the Kelteminar archeological culture.


By the way, please note that Piedmont_Eneolithic is made up of samples from two different locations on the Piedmont steppe, and I occasionally treat them as separate populations labeled Progress_Eneolithic and Vonyuchka_Eneolithic (for instance, see here).

Update 28/02/2019: Below is a PCA focusing on West Eurasian genetic variation. Overall, the position of Steppe Maykop relative to Geoksiur_Eneolithic, Piedmont_Eneolithic and West_Siberia_N appears to reflect my nMonte and qpAdm models. However, as per our discussion in the comments, one of the Steppe Maykop individuals (the most southerly one in the PCA) probably also has recent ancestry from the Caucasus.

See also...

An exceptional burial indeed, but not that of an Indo-European

Maykop: a multi-ethnic layer cake?

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Monday, February 25, 2019

All quiet on the eastern front


I put together this quick and dirty qpGraph tree just to double check what the Eneolithic trio from the Piedmont steppe (Piedmont_Eneolithic) were roughly made of, and how they related to some of the other populations from the eastern half of ancient West Eurasia. The relevant graph file is available here.


Yep, the tree basically lines up with the scientific literature. In other words, Piedmont_Eneolithic appears to be a two-way mixture of populations very closely related to Caucasus and Eastern European Hunter-Gatherers (CHG and EHG, respectively). Good to know.

By the way, please note that Piedmont_Eneolithic is made up of samples from two different locations on the Piedmont steppe, and I occasionally treat them as separate populations labeled Progress_Eneolithic and Vonyuchka_Eneolithic (for instance, see here).

Update 28/02/2019: Below is a new version of the tree designed specifically to investigate whether the ancestry of Piedmont_Eneolithic can be modeled with admixture from Darkveti-Meshoko, a population from the Caucasus roughly contemporaneous with Piedmont_Eneolithic. This doesn't appear to be the case, at least not with this topology, because the mixture edge from the Darkveti-Meshoko-related D7 node to the Piedmont_Eneolithic-related E4 node is marked with a zero. The relevant graph file is available here.


See also...

Big deal of 2018: Yamnaya not related to Maykop

Yamnaya isn't from Iran just like R1a isn't from India

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Saturday, February 23, 2019

Catacomb > Armenia_MLBA


It's now clear, thanks to ancient DNA, that Transcaucasia and surrounds were affected by multiple, and at times significant, population movements from Eastern Europe during the Chalcolithic and Bronze Age periods. Based on the ancient samples from what is now Armenia, I'd say that this process peaked during the Middle Bronze Age. But who exactly were the people who perhaps swarmed south of the Caucasus at this time?

The most likely suspects are the various groups that occupied the southernmost parts of the Pontic-Caspian steppe throughout the Bronze Age. They were associated with the so called Catacomb, Kubano-Tersk and Yamnaya archeological cultures. Below is a Principal Component Analysis (PCA) that compares samples from these cultures with those from Middle to Late Bronze Age Armenia (labeled Armenia_MLBA). The relevant datasheet is available here.


Note that Armenia_MLBA forms a cline that appears to be stretching out towards the Catacomb, Kubano-Tersk, Yamnaya and other Bronze Age steppe groups, and this suggests that it harbors significant and probably recent steppe-related ancestry. But PCA plots based on just two dimensions of genetic variation can be misleading at times, so let's check this out with some formal mixture models using qpAdm.

Armenia_MLBA
Catacomb 0.234±0.028
Kura-Araxes_Kaps 0.766±0.028
chisq 10.723
tail prob 0.826248
Full output

Armenia_MLBA
Kubano-Tersk 0.254±0.030
Kura-Araxes_Kaps 0.746±0.030
chisq 13.535
tail prob 0.633284
Full output

Armenia_MLBA
Kura-Araxes_Kaps 0.768±0.028
Yamnaya_Kalmykia 0.232±0.028
chisq 14.454
tail prob 0.564954
Full output

Armenia_MLBA
Kura-Araxes_Kaps 0.762±0.029
Yamnaya_Caucasus 0.238±0.029
chisq 15.916
tail prob 0.458816
Full output

All of these models are statistically very sound, and even though I ranked the results by "tail prob", there's nothing in the output that clearly points to any one of the southern steppe groups as the obvious source of the steppe-related ancestry in Armenia_MLBA. But, interestingly, Catacomb tops the ranking, and it probably also makes the most sense based simply on Carbon-14 chronology. So, for now, I'm going with Catacomb.
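
Here's the same ranking done in a few lines of Python, using the four qpAdm results listed above. The 0.05 cutoff for a passing tail prob is a common convention that I'm assuming for this sketch, not something taken from the qpAdm output itself.

```python
# (steppe source, steppe coefficient, standard error, chisq, tail prob), as reported above
models = [
    ("Catacomb", 0.234, 0.028, 10.723, 0.826248),
    ("Kubano-Tersk", 0.254, 0.030, 13.535, 0.633284),
    ("Yamnaya_Kalmykia", 0.232, 0.028, 14.454, 0.564954),
    ("Yamnaya_Caucasus", 0.238, 0.029, 15.916, 0.458816),
]

TAIL_PROB_CUTOFF = 0.05  # assumed conventional threshold

# Rank from best to worst fit by tail prob.
ranked = sorted(models, key=lambda model: model[4], reverse=True)
all_pass = all(model[4] > TAIL_PROB_CUTOFF for model in models)

# Range of the estimated steppe proportions versus the smallest standard error.
coefs = [model[1] for model in models]
spread = max(coefs) - min(coefs)
smallest_se = min(model[2] for model in models)

for source, coef, se, chisq, tail in ranked:
    print(f"{source:18s} steppe={coef:.3f}±{se:.3f}  tail prob={tail:.3f}")
print("all pass:", all_pass, " spread:", round(spread, 3), " min SE:", smallest_se)
```

The spread in the steppe proportions across the four models is smaller than any single standard error, which is another way of saying that the output can't really tell these sources apart.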

I haven't had a chance yet to investigate this issue in detail with the Global25. Does it contradict the results from my PCA and qpAdm analyses? If anyone reading this would like to take a close look that'd be great. Feel free to post your findings in the comments below. And if the answer is indeed Catacomb, then what language did these Catacomb-derived migrants, or perhaps invaders, speak? If not proto-Armenian then what?

By the way, please be aware that the Kubano-Tersk samples in my analyses are the same individuals as those featured in Wang et al. 2019 under the label "North Caucasus".

On a related note, here are a couple of intriguing qpAdm models that I came up with recently for the five Hittite era Anatolians in my dataset (aka Anatolia_MLBA). I don't have a clue why these models work so well and what they mean exactly. They do suggest, however, that the Hittite era Anatolians harbor steppe-related ancestry, which may have been mediated via populations from the Caucasus similar to Armenia_MBA. But, then again, this might just be an artifact of trying to model several streams of ancestry, coming from various directions, with just two and three potential mixture sources. Any thoughts?

Anatolia_MLBA
Anatolia_EBA_Ovaoren 0.651±0.109
Armenia_MBA 0.174±0.063
Peloponnese_N 0.175±0.058
chisq 8.321
tail prob 0.91027
Full output

Anatolia_MLBA
Anatolia_EBA_Isparta 0.831±0.053
Armenia_MBA 0.169±0.053
chisq 16.170
tail prob 0.441163
Full output

See also...

Steppe ancestry in Chalcolithic Transcaucasia (aka Armenia_ChL explained)

Likely Yamnaya incursion(s) into Northwestern Iran

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...