Monday, February 26, 2018

The Yamnaya outlier


For a while now I've been arguing that the more exotic, southern part of the Yamnaya genotype, in other words the part that isn't Eastern European Hunter-Gatherer (EHG), mostly made its way into the Pontic-Caspian steppe via female-mediated gene flow, probably as a result of the practice of female exogamy with populations in the North Caucasus. For instance:

A plausible model for the formation of the Yamnaya genotype

The genotype data for the ancients that I focused on in that blog entry is now freely available in the Mathieson et al. 2018 dataset via the Reich Lab website (see here). So let's take a closer look at one of these samples, Yamnaya_Ukraine_outlier I1917, easily the most Caucasus/Near Eastern-shifted Yamnaya individual in the ancient DNA record to date.

Yes, it's a female, with a very Near Eastern mtDNA haplogroup to boot, so she cannot, in all likelihood, be the result of mixture between a Caucasus/Near Eastern father and Eastern European mother. This is where she plots in my Principal Component Analysis (PCA) of ancient West Eurasia.


Unlike the rest of the Yamnaya samples, Yamnaya_Ukraine_outlier I1917 is sitting much closer to modern-day North Caucasians, such as Chechens and Lezgins, than to the vast majority of Europeans, ancient and modern-day. Looking at this plot, it's tempting to think that she might represent an as yet unsampled ancient population from the North Caucasus, or nearby southernmost steppes, that once bridged the genetic gap between Eastern Europe and the Caucasus, but has since disappeared from the scene.

Using the Global25/nMonte method, I can try to infer which ancient populations this so-called outlier is most closely related to. This might give me some clues as to the origin of at least a part of the southern ancestry in Yamnaya. I've chosen to go with Yamnaya_Samara I0429 as the Yamnaya reference, because he's both one of the oldest and one of the least Caucasus-shifted Yamnaya individuals in my dataset (marked on an earlier version of the above plot here). And at the risk of overfitting, I'll throw in six other reference options. The relevant datasheets are available here and here.

[1] distance%=2.3661 / distance=0.023661

Yamnaya_Ukraine_outlier:I1917

Yamnaya_Samara:I0429 43.95
Armenia_EBA 28.3
Armenia_ChL 9.95
Iran_ChL 8.1
Ukraine_N 5.6
Trypillia 4.1
CHG:KK1 0

It's a fascinating result, even if somewhat overfitted. The Global25/nMonte method is more sensitive to recent genetic drift than formal statistics-based modeling, so I'm not surprised that Caucasus Hunter-Gatherer (CHG) KK1 is not shown to be an important source of ancestry for Yamnaya_Ukraine_outlier I1917.

Instead, the chronologically more proximate samples from Early Bronze Age and Chalcolithic Transcaucasia (Armenia_EBA and Armenia_ChL, respectively) top the list after Yamnaya_Samara I0429, which does make good sense. Moreover, minor gene flow from Neolithic foragers and farmers from what is now Ukraine (Ukraine_N and Trypillia, respectively) is implied, and obviously this is also very plausible. Thus, it seems likely that population movements from the Armenian Plateau did have an impact on at least a part of the Yamnaya population, even if mostly on its maternal side.

Another question worth asking is whether, conversely, people like Yamnaya_Ukraine_outlier I1917 were, by and large, an important source of southern admixture in Yamnaya. At least according to this Global25/nMonte model, perhaps they were.

[1] distance%=4.6173 / distance=0.046173

Yamnaya_Samara

EHG 43.25
Yamnaya_Ukraine_outlier:I1917 41.95
CHG:KK1 14.8
Armenia_ChL 0
Armenia_EBA 0
Iran_ChL 0
Trypillia 0
Ukraine_N 0

See also...

Unleash the power: Global 25 test drive thread

Another look at the genetic structure of Yamnaya

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, February 23, 2018

A swarm of locusts?


The dam has truly broken. Below is my usual Principal Component Analysis (PCA) of ancient West Eurasian genetic variation, except now also featuring the new samples from Mathieson et al. 2018 and Olalde et al. 2018. Incredibly, there are almost a thousand ancient individuals on this plot. The relevant datasheet is available here.


My imagination is probably running wild from all of this excitement, and I apologize if it is, but I reckon that the "Post-Kurgan expansion Europe" cluster actually looks like it's beginning to swarm all over "Old Europe", much like a swarm of locusts. These are, of course, our Bronze Age ancestors, rich in steppe ancestry and Y-haplogroups R1a and R1b. I reserve judgment on whether that's a good or bad thing.

In any case, note that I highlighted three samples in this analysis. I did this because I believe that at least two of them might be crucial to understanding the Proto-Indo-European (PIE) expansion. I've given hints as to why on the plot. Am I on the right track? Feel free to let me know in the comments below.

See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Who's your (proto) daddy Western Europeans?

Migration of the Bell Beakers—but not from Iberia (Olalde et al. 2018)

Tuesday, February 20, 2018

Migration of the Bell Beakers—but not from Iberia (Olalde et al. 2018)


At last, after many months of waiting, the paper that I've been calling the Bell Beaker Behemoth will finally appear at Nature today or tomorrow, depending on your time zone [Update: the paper is here]. The accompanying dataset is already online, and it's twice as big as what the paper's bioRxiv preprint promised, packing 400 new samples from Neolithic, Copper Age and Bronze Age Europe (freely available via the Reich Lab here).

I'll incorporate these samples into my collection of ancients very shortly, and then put them through their paces in the usual and new ways.

Nevertheless, despite the much larger and more varied new dataset, I know for a fact that the conclusions in the paper are the same as those in the preprint (which we discussed here). The authors tentatively accept the archaeologically-based academic consensus that the Bell Beaker phenomenon originated in Copper Age Iberia. But they admit that they can't find evidence in their data that its expansion across much of the rest of Europe was accompanied by significant gene flow from Iberia.

However, they do see in their data a large-scale migration of Central European Beakers to Western Europe around 2500 BC, bringing with them, amongst other things, steppe or Yamnaya-related admixture to the region for the first time. Many of the new samples are from the British Isles - where the impact of this migration was profound, resulting in roughly a 90% turnover of the population - and they appear to have been collected specifically to reaffirm this conclusion.

How exactly this massive population turnover came about isn't yet known. But early indications from other parts of Europe, where similar population shifts have been inferred from ancient DNA for the Late Neolithic/Early Bronze Age period, are that plague epidemics and deadly violence may have been important factors (see here and here).

I don't have a strong opinion about the place of origin of the Beaker cultural package, and I don't find the Iberian model entirely satisfying, mostly because it doesn't gel with the latest ancient DNA data. On the other hand, I've made up my mind as to who the Central European Beakers rich in steppe ancestry and also Y-haplogroup R1b-M269 were, and you can read about that here.

What are your thoughts after looking over the new samples? It's a big dataset alright, but does it do justice to the massive and complex Bell Beaker phenomenon? If not, then what's missing? Who's actually happy that the puzzle of the origin of the Beakers has now been solved? Feel free to let me know in the comments below.

Update 21/02/2018: I've updated my Global 25 datasheets with most of the ancient samples from Olalde et al. 2018 and Mathieson et al. 2018 (see list here).

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient

See also...

Who's your (proto) daddy Western Europeans?

Sunday, February 18, 2018

C for Cheddar Man (?)


A new preprint has just appeared at bioRxiv on the Mesolithic to Neolithic transition and resulting massive population shift in Britain. It features genome-wide data from six Mesolithic and 67 Neolithic individuals, including the famous Cheddar Man.

Population Replacement in Early Neolithic Britain by Brace et al.

The peculiar thing about this preprint is that it doesn't list the Y-haplogroups of the male ancients. However, it's been rumored for a while that Cheddar Man belongs to Y-haplogroup C (for instance, see here). Has this now been confirmed officially anywhere?

On a related note, the guys at DNAGeeks have been working on a range of Cheddar Man products (see here). So for a few bucks you can get yourself a Cheddar Man tee or wall print based on this arty depiction of the Mesolithic British forager. Yes, his resemblance to pop icon Prince is indeed uncanny.


Thursday, February 15, 2018

Modeling genetic ancestry with Davidski: step by step


There are many different ways to model your genetic ancestry, but I prefer the Global25/nMonte method. This is a step-by-step guide to modeling ancient ancestry proportions with this simple but powerful method, using my own genome as the example.


As far as I know, the vast majority of my recent ancestors came from the northern half of Europe. This may or may not be correct, but it gives me somewhere to start, so that I can come up with a coherent model. If you don't have this sort of information, because, perhaps, you were adopted, then just look in the mirror, and work from there. Like I say, it's not imperative that you know anything whatsoever about your ancestry, because your genetic data will do the talking, but you do need a model when modeling.

In scientific literature nowadays Northern Europeans are often described as a three-way mixture between Yamnaya-related pastoralists, Anatolian-derived early farmers, and Western European Hunter-Gatherers (WHG). So let's see if this model works for me. Obviously, if it does, then it'll confirm the information that I have about my origins, but it might also reveal finer details that I'm not aware of. The datasheet that I'm using for this model is available here.

[1] distance%=6.9025 / distance=0.069025

Davidski

Yamnaya_Samara 53.9
Barcin_N 30.75
Rochedane 15.35
Tepecik_Ciftlik_N 0

Yep, the model does work, with a fairly reasonable distance of almost 7%. The ancestry proportions more or less match those from scientific literature and the plethora of analyses that I've featured at this blog on the topic. Please note that I've kept things very simple, using only four reference populations and individuals as proxies for the three streams of ancestry in the model. But I've put my own twist on this Neolithic/Bronze Age model by including two populations from Neolithic Anatolia (Barcin_N and Tepecik_Ciftlik_N), just to see what would happen. The WHG proxy is Rochedane.
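
For anyone curious about what's going on under the hood, here's a rough Python sketch of the general idea: search for mixture proportions whose weighted average of reference coordinates lands as close as possible, in plain Euclidean distance, to the target's coordinates. To be clear, this is not the nMonte script itself. The function names, the simple "move one slot at a time" search, and the coordinates are all made up for illustration. The only things taken from the output above are the general shape of the result (percentages that sum to 100, plus a distance) and the fact that distance% is just the distance multiplied by 100.

```python
import math


def mix(weights, refs):
    """Weighted average of the reference coordinate vectors."""
    dims = len(next(iter(refs.values())))
    return [sum(weights[name] * refs[name][k] for name in refs)
            for k in range(dims)]


def distance(a, b):
    """Plain Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_mixture(target, refs, slots=200):
    """Hypothetical helper: split `slots` equal shares among the references,
    then keep moving one share from a donor to a receiver whenever that
    brings the mixed coordinates closer to the target."""
    names = list(refs)
    counts = {n: slots // len(names) for n in names}
    counts[names[0]] += slots - sum(counts.values())  # hand out any remainder

    def score(c):
        weights = {n: c[n] / slots for n in names}
        return distance(mix(weights, refs), target)

    best = score(counts)
    improved = True
    while improved:
        improved = False
        for donor in names:
            for receiver in names:
                if donor == receiver or counts[donor] == 0:
                    continue
                counts[donor] -= 1
                counts[receiver] += 1
                trial = score(counts)
                if trial < best - 1e-12:
                    best = trial          # keep the move
                    improved = True
                else:
                    counts[donor] += 1    # undo the move
                    counts[receiver] -= 1
    percentages = {n: 100 * counts[n] / slots for n in names}
    return percentages, best


# Made-up coordinates, NOT real Global25 data.
refs = {
    "Ref_A": [0.12, 0.03, -0.01],
    "Ref_B": [0.10, -0.05, 0.02],
    "Ref_C": [0.13, 0.08, 0.04],
}
target = [0.115, 0.01, 0.01]

percentages, dist = fit_mixture(target, refs)
print(f"distance%={100 * dist:.4f} / distance={dist:.6f}")
for name, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(name, pct)
```

The point of the sketch is simply that a reference only gets a non-zero share if it helps pull the mixed coordinates closer to the target, which is why some references in the models here end up at 0.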

Admittedly, though, my Yamnaya cut of ancestry appears somewhat bloated at over 53%, and the model's distance is a little higher than what I normally see for really strong models. So let's check if I can get a better fitting and more sensible result by adding a slightly more easterly forager proxy than Rochedane: Narva_Lithuania.

[1] distance%=5.9331 / distance=0.059331

Davidski

Yamnaya_Samara 45.75
Barcin_N 31.45
Narva_Lithuania 22.8
Rochedane 0
Tepecik_Ciftlik_N 0

The statistical fit does improve, and when given a choice between Rochedane and Narva_Lithuania, the algorithm picks the latter as the only source of extra forager input in my genome.

What could this mean? It might mean that a large part of my ancestry derives from the Baltic region. Actually, I know for a fact that this is true. But even if I had no idea about my genealogy, this result would be a very strong hint about my genetic origins. Indeed, let's follow this trail and try to further improve the fit of the model by adding a more relevant Yamnaya-related proxy, such as early Baltic Corded Ware (CWC_Baltic_early).

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Holy shit! To be honest, I wasn't expecting this sort of resolution and accuracy, and I can't promise that everyone using the Global25/nMonte method will see such incredibly nuanced outcomes, but this isn't a fluke. It can't be, because it gels so well with everything that I know about my ancestry. Please note also that I belong to Y-chromosome haplogroup R1a-M417, which is a lineage intimately associated with the Corded Ware expansion across Northern Europe (for instance, see here).

But of course, the Baltic and nearby regions haven't been isolated from migrations and invasions since the Corded Ware times. For instance, at some point, probably during the Bronze Age, Uralic-speaking groups moved west across the forest zone of Northeastern Europe and into the East Baltic and northern Scandinavia. It's generally accepted that they brought Siberian admixture with them (see here). Moreover, from the Iron Age to the Middle Ages, East Central Europe was under intense pressure from a wide range of nomadic steppe groups with complex ancestry, such as the Sarmatians, Avars, Huns, and Mongolians. Did any of these peoples leave their mark on my genome? At the risk of overfitting the model, let's explore this possibility by adding a few more reference populations.

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Han 0
Mongolian 0
Nganassan 0
Rochedane 0
Sarmatian_Pokrovka 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Nothing changes when I add the Han Chinese, Mongolians, Nganassans (a Uralic-speaking group from Siberia), and Sarmatians to the model. But what if I throw in the only ancient Slav in my datasheet?

[1] distance%=2.9904 / distance=0.029904

Davidski

Slav_Bohemia 85.9
CWC_Baltic_early 7.7
Narva_Lithuania 6.4
Barcin_N 0
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Considering that the vast majority of my recent ancestors were Poles, thus a Slavic-speaking people from near the Baltic, this outcome makes perfect sense. And check out the new distance! But the problem now is that I'm overfitting the model by using two very similar and probably very closely related references, CWC_Baltic_early and Slav_Bohemia. And overfitting should be avoided at all costs. So it might be useful to break up this effort into two models: one focusing on the Neolithic and Bronze Age, and the other on the Iron Age and Middle Ages. I'll do that soon, but not just yet, because there are still too few Iron Age and Medieval samples available from the Baltic region and surrounds for meaningful analyses of this type.
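
One rough way to spot this kind of problem before running a model is to check how close the reference populations are to each other in the same coordinate space. The Python sketch below does that with made-up coordinates and an arbitrary cutoff. The function name, the threshold, and the numbers are all assumptions for illustration and aren't part of nMonte or my datasheets.

```python
import math
from itertools import combinations


def near_duplicate_refs(refs, threshold):
    """Hypothetical helper: return pairs of references whose Euclidean
    distance is below `threshold`, closest pair first."""
    flagged = []
    for (name1, coords1), (name2, coords2) in combinations(refs.items(), 2):
        d = math.dist(coords1, coords2)  # Euclidean distance (Python 3.8+)
        if d < threshold:
            flagged.append((name1, name2, d))
    return sorted(flagged, key=lambda item: item[2])


# Made-up coordinates, NOT real Global25 data.
refs = {
    "Ref_A": [0.120, 0.030, -0.010],
    "Ref_B": [0.121, 0.032, -0.009],  # deliberately almost identical to Ref_A
    "Ref_C": [0.100, -0.050, 0.020],
}

# The cutoff is an arbitrary assumption; pick one that suits your own data.
for name1, name2, d in near_duplicate_refs(refs, threshold=0.01):
    print(f"{name1} and {name2} are very close (distance={d:.4f})")
```

If two references get flagged like this, it's a hint that the algorithm may split ancestry between them more or less arbitrarily, so it's worth running the model with one or the other rather than both.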

See also...

Genetic ancestry online store (to be updated regularly)

Tuesday, February 6, 2018

Mitogenomes from the Iron Age South Baltic (Stolarek et al. 2018)


Over at Scientific Reports at this LINK. And yes, full genomes of many of the samples are on the way. Emphasis is mine:

Abstract: Despite the increase in our knowledge about the factors that shaped the genetic structure of the human population in Europe, the demographic processes that occurred during and after the Early Bronze Age (EBA) in Central-East Europe remain unclear. To fill the gap, we isolated and sequenced DNAs of 60 individuals from Kowalewko, a bi-ritual cemetery of the Iron Age (IA) Wielbark culture, located between the Oder and Vistula rivers (Kow-OVIA population). The collected data revealed high genetic diversity of Kow-OVIA, suggesting that it was not a small isolated population. Analyses of mtDNA haplogroup frequencies and genetic distances performed for Kow-OVIA and other ancient European populations showed that Kow-OVIA was most closely linked to the Jutland Iron Age (JIA) population. However, the relationship of both populations to the preceding Late Neolithic (LN) and EBA populations were different. We found that this phenomenon is most likely the consequence of the distinct genetic history observed for Kow-OVIA women and men. Females were related to the Early-Middle Neolithic farmers, whereas males were related to JIA and LN Bell Beakers. In general, our findings disclose the mechanisms that could underlie the formation of the local genetic substructures in the South Baltic region during the IA.

Stolarek et al., A mosaic genetic structure of the human population living in the South Baltic region during the Iron Age, Scientific Reports volume 8, Article number: 2455 (2018) doi:10.1038/s41598-018-20705-6

Friday, February 2, 2018

Early Baltic Corded Ware samples form a genetic clade with Yamnaya, but...


This is what Mittnik et al. 2018 say about a couple of their Corded Ware, or Baltic Late Neolithic (Baltic_LN), samples from what is now Lithuania:

Computing D-statistics for each individual of the form D(Baltic LN, Yamnaya; X, Mbuti), we find that the two individuals from the early phase of the LN (Plinkaigalis242 and Gyvakarai1, dating to ca. 3200–2600 calBCE) form a clade with Yamnaya (Supplementary Table 7), consistent with the absence of the farmer-associated component in ADMIXTURE (Fig. 2b). Younger individuals share more alleles with Anatolian and European farmers (Supplementary Table 7) as also observed in contemporaneous Central European CWC individuals [2].
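
For readers who haven't run into D-statistics before, a D-statistic of the form D(W, X; Y, Z) asks whether Y shares more alleles with W or with X, using Z as an outgroup. A value near zero is consistent with W and X forming a clade relative to Y. Here's a minimal Python sketch of the point estimate computed from allele frequencies, using the standard formula from the ADMIXTOOLS literature. It is not the pipeline used by Mittnik et al., it doesn't compute the standard errors (normally obtained with a block jackknife) that are needed to judge significance, and the frequencies are made up.

```python
def d_statistic(w, x, y, z):
    """Point estimate of D(W, X; Y, Z) from per-site allele frequencies.

    Each argument is a list of frequencies (0.0 to 1.0) of the same allele
    at the same sites. Positive values mean W shares more alleles with Y
    than X does; values near zero are consistent with W and X being a clade.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations need the same number of sites")
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in zip(w, x, y, z):
        numerator += (a - b) * (c - d)
        denominator += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


# Made-up allele frequencies at five sites, NOT real data.
pop_w = [0.10, 0.80, 0.55, 0.30, 0.95]  # stands in for a Baltic LN individual
pop_x = [0.12, 0.78, 0.50, 0.33, 0.90]  # stands in for Yamnaya
pop_y = [0.40, 0.60, 0.70, 0.20, 0.85]  # stands in for the test population X
pop_z = [0.00, 1.00, 0.00, 0.00, 1.00]  # stands in for the outgroup (Mbuti)

print(d_statistic(pop_w, pop_x, pop_y, pop_z))
```

In practice this is computed over hundreds of thousands of sites, and the result is reported as a Z-score rather than a raw D value.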

We can add a third early Baltic Corded Ware sample, Latvia_LN1, to this list, because this individual was also shown to lack the above mentioned farmer-associated component in ADMIXTURE by Jones et al. 2017.

However, in my Principal Component Analysis (PCA) of ancient West Eurasia, all three samples fall just "northwest" of Yamnaya, along with one German Corded Ware outlier, and form a separate cluster that is shifted slightly closer to European hunter-gatherers and farmers. Hence, Plinkaigalis242 and Gyvakarai1 only form a clade with Yamnaya to the limit of the resolution in the analysis by Mittnik et al., but aren't exactly identical to Yamnaya. The relevant datasheet is available here.


So what might this mean? Possibly that the ancestors of this Corded Ware trio "absorbed" trace forager and/or farmer admixture as they migrated from the Pontic-Caspian steppe to the East Baltic. Or it could mean that they came from a more westerly part of the Pontic-Caspian steppe where people harbored slightly elevated forager and/or farmer ancestry relative to Yamnaya.

More sampling of Eneolithic and Early Bronze Age (EBA) burial sites on the Pontic-Caspian steppe, particularly north of the Black Sea, will probably solve this mystery. Please note, however, that we already have an Eneolithic sample from the Pontic-Caspian steppe that not only packs extra farmer admixture over Yamnaya, but also belongs to Y-haplogroup R1a-M417, which is a marker intimately associated with the Corded Ware expansion (see here).

By the way, this is how the Corded Ware set from Mittnik et al. behaves in another of my PCAs, which is designed to focus on ethno-linguistic-specific genetic drift in Northern Europe. I don't usually run samples older than the Bronze Age in this analysis, the reason being that they often don't share enough genetic drift with modern-day Europeans to produce meaningful output. And to be honest, I'm not quite sure what to make of these results. But it's probably not a coincidence that the Scandinavian Corded Ware (CWC_Battle_Axe) individual clusters so strongly with the Nordic Iron Age and modern-day Scandinavian samples. The relevant datasheet is here.


See also...

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Modern-day Poles vs Bronze Age peoples of the East Baltic

The genetic history of Northern Europe (or rather the South Baltic)