Sunday, November 10, 2019

Open analysis and discussion thread: Etruscans, Latins, Romans and others


I've just added the coordinates for more than 100 ancient genomes from the recently published Antonio et al. ancient Rome paper to the Global25 datasheets. Look for the population and individual codes listed here. Same links as always:

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient

Thus far I've only managed to check a handful of the coordinates, so please let me know if you spot any issues. Below is a Principal Component Analysis (PCA) featuring ten of the genomes belonging to Etruscan and Italic speakers. I ran the PCA with an online tool specifically designed for Global25 coordinates freely available here.
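For readers who'd rather work with the datasheets outside the online tools, here's a minimal Python sketch of loading rows and computing population averages. It assumes each datasheet row is a comma-separated label followed by numeric coordinates, and that labels follow a `Population:Individual` pattern. Both are assumptions about the file layout, and the labels and values below are invented for illustration (three coordinates instead of 25).

```python
from collections import defaultdict


def parse_rows(text):
    """Parse 'label,c1,c2,...' lines into (label, [floats]) pairs."""
    rows = []
    for line in text.strip().splitlines():
        label, *values = line.strip().split(",")
        rows.append((label, [float(v) for v in values]))
    return rows


def population_averages(rows):
    """Average the coordinates per population (label text before the first ':')."""
    grouped = defaultdict(list)
    for label, coords in rows:
        grouped[label.split(":")[0]].append(coords)
    return {
        pop: [sum(col) / len(col) for col in zip(*members)]
        for pop, members in grouped.items()
    }


# Hypothetical labels and values, truncated to three dimensions.
SAMPLE = """
Pop_A:ind1,0.10,0.20,-0.30
Pop_A:ind2,0.30,0.40,-0.10
Pop_B:ind3,-0.50,0.00,0.25
"""

averages = population_averages(parse_rows(SAMPLE))
for pop, coords in averages.items():
    print(pop, [round(c, 3) for c in coords])
```

If the real files use a different delimiter or label pattern, only `parse_rows` and the `split(":")` call should need changing.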

Can we say anything useful about the origins of the Etruscan and early Italic populations thanks to these new genomes? Also, to reiterate my question from the last blog post, what are the genetic differences exactly between the Etruscans, early Latins, Romans and present-day Italians? Feel free to let me know in the comments below.

See also...

Getting the most out of the Global25

Thursday, November 7, 2019

What's the difference between ancient Romans and present-day Italians?


The first paper on the genomics of ancient Romans was finally published today at Science [LINK]. It's behind a paywall, but the supplementary info is freely available here. Below is a quick summary of the results courtesy of the accompanying Ancient Rome Data Explorer.



I'm told that the genotype data from the paper will be online within a day or so at the Pritchard Lab website here. I'll have a lot more to say about ancient Romans and present-day Italians after I get my hands on it.

See also...

Open analysis and discussion thread: Etruscans, Latins, Romans and others

Tuesday, November 5, 2019

Modeling your ancestry has never been easier


An exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is freely available HERE. It works offline too, after downloading the web page onto your computer. Just copy and paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen cap below shows me doing just that.
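To give a rough idea of what source/target modeling involves, here's a minimal Python sketch that fits one target as a mixture of two sources by least squares. This isn't the tool's actual algorithm, which isn't described here. The function name and the three-dimensional coordinates are hypothetical, and real Global25 coordinates have 25 dimensions.

```python
def two_way_mixture(target, source_a, source_b):
    """Fit target as weight_a * source_a + (1 - weight_a) * source_b.

    Returns (weight_a, distance), with weight_a constrained to [0, 1] and
    distance being the Euclidean distance between target and fitted mixture.
    """
    diff_ab = [a - b for a, b in zip(source_a, source_b)]
    diff_tb = [t - b for t, b in zip(target, source_b)]
    denom = sum(d * d for d in diff_ab)
    if denom == 0:
        raise ValueError("the two sources are identical")
    # Unconstrained least-squares weight, then clipped to the valid range.
    weight_a = sum(x * y for x, y in zip(diff_tb, diff_ab)) / denom
    weight_a = min(1.0, max(0.0, weight_a))
    fitted = [weight_a * a + (1 - weight_a) * b
              for a, b in zip(source_a, source_b)]
    distance = sum((t - f) ** 2 for t, f in zip(target, fitted)) ** 0.5
    return weight_a, distance


# Hypothetical coordinates: the target is built as 70% source_a + 30% source_b.
source_a = [0.10, 0.00, 0.05]
source_b = [0.00, 0.10, -0.05]
target = [0.07, 0.03, 0.02]

weight_a, distance = two_way_mixture(target, source_a, source_b)
print(f"source_a: {weight_a:.1%}, source_b: {1 - weight_a:.1%}, distance: {distance:.4f}")
```

A small distance means the two sources can reproduce the target well. A large one suggests that a source is missing or poorly chosen.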


Another free, easy to use online tool that works with Global25 coordinates is the Principal Component Analysis (PCA) runner HERE. Below is a screen cap of me checking out one of the eight PCA that it offers.


See also...

Getting the most out of the Global25

Wednesday, October 16, 2019

The Battle Axe people came from the steppe (Malmstrom et al. 2019)


It's been obvious for a while now that the Corded Ware culture (CWC) and its Scandinavian variant, the Battle Axe culture (BAC), originated on the Pontic-Caspian steppe. However, Malmstrom et al. drive the point home in a new open access paper at Proceedings B [LINK]. From the paper, emphasis is mine:

The Neolithic period is characterized by major cultural transformations and human migrations, with lasting effects across Europe. To understand the population dynamics in Neolithic Scandinavia and the Baltic Sea area, we investigate the genomes of individuals associated with the Battle Axe Culture (BAC), a Middle Neolithic complex in Scandinavia resembling the continental Corded Ware Culture (CWC). We sequenced 11 individuals (dated to 3330–1665 calibrated before common era (cal BCE)) from modern-day Sweden, Estonia, and Poland to 0.26–3.24× coverage. Three of the individuals were from CWC contexts and two from the central-Swedish BAC burial ‘Bergsgraven’. By analysing these genomes together with the previously published data, we show that the BAC represents a group different from other Neolithic populations in Scandinavia, revealing stratification among cultural groups. Similar to continental CWC, the BAC-associated individuals display ancestry from the Pontic–Caspian steppe herders, as well as smaller components originating from hunter–gatherers and Early Neolithic farmers. Thus, the steppe ancestry seen in these Scandinavian BAC individuals can be explained only by migration into Scandinavia. Furthermore, we highlight the reuse of megalithic tombs of the earlier Funnel Beaker Culture (FBC) by people related to BAC. The BAC groups likely mixed with resident middle Neolithic farmers (e.g. FBC) without substantial contributions from Neolithic foragers.
...

By contrast, the CWC individuals from Obłaczkowo in Poland (poz44 and poz81) show an extremely high proportion of steppe ancestry (greater than 90%), which is different from the later CWC-associated individuals excavated in Pikutkowo (Poland) [23], but similar to some other CWC-associated individuals from Germany, Lithuania, and Latvia [2,8,31]. Interestingly, these individuals with a large fraction of steppe ancestry have typically been dated to more than 2600 BCE, making them among the earliest CWC individuals genetically investigated. This observation, i.e. early CWC individuals resembled (genetically) Yamnaya-associated individuals, while later CWC groups show higher levels of European Neolithic farmer ancestry (Pearson's correlation coefficient: −0.51, p = 0.006) (figure 2), suggests an initial dispersal that occurred rapidly.
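For anyone unfamiliar with the statistic quoted above, here's a minimal Python sketch of Pearson's correlation coefficient. The sample ages and farmer ancestry proportions are invented for illustration and aren't taken from Malmstrom et al., so the resulting value won't match the paper's.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: sample age in years BCE vs. farmer ancestry proportion.
ages_bce = [2900, 2800, 2700, 2600, 2500, 2400]
farmer_ancestry = [0.05, 0.10, 0.08, 0.20, 0.18, 0.30]

r = pearson_r(ages_bce, farmer_ancestry)
print(f"r = {r:.2f}")
```

With ages expressed in years BCE, a negative coefficient means that older samples tend to carry less farmer ancestry.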


Saturday, October 12, 2019

The Balkan connection


The hot topic at the moment is social inequality in Bronze Age Europe, thanks to a new paper by Mittnik et al. at Science. The full article is sitting behind an exceedingly robust paywall here.

However, the genotype dataset from the paper is freely available at the Max Planck Society's Edmond data repository here. Below is my Principal Component Analysis (PCA) of ancient West Eurasian genetic variation featuring 41 of the highest quality ancients from the new dataset. Almost all of them are from the Lech Valley in the Bavarian Alps, covering the period from the Bell Beaker culture (BBC) to the Middle Bronze Age (MBA). Two of the samples are from a mass Corded Ware culture (CWC) burial in the more northerly Tauber Valley.


I've also highlighted other ancients on the plot associated with the BBC and CWC from present-day Netherlands and Germany, respectively. The relevant PCA datasheet can be downloaded here.

Social stratification in ancient Europe is a fascinating topic, and it's an issue that I've started looking at myself (see here). However, I can't see any correlation between the inferred social standing of the individuals from the Lech and Tauber valleys and their positions in my PCA.

Nevertheless, the PCA is interesting in that it highlights considerable genetic heterogeneity within the Lech Valley BBC population. Indeed, how is this heterogeneity even possible, if, as per Mittnik et al., ancient DNA "has shown that the spread of the BBC throughout continental Europe did not involve large-scale migrations"?

Below is another version of my PCA, but this time focusing on three males: Lech Valley Beakers UNTA58_68Sk1 and WEHR_1192SkA, as well as ALT_4 from the aforementioned mass CWC grave in the Tauber Valley. Note that UNTA58_68Sk1 and WEHR_1192SkA represent genetically the most southern and northern, respectively, Lech Valley BBC samples that had enough data to be run in my analysis. I chose to focus on males because they carry the Y-chromosome, which can be informative about male-mediated ancient population expansions.


The PCA outcomes for these individuals are generally in line with their results in other types of genetic analyses, including those based on formal statistics. For instance, compared to the other two, ALT_4 harbors excess early steppe herder ancestry, UNTA58_68Sk1 excess early European farmer ancestry, and WEHR_1192SkA excess European hunter-gatherer ancestry. Moreover...

- UNTA58_68Sk1 shows a non-local isotopic signature and belongs to Y-haplogroup G2a, a marker essentially missing from BBC populations north of the Alps, and is best modeled as a two-way mixture between Bronze Age populations from the Balkans and the Pontic-Caspian steppe (see here), which probably means that he was a migrant to the Lech Valley from south of the Alps

- importantly, UNTA58_68Sk1 is not an isolated case, at least in the sense that several other BBC individuals from Bavaria, Bohemia, Hungary and Poland show varying ratios of Balkan-related ancestry, although almost all of these people are women

- WEHR_1192SkA is very similar to Bell Beakers from the northern Netherlands with whom he shares the R1b-P312 Y-haplogroup, suggesting that he was part of a population that moved into the Lech Valley from potentially as far away as the North Sea coast

- although ALT_4 probably shares the R1b-L51 Y-haplogroup with WEHR_1192SkA and many other BBC and Bronze Age individuals from the Bavarian Alps and surrounds, this can't be used as evidence of significant local genetic continuity after the CWC period, especially considering the comparatively eastern genome-wide structure of ALT_4.

Of course, archeological data suggest that the BBC was influenced in some important ways by the Copper and Bronze Age cultures of the Balkans and Carpathian Basin. So much so, in fact, that Marija Gimbutas, author of The Civilization of the Goddess, believed that the BBC originated in the Balkans from a synthesis of the local Vucedol culture and the intrusive Yamnaya culture from the Pontic-Caspian steppe.

Considering the ancient DNA evidence, however, the main demographic center of the early BBC could not have been south of the Alps.

Rather, it appears that early BBC and even CWC groups from north of the Alps moved into the Balkans and Carpathian Basin, where they may have established contacts with the local elites. If so, this might explain the significant southern cultural influences on the BBC, but limited accompanying genetic impact. This scenario also has support from archeological data (for instance, see here).

See also...

Is Yamnaya overrated?

The Boscombe Bowmen

Single Grave > Bell Beakers

Thursday, September 26, 2019

Is Yamnaya overrated?


Four years after the publication of the seminal ancient DNA paper Massive migration from the steppe is a source for Indo-European languages in Europe by Haak et al., we're still waiting for some of its loose ends to be finally tied up with new samples. In particular...

- if the men of the Corded Ware culture (CWC) were, by and large, derived from the population of the Yamnaya culture, then where are the Yamnaya samples with R1a-M417, the main CWC Y-haplogroup?

- if the men of the Bell Beaker culture (BBC) were also, by and large, derived from the population of the Yamnaya culture, then where are the Yamnaya samples with R1b-P312, the main BBC Y-haplogroup?

- and, most crucially, if R1b-L51, which includes R1b-P312, and is nowadays by far the most important Y-haplogroup in Western Europe, arrived there from the Pontic-Caspian steppe, then why hasn't it yet appeared in any of the ancient DNA from this part of Eastern Europe or surrounds, except of course in samples that are too young to be relevant?

I'm certainly not suggesting that, in hindsight, the said paper now looks fundamentally flawed. In fact, I'd say that it has aged remarkably well, especially considering how fast things are moving in the field of ancient genomics.

But those loose ends really need tying up, one way or another. It's now time.

So someone out there, please, let us know finally if you have the relevant Yamnaya samples. And if you don't, that's OK too, just tell us what you do have. Indeed, it'd be nice to know a few basic details about the thousands of samples that have been successfully sequenced in various labs and are waiting to be published. A lot of people would appreciate it.

See also...

Corded Ware as an offshoot of Hungarian Yamnaya (Anthony 2017)

Hungarian Yamnaya > Bell Beakers?

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, September 11, 2019

Y-haplogroup R1a and mental health


I've updated my map of pre-Corded Ware culture R1a samples with a couple of new entries from Central and South Asia (the original is still here). However, before any of you get overly excited, please note that these samples aren't older than the Corded Ware culture. The reason I added them to my map is to counter the ongoing absurd claims online that South Asian R1a isn't derived from European R1a.


Just in case the map can't be viewed in all of its glory on some devices, here's what the fine print says:

The oldest example of R1a in ancient DNA from Central Asia is dated to 2132-1940 calBCE (ID I3770, Narasimhan 2019). Moreover, this sequence is closely related to much older R1a samples from Central, Eastern and Northern Europe, and phylogenetically nested within their diversity. Thus, it must surely represent a population expansion from Europe to Central Asia. Indeed, it's also associated with the Bronze Age Andronovo archeological culture, which is usually seen as an offshoot of the Corded Ware culture (CWC) of Late Neolithic Europe. The vast majority of present-day R1a lineages in Central Asia are closely related to that of I3770, and so must also ultimately derive from Europe.

The oldest instance of R1a in ancient DNA from South Asia is dated to just 1044-922 calBCE (ID I12457, Narasimhan 2019). This sequence and the vast majority of present-day South Asian R1a lineages are closely related to much older R1a samples from Central, Eastern and Northern Europe, and phylogenetically nested within their diversity. Thus, they must surely represent a population expansion from Europe to South Asia via Central Asia, in all likelihood during the Bronze Age. Even if R1a existed in South Asia before the Bronze Age, which is extremely unlikely, because it's found in samples from indigenous European hunter-gatherers, the vast majority of present-day R1a lineages in South Asia must be ultimately from Europe.

The idea that most, if not all, South Asian R1a is derived from European R1a seriously scares a lot of people. This is obvious in many online discussions on the topic. I suspect they're so frightened by it because, in their minds, it has the potential to encourage discrimination and even racism, perhaps by re-defining the colonization of much of the world by European nations in the recent past as the natural order of things?

In any case, clearly we're dealing with some sort of mass phobia here. I've got advice for those of you suffering from this problem: if you're honestly worried that the geographic provenance and expansion history of some Y-haplogroup is going to negatively impact on your life in any meaningful way, then it's time to find yourself a quality mental health professional. All the best with that.

See also...

The mystery of the Sintashta people

The Poltavka outlier

Yamnaya isn't from Iran just like R1a isn't from India

Thursday, September 5, 2019

On the surprising genetic origins of the Harappan people (Shinde et al. 2019)


The long awaited paper with ancient DNA from the Indus Valley Civilization (IVC) site of Rakhigarhi has finally arrived. Courtesy of Shinde et al. at Current Biology:

An ancient Harappan genome lacks ancestry from Steppe pastoralists or Iranian farmers

The bad news is that the paper features just one low coverage IVC genome, and it belongs to a female, so there's no Y-haplogroup. However, importantly, this individual is very similar to genetic outliers from Bronze Age West and Central Asia known as Indus_Periphery. So much so, in fact, that they could easily be from the same gene pool.

This, of course, gives strong support to the idea that Indus_Periphery is a useful stand-in for the real IVC population (see here).

Surprisingly, despite being largely of West Eurasian origin, the IVC people possibly didn't harbor any ancestry from the Neolithic farmers of the Fertile Crescent or even the Iranian Plateau.

That's because, according to Shinde et al., their West Eurasian ancestors separated genetically from those of the early Holocene populations of what is now western and northern Iran around 12,000 BCE. In other words, well before the advent of agriculture.


This surely complicates matters for those arguing that Indo-European languages may have arrived in the Indian subcontinent with early farmers via the Iranian Plateau. The more widely accepted theory is that Indo-European languages spread into South Asia with Bronze Age pastoralists from the Eurasian steppes. See here...


Update 05/09/2019: I had a quick look at the ancient Rakhigarhi individual with qpAdm, just to confirm for myself that she was indeed largely of West Eurasian origin and practically indistinguishable from Indus_Periphery. The genotype data that I used are freely available here.

IND_Rakhigarhi_BA
IRN_Ganj_Dareh_N 0.711±0.065
Onge 0.232±0.067
RUS_Tyumen_HG 0.057±0.059
chisq 13.251
tail prob 0.0392147
Full output

Indus_Periphery
IRN_Ganj_Dareh_N 0.674±0.015
Onge 0.237±0.014
RUS_Tyumen_HG 0.090±0.012
chisq 14.877
tail prob 0.0212326
Full output

Indus_Periphery
IND_Rakhigarhi_BA 0.946±0.074
Onge 0.054±0.074
chisq 10.358
tail prob 0.169152
Full output

This does appear to be the case, although it's also obvious that my models are missing something important because their statistical fits are rather poor. I'm guessing the main problem is trying to use the Onge people of the Andaman Islands as a proxy for the indigenous foragers of the Indian subcontinent.
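The "tail prob" values above are the upper-tail probabilities of the chi-square statistics. Here's a minimal Python sketch of that calculation. The degrees of freedom aren't shown in the summaries above, so the values used here (6 for the three-source models, 7 for the two-source model) are assumptions, picked because they reproduce the reported numbers. The full output should be checked for the actual values.

```python
import math


def chi2_tail_prob(chisq, df):
    """Upper-tail probability of a chi-square statistic for integer df >= 1."""
    if df < 1 or chisq < 0:
        raise ValueError("df must be >= 1 and chisq must be non-negative")
    half = chisq / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: start from erfc(sqrt(x/2)), which is the df = 1 case, and add
    # one recurrence term for every further two degrees of freedom.
    total = math.erfc(math.sqrt(half))
    for k in range(df // 2):
        total += math.exp(-half) * half ** (k + 0.5) / math.gamma(k + 1.5)
    return total


# (chisq, assumed df, tail prob as reported above)
models = [
    (13.251, 6, 0.0392147),
    (14.877, 6, 0.0212326),
    (10.358, 7, 0.169152),
]
for chisq, df, reported in models:
    print(f"chisq={chisq} df={df} computed={chi2_tail_prob(chisq, df):.6f} reported={reported}")
```

A tail probability below 0.05 is the usual cut-off for calling a fit poor, which is why the first two models look shaky and the third one passes.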

See also...

Y-haplogroup R1a and mental health

Monday, September 2, 2019

Commoner or elite?


I recently started looking at the correlations between Y-chromosome haplogroups and social standing in ancient Europe, and was surprised by what I learned about the five currently sampled prehistoric Scandinavians belonging to Y-haplogroup R1b. I certainly wasn't expecting to uncover these stories about a mass human sacrifice, a bog body, and an Arctic circle warrior:

- The earliest Scandinavian in the ancient DNA record belonging to R1b comes from a grave site in what is now northern Norway (VK531, Margaryan et al. 2019). This individual has a genome-wide profile similar to that of local Mesolithic hunter-gatherers, but is dated to just ~2,400 BCE. During this time, Scandinavia was dominated by a "new" population associated with the Battle-Axe culture (BAC), with high levels of ancestry from the steppes of Eastern Europe. Since VK531 wasn't buried with any BAC grave goods, and indeed with no grave goods at all, it's possible that he may have been from a remnant forager population that was displaced and ultimately forced into extinction.

- R1b-U106 is today by far the most common R1b subclade in Scandinavia, but it's not yet clear how it managed to attain this status. Was it perhaps through elite dominance? The earliest ancient individual belonging to R1b-U106 is dated to 2275-2032 calBCE and comes from a Late Neolithic, likely post-BAC burial ground in what is now Sweden (RISE98, Lilla Beddinge, grave 49, southern skeleton, Allentoft et al. 2015). However, RISE98 wasn't buried in any way that would suggest he was an individual of high social standing. In fact, he was found in a mass grave, along with two other adults and two infants, possibly representing a human sacrifice. The only artefact in the grave was a bone needle. More details are available here.

- During the Nordic Bronze Age it became customary for Scandinavian elites to be laid to rest in richly furnished barrows, while commoners were buried in flat graves with few or no offerings. Human remains recovered from a "commoner" flat grave cemetery dated to the Early Bronze Age near the present-day city of Aalborg, northern Denmark, included the skeleton of a male belonging to Y-haplogroup R1b-M269 (RISE47, grave 3, skeleton 8, Allentoft et al. 2015). Keep in mind, however, that this might have been another case of an ancient Scandinavian R1b-U106 if not for missing data. A flint dagger was found alongside one of the skeletons in this cemetery, but RISE47 wasn't accompanied by any grave goods (see here).

- One of the most amazing archeological discoveries made in Scandinavia is the Trundholm Sun Chariot. Found in a peat bog on the island of Zealand, Denmark, in 1902, it's thought to be an Indo-European religious artefact dating back to the Nordic Bronze Age; a representation of a horse pulling the sun and perhaps also the moon in a spoked wheel chariot. Another important discovery in a peat bog near Trundholm dating to the Nordic Bronze Age was the body of a man belonging to R1b-M269 (RISE276, Trundholm mose II, bog find 1940, Allentoft et al. 2015). However, chances are slim that RISE276 was a charioteer or, say, a spiritual guru who accidentally drowned in the bog. Most Danish bog bodies are thought to have belonged to sacrificial victims or executed criminals.

- Interestingly, the earliest likely Scandinavian warrior belonging to R1b, and also R1b-U106, is from an early Iron Age burial in present-day northwestern Norway (VK418, Margaryan et al. 2019). This site isn't quite as far north as the grave of the above mentioned VK531, but it's still well within the Arctic circle. Apparently, VK418 was buried with some impressive weapons, potentially of "eastern origin", including a shield, spearheads and a sword. Who knows, he may even have been an elite warrior for his time and place?

The other two main Scandinavian Y-haplogroups, I1a and R1a, haven't yet been found in prehistoric Nordic remains from such, shall we say, depressing burials. That's not to say, of course, that they won't be sooner or later. RISE175, from Allentoft et al. 2015, is currently the only individual who fits the bill as a representative of the Nordic Bronze Age elite. He was buried in a barrow grave in what is now southwest Sweden and probably belongs to Y-haplogroup I1a. That's not much to go on, but perhaps it's a sign of things to come?


See also...

Isotopes vs ancient DNA in prehistoric Scandinavia

Who were the people of the Nordic Bronze Age?

They came, they saw, and they mixed

Tuesday, August 27, 2019

Isotopes vs ancient DNA in prehistoric Scandinavia


Four of the samples from the recent Frei et al. paper on human mobility in prehistoric southern Scandinavia are in my Global25 datasheets. Their genomes were published along with Allentoft et al. back in 2015. So I thought it might be interesting to check whether their strontium isotope ratios correlated with their genomic profiles.

In the Principal Component Analysis (PCA) below, RISE61 is a subtle outlier along the horizontal axis compared to the other three Nordic ancients, as well as an individual representative of the present-day Danish gene pool. Also note that RISE61 shows the most unusual strontium isotope ratio (0.712588). The PCA was run with an online tool freely available here.


To help drive the point home, here's a figure from Frei et al., edited by me to show the positions of RISE47, RISE61 and RISE71. If RISE276 was also in this graph, he'd be sitting well under the "local" baseline, in roughly the same spot along the vertical axis as RISE47.


Interestingly, RISE61 belongs to Y-chromosome haplogroup R1a-M417, while RISE47 and RISE276, who appear to have been locals, both belong to R1b-M269. My guess is that RISE61 was a recent migrant from a more northerly part of Scandinavia dominated by the Battle-Axe culture (BAC). The BAC population was probably rich in R1a-M417 because it moved into Scandinavia from the Pontic-Caspian steppe via the East Baltic. This is what Frei et al. say about RISE61 and his burial site:

The double passage grave of Kyndeløse (Fig 1, S1 File) located on the island of Zealand yielded 70 individuals as well as a large number of grave goods, including flint artefacts, ceramics, and tooth and amber beads. We conducted strontium isotope analyses of seven individuals from Kyndeløse encompassing a period of c. 1000 years, indicating the prolonged use of this passage grave. The oldest of the seven individuals is a female (RISE 65) from whom we measured a “local” strontium isotope signature ( 87 Sr/ 86 Sr = 0.7099). Similar values were measured in five other individuals, including adult males and females. Only a single individual from Kyndeløse, an adult male (RISE 61) yielded a somewhat different strontium isotope signature of 87 Sr/ 86 Sr = 0.7126 which seems to indicate a non-local provenance. The skull of this male individual revealed healed porosities in the eye orbits, cribra orbitalia, a condition which is possibly linked to a vitamin deficiency during childhood, such as iron deficiency.

By the way, RISE47 was buried in a flat grave, which suggests that he was a commoner. RISE276 was found in a peat bog in Trundholm, where the famous Trundholm sun chariot was discovered (see here). He may have been a human sacrifice.

Citation...

Frei KM, Bergerbrant S, Sjögren K-G, Jørkov ML, Lynnerup N, Harvig L, et al. (2019) Mapping human mobility during the third and second millennia BC in present-day Denmark. PLoS ONE 14(8): e0219850. https://doi.org/10.1371/journal.pone.0219850

See also...

Commoner or elite?

Who were the people of the Nordic Bronze Age?

They came, they saw, and they mixed

Tuesday, August 20, 2019

Roopkund Lake dead


Fifteen of the Roopkund Lake samples from the Harney et al. paper published today at Nature Communications made it into the Global25 datasheets. Look for the prefix IND_Roopkund here...

Global25 datasheet (scaled)

Global25 datasheet

Global25 pop averages (scaled)

Global25 pop averages

Their genotypes are freely available in a ~590K SNP dataset via the Reich Lab here. I might be able to run more of the samples at some point if and when they're released in a dataset with more SNPs.

In any case, much like everyone else, I don't have a clue how those Mediterranean migrants ended up in the Himalayas back in the 1800s, but I do know where they came from. Most appear to have been from Crete, while others were from mainland Greece. However, one of the individuals that I was able to analyze with the Global25 was almost certainly an Anatolian Greek. Below are a couple of Principal Component Analyses (PCA) based on the Global25 data. The relevant datasheet is available here.


I don't yet have a strong opinion about the origins of the earlier, typically South Asian Roopkund dead. They may have been visitors from all over India, or members of different castes from northern India. A PCA with six of these individuals can be seen here and the relevant datasheet gotten here. Any thoughts? Feel free to share them in the comments below.

Update 23/08/2019: A new ~1240K SNP genotype dataset with the Roopkund Lake samples is now available here. More markers means that I can produce more accurate PCA and run almost twice as many of the samples. I've updated all of the datasheets accordingly. The links are the same.


See also...

Getting the most out of the Global25

A surprising twist to the Shirenzigou nomads story

The Poltavka outlier

Saturday, August 17, 2019

A surprising twist to the Shirenzigou nomads story


Remember those potentially Afanasievo-derived and Tocharian-related Shirenzigou nomads from the Ning et al. paper? Well, in my opinion, they're probably neither. The genotypes and other data for these Iron Age individuals from the eastern Tian Shan are available here.

Below are a few successful and not so successful qpAdm mixture models for them. Note that I tried to use a wide range of relevant "right pops", but also retain a lot of markers, specifically to be able to discriminate between different types of steppe and steppe-derived sources of gene flow (refer to the full output). Admittedly, the Shirenzigou nomads can be modeled with Afanasievo-related ancestry, but...

CHN_Shirenzigou_IA
KAZ_Botai 0.161±0.023
KAZ_Wusun 0.490±0.023
NPL_Mebrak_2125BP 0.349±0.019

chisq 5.793
tail prob 0.926172
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.143±0.022
NPL_Mebrak_2125BP 0.295±0.019
Saka_Tian_Shan 0.562±0.024

chisq 6.796
tail prob 0.870794
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.185±0.023
NPL_Mebrak_2125BP 0.428±0.021
RUS_Sintashta_MLBA 0.270±0.026
TJK_Sarazm_En 0.117±0.027

chisq 11.351
tail prob 0.414345
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.032±0.027
KAZ_Zevakinskiy_LBA 0.567±0.025
NPL_Mebrak_2125BP 0.401±0.019

chisq 15.157
tail prob 0.232961
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.452±0.031
RUS_Afanasievo 0.435±0.025
RUS_Okunevo_BA 0.114±0.049

chisq 19.808
tail prob 0.0708003
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.409±0.031
RUS_Okunevo_BA 0.173±0.050
Yamnaya_RUS_Caucasus 0.418±0.026

chisq 20.453
tail prob 0.0589872
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.464±0.033
RUS_Okunevo_BA 0.104±0.053
Yamnaya_RUS_Samara 0.432±0.027

chisq 27.189
tail prob 0.0072566
Full output
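One quick way to read the models above is to check whether each mixture coefficient is clearly different from zero. Here's a minimal Python sketch that parses lines in the `SOURCE estimate±error` layout used above and flags sources whose estimate is less than two standard errors from zero. The threshold of two is a common rule of thumb that I'm assuming here, not something taken from the qpAdm output.

```python
def parse_coefficient(line):
    """Split 'SOURCE 0.161±0.023' into (source, estimate, standard_error)."""
    source, numbers = line.rsplit(" ", 1)
    estimate, std_err = numbers.split("±")
    return source, float(estimate), float(std_err)


def weak_sources(lines, z_threshold=2.0):
    """Return sources whose estimate is under z_threshold standard errors from zero."""
    flagged = []
    for line in lines:
        source, estimate, std_err = parse_coefficient(line)
        if std_err > 0 and estimate / std_err < z_threshold:
            flagged.append(source)
    return flagged


# The fourth model listed above.
model = [
    "KAZ_Botai 0.032±0.027",
    "KAZ_Zevakinskiy_LBA 0.567±0.025",
    "NPL_Mebrak_2125BP 0.401±0.019",
]
print(weak_sources(model))
```

A flagged source doesn't make a model wrong, but it does suggest that the model might work about as well without it.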

Both the Wusun and Saka are generally accepted to have been speakers of Indo-Iranian languages. So it's possible that the Shirenzigou nomads were Indo-Iranian speakers too, or at least derived from such peoples.

Surprisingly, NPL_Mebrak_2125BP was the key to obtaining the best statistical fits. This is a trio of samples, roughly contemporaneous with the Shirenzigou nomads, from a burial site high up in the Himalayas in what is now Nepal (see here).

To be honest, I'm not quite sure why the Himalayan ancients work so well in my models. Perhaps they're just a really good proxy for an Iron Age population from the northern edge of the Tibetan Plateau?

By the way, most of the Shirenzigou nomads made it into the latest Global25 datasheets (see here). They can be analyzed in a variety of ways described in this blog post: Getting the most out of the Global25. Below is a screen cap of me comparing the effectiveness of Afanasievo, Sintashta and Wusun samples as proxies for the steppe ancestry in the Shirenzigou nomads with an online tool freely available here. As expected, the algorithm picks Sintashta ahead of Afanasievo, and the Wusun ahead of both.


See also...

They mixed up Huns with Tocharians

Some myths die hard

The mystery of the Sintashta people

Wednesday, August 14, 2019

Did South Caspian hunter-fishers really migrate to Eastern Europe?


The idea that most of the Near Eastern-related ancestry in the ancient populations of the Pontic-Caspian (PC) steppe is, one way or another, sourced from the territory of present-day Iran is a fairly popular one nowadays (for instance, see here). It might turn out to be correct, once there are enough relevant samples to test it properly, but in my opinion the chances of this are slim.

My skepticism is based on literally hours of analyses with the currently available ancients from the Caucaso-Caspian region, like, for instance, the admixture graphs below featuring foragers and early farmers from Russia, Georgia and Iran. The relevant qpGraph and dot files are available here.

Note that the further I move away from Eastern Europe in these graphs when looking for the source of the southern ancestry in the Eneolithic population from the southernmost part of the PC steppe (Piedmont_En), the more difficult it is for me to create a statistically sound model. What might this tell us about the provenance of this so called southern ancestry?




See also...

The PIE homeland controversy: August 2019 status report

Some myths die hard

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, August 2, 2019

The PIE homeland controversy: August 2019 status report


Archeologist David Anthony has a new paper on the Indo-European homeland debate titled Archaeology, Genetics, and Language in the Steppes: A Comment on Bomhard. It's part of a series of articles dealing with Allan R. Bomhard's "Caucasian substrate hypothesis" in the latest edition of The Journal of Indo-European Studies. It's also available, without any restrictions, here.

Any thoughts? Feel free to share them in the comments below. Admittedly, I found this part somewhat puzzling (emphasis is mine):

It was the faint trace of WHG, perhaps 3% of whole Yamnaya genomes, that identified this admixture as coming from Europe, not the Caucasus, according to Wang et al. (2018). Colleagues in David Reich’s lab commented that this small fraction of WHG ancestry could have come from many different geographic places and populations.

I think that's highly optimistic. It really should be obvious by now thanks to archeological and ancient genomic data, including both uniparental and genome-wide variants, that the Yamnaya people were practically entirely derived from Eneolithic populations native to the Pontic-Caspian (PC) steppe. So, in all likelihood, this was also the source of their minor WHG ancestry.

Indeed, they clearly weren't some mishmash of geographically, culturally and genetically disparate groups that had just arrived in Eastern Europe, but the direct descendants of closely related and already significantly Yamnaya-like peoples associated with long-standing PC steppe archeological cultures such as Khvalynsk and Sredny Stog. I discussed this earlier this year, soon after the Wang et al. paper was published:

On Maykop ancestry in Yamnaya

I hope I'm wrong, but I get the feeling that the scientists at the Reich Lab are finding this difficult to accept, because it doesn't gel with their theory that archaic Proto-Indo-European (PIE) wasn't spoken on the PC steppe, but rather south of the Caucasus, and that late or rather nuclear PIE was introduced into the PC steppe by migrants from the Maykop culture who were somehow involved in the formation of the Yamnaya horizon.

Inexplicably, after citing Wang et al. on multiple occasions and arguing against any significant gene flow between Maykop and Yamnaya groups, Anthony fails to mention Steppe Maykop. But the Steppe Maykop people are an awesome argument against the idea that there was anything more than occasional mating between the Maykop and Yamnaya populations, because they were wedged between them, and yet clearly distinct from both, with a surprisingly high ratio of West Siberian forager-related ancestry (see here and here).


Despite all the talk lately about the potential cultural, linguistic and genetic ties between Maykop and Yamnaya, including claims that the latter possibly acquired its wagons from the former, my view is that the Steppe Maykop and Yamnaya wagon drivers may have competed with each other and eventually clashed in a big way. Indeed, take a look at what happens after Yamnaya burials rather suddenly replace those of Steppe Maykop just north of the Caucasus around 3,000 BCE.

Yamnaya_RUS_Caucasus
RUS_Progress_En_PG2001 0.808±0.058
RUS_Steppe_Maykop 0.000
UKR_Sredny_Stog_II_En_I6561 0.192±0.058
chisq 13.859
tail prob 0.383882
Full output

Yep, total population replacement with no significant gene flow between the two groups. Apparently, as far as I can tell, there's not even a hint that a few Steppe Maykop stragglers were incorporated into the ranks of the newcomers. Where did they go? Hard to say for now. Maybe they ran for the hills nearby?

Intriguingly, Anthony reveals a few details about new samples from three different Eneolithic steppe burial sites associated with the Khvalynsk culture:

The Reich lab now has whole-genome aDNA data from more than 30 individuals from three Eneolithic cemeteries in the Volga steppes between the cities of Saratov and Samara (Khlopkov Bugor, Khvalynsk, and Ekaterinovka), all dated around the middle of the fifth millennium BC.

...

Most of the males belonged to Y-chromosome haplogroup R1b1a, like almost all Yamnaya males, but Khvalynsk also had some minority Y-chromosome haplogroups (R1a, Q1a, J, I2a2) that do not appear or appear only rarely (I2a2) in Yamnaya graves.

As far as I can tell, he suggests that they'll be published in the forthcoming Narasimhan et al. paper. If so, it sounds like the paper will have many more ancient samples than its early preprint that was posted at bioRxiv last year.

For me the really fascinating thing in regards to these new samples is how scarce Y-haplogroup R1a appears to have been everywhere before the expansion by the putative Indo-European-speaking steppe ancestors of the Corded Ware culture (CWC) people. It's basically always outnumbered by other haplogroups wherever it's found prior to about 3,000 BCE, even on the PC steppe. But then, suddenly, its R1a-M417 subclade goes BOOM! And that's why I call it...

The beast among Y-haplogroups

At this stage, I'm not sure how to interpret the presence of Y-haplogroup J in the Khvalynsk population. It may or may not be important to the PIE homeland debate. Keep in mind that J is present in two foragers from Karelia and Popovo, northern Russia, dated to the Mesolithic period and with no obvious foreign ancestry. So it need not have arrived north of the Caspian as late as the Eneolithic with migrants rich in southern ancestry from the Caucasus or what is now Iran. In other words, for the time being, the steppe PIE homeland theory appears safe.

See also...

Is Yamnaya overrated?

The PIE homeland controversy: January 2019 status report

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Sunday, July 28, 2019

They mixed up Huns with Tocharians


I don't yet have the genomes from the recent Ning et al. paper on the Iron Age nomads from the Shirenzigou site in the eastern Tian Shan. But I do have most of the previously published data featured in the paper, including the Damgaard et al. 2018 Hun and Saka samples from the western Tian Shan.

After reading between the lines of the Ning et al. paper and running a few analyses of my own, it's clear to me that most of the supposedly Tocharian-related Shirenzigou individuals actually share a very close relationship with the Tian Shan Huns, and indeed may have been their ancestors.

For instance, Ning et al. found that a large part of the ancestry of the Shirenzigou ancients could be modeled with the Tian Shan Huns, which was an anachronistic approach because the former are older than the latter. They also found that Ulchi-related ancestry was a key part of the genetic structure of eight out of the ten Shirenzigou individuals, and this likewise appears to be an important part of the genetic structure of the Tian Shan Huns.

Note the strong statistical fits in the Global25/nMonte and qpAdm mixture models below, respectively, which characterize these Huns as a two-way mixture between the Ulchi and the earlier Tian Shan Saka. And keep in mind that the Saka also harbor significant Ulchi-related ancestry.

Hun_Tian_Shan
Saka_Tian_Shan,92
Ulchi,8

distance%=1.2553

Hun_Tian_Shan
Saka_Tian_Shan 0.928±0.009
Ulchi 0.072±0.009

chisq 4.409
tail prob 0.992464
Full output

Moreover, the Shirenzigou males belong to Y-haplogroups Q1a and R1b (two instances of each), and they share the latter with one of the Tian Shan Huns. Judging by the data from the relevant BAM files, it's also possible that the Shirenzigou males share a very rare subclade of R1b with the Hun, defined by the PH155 mutation (see here). The Y-haplogroup assignments for the other Tian Shan Huns end at R and R1, but that's almost certainly due to missing data.

On the other hand, two Tian Shan Sakas belong to Y-haplogroup R1a but none to R1b, which fits with the pattern from currently available ancient DNA that R1a was more common than R1b in Saka-related groups, such as the Scythians and Sarmatians (see here).

This is all very interesting, because the Huns replaced the Saka in the western Tian Shan, and, considering their R1b and excess Ulchi-related ancestry, very likely moved into the region from the direction of Shirenzigou. Indeed, in my opinion a strong argument can now be made that the Iron Age population from the Shirenzigou region took part in the formation of the Hunnic confederacy.

So where does that leave the theory presented by Ning et al. that the Shirenzigou ancients may have been closely related, and perhaps even ancestral, to the Tocharians, simply because they packed a lot of Yamnaya-related and possibly proto-Tocharian Afanasievo ancestry, and were living close to the Tarim Basin, where Tocharian languages were subsequently first attested?

I'm not sure, but I now find it difficult to reconcile this theory with the fact that they were closely related, and probably ancestral, to the Tian Shan Huns. As far as I'm aware, Huns cannot be linked to Tocharians in any meaningful way.

Of course it's possible that different Afanasievo-derived groups were living in the Tarim Basin and surrounds, and, as some merged with new populations pushing into the region from the east and adopted non-Indo-European languages, others retained their Tocharian speech and eventually split into communities speaking Tocharian A, B and apparently also C (see here).

But this has to be demonstrated directly with ancient DNA from archeological sites where Tocharian languages were attested. Till then, I'll keep thinking that Ning et al. wrote a paper about Tocharians that really should've been a paper about Huns.

Here's a famous wall painting of Tocharian princes from the cave of the sixteen sword-bearers in the Tarim Basin, dated to 432–538 AD. They don't look like guys with a lot of Ulchi-related admixture to me, but I might be wrong. Feel free to let me know what you think in the comments below.


Update 08/17/2019: The Shirenzigou nomads are now in my dataset. Below are a few successful and not so successful qpAdm mixture models for them. Note that I tried to use a wide range of relevant "right pops", but also retain a lot of markers, specifically to be able to discriminate between different types of steppe and steppe-derived sources of gene flow (refer to the full output). Admittedly, the Shirenzigou nomads can be modeled with Afanasievo-related ancestry, but...

CHN_Shirenzigou_IA
KAZ_Botai 0.161±0.023
KAZ_Wusun 0.490±0.023
NPL_Mebrak_2125BP 0.349±0.019

chisq 5.793
tail prob 0.926172
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.143±0.022
NPL_Mebrak_2125BP 0.295±0.019
Saka_Tian_Shan 0.562±0.024

chisq 6.796
tail prob 0.870794
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.185±0.023
NPL_Mebrak_2125BP 0.428±0.021
RUS_Sintashta_MLBA 0.270±0.026
TJK_Sarazm_En 0.117±0.027

chisq 11.351
tail prob 0.414345
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.032±0.027
KAZ_Zevakinskiy_LBA 0.567±0.025
NPL_Mebrak_2125BP 0.401±0.019

chisq 15.157
tail prob 0.232961
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.452±0.031
RUS_Afanasievo 0.435±0.025
RUS_Okunevo_BA 0.114±0.049

chisq 19.808
tail prob 0.0708003
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.409±0.031
RUS_Okunevo_BA 0.173±0.050
Yamnaya_RUS_Caucasus 0.418±0.026

chisq 20.453
tail prob 0.0589872
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.464±0.033
RUS_Okunevo_BA 0.104±0.053
Yamnaya_RUS_Samara 0.432±0.027

chisq 27.189
tail prob 0.0072566
Full output
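
To make the "successful and not so successful" distinction a little more concrete, here's a minimal Python sketch that ranks the seven models above by their tail probabilities. The numbers are copied from the outputs above. The 0.05 cutoff is an assumed convention rather than anything stated in this post, and the short labels are just shorthand for the steppe-related source in each model.

```python
# Tail probabilities copied from the qpAdm outputs above, keyed by the
# steppe-related source in each model (labels are shorthand for this sketch).
models = {
    "KAZ_Wusun": 0.926172,
    "Saka_Tian_Shan": 0.870794,
    "RUS_Sintashta_MLBA + TJK_Sarazm_En": 0.414345,
    "KAZ_Zevakinskiy_LBA": 0.232961,
    "RUS_Afanasievo": 0.0708003,
    "Yamnaya_RUS_Caucasus": 0.0589872,
    "Yamnaya_RUS_Samara": 0.0072566,
}

CUTOFF = 0.05  # assumed conventional threshold, not taken from the post

# Sort from best to worst fit and collect the models that fall below the cutoff.
ranked = sorted(models.items(), key=lambda kv: kv[1], reverse=True)
rejected = [name for name, p in ranked if p < CUTOFF]

for name, p in ranked:
    flag = "poor fit" if p < CUTOFF else "passes"
    print(f"{name:36s} {p:.4f}  {flag}")
```

Under that assumed cutoff only the Yamnaya_RUS_Samara model is rejected outright. The Afanasievo and Yamnaya_RUS_Caucasus models pass narrowly, while the Wusun and Saka models show the strongest fits.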

Both the Wusun and Saka are generally accepted to have been speakers of Indo-Iranian languages. So it's possible that the Shirenzigou nomads were Indo-Iranian speakers too, or at least derived from such peoples.

Surprisingly, NPL_Mebrak_2125BP was the key to obtaining the best statistical fits. This is a trio of samples, roughly contemporaneous with the Shirenzigou nomads, from a burial site high up in the Himalayas in what is now Nepal (see here).

To be honest, I'm not quite sure why the Himalayan ancients work so well in my models. Perhaps they're just a really good proxy for an Iron Age population from the northern part of the Tibetan Plateau? By the way, most of the Shirenzigou nomads made it into the latest Global25 datasheets (see here).

See also...

Almost everything you ever wanted to know about the Xiaohe-Gumugou cemeteries

The mystery of the Sintashta people

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, July 26, 2019

Afanasievo people may well have been proto-Tocharian speakers (Ning et al. 2019)


Update 08/17/2019: A surprising twist to the Shirenzigou nomads story

...

During the Early Bronze Age, around 2,900 BCE, a population associated with the Yamnaya archeological culture migrated from the Pontic-Caspian steppe in Eastern Europe deep into Asia, as far as the Minusinsk Basin in South Siberia.

This rapid, long-range expansion was likely to have been the first significant migration of a Yamnaya-related group far to the east of the Ural Mountains, and it resulted in the formation of the Afanasievo archeological culture (see here).

The appearance of Tocharian languages in the Tarim Basin, in what is now western China, is often associated with the Afanasievo culture, mainly because of the confirmed presence of European-related populations in the Tarim Basin during the Bronze Age, as well as the likely highly divergent position of the Tocharian node in the Indo-European language phylogeny.

But the Afanasievo people were separated by considerable distance in space and time from the Tocharians, and can't yet be reliably linked to them with archeological or genetic data. So even though the inference that the former are linguistically ancestral to the latter is quite plausible, it's far from certain.

However, thanks to a new paper at Current Biology by Ning et al., at least we now know that a population with significant Yamnaya/Afanasievo-related ancestry was living in the eastern Tian Shan Mountains just a few hundred years before Tocharian languages were attested nearby [LINK]. Below is the paper summary, emphasis is mine:

Recent studies of early Bronze Age human genomes revealed a massive population expansion by individuals related to the Yamnaya culture, from the Pontic Caspian steppe into Western and Eastern Eurasia, likely accompanied by the spread of Indo-European languages [1, 2, 3, 4, 5]. The south eastern extent of this migration is currently not known. Modern-day human populations from the Xinjiang region in northwestern China show a complex population history, with genetic links to both Eastern and Western Eurasia [6, 7, 8, 9, 10]. However, due to the lack of ancient genomic data, it remains unclear which source populations contributed to the Xinjiang population and what was the timing and the number of admixture events. Here, we report the first genome-wide data of 10 ancient individuals from northeastern Xinjiang. They are dated to around 2,200 years ago and were found at the Iron Age Shirenzigou site. We find them to be already genetically admixed between Eastern and Western Eurasians. We also find that the majority of the East Eurasian ancestry in the Shirenzigou individuals is related to northeastern Asian populations, while the West Eurasian ancestry is best presented by ∼20% to 80% Yamnaya-like ancestry. Our data thus suggest a Western Eurasian steppe origin for at least part of the ancient Xinjiang population. Our findings furthermore support a Yamnaya-related origin for the now extinct Tocharian languages in the Tarim Basin, in southern Xinjiang.


Ning et al., Ancient Genomes Reveal Yamnaya-Related Ancestry and a Potential Source of Indo-European Speakers in Iron Age Tianshan, Current Biology, July 25, 2019, DOI: https://doi.org/10.1016/j.cub.2019.06.044

See also...

It was always going to be this way

The mystery of the Sintashta people

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, July 17, 2019

Viking invasion at bioRxiv


A new preprint featuring hundreds of Viking Age genomes has appeared at bioRxiv [LINK]. Titled Population genomics of the Viking world, it looks like a solid effort overall, although I'm skeptical about its conclusions. I might elaborate on that in the comments below, but I'll have a lot more to say on the topic if and when I get to check out the ancient genomes with my own tools. Details about the new samples, including their Y-chromosome haplogroup assignments, are available here. Below is the abstract, emphasis is mine:

The Viking maritime expansion from Scandinavia (Denmark, Norway, and Sweden) marks one of the swiftest and most far-flung cultural transformations in global history. During this time (c. 750 to 1050 CE), the Vikings reached most of western Eurasia, Greenland, and North America, and left a cultural legacy that persists till today. To understand the genetic structure and influence of the Viking expansion, we sequenced the genomes of 442 ancient humans from across Europe and Greenland ranging from the Bronze Age (c. 2400 BC) to the early Modern period (c. 1600 CE), with particular emphasis on the Viking Age. We find that the period preceding the Viking Age was accompanied by foreign gene flow into Scandinavia from the south and east: spreading from Denmark and eastern Sweden to the rest of Scandinavia. Despite the close linguistic similarities of modern Scandinavian languages, we observe genetic structure within Scandinavia, suggesting that regional population differences were already present 1,000 years ago. We find evidence for a majority of Danish Viking presence in England, Swedish Viking presence in the Baltic, and Norwegian Viking presence in Ireland, Iceland, and Greenland. Additionally, we see substantial foreign European ancestry entering Scandinavia during the Viking Age. We also find that several of the members of the only archaeologically well-attested Viking expedition were close family members. By comparing Viking Scandinavian genomes with present-day Scandinavian genomes, we find that pigmentation-associated loci have undergone strong population differentiation during the last millennia. Finally, we are able to trace the allele frequency dynamics of positively selected loci with unprecedented detail, including the lactase persistence allele and various alleles associated with the immune response. 
We conclude that the Viking diaspora was characterized by substantial foreign engagement: distinct Viking populations influenced the genomic makeup of different regions of Europe, while Scandinavia also experienced increased contact with the rest of the continent.

Margaryan et al., Population genomics of the Viking world, bioRxiv, posted July 17, 2019, doi: https://doi.org/10.1101/703405

See also...

They came, they saw, and they mixed

Who were the people of the Nordic Bronze Age?

Asiatic East Germanics

Monday, July 15, 2019

Asiatic East Germanics


Around a third of the ancient individuals in my dataset associated with East Germanic-speaking cultures show obvious ancestry from Central and/or West Asia.

This shouldn't be too surprising, considering, for instance, the well documented contacts between East Germanic tribes and the Avars, Huns, Sarmatians and other nomadic groups that streamed into Europe from the Asian steppes during the Migration Period. It's a topic that I've raised before at this blog (see here).

But the curious thing is that very little, if any, of this ancestry has percolated down to present-day Europeans.

The easiest way to show this is with a Principal Component Analysis (PCA) based on my Global25 data. The relevant PCA datasheet can be downloaded here. Basic details about the ancient samples in the analysis are available here.

Some of the Northeastern European populations, particularly the Uralic speakers, appear to be attracted to the Hunnic cluster. However, this is mostly an artifact of pre-Migration Period east to west population expansions in the far north of Europe, probably including those of the Proto-Uralians (see here).

So how is it that, despite ruling over vast areas of Europe for hundreds of years, the East Germanics appear not to have contributed significantly to the present-day European gene pool? My theory is that, much like the Avars and Huns, they were militarily and demographically overwhelmed by the ascending groups around them, such as the Slavs, and they simply went extinct.

To wrap things up, here's a basic qpAdm mixture model designed to test for Hunnic-related ancestry in a few Eastern and Northern European populations of interest. Note the significant slice of this type of ancestry in the likely early Goths of the Chernyakhiv culture. Is it real? Feel free to share your thoughts in the comments below.

UKR_Chernyakhiv
DEU_MA 0.863±0.038
Hun_Tian_Shan 0.137±0.038
chisq 12.525
tail prob 0.325466
Full output

Swedish
Baltic_EST_IA 0.126±0.078
DEU_MA 0.849±0.073
Hun_Tian_Shan 0.025±0.020
chisq 8.338
tail prob 0.595877
Full output

Ukrainian
Baltic_EST_IA 0.121±0.064
DEU_MA 0.857±0.060
Hun_Tian_Shan 0.022±0.017
chisq 11.458
tail prob 0.322956
Full output

Estonian
Baltic_EST_IA 0.597±0.069
DEU_MA 0.373±0.064
Hun_Tian_Shan 0.030±0.017
chisq 15.739
tail prob 0.107361
Full output
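
One rough way to approach the "Is it real?" question is to compare each Hun_Tian_Shan coefficient with its standard error. The Python sketch below does that arithmetic with the figures copied from the models above. The threshold of three standard errors is an assumed rule of thumb, not something stated in this post.

```python
# Hun_Tian_Shan coefficients and standard errors copied from the models above.
hun_estimates = {
    "UKR_Chernyakhiv": (0.137, 0.038),
    "Swedish": (0.025, 0.020),
    "Ukrainian": (0.022, 0.017),
    "Estonian": (0.030, 0.017),
}

Z_THRESHOLD = 3.0  # assumed rule of thumb, not stated in the post


def z_score(estimate, std_err):
    """Number of standard errors separating the estimate from zero."""
    return estimate / std_err


z_scores = {pop: z_score(est, se) for pop, (est, se) in hun_estimates.items()}

for pop, z in z_scores.items():
    verdict = "distinguishable from zero" if z >= Z_THRESHOLD else "within noise"
    print(f"{pop:16s} Z = {z:.2f}  {verdict}")
```

Only the Chernyakhiv estimate clears that assumed threshold, at roughly 3.6 standard errors. That's consistent with the signal being more than noise, but it doesn't by itself settle where the ancestry came from.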

See also...

Conan the Barbarian probably belonged to Y-haplogroup R1a

More on the association between Uralic expansions and Y-haplogroup N

Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

Friday, July 12, 2019

Getting the most out of the Global25


The first thing you need to know about the Global25 is that I update the relevant datasheets regularly, usually every few weeks, but they're always at these links:

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient

...

Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.
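
Here's a minimal Python sketch of how a single datasheet row might be split into its population code, individual code and coordinates. It assumes that each row starts with a label of the form population code, colon, individual code, followed by comma-separated coordinates. That layout, the function name and the example row are assumptions made for illustration, so check them against the actual datasheets.

```python
def parse_row(line):
    """Split one datasheet row into population code, individual code and coordinates."""
    label, *values = line.strip().split(",")
    # Assumed label layout: population code, a colon, then the individual code.
    population, _, individual = label.partition(":")
    return population, individual, [float(v) for v in values]


# Hypothetical row with three made-up coordinates instead of the full set.
row = "Example_Pop:Sample01,0.12,-0.03,0.45"
population, individual, coords = parse_row(row)
print(population, individual, coords)
```

Keeping the two codes separate makes it straightforward to regroup samples under modified population codes while still tracking each individual.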

So if you'd like to know more details about the samples try searching for their individual codes via a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.

The main purpose of the Global25 is to provide data for mixture modeling. In other words, for estimating ancestry proportions, both ancient and modern (see here). This can be done on your computer with the R program and the nMonte R script, or online with a couple of different tools, which I discuss below.

If you don't have R installed on your computer, you can get it here, while nMonte is available here. For this tutorial please download nMonte and nMonte3, and store them in your main working folder (usually My Documents).

Once you have R set up, make sure its working directory is the same place where you stored nMonte. You can check this in R by clicking on "File" and then "Change dir". Additionally, you'll need two nMonte input files in the working directory titled "data" and "target". Examples of these files are available here. We'll be using them to test the ancient ancestry proportions of a sample set from present-day England.

Before you can begin the analysis you need to first call the nMonte script by typing or copy pasting source('nMonte.R') into the R console window, and then hitting "enter" on your keyboard. This is what you should see in the R console window afterwards.


To start the mixture modeling process, type or copy paste getMonte('data.txt', 'target.txt') into the R console window, hit "enter", and wait for the results. After a short time, probably less than a minute or two, you should see this output.


The data and target files contain population averages. And, as you can see, the results that these population averages have produced are in line with what one would expect from such a model focusing on the genetic shifts in Northern Europe during the Late Neolithic. Very similar ancient ancestry proportions have been reported for the English and other Northern Europeans recently in scientific literature.
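
For readers curious about what this kind of mixture modeling does in principle, below is a minimal Python sketch. It is not the nMonte code and makes no claim to reproduce its results. It simply searches for the blend of source coordinates that sits closest to the target, using a greedy slot-swapping search assumed here for illustration. The function names and the toy three-dimensional coordinates are hypothetical.

```python
import math


def mix(vectors, counts, slots):
    """Weighted average of the source vectors, with weights counts/slots."""
    dims = len(vectors[0])
    return [sum(c * v[d] for c, v in zip(counts, vectors)) / slots
            for d in range(dims)]


def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_mixture(target, sources, slots=100):
    """Greedy search for source proportions (in steps of 1/slots) that
    bring the blended coordinates as close as possible to the target."""
    names = list(sources)
    vectors = [sources[n] for n in names]
    # Start with every slot assigned to the single closest source.
    start = min(range(len(vectors)), key=lambda i: distance(vectors[i], target))
    counts = [0] * len(vectors)
    counts[start] = slots
    best = distance(mix(vectors, counts, slots), target)
    improved = True
    while improved:
        improved = False
        for i in range(len(vectors)):
            for j in range(len(vectors)):
                if i == j or counts[i] == 0:
                    continue
                # Try moving one slot from source i to source j.
                counts[i] -= 1
                counts[j] += 1
                d = distance(mix(vectors, counts, slots), target)
                if d < best - 1e-12:
                    best = d
                    improved = True
                else:
                    # No gain, so undo the move.
                    counts[i] += 1
                    counts[j] -= 1
    return {n: c / slots for n, c in zip(names, counts)}, best


# Toy coordinates invented for illustration, not real Global25 data.
toy_sources = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
toy_target = [0.5, 0.3, 0.2]
weights, fit = fit_mixture(toy_target, toy_sources)
print(weights, fit)
```

The greedy search is only guaranteed to find a local optimum on the grid of whole-percent proportions, which is one reason to treat it as a teaching aid rather than a replacement for the tools discussed in this post.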

However, when focusing on exceptionally fine-scale genetic variation that isn't reflected too well in the Global25 population averages, a more effective strategy might be to use multiple individuals from each reference population and let nMonte3 aggregate and average the inferred ancestry proportions.

This is often the case when attempting to model ancestry proportions for more recent periods, such as the Middle Ages. So let's try this with the English sample set using a modified data file, which is available here.

Replace the old data file with the new one in your working directory, and, like before, copy paste into the R console window the following two commands, hitting "enter" after each one: source('nMonte3.R') and getMonte('data.txt', 'target.txt'). This is what you should eventually see.


It's difficult to say how accurate these estimates are. But they look more or less correct considering the limited and less than ideal reference samples. For instance, the individuals labeled SWE_Viking_Age_Sigtuna are supposed to be stand-ins for Danish and Norwegian Vikings, but they're a relatively heterogeneous group from Sweden, possibly with some British or Irish ancestry, so they might be skewing the results.

However, I'll be adding many more ancient samples to the Global25 datasheets as they become available, including lots of new Vikings, which should greatly improve the accuracy of these sorts of fine-scale mixture models.
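
The idea of aggregating results from multiple individuals per reference population can be pictured with the short Python sketch below. It sums per-individual proportions into per-population totals. Whether nMonte3 does exactly this internally is an assumption, and the labels and numbers are invented purely for illustration.

```python
from collections import defaultdict


def aggregate_by_population(individual_weights):
    """Sum per-individual proportions into per-population totals."""
    totals = defaultdict(float)
    for label, weight in individual_weights.items():
        # Assumed label layout: population code, a colon, then the individual code.
        population = label.split(":")[0]
        totals[population] += weight
    return dict(totals)


# Hypothetical per-individual result, invented purely for illustration.
example = {
    "Pop_A:ind1": 0.30,
    "Pop_A:ind2": 0.25,
    "Pop_B:ind1": 0.45,
}
totals = aggregate_by_population(example)
print(totals)
```

The point of working this way is that each individual can capture a slightly different slice of a population's variation before the results are rolled up under one population code.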

An alternative to the R-based approach is the online Global25 nMonte Runner [LINK]. This is a free tool, and easy to work with via several drop down menus, but users must become sponsors to unlock all of its available features. To run an analysis follow these three steps:

1) use the first drop down menu to pick the reference populations of your choice (up to four are allowed for free users)

2) move down to the second set of the drop down lists and either pick a test population that is already in the system or copy paste a set of Global25 coordinates into the space labeled "Enter/Paste Sets of Coordinates - Scaled and Comma-separated"

3) feel free to experiment with the additional options if you're game and willing to part with a little cash to help pay for the site.


Another exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is freely available HERE. And it works offline too, after downloading the web page onto your computer. Just copy paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen cap below shows me doing just that.


However, it's important to note that the Global25 is a Principal Component Analysis (PCA), so it makes good sense to also use it for producing PCA graphs. To do this just plot any combination of two or three of its Principal Components (PCs) to create 2D or 3D graphs, respectively. This can be done with a wide variety of programs, including PAST, which is freely available here.

To produce a 2D graph, open a Global25 datasheet in PAST, choose comma as the separator, highlight any two columns of data, click on the "Plot" tab and, from the drop down list, pick "XY graph". Below is a series of graphs that I created in exactly this way. I also color coded the samples according to their geographic origins. This was done by ticking the "Row attributes" tab.


PAST can also be used to run PCA on subsets of the Global25 scaled data to produce remarkably accurate plots of fine-scale population structure. For instance, here's a plot based on present-day populations from north of the Alps, Balkans and Pyrenees.


To try this create a new text file with your choice of populations from the Global25 scaled datasheet, open it with PAST and choose Multivariate > Ordination > Principal Components Analysis. I've already put together several datasheets limited to European, Northern European, West Eurasian and South Asian populations. They're available at the links below along with more details on how to run them with PAST.
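
If you'd rather not build the subset file by hand, the selection step can be sketched in a few lines of Python. The sketch assumes comma-separated rows whose first field is a population code and individual code joined by a colon, plus an optional header row with an empty first field. Both are assumptions about the datasheet layout, and the miniature datasheet is made up.

```python
def filter_populations(lines, wanted):
    """Keep rows whose population code is in `wanted`, plus any header row."""
    kept = []
    for line in lines:
        label = line.split(",", 1)[0]
        population = label.split(":")[0]
        # An empty first field is treated as a header row (an assumption).
        if label == "" or population in wanted:
            kept.append(line)
    return kept


# Hypothetical miniature datasheet with made-up coordinates.
sheet = [
    ",PC1,PC2",
    "Pop_A:ind1,0.10,0.20",
    "Pop_B:ind1,0.30,0.40",
    "Pop_C:ind1,0.50,0.60",
]
subset = filter_populations(sheet, {"Pop_A", "Pop_C"})
print("\n".join(subset))
```

The printed rows can then be saved to a text file and opened in PAST as described above.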

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

And if you're fond of tree-like structures as a means to describe fine-scale genetic variation, please check out this blog post...

Global25 workshop 4: a neighbour joining tree

Wednesday, July 10, 2019

Global25 workshop 4: a neighbour joining tree


Phylogenetic trees are easy to produce, but there's an infinite number of ways to run them, and, depending on the input data you're using, some methods are a lot more effective than others. In this tutorial I'm going to demonstrate one method that has worked well for me when looking at the fine-scale genetic relationships between ancient and present-day human populations with my Global25 data.

To get started download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Clustering > Neighbour joining. Here's a screen cap of me doing just that...


Then, from the tabs on the right, choose Chord as the similarity index and MAR_Iberomaurusian, the most distinct unit in the datasheet, as the root. PAST offers an exceptionally large range of similarity indices and they generally produce similar results, but, in my experience, Chord creates among the most visually pleasing outcomes when dealing with fine-scale genetic substructures.
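
For anyone wondering what the Chord index measures, here's a minimal Python sketch. It assumes the usual definition of chord distance, which is the Euclidean distance between two vectors after each has been scaled to unit length. The function name and the toy vectors are hypothetical.

```python
import math


def chord_distance(a, b):
    """Euclidean distance between two vectors after scaling each to unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    # Clamp at zero to absorb tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, 2.0 - 2.0 * dot / norm))


# Toy vectors: the first pair points the same way, the second pair is at right angles.
print(chord_distance([3.0, 4.0], [6.0, 8.0]))
print(chord_distance([1.0, 0.0], [0.0, 1.0]))
```

Because each vector is normalised first, the index depends on the direction of a sample's coordinates rather than their overall magnitude.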


This is the tree you should see after exporting the image via the graph settings tab in PAST, and, if you like, rotating it 90 degrees with image editing software of your choice. Note the fairly substantial differences between the populations from Northwestern Europe, which are often difficult to tease apart in such analyses.


If you have your own Global25 coordinates you can add them to my PAST-compatible datasheet to see where you cluster in this tree. And, of course, you can design your own PAST-compatible datasheets and trees with any combination of populations and/or individuals from the Global25 text files at the links below. It's easy; just copy paste the coordinates of your choice into an empty text file, open it with PAST and then save it with the dat extension to create a new PAST datasheet. But make sure never to mix up the scaled and non-scaled coordinates.

Global25 datasheet (scaled)

Global25 pop averages (scaled)

Global25 datasheet

Global25 pop averages

An important point to keep in mind when running these sorts of analyses is that PAST and other such programs need enough genetic differentiation to latch onto in order to produce meaningful results. Thus, even when studying the relationships between very closely related populations, it's not just useful to include a root population or individual, but also some near and far related groups to help the analysis algorithm flesh out the key genetic substructures.

To be honest, I don't really know whether using the Chord index and rooting the tree with MAR_Iberomaurusian is the best way to run a neighbour joining tree analysis of ancient and present-day West Eurasian genetic variation. What do you think? Feel free to let me know in the comments.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)