
Tuesday, August 20, 2019

Roopkund Lake dead (updated)


Fifteen of the Roopkund Lake samples from the Harney et al. paper published today at Nature Communications made it into the Global25 datasheets. Look for the prefix IND_Roopkund here...

Global25 datasheet (scaled)

Global25 datasheet

Global25 pop averages (scaled)

Global25 pop averages

Their genotypes are freely available in a ~590K SNP dataset via the Reich Lab here. I might be able to run more of the samples at some point if and when they're released in a dataset with more SNPs.

In any case, much like everyone else, I don't have a clue how those Mediterranean migrants ended up in the Himalayas back in the 1800s, but I do know where they came from. Most appear to have been from Crete, and others were from mainland Greece. However, one of the individuals that I was able to analyze with the Global25 was almost certainly an Anatolian Greek. Below are a couple of Principal Component Analyses (PCA) based on the Global25 data. The relevant datasheet is available here.


I don't yet have a strong opinion about the origins of the earlier, typically South Asian Roopkund dead. They may have been visitors from all over India, or members of different castes from northern India. A PCA with six of these individuals can be seen here and the relevant datasheet downloaded here. Any thoughts? Feel free to share them in the comments below.

Update 23/08/2019: A new ~1240K SNP genotype dataset with the Roopkund Lake samples is now available here. More markers mean more accurate PCA, and they've allowed me to run almost twice as many of the samples. I've updated all of the datasheets accordingly. The links are the same.


See also...

Getting the most out of the Global25

A surprising twist to the Shirenzigou nomads story

The Poltavka outlier

Saturday, August 17, 2019

A surprising twist to the Shirenzigou nomads story


Remember those potentially Afanasievo-derived and Tocharian-related Shirenzigou nomads from the Ning et al. paper? Well, in my opinion, they're probably neither. The genotypes and other data for these Iron Age individuals from the eastern Tian Shan are available here.

Below are a few successful and not-so-successful qpAdm mixture models for them. Note that I tried to use a wide range of relevant "right pops", but also retain a lot of markers, specifically to be able to discriminate between different types of steppe and steppe-derived sources of gene flow (refer to the full output). Admittedly, the Shirenzigou nomads can be modeled with Afanasievo-related ancestry, but...

CHN_Shirenzigou_IA
KAZ_Botai 0.161±0.023
KAZ_Wusun 0.490±0.023
NPL_Mebrak_2125BP 0.349±0.019

chisq 5.793
tail prob 0.926172
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.143±0.022
NPL_Mebrak_2125BP 0.295±0.019
Saka_Tian_Shan 0.562±0.024

chisq 6.796
tail prob 0.870794
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.185±0.023
NPL_Mebrak_2125BP 0.428±0.021
RUS_Sintashta_MLBA 0.270±0.026
TJK_Sarazm_En 0.117±0.027

chisq 11.351
tail prob 0.414345
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.032±0.027
KAZ_Zevakinskiy_LBA 0.567±0.025
NPL_Mebrak_2125BP 0.401±0.019

chisq 15.157
tail prob 0.232961
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.452±0.031
RUS_Afanasievo 0.435±0.025
RUS_Okunevo_BA 0.114±0.049

chisq 19.808
tail prob 0.0708003
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.409±0.031
RUS_Okunevo_BA 0.173±0.050
Yamnaya_RUS_Caucasus 0.418±0.026

chisq 20.453
tail prob 0.0589872
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.464±0.033
RUS_Okunevo_BA 0.104±0.053
Yamnaya_RUS_Samara 0.432±0.027

chisq 27.189
tail prob 0.0072566
Full output
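The "tail prob" in each of these runs is the upper-tail probability of the chisq value under a chi-square distribution. The degrees of freedom aren't shown above, but assuming 12 for the three-source models and 11 for the four-source model reproduces the reported values, so those are the (assumed) values used in the minimal Python sketch below. It's only a way to sanity-check the numbers; qpAdm computes this itself.

```python
import math


def chi2_sf(chisq: float, df: int) -> float:
    """Upper-tail probability P(X >= chisq) for a chi-square variable with df degrees of freedom."""
    if chisq <= 0:
        return 1.0
    a = df / 2.0
    x = chisq / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, x).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - lower


# (model, chisq, reported tail prob, ASSUMED degrees of freedom)
runs = [
    ("KAZ_Botai + KAZ_Wusun + NPL_Mebrak_2125BP", 5.793, 0.926172, 12),
    ("KAZ_Botai + NPL_Mebrak_2125BP + Saka_Tian_Shan", 6.796, 0.870794, 12),
    ("KAZ_Botai + NPL_Mebrak_2125BP + RUS_Sintashta_MLBA + TJK_Sarazm_En", 11.351, 0.414345, 11),
]

for model, chisq, reported, df in runs:
    print(f"{model}: computed {chi2_sf(chisq, df):.4f}, reported {reported}")
```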

Both the Wusun and Saka are generally accepted to have been speakers of Indo-Iranian languages. So it's possible that the Shirenzigou nomads were Indo-Iranian speakers too, or at least derived from such peoples.

Surprisingly, NPL_Mebrak_2125BP was the key to obtaining the best statistical fits. This is a trio of samples, roughly contemporaneous with the Shirenzigou nomads, from a burial site high up in the Himalayas in what is now Nepal (see here).

To be honest, I'm not quite sure why the Himalayan ancients work so well in my models. Perhaps they're just a really good proxy for an Iron Age population from the northern edge of the Tibetan Plateau?

By the way, most of the Shirenzigou nomads made it into the latest Global25 datasheets (see here). They can be analyzed in a variety of ways described in this blog post: Getting the most out of the Global25. Below is a screen cap of me comparing the effectiveness of Afanasievo, Sintashta and Wusun samples as proxies for the steppe ancestry in the Shirenzigou nomads with an online tool freely available here. As expected, the algorithm picks Sintashta ahead of Afanasievo, and the Wusun ahead of both.
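The online tool's internals aren't described here, so treat the following only as an illustration of the general idea of ranking candidate proxies. For each candidate, fit a two-way mixture (candidate plus one fixed second source) to the target's Global25 coordinates, then sort candidates by the remaining Euclidean distance. This is a simplification, since the real comparison involved more than two sources. The coordinates and the proxy_A/proxy_B/proxy_C labels are toy placeholders, not real Global25 values.

```python
import math


def fit_two_way(target, source_a, source_b):
    """Weight w on source_a (1 - w on source_b) that minimises Euclidean distance to target."""
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        w = 0.5  # identical sources: every split fits equally well
    else:
        w = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
        w = min(1.0, max(0.0, w))  # clipping is exact for a one-dimensional convex fit
    mix = [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]
    return w, math.dist(target, mix)


def rank_proxies(target, fixed_source, candidates):
    """Sort candidate proxies by the distance of their best two-way fit (smallest first)."""
    results = []
    for name, coords in candidates.items():
        w, dist = fit_two_way(target, coords, fixed_source)
        results.append((dist, name, w))
    return sorted(results)


# Toy three-dimensional coordinates; all labels are hypothetical placeholders.
fixed_source = [0.1, -0.2, 0.05]
candidates = {
    "proxy_A": [0.3, 0.1, -0.1],
    "proxy_B": [-0.2, 0.4, 0.3],
    "proxy_C": [0.5, -0.3, 0.0],
}
target = [0.22, -0.02, -0.04]  # built as 0.6 * proxy_A + 0.4 * fixed_source

for dist, name, w in rank_proxies(target, fixed_source, candidates):
    print(f"{name}: weight {w:.3f}, distance {dist:.4f}")
```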


See also...

They mixed up Huns with Tocharians

Some myths die hard

The mystery of the Sintashta people

Wednesday, August 14, 2019

Did South Caspian hunter-fishers really migrate to Eastern Europe?


The idea that most of the Near Eastern-related ancestry in the ancient populations of the Pontic-Caspian (PC) steppe is, one way or another, sourced from the territory of present-day Iran is a fairly popular one nowadays (for instance, see here). It might turn out to be correct, once there are enough relevant samples to test it properly, but in my opinion the chances of this are slim.

My skepticism is based on literally hours of analyses with the currently available ancients from the Caucaso-Caspian region, such as the admixture graphs below featuring foragers and early farmers from Russia, Georgia and Iran. The relevant qpGraph and dot files are available here.

Note that the further I move away from Eastern Europe in these graphs when looking for the source of the southern ancestry in the Eneolithic population from the southernmost part of the PC steppe (Piedmont_En), the more difficult it is for me to create a statistically sound model. What might this tell us about the provenance of this so-called southern ancestry?




See also...

The PIE homeland controversy: August 2019 status report

Some myths die hard

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, August 2, 2019

The PIE homeland controversy: August 2019 status report


Archeologist David Anthony has a new paper on the Indo-European homeland debate titled Archaeology, Genetics, and Language in the Steppes: A Comment on Bomhard. It's part of a series of articles dealing with Allan R. Bomhard's "Caucasian substrate hypothesis" in the latest edition of The Journal of Indo-European Studies. It's also available, without any restrictions, here.

Any thoughts? Feel free to share them in the comments below. Admittedly, I found this part somewhat puzzling (emphasis is mine):

It was the faint trace of WHG, perhaps 3% of whole Yamnaya genomes, that identified this admixture as coming from Europe, not the Caucasus, according to Wang et al. (2018). Colleagues in David Reich’s lab commented that this small fraction of WHG ancestry could have come from many different geographic places and populations.

I think that's highly optimistic. It really should be obvious by now thanks to archeological and ancient genomic data, including both uniparental and genome-wide variants, that the Yamnaya people were practically entirely derived from Eneolithic populations native to the Pontic-Caspian (PC) steppe. So, in all likelihood, this was also the source of their minor WHG ancestry.

Indeed, they clearly weren't some mishmash of geographically, culturally and genetically disparate groups that had just arrived in Eastern Europe, but the direct descendants of closely related and already significantly Yamnaya-like peoples associated with long-standing PC steppe archeological cultures such as Khvalynsk and Sredny Stog. I discussed this earlier this year, soon after the Wang et al. paper was published:

On Maykop ancestry in Yamnaya

I hope I'm wrong, but I get the feeling that the scientists at the Reich Lab are finding this difficult to accept, because it doesn't gel with their theory that archaic Proto-Indo-European (PIE) wasn't spoken on the PC steppe, but rather south of the Caucasus, and that late or rather nuclear PIE was introduced into the PC steppe by migrants from the Maykop culture who were somehow involved in the formation of the Yamnaya horizon.

Inexplicably, after citing Wang et al. on multiple occasions and arguing against any significant gene flow between Maykop and Yamnaya groups, Anthony fails to mention Steppe Maykop. But the Steppe Maykop people are an awesome argument against the idea that there was anything more than occasional mating between the Maykop and Yamnaya populations, because they were wedged between them, and yet clearly distinct from both, with a surprisingly high ratio of West Siberian forager-related ancestry (see here and here).


Despite all the talk lately about the potential cultural, linguistic and genetic ties between Maykop and Yamnaya, including claims that the latter possibly acquired its wagons from the former, my view is that the Steppe Maykop and Yamnaya wagon drivers may have competed with each other and eventually clashed in a big way. Indeed, take a look at what happens after Yamnaya burials rather suddenly replace those of Steppe Maykop just north of the Caucasus around 3,000 BCE.

Yamnaya_RUS_Caucasus
RUS_Progress_En_PG2001 0.808±0.058
RUS_Steppe_Maykop 0.000
UKR_Sredny_Stog_II_En_I6561 0.192±0.058
chisq 13.859
tail prob 0.383882
Full output

Yep, total population replacement with no significant gene flow between the two groups. Apparently, as far as I can tell, there's not even a hint that a few Steppe Maykop stragglers were incorporated into the ranks of the newcomers. Where did they go? Hard to say for now. Maybe they ran for the hills nearby?

Intriguingly, Anthony reveals a few details about new samples from three different Eneolithic steppe burial sites associated with the Khvalynsk culture:

The Reich lab now has whole-genome aDNA data from more than 30 individuals from three Eneolithic cemeteries in the Volga steppes between the cities of Saratov and Samara (Khlopkov Bugor, Khvalynsk, and Ekaterinovka), all dated around the middle of the fifth millennium BC.

...

Most of the males belonged to Y-chromosome haplogroup R1b1a, like almost all Yamnaya males, but Khvalynsk also had some minority Y-chromosome haplogroups (R1a, Q1a, J, I2a2) that do not appear or appear only rarely (I2a2) in Yamnaya graves.

As far as I can tell, he suggests that they'll be published in the forthcoming Narasimhan et al. paper. If so, it sounds like the paper will have many more ancient samples than its early preprint that was posted at bioRxiv last year.

For me the really fascinating thing with regard to these new samples is how scarce Y-haplogroup R1a appears to have been everywhere before the expansion by the putative Indo-European-speaking steppe ancestors of the Corded Ware culture (CWC) people. It's basically always outnumbered by other haplogroups wherever it's found prior to about 3,000 BCE, even on the PC steppe. But then, suddenly, its R1a-M417 subclade goes BOOM! And that's why I call it...

The beast among Y-haplogroups

At this stage, I'm not sure how to interpret the presence of Y-haplogroup J in the Khvalynsk population. It may or may not be important to the PIE homeland debate. Keep in mind that J is present in two foragers from Karelia and Popovo, northern Russia, dated to the Mesolithic period and with no obvious foreign ancestry. So it need not have arrived north of the Caspian as late as the Eneolithic with migrants rich in southern ancestry from the Caucasus or what is now Iran. In other words, for the time being, the steppe PIE homeland theory appears safe.

See also...

Did South Caspian hunter-fishers really migrate to Eastern Europe?

The PIE homeland controversy: January 2019 status report

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Sunday, July 28, 2019

They mixed up Huns with Tocharians


I don't yet have the genomes from the recent Ning et al. paper on the Iron Age nomads from the Shirenzigou site in the eastern Tian Shan. But I do have most of the previously published data featured in the paper, including the Damgaard et al. 2018 Hun and Saka samples from the western Tian Shan.

After reading the Ning et al. paper between the lines and running a few analyses of my own, it's clear to me that most of the supposedly Tocharian-related Shirenzigou individuals actually share a very close relationship with the Tian Shan Huns, and indeed may have been their ancestors.

For instance, Ning et al. found that a large part of the ancestry of the Shirenzigou ancients could be modeled with the Tian Shan Huns, which was an anachronistic approach because the former are older than the latter. They also found that Ulchi-related ancestry was a key part of the genetic structure of eight out of the ten Shirenzigou individuals, and this likewise appears to be an important part of the genetic structure of the Tian Shan Huns.

Note the strong statistical fits in the Global25/nMonte and qpAdm mixture models below, respectively, which characterize these Huns as a two-way mixture between the Ulchi and the earlier Tian Shan Saka. And keep in mind that the Saka also harbor significant Ulchi-related ancestry.

Hun_Tian_Shan
Saka_Tian_Shan,92
Ulchi,8

distance%=1.2553

Hun_Tian_Shan
Saka_Tian_Shan 0.928±0.009
Ulchi 0.072±0.009

chisq 4.409
tail prob 0.992464
Full output
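For anyone curious what distance%=1.2553 in the nMonte output measures, here's a minimal Python sketch that assumes it is 100 times the Euclidean distance between the target's Global25 coordinates and the weighted average of the sources' coordinates. That definition is an assumption about nMonte-style output, not a copy of its code, and the two-dimensional coordinates below are toy values standing in for a 92/8 model like the one above.

```python
import math


def mixture_distance_pct(target, sources, weights):
    """100 x Euclidean distance between target and the weighted mix of sources (ASSUMED definition)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mix = [sum(w * src[i] for src, w in zip(sources, weights)) for i in range(len(target))]
    return 100 * math.dist(target, mix)


# Toy two-dimensional coordinates, standing in for Saka_Tian_Shan and Ulchi in a 92/8 model.
saka_like = [0.1, 0.2]
ulchi_like = [-0.3, 0.5]
target = [0.078, 0.224]
print(mixture_distance_pct(target, [saka_like, ulchi_like], [0.92, 0.08]))
```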

Moreover, the Shirenzigou males belong to Y-haplogroups Q1a and R1b (two instances of each), and they share the latter with one of the Tian Shan Huns. Judging by the data from the relevant BAM files, it's also possible that the Shirenzigou males share a very rare subclade of R1b with the Hun, defined by the PH155 mutation (see here). The Y-haplogroup assignments for the other Tian Shan Huns end at R and R1, but that's almost certainly due to missing data.

On the other hand, two Tian Shan Sakas belong to Y-haplogroup R1a but none to R1b, which fits with the pattern from currently available ancient DNA that R1a was more common than R1b in Saka-related groups, such as the Scythians and Sarmatians (see here).

This is all very interesting, because the Huns replaced the Saka in the western Tian Shan, and, considering their R1b and excess Ulchi-related ancestry, very likely moved into the region from the direction of Shirenzigou. Indeed, in my opinion a strong argument can now be made that the Iron Age population from the Shirenzigou region took part in the formation of the Hunnic confederacy.

So where does that leave the theory presented by Ning et al. that the Shirenzigou ancients may have been closely related, and perhaps even ancestral, to the Tocharians, simply because they packed a lot of Yamnaya-related and possibly proto-Tocharian Afanasievo ancestry, and were living close to the Tarim Basin, where Tocharian languages were subsequently first attested?

I'm not sure, but I now find it difficult to reconcile this theory with the fact that they were closely related, and probably ancestral, to the Tian Shan Huns. As far as I'm aware, Huns cannot be linked to Tocharians in any meaningful way.

Of course it's possible that different Afanasievo-derived groups were living in the Tarim Basin and surrounds, and, as some merged with new populations pushing into the region from the east and adopted non-Indo-European languages, others retained their Tocharian speech and eventually split into communities speaking Tocharian A, B and apparently also C (see here).

But this has to be demonstrated directly with ancient DNA from archeological sites where Tocharian languages were attested. Till then, I'll keep thinking that Ning et al. wrote a paper about Tocharians that really should've been a paper about Huns.

Here's a famous wall painting of Tocharian princes from the cave of the sixteen sword-bearers in the Tarim Basin, dated to 432–538 AD. They don't look like guys with a lot of Ulchi-related admixture to me, but I might be wrong. Feel free to let me know what you think in the comments below.


Update 17/08/2019: The Shirenzigou nomads are now in my dataset. Below are a few successful and not-so-successful qpAdm mixture models for them. Note that I tried to use a wide range of relevant "right pops", but also retain a lot of markers, specifically to be able to discriminate between different types of steppe and steppe-derived sources of gene flow (refer to the full output). Admittedly, the Shirenzigou nomads can be modeled with Afanasievo-related ancestry, but...

CHN_Shirenzigou_IA
KAZ_Botai 0.161±0.023
KAZ_Wusun 0.490±0.023
NPL_Mebrak_2125BP 0.349±0.019

chisq 5.793
tail prob 0.926172
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.143±0.022
NPL_Mebrak_2125BP 0.295±0.019
Saka_Tian_Shan 0.562±0.024

chisq 6.796
tail prob 0.870794
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.185±0.023
NPL_Mebrak_2125BP 0.428±0.021
RUS_Sintashta_MLBA 0.270±0.026
TJK_Sarazm_En 0.117±0.027

chisq 11.351
tail prob 0.414345
Full output

CHN_Shirenzigou_IA
KAZ_Botai 0.032±0.027
KAZ_Zevakinskiy_LBA 0.567±0.025
NPL_Mebrak_2125BP 0.401±0.019

chisq 15.157
tail prob 0.232961
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.452±0.031
RUS_Afanasievo 0.435±0.025
RUS_Okunevo_BA 0.114±0.049

chisq 19.808
tail prob 0.0708003
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.409±0.031
RUS_Okunevo_BA 0.173±0.050
Yamnaya_RUS_Caucasus 0.418±0.026

chisq 20.453
tail prob 0.0589872
Full output

CHN_Shirenzigou_IA
NPL_Mebrak_2125BP 0.464±0.033
RUS_Okunevo_BA 0.104±0.053
Yamnaya_RUS_Samara 0.432±0.027

chisq 27.189
tail prob 0.0072566
Full output

Both the Wusun and Saka are generally accepted to have been speakers of Indo-Iranian languages. So it's possible that the Shirenzigou nomads were Indo-Iranian speakers too, or at least derived from such peoples.

Surprisingly, NPL_Mebrak_2125BP was the key to obtaining the best statistical fits. This is a trio of samples, roughly contemporaneous with the Shirenzigou nomads, from a burial site high up in the Himalayas in what is now Nepal (see here).

To be honest, I'm not quite sure why the Himalayan ancients work so well in my models. Perhaps they're just a really good proxy for an Iron Age population from the northern part of the Tibetan Plateau? By the way, most of the Shirenzigou nomads made it into the latest Global25 datasheets (see here).

See also...

Almost everything you ever wanted to know about the Xiaohe-Gumugou cemeteries

The mystery of the Sintashta people

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, July 26, 2019

Afanasievo people may well have been proto-Tocharian speakers (Ning et al. 2019)


Update 17/08/2019: A surprising twist to the Shirenzigou nomads story

...

During the Early Bronze Age, around 2,900 BCE, a population associated with the Yamnaya archeological culture migrated from the Pontic-Caspian steppe in Eastern Europe deep into Asia, as far as the Minusinsk Basin in South Siberia.

This rapid, long-range expansion was likely to have been the first significant migration of a Yamnaya-related group far to the east of the Ural Mountains, and it resulted in the formation of the Afanasievo archeological culture (see here).

The appearance of Tocharian languages in the Tarim Basin, in what is now western China, is often associated with the Afanasievo culture, mainly because of the confirmed presence of European-related populations in the Tarim Basin during the Bronze Age, as well as the likely highly divergent position of the Tocharian node in the Indo-European language phylogeny.

But the Afanasievo people were separated by considerable distance in space and time from the Tocharians, and can't yet be reliably linked to them with archeological or genetic data. So even though the inference that the former are linguistically ancestral to the latter is quite plausible, it's far from certain.

However, thanks to a new paper at Current Biology by Ning et al., at least we now know that a population with significant Yamnaya/Afanasievo-related ancestry was living in the eastern Tian Shan Mountains just a few hundred years before Tocharian languages were attested nearby [LINK]. Below is the paper summary, emphasis is mine:

Recent studies of early Bronze Age human genomes revealed a massive population expansion by individuals related to the Yamnaya culture, from the Pontic Caspian steppe into Western and Eastern Eurasia, likely accompanied by the spread of Indo-European languages [1, 2, 3, 4, 5]. The south eastern extent of this migration is currently not known. Modern-day human populations from the Xinjiang region in northwestern China show a complex population history, with genetic links to both Eastern and Western Eurasia [6, 7, 8, 9, 10]. However, due to the lack of ancient genomic data, it remains unclear which source populations contributed to the Xinjiang population and what was the timing and the number of admixture events. Here, we report the first genome-wide data of 10 ancient individuals from northeastern Xinjiang. They are dated to around 2,200 years ago and were found at the Iron Age Shirenzigou site. We find them to be already genetically admixed between Eastern and Western Eurasians. We also find that the majority of the East Eurasian ancestry in the Shirenzigou individuals is related to northeastern Asian populations, while the West Eurasian ancestry is best presented by ∼20% to 80% Yamnaya-like ancestry. Our data thus suggest a Western Eurasian steppe origin for at least part of the ancient Xinjiang population. Our findings furthermore support a Yamnaya-related origin for the now extinct Tocharian languages in the Tarim Basin, in southern Xinjiang.


Ning et al., Ancient Genomes Reveal Yamnaya-Related Ancestry and a Potential Source of Indo-European Speakers in Iron Age Tianshan, Current Biology, July 25, 2019, DOI: https://doi.org/10.1016/j.cub.2019.06.044

See also...

It was always going to be this way

The mystery of the Sintashta people

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, July 17, 2019

Viking invasion at bioRxiv


A new preprint featuring hundreds of Viking Age genomes has appeared at bioRxiv [LINK]. Titled Population genomics of the Viking world, it looks like a solid effort overall, although I'm skeptical about its conclusions. I might elaborate on that in the comments below, but I'll have a lot more to say on the topic if and when I get to check out the ancient genomes with my own tools. Details about the new samples, including their Y-chromosome haplogroup assignments, are available here. Below is the abstract, emphasis is mine:

The Viking maritime expansion from Scandinavia (Denmark, Norway, and Sweden) marks one of the swiftest and most far-flung cultural transformations in global history. During this time (c. 750 to 1050 CE), the Vikings reached most of western Eurasia, Greenland, and North America, and left a cultural legacy that persists till today. To understand the genetic structure and influence of the Viking expansion, we sequenced the genomes of 442 ancient humans from across Europe and Greenland ranging from the Bronze Age (c. 2400 BC) to the early Modern period (c. 1600 CE), with particular emphasis on the Viking Age. We find that the period preceding the Viking Age was accompanied by foreign gene flow into Scandinavia from the south and east: spreading from Denmark and eastern Sweden to the rest of Scandinavia. Despite the close linguistic similarities of modern Scandinavian languages, we observe genetic structure within Scandinavia, suggesting that regional population differences were already present 1,000 years ago. We find evidence for a majority of Danish Viking presence in England, Swedish Viking presence in the Baltic, and Norwegian Viking presence in Ireland, Iceland, and Greenland. Additionally, we see substantial foreign European ancestry entering Scandinavia during the Viking Age. We also find that several of the members of the only archaeologically well-attested Viking expedition were close family members. By comparing Viking Scandinavian genomes with present-day Scandinavian genomes, we find that pigmentation-associated loci have undergone strong population differentiation during the last millennia. Finally, we are able to trace the allele frequency dynamics of positively selected loci with unprecedented detail, including the lactase persistence allele and various alleles associated with the immune response. 
We conclude that the Viking diaspora was characterized by substantial foreign engagement: distinct Viking populations influenced the genomic makeup of different regions of Europe, while Scandinavia also experienced increased contact with the rest of the continent.

Margaryan et al., Population genomics of the Viking world, bioRxiv, posted July 17, 2019, doi: https://doi.org/10.1101/703405

See also...

They came, they saw, and they mixed

Who were the people of the Nordic Bronze Age?

Asiatic East Germanics

Monday, July 15, 2019

Asiatic East Germanics


Around a third of the ancient individuals in my dataset associated with East Germanic-speaking cultures show obvious ancestry from Central and/or West Asia.

This shouldn't be too surprising, considering, for instance, the well-documented contacts between East Germanic tribes and the Avars, Huns, Sarmatians and other nomadic groups that streamed into Europe from the Asian steppes during the Migration Period. It's a topic that I've raised before at this blog (see here).

But the curious thing is that very little, if any, of this ancestry has percolated down to present-day Europeans.

The easiest way to show this is with a Principal Component Analysis (PCA) based on my Global25 data. The relevant PCA datasheet can be downloaded here. Basic details about the ancient samples in the analysis are available here.

Some of the Northeastern European populations, particularly the Uralic speakers, appear to be attracted to the Hunnic cluster. However, this is mostly an artifact of pre-Migration Period east to west population expansions in the far north of Europe, probably including those of the Proto-Uralians (see here).

So how is it that, despite ruling over vast areas of Europe for hundreds of years, the East Germanics appear not to have contributed significantly to the present-day European gene pool? My theory is that, much like the Avars and Huns, they were militarily and demographically overwhelmed by the ascending groups around them, such as the Slavs, and they simply went extinct.

To wrap things up, here's a basic qpAdm mixture model designed to test for Hunnic-related ancestry in a few Eastern and Northern European populations of interest. Note the significant slice of this type of ancestry in the likely early Goths of the Chernyakhiv culture. Is it real? Feel free to share your thoughts in the comments below.

UKR_Chernyakhiv
DEU_MA 0.863±0.038
Hun_Tian_Shan 0.137±0.038
chisq 12.525
tail prob 0.325466
Full output

Swedish
Baltic_EST_IA 0.126±0.078
DEU_MA 0.849±0.073
Hun_Tian_Shan 0.025±0.020
chisq 8.338
tail prob 0.595877
Full output

Ukrainian
Baltic_EST_IA 0.121±0.064
DEU_MA 0.857±0.060
Hun_Tian_Shan 0.022±0.017
chisq 11.458
tail prob 0.322956
Full output

Estonian
Baltic_EST_IA 0.597±0.069
DEU_MA 0.373±0.064
Hun_Tian_Shan 0.030±0.017
chisq 15.739
tail prob 0.107361
Full output
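One rough way to approach the "Is it real?" question is to divide each Hun_Tian_Shan coefficient by its standard error. The Python sketch below does that with the numbers from the four models above. The |z| ≥ 2 cutoff is a rule-of-thumb assumption (roughly a two-sided 5% level under a normal approximation), not a formal test reported by qpAdm.

```python
# Hun_Tian_Shan coefficients and standard errors, copied from the qpAdm models above.
hun_coefficients = {
    "UKR_Chernyakhiv": (0.137, 0.038),
    "Swedish": (0.025, 0.020),
    "Ukrainian": (0.022, 0.017),
    "Estonian": (0.030, 0.017),
}

Z_CUTOFF = 2.0  # ASSUMED rule of thumb, not part of the qpAdm output


def z_scores(coefficients):
    """Return {population: estimate / standard error}."""
    return {pop: est / se for pop, (est, se) in coefficients.items()}


for pop, z in z_scores(hun_coefficients).items():
    verdict = "at least 2 SE from zero" if abs(z) >= Z_CUTOFF else "within 2 SE of zero"
    print(f"{pop}: z = {z:.2f} ({verdict})")
```

Only the Chernyakhiv coefficient (about 3.6 standard errors from zero) clears that bar; the present-day populations sit between about 1.25 and 1.8.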

See also...

Conan the Barbarian probably belonged to Y-haplogroup R1a

More on the association between Uralic expansions and Y-haplogroup N

Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

Friday, July 12, 2019

Getting the most out of the Global25


The first thing you need to know about the Global25 is that I update the relevant datasheets regularly, usually every few weeks, but they're always at these links:

Global25 datasheet (scaled)

Global25 pop averages (scaled)

Global25 datasheet

Global25 pop averages

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.

So if you'd like to know more details about the samples, try searching for their individual codes via a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.
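If you'd like your own population averages from the individual-level datasheet, for instance after regrouping samples under different population codes, here's a minimal Python sketch. It assumes each row is a Population:Individual label followed by comma-separated coordinates, with no header row; check that against the file you download. It's one plausible way to build averages, not necessarily how the published pop averages files are produced.

```python
import csv
import io


def population_averages(datasheet_text):
    """Average the coordinates of all rows sharing a population code (the part before ':')."""
    sums, counts = {}, {}
    for row in csv.reader(io.StringIO(datasheet_text)):
        if not row:
            continue  # skip blank lines
        label, *coords = row
        pop = label.split(":", 1)[0]
        values = [float(v) for v in coords]
        if pop not in sums:
            sums[pop] = [0.0] * len(values)
            counts[pop] = 0
        sums[pop] = [s + v for s, v in zip(sums[pop], values)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}


# Toy rows with hypothetical codes and only two coordinates instead of 25.
toy = """POP_A:ind1,0.10,0.20
POP_A:ind2,0.30,0.40
POP_B:ind3,-0.10,0.00
"""
print(population_averages(toy))
```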

The main purpose of the Global25 is to provide data for mixture modeling. In other words, for estimating ancestry proportions, both ancient and modern (see here). This can be done on your computer with the R program and the nMonte R script, or online with the Global25 nMonte Runner, which I discuss below.
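For a rough idea of the kind of search involved, here's a simplified Python sketch of a slot-swapping Monte Carlo fit. It's based on the general idea rather than ported from nMonte.R, and it leaves out any extra options the real script may have. A fixed number of slots is filled with randomly chosen references, and a random slot is swapped for a random reference whenever that brings the averaged coordinates closer to the target. Each reference's proportion is the share of slots it ends up holding. The source names and coordinates are toy values.

```python
import math
import random


def slot_monte_carlo_fit(target, sources, n_slots=200, n_iter=20000, seed=1):
    """Estimate mixture proportions by randomly swapping the contents of a fixed pool of slots.

    A swap is kept only if it moves the average of the slot coordinates closer to the target.
    """
    rng = random.Random(seed)
    names = list(sources)
    dims = len(target)
    slots = [rng.choice(names) for _ in range(n_slots)]
    # Running sum of the coordinates held by the slots, so each swap is cheap to evaluate.
    total = [sum(sources[name][d] for name in slots) for d in range(dims)]

    def distance(tot):
        return math.dist(target, [v / n_slots for v in tot])

    best = distance(total)
    for _ in range(n_iter):
        i = rng.randrange(n_slots)
        old, new = slots[i], rng.choice(names)
        if new == old:
            continue
        candidate = [t - sources[old][d] + sources[new][d] for d, t in enumerate(total)]
        dist = distance(candidate)
        if dist < best:  # greedy: keep only improvements
            slots[i], total, best = new, candidate, dist
    proportions = {name: slots.count(name) / n_slots for name in names}
    return proportions, best


# Toy example with hypothetical sources; the target is exactly 70% source_A and 30% source_B.
sources = {"source_A": [0.2, -0.1, 0.05], "source_B": [-0.3, 0.2, 0.1]}
target = [0.05, -0.01, 0.065]
proportions, dist = slot_monte_carlo_fit(target, sources)
print(proportions, f"distance={dist:.5f}")
```

The number of slots sets the resolution of the estimates (1/200 = 0.5% here), which is one reason such outputs are usually reported as rounded percentages.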

If you don't have R installed on your computer, you can get it here, while nMonte is available here. For this tutorial please download nMonte and nMonte3, and store them in your main working folder (usually My Documents).

Once you have R set up, make sure its working directory is the same folder where you stored nMonte. You can check it by typing getwd() into the R console, and change it with setwd() or, in the Windows version of R, by clicking on "File" and then "Change dir". Additionally, you'll need two nMonte input files in the working directory named "data.txt" and "target.txt". Examples of these files are available here. We'll be using them to test the ancient ancestry proportions of a sample set from present-day England.

Before you can begin the analysis, you first need to load the nMonte script by typing or copy-pasting source('nMonte.R') into the R console window and then hitting "enter" on your keyboard. This is what you should see in the R console window afterwards.


To start the mixture modeling process, type or copy paste getMonte('data.txt', 'target.txt') into the R console window, hit "enter", and wait for the results. After a short time, probably less than a minute or two, you should see this output.


The data and target files contain population averages, and, as you can see, the results that these population averages produced were in line with what one would expect from such a model focusing on the genetic shifts in Northern Europe during the Late Neolithic. Very similar ancient ancestry proportions have been reported for the English and other Northern Europeans recently in scientific literature.

However, when focusing on exceptionally fine-scale genetic variation that isn't reflected too well in the Global25 population averages, a more effective strategy might be to use multiple individuals from each reference population and let nMonte3 aggregate and average the inferred ancestry proportions.

This is often the case when attempting to model ancestry proportions for more recent periods, such as the Middle Ages. So let's try this with the English sample set using a modified data file, which is available here.

Replace the old data file with the new one in your working directory, and, like before, copy paste into the R console window the following two commands, hitting "enter" after each one: source('nMonte3.R') and getMonte('data.txt', 'target.txt'). This is what you should eventually see.


It's difficult to say how accurate these estimates are. But they look more or less correct considering the limited and less than ideal reference samples. For instance, the individuals labeled SWE_Viking_Age_Sigtuna are supposed to be stand-ins for Danish and Norwegian Vikings, but they're a relatively heterogeneous group from Sweden, possibly with some British or Irish ancestry, so they might be skewing the results.

However, I'll be adding many more ancient samples to the Global25 datasheets as they become available, including lots of new Vikings, which should greatly improve the accuracy of these sorts of fine-scale mixture models.
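The aggregation step described above, where nMonte3 works with individual reference samples and then reports results per population, can be pictured like this: the shares assigned to individuals with the same population code are summed. The Python sketch below is an illustration with hypothetical labels and weights, not nMonte3's code.

```python
from collections import defaultdict


def aggregate_by_population(individual_weights):
    """Sum per-individual mixture weights by population code (the label text before ':')."""
    totals = defaultdict(float)
    for label, weight in individual_weights.items():
        totals[label.split(":", 1)[0]] += weight
    # Largest contributions first, as in a typical results table.
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Hypothetical individual-level output; labels and weights are made up for illustration.
individual_weights = {
    "POP_A:ind1": 0.25,
    "POP_A:ind2": 0.15,
    "POP_B:ind3": 0.35,
    "POP_C:ind4": 0.20,
    "POP_C:ind5": 0.05,
}
for pop, share in aggregate_by_population(individual_weights).items():
    print(f"{pop}: {share:.1%}")
```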

An alternative to the R-based approach is the online Global25 nMonte Runner [LINK]. This is a free tool, and easy to work with via several drop-down menus, but users must become sponsors to unlock all of its available features. To run an analysis, follow these three steps:
1) use the first drop-down menu to pick the reference populations of your choice (up to four are allowed for free users)

2) move down to the second set of drop-down lists and either pick a test population that is already in the system or copy and paste a set of Global25 coordinates into the space labeled "Enter/Paste Sets of Coordinates - Scaled and Comma-separated"

3) feel free to experiment with the additional options if you're game and willing to part with a little cash to help pay for the site.


Another exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is freely available HERE. It also works offline once you've saved the web page to your computer. Just copy and paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen cap below shows me doing just that.


Keep in mind, however, that the Global25 is derived from a Principal Component Analysis (PCA), so it also makes good sense to use it for producing PCA graphs. To do this, just plot any combination of two or three of its Principal Components (PCs) to create 2D or 3D graphs, respectively. This can be done with a wide variety of programs, including PAST, which is freely available here.
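If you'd rather script your plots than use PAST, the first step is just reading the datasheet and picking two columns. Here's a minimal Python sketch, assuming each row is a label followed by comma-separated PC values with no header line; `parse_global25` and `xy_points` are hypothetical helpers, not part of any Global25 tooling.

```python
def parse_global25(text):
    """Parse rows of 'Label,PC1,PC2,...' into {label: [floats]}."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, *values = line.split(",")
        rows[label] = [float(v) for v in values]
    return rows


def xy_points(rows, pc_x=1, pc_y=2):
    """Return (label, x, y) for two PCs, numbered from 1 as in PC1, PC2, ..."""
    return [(label, coords[pc_x - 1], coords[pc_y - 1])
            for label, coords in rows.items()]
```

The resulting x and y values can be handed to any plotting library, for example matplotlib's `scatter`, to get the same kind of XY graph that PAST produces.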

To produce a 2D graph, open a Global25 datasheet in PAST, choose comma as the separator, highlight any two columns of data, click on the "Plot" tab and pick "XY graph" from the drop-down list. Below is a series of graphs that I created in exactly this way. I also color-coded the samples according to their geographic origins. This was done by ticking the "Row attributes" tab.


PAST can also be used to run PCA on subsets of the Global25 scaled data to produce remarkably accurate plots of fine-scale population structure. For instance, here's a plot based on present-day populations from north of the Alps, Balkans and Pyrenees.


To try this, create a new text file with your choice of populations from the Global25 scaled datasheet, open it with PAST and choose Multivariate > Ordination > Principal Components Analysis. I've already put together several datasheets limited to European, Northern European, West Eurasian and South Asian populations. They're available at the links below along with more details on how to run them with PAST.
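PAST does the PCA for you, but the underlying steps are simple enough to sketch: centre each column, build the covariance matrix, and extract its leading eigenvectors. Below is a stdlib-only Python illustration using power iteration with deflation. It is not PAST's implementation, it assumes a covariance-based PCA (PAST offers a choice of matrix here, as far as I know), and `pca_scores` is a hypothetical name.

```python
import math
import random


def pca_scores(rows, n_components=2, iters=1000, seed=0):
    """Project labelled coordinate rows onto their top principal components.

    Builds the covariance matrix of the mean-centred data and extracts
    components one at a time by power iteration with deflation.
    Component signs are arbitrary, as in any PCA.
    """
    labels = list(rows)
    data = [rows[label] for label in labels]
    n, p = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]

    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Random positive starting vector, normalized to unit length.
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this subspace
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                         for a in range(p))
        components.append(v)
        # Remove this component's variance before looking for the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(p)]
               for a in range(p)]

    return {label: [sum(x * c for x, c in zip(row, comp)) for comp in components]
            for label, row in zip(labels, centred)}
```

Power iteration converges slowly when two eigenvalues are nearly equal, so for serious work a proper eigen-solver such as NumPy's `numpy.linalg.eigh` is the better choice; the point here is only to show what the PCA step does to a subset of Global25 rows.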

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

And if you're fond of tree-like structures as a means to describe fine-scale genetic variation, please check out this blog post...

Global25 workshop 4: a neighbour joining tree

Wednesday, July 10, 2019

Global25 workshop 4: a neighbour joining tree


Phylogenetic trees are easy to produce, but there are countless ways to run them, and, depending on the input data you're using, some methods are a lot more effective than others. In this tutorial I'm going to demonstrate one method that has worked well for me when looking at the fine-scale genetic relationships between ancient and present-day human populations with my Global25 data.

To get started download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Clustering > Neighbour joining. Here's a screen cap of me doing just that...


Then, from the tabs on the right, choose Chord as the similarity index and MAR_Iberomaurusian, the most distinct unit in the datasheet, as the root. PAST offers an exceptionally large range of similarity indices and they generally produce similar results, but, in my experience, Chord produces some of the most visually pleasing results when dealing with fine-scale genetic substructures.
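As I understand the PAST documentation, the Chord distance is simply the Euclidean distance between two samples after each coordinate vector has been scaled to unit length. Here's a minimal Python sketch under that assumption; `chord_distance` and `distance_matrix` are hypothetical helpers.

```python
import math


def chord_distance(a, b):
    """Euclidean distance between two vectors after scaling each to unit length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return math.sqrt(sum((x / na - y / nb) ** 2 for x, y in zip(a, b)))


def distance_matrix(rows, metric=chord_distance):
    """Pairwise distances between labelled rows, returned with the label order."""
    labels = list(rows)
    return labels, [[metric(rows[i], rows[j]) for j in labels] for i in labels]
```

Because the scaling discards vector length, Chord compares the direction of each sample from the PCA origin rather than how far out it sits, which is worth keeping in mind when interpreting the tree. An all-zero coordinate row would cause a division by zero.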


This is the tree you should see after exporting the image via the graph settings tab in PAST and, if you like, rotating it 90 degrees with the image editing software of your choice. Note the fairly substantial differences between the populations from Northwestern Europe, which are often difficult to tease apart in such analyses.
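For reference, neighbour joining itself is a short algorithm: repeatedly join the pair of nodes that minimizes the Q criterion, work out branch lengths to the new node, and update the distances. Below is a textbook Saitou and Nei version in Python, not PAST's code, and `neighbor_joining` is a hypothetical name. It returns an unrooted tree in Newick format, so rooting on an outgroup such as MAR_Iberomaurusian would be a separate step.

```python
def neighbor_joining(labels, dist):
    """Build an unrooted neighbour-joining tree and return it as Newick text.

    `labels` are unique taxon names and `dist` is a symmetric matrix of
    pairwise distances in the same order.
    """
    if len(labels) < 3:
        raise ValueError("need at least three taxa")
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising the Q criterion (first one wins on ties).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to their new parent.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        new = f"({a}:{la:.4f},{b}:{lb:.4f})"
        d[new] = {}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Join the last three nodes to a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"
```

Labels containing colons, commas or parentheses would need cleaning first, since they clash with Newick syntax, and on real, non-additive data neighbour joining can produce small negative branch lengths.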


If you have your own Global25 coordinates, you can add them to my PAST-compatible datasheet to see where you cluster in this tree. And, of course, you can design your own PAST-compatible datasheets and trees with any combination of populations and/or individuals from the Global25 text files at the links below. It's easy: just copy and paste the coordinates of your choice into an empty text file, open it with PAST and then save it with the .dat extension to create a new PAST datasheet. But make sure never to mix scaled and non-scaled coordinates.

Global25 datasheet (scaled)

Global25 pop averages (scaled)

Global25 datasheet

Global25 pop averages

An important point to keep in mind when running these sorts of analyses is that PAST and other such programs need enough genetic differentiation to latch onto in order to produce meaningful results. Thus, even when studying the relationships between very closely related populations, it's useful to include not just a root population or individual but also some closely and distantly related groups to help the analysis algorithm flesh out the key genetic substructures.

To be honest, I don't really know whether using the Chord index and rooting the tree with MAR_Iberomaurusian is the best way to run a neighbour joining tree analysis of ancient and present-day West Eurasian genetic variation. What do you think? Feel free to let me know in the comments.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)