Monday, July 15, 2019

Asiatic East Germanics

Around a third of the ancient individuals in my dataset associated with East Germanic-speaking cultures show obvious ancestry from Central and/or West Asia.

This shouldn't be too surprising, considering, for instance, the well documented contacts between East Germanic tribes and the Avars, Huns, Sarmatians and other nomadic groups that streamed into Europe from the Asian steppes during the Migration Period. It's a topic that I've raised before on this blog (see here).

But the curious thing is that very little, if any, of this ancestry has percolated down to present-day Europeans.

The easiest way to show this is with a Principal Component Analysis (PCA) based on my Global25 data. The relevant PCA datasheet can be downloaded here. Basic details about the ancient samples in the analysis are available here.

Some of the Northeastern European populations, particularly the Uralic speakers, appear to be attracted to the Hunnic cluster. However, this is mostly an artifact of pre-Migration Period east to west population expansions in the far north of Europe, probably including those of the Proto-Uralians (see here).

So how is it that, despite ruling over vast areas of Europe for hundreds of years, the East Germanics appear not to have contributed significantly to the present-day European gene pool? My theory is that, much like the Avars and Huns, they were militarily and demographically overwhelmed by the ascending groups around them, such as the Slavs, and they simply went extinct.

To wrap things up, here's a basic qpAdm mixture model designed to test for Hunnic-related ancestry in a few Eastern and Northern European populations of interest. Note the significant slice of this type of ancestry in the likely early Goths of the Chernyakhiv culture. Is it real? Feel free to share your thoughts in the comments below.

DEU_MA 0.863±0.038
Hun_Tian_Shan 0.137±0.038
chisq 12.525
tail prob 0.325466
Full output

Baltic_EST_IA 0.126±0.078
DEU_MA 0.849±0.073
Hun_Tian_Shan 0.025±0.020
chisq 8.338
tail prob 0.595877
Full output

Baltic_EST_IA 0.121±0.064
DEU_MA 0.857±0.060
Hun_Tian_Shan 0.022±0.017
chisq 11.458
tail prob 0.322956
Full output

Baltic_EST_IA 0.597±0.069
DEU_MA 0.373±0.064
Hun_Tian_Shan 0.030±0.017
chisq 15.739
tail prob 0.107361
Full output
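As a rough sanity check on the Hunnic-related coefficients above, each estimate can be divided by its standard error to get an approximate Z-score. This is just a minimal Python sketch using the figures listed above; the |Z| >= 3 rule of thumb is an assumption made for illustration and isn't part of the qpAdm output.

```python
# Hun_Tian_Shan coefficients and standard errors, in the order the
# four models are listed above.
hun_estimates = [
    (0.137, 0.038),
    (0.025, 0.020),
    (0.022, 0.017),
    (0.030, 0.017),
]


def z_score(estimate, std_err):
    """Approximate Z-score: how many standard errors the estimate is from zero."""
    return estimate / std_err


z_scores = [round(z_score(est, se), 2) for est, se in hun_estimates]

for (est, se), z in zip(hun_estimates, z_scores):
    # Assumed rule of thumb: |Z| >= 3 is treated as clearly non-zero.
    verdict = "clearly non-zero" if abs(z) >= 3 else "not clearly non-zero"
    print(f"{est:.3f} +/- {se:.3f}  Z = {z:.2f}  ({verdict})")
```

Only the first model listed comes out with a Z-score above 3 (about 3.6). In the other three, the Hunnic-related coefficient is within two standard errors of zero.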

See also...

Conan the Barbarian probably belonged to Y-haplogroup R1a

More on the association between Uralic expansions and Y-haplogroup N

Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

Friday, July 12, 2019

Getting the most out of the Global25

The first thing you need to know about the Global25 is that I update the relevant datasheets regularly, usually every few weeks, but they're always at these links:

Global25 datasheet (scaled)

Global25 pop averages (scaled)

Global25 datasheet

Global25 pop averages

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.

So if you'd like to know more details about the samples try searching for their individual codes via a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.

The main purpose of the Global25 is to provide data for mixture modeling, in other words, for estimating ancestry proportions, both ancient and modern (see here). This can be done on your computer with the R program and the nMonte R script, or online with the Global25 nMonte Runner, which I discuss below.

If you don't have R installed on your computer, you can get it here, while nMonte is available here. For this tutorial please download nMonte and nMonte3, and store them in your main working folder (usually My Documents).

Once you have R set up, make sure its working directory is the same place where you stored nMonte. You can check this in R by clicking on "File" and then "Change dir". Additionally, you'll need two nMonte input files in the working directory titled "data" and "target". Examples of these files are available here. We'll be using them to test the ancient ancestry proportions of a sample set from present-day England.

Before you can begin the analysis you need to first call the nMonte script by typing or copy pasting source('nMonte.R') into the R console window, and then hitting "enter" on your keyboard. This is what you should see in the R console window afterwards.

To start the mixture modeling process, type or copy paste getMonte('data.txt', 'target.txt') into the R console window, hit "enter", and wait for the results. After a short time, probably less than a minute or two, you should see this output.

The data and target files contain population averages, and, as you can see, the results that these population averages produced were in line with what one would expect from such a model focusing on the genetic shifts in Northern Europe during the Late Neolithic. Very similar ancient ancestry proportions have been reported for the English and other Northern Europeans recently in the scientific literature.
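To give a feel for what this kind of mixture modeling does, here's a minimal Python sketch of a Monte Carlo search: it fills a fixed number of slots with reference populations and keeps any swap that brings the average of the slots closer to the target. It is not the nMonte script itself, whose internals aren't described in this post. The function name, the toy two-dimensional coordinates and the parameters are all hypothetical, and it's written in Python rather than R purely for illustration.

```python
import math
import random


def monte_carlo_mixture(references, target, slots=100, iterations=5000, seed=1):
    """Greedy Monte Carlo fit of a target as a mixture of reference averages.

    references: dict mapping a population name to a list of coordinates
    target:     list of coordinates with the same length
    Returns (proportions, distance), where proportions sum to 1.
    """
    rng = random.Random(seed)
    names = sorted(references)
    dims = len(target)

    # Start from a random assignment of reference populations to slots.
    assignment = [rng.choice(names) for _ in range(slots)]
    # Running coordinate sums over all slots.
    total = [sum(references[name][d] for name in assignment) for d in range(dims)]

    def distance(sums):
        # Euclidean distance between the slot average and the target.
        return math.sqrt(sum((s / slots - t) ** 2 for s, t in zip(sums, target)))

    best = distance(total)
    for _ in range(iterations):
        slot = rng.randrange(slots)
        old, new = assignment[slot], rng.choice(names)
        if new == old:
            continue
        trial = [s - references[old][d] + references[new][d]
                 for d, s in enumerate(total)]
        trial_distance = distance(trial)
        # Keep the swap only if it moves the mixture closer to the target.
        if trial_distance < best:
            assignment[slot], total, best = new, trial, trial_distance

    proportions = {name: assignment.count(name) / slots for name in names}
    return proportions, best


# Toy example: the target is a 25/50/25 blend of three made-up references.
toy_references = {
    "Ref_A": [0.0, 0.0],
    "Ref_B": [1.0, 0.0],
    "Ref_C": [0.0, 1.0],
}
toy_target = [0.5, 0.25]
toy_proportions, toy_distance = monte_carlo_mixture(toy_references, toy_target)
print(toy_proportions, round(toy_distance, 4))
```

With 100 slots the proportions can only move in steps of 1%, so the slot count sets the resolution of the estimate in this sketch.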

However, when focusing on exceptionally fine-scale genetic variation that isn't reflected too well in the Global25 population averages, a more effective strategy might be to use multiple individuals from each reference population and let nMonte3 aggregate and average the inferred ancestry proportions.
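The aggregation step can be pictured as summing the per-individual proportions under their shared population code. Here's a minimal Python sketch, assuming that each label is a population code followed by a colon and an individual code. That label format, the function name and the example numbers are illustrative assumptions, not nMonte3's actual code or output.

```python
from collections import defaultdict


def aggregate_by_population(individual_proportions, separator=":"):
    """Sum per-individual ancestry proportions by population code.

    Assumes labels look like 'Population:Individual' (an assumption made
    for this sketch); everything before the first separator is treated
    as the population code.
    """
    totals = defaultdict(float)
    for label, share in individual_proportions.items():
        population = label.split(separator, 1)[0]
        totals[population] += share
    return dict(totals)


# Made-up per-individual proportions for two hypothetical populations.
example_individuals = {
    "Pop_A:ind1": 0.20,
    "Pop_A:ind2": 0.15,
    "Pop_B:ind1": 0.40,
    "Pop_B:ind2": 0.25,
}
aggregated = aggregate_by_population(example_individuals)
print(aggregated)
```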

This is often the case when attempting to model ancestry proportions for more recent periods, such as the Middle Ages. So let's try this with the English sample set using a modified data file, which is available here.

Replace the old data file with the new one in your working directory, and, like before, copy paste into the R console window the following two commands, hitting "enter" after each one: source('nMonte3.R') and getMonte('data.txt', 'target.txt'). This is what you should eventually see.

It's difficult to say how accurate these estimates are. But they look more or less correct considering the limited and less than ideal reference samples. For instance, the individuals labeled SWE_Viking_Age_Sigtuna are supposed to be stand-ins for Danish and Norwegian Vikings, but they're a relatively heterogeneous group from Sweden, possibly with some British or Irish ancestry, so they might be skewing the results.

However, I'll be adding many more ancient samples to the Global25 datasheets as they become available, including lots of new Vikings, which should greatly improve the accuracy of these sorts of fine-scale mixture models.

An alternative to the R-based approach is the online Global25 nMonte Runner [LINK]. This is a free tool, and easy to work with via several drop-down menus, but users must become sponsors to unlock all of its available features. To run an analysis follow these three steps:

1) use the first drop-down menu to pick the reference populations of your choice (up to four are allowed for free users)

2) move down to the second set of drop-down lists and either pick a test population that is already in the system or copy paste a set of Global25 coordinates into the space labeled "Enter/Paste Sets of Coordinates - Scaled and Comma-separated"

3) feel free to experiment with the additional options if you're game and willing to part with a little cash to help pay for the site.

However, it's important to note that the Global25 is a Principal Component Analysis (PCA), so it makes good sense to also use it for producing PCA graphs. To do this just plot any combination of two or three of its Principal Components (PCs) to create 2D or 3D graphs, respectively. This can be done with a wide variety of programs, including PAST, which is freely available here.

To produce a 2D graph, open a Global25 datasheet in PAST, choose comma as the separator, highlight any two columns of data, click on the "Plot" tab and, from the drop down list, pick "XY graph". Below is a series of graphs that I created in exactly this way. I also color coded the samples according to their geographic origins. This was done by ticking the "Row attributes" tab.
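The same column selection can be done outside PAST. Below is a minimal Python sketch that pulls any two PCs out of comma-separated rows and groups the points by population code, ready to be handed to a plotting tool. The row layout (a label followed by the PC values), the "Population:Individual" label format, the function name and the sample rows are assumptions made for illustration, and the toy rows only have three PCs instead of 25.

```python
import csv
import io
from collections import defaultdict


def pc_pairs_by_population(text, pc_x=1, pc_y=2):
    """Return {population: [(x, y), ...]} for two chosen PCs (1-based).

    Assumes each row is 'label,PC1,PC2,...' and that the label is
    'Population:Individual'. Rows that can't be parsed as numbers,
    such as a header line, are skipped.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        label, values = row[0], row[1:]
        try:
            x = float(values[pc_x - 1])
            y = float(values[pc_y - 1])
        except (ValueError, IndexError):
            continue
        population = label.split(":", 1)[0]
        groups[population].append((x, y))
    return dict(groups)


# Made-up rows in the assumed layout.
sample_sheet = (
    ",PC1,PC2,PC3\n"
    "Pop_A:ind1,0.10,0.02,-0.01\n"
    "Pop_A:ind2,0.12,0.03,0.00\n"
    "Pop_B:ind1,-0.05,0.08,0.02\n"
)
pc1_vs_pc3 = pc_pairs_by_population(sample_sheet, pc_x=1, pc_y=3)
print(pc1_vs_pc3)
```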

PAST can also be used to run PCA on subsets of the Global25 scaled data to produce remarkably accurate plots of fine-scale population structure. To try this create a new text file with your choice of populations from the Global25 scaled datasheet, open it with PAST and choose Multivariate > Ordination > Principal Components Analysis. I've already put together several datasheets limited to European, Northern European, West Eurasian and South Asian populations. They're available at the links below along with more details on how to run them with PAST.

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

And if you're fond of tree-like structures as a means to describe fine-scale genetic variation, please check out this blog post...

Global25 workshop 4: a neighbour joining tree

Wednesday, July 10, 2019

Global25 workshop 4: a neighbour joining tree of ancient and present-day West Eurasian genetic variation

Phylogenetic trees are easy to produce, but there's an infinite number of ways to run them, and, depending on the input data you're using, some methods are a lot more effective than others. In this tutorial I'm going to demonstrate one method that has worked well for me when looking at the fine scale genetic relationships between ancient and present-day human populations with my Global25 data.

To get started download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Clustering > Neighbour joining. Here's a screen cap of me doing just that...

Then, from the tabs on the right, choose Chord as the similarity index and MAR_Iberomaurusian, the most distinct unit in the datasheet, as the root. PAST offers an exceptionally large range of similarity indices and they generally produce similar results, but, in my experience, Chord creates among the most visually pleasing outcomes when dealing with fine scale genetic substructures.
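For reference, the chord distance is commonly defined as the Euclidean distance between two vectors after each has been scaled to unit length. The Python sketch below follows that textbook definition. It's assumed here to match what PAST calls the Chord index, and the example vectors are made up.

```python
import math


def chord_distance(a, b):
    """Euclidean distance between two vectors after scaling each to unit length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("chord distance is undefined for a zero vector")
    return math.sqrt(sum((x / norm_a - y / norm_b) ** 2 for x, y in zip(a, b)))


# Same direction, different length: distance is 0.
same_direction = chord_distance([1.0, 0.0], [2.0, 0.0])
# Perpendicular vectors: distance is sqrt(2).
perpendicular = chord_distance([1.0, 0.0], [0.0, 1.0])
# Opposite directions: the maximum possible distance, 2.
opposite = chord_distance([1.0, 0.0], [-3.0, 0.0])
print(same_direction, perpendicular, opposite)
```

Under this definition only the direction of each coordinate vector matters, not its length.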

This is the tree you should see after exporting the image via the graph settings tab in PAST, and, if you like, rotating it 90 degrees with image editing software of your choice. Note the fairly substantial differences between the populations from Northwestern Europe, which are often difficult to tease apart in such analyses.

If you have your own Global25 coordinates you can add them to my PAST-compatible datasheet to see where you cluster in this tree. And, of course, you can design your own PAST-compatible datasheets and trees with any combination of populations and/or individuals from the Global25 text files at the links below. It's easy; just copy paste the coordinates of your choice into an empty text file, open it with PAST and then save it with the dat extension to create a new PAST datasheet. But make sure never to mix up the scaled and non-scaled coordinates.

Global 25 datasheet (scaled)

Global 25 pop averages (scaled)

Global 25 datasheet

Global 25 pop averages

An important point to keep in mind when running these sorts of analyses is that PAST and other such programs need enough genetic differentiation to latch onto in order to produce meaningful results. Thus, even when studying the relationships between very closely related populations, it's not just useful to include a root population or individual, but also some near and far related groups to help the analysis algorithm flesh out the key genetic substructures.

To be honest, I don't really know whether using the Chord index and rooting the tree with MAR_Iberomaurusian is the best way to run a neighbour joining tree analysis of ancient and present-day West Eurasian genetic variation. What do you think? Feel free to let me know in the comments.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)

Sunday, July 7, 2019

Open thread: How did steppe ancestry spread into the Biblical-era Levant?

It's likely that at least two of the Philistines from Feldman et al. 2019 harbor relatively recent steppe ancestry. They're labeled ASH067 and ASH068 in the paper. The former individual is a male who belongs to Y-chromosome haplogroup R1, which appears to be R1b-M269 judging by the data from the relevant BAM file.

This is just the second instance of Y-haplogroup R1 from the pre-Crusades Levant, and, of course, neither R1 nor R1b-M269 appears in the Near Eastern ancient DNA record prior to the expansions of the Yamnaya and other closely related pastoralist groups from the steppes and forest steppes of Eastern Europe.

So how did the Yamnaya-related ancestry spread into the Biblical-era Levant? Did it come via Anatolia, the Caucasus and/or the Mediterranean?

To try and answer this question I analyzed separately the genome-wide data for ASH067 and ASH068 with qpAdm, relying on outgroup and reference populations that weren't featured in the qpAdm runs in the Feldman et al. paper. I also limited the analyses to what were in my view the most proximate two- and three-way solutions in terms of chronology and geography.

The models with the best statistical fits, each labeled with their "tail probs", are available in a zip file here. From my experience with qpAdm, I'd say that the most useful models generally show comparably high tail probs but low chisq values and standard errors. Please note also that I discarded all of the models with at least one standard error higher than 0.2 and/or based on fewer than 100K SNPs.
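The discard rule described above is easy to express as a filter. Here's a minimal Python sketch of it. The record layout, the function name and the three example models (including their SNP counts) are hypothetical and only there to exercise the two thresholds; they aren't taken from the actual qpAdm runs.

```python
def keep_model(model, max_std_err=0.2, min_snps=100_000):
    """Apply the two discard rules: any standard error above 0.2, or
    fewer than 100K SNPs, means the model is dropped."""
    if model["snps"] < min_snps:
        return False
    return all(std_err <= max_std_err
               for _, std_err in model["coefficients"].values())


# Hypothetical models: coefficients are (estimate, standard error) pairs.
example_models = [
    {"name": "model_1", "snps": 250_000,
     "coefficients": {"Source_A": (0.25, 0.05), "Source_B": (0.75, 0.05)}},
    {"name": "model_2", "snps": 250_000,
     "coefficients": {"Source_A": (0.40, 0.25), "Source_B": (0.60, 0.25)}},
    {"name": "model_3", "snps": 80_000,
     "coefficients": {"Source_A": (0.30, 0.06), "Source_B": (0.70, 0.06)}},
]

kept_models = [m["name"] for m in example_models if keep_model(m)]
print(kept_models)
```

In this toy set, model_2 is dropped for its standard errors and model_3 for its SNP count.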

As far as I can see, these two are among the very best outcomes. Bell_Beaker_FRA are nine samples associated with the Bell Beaker culture (BBC) from what is now France. Interestingly, the BBC population was rich in Y-haplogroup R1b-M269.

Bell_Beaker_FRA 0.116±0.059
GRC_Minoan 0.507±0.111
Levant_ISR_Ashkelon_LBA 0.377±0.117
tail prob 0.530432
chisq 9.018

Bell_Beaker_FRA 0.237±0.044
GRC_Minoan 0.763±0.044
tail prob 0.943265
chisq 4.736

In my opinion, these models basically confirm that both ASH067 and ASH068 harbor Yamnaya-related ancestry. It's heavily diluted and minor, but it's there. Admittedly, even after looking over the qpAdm output several times, I'm still not quite sure how their ancestors acquired this ancestry. But for the time being, Mediterranean Europe appears to be the most plausible proximate source one way or another. Any thoughts about that? Feel free to share them in the comments below.

See also...

Evidence of European ancestry in the Philistines

R1b-M269 in the Bronze Age Levant

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Wednesday, July 3, 2019

Evidence of European ancestry in the Philistines

The abstract below has just appeared at the European Nucleotide Archive (see here), so I'm guessing that the relevant paper and accompanying ancient genome-wide data will be published within weeks if not days. Emphasis is mine:

The ancient Mediterranean port-city of Ashkelon, identified as “Philistine” during the Iron Age, underwent a dramatic cultural change between the Late Bronze- and the early Iron- Age. It has been long debated whether this change was driven by a substantial movement of people, possibly linked to a larger migration of the so-called “Sea Peoples”. Here, we report genome-wide data of ten Bronze- and Iron- Age individuals from Ashkelon. We find that the early Iron Age population was genetically distinct due to a European related admixture. Interestingly, this genetic signal is no longer detectible in the later Iron Age population. Our results support that a migration event occurred during the Bronze- to Iron- Age transition in Ashkelon but did not leave a long-lasting genetic signature.

Update 4/7/2019: The paper is now available at Science Advances [LINK]. One of the Ashkelon ancients, who also shows a relatively high level of European ancestry, belongs to Y-Chromosome haplogroup R1 (probably R1b-M269). I've updated my Global25 datasheets with the new samples. Look for the Levant_ISR_Ashkelon prefix. Same links as always...

Global 25 datasheet (scaled)

Global 25 pop averages (scaled)

Global 25 datasheet

Global 25 pop averages

This is how they cluster in my Principal Component Analysis (PCA) of ancient West Eurasian genetic variation. The relevant datasheet is available here. Based on these results, it's tempting to think that the European ancestry in the Philistines may have been of Greek provenance. But keep in mind that this is just a two dimensional view and a simplification of reality. I'll have more to say about the ancestry of these individuals and the origins of the Philistines in future blog posts.

See also...

Five foot Philistines

How did steppe ancestry spread into the Biblical-era Levant?

Monday, July 1, 2019

Almost everything you ever wanted to know about the Xiaohe-Gumugou cemeteries

I'm reading an interesting and very comprehensive new archeological thesis about the Tarim Basin mummies. It's freely available via Uppsala University's DiVA portal here:

Shifting Memories: Burial Practices and Cultural Interaction in Bronze Age China: A study of the Xiaohe-Gumugou cemeteries in the Tarim Basin

The author, Yunyun Yang, has some suggestions for the future direction of research on the topic:

1. Analysis of Y chromosomal DNA on the males from 4th-1st layers of the Xiaohe cemetery: it is not clear if they were genetically distinct from the Afanasievo (and Yamnaya) males, and consistent to the Andronovo males.

2. More research on ancient DNA of the six males buried in type I the sun-radiating-spokes graves: the six males were so different in the Gumugou cemetery, and we don't know who they were. In this study, it has been suggested that they came from the parallel Andronovo horizon, and preserved some of their original social identities.

3. Analysis of the white sticky materials painted on the dead’s hair, faces, and bodies: it is not clear what this material is. It might be application of dairy/milk products with some holy functions. And the interesting point is why the dead was painted on such materials, for holy reasons, and/or was embalmed that way for preventing decay of the dead bodies?

4. Research on the use of Ephedra plants: Ephedra twigs were common and important in both cemeteries. Were they related to the “Soma” in ancient India (Vedas) and/or “Haoma” in ancient Iran (Avesta)? Were the Ephedra twigs related to the body painting (whitish sticky materials painting on skins of the dead)? Was there a common use of Ephedra plant in more nomadic groups in the Eurasian Steppe?

5. Research on the comparisons between the Andronovo burials and the stone circular-kerbs with stone-pits in Xinjiang: a major obstacle to such research is the language barriers, with the material published in English, Chinese and Russian. Such research is, however, essential to understand the conjunction of the geographical areas, the expansion of nomadic groups, the spreading of horses and wagons (linked to the noble groups of the Shang Dynasty (1600-1046 BCE) in central China), the formation of the Silk Road in this area (till the expansion of Han Dynasty (206 BCE-220 CE)), the moving of Indo-Iranians, the expansion of Scythians (900 BCE-400 CE), etc.

I agree, but I'd also add that we need a good number of ancient Y-chromosome and genome-wide samples from across space and time in the Tarim Basin, including and especially from attested Tocharian-speaking communities. That's really the only way to figure out whether the Tarim Basin mummies belonged to the speakers of Indo-Iranian or Tocharian languages, and whether the latter were introduced into the region by migrants from the Afanasievo culture.


Yang, Yunyun, Shifting Memories: Burial Practices and Cultural Interaction in Bronze Age China: A study of the Xiaohe-Gumugou cemeteries in the Tarim Basin, URN: urn:nbn:se:uu:diva-386612

Update 2/7/2019: OK, it looks like there's a paper coming soon with Iron Age samples (~200 BCE) from eastern Xinjiang. As far as I know, this was likely to have been a Tocharian-speaking region at the time. In any case, BAM files for the samples have already been uploaded to the European Nucleotide Archive and the accompanying text suggests that they harbor Yamnaya-related ancestry (see here).

See also...

Another look at the ancient mtDNA from Xiaohe, Tarim Basin

On the doorstep of India

The mystery of the Sintashta people

Late PIE ground zero now obvious; location of PIE homeland still uncertain, but...

Friday, June 28, 2019

On the origin of the Gravettians (Bennett et al. 2019 preprint)

Over at bioRxiv at this LINK. No major surprises, as far as I can see. From the preprint, emphasis is mine:

The Gravettian technocomplex was present in Europe from more than 30,000 years ago until the Last Glacial Maximum, but the source of this industry and the people who manufactured it remain unsettled. We use genome-wide analysis of a ~36,000-year-old Eastern European individual (BuranKaya3A) from Buran-Kaya III in Crimea, the earliest documented occurrence of the Gravettian, to investigate relationships between population structures of Upper Palaeolithic Europe and the origin and spread of the culture. We show BuranKaya3A to be genetically close to both contemporary occupants of the Eastern European plain and the producers of the classical Gravettian of Central Europe 6,000 years later. These results support an Eastern European origin of an Early Gravettian industry practiced by members of a distinct population, who contributed ancestry to individuals from much later Gravettian sites to the west.


The mitochondrial haplogroup of BuranKaya3A was determined to belong to an early branch of the N lineage, N1.


In addition, the N1 of BuranKaya3A carries three of the eight mutations occurring prior to N1b, a rare haplogroup most highly concentrated in the Near East, yet appearing broadly from western Eurasia to Africa. The descendants of the N1b node include N1b2, currently found only in Somalia [22], and N1b1b, found in nearly 10% of Ashkenazi Jewish haplogroups [23]. These three mutations allow us to place BuranKaya3A on a lineage apart from that which has been proposed to later enter Europe from Anatolia during the Neolithic (N1a1a) [24]. Among ancient samples, the mitochondrial sequence of an 11,000-year-old Epipalaeolithic Natufian from the Levant (“Natufian9”) [25] is also a later derivative of this N1b branch.


From the reads mapping to the Y chromosome, six out of six Single Nucleotide Polymorphisms (SNPs) that overlap with diagnostic sites for Y-haplogroup BT all carry the derived allele, allowing a minimum assignment to BT, which has origins in Africa, with additional derived alleles suggesting an eventual placement of CT or C, found in Asia and the Epipalaeolithic Near East [25]. Additional ancestral alleles make an assignment of C1a2 or C1b, which appear in UP Europe [1], unlikely (see Table S3 for a summary and comparative placement of Palaeolithic Y-haplogroups, and Supplementary Data 1 for a complete list of Y diagnostic SNPs).

Bennett et al., The origin of the Gravettians: genomic evidence from a 36,000-year-old Eastern European, bioRxiv, posted June 28, 2019, doi:

Monday, June 24, 2019

Genetic substructures and adaptations in Lithuanians (Urnikyte et al. 2019)

Over at Scientific Reports at this LINK. Apparently, the genotype data from this paper will be available at figshare in just over three months (see here). Among other things, the paper makes some interesting points about the relationship between the genetic ancestry of Lithuanians and their language:

Partial genetic isolation of the Lithuanians is a possible explanation for the structure results observed. Until the late Middle Ages, the eastern Baltic region was one of the most isolated corners of Europe [27]. Moreover, after the fall of the Roman Empire in the 5th century, the eastern Baltic region was spared by the subsequent population movements of the Migration Period [26,28], which allowed the most archaic of all the living speaking Indo-European languages [1] to survive. Thus, Lithuanians could retain their cultural identity.

Urnikyte et al., Patterns of genetic structure and adaptive positive selection in the Lithuanian population from high-density SNP data, Scientific Reports volume 9, Article number: 9163 (2019), DOI:

See also...

Fresh off the sledge

Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

It was always going to be this way

Inferring the linguistic affinity of long dead and non-literate peoples: a multidisciplinary approach

Saturday, June 15, 2019

Not Bell Beaker, not Corded Ware, but...the SGBR complex

I'd be very grateful if someone could explain to me what this new paper at the Proceedings of the Prehistoric Society journal was actually about.


Furholt, Martin, Re-integrating Archaeology: A Contribution to aDNA Studies and the Migration Discourse on the 3rd Millennium BC in Europe, Proceedings of the Prehistoric Society, Published online: 10 June 2019, DOI:

Sunday, June 9, 2019

Genetic continuity across the millennia in central Poland

Apparently, ancient DNA and anthropological research on the populations of what is now central Poland suggests strong genetic continuity in the region since the Neolithic or even Mesolithic. Science in Poland has a news feature about the soon to be published study (see here). Below are a few quotes from the article. Emphasis is mine:

How were the people in Poland changing over the centuries, from the early Middle Ages to the 19th century? Did the Slavs migrate to our territories, or are they indigenous? The 3D scanning project and digital access to skulls, skeletons and DNA from human remains from central Poland is expected to help answer these questions.


Research shows that the shape of the cerebral part of the skull has changed over the centuries - people in the early Middle Ages had more elongated heads. This interesting phenomenon has not been fully explained yet. "There are many theories on this subject, but it is not known whether this was a microevolutionary genetic change, or perhaps an environmentally conditioned one, associated with a reconstruction of the skull as a consequence of the chewing apparatus being relieved" - he adds.

Researchers are also trying to assess the level of diversity of the population living in the territory of present-day Poland during that period and whether migrants from other areas of Europe, for example from Scandinavia, appeared here. "There is the topic of participation of Scandinavian groups in the creation of the Polish State. Such groups indeed penetrated Poland, they could be hired warriors. But I think that, for example, we can probably put aside the hypothesis that Mieszko I was Scandinavian" - the researcher says.

The features, the variability of which anthropologists study, include the height of the body. We already know that, for example, people in the early Middle Ages in Poland were relatively tall, similar to Poles in the 1960s. Later there was a clear decline in body height, lasting until the 19th century.


There are already first conclusions from the research of the team from the Biobank Laboratory and the Department of Anthropology. The researchers believe that in the case of the population living in Kujawy there was a surprisingly strong genetic continuity, dating back to the time of the first farmers, 7.5 thousand years ago.

"It seems that we are dealing with an interesting genetic continuation in the population living in Kujawy from the early Middle Ages to the 19th century. The roots of these populations probably reach the Neolithic, perhaps even the Mesolithic" - the scientist suggests.

Source: 3D scans of skulls and a collection of ancient DNA will be available on the information platform

See also...

They came, they saw, and they mixed