Eurogenes Blog: archaeogenetics


Sunday, November 10, 2019

Etruscans, Latins, Romans and others

I've just added coordinates for more than 100 ancient genomes from the recently published Antonio et al. ancient Rome paper to the Global25 datasheets. Look for the population and individual codes listed here. Same links as always:

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient

Thus far I've only managed to check a handful of the coordinates, so please let me know if you spot any issues. Below is a Principal Component Analysis (PCA) featuring the Etruscan and Italic speakers. I ran the PCA with a free online tool designed specifically for Global25 coordinates, which is available here.

Can we say anything useful about the origins of the Etruscan and early Italic populations thanks to these new genomes? Also, to reiterate my question from the last blog post, what exactly are the genetic differences between the Etruscans, early Latins, Romans and present-day Italians? Feel free to let me know in the comments below.

Update 13/11/2019: Here's another, similar PCA. This one, however, is based on genotype data, and it also highlights many more of the samples from the Antonio et al. paper. Considering these results, I'm tempted to say that the present-day Italian gene pool largely formed in the Iron Age, and that it was only augmented by population movements during later periods. The relevant datasheet is available here.

Update 13/11/2019: It seems to me that the two Latini-associated outliers show significant ancestry from the Levant, which possibly means that they're partly of Phoenician origin. These qpAdm models speak for themselves:

ITA_Ardea_Latini_IA_o
ITA_Proto-Villanovan 0.547±0.081
Levant_ISR_Ashkelon_IA2 0.453±0.081
chisq 7.573
tail prob 0.87027
Full output

ITA_Prenestini_tribe_IA_o
ITA_Proto-Villanovan 0.679±0.068
Levant_ISR_Ashkelon_IA2 0.321±0.068
chisq 7.222
tail prob 0.89033
Full output

The Proto-Villanovan singleton is also a key part of the models. Dating to the Bronze Age/Iron Age transition, she appears to be of western Balkan origin. Moreover, her steppe ancestry is probably derived directly from the Yamnaya horizon.

ITA_Proto-Villanovan
HRV_Vucedol 0.677±0.031
Yamnaya_RUS_Samara 0.323±0.031
chisq 10.397
tail prob 0.661174
Full output

The cluster made up of four early Italic speakers can be modeled with minor Proto-Villanovan-related ancestry, but, perhaps crucially, it doesn't need to be. Indeed, judging by the qpAdm output below, it's possible that almost all of its steppe ancestry came from the Bell Beaker complex, and, thus, the Corded Ware culture complex before that.

ITA_Italic_IA
Bell_Beaker_Mittelelbe-Saale 0.480±0.055
ITA_Grotta_Continenza_CA 0.411±0.042
ITA_Proto-Villanovan 0.109±0.084
chisq 10.294
tail prob 0.590205
Full output
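One rough way to see why the Proto-Villanovan component "doesn't need to be" in this model is to divide each coefficient by its standard error. The sketch below does that for the three figures reported above. The cutoff of two standard errors is an assumption made here for illustration, not something stated in the post or a qpAdm rule, and the function names are hypothetical.

```python
# Coefficients and standard errors copied from the ITA_Italic_IA model above.
italic_model = {
    "Bell_Beaker_Mittelelbe-Saale": (0.480, 0.055),
    "ITA_Grotta_Continenza_CA": (0.411, 0.042),
    "ITA_Proto-Villanovan": (0.109, 0.084),
}

# Assumption (not from the post): a coefficient within two standard errors
# of zero is treated as "not clearly needed" in the model.
Z_THRESHOLD = 2.0


def z_scores(model):
    """Return coefficient / standard error for every source in the model."""
    return {source: coef / se for source, (coef, se) in model.items()}


def droppable_sources(model, threshold=Z_THRESHOLD):
    """List the sources whose coefficient is within `threshold` SEs of zero."""
    return [s for s, z in z_scores(model).items() if abs(z) < threshold]


for source, z in z_scores(italic_model).items():
    print(f"{source}: z = {z:.2f}")
print("Candidates to drop:", droppable_sources(italic_model))
```

Only the Proto-Villanovan coefficient falls under the assumed cutoff, which is consistent with the reading that the Italic cluster can be modeled without it.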

Two out of the three available Etruscans look very similar to the Italic speakers in the above PCA plots, and yet they show a lot more Proto-Villanovan-related ancestry in my qpAdm run. The statistical fit is also relatively poor, perhaps suggesting that something important is missing.

ITA_Etruscan
Bell_Beaker_Mittelelbe-Saale 0.186±0.081
ITA_Grotta_Continenza_CA 0.283±0.064
ITA_Proto-Villanovan 0.531±0.126
chisq 17.175
tail prob 0.143143
Full output

Interestingly, the Etruscan outlier with significant North African admixture (proxied in my run by MAR_LN) doesn't need to be modeled with any Bell Beaker ancestry.

ITA_Etruscan_o
ITA_Proto-Villanovan 0.675±0.057
MAR_LN 0.325±0.057
chisq 14.864
tail prob 0.315912
Full output

Update 17/11/2019: The spatial maps below show how three groups of ancient Romans (from the Imperial, Late Antiquity and Medieval periods) compare to present-day West Eurasian populations in terms of their Global25 coordinates. The hotter the color, the higher the similarity. More here.
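The post doesn't say how similarity is measured for these maps. One simple possibility, assumed here purely for illustration, is plain Euclidean distance between coordinate vectors, with a smaller distance meaning higher similarity. The population names and three-dimensional coordinates below are made up; real Global25 rows have 25 values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two coordinate vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(target, populations):
    """Return population names sorted from closest to farthest from target."""
    return sorted(populations, key=lambda name: euclidean(target, populations[name]))


# Hypothetical toy coordinates, not taken from any Global25 datasheet.
toy_target = (0.10, 0.20, 0.05)
toy_populations = {
    "Pop_A": (0.11, 0.19, 0.05),
    "Pop_B": (0.30, 0.10, 0.00),
    "Pop_C": (0.09, 0.25, 0.07),
}

print(rank_by_similarity(toy_target, toy_populations))
```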

See also...

Getting the most out of the Global25

Thursday, November 7, 2019

What's the difference between ancient Romans and present-day Italians?

The first paper on the genomics of ancient Romans was finally published today in Science [LINK]. It's behind a paywall, but the supplementary info is freely available here. Below is a quick summary of the results courtesy of the accompanying Ancient Rome Data Explorer.

I'm told that the genotype data from the paper will be online within a day or so at the Pritchard Lab website here. I'll have a lot more to say about ancient Romans and present-day Italians after I get my hands on it.

See also...

Etruscans, Latins, Romans and others

Tuesday, November 5, 2019

Modeling your ancestry has never been easier

An exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is freely available HERE. It works offline too, after downloading the web page onto your computer. Just copy paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen caps below show me doing just that.

Another free, easy to use online tool that works with Global25 coordinates is the Principal Component Analysis (PCA) runner HERE. Below is a screen cap of me checking out one of the many PCAs that it offers.

See also...

Getting the most out of the Global25

Saturday, November 2, 2019

Interesting times ahead

The map below made a big impression on me. Can't wait to see all of these ancient samples online. More details here.

Friday, July 12, 2019

Getting the most out of the Global25

The first thing you need to know about the Global25 is that I update the relevant datasheets regularly, usually every few weeks, but they're always at these links:

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient

...

Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern

Global25 data for samples from a variety of papers that have been published recently will eventually be incorporated into the main datasheets linked above, but the process might take several weeks or even months. In the meantime, feel free to use the temporary datasheets below. Thanks for your patience.

Allentoft 2023

Chylenski 2023

Jeong 2024

Koptekin 2022

Olalde 2023

Peltola 2022

Penske 2023

Posth 2023

Sirak 2024

Skourtanioti 2023

Stolarek 2023

Varela 2023

Wang 2023

Yu 2023

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.

So if you'd like to know more details about the samples, try searching for their individual codes via a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.

The main purpose of the Global25 is to provide data for mixture modeling, that is, for estimating ancestry proportions, both ancient and modern (see here). This can be done on your computer with R and the nMonte R script, or online with a couple of different tools, which I discuss below.

If you don't have R installed on your computer, you can get it here, while nMonte is available here. For this tutorial please download nMonte and nMonte3, and store them in your main working folder (usually My Documents).

Once you have R set up, make sure its working directory is the same place where you stored nMonte. You can check this in R by clicking on "File" and then "Change dir". Additionally, you'll need two nMonte input files in the working directory titled "data" and "target". Examples of these files are available here. We'll be using them to test the ancient ancestry proportions of a sample set from present-day England.

Before you can begin the analysis, you first need to load the nMonte script by typing or copy pasting source('nMonte.R') into the R console window, and then hitting "enter" on your keyboard. This is what you should see in the R console window afterwards.

To start the mixture modeling process, type or copy paste getMonte('data.txt', 'target.txt') into the R console window, hit "enter", and wait for the results. After a short time, probably less than a minute or two, you should see this output.

The data and target files contain population averages. And, as you can see, the results that these population averages have produced are in line with what one would expect from such a model focusing on the genetic shifts in Northern Europe during the Late Neolithic. Very similar ancient ancestry proportions have been reported for the English and other Northern Europeans recently in scientific literature.
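To make the idea of mixture modeling more concrete, here is a minimal sketch of one way to estimate proportions from coordinates. It splits the target into 100 equal slots, assigns each slot to a source, and keeps moving single slots between sources for as long as that brings the mixture closer to the target. This is a simple greedy search written for illustration. It is not nMonte's code and doesn't reproduce nMonte's own procedure. The source names and two-dimensional coordinates are hypothetical.

```python
def mixture_point(counts, sources):
    """Coordinates of the mixture implied by the current slot counts."""
    total = sum(counts.values())
    dims = len(next(iter(sources.values())))
    return [
        sum(counts[name] * sources[name][d] for name in sources) / total
        for d in range(dims)
    ]


def squared_distance(a, b):
    """Squared Euclidean distance between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_mixture(target, sources, slots=100):
    """Greedy search for source proportions, in steps of 1/slots."""
    names = list(sources)
    # Start from a near-equal split of the slots across all sources.
    counts = {name: slots // len(names) for name in names}
    counts[names[0]] += slots - sum(counts.values())
    best = squared_distance(mixture_point(counts, sources), target)
    while True:
        best_move = None
        # Try moving one slot from every donor to every other source.
        for donor in names:
            if counts[donor] == 0:
                continue
            for receiver in names:
                if receiver == donor:
                    continue
                counts[donor] -= 1
                counts[receiver] += 1
                d = squared_distance(mixture_point(counts, sources), target)
                counts[donor] += 1
                counts[receiver] -= 1
                if d < best:
                    best, best_move = d, (donor, receiver)
        if best_move is None:
            break  # no single-slot move improves the fit
        donor, receiver = best_move
        counts[donor] -= 1
        counts[receiver] += 1
    return {name: counts[name] / slots for name in names}


# Hypothetical toy data: the target is exactly 70% Source_A and 30% Source_B.
toy_sources = {
    "Source_A": (0.0, 0.0),
    "Source_B": (1.0, 0.0),
    "Source_C": (0.5, 0.9),
}
toy_target = (0.3, 0.0)

print(fit_mixture(toy_target, toy_sources))
```

With real Global25 data each vector would have 25 values rather than two, and a greedy search like this one can in principle stall short of the best fit, so treat it as a way to see the mechanics rather than a replacement for the tools discussed in this post.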

However, when focusing on exceptionally fine-scale genetic variation that isn't reflected too well in the Global25 population averages, a more effective strategy might be to use multiple individuals from each reference population and let nMonte3 aggregate and average the inferred ancestry proportions.

This is often the case when attempting to model ancestry proportions for more recent periods, such as the Middle Ages. So let's try this with the English sample set using a modified data file, which is available here.

Replace the old data file with the new one in your working directory, and, like before, copy paste into the R console window the following two commands, hitting "enter" after each one: source('nMonte3.R') and getMonte('data.txt', 'target.txt'). This is what you should eventually see.

It's difficult to say how accurate these estimates are. But they look more or less correct considering the limited and less than ideal reference samples. For instance, the individuals labeled SWE_Viking_Age_Sigtuna are supposed to be stand-ins for Danish and Norwegian Vikings, but they're a relatively heterogeneous group from Sweden, possibly with some British or Irish ancestry, so they might be skewing the results.

However, I'll be adding many more ancient samples to the Global25 datasheets as they become available, including lots of new Vikings, which should greatly improve the accuracy of these sorts of fine-scale mixture models.
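As for the step where results from multiple reference individuals are rolled up into population totals, the idea can be sketched as below. This is an illustration only, not nMonte3's code. It assumes each row label has the form "Population:Individual", which is an assumption about the datasheet layout, and the labels and proportions are made up.

```python
from collections import defaultdict


def aggregate_by_population(individual_weights):
    """Sum per-individual proportions into per-population totals.

    Assumes labels look like "Population:Individual" (an assumption here).
    """
    totals = defaultdict(float)
    for label, weight in individual_weights.items():
        population = label.split(":", 1)[0]
        totals[population] += weight
    return dict(totals)


# Hypothetical per-individual proportions for a single target.
toy_weights = {
    "Pop_X:ind1": 0.20,
    "Pop_X:ind2": 0.15,
    "Pop_Y:ind1": 0.40,
    "Pop_Y:ind2": 0.25,
}

for population, share in aggregate_by_population(toy_weights).items():
    print(f"{population}: {share:.2f}")
```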

An exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is the VahaduoJS. It's freely available HERE, and it also works offline after downloading the web page. Just copy paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen caps below show me doing just that.

However, it's important to note that the Global25 is a Principal Component Analysis (PCA), so it makes good sense to also use it for producing PCA graphs. To do this just plot any combination of two or three of its Principal Components (PCs) to create 2D or 3D graphs, respectively. This can be done with a wide variety of programs, including PAST, which is freely available here.

To produce a 2D graph, open a Global25 datasheet in PAST, choose comma as the separator, highlight any two columns of data, click on the "Plot" tab and, from the drop-down list, pick "XY graph". Below is a series of graphs that I created in exactly this way. I also color-coded the samples according to their geographic origins. This was done by ticking the "Row attributes" tab.
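For anyone who would rather script the same column picking instead of using PAST, here is a minimal sketch that reads comma-separated rows and pulls out two chosen PCs as (label, x, y) points, ready to hand to any plotting library. It assumes the label sits in the first column and PC1 onwards follow it, which is an assumption about the datasheet layout. The rows below are made up.

```python
import csv
import io


def read_pc_pairs(csv_text, x_pc, y_pc):
    """Return (label, x, y) tuples for two PCs, numbered from 1.

    Assumes column 0 holds the row label and column k holds PCk.
    Rows whose chosen cells are not numeric (such as a header) are skipped.
    """
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        try:
            x, y = float(row[x_pc]), float(row[y_pc])
        except (ValueError, IndexError):
            continue
        points.append((row[0], x, y))
    return points


# Hypothetical rows with three PCs each; real datasheets have 25.
toy_csv = """,PC1,PC2,PC3
Pop_A:ind1,0.10,0.20,0.30
Pop_B:ind1,-0.05,0.15,0.00
"""

for label, x, y in read_pc_pairs(toy_csv, 1, 3):
    print(label, x, y)
```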

PAST can also be used to run PCA on subsets of the Global25 scaled data to produce remarkably accurate plots of fine-scale population structure. For instance, here's a plot based on present-day populations from north of the Alps, Balkans and Pyrenees.

To try this, create a new text file with your choice of populations from the Global25 scaled datasheet, open it with PAST and choose Multivariate > Ordination > Principal Components Analysis. I've already put together several datasheets limited to European, Northern European, West Eurasian and South Asian populations. They're available at the links below along with more details on how to run them with PAST.

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists
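To show what running PCA on a subset of rows involves, the sketch below centers the data, builds the covariance matrix, and finds the first principal component by power iteration, using only the Python standard library. It is a teaching sketch under simple assumptions, not what PAST does internally, and it recovers only the first component. The toy data are hypothetical points lying on a straight line.

```python
import math


def center(rows):
    """Subtract the column means from every row."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    return [[r[d] - means[d] for d in range(dims)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, dims = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def first_component(cov, iterations=200):
    """Leading eigenvector of cov by power iteration, as a unit vector.

    Assumes cov is not all zeros and the all-ones starting vector is not
    orthogonal to the leading eigenvector.
    """
    dims = len(cov)
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, axis):
    """Score of every centered row along the given axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]


# Hypothetical toy subset: four samples on the line y = 2x.
toy_rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

toy_centered = center(toy_rows)
toy_axis = first_component(covariance(toy_centered))
toy_scores = project(toy_centered, toy_axis)
print(toy_axis)
print(toy_scores)
```

Because the toy points sit on a single line, the first component points along that line and carries all of the variance. Real subsets of the Global25 would need further components as well.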

Another free, easy to use online tool that works with Global25 coordinates is the Vahaduo Global25 Views [LINK]. Below is a screen cap of me checking out one of the many PCAs that it offers.

And if you're fond of tree-like structures as a means to describe fine-scale genetic variation, please see this blog post...

Global25 workshop 4: a neighbour joining tree

See also...

New Global25 interpretation tools
