Tuesday, November 25, 2014

Admixture and migration patterns along the former Silk Road

This Mezzavilla et al. paper is currently up for public comment at bioRxiv. My comment is that we really need ancient genomes to be able to answer the sorts of questions that the authors of this paper are trying to answer. Nevertheless, it's an interesting read.

Background: The ancient Silk Road has been a trading route between Europe and Central Asia from the 2nd century BCE to the 15th century CE. While most populations on this route have been characterized, the genetic background of others remains poorly understood, and little is known about past migration patterns. The scientific expedition "Marco Polo" has recently collected genetic and phenotypic data in six regions (Georgia, Armenia, Azerbaijan, Uzbekistan, Kazakhstan, Tajikistan) along the Silk Road to study the genetics of a number of phenotypes.

Results: We characterized the genetic structure of these populations within a worldwide context. We observed a West-East subdivision albeit the existence of a genetic component shared within Central Asia and nearby populations from Europe and Near East. We observed a contribution of up to 50% from Europe and Asia to most of the populations that have been analyzed. The contribution from Asia dates back to ~25 generations and is limited to the Eastern Silk Road. Time and direction of this contribution are consistent with the Mongolian expansion era.

Conclusions: We clarified the genetic structure of six populations from Central Asia and suggested a complex pattern of gene flow among them. We provided a map of migration events in time and space and we quantified exchanges among populations. Altogether these novel findings will support the future studies aimed at understanding the genetics of the phenotypes that have been collected during the Marco Polo campaign, they will provide insights into the history of these populations, and they will be useful to reconstruct the developments and events that have shaped modern Eurasians genomes.

Massimo Mezzavilla et al., Genetic landscape of populations along the Silk Road: admixture and migration patterns, bioRxiv, Posted November 24, 2014, doi:


Chad Rohlfsen said...

Davidski said...

I saw that paper a few days ago. It looks like it's about ten years out of date.

Matt said...

I agree we'll need aDNA to actually time and test the population transitions in Central Asia. This paper seems to add a large set of Central Asian samples, though, so it's cool.

In terms of population means, if you take the South and Central Asians from many of the internet's ADMIXTURE run projects, extract the components that seem not West Eurasian*, normalise the components so that they sum to 100%, transform them into a set of FST distances from the components and then run a PCA on them, they seem to fill in a bit of the cline between Northeast Europe and the North Caucasus (plots based on Eurogenes K13, Eurogenes K15 and Globe13).

Something like that would be more interesting to see via an individual-based PCA, like in the paper Reconstructing the Population Genetic History of the Caribbean.

*although of course there might be partial degrees to which these don't separate exactly
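The population-means procedure described in this comment (keep the components that look non-West-Eurasian, rescale them to sum to 100%, convert them to FST distances, run a PCA) can be sketched roughly as below. This is only an illustration under stated assumptions: the population labels, component names, proportions and FST values are made-up placeholders, not output from Eurogenes K13/K15 or Globe13, and reading "FST distances from the components" as a proportion-weighted average of inter-component FST is a guess at what was meant.

```python
import numpy as np

# Hypothetical ADMIXTURE-style proportions; rows are populations, columns are components.
populations = ["Pop1", "Pop2", "Pop3", "Pop4"]
components = ["WestEurasian", "SouthAsian", "EastAsian", "Siberian"]
proportions = np.array([
    [0.70, 0.20, 0.06, 0.04],
    [0.55, 0.35, 0.07, 0.03],
    [0.60, 0.05, 0.20, 0.15],
    [0.50, 0.04, 0.30, 0.16],
])

# Columns treated as "not West Eurasian" (an assumption for this sketch).
non_west_idx = [1, 2, 3]

# Hypothetical pairwise FST between those three components (symmetric, zero diagonal).
component_fst = np.array([
    [0.00, 0.08, 0.09],
    [0.08, 0.00, 0.03],
    [0.09, 0.03, 0.00],
])


def normalise_non_west(props, idx):
    """Keep only the chosen components and rescale each row to sum to 1."""
    sub = props[:, idx]
    return sub / sub.sum(axis=1, keepdims=True)


def fst_to_components(weights, fst):
    """Proportion-weighted average FST from each population to each component."""
    return weights @ fst


def pca_scores(matrix, n_components=2):
    """Plain PCA via SVD of the column-centred matrix."""
    centred = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


weights = normalise_non_west(proportions, non_west_idx)
distances = fst_to_components(weights, component_fst)
scores = pca_scores(distances)

for name, (pc1, pc2) in zip(populations, scores):
    print(f"{name}: PC1={pc1:+.4f} PC2={pc2:+.4f}")
```

With real data the proportions would come from an ADMIXTURE run and the component FST matrix from the same run's output; an individual-based version would simply use one row per person instead of one row per population mean.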

Davidski said...

Here's what I did ages ago...

Davidski said...

Heh, I just noticed that my ANI composite clusters in exactly the same place as IR1.

Seinundzeit said...

It's interesting how the Tajikistanis are very similar to Pakistani populations. In fact, the Tajikistanis are most similar to sampled Pakistani populations, rather than to other Central Asian populations (although Afghan samples would definitely take the place of Pakistanis, if they were to be included). I think this might have applied to all of Central Asia (most of the region was probably inhabited by people very similar to modern northern/western Pakistanis), prior to the Turkic and Mongol expansions.

Nirjhar007 said...

David, Is the Samara aDNA publication next week?

Helgenes50 said...

The best solution, like Ötzi, is to hibernate until the publication.

Davidski said...

Apparently the Corded Ware/Yamnaya paper will appear at bioRxiv within weeks.

I have a hunch, and this is just a hunch, that we'll see it there just before Christmas.

Matt said...

Off the topic of the Silk Road, but I was looking again at this old 2012 paper - Genetic characterization of northeastern Italian population isolates in the context of broader European genetic diversity.

It's about "a region from northeastern Italy, which is known for isolated communities", Friuli-Venezia Giulia, where "while covering a total area of only 7858 km² several distinct dialects are spoken and several villages sport traditions and/or surnames linking them to ethnic groups further away rather than in FVG or in Italy for that matter. All in all, it seems the area is characterized by complex demographic history. For example, people in the village of Resia speak an archaic proto-Slavic language, known as Resian, but their surnames are Italian or Italianized."

In this paper, they also find a split between general populations in a number of the villages they study and homogenous subpopulations within those regions. (Although the general populations in the villages are still pretty isolated - but the samples are often less isolated than other samples like the Druze still and even many samples like the normal Basque samples used).

Anyways, one of the things this paper showed was that some of the populations under study, to my eye the isolates in particular, and especially the general population of the village of Resia, showed unusual positions on a European PCA and a world PCA. The Europe plot has a similar shape to other PCAs of Europe, but with less of a North-South dimension, as Northeast and Southeast populations aren't included (compare to Davidski's PCA with the BR1 Hungarian sample on the Human Origins dataset).

Many of the isolates and the general population of Resia were a) in a similar position to Basques or b) spanned the gap between the Sardinian isolate and Basques and c) in the case of the Resian isolate were out beyond Basques in the "northwest" of the plots.

Given what we know now in terms of the Loschbour sample and early Bronze Age samples seeming to sit further on the northwest on a European plot, this could be quite interesting, and potentially these samples could be tested to see if they shared any particularly high levels of drift with the ancient samples. I don't know if this genetic data is openly available or not, I couldn't find anything either way.

Krefter said...

I emailed Iosif Laz a few days ago asking when his paper should be published and he hasn't responded. I think he's tired of people asking, and so we'll have to be patient.

I think it's too optimistic to say it'll be out in a few weeks. The La Braña-1 paper was supposed to be out in a few weeks and it took a few months.

Chad Rohlfsen said...

Maybe, they're in here somewhere.

Chad Rohlfsen said...

a few NE Italians!!

Author will share 'meta' data, upon specific request...

Chad Rohlfsen said...

What's up with the K and K2, in Central and Northern Italy? Is that legit?

Nirjhar007 said...

Krefter said...

Italy is rarely mentioned in these forums. So, it's interesting to see a paper on parental markers there. It had to have been involved in the same Loschbour-Stuttgart-like population history as the rest of Europe.

On a map it looks isolated by the Mediterranean sea and the Alps. I wonder if Loschbour-like people lived there during the Mesolithic but because of the sea and the Alps Italian WHG genes never left Italy. And so maybe Italy has very basal and unknown WHG mtDNA and Y DNA.

The same could be true for EEF lineages. Farmers like Stuttgart first arrived there through the Mediterranean sea and then became isolated, and there could be an unknown J1c and K1 lineage in Italy.

It would make sense that like in Iberia and France, EEF genes survived pretty well in Italy.

I am confused about why, in admixture runs and Eurogenes PCAs, Italians cluster with people from the Balkans and there appears to be recent East Mediterranean ancestry in Italy. This East Mediterranean ancestry must have come across the Mediterranean sea, or maybe over the Alps. And I wonder who brought it, because Italy must have been Stuttgart-like during the Neolithic.

Davidski said...

PCA based on the ANE K7. They look pretty solid IMO, despite the fact that the MA-1 sequence at GEDmatch has a bit of contamination (the most out of all the sequences available).

Matt said...

Chad, thanks.

Alternatively, these look like they may be relevant: DATASET: Whole genome sequencing of Italian genetic isolates - Friuli Venezia Giulia. There is no access to download, though ("Please log in before attempting to download data from the EGA. If you do not have an EGA account and want to request access, contact information for the DAC responsible for access to this data is on the right under the heading 'Who controls access to this dataset'."). The listed contact is Tim Hubbard. It is marked as "Low Coverage" though, so I don't know how low that is.

This site "The European Genome-phenome Archive" seems subject to approval though, not open access. It seems like a case where it is for academic / medical access only (I guess for good reasons =/ ).

The original project was Parco Genetico del Friuli Venezia Giulia (Genetic Park of Friuli Venezia Giulia), part of the INGI (Italian Network of Genetic Isolates), but their website seems defunct.

There is also (among others) a sample for Val Borbera in Italy on the European Genome-phenome Archive, but I can't see any evidence that it shows any interesting European plot behaviour for the population genetics we're interested in, other than local inbreeding. One paper shows the Valley sample separate from TSI and CEU, which cluster together, but perhaps that is just due to isolation and not ancient ancestry. They were also included on the plot from the earlier paper I posted, and plotted in the middle of Italy as expected, as Bo.

By the way, found during searches for the above: a new paper with a large study of population isolates in Greece, which may be interesting. The HELIC-MANOLIS (Minoan isolates) population of the villages of Kentavros / Glafki seems potentially interesting here. This is present on the EGA under

Shaikorth said...

Matt, the high drift of those Northeast Italian isolates affects their Fst-distances and PCA positions a lot. That being said, relative as opposed to absolute fst-distances might tell something so I checked those.
The Resia isolate is closer to Finns than to Abkhazians, closer to Poles than to Tuscans, closer to Basques than to Sardinians and closer to Kargopol Russians than to South Italians from Carlantino. The other Italian isolates don't share any of these traits (the German-speaking Sauris isolate comes closest, but I suspect that they are like drifted Austrians).

So there might be something there, but I can't tell whether it's really ancient or just because the Resian isolate is Slovenian-like and distorted by drift.

Matt said...

Interesting approach. FSTs are informative, but they are strongly affected by individuals being grouped together, which inflates distinctions for more homogenous subpopulations. That effect falls out to a much greater extent in individual-based PCA.

That effect contributes from both directions, so the Resia(I) would have increased fst from all populations and also further increased fst from other isolates.

Shaikorth said...

That is the case for absolute distances, but even with exaggerated distances an isolate should be closer to populations more related to it. That's why I compared relative distances. The result matches their West Eurasian PCA's Middle East-North Europe dimension, with the Resian isolate being more northern than other Italians.
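A minimal sketch of why relative comparisons survive drift, assuming that drift in the isolate adds roughly the same amount to its FST against every reference population. The FST values and the helper names are hypothetical, chosen only to illustrate the argument.

```python
# Hypothetical "true" FST from an isolate to two reference populations.
true_fst = {"Northern": 0.010, "Southern": 0.015}


def inflate_isolate(fst_to_refs, isolate_drift):
    """Add the same drift term to the isolate's distance to every reference."""
    return {pop: d + isolate_drift for pop, d in fst_to_refs.items()}


def closer_to(fst_to_refs, pop_a, pop_b):
    """Return whichever of the two reference populations has the smaller FST."""
    return pop_a if fst_to_refs[pop_a] < fst_to_refs[pop_b] else pop_b


for drift in (0.00, 0.02, 0.05):
    observed = inflate_isolate(true_fst, drift)
    print(drift, observed, "closer to", closer_to(observed, "Northern", "Southern"))
```

The absolute distances grow with the drift term, but the ordering of the two references does not change. The assumption stops holding when the reference populations carry different amounts of drift of their own.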

But the PCA is no guarantee of the Resian isolate being an archaic remnant. Isolates can behave wonkily solely because of drift, as seen in this 23andMe PCA where Orcadians greatly deviate from both British and Norse populations.

Matt said...

@ Shaikorth, yeah, sure, I agree a population will behave uniformly to all outgroups in terms of the effect of increased drift / homogeneity, and I appreciate that's why you were right to compare them that way.

All I was saying there was that

If you have populations A, B and C, and their distances are like

A-B 0.05, A-C 0.07, B-C 0.07

then you increase each of the distances involving either A or B by 0.03 per drifted population (representing a drift / homogeneity effect in A and B), then you end up with

A-B 0.11 (as you've increased from 0.05 by 0.03 twice), A-C 0.10, B-C 0.10 (as A-C and B-C are only each increasing by 0.03).

So A-B, which was originally forming a clade, no longer appears to.
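The arithmetic above can be reproduced in a few lines. This sketch assumes, as the example does, that drift simply adds a fixed amount to every FST involving the drifted population; real FST does not behave so neatly, and the function names are made up for illustration.

```python
# Pairwise FST before any drift effect, as in the example above.
base_fst = {("A", "B"): 0.05, ("A", "C"): 0.07, ("B", "C"): 0.07}

# Additive drift per population: A and B are drifted isolates, C is not.
drift = {"A": 0.03, "B": 0.03, "C": 0.00}


def inflate(fst, drift_by_pop):
    """Add each population's drift term to every distance it takes part in."""
    return {
        (p, q): d + drift_by_pop[p] + drift_by_pop[q]
        for (p, q), d in fst.items()
    }


def closest_pair(fst):
    """Return the pair of populations with the smallest FST."""
    return min(fst, key=fst.get)


inflated = inflate(base_fst, drift)

for pair, value in inflated.items():
    print(pair, f"{base_fst[pair]:.2f} -> {value:.2f}")

print("closest before drift:", closest_pair(base_fst))
print("A-B still the closest pair after drift?", closest_pair(inflated) == ("A", "B"))
```

A-B picks up the drift term twice because both members are drifted, which is the "double dip" that makes two related isolates look less related to each other than either is to the undrifted population C.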

Comparing isolates with isolates (even shallow isolates like Basques or Finns) "double dips" the effect on FST of homogenous inbred groups. It's kind of important to cancel all the effects at once. That could be an extreme scenario though. At the very least you need a for sure totally unrelated outgroup to control for this effect if you're using group comparisons and even then I'm not sure how that applies to the FST calculation.

Agree the isolated or homogenous populations can behave unusually purely because of drift, e.g. for another example the paper showing the Val Borbera sample clustering totally away from TSI and CEU.

It's just that, when I've seen this before, it only seems to happen with PCAs that include samples which are very homogenous themselves (e.g. Irish to Austrian really doesn't span much genetic distance), and not with ones that have a wide range of individuals and populations and otherwise seem to essentially reproduce the PC dimensions found in Eurogenes (comparing the PCA from the paper to Eurogenes West Eurasia with Stuttgart, with or without some rotation).

It's definitely within the realm of the possible that the dimensions in this paper and the Eurogenes PCA are correlated, but are actually not the same thing. I'm also uncertain here. It would be nice to test though, somehow.

Shaikorth said...

That's why I didn't touch the absolute numbers and checked whether the Resian pattern holds with multiple populations. That list of comparisons, where Resia is closer to a northern population than to a southern population while the other isolates are not, is not exhaustive.

The paper's West Eurasian PCA seems fairly typical, dim 1 is well-known SW-Asia vs Northeast Europe and dim 2 is lack of something Sardinian/Mediterranean-like which separates Caucasus populations from Jews and Finns (but not Kargopol Russians who are right between Lithuania and Poland) from Balto-Slavs.

It doesn't seem that the Resia isolate defines either dimension, so it could be that there is some genuine distinction to them other than drift but hard to say for sure. The admixture graph doesn't tell much either because Resia's modal component takes over at K=3.

Matt said...

@Shaikorth, OK, that's a good point re: North and South European comparisons. Not going to hammer this to death any further, and I do appreciate your replies.

The only counterpoint, I think, would be that we are not necessarily looking at a fully Northern European-like pattern in either PCA, but more like an extreme of the Basque pattern compared to other Southwest Europeans and to south central Europeans.

If you look at the West Eurasia PCA, it's very clear that the Resia and Resia isolate samples there aren't always closer to the North European populations when compared to the Southern Europeans, and it depends on where the populations sit on the Southwest Asia->North Italy->Northeast Europe cline.

So the main point would be I guess how well the Resian isolate's pattern correlates with the differences between Basque and Spanish FSTs (which we don't know) or between Basque and an Italian non-isolate. And whether the pattern of closer populations in FST mirrors the PCA rather than how often R-I is closer to the Northern rather than Southern population.

"West Eurasian PCA dim 1 is well-known SW-Asia vs Northeast Europe and dim 2 is lack of something Sardinian/Mediterranean-like which separates Caucasus populations from Jews and Finns (but not Kargopol Russians who are right between Lithuania and Poland) from Balto-Slavs."

Yeah, it's hard to say for sure. In the West Eurasian PCA plots Davidski has run, which usually include the ancient samples, WHG samples tend to place at what is here the bottom left hand corner (in Davidski's graphs the top right hand side, due to the difference in orientation). That is where Resia general and the Resia isolate are pointing compared to the other Europeans.

So I think it's plausible that the Resia isolate samples are harboring an extra dose of WHG compared to mainland Italians (and may be a leftover from a Late / Middle Neolithic population which did), but like you've said, hard to say. And these plots do form differently, with Palestinians etc. falling more "east" compared to Eurogenes, even though they do have essentially the same parallel clines structure, with relatively similar positions for all samples relative to what each dimension measures.

Chad Rohlfsen said...

It might be that they have as much WHG as the Basque, but fall between the Basque and Sardinians as far as ANE goes.

Doesn't the region have higher amounts of E, J2, and I, without a lot of R1b or R1a?

Shaikorth said...

Basques are closer to Tuscans than to Poles; for the Resia isolate it's the opposite. Basques are also closer to HGDP North Italians than to Czechs and Slovenians, and again the opposite is true for the isolated Resians. However, this could still just mean that the Resians are more Slavic than other Northeast Italians.

Matt said...

@ Shaikorth "Basques are closer to Tuscans than to Poles; for the Resia isolate it's the opposite. Basques are also closer to HGDP North Italians than to Czechs and Slovenians, and again the opposite is true for the isolated Resians."

Yes, that seems pretty consistent with their positions on PCA although I'm assuming these dimensions scale that way and I'm completely eyeballing it -

Poland vs Tuscan comparison -

Red - Resia isolate to Pole
Green - Resia isolate to Tuscan
(Red looks shorter than green)

Purple - Basque to Pole
Orange - Basque to Tuscan
(Orange looks shorter than purple)

Czech / Slovene vs North Italian comparison

Red - Resia isolate to Czech / Slovene
Green - Resia isolate to North Italian
(Red looks shorter than green)

Purple - Basque to Czech / Slovene
Orange - Basque to North Italian
(Orange looks shorter than purple)

Simon_W said...

@ Krefter

It's well known that southern Italy deviates from the Italian average towards Greece and Western Asia while the north deviates towards Iberia and on the whole (but not everywhere!) shows more central European and French admixture.

The east Mediterranean connections of southern Italy are also obvious in the Y-chromosomes; after all, Y-haplogroup J2, especially J2a, is very common throughout southern Italy.

As for the explanation of these facts: There was of course the Greek and Illyrian colonization in the first millennium BC, but I don't think that's the whole story. According to
there is archeological evidence for earlier connections of southern Italy to the eastern Mediterranean, more concretely two additional eastern waves of influence postdating the early Cardium wave, while the north remained Cardium-derived until the Indo-Europeanization.

Of course nothing is known about the language of these later Neolithic waves that affected southern and parts of central Italy.