
Saturday, June 1, 2019

They came, they saw, and they mixed


Y-chromosome haplogroup N is strongly associated with Uralic-speaking populations. That's probably because it was a salient feature of the gene pool of the earliest Uralic speakers, and it went with them as they migrated across northern Eurasia. However, some of its younger subclades appear to have spread with the speakers of Indo-European and Turkic languages.

For instance, N-Y10931 seems to be a marker of the Rurikids, a Varangian dynasty that, according to most sources, ruled the Kievan Rus in what are now Russia and Ukraine. And the Kievan Rus was a loose medieval political federation in which Slavic, Finnic (west Uralic) and Germanic languages were probably spoken. The latest on the genetic genealogy of the Rurikids was presented a couple of days ago at the Centenary of Human Population Genetics conference in Moscow, and there's an abstract of the talk available here (download the PDF and scroll down to page 84).

I'm not aware of any Rurikids among the thousands of ancients in my dataset, or even of any samples belonging to N-Y10931. But I do have the genome of someone who belongs to N-Y4339, which, as per the abstract linked to above, is proximally ancestral to N-Y10931. Not only does this person come from Viking Age Scandinavia, but he was buried in a crouched position typical of Slavic funerary customs of the time.

The individual in question is vik_84001. His genome was published recently along with a paper on the population structure of the Swedish town of Sigtuna way back when it was a Viking stronghold (see here). This is where his Y-chromosome sequence, labeled ERS2540883, is positioned on the YFull Y-chromosome phylogenetic tree. Click on the image to go to YFull.


However, the result is likely to be compromised to some extent by missing data. If so, it's possible that vik_84001 does indeed belong to N-Y10931 and ought to be sitting near or even among that cluster of Russian samples (Rurik descendants?) at the bottom of the page.

In any case, vik_84001 seems to be the closest individual in the ancient DNA record to a Rurikid. The Principal Component Analysis (PCA) below is based on my Global25 data. It features 18 other Viking Age individuals from Sigtuna alongside vik_84001 (look for the black dots). The relevant datasheet is available here. Interestingly, despite his eastern Y-haplogroup, vik_84001 is one of the few Sigtuna ancients who clusters strongly with present-day Swedes.
But here's what happens when I model his ancestry proportions with the Global25/nMonte method using a wide range of reference populations from Northern and Eastern Europe. The Swedes in this model are the same as those in the PCA.

vik_84001
Swedish,84.6
Ingrian,9.2
Russian_Tver,6.2

Belarusian,0
Estonian,0
Finnish,0
Finnish_East,0
Karelian,0
Latvian,0
Mordovian,0
Russian_Kostroma,0
Russian_Kursk,0
Russian_Orel,0
Russian_Pinega,0
Russian_Smolensk,0
Russian_Voronez,0
Ukrainian,0
Vepsian,0

[1] "distance%=2.3778"

Yep, despite his position in the PCA, vik_84001 shows a strong signal of ancestry related to the present-day populations of northwestern Russia. I'm not sure what this means exactly, but it's certainly fascinating stuff. And, by the way, I usually wouldn't use so many similar reference populations in a single Global25/nMonte model because of the problem of "overfitting", but in some cases it's OK to do so if the nMonte algorithm has enough recent genetic drift to latch onto.
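
For readers wondering what a model like the one above is doing under the hood, here is a minimal sketch of the general idea: pick the mix of reference populations whose averaged coordinates sit closest to the target. This is not the actual nMonte script (which is written in R), just a simplified Monte Carlo hill climb written in Python for illustration. The function name `fit_mixture`, its parameters, and the three toy populations with their 2D coordinates are all invented for this example; real Global25 data has 25 dimensions per sample.

```python
import random

def fit_mixture(target, sources, slots=100, steps=20000, seed=1):
    """Greedy Monte Carlo search for source proportions that best match target.

    target  : list of coordinates for the sample being modelled
    sources : dict mapping a population name to its coordinate list
    slots   : resolution of the mix; each slot is worth 100/slots percent
    """
    rng = random.Random(seed)
    names = sorted(sources)
    dims = range(len(target))
    # Start from a random mix: every slot holds one source population.
    mix = [rng.choice(names) for _ in range(slots)]
    total = [sum(sources[name][d] for name in mix) for d in dims]

    def distance(coord_sum):
        # Euclidean distance between the averaged mix and the target.
        return sum((coord_sum[d] / slots - target[d]) ** 2 for d in dims) ** 0.5

    best = distance(total)
    for _ in range(steps):
        slot = rng.randrange(slots)
        old, new = mix[slot], rng.choice(names)
        if new == old:
            continue
        # Coordinate sum after swapping one slot from `old` to `new`.
        trial = [total[d] - sources[old][d] + sources[new][d] for d in dims]
        trial_dist = distance(trial)
        if trial_dist < best:  # keep the swap only if the fit improves
            mix[slot], total, best = new, trial, trial_dist
    percents = {name: 100.0 * mix.count(name) / slots for name in names}
    return percents, best

# Invented 2D coordinates, chosen so the right answer is known in advance.
toy_sources = {"pop_A": [0.0, 0.0], "pop_B": [1.0, 0.0], "pop_C": [0.0, 1.0]}
toy_target = [0.8, 0.2]  # exactly 80% pop_B + 20% pop_C
percents, dist = fit_mixture(toy_target, toy_sources)
print(percents, round(dist, 6))
```

The sketch also hints at why a long list of very similar reference populations can be a problem: when several sources have nearly identical coordinates, many different mixes give almost the same distance, and the search has little to choose between them.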

See also...

More on the association between Uralic expansions and Y-haplogroup N

Fresh off the sledge

Uralic-specific genome-wide ancestry did make a significant impact in the East Baltic

It was always going to be this way

Conan the Barbarian probably belonged to Y-haplogroup R1a

219 comments:

Shaikorth said...

@Dragos
Quite a few models for ancients involve samples that are more recent than the population being modeled, like models for CHG which involve Iran_N. Some models commonly include moderns as sources: qpAdm fits modeling ASI in ancient samples, like the ones in the Narasimhan et al. preprint, use Onge or Han as a proxy.

The important bit is that the fit shares a similar amount of drift with pRight pops as the sample being modeled (failure is often caused by recent gene flow between pLeft and pRight). For example, if you try to model Sidelkino as WHG+Okunevo with those outgroups, the fit would share too much or too little drift with some pRight pops compared to Sidelkino, resulting in a bad pValue or failure.

However in this model a WHG+WSHG combo's shared drift stats towards those pRight's are better (more similar to the real Sidelkino) than those of a WHG+AG3 combo, or even those of a WHG+WSHG+AG3 combo so there is something there.

Draft Dozen said...

@Andrzejewski

"I'm pretty sure none of them were Uraloid, that's for sure ;)"

Here's one of them (male).
https://i.postimg.cc/05BB0YQB/Ekaterinovsky-Cape-skull.jpg

Slumbery said...

Shaikorth said...
"However, WSHG might have been around a while before Neolithic because EHG's actually seem to be easier to model as WHG+WSHG in qpAdm than as WHG+Afontova_Gora3"

Isn't that because WSHG had EHG admixture though?

For example in G25 nMonte:

"sample": "RUS_West_Siberia_N:Average",
"fit": 3.8825,
"RUS_AfontovaGora3": 67.5,
"RUS_Karelia_HG": 20.83,
"RUS_Shamanka_N": 11.67,

"sample": "RUS_West_Siberia_N:Average",
"fit": 4.5637,
"RUS_AfontovaGora3": 84.17,
"RUS_Shamanka_N": 11.67,
"WHG": 4.17,

So WSHG had shared drift with EHG that was more recent than the ANE ancestry in the formation of EHG.
And it is coming from an EHG admixture into WSHG, not the other way around, because the extra East Asian ancestry in WSHG (compared to AG3) did not reach the EHG population:

"sample": "RUS_Karelia_HG:Average",
"fit": 5.8683,
"RUS_AfontovaGora3": 68.33,
"WHG": 31.67,
"RUS_Shamanka_N": 0,

Shaikorth said...

@Slumbery
Anzick and Ami are present in pRight so if a WHG+WSHG mix had significant far eastern ancestry that Sidelkino didn't the qpAdm model would not work, and in any case would be worse than AG3+WHG (that's why Sidelkino WHG + Okunevo or something would fail). We see that the distance is better in scaled nMonte for Sidelkino with WSHG instead of AG3 too. The more recent EHG's in qpAdm are Sidelkino with 10-15% extra WHG.

"distance%=6.3721"
RUS_Sidelkino_HG
RUS_AfontovaGora3,65.2
WHG,34.8

"distance%=5.9485"
RUS_Sidelkino_HG
RUS_West_Siberia_N,67.2
WHG,32.8

Slumbery said...

@Shaikorth

"Anzick and Ami are present in pRight..."

Arguably both of them are very distant from BaikalHG in their eastern ancestry while at the same time having some ancestry more western than BHG, but I have to admit I do not understand qpAdm well enough to determine whether this is relevant to the argument.


"We see that the distance is better in scaled nMonte for Sidelkino with WSHG instead of AG3 too."

Well, we do not know how well AG3 represents either the ANE population that participated in the formation of EHG or the population that is directly ancestral to WSHG. It is not like we have a multitude of competing samples; AG3 is pretty lonely. Assuming AG3 itself is not actually ancestral to either of them, this might be caused by common ancestry from a "para-AG3" group, before the increase of BHG-like ancestry in West Siberia.


"The more recent EHG's in qpAdm are Sidelkino with 10-15% extra WHG."

I doubt that. If anything Karelia_HG seems to be a bit less WHG than Sidelkino.

Shaikorth said...

@Slumbery

Baikal_N (minus about 10% ANE it has), Ami and the non-ANE part of Native Americans are parts of the same deep clade so it would show in terms of shared drift if WHG+WSHG was more related to that branch than actual Sidelkino. QpAdm looks specifically at deep ancestry. Similarly a ghost branch of AG3 existing somewhere couldn't make WHG+WSHG work for Sidelkino in qpAdm.

Karelia_HG as a mix of Sidelkino and some WHG's, especially the Balkan ones with a CHG trace, gets tail probs of up to 0.9 so it's extremely likely it has extra WHG.

Just for fun, here is an anachronistic (since UKR_Meso likely has Sidelkino instead of the opposite) but really good G25 fit for Sidelkino:

"distance%=2.1154"

RUS_Sidelkino_HG

UKR_Mesolithic,64.4
RUS_West_Siberia_N,35.6

Again AG3 instead of WSHG works too but the distance isn't quite as good.

Slumbery said...

@Shaikorth

But G25 nMontes picks up a very clear BHG ancestry in WSHG compared to AG3.

"sample": "RUS_West_Siberia_N:Average",
"fit": 4.5637,
"RUS_AfontovaGora3": 84.17,
"RUS_Shamanka_N": 11.67,
"WHG": 4.17,

"sample": "RUS_West_Siberia_N:Average",
"fit": 4.8943,
"RUS_AfontovaGora3": 88.33,
"RUS_Shamanka_N": 11.67,

The WHG can be explained by the tail end of the WHG - ANE cline that formed EHG to begin with, so it could be very ancient in WSHG and not a later EHG admixture, but EHG samples from any time lack this BHG-like ancestry and it must be explained somehow, because that ancestry seems to be real (and Botai had even more of it).
You say the qpAdm test should fail if there is a significant BHG-like ancestry in WSHG, so we have a paradox there. Do you have an explanation?


"Karelia_HG as a mix of Sidelkino and some WHG's, especially the Balkan ones with a CHG trace, gets tail probs of up to 0.9 so it's extremely likely it has extra WHG."

G25 nMonte says no.

"sample": "RUS_Karelia_HG:Average",
"fit": 3.6747,
"RUS_Sidelkino_HG": 100,
"WHG": 0,

"sample": "RUS_Karelia_HG:Average",
"fit": 5.8392,
"RUS_AfontovaGora3": 67.5,
"WHG": 30.83,
"GEO_CHG": 1.67,

"sample": "RUS_Sidelkino_HG:Average",
"fit": 5.379,
"RUS_AfontovaGora3": 60,
"WHG": 35,
"GEO_CHG": 5,

It might not be accurate as a deep ancestry test, but still, it shows the complete opposite of what you are saying.

Shaikorth said...

@Slumbery
If you compare terms of who shares the most total drift with WSHG or AG3:
MA1 closer to AG3
EHG a bit closer to WSHG
WHG much closer to WSHG
Paleoeuropeans (Kostenki, Vestonice etc.) slightly closer to WSHG

In the case of Sidelkino I said the fit would fail if WHG+WSHG mixture would share more drift with the ENA-related branch than Sidelkino, but the fit working shows that it doesn't. WSHG as AG3 + Shamanka_EN won't work because of the WHG shift, same was detected in Botai which is much better fitted as WSHG+BHG than as AG3+BHG.

In the case of comparing very closely related samples from the same metapopulation (Karelia_HG and Sidelkino) G25 gets too caught in their recent shared ancestry and can't detect the WHG extra but qpAdm does it and the experiment should by all accounts be repeatable, f.ex someone with qpAdm can try modeling Karelia_HG as Sidelkino + Iron_Gates_HG and see how it goes.

Slumbery said...


"If you compare terms of who shares the most total drift with WSHG or AG3:
MA1 closer to AG3
EHG a bit closer to WSHG
WHG much closer to WSHG
Paleoeuropeans (Kostenki, Vestonice etc.) slightly closer to WSHG"


I can't see how this supports your point. MA1's position is largely irrelevant to our discussion, EHG being a bit closer to WSHG is compatible with my argument (and also expected, because ANE arrived in Europe from the western perimeter, not straight from the Altai, and it happened millennia after AG3, and those millennia had their own drift), the WHG result is plain weird to me (are you sure?), and the Paleoeuropeans are also not relevant here.


"In the case of comparing very closely related samples from the same metapopulation (Karelia_HG and Sidelkino) G25 gets too caught in their recent shared ancestry and can't detect the WHG extra but qpAdm does it and the experiment should by all accounts be repeatable, f.ex someone with qpAdm can try modeling Karelia_HG as Sidelkino + Iron_Gates_HG and see how it goes."

This argument would work if G25 nMonte just could not detect the WHG extra, but this is a misrepresentation of the case. It detects the opposite, strongly.

Let's turn around that two way model a bit:

"sample": "RUS_Karelia_HG:Average",
"fit": 3.0546,
"RUS_Sidelkino_HG": 88.33,
"RUS_AfontovaGora3": 11.67,

Or with your preferred group:

"sample": "RUS_Karelia_HG:Average",
"fit": 3.2361,
"RUS_Sidelkino_HG": 89.17,
"RUS_West_Siberia_N": 10.83,

So no, it does not get caught in their recent shared ancestry, it very clearly sees a difference even when the much older and not very drift-sharing AG3 is used as a counter. BTW there is 3000 years between these two EHG samples + a considerable geographical distance, so it is not self-evident that they are a meta-population with impossible-to-untangle shared drift.

"In the case of Sidelkino I said the fit would fail if WHG+WSHG mixture would share more drift with the ENA-related branch than Sidelkino, but the fit working shows that it doesn't."

I lost track here a bit, I am afraid. Sidelkino sharing more drift with the ENA-related branch than WHG + WSHG is the intuitively expected result. However, since Sidelkino is essentially WHG + that ANE branch, this actually shows WSHG itself is not a better proxy for the ANE branch.

In any case, when I said that "You say the qpAdm test should fail if there is a significant BHG-like ancestry in WSHG, so we have a paradox there." I meant this:
"Anzick and Ami are present in pRight so if a WHG+WSHG mix had significant far eastern ancestry that Sidelkino didn't the qpAdm model would not work, and in any case would be worse than AG3+WHG (that's why Sidelkino WHG + Okunevo or something would fail)."

WSHG does have a BHG(-like) ancestry. See further.


"WSHG as AG3 + Shamanka_EN won't work because of the WHG shift..."

I know, I modeled them as WHG + AG3 + BHG above. But the point is that they do have extra BHG(-like) ancestry above AG3, not that a two-way AG3 + BHG model is a really good one.


"...same was detected in Botai which is much better fitted as WSHG+BHG than as AG3+BHG."

Of course, because WSHG is much, much closer in time and they almost certainly have recent (much more recent than AG3) common ancestry from the ANE side, plus WSHG already had the same BHG, only less. So of course WSHG + BHG will be a better fit than AG3 + BHG, but I do not understand what you are trying to say by stating that.

Shaikorth said...

"Sidelkino sharing more drift with the ENA related branch than WHG + WSHG is the intuitively expected result."

How come, ENA is the branch with Dai, Onge etc. The point is that for that qpAdm fit to work Sidelkino has to share some of that BHG-related ancestry WSHG does (if it wouldn't the model would not work because Sidelkino would be more distant from Ami and Anzick than the fit). But not more drift with ENA, that's why both WHG+Botai and WHG+AG3 are worse fits for Sidelkino than WHG+WSHG

QpAdm suggests Karelia_HG doesn't have significant AG3 or WSHG compared to Sidelkino instead of WHG because otherwise it would share additional drift with Anzick and MA-1 included in the pRight. Therefore in this case we should lean towards Global25 model being misleading which can happen due to recent drift.

Slumbery said...

@Shaikorth

"How come,..."

I am completely lost in the structure of your sentences in this particular branch of our debate, so I have no idea what is compared to what there. That is how come. :/

"The point is that for that qpAdm fit to work Sidelkino has to share some of that BHG-related ancestry WSHG does (if it wouldn't the model would not work because Sidelkino would be more distant from Ami and Anzick than the fit)."

I understand that is your interpretation of the qpAdm result and I do not have enough knowledge of qpAdm to challenge that. (Although I still think that the East Asian side of Ami and Anzick is very diverged from BHG, but you say that for this purpose they are similar enough and I have to accept your opinion.)
However, G25 nMonte still shows significant BHG in WSHG, and it is too big a share of ancestry from a very divergent group to be some statistical fluke. At the same time, EHG apparently does not have this kind of ancestry. So there is a paradox that is still unexplained.


"QpAdm suggests Karelia_HG doesn't have significant AG3 or WSHG compared to Sidelkino instead of WHG because otherwise it would share additional drift with Anzick and MA-1 included in the pRight. Therefore in this case we should lean towards Global25 model being misleading which can happen due to recent drift."

Your original claim was that younger EHGs are all more WHG than Sidelkino. We could argue over the question of whether +10% ANE (and therefore minus 3-4% WHG) in an otherwise ANE-dominated mixture would have a huge impact on the shared drift with the pretty distant Anzick and MA1 (not necessarily, or in other words: what does "significant" mean here exactly?), but even if G25 nMonte overestimates the extra AG3, this is still very far from Karelia_HG having extra WHG. I see no support for the claim that Karelia_HG is more WHG-rich than Sidelkino_HG.

Shaikorth said...

My point has been that Sidelkino cluster shares the same ancestry components with WSHG, otherwise the statistical qpAdm fit as WHG+WSHG would fail or be worse than WHG+AG3 fits.

Back from Kale's qpAdm runs, the strongest evidence for extra WHG in Karelia_HG:


Karelia_HG

pright
Barcin_N, Iran_N, Levant_N, Kotias, MA1, Anzick, ElMiron, GoyetQ116-1, Vestonice16, Kostenki14, Ami, Australian, Ust_Ishim

83% Sidelkino + 17% Lepenski_Vir_outlier (Balkan WHG with some CHG)
tail prob 0.90

Notice Anzick and MA-1 in pRight. WSHG shares more drift with them than Iron Gates HG's or Sidelkino, the fit wouldn't be that good with close to perfect P-value if Karelia HG was Sidelkino+WSHG with no extra WHG. Karelia_HG as Iron_Gates + Sidelkino + WSHG might be possible but hasn't been tested.

I will add that qpAdm fits are best disproven with other formal tests or haplotype based tests instead of just G25 nMonte. To give an example of what you can get if you aren't careful, I haven't heard of qpAdm support for South Central Asian migrations into the eneolithic Pontic-Caspian steppe, and Davidski had said it doesn't work, but you can make it happen with nMonte:

"distance%=4.375"

RUS_Vonyuchka_Eneolithic

RUS_Sidelkino_HG,39.8
TJK_Sarazm_Eneolithic,32.6(!)
GEO_CHG,25.8
UKR_Mesolithic,1.8
Anatolia_Barcin_N,0
BEL_Loschbour:I0001,0

"distance%=4.6563"

RUS_Progress_Eneolithic

RUS_Sidelkino_HG,48.8
TJK_Sarazm_Eneolithic,26.8
GEO_CHG,24.4
UKR_Mesolithic,0
Anatolia_Barcin_N,0
BEL_Loschbour:I0001,0

Shaikorth said...

@JuanRivera
Lacking a proper PC for doing AdmixTools right now so hopefully someone else will take up the qpAdm, but I can't rule out the Magadan steppe admix, the best fits in the Sikora et al. supplement commonly offer either Afanasievo or Barcin_N on top of Kolyma and any Anatolian farmer couldn't have been brought so far by other means (even if it was some forest zone population moving east it must have gotten Barcin_N from the steppe). Doesn't need to be Afanasievo based on the age of the samples (3000 bp), can be Okunevo or some later population.

E. Donovan said...

Interesting ideas. The Russians have the earliest arrival of Combed Ware, and from east of the Urals, I believe in the Volga-Kama region within centuries of the appearance of the Xinglongwa culture much further east, which they don't mention and simply describe this first wave as transient. There might be some chance N first arrived in Anatolia with Anatolian speakers, so its clades in modern Turkey must be investigated. Even with a link found I would still expect Anatolian to have a partly EHG character. Not to be rude but I think the WSHG source idea, perhaps also via Botai and Baikal HG, strains credulity even considering the N found in this spectrum unless we're willing to give Para-Uralic some kind of near Afro-Asiatic time depth.

Isn't there some ancient N1c in the Balkans? If the Anatolians took that route then we might have a lead there. If it actually got that far within an acceptable time frame I don't see the requirement of much if any East Asian autosomal signature. At least in the case of the Anatolian speakers we can't assume for certain they were R.

E. Donovan said...

We all know that. We also find EHG either raw or via steppe in all western N populations. Proto-steppe could work too. Furthermore, the earliest Para-Uralic speakers in Europe may not have had the Yangshao admixture later western Uralics have, perhaps only something Samoyedic/Evenk-like.

Drago said...

OK, I see: if we include WSHG, it does come up when modelling EHG

E.g.

EHG (average)
AfontovaGora3 41.8%
Villabruna 27.8%
West_Siberia_N 26.2%
CHG 2.4%
Boncuklu_N 1.8%
Iran_N 0%
Natufian 0%

Distance 5.5976%


I'm still not very happy about using something younger to model something older; however, I see there might be something to it: perhaps AG-3 is too old and we need something younger and slightly more eastern.
Of course, the problem also is that there might have been 2 migrations, one at the LGM, one at the cusp of the Mesolithic.

Shaikorth said...

@Dragos
There probably were multiple migrations, something unsampled seems to have influenced WHG.

@JuanRivera
Perhaps Okunevo migrated east but its genetic impact must have been relatively small (which still fits the qpAdm result). If we look at uniparentals Okunevo's Y-DNA was R1b-L23/Z2103, Q-L56/L54 and N1c-B187, which are still around in lower frequencies near Minusinsk but besides the Q absent in Magadan.

Andrzejewski said...

J came with the CHG, right?

RangoonRhino said...

Re: Naxi/Yi - remember that the Dian culture of western Yunnan had strong links to Inner Asia, there being little physical barrier between western Yunnan, western Sichuan (historically Kham Tibet) and the Gansu corridor. There has long been speculation (though no real evidence I think) for a Yuezhi (Indo-European) migration southward towards Yunnan as well as westward (to the Oxus and Bactria). Colonial-era ethnographers often remarked on the "Caucasian" physiognomy of the "Cool-mountain" Yi aristocracy (who owned slaves of lowland Han Chinese descent).
