
Thursday, June 23, 2016

A moment of clarity


A lot of things now make so much more sense thanks to all of the recently published ancient DNA. For instance, in the Principal Component Analysis (PCA) below, South Central Asians (SC_Asia) finally look like a three-way mixture of Bronze Age steppe pastoralists, early farmers from Iran and surrounds, and indigenous South Asians, which is exactly what they are.


By the way, I also ran a global analysis but didn't get the chance to make a decent plot. However, the datasheet is available for download here. The samples are from a variety of recent DNA papers and freely available at the Reich Lab site here.

127 comments:

Davidski said...

Here's a plot with a couple of Samaritans (black squares) and an Egyptian Copt (black star).

https://drive.google.com/file/d/0B9o3EYTdM8lQcTBXQ2tHbEtnY3M/view?usp=sharing

Rob said...

The world was imploding c. 4000 BC
;)

Rob said...

Looks awesome btw, Dave

Olympus Mons said...

@Rob,
"The world was imploding c. 4000 BC"

Can you elaborate just a little bit?...

For the king said...

@Davidski

Awesome PCA! Can you highlight the Iranian populations? Also the Kurds and the Balochis/Brahuis/Makranis?

Regards

Jijnasu said...

Who do you think lived originally in South Central Asia?

Karl_K said...

@Romulus

"apparently you know exactly what sc_asia is but despite having access to european hg genomes for years you couldnt detect whg contained ane

all your work is trash"

Pretty bold statement, but I'm not sure that anyone knows what your point is.

Could you be more specific? Are you saying that because Davidski doesn't have all the possible data that all analyses are garbage?

That sounds dumb.

Ariel said...

For comparison https://s32.postimg.org/qi9jt7u0z/wykres_PCA.png

anthrospain said...

David, what software do you use for making these PCA plots?
I think your PCA looks clearer than the Lazaridis one.

Alberto said...

Thanks, looks very informative indeed. And thanks for sharing the PCA data. I'll give that a try too as soon as I can.

Samaritan DNA said...

Thank you David for the Samaritan plot. It seems that the unadmixed Behar Samaritan and the Roman gladiator 3DRIF-26 are in fact nearly identical to the Early Bronze Age Levantines.
The remarkable thing is that I1730 AG_84_3083_116 is J2b1-M205, just like 3DRIF-26. J2b1-M205 is quite common today among Adnani (North Arabian, non-Yemeni) Arabs from Jordan. The Egyptian Copt is nearby, but not as close as 3DRIF-26 and the Samaritans, therefore that might rule out an Egyptian origin for 3DRIF-26.

If you get the chance, can you do a closeup including our three Samaritans, 3DRIF-26, and the Bronze Age Levantines?

Matt said...

Differences in shape from the PCA in the paper:

http://i.imgur.com/NU1chXR.png

- greater distance of Iran_N from West Eurasia
- generally more compression of Euro_HG together and Anatolia_Neolithic and Levant_N together
- more displacement of WHG towards recent central European samples.
- Anatolian and Levant Neolithic are less close to (overlapping) recent ME generally.

Presumably this is to some extent due to a) the inclusion of BedouinB, and b) the lack of South Central Asians and Volga populations, in the paper's PCA, leading to slightly different compression / pull in different parts of the PCA area.

Would it be burdensome for you to re-run your PCA with the same set of recent populations as Lazaridis, to see what effect these different recent populations have?

Very much the same pattern though, and cool to see the Fu et al WHG samples together.

Lank said...

PCA looks great.

Will you try to estimate mixture proportions based on these ancient populations in modern groups?

Anonymous said...

What population, or populations, are meant by South Central Asian? Also, I cannot find the South Asian component on that PCA. What is the proxy?

Ariel said...

It would be interesting to know how that western Anatolian Chalcolithic sample compares with modern Greek islanders (more Caucasus and less Red Sea?). I don't understand why we have so few samples from the Bronze Age eastern Mediterranean, and in general we need more Late Bronze Age DNA IMO (Hittites, Phoenicians, Minoans...). The IE question is settled, now it's time for something else.

Samuel Andrews said...

@Everyone,

Anatolia_Chl is clustering with Cypriot. Italy and the Balkans cluster north of Anatolia_Chl. IMO, migrations from Anatolia during the Bronze Age, or earlier or later, probably made a big impact on Southern Europe and brought most of the Y-DNA J2 and E1b we see there today.

I had modeled Southern Europeans as part Cypriot with ANE K8 and D-stats for a while. I think I was right about Cypriot-like ancestry. The effect extends to Portugal, so not a land-based movement of genes.

Fanty said...

@Olympus:

@Rob,
"The world was imploding c. 4000 BC"

Can you elaborate just a little bit?...

-------------------

I think by "implosion" he means that all those people move closer together. Which is because they all assimilate similar populations and equalize their gene pools.

Iranocentrist said...

I'm afraid we will need advanced quantum computers, and a lot more samples from all regions and time periods, before we solve the complex history of population movements. There are just too many factors and scenarios.

Alberto said...

Just starting to look at models using the PCA data, but one thing that I wanted to test is how Natufian relates to BedouinB, and whether it does away with the SSA admixture:

BedouinB:HGDP00607
"Israel_Natufian:I1072" 81.7
"Iran_Neolithic:I1945" 18.3

These PCA models tend to simplify things (or maybe D-stats based models tend to overcomplicate them, don't really know), so I'll wait to check with D-stats too when they're available. But the result does look interesting. For comparison, BedouinA:

BedouinA:HGDP00609
"Levant_Neolithic:I1699" 46.5
"Iran_Neolithic:I1945" 25.5
"Anatolia_Neolithic:I0707" 18.55
"Yoruba:HGDP00920" 8.1
"Israel_Natufian:I1072" 1.2
"Dai:HGDP01307" 0.15

Or Spanish Extremadura:

Spanish_Extremadura:HG01509
"Anatolia_Neolithic:I0707" 56.4
"Yamnaya_Samara:I0231" 18.55
"Villabruna_Loschbour:Loschbour" 14.35
"Satsurblia_Kotias:KK1" 6.95
"Yoruba:HGDP00920" 2.55
"Andronovo:RISE505" 1.15
"Afanasievo:RISE511" 0.05
"Israel_Natufian:I1072" 0

So it's not like Natufian is a substitute for SSA.
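
For readers unfamiliar with how PCA-based models like these are produced, here is a minimal sketch of the general idea: search for non-negative weights that sum to 100% and minimise the Euclidean distance between the target's PCA coordinates and the weighted mean of the reference coordinates. This is not the nMonte code itself. The function name `fit_mixture`, the random single-swap search, the `n_slots` resolution and the toy coordinates are all assumptions made for illustration.

```python
import numpy as np

def fit_mixture(target, sources, n_slots=200, n_iter=20000, seed=0):
    """Monte Carlo search for mixture weights (a sketch, not nMonte itself).

    target  : (d,) PCA coordinates of the individual being modelled
    sources : (k, d) PCA coordinates, one row per reference sample
    Returns (weights in percent, Euclidean distance of the fitted mix).
    """
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    sources = np.asarray(sources, dtype=float)
    k = sources.shape[0]
    # A "bag" of n_slots draws from the references; weight = slot count / n_slots.
    slots = rng.integers(0, k, size=n_slots)
    total = sources[slots].sum(axis=0)
    best = np.linalg.norm(total / n_slots - target)
    for _ in range(n_iter):
        i = rng.integers(0, n_slots)   # slot to change
        new = rng.integers(0, k)       # candidate replacement reference
        trial_total = total - sources[slots[i]] + sources[new]
        dist = np.linalg.norm(trial_total / n_slots - target)
        if dist < best:                # keep the swap only if the fit improves
            slots[i] = new
            total = trial_total
            best = dist
    weights = np.bincount(slots, minlength=k) * 100.0 / n_slots
    return weights, best

# Toy example: three hypothetical reference samples in a 2-D PCA space.
sources = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = np.array([0.2, 0.0])  # equals 80% of source 0 plus 20% of source 1
weights, dist = fit_mixture(target, sources)
print(weights, dist)
```

With `n_slots=200` the weights move in steps of 0.5%. A reference that gets 0 simply never improved the fit, which is how the zero rows in the outputs above should be read under this kind of search.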

Samuel Andrews said...

@Alberto,

Put Anatolia_Chl in the model for Spanish and see how it turns out.

Rob said...

Matt or Alberto

Can you do a PCA plot but rotate it 90° clockwise?

Olympus Mons said...

Fanty,
Thanks. I agree. It actually started earlier (around 5500 BC), but from about 4900 BC on it was like an implosion/explosion.

Unknown said...

There are obviously two invasions going on in Europe: one along the Chalcolithic Anatolia - Bronze Age Hungary axis, and another along the Yamnaya - Beaker Germany axis. Which one spoke Indo-European languages is unclear, but the first group is 100% within modern European variation while the second is mostly outside it, so that may be an indication.

Alberto said...

Another thing I wanted to check was if Iran_Neolithic has ASI. It's not easy to check directly with the populations available, but starting from a model for Kalash (no Onge, so using Paniya):

Kalash:HGDP00267
"Iran_Neolithic:I1945" 47.2
"Yamnaya_Samara:I0231" 38
"Paniya:PNYD1" 14
"EHG_Karelia_HG:I0061" 0.8
"Anatolia_Neolithic:I0707" 0
"Israel_Natufian:I1072" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Mal'ta_MA1:MA1" 0
"Levant_Neolithic:I1699" 0
"Villabruna_Loschbour:Loschbour" 0
"Yoruba:HGDP00920" 0
"Ami:NA13607" 0
"Satsurblia_Kotias:KK1" 0
"Andronovo:RISE505" 0
"Dai:HGDP01307" 0
distance=0.003709

Paniya is quite low compared to what we're used to seeing. And then removing Paniya:

Kalash:HGDP00267
"Iran_Neolithic:I1945" 62.45
"Yamnaya_Samara:I0231" 15.9
"Mal'ta_MA1:MA1" 14.05
"Villabruna_Loschbour:Loschbour" 6.2
"Dai:HGDP01307" 0.9
"EHG_Karelia_HG:I0061" 0.5
"Anatolia_Neolithic:I0707" 0
"Israel_Natufian:I1072" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Levant_Neolithic:I1699" 0
"Yoruba:HGDP00920" 0
"Ami:NA13607" 0
"Satsurblia_Kotias:KK1" 0
"Andronovo:RISE505" 0
distance=0.006984

Iran_Neolithic roughly takes the Paniya, and there is hardly any ENA in the model, just traces of Dai. The model is worse, but still quite good.

Trying to model Paniya:

Paniya:PNYD1
"Iran_Neolithic:I1945" 69.15
"Mal'ta_AfontovaGora3:I9050.damage" 16.85
"Dai:HGDP01307" 14
...
distance=0.043548

A poor model, but still surprising. It seems like Iran_Neolithic does have some good amount of ASI. We'd need to check this with some stats.

Alberto said...

@Samuel

No change including Anatolia_ChL:

Spanish_Extremadura:HG01509
"Anatolia_Neolithic:I0707" 56.4
"Yamnaya_Samara:I0231" 18.6
"Villabruna_Loschbour:Loschbour" 14.35
"Satsurblia_Kotias:KK1" 6.95
"Yoruba:HGDP00920" 2.55
"Andronovo:RISE505" 1.15
"Israel_Natufian:I1072" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Mal'ta_MA1:MA1" 0
"Levant_Neolithic:I1699" 0
"Ami:NA13607" 0
"EHG_Karelia_HG:I0061" 0
"Iran_Neolithic:I1945" 0
"Dai:HGDP01307" 0
"Paniya:PNYD1" 0
"Anatolia_Chalcolithic:I1584" 0

Alberto said...

@Rob

I don't have software to make a plot out of this datasheet. And the data is the same as that used in Davidski's plot, so wouldn't the end result be the same as simply rotating that image? Not sure if that's what you want or something else.

Ariel said...

Alberto

What? No Anatolia ChL for Spain! That's huge!
Maybe anatolia ChL will show up in Greeks, Italians, Cyprus or Bulgaria. You should try...

Unknown said...

@Samaritan DNA, based on his isotope and oracle information, once you include an Egyptian Copt reference it seems likely he was of at least partial Egyptian origin.

Population data has been read. 207 populations found.
Personal data has been read. 20 approximations mode.
Person: Test1
Threshold of components set to 0.4%


Least-squares method.

Using 1 population approximation:
1 Egyptian_copt @ 7.157934
2 Palestinian @ 7.892861
3 Bedouin @ 9.735329
4 Jordanian @ 10.699103
5 Samaritan @ 10.882444
6 Egyptian @ 12.261471
7 Yemenite_Jewish @ 13.881736
8 Lebanese_Christian @ 14.134447
9 Syrian @ 15.455195
10 Saudi @ 15.633462
11 Libyan_Jewish @ 16.488793
12 Lebanese_Druze @ 16.849952
13 Tunisian_Jewish @ 16.947318
14 Lebanese_Muslim @ 17.665131
15 Cyprian @ 18.849272
16 Sephardic_Jewish @ 21.690977
17 Kurdish_Jewish @ 21.923758
18 Algerian_Jewish @ 21.983456
19 Iranian_Jewish @ 22.708555
20 Italian_Jewish @ 22.898936
207 iterations.

Using 2 populations approximation:
1 Egyptian_copt+Palestinian @ 4.284012
2 Egyptian_copt+Samaritan @ 4.59182
3 Tunisian_Jewish+Yemenite_Jewish @ 4.730545
4 Samaritan+Yemenite_Jewish @ 5.433585
5 Cyprian+Yemenite_Jewish @ 5.435184
6 Libyan_Jewish+Yemenite_Jewish @ 5.54292
7 Bedouin+Egyptian_copt @ 5.627492
8 Egyptian_copt+Lebanese_Christian @ 5.709157
9 Egyptian_copt+Jordanian @ 5.790701
10 Syrian+Yemenite_Jewish @ 5.863652
11 Samaritan+Saudi @ 6.055764
12 Jordanian+Yemenite_Jewish @ 6.058507
13 Lebanese_Muslim+Yemenite_Jewish @ 6.412908
14 Palestinian+Yemenite_Jewish @ 6.546099
15 Egyptian_copt+Syrian @ 6.594724
16 Sephardic_Jewish+Yemenite_Jewish @ 6.605741
17 Algerian_Jewish+Yemenite_Jewish @ 6.804541
18 Lebanese_Christian+Saudi @ 7.025647
19 Saudi+Tunisian_Jewish @ 7.061216
20 Bedouin+Yemenite_Jewish @ 7.087139
21528 iterations.

Using 3 populations approximation:
1 50% Egyptian_copt +25% Cyprian +25% Saudi @ 2.647178
2 50% Egyptian_copt +25% Lebanese_Christian +25% Saudi @ 2.976048
3 50% Samaritan +25% Egyptian_copt +25% Saudi @ 3.129694
4 50% Egyptian_copt +25% Cyprian +25% Yemenite_Jewish @ 3.139905
5 50% Egyptian_copt +25% Samaritan +25% Saudi @ 3.177979
6 50% Egyptian_copt +25% Lebanese_Muslim +25% Saudi @ 3.495424
7 50% Egyptian_copt +25% Lebanese_Muslim +25% Yemenite_Jewish @ 3.510308
8 50% Egyptian_copt +25% Lebanese_Druze +25% Saudi @ 3.610887
9 50% Samaritan +25% Egyptian_copt +25% Yemenite_Jewish @ 3.690019
10 50% Egyptian_copt +25% Syrian +25% Yemenite_Jewish @ 3.742132
11 50% Yemenite_Jewish +25% Samaritan +25% Tunisian_Jewish @ 3.790951
12 50% Samaritan +25% Egyptian +25% Yemenite_Jewish @ 3.835664
13 50% Egyptian_copt +25% Bedouin +25% Samaritan @ 3.862547
14 50% Egyptian_copt +25% Bedouin +25% Lebanese_Christian @ 3.91185
15 50% Yemenite_Jewish +25% Cyprian +25% Egyptian @ 3.970079
16 50% Egyptian_copt +25% Saudi +25% Tunisian_Jewish @ 3.979665
17 50% Egyptian_copt +25% Palestinian +25% Samaritan @ 4.053569
18 50% Egyptian_copt +25% Samaritan +25% Yemenite_Jewish @ 4.079284
19 50% Egyptian_copt +25% Sephardic_Jewish +25% Yemenite_Jewish @ 4.091116
20 50% Palestinian +25% Egyptian_copt +25% Saudi @ 4.102646
1457761 iterations.

Unknown said...

Compared to when Egyptian_copt is not included as a reference population in the oracle

Using 1 population approximation:
1 Palestinian @ 7.892861
2 Bedouin @ 9.735329
3 Jordanian @ 10.699103
4 Samaritan @ 10.882444
5 Egyptian @ 12.261471
6 Yemenite_Jewish @ 13.881736
7 Lebanese_Christian @ 14.134447
8 Syrian @ 15.455195
9 Saudi @ 15.633462
10 Libyan_Jewish @ 16.488793
11 Lebanese_Druze @ 16.849952
12 Tunisian_Jewish @ 16.947318
13 Lebanese_Muslim @ 17.665131
14 Cyprian @ 18.849272
15 Sephardic_Jewish @ 21.690977
16 Kurdish_Jewish @ 21.923758
17 Algerian_Jewish @ 21.983456
18 Iranian_Jewish @ 22.708555
19 Italian_Jewish @ 22.898936
20 Assyrian @ 26.329748
206 iterations.

Using 2 populations approximation:
1 Tunisian_Jewish+Yemenite_Jewish @ 4.730545
2 Samaritan+Yemenite_Jewish @ 5.433585
3 Cyprian+Yemenite_Jewish @ 5.435184
4 Libyan_Jewish+Yemenite_Jewish @ 5.54292
5 Syrian+Yemenite_Jewish @ 5.863652
6 Samaritan+Saudi @ 6.055764
7 Jordanian+Yemenite_Jewish @ 6.058507
8 Lebanese_Muslim+Yemenite_Jewish @ 6.412908
9 Palestinian+Yemenite_Jewish @ 6.546099
10 Sephardic_Jewish+Yemenite_Jewish @ 6.605741
21321 iterations.

Using 3 populations approximation:
1 50% Yemenite_Jewish +25% Samaritan +25% Tunisian_Jewish @ 3.790951
2 50% Samaritan +25% Egyptian +25% Yemenite_Jewish @ 3.835664
3 50% Yemenite_Jewish +25% Cyprian +25% Egyptian @ 3.970079
4 50% Yemenite_Jewish +25% Syrian +25% Tunisian_Jewish @ 4.141381
5 50% Yemenite_Jewish +25% Samaritan +25% Sephardic_Jewish @ 4.156833
6 50% Yemenite_Jewish +25% Algerian_Jewish +25% Samaritan @ 4.168106
7 50% Yemenite_Jewish +25% Kurdish_Jewish +25% Moroccan @ 4.197818
8 50% Yemenite_Jewish +25% Jordanian +25% Tunisian_Jewish @ 4.22339
9 50% Yemenite_Jewish +25% Kurdish_Jewish +25% Tunisian @ 4.26225
10 50% Yemenite_Jewish +25% Lebanese_Christian +25% Tunisian @ 4.274044
1550395 iterations.
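
The iteration counts for the one- and two-population modes are consistent with an exhaustive search: 207 single populations, and 21,528 = 207 + C(207,2) pairs with repetition allowed (21,321 = 206 + C(206,2) once Egyptian_copt is dropped). A minimal sketch of such a search follows, assuming each reference is a vector of admixture component percentages, a pair is scored as a 50/50 average, and the score is the Euclidean distance. The function name `oracle` and the toy numbers are hypothetical, and the actual oracle may differ in its details; the 50/25/25 three-population mode is not covered here.

```python
from itertools import combinations_with_replacement
import numpy as np

def oracle(target, refs, n_pops=2, top=5):
    """Rank equal-share mixtures of reference populations by distance to target.

    target : admixture component percentages of the test sample
    refs   : dict mapping population name -> component percentages
    Returns (results, n_tested); results is a list of (distance, names) tuples.
    """
    target = np.asarray(target, dtype=float)
    names = sorted(refs)
    results = []
    n_tested = 0
    # Every unordered combination, including a population paired with itself.
    for combo in combinations_with_replacement(names, n_pops):
        mix = np.mean([refs[name] for name in combo], axis=0)
        results.append((float(np.linalg.norm(mix - target)), combo))
        n_tested += 1
    results.sort(key=lambda r: r[0])
    return results[:top], n_tested

# Toy example with three made-up references and two components.
refs = {"A": [100, 0], "B": [0, 100], "C": [50, 50]}
best, n_tested = oracle([75, 25], refs, n_pops=2)
print(n_tested, best[0])
```

Because every candidate is scored against the same target, adding one well-matched reference (as with Egyptian_copt above) can reorder the whole list without changing any of the other distances.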

Atriðr said...

Question: how is the third South Asian component inferred from this chart?
The grey-xs high up there? Half look like a two-way mix.

postneo said...

David Why not plot south Asians too since you have expanded the plot

Samuel Andrews said...

@Alberto,

You should take out CHG and see what happens. I'm confident Anatolia_Chl-relatives made an impact on Spain. Extra CHG would have come with EEF-related admixture.

@Everyone,

I think we should use stats of the form D(row, column)(X, Mbuti) for measuring admixture. A new spreadsheet with the new Mid East genomes will do wonders.

Seinundzeit said...

David,

Thanks!

This is pretty awesome, one can finally produce sensible models with this nMonte sheet (ones that match qpAdm and TreeMix).

Previously, it was almost impossible to model South Central Asians as having substantial LN/EBA European admixture, using PCA (even though all the formal methods pointed to this).

Not anymore. I suppose Iran_N truly is the missing piece which allows us to properly explicate Central/South Asian genetic history.

As noted previously, this is no surprise, since it shows a strong relationship to South Asia with something as basic as IBS (South Asian affinity goes quite down, and West Asian affinity goes way up, with Iran_Chalcolithic), it appears predominately "South Asian/Indian" in some K with ADMIXTURE, and clusters next to South Central Asia with PCA. Basically, "ANI" is definitely Eastern Europe/Steppe + Iran_N.

Kalash (HGDP00281)
39.6% BA Steppe (26.05% Andronovo + 13.55% Afanasievo)
34.8% Iranian Neolithic
25.6% Paniya

Pashtun (HGDP00224)
40.05% BA Steppe (25.45% Andronovo + 14.60% Afanasievo)
35% Iranian Neolithic
24.95% Paniya

Tajik (Ishkashim)
54.2% BA Steppe (mostly Andronovo)
27.2% Iranian Neolithic
16.6% Paniya
2% Ulchi

Tajik (Shugnan)
55.4% BA Steppe (mostly Andronovo)
35.3% Iranian Neolithic
5.9% Paniya
3.40% Ulchi

All of these models are also great fits, in terms of distance.

The Iranian results are pretty cool:

Iranian (Mazandarani):
63.7% Iranian_Chalcolithic
14.45% Iranian_Neolithic
13.35% BA Steppe
8.5% LBK_EN (I used LBK_EN, since it lacks the slight Iranian/CHG affinity seen in some Neolithic Anatolians. Also, it's basically 90%-95% Near Eastern, with only 5%-10% WHG. Its Near Eastern ancestry is already predominately WHG, so that's not even an issue)

Iranian (Lor)
49.7% Iranian_Chalcolithic
19.4% LBK_EN
10.9% BA Steppe
7.65% Iranian_Neolithic
6.75% Natufian
2.9% Paniya
2.7% Ulchi

Iranian (Bandari)
38.25% Iranian_Neolithic
19.2% LBK_EN
16.75% Iranian_Chalcolithic
12.85% BA Steppe
10.5% Paniya
2.35% Yoruba
0.1% Ulchi

Interesting differences within Iran.

The Paniya model as 75% Iranian_Neolithic + 25% Australian/Ami, or 70% Iranian_Neolithic + 20% Australian/Ami + 10% Afontova Gora3.

Although I think the identity of ANI has been solved for people like the Kalash/Pashtuns (ancient Eastern Europe/steppe + Iran_N, but with a bias towards the former), ASI is still a mystery. For that, we need aDNA from Upper Paleolithic/Mesolithic South Asia.

I don't think ASI is ENA, but who really knows. I'm sure we aren't far from figuring this out as well.

Samuel Andrews said...

What I'm confused about is how Cypriot can fit in that PCA as something like 70% Anatolia_N but score only 16% West_Med in Eurogenes K15.

Seinundzeit said...

On the topic of ASI, we have this though.

On David's West Eurasian PCA, South Central Asians seem to be in between Iran_N and the steppe, that's obvious.

But they also deviate (especially a subset) from both Iran_N and the steppe, in the same direction as the UP Siberian (ANE). I think that gives us a clue about Upper Paleolithic/Mesolithic South Asians.

Gill said...

You could also model some South Central Asians, based on that PCA, as Iran_N + MA1 (assuming Siberia_UP is MA1).

Gill said...

Or, if ASI/ASE is where we expect it (top right somewhere), then ASI + Caucasus or Southern Europe.

Gill said...

David, could you specify the different South Central Asian groups? Which is the furthest to the top?

Open Genomes said...

Here's the 3-D plot: :)
Eurogenes Lazaridis (2016) Ancient Near East Interactive 3-D PCA Plot

The Natufians appear to be partway between North Africans (Libyans) and Eurasians. Some draw toward the Anatolian Neolithic, while others draw toward CHG.
It seems that the Natufians do in fact originate with an African population who admixed with Upper Paleolithic Near Easterners just after the LGM.

We know that two of the Natufians are E-Z830. One other Natufian sample, I1685, possibly haplogroup E-M123, is negative for the E-Z1515 subclade of E-Z830, which contains the E-M293 subclade that is common in East Africa and Namibia.
YFull E-Z830 tree

The tMRCAs of E-PF1962 and its subclade E-M123 are both 18,900 ybp.
E-M123* (xM34) was found in a Bronze Age Armenian, RISE423.

A screenshot overview of the 3-D PCA plot showing the vector from North Africans to Eurasians with the Natufians in between
(The grey areas are the z-axis surfaces for the modern samples.)

A closeup of the 3-D PCA plot with annotations

Here we can see from the Radiocarbon Context Database that there was a separate African-linked culture, the Ramonian/Mushabian, in the Sinai and Negev 16,000-12,500 BCE (c. 18,000-14,700 ybp) contemporary with the Geometric Kebaran to the north.
Map of radiocarbon dated sites and cultures in the Near East 16,000-12,500 BCE

Perhaps it was the Ramonian/Mushabian culture which brought both the proto-Afro-Asiatic language and E-M123 to the Near East?

Karl_K said...

@Open Genomes

"It seems that the Natufians do in fact originate with an African population who admixed with Upper Paleolithic Near Easterners just after the LGM."

That is a possibility, but there is not enough data to talk about the direction of migrations or the locations of the people before those migrations.

There have obviously been a lot of people moving around North and East Africa and the Middle and Near East for a very long time.

We need a lot more samples, and especially earlier samples, to say who was where at what time.

The time depth of some of these splits is tens of thousands of years. We do not have any hard data for the genetics yet.

Tobus said...

I couldn't find any pigmentation data in the paper so I've summarised the two SLC's from the Reich dataset myself. I note there were zero heterozygous samples here, despite Karelia previously being flagged as such for SLC45A2 so I suspect these have been haploidised at some point (and so the %ages might be off for smaller sample sizes). Also note that one of the WHG samples (K01) is now showing a derived SLC24A5 which I've never seen before.
Population          SLC24A5         SLC45A2
Anatolia_ChL        0% (0/0)        0% (0/1)
Anatolia_N          100% (16/16)    38% (8/21)
Armenia_ChL         0% (0/0)        100% (4/4)
Armenia_EBA         100% (1/1)      0% (0/3)
Armenia_MLBA        100% (2/2)      50% (2/4)
CHG                 100% (2/2)      0% (0/2)
EHG                 100% (2/2)      50% (1/2)
Europe_EN           86% (18/21)     37% (7/19)
Europe_LNBA         100% (31/31)    76% (28/37)
Europe_MNChL        100% (13/13)    33% (6/18)
Iberia_BA           100% (1/1)      0% (0/1)
Iran_ChL            100% (1/1)      0% (0/5)
Iran_N              100% (1/1)      0% (0/1)
Iran_recent         100% (1/1)      0% (0/1)
Levant_BA           100% (2/2)      0% (0/3)
Levant_N            50% (1/2)       33% (1/3)
SHG                 75% (3/4)       67% (4/6)
Steppe_EMBA         100% (15/15)    31% (5/16)
Steppe_Eneolithic   100% (1/1)      0% (0/3)
Steppe_IA           100% (1/1)      100% (1/1)
Steppe_MLBA         100% (11/11)    71% (10/14)
Switzerland_HG      0% (0/1)        0% (0/1)
Ust_Ishim           0% (0/1)        0% (0/1)
WHG                 33% (1/3)       0% (0/2)

Alberto said...

@Samuel

Even after removing Kotias, Anatolia_Chalcolithic still isn't picked up:

Spanish_Extremadura:HG01509
"Anatolia_Neolithic:I0707" 56.6
"Yamnaya_Samara:I0231" 15.45
"Villabruna_Loschbour:Loschbour" 14
"Andronovo:RISE505" 7.35
"Iran_Neolithic:I1945" 4.15
"Yoruba:HGDP00920" 2.35
"Anatolia_Chalcolithic:I1584" 0.1
"Israel_Natufian:I1072" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Mal'ta_MA1:MA1" 0
"Levant_Neolithic:I1699" 0
"Ami:NA13607" 0
"EHG_Karelia_HG:I0061" 0
"Dai:HGDP01307" 0
"Paniya:PNYD1" 0

However, including Iran_Chalcolithic (even leaving Kotias), does work:

Spanish_Extremadura:HG01509
"Anatolia_Neolithic:I0707" 53.75
"Villabruna_Loschbour:Loschbour" 14.3
"Yamnaya_Samara:I0231" 13.25
"Andronovo:RISE505" 9.35
"Iran_Chalcolithic:I1661" 6.95
"Yoruba:HGDP00920" 2.4
"Israel_Natufian:I1072" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Mal'ta_MA1:MA1" 0
"Levant_Neolithic:I1699" 0
"Ami:NA13607" 0
"Satsurblia_Kotias:KK1" 0
"EHG_Karelia_HG:I0061" 0
"Iran_Neolithic:I1945" 0
"Dai:HGDP01307" 0
"Paniya:PNYD1" 0
"Anatolia_Chalcolithic:I1584" 0

But I think we should wait for D-stats based sheets (if Davidski and Chad can include these new samples) to have a more definitive take on these kinds of details.

Alberto said...

@Sam

Cypriots do take a good amount of Anatolia_ChL, though:

Cypriot:CYP19
"Anatolia_Neolithic:I0707" 30.5
"Anatolia_Chalcolithic:I1584" 25.6
"Iran_Chalcolithic:I1661" 21.7
"Israel_Natufian:I1072" 10.95
"Andronovo:RISE505" 5.55
"Satsurblia_Kotias:KK1" 4.1
"Villabruna_Loschbour:Loschbour" 0.8
"Dai:HGDP01307" 0.45
"Yoruba:HGDP00920" 0.2
"Iran_Neolithic:I1945" 0.15
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Mal'ta_MA1:MA1" 0
"Levant_Neolithic:I1699" 0
"Ami:NA13607" 0
"EHG_Karelia_HG:I0061" 0
"Yamnaya_Samara:I0231" 0
"Paniya:PNYD1" 0

But not Greeks. So again, let's see with D-stats what happens.

Alberto said...

@Gill

"You could also model some South Central Asians, based on that PCA, as Iran_N + MA1 (assuming Siberia_UP is MA1)."

You can. That's what happened when I first tried to model an Indian population without Paniya:

Brahmin_UP:BR008
"Iran_Neolithic:I1945" 60.45
"Mal'ta_MA1:MA1" 32.85
"Ami:NA13607" 6.7
"Anatolia_Neolithic:I0707" 0
"Israel_Natufian:I1072" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Levant_Neolithic:I1699" 0
"Villabruna_Loschbour:Loschbour" 0
"Yoruba:HGDP00920" 0
"Satsurblia_Kotias:KK1" 0
"EHG_Karelia_HG:I0061" 0
"Yamnaya_Samara:I0231" 0
"Yamnaya_Kalmykia:RISE552" 0
"Afanasievo:RISE511" 0
"Andronovo:RISE505" 0
"Dai:HGDP01307" 0
distance=0.019377

But the model is not good (by the distance). Adding Paniya makes things much better:

Brahmin_UP:BR008
"Paniya:PNYD1" 46.55
"Iran_Neolithic:I1945" 27.45
"Yamnaya_Kalmykia:RISE552" 26
"Anatolia_Neolithic:I0707" 0
"Israel_Natufian:I1072" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Mal'ta_MA1:MA1" 0
"Levant_Neolithic:I1699" 0
"Villabruna_Loschbour:Loschbour" 0
"Yoruba:HGDP00920" 0
"Ami:NA13607" 0
"Satsurblia_Kotias:KK1" 0
"EHG_Karelia_HG:I0061" 0
"Yamnaya_Samara:I0231" 0
"Afanasievo:RISE511" 0
"Andronovo:RISE505" 0
"Dai:HGDP01307" 0
distance=0.00612

MfA said...

Open Genomes said...
E-M123* (xM34) was found in a Bronze Age Armenian, RISE423.


Both BA Armenians are M84+

RISE423 is E-L795 (M84 level), xPF6751
RISE416 is E-L788 (M84 level)

Alexandros said...

After taking some time to absorb the shock of the Brexit.. let's get back to population genetics..

@Samuel Andrews 1
Indeed, Anatolia_Chl appears to more or less cluster with Cypriots, but also with SE Europeans (mainly South Italians/Sicilians, I suppose), in the PCA plot. I am not sure how robust this finding is or how much emphasis we should put on it. My thinking is that if Chalcolithic Anatolians are genetically so close to modern Cypriots, then Chalcolithic Cypriots are probably also close to modern Cypriots, which would mean that Cypriots did not change substantially, genetically, over the past 5500 years. That would be totally unexpected and bizarre!

@Samuel Andrews 2
I am not sure about 70% ‘Anatolia_Neolithic’ in Cypriots. I have seen a breakdown by Alberto in this post which suggests something like 30% ‘Anatolia_Neolithic’ in Cypriots (see below), which makes more sense and is also more consistent with their 16% West Med in Eurogenes K15. Having said that, note that although West Med K15 and Anatolia_N overlap substantially, they are not identical. My impression is that Anatolia_N also contains some part of K15 East Med, while K15 West Med may contain a small proportion of WHG. I might be wrong though.

Cypriot:CYP19
"Anatolia_Neolithic:I0707" 30.5
"Anatolia_Chalcolithic:I1584" 25.6
"Iran_Chalcolithic:I1661" 21.7
"Israel_Natufian:I1072" 10.95
"Andronovo:RISE505" 5.55
"Satsurblia_Kotias:KK1" 4.1

Alexandros said...

@MfA
Are you sure that two BA Armenians are M123+ (M34+, M84+)? Can you provide a credible source for this? Thanks.

Samuel Andrews said...

@Alberto,

You shouldn't use so many possible ancestors for Cypriot. Their Iran_Chl, CHG, Natufian, and Anatolia_N scores are basically a recreation of Anatolia_Chl.

I was told Anatolia_Chl is clustering with Cypriot; maybe this person was wrong. I've successfully modelled Southern Europeans as MN+Steppe+Cypriot with several different methods, so I suspect Anatolia_Chl is an even more realistic reference. I do know that most of Southern Europe is directly north of Anatolia_Chl.

@Alberto, modeling,

I don't think you should use so many possible ancestors because they're all made up of the same components. Natufian and Anatolia_N can be used for largely the same purpose. Sometimes you should use a small collection of realistic ancestors.

MfA said...

@Alexandros

FTDNA M35 Project forum: http://community.haplozone.net/index.php?topic=3961.msg37508#msg37508
http://community.haplozone.net/index.php?topic=3961.msg37514#msg37514

Alberto said...

@Samuel

Cypriots do cluster with Anatolia_ChL, as shown in the PCA. They only need a bit of extra Natufian ancestry:

Cypriot:CYP19
"Anatolia_Chalcolithic:I1584" 88.2
"Israel_Natufian:I1072" 11.8
distance=0.002211

And Anatolia_ChL itself:

Anatolia_Chalcolithic:I1584
"Anatolia_Neolithic:I0707" 55.55
"Satsurblia_Kotias:KK1" 44.45
distance=0.002333

The choice of populations always depends on the purpose. Here I was adding many populations simply to avoid making my own choices and let the algorithm choose by itself. There isn't a right way and a wrong way. It always depends on what you're trying to look at.

Davidski said...

OK, I updated the plot slightly. I took out the Bandari Iranians because they had a lot of Sub-Saharan ancestry.

@Samaritan DNA

https://drive.google.com/file/d/0B9o3EYTdM8lQUGdUTGY2Q05rYUk/view?usp=sharing

https://drive.google.com/file/d/0B9o3EYTdM8lQM3QtbGdBS0JmSUk/view?usp=sharing

@For the king & Gill

https://drive.google.com/file/d/0B9o3EYTdM8lQTlhjXzVHb2hSSTg/view?usp=sharing

Dimgray X Indian
Dimgray Plus Sindhi
Dimgray Fill square Pathan
Dimgray O Kalash
Dimgray Dot Brahui
Dimgray Star Balochi
Dimgray Diamond Tajik
Black Fill square Iranian_Lor
Black O Iranian_Mazandarani
Black X Iranian_Persian
Black Star Kurdish

@Matt

https://drive.google.com/file/d/0B9o3EYTdM8lQOWhrcnBrQS1DZ00/view?usp=sharing

vs

https://drive.google.com/file/d/0B9o3EYTdM8lQWVp0R0JfUW1QeWM/view?usp=sharing

@Lank

Yeah, you can use the Global PCA datasheet to get fairly accurate ancestry estimates with nMonte. I'll upload a new version of the sheet with more African populations tomorrow.

@Samuel & rk

I'm working on the D-stats sheets now.

Davidski said...

"I think we should use stats of the form D(row, column)(X, Mbuti) for measuring admixture."

Huh, what? Don't you mean D(Chimp, row)(Mbuti, column)?
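
For anyone following the notation, a statistic of the form D(W, X)(Y, Z) compares allele frequency differences between two pairs of populations, summed over SNPs. Below is a minimal sketch using the standard frequency-based estimator. The function name `d_stat` and the toy frequencies are assumptions for illustration, and real analyses (for example with ADMIXTOOLS) also estimate a standard error by block jackknife, which is omitted here.

```python
import numpy as np

def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (arrays of equal length).

    Numerator:   sum over SNPs of (w - x) * (y - z)
    Denominator: sum over SNPs of (w + x - 2wx) * (y + z - 2yz)
    """
    w, x, y, z = (np.asarray(a, dtype=float) for a in (w, x, y, z))
    num = (w - x) * (y - z)
    den = (w + x - 2 * w * x) * (y + z - 2 * y * z)
    return num.sum() / den.sum()

# Toy frequencies for four hypothetical populations at three SNPs.
chimp = [0.0, 0.0, 0.0]
row = [0.2, 0.9, 0.5]
mbuti = [0.1, 0.3, 0.0]
column = [0.3, 0.8, 0.6]
print(d_stat(chimp, row, mbuti, column))
```

Swapping the last two populations flips the sign of the result, so the order of the arguments matters when comparing stats between different sheets.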

Davidski said...

It's not possible with the plot above. It has to be a much more simplified version. But it won't tell you what you want to know.

https://drive.google.com/file/d/0B9o3EYTdM8lQclUwMW1VdFI5WHc/view?usp=sharing

Anonymous said...

Like Samaritan DNA, I especially appreciate the PCAtest2 plot with Samaritans and an Egyptian Copt. I edited the image to include the legend: http://j2-m172.info/wp-content/uploads/sites/3/2016/06/PCAtest2_Eurogenes_2016-06-23_detail-Levant.png
I was not able to find out which one of the three Jordan_EBA samples is nearest to 3DRIF-26 (Roman_Brits); it would be nice to know if it is I1730. I also looked at the Interactive 3-D PCA Plot by open-genomes.org, and there it seems the nearest is I1706, but I'm not sure.
Out of interest: what are the outliers (isles) of the S_Europe samples on the PCA? I would guess those with Hungary_CA among them are Sardinians, but what are the others to the left of most other samples? I'm not sure about my assumption: is Italy_CA Ötzi?

Samuel Andrews said...

@Davidski,

D(Chimp, row)(Mbuti, column) is what Lazaridis used to display differences between ancient Middle Easterners. It did a very good job.

D(Chimp, row)(Mbuti, column) doesn't do a good job at displaying Middle Eastern diversity. Cypriot is as close to CHG as EEF is. That doesn't make sense. Middle Easterners aren't close to each other using that method, and there are hardly any differences between stats among different Middle Easterners.

I think you should try that method soon. First make whatever D-stat spreadsheet you're planning on making then I'll give you my ideas for the other method.

André de Vasconcelos said...

@ J2-M172 Y-Hg Research

Yes, the bottom outliers in S_Europe (isle) are indeed Sardinians. The leftmost cluster is the Basques. The group just to the right of the Basques are other Iberians, and on top of those you have South Slavs.

Samuel Andrews said...

Can someone post a link to the global PCA spreadsheet for nMonte that includes the new genomes?

David, what's with Iberia_BA in the PCA on this post? I thought he was of very low coverage. His position makes sense nonetheless.

Samuel Andrews said...

@Ryu,
"but how do you propose to tackle the problem when it becomes Chimp Column Mbuti Column"

I don't know what u mean. BTW I am not ignoring the old method I'm suggesting we do both.

Alberto said...

@ J2-M172 Y-Hg Research

was not able to find out which one of the three Jordan_EBA is nearest to 3DRIF-26 (Roman_Brits), would be nice to know if it is I1730

The one that's closest overall (across all dimensions) seems to be I1730:

England_Roman:3DRIF-16
"Jordan_EBA:I1730" 100
"Jordan_EBA:I1705" 0
"Jordan_EBA:I1706" 0
distance=0.086801

But the one closer in the first 2 dimensions is probably I1706. In any case, notice that the overall distance is pretty high, so in this case the PCA is quite misleading. When all dimensions are taken into account these individuals are not that close, it seems:

England_Roman:3DRIF-16
"Yoruba:HGDP00920" 42.85
"Jordan_EBA:I1705" 28.8
"Jordan_EBA:I1730" 28.35
"Jordan_EBA:I1706" 0
distance=0.07296

England_Roman:3DRIF-16
"Palestinian:HGDP00675" 69.4
"Yoruba:HGDP00920" 30.6
"Jordan_EBA:I1705" 0
"Jordan_EBA:I1706" 0
"Jordan_EBA:I1730" 0
distance=0.067632

All are quite bad models, though. Not sure which populations would really make a good model for this 3Drif Brit.
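
The point about the first two dimensions being misleading can be illustrated with a small sketch. It ranks reference samples by Euclidean distance to a target, either over all PCA dimensions or over only the first few. The function name `rank_nearest` and the coordinates are hypothetical placeholders, not values from the datasheet.

```python
import math

def rank_nearest(target, refs, dims=None):
    """Rank reference samples by Euclidean distance to the target.

    dims=None uses every dimension; dims=2 mimics reading distances
    off a two-dimensional PCA plot.
    """
    t = target[:dims]
    ranked = [(name, math.dist(t, coords[:dims])) for name, coords in refs.items()]
    return sorted(ranked, key=lambda pair: pair[1])

# Hypothetical coordinates (PC1, PC2, PC3), for illustration only.
target = [0.00, 0.00, 0.00]
refs = {
    "sample_X": [0.01, 0.00, 0.09],  # close on the 2-D plot, far on PC3
    "sample_Y": [0.03, 0.00, 0.00],  # slightly further on the plot, close overall
}

print(rank_nearest(target, refs, dims=2)[0][0])  # nearest on the 2-D plot
print(rank_nearest(target, refs)[0][0])          # nearest across all dimensions
```

With these made-up numbers the nearest sample changes once the third dimension is included, which is the same effect described for I1706 versus I1730.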

@Samuel

The link for downloading the PCA data is in the post, under the PCA.

huijbregts said...

@Alberto, @ J2-M172 Y-Hg Research

distance to "England_Roman_outlier:3DRIF-26"
"Jordan_EBA:I1730",0.00851704173994703
"BedouinA:HGDP00615",0.00929623579735368
"Palestinian:HGDP00675",0.00981172767661231
"Palestinian:HGDP00677",0.0100628027904754
"Yemenite_Jew:YemeniteJew4675",0.0103150375665821
"Jordan_EBA:I1705",0.0103464969917359
"BedouinA:HGDP00614",0.0109068785635488
"Jordan_EBA:I1706",0.0113877126763894

Labayu said...

I noticed something interesting that might be worth looking at further. There are two Canary Islanders on the PCA in this new Lazaridis et al paper who cluster with the Sardinians pretty much. They’re even within the range of the outliers from the Europe_EN samples. Canary Islanders spoke what is believed to have been a Berber language and have been fairly isolated for over 2000 years. They should have a lot of recent Spanish admixture, but is that really where we should expect them to be if they’re intermediate between Spanish and say Moroccans without sub-Saharan admixture?

Matt said...

Davidski: https://drive.google.com/file/d/0B9o3EYTdM8lQOWhrcnBrQS1DZ00/view?usp=sharing
vs
https://drive.google.com/file/d/0B9o3EYTdM8lQWVp0R0JfUW1QeWM/view?usp=sharing


Thanks Davidski. The PCA with the same populations as in Lazaridis matches all the features of that from the publication, including the "bowing" of the Anatolia Neolithic towards the recent ME (slightly off the line between Natufian and WHG) that doesn't seem as present on the PCA with South Central Asian populations. Also the whole Euro_HG cline (WHG->SHG->EHG->ANE) shifted West, and different relative distances of Iran_N to recent Caucasus. That strong relatedness of Iran_N to SCA really pulls it in that direction, and a similar thing seems to happen with the ANE sample.

(Putting them both side by side and matching the orientation and scale as close as I can to see the differences - http://i.imgur.com/xhR6yMP.png / http://i.imgur.com/LAuoUjZ.png (with rotation).)

Anonymous said...

Thank you @André de Vasconcelos, Alberto, huijbregts for helping with my questions. The time gap between 3DRIF-26 and Jordan_EBA would not allow that they cluster totally close but at least some genetic continuity regarding the admixture and Y-Haplogroup seems to be there. Jordan_IA and other Levant_IA (and from other adjacent locations) samples would give the observations more confidence.

Atriðr said...

@Seinundzeit
Basically, "ANI" is definitely Eastern Europe/Steppe + Iran_N.
Yes. This right here is going to fill in all the blanks. Before the year's out, the I-E question (language) will be solved in the comments section of the internet. Probably on this blog.

Atriðr said...

@Davidski
Well done! This chart is fantastic. It's looking at history.

Davidski said...

I've updated the global PCA datasheet with more Africans, including the new sequence of Mota.

This is the link...

https://drive.google.com/file/d/0B9o3EYTdM8lQaU1aZmEwWktXNGs/view?usp=sharing

Samaritan DNA said...

David, thank you for adding the Samaritans to the plot. Would these be helpful for your Global9 datasheet as well?

@nee4speed111

The calculators that include Copts do show that 3DRIF-26 may be a mix of an Egyptian and a "Levantine" rather than something between a Samaritan and a Yemenite Jew. However, out of the three Behar Samaritans, only one clusters with the known non-admixed Samaritans:
Closeup of Eurogenes Global9 PCA showing 6 Samaritans and an Egyptian Copt

The cluster in the center includes 149532, 168723, and 149533 along with Behar's GSM537032. These other Samaritans represent three of the four surviving male Samaritan lineages. The other two Behar Samaritans, GSM537033 and GSM537034, appear to be admixed, and are likely the descendants of two marriages that took place with Ashkenazi Jewish women in 1924 and 1926. The calculators are based on these Behar Samaritans, but to have accurate results, the calculators should be based on the four in the main cluster.


The significance of this is that Roman York gladiator 3DRIF-26 now clusters very closely with the two Early Bronze Age samples from Jordan. In fact, I1730 is now determined to be Y-DNA J2b1-M205, just like 3DRIF-26. J2b1-M205 does not exist among the Samaritan population today and this individual was likely not a Samaritan, but he does represent continuity between the Levantine Neolithic (I1701 just below) and Levantines who are unlikely to have recent Sub-Saharan African admixture like the Samaritans and Lebanese Christians. Notice that these Bronze Age Jordanians do not cluster with present-day Jordanians.

It is possible that 3DRIF-26 is a mix between a Levantine and a Copt, but given the nearly identical position of the 3DRIF-26 and the Bronze Age Jordanians, and the fact that one of them shares a Y haplogroup with him, a Y haplogroup that has a tMRCA only 1500 years before the Bronze Age, it would seem that 3DRIF-26 is a near-unadmixed representative of this Middle-Late Bronze Age Levantine population.

Perhaps someone can check for IBD between the 3DRIF-26 and the Bronze Age Jordanians, the Samaritans, the close Bedouin A, and the Copt?

That should settle the question of the origins of 3DRIF-26, as well as whether the Bronze Age Jordanian population has descendants in the region today.

Seinundzeit said...

RK,

Looking at that graph, and taking into consideration your points concerning Levantines and the Paniya, I think you're absolutely right.

ASI, for the most part, will turn out to be something which preceded the differentiation of West Eurasia and ENA (although probably with some ENA/Onge-related admixture).

If I'm not mistaken, Australians/Papuans can be modeled as a mixture between Onge-like ENA, and an unidentified population that was very rich in Denisovan admixture.

Perhaps ASI will turn out to be related to the non-ENA portion of Australasian ancestry, but without the heavy Denisovan admixture? A lot of possibilities, I suppose.

Chad said...

I think there may be a compression issue here. Anatolians should probably come from a point between Natufians and Iranians, going towards WHG.

For instance... Going from Natufians to Anatolians, WHG is no better an admixing pop than Iranians or CHG.

result: Natufian Anatolia_N WHG Iran_N -0.0001 -0.023 22890 22894 444721
result: Natufian Anatolia_N WHG CHG 0.0079 2.058 25716 25310 503017

Also, Natufians certainly aren't shifted towards UP Europeans. That could be another compression issue.

result: Mbuti.DG Natufian WHG Kostenki14 -0.0388 -7.635 24061 26003 472189
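
For readers unfamiliar with these output lines, here is a minimal sketch of how the D value relates to the two site-pattern counts. It assumes the columns are the four populations, D, Z, BABA count, ABBA count and SNP count (the usual qpDstat layout), and that D = (BABA - ABBA) / (BABA + ABBA). The real program averages per-SNP expectations, so the recomputed value only approximately matches the printed one. The helper names are made up for this example.

```python
def parse_result(line):
    """Split a 'result:' line into its fields (column order is an assumption)."""
    parts = line.split()
    return {
        "pops": parts[1:5],
        "D": float(parts[5]),
        "Z": float(parts[6]),
        "BABA": int(parts[7]),
        "ABBA": int(parts[8]),
        "nSNPs": int(parts[9]),
    }

def d_from_counts(baba, abba):
    """Normalised difference between the two discordant site patterns."""
    return (baba - abba) / (baba + abba)

LINES = [
    "result: Natufian Anatolia_N WHG Iran_N -0.0001 -0.023 22890 22894 444721",
    "result: Natufian Anatolia_N WHG CHG 0.0079 2.058 25716 25310 503017",
    "result: Mbuti.DG Natufian WHG Kostenki14 -0.0388 -7.635 24061 26003 472189",
]

for line in LINES:
    r = parse_result(line)
    recomputed = d_from_counts(r["BABA"], r["ABBA"])
    # |Z| >= 3 is the conventional significance threshold for these stats.
    flag = "significant" if abs(r["Z"]) >= 3 else "not significant"
    print(r["pops"], round(recomputed, 4), r["Z"], flag)
```

Read this way, only the third statistic (Z = -7.635) is clearly significant; the first two are consistent with WHG being no better an admixing source than Iran_N or CHG.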

Rob said...

I'm not sure if anyone has pointed this out before, but at a global scale, the apparent distance (or perhaps lack of obvious African affinities) of Natufians, together with what IMHO suggests that "Basal" was concentrated around the Gulf, might suggest a much deeper split of AMHs ex-Africa. Perhaps the 100 kya dispersal only failed toward the north Levant, but survived around the Gulf?

Has anyone tried to use these late UP genomes from the Levant to re-date AMH dispersal formally ?

Unknown said...

You may be right, Samaritan DNA, it may simply be that all of these groups share a broad similarity in ancestry. The main difference between 3DRIF-26 and the Bronze Age Levantine samples is that, based on the study, 3DRIF-26 has some African ancestry that the Bronze Age Jordanians don't, which is why I thought he might have been a Copt.

The Eurogenes K15 results of 3DRIF-26:

ID 3DRIF-26
North_Sea 0.02
Atlantic 4.06
Baltic 0
Eastern_Euro 0
West_Med 11.24
West_Asian 10.99
East_Med 46.16
Red_Sea 20.98
South_Asian 0
Southeast_Asian 0
Siberian 0
Amerindian 0
Oceanian 0
Northeast_African 6.54
Sub-Saharan 0.02

The average Eurogenes K15 results based on 14 Copt samples (some had higher and some had lower Northeast African admixture):


North_Sea 0.17
Atlantic 2.07
Baltic 0
Eastern_Euro 0
West_Med 11.42
West_Asian 6.67
East_Med 46.87
Red_Sea 20.38
South_Asian 0
Southeast_Asian 0.24
Siberian 0
Amerindian 0
Oceanian 0
Northeast_African 11.79
Sub-Saharan 0

The only real difference between 3DRIF-26 and the Copts is that we have roughly double the Northeast African admixture he does; other than that we are basically identical.

Here is the Samaritan result for Eurogenes K15 provided in the spreadsheet:


North_Sea 1.11
Atlantic 3.28
Baltic 1.59
Eastern_Euro 0
West_Med 13.52
West_Asian 17.20
East_Med 45.51
Red_Sea 13.84
South_Asian 0
Southeast_Asian 0
Siberian 0
Amerindian 0
Oceanian 0
Northeast_African 2.32
Sub-Saharan 0.03

Now this could be based on an admixed Samaritan, as you said, but from the looks of it he seems to be much more similar to a Copt than a Samaritan. I'm also sharing with 2 Copts on 23andMe who have his same Y-DNA, J2b1, so I don't know.
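
The eyeball comparison of the three K15 profiles above can be put into numbers with a rough sketch. It uses plain Euclidean distance between the percentage vectors, which is a crude similarity measure chosen for illustration and not a formal test of ancestry. The function names are made up for this example; the values are the ones quoted in this comment.

```python
import math

COMPONENTS = [
    "North_Sea", "Atlantic", "Baltic", "Eastern_Euro", "West_Med",
    "West_Asian", "East_Med", "Red_Sea", "South_Asian", "Southeast_Asian",
    "Siberian", "Amerindian", "Oceanian", "Northeast_African", "Sub-Saharan",
]

PROFILES = {
    "3DRIF-26":  [0.02, 4.06, 0, 0, 11.24, 10.99, 46.16, 20.98, 0, 0, 0, 0, 0, 6.54, 0.02],
    "Copt_avg":  [0.17, 2.07, 0, 0, 11.42, 6.67, 46.87, 20.38, 0, 0.24, 0, 0, 0, 11.79, 0],
    "Samaritan": [1.11, 3.28, 1.59, 0, 13.52, 17.20, 45.51, 13.84, 0, 0, 0, 0, 0, 2.32, 0.03],
}

def profile_distance(a, b):
    """Euclidean distance between two admixture profiles (in percentage points)."""
    return math.dist(a, b)

def biggest_gaps(a, b, top=3):
    """Components with the largest absolute difference, as (name, a - b)."""
    gaps = [(name, x - y) for name, x, y in zip(COMPONENTS, a, b)]
    return sorted(gaps, key=lambda g: abs(g[1]), reverse=True)[:top]

target = PROFILES["3DRIF-26"]
for name in ("Copt_avg", "Samaritan"):
    print(name, round(profile_distance(target, PROFILES[name]), 2),
          biggest_gaps(target, PROFILES[name]))
```

On these numbers the Copt average is closer to 3DRIF-26 than the Samaritan is, and the largest single gap to the Copts is the Northeast_African component, in line with the comment above.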

Davidski said...

@Chad

Keep in mind that the first two dimensions of a PCA won't show all of the intricacies that can be picked up with individual stats.

Also, Natufians certainly aren't shifted towards UP Europeans. That could be another compression issue.

I'm not sure what you mean here? Natufians are the least UP Euro shifted population apart from Neolithic Iranians and some modern Near Easterners with considerable Sub-Saharan admixture.

Chad said...

The UP argument was one in Anthrogenica, in a convo from posters here. Not on your blog. Sorry for the confusion.

Gill said...

Thanks!

I'm getting the feeling ASI, if we ever sequence it, is going to be its own unique hunter-gatherer type thing, genetically drifted to the point of looking not much like the non-ENA part of Southeast Asians, albeit right where we expect it on the Eurasian family tree (like a cousin to the non-ENA part of Australasians).

That said, Yamnaya+Iran_N is basically Tajik/Pathan on the PCA so Brahmin_UP being half that and half Paniya makes sense. Paniya would look like half Iran_N and half something to the top left of Indians on the PCA (a little more in the direction of MA1 too).

But historically, and from the formal stats, it seems Indians would be a mix of a population to the top right of Indians on the PCA (where the Xs seem to lead, like an arrow) which itself could be modeled as half Iran_N and half something way up to the top (as far from MA1 as MA1 is from EHG and EHG is from SHG and SHG is from WHG, almost like a neat spread of HGs). That would require a three way combination, the other two being Steppe and the last a combination of Iran_N/Caucasus/Near_East/Mediterranean (there's a blank spot between Caucasus and Volga-Ural and Tajiks seem to be leading to it... Volga-Ural itself looks like Steppe slightly shifted towards Caucasus, so maybe these last two are one population occupying that blank spot).

Gill said...

I mean, this is the pattern that first jumps out at me when I look at it:

http://i.imgur.com/kC55gXF.png

Gill said...

And accordingly, Steppe just looks like Northern Europe shifted a little bit in the direction of the hypothetical South Eurasian (should have written South Eurasian rather than South Asian) HGs or Central Asian HGs (which would be a mix of South Eurasian HGs and ANE (North Eurasian HGs) and be slightly to the right of ANE on the plot, which is where the line seems to be leading from Corded Ware... slightly to the right of ANE).

Open Genomes said...

Here is an updated version of the 3-D PCA plot, including the complete set of Africans:

Eurogenes Lazaridis (2016) 3-Dimensional PC plot (updated to include North and East Africans, Mota, and the Yoruba)

I think we can see that the Natufians share drift with North Africans, leading toward East Africans. Two Natufians, I0861 and I1690, lie precisely between the Africans and the Anatolian Neolithic. Others appear to be "above the plane" of modern human variation, for some reason. However, another Natufian, 10861, appears to be halfway between the Africans and CHG.

It seems that the Natufians were mixes between a "North African-like" (proto-Afro-Asiatic?) population and two separate Near Eastern hunter-gatherer populations, one "Anatolian Neolithic-like" and the other, CHG.

We know about CHG (and the Iranian Hotu Cave J2a* hunter-gatherer).
We haven't seen any sign yet of a "proto-Anatolian Neolithic" Mesolithic hunter-gatherer population. This ancestral hunter-gatherer population was very different from any of the Mesolithic European or northern Near Eastern hunter-gatherers, equally distant from both WHG-SHG-EHG and CHG-IHG, and more distant than any of these from Kostenki K14, Ust'-Ishim, ANE, and East Asia.

I think it's accurate to describe this "proto-Anatolian Neolithic" population as "Basal Eurasian" because it's symmetrically related to everyone else in Eurasia - it's closer to no one.

From the PCA, it doesn't seem possible that the Anatolian Neolithic is a three-way mix of WHG, CHG, and Natufian.
One Levantine PPNB sample from 'Ain Ghazal, from a few hundred years before Barcin in Northwest Anatolia, looks like it has a higher percentage of this "ghost population" than the other PPNB Levantines.
It doesn't seem possible that this kind of admixture came from Northwest or even Central Anatolia to the Levant. It seems likely that it originated among hunter-gatherers along the Middle Euphrates, a region with no autosomal aDNA sequences. (mtDNA sequences are available from Tell Halula and presumably these samples are going to be sequenced soon.)

Here is the map of the radiocarbon-dated sites in the Levant and Anatolia during the LGM:
Radiocarbon dated sites in the Near East during the LGM, 21,500-16,000 calBCE (23,500-18,000 ybp)

This is the group that the Ramonian/Mushabians must have encountered when they left Africa and arrived in the Near East c. 18,000 ybp.
Perhaps these people moved north when it became warmer and that's why they were mostly replaced by the recently arrived African-shifted Natufians.

Is there any other explanation for this "third pole of Eurasian diversity" aside from an undiscovered isolated and highly-drifted Upper Paleolithic hunter-gatherer population?

Unknown said...

@nee4speed111 Perhaps the Roman is a southern levantine man and Copts share significant ancestry with this population? Otherwise what could be the reason the Bronze Age Jordanians cluster with Egyptians? I would love to see the K15 of these samples from Ain Ghazal. Also wish the Roman made its way to gedmatch.

Matt said...

@ Gill, the direction of the clines you've established makes sense, but I would say for ASI to literally be at that cline intersection you've placed it at, it would then imply a huge % of ASI in the Indian SCA populations proximate to that cline intersection. Something like 80% ASI. If the "Indians" (grey X) in the plot are North Indians / Brahmins and such that might be unlikely? Davidski would have to tell us who the samples are for us to sense check that.

Since these samples are fairly close to Sindhi and Pathan, it seems unlikely that they are close to the end point of the ANI-ASI cline, since that wouldn't leave much space for extant populations with much more ASI. And even the most ASI populations like Chenchu and Mala were only estimated as 59% and 62% ASI by Reich / Moorjani. (As much as this is old stuff).

If the assumptions about the level of ASI in the Indians in the sample change, the implied ASI changes: http://i.imgur.com/AvRFQdL.png for around 60% ASI in the Indians (puts ASI twice as much further on than in your South Asian cline), and then the cline intersection of Iran_N and ASI used to impute a theoretical South Asian HG would change too - http://i.imgur.com/mFwBv4G.png
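
The dependence of the implied ASI position on the assumed ASI fraction can be written out explicitly. This sketch assumes admixed samples sit on a straight line between the two endpoints in PCA space, so that sample = (1 - f) * ANI + f * ASI, which rearranges to ASI = ANI + (sample - ANI) / f. The coordinates and the function name are hypothetical and only there to show the arithmetic.

```python
def extrapolate_endpoint(known_end, sample, fraction):
    """Infer the unsampled end of a cline.

    known_end: coordinates of the sampled endpoint (e.g. an ANI proxy)
    sample:    coordinates of an admixed sample on the cline
    fraction:  assumed share of ancestry from the unsampled endpoint
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return tuple(k + (s - k) / fraction for k, s in zip(known_end, sample))

# Hypothetical PCA coordinates, for illustration only.
ani_proxy = (0.0, 0.0)
indian_sample = (0.4, 0.2)

print(extrapolate_endpoint(ani_proxy, indian_sample, 0.8))  # sample assumed 80% ASI
print(extrapolate_endpoint(ani_proxy, indian_sample, 0.6))  # sample assumed 60% ASI
```

The lower the assumed ASI share in the sample, the further along the cline the inferred ASI endpoint lands, which is why the annotation shifts when the assumption changes.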

Davidski said...

@rk

Its not a problem because we have so many natufians and anatolia_Neolithics and Iran_Ns and so on that we can always split the columns groups and have a population at both the rows and columns, so no big deal.

The only ancient sample set from Laz 2016 that I can put in both the rows and columns is Iran Chalcolithic. The others don't have enough data, so if I try and split them, they'll end up with D-stats based on less than 200K SNPs, which might result in wobbly outcomes. No point in doing that.

@Matt

The SC_Asia samples furthest up the plot have about 25% ASI, if that. They're from northern India.

The Siberia UP sample is AfontovaGora3.

Gill seems to think that the Neolithic Iranians are significantly ASI. But this is not so.

Grey said...

Alexandros

"which would mean that genetically Cypriots did not change substantially over the past 5500 years. That would be totally unexpected and bizarre!"

I wonder if Islands with a mountainous interior might behave differently to elsewhere - especially if they had a malarial coast.

1st wave farmers
2nd wave farmers - 1st wave hide out in the mountains
malaria
3rd wave farmers - 1st wave still in the mountains
malaria
etc

Alberto said...

@Matt

Yes, the perfect theoretical ANI would be almost exactly there. I would just make the line a tiny bit shorter at the ANI end so that ANI is a perfect match for the other half of Yamnaya (close to Lezgins and Tajiks). And basically an ANE/EHG shifted version of Armenia_ChL. Whether such a population existed in Central Asia is a mystery, but the existence of Armenia_ChL not far in space and time makes it theoretically possible.

Another clear thing from that graph is that Balochi/Brahui could not possibly represent the IVC people. Steppe admixture would only work for Tajiks in that case. North India and Pakistan would need some variable mix of EHG/ANE and Paniya to get to their current positions.

Instead of using Brahui/Balochi as a base for modelling the Kalash, Pathan, GujaratiA, etc..., one can use GujaratiD instead, and that way Sintashta/Andronovo works at the other end. But for whatever reason the models are not good. It still prefers, for example, Armenia_ChL + a bit of EHG/AG3. So something closer to that theoretical ANI position.

Unknown said...

While reading the comments above, I especially noticed:
Anatolia_Chalcolithic:I1584
"Anatolia_Neolithic:I0707" 55.55
"Satsurblia_Kotias:KK1" 44.45
distance=0.002333
in a comment by Alberto from 24 June 4.28 AM.
Is it evidence of a migration into Anatolia from the east between the Neolithic and Chalcolithic? Is there further information on this subject?

Alberto said...

To show with models what I mean. If one uses Brahui as a base for modelling the Kalash, this is what you get:

Kalash:HGDP00267
"Brahui:HGDP00011" 72.55
"EHG_Karelia_HG:I0061" 10.1
"Paniya:PNYD1" 9.85
"Mal'ta_AfontovaGora3:I9050.damage" 7.5
"Sintashta:RISE395" 0
distance=0.004761

So basically it requires a mix of Paniya and ANE, and the distance is quite acceptable. But if instead of Brahui you use GujaratiD:

Kalash:HGDP00267
"GujaratiD:NA20847" 65.1
"Sintashta:RISE395" 34.9
"EHG_Karelia_HG:I0061" 0
"Paniya:PNYD1" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
distance=0.013079

Now it needs to go in the Sintashta direction, as expected. But the model is far from good. If I add Armenia_ChL to those source pops:

Kalash:HGDP00267
"GujaratiD:NA20847" 50.9
"Armenia_Chalcolithic:I1407" 35.8
"EHG_Karelia_HG:I0061" 10.6
"Mal'ta_AfontovaGora3:I9050.damage" 2.7
"Sintashta:RISE395" 0
"Paniya:PNYD1" 0
distance=0.005766

So it prefers a mix of Armenia_ChL + EHG/ANE, and the distance becomes acceptable again.

But this is all theoretical, especially using the PCA based data. Let's see how this goes with D-stats which I think are more reliable for these models.
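
For anyone wondering what kind of fit sits behind these percentage lists, here is a minimal random-search sketch. It is not the nMonte script itself, only an illustration of the same idea: fill a fixed number of slots with source samples, and keep swapping single slots as long as the average of the slots moves closer (in Euclidean distance) to the target. The function name, the slot count and the coordinates are all assumptions made for this example.

```python
import math
import random

def fit_mixture(target, sources, n_slots=100, n_iter=20000, seed=0):
    """Estimate mixture percentages by swapping one slot at a time."""
    rng = random.Random(seed)
    names = list(sources)
    dims = len(target)
    picks = [rng.choice(names) for _ in range(n_slots)]
    total = [sum(sources[p][d] for p in picks) for d in range(dims)]

    def dist(tot):
        # Distance between the slot average and the target.
        return math.sqrt(sum((tot[d] / n_slots - target[d]) ** 2 for d in range(dims)))

    best = dist(total)
    for _ in range(n_iter):
        i = rng.randrange(n_slots)
        old, new = picks[i], rng.choice(names)
        if new == old:
            continue
        cand = [total[d] - sources[old][d] + sources[new][d] for d in range(dims)]
        d_new = dist(cand)
        if d_new < best:  # keep the swap only if the fit improves
            picks[i], total, best = new, cand, d_new
    weights = {n: 100.0 * picks.count(n) / n_slots for n in names}
    return weights, best

# Hypothetical 2-D coordinates: the target is an exact 50/50 mix of A and B.
sources = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
weights, distance = fit_mixture([0.5, 0.0], sources)
print(weights, distance)
```

A source that does not help the fit ends up at 0, just like the zero rows in the models above, and the final distance indicates how well the best mixture reproduces the target.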

Alberto said...

@Martin Clifford Styan

Is it evidence of a migration into Anatolia from the east between the Neolithic and Chalcolithic?

I think this sample clearly shows a migration of CHG-like people into Anatolia by the Chalcolithic. But it's only one sample for now, so we'd need more samples to confirm this with more certainty.

Grey said...

Open Genomes

"Is there any other explanation for this "third pole of Eurasian diversity" aside from an undiscovered isolated and highly-drifted Upper Paleolithic hunter-gatherer population?"

This may or may not be relevant to your question, but I've wondered for a while: if there was a Bantu expansion, then who was there before?

Matt said...

@ Davidski, thanks for the information. Not sure those North Indians will be 25% ASI exactly, but certainly I think far from anything like implied by the ASI position on Gill's annotation.

@ Alberto, yeah, if you build a simple cline through the Indian cline populations, the end point is always closer to Lezgin than Sintashta or Andronovo. True with simple D-stats as well: I found that when using the D-stats to build a PCA space for Indian populations (Onge, Kharia and Dai included*), Indians form a simple cline. If you then pick a point past the end point of the cline and transform from PCA back to D-stats, the result is closest to Lezgin, with some affinity to Andronovo and Sintashta, and somewhat less to CHG.

I think this is an indication that a pre-migration to India "ANI" population picked up quite a bit of admixture similar to present-day Iranians (maybe via female ancestors?) prior to migration to South Asia. It seems North Caucasus here are pretty much exactly between Sintashta and the Iran_Chalcolithic, and resemble an admixture of them.
(Although mtdna and ydna history may be quite different.)

So the Indo-Aryans (if we are comfortable identifying ANI with that) were both somewhat like Sintashta, somewhat like Chalcolithic Iranians, in equal measure, averaging out to mostly similar to Lezgins in deep ancestry.
Kind of looks like admixture in Iran, relative to Iran_Chalcolithic, is not that large, based on Iranian_Persian (Black X): 10-20% Sintashta, 80-90% Iran_Chalcolithic.

*Including Onge, Kharia, Dai, allows you to intersect the ANI-ASI cline to the Dai-Kharia-ASI cline to find an end point ASI model. Btw, such an ASI model doesn't seem to be particularly ANE like overall, but does have clear asymmetries towards EHG over SHG and WHG, and to CHG over Anatolia_N...
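
A rough two-way estimate like the 10-20% / 80-90% split above can be read off a plot by projecting the target onto the line between the two sources. The sketch below does that projection; the coordinates and the function name are hypothetical, and the result is only meaningful to the extent that the target really lies near that line.

```python
def two_way_fraction(target, src_a, src_b):
    """Share of src_a in a two-way mix, by projecting target onto the a-b line."""
    ab = [a - b for a, b in zip(src_a, src_b)]
    tb = [t - b for t, b in zip(target, src_b)]
    denom = sum(x * x for x in ab)
    if denom == 0:
        raise ValueError("sources must have different coordinates")
    alpha = sum(x * y for x, y in zip(tb, ab)) / denom
    return min(1.0, max(0.0, alpha))  # clamp to a valid proportion

# Hypothetical PCA coordinates, for illustration only.
steppe_like = (1.0, 0.0)
iran_like = (0.0, 0.0)
target = (0.15, 0.02)

share = two_way_fraction(target, steppe_like, iran_like)
print(f"{share:.0%} from the first source, {1 - share:.0%} from the second")
```

The small offset of the target from the line (0.02 on the second axis here) is ignored by the projection, so a large offset would be a sign that a two-way model is too simple.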

Alberto said...

@Matt

Thanks. So D-stats also place ANI in that area. Nice. We'll see now with the new datasheets with these samples how it goes.

That ANI-ASI cline is really interesting. Assuming it stands all the way to South India. Because it has several consequences. The one I pointed out above: that Balochi/Brahui can't be IVC relics, because it would make it impossible to explain modern variation. But extending the reasoning, it almost makes it necessary that ANI arrived as one population, rather than being a construct of many different migrations, which would break the cline (as -probably- "recent" input from Iran to Balochi and a bit to Sindhi breaks it, or different migrations of Iran_Neolithic would break it too).

Or maybe with more populations we'd see more variation and the cline is only a mirage of having those few ones? From GujaratiD to Tajiks it seems to be valid. Not sure what would happen south of GujaratiD, though.

BTW, I tried adding Iran Chalcolithic to the mix, but it still prefers Armenia_ChL + EHG/AG3 to Iran_ChL + Sintashta.

Kalash:HGDP00267
"GujaratiD:NA20847" 50.9
"Armenia_Chalcolithic:I1407" 35.8
"EHG_Karelia_HG:I0061" 10.6
"Mal'ta_AfontovaGora3:I9050.damage" 2.7
"Sintashta:RISE395" 0
"Paniya:PNYD1" 0
"Iran_Chalcolithic:I1661" 0

Matt said...

@ Alberto, seems valid all the way to Dravidian India. Balochi, Brahui, Makrani all break it, and form a cluster in D-stats, also the Sindhi, and then Burusho to a lesser degree (Burusho as they're North Asian admixed).

The Makrani look interesting because they always showed this pattern of having much less shared drift with South Asian populations and Steppe, and I had assumed this was due to recent Middle East ancestry, or even African ancestry, as they do have some in ADMIXTURE. Yet they seem to top out the IBS statistics that Sein provided for Iran_N, along with Balochi and Brahui. These populations really do now seem plausible as, to some degree, ASI+Iran_N / Iran_Cha relics (as I think Sein may have proposed before), which would be quite cool.

PCA examples - http://i.imgur.com/jxptaGA.png

(Sintashta is not that bad as "ANI" though! - http://i.imgur.com/sYxfhhH.png / http://i.imgur.com/yi08n5I.png, only seems slightly more off cline in higher dimensions than a Lezgin like population, and much less input from this population would be required)

Neighbour joining with the D-stats - http://i.imgur.com/kyx4h1w.png

Thanks for trying the Iran_ChL + Sintashta mix, not sure what the divergence is due to, I suspect possibly some excess Anatolia_Neolithic like signal, might be worth cross testing all the different LNBA steppe (Andronovo etc) to see if there are any which work, if you have time / interest for that.

Alberto said...

@Matt

Wow, wonderful work! Amazing what you can do with those stats. It makes everything so clear.

What is so cool about ANI is that due to its centrality it not only fits perfectly for S-C Asia, but also theoretically for Europe as the other half of Yamnaya.

Nirjhar always remarks that new people arrived in India around 3800 BC. So as Rob put it: "The world was imploding by 4000 BC!".

Pity that this is only theory and we don't have genomes from Central Asia from around 4000 BC. And then we have the case of R1a-Z93, which, if the latest estimate is correct (some 5400 yo), is too young to be south of the steppe at 4000 BC. For R1b-L23 things could look better, or even for R1a-M417, but R1a-Z93 complicates things :)

I tried adding all the Iran_Chalcolithic samples and all the Sintashta/Andronovo/Srubnaya ones, to see if any of them clicked:

Kalash:HGDP00267
"GujaratiD:NA20847" 40.6
"Iran_Chalcolithic:I1670" 29.55
"Srubnaya_outlier:I0354" 25.85
"Andronovo:RISE505" 4
"EHG_Karelia_HG:I0061" 0
"Paniya:PNYD1" 0
"Mal'ta_AfontovaGora3:I9050.damage" 0
"Armenia_Chalcolithic:I1407" 0
"Iran_Chalcolithic:I1661" 0
"Iran_Chalcolithic:I1662" 0
"Iran_Chalcolithic:I1665" 0
"Iran_Chalcolithic:I1674" 0
"Andronovo:RISE500" 0
"Andronovo:RISE503" 0
"Andronovo:RISE512" 0
"Srubnaya:I0232" 0
"Srubnaya:I0234" 0
"Srubnaya:I0235" 0
"Srubnaya:I0358" 0
"Srubnaya:I0359" 0
"Srubnaya:I0361" 0
"Srubnaya:I0422" 0
"Srubnaya:I0424" 0
"Srubnaya:I0430" 0
distance=0.00379

So yes, this is a pretty good model, but it's that Srubnaya_Outlier female that is a completely different thing from the R1a-Z93 guys we have.

https://drive.google.com/file/d/0B9o3EYTdM8lQc2dJWG5XSll0U1k/view

Removing the 2 Srubnaya outliers:

Kalash:HGDP00267
"GujaratiD:NA20847" 45.65
"Iran_Chalcolithic:I1670" 32.1
"EHG_Karelia_HG:I0061" 12.9
"Andronovo:RISE505" 6.9
"Mal'ta_AfontovaGora3:I9050.damage" 2.45
"Paniya:PNYD1" 0
"Armenia_Chalcolithic:I1407" 0
"Iran_Chalcolithic:I1661" 0
"Iran_Chalcolithic:I1662" 0
"Iran_Chalcolithic:I1665" 0
"Iran_Chalcolithic:I1674" 0
"Andronovo:RISE500" 0
"Andronovo:RISE503" 0
"Andronovo:RISE512" 0
"Srubnaya:I0232" 0
"Srubnaya:I0234" 0
"Srubnaya:I0235" 0
"Srubnaya:I0358" 0
"Srubnaya:I0359" 0
"Srubnaya:I0361" 0
"Srubnaya:I0422" 0
"Srubnaya:I0424" 0
"Srubnaya:I0430" 0
distance=0.004904

So that works, but with little left from Andronovo.

Alberto said...

Oops, I actually missed the Sintashta samples above. Here, though no change:

Kalash:HGDP00267
"GujaratiD:NA20847" 45.65
"Iran_Chalcolithic:I1670" 32.1
"EHG_Karelia_HG:I0061" 12.9
"Andronovo:RISE505" 6.9
"Mal'ta_AfontovaGora3:I9050.damage" 2.45
"Paniya:PNYD1" 0
"Armenia_Chalcolithic:I1407" 0
"Iran_Chalcolithic:I1661" 0
"Iran_Chalcolithic:I1662" 0
"Iran_Chalcolithic:I1665" 0
"Iran_Chalcolithic:I1674" 0
"Andronovo:RISE500" 0
"Andronovo:RISE503" 0
"Andronovo:RISE512" 0
"Srubnaya:I0232" 0
"Srubnaya:I0234" 0
"Srubnaya:I0235" 0
"Srubnaya:I0358" 0
"Srubnaya:I0359" 0
"Srubnaya:I0361" 0
"Srubnaya:I0422" 0
"Srubnaya:I0424" 0
"Srubnaya:I0430" 0
"Sintashta:RISE386" 0
"Sintashta:RISE392" 0
"Sintashta:RISE394" 0
"Sintashta:RISE395" 0

Alberto said...

Ok, here I'm removing all the other populations to force a GujaratiD + Iran_ChL + Z93-steppe model and see how good it can get:

Kalash:HGDP00267
"GujaratiD:NA20847" 48.65
"Iran_Chalcolithic:I1670" 23.2
"Srubnaya:I0361" 14.15
"Andronovo:RISE512" 14
"Iran_Chalcolithic:I1661" 0
"Iran_Chalcolithic:I1662" 0
"Iran_Chalcolithic:I1665" 0
"Iran_Chalcolithic:I1674" 0
"Andronovo:RISE500" 0
"Andronovo:RISE503" 0
"Andronovo:RISE505" 0
"Srubnaya:I0232" 0
"Srubnaya:I0234" 0
"Srubnaya:I0235" 0
"Srubnaya:I0358" 0
"Srubnaya:I0359" 0
"Srubnaya:I0422" 0
"Srubnaya:I0424" 0
"Srubnaya:I0430" 0
"Sintashta:RISE386" 0
"Sintashta:RISE392" 0
"Sintashta:RISE394" 0
"Sintashta:RISE395" 0
distance=0.00547

So that's quite an acceptable model.
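
Since most rows in these pasted outputs are zeros, a small parser makes the models easier to compare. This is a sketch that assumes the layout seen in this thread: a target line, then lines of a quoted sample name followed by a percentage, then a distance= line. The function names are made up for this example, and the block below is abridged from the model just above.

```python
import re

# A quoted sample name, then a number separated by spaces or a comma.
SOURCE_LINE = re.compile(r'^"([^"]+)"[\s,]+(-?\d+(?:\.\d+)?)$')

def parse_model(block):
    """Turn one pasted model block into target, weights and distance."""
    target, weights, distance = None, {}, None
    for raw in block.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SOURCE_LINE.match(line)
        if match:
            weights[match.group(1)] = float(match.group(2))
        elif line.startswith("distance="):
            distance = float(line.split("=", 1)[1])
        elif target is None:
            target = line
    return {"target": target, "weights": weights, "distance": distance}

def nonzero_sources(model):
    """Sources that actually contribute, largest share first."""
    used = [(n, w) for n, w in model["weights"].items() if w > 0]
    return sorted(used, key=lambda pair: pair[1], reverse=True)

EXAMPLE = """
Kalash:HGDP00267
"GujaratiD:NA20847" 48.65
"Iran_Chalcolithic:I1670" 23.2
"Srubnaya:I0361" 14.15
"Andronovo:RISE512" 14
"Iran_Chalcolithic:I1661" 0
"Sintashta:RISE395" 0
distance=0.00547
"""

model = parse_model(EXAMPLE)
print(model["target"], model["distance"])
for name, share in nonzero_sources(model):
    print(f"{share:6.2f}  {name}")
```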

Seinundzeit said...

Alberto,

On the topic of Iran_N in relation to South Asia, Srkz did some new IBS comparisons. As was the case with Kurd's IBS work, the closest modern populations to Iran_N are Brahui, Baloch, Makrani, Kurd, Iranian, Pakistani Pashtun, Kalash, Sindhi, etc.

For comparison, Kotias affinity peaks in the Caucasus, with Georgians, Abkhazians, Ossetes, Adygei, etc.

It's pretty obvious from all the basic non-formal methods that the Near Eastern component in Central/South Asia emanates from a population very closely related (if not perhaps identical) to Iran_N.

I'm noting all of this because, if I'm not mistaken, you disagree with the notion that Iran_N + ancient Eastern Europe/steppe is basically what ANI constitutes.

But that is the only evident conclusion, looking at the extremely strong relationship between Iran_N and South Asia, and taking into account that Pashtuns/Kalash/Pamiri can basically be modeled as an almost two-way mixture between Iran_N and ancient Eastern Europe/steppe on the PCA.

Even the Paniya come out overwhelmingly Iran_N with the global PCA datasheet, so there isn't much room for ASI in South Central Asians using PCA. Not sure if that's accurate, or again just a muddying of waters due to the relationship that holds between South Central Asia and Iran_N.

Probably the latter, because the steppe component is also underestimated using PCA. The d-stats n-Monte sheet has South Central Asians at 50%-60% BA steppe, in line with the pre-print. I'll share some models at the nMonte thread.

Side note: scheduled caste Gujaratis have more than their fair share of Z93, and are much more European-shifted compared to non-Brahmin South Indian Hindu people. So, I don't know why one would model South Central Asians as admixed with GujaratiD, since GujaratiD are products of the same admixture processes as Kalash, but with very different levels of the same broad genetic components. Your modelling would make one assume that GujaratiD are a relic population, which certainly isn't the case.

Alberto said...

@Seinundzeit

No, it seems I didn't express myself correctly.

On the topic of Brahui/Balochi as relic populations, it could well be the case. But relic from what? Early Neolithic South Asia? Maybe. My main point is not so much if they are relic or not (though I think they probably have "recent" Iranian admixture, since Balochi speak NW Iranian), but rather they can't be representative of the IVC people. This I think should be quite clear from the plot we've been talking about. If Brahui/Balochi were representative of the IVC, then we would need that after the IVC either one or two populations had a big impact on North India and Pakistan. If they were 2 populations one would be Paniya-like, which is perfectly possible. But the other one would have to be EHG/AG3-like. Which seems quite impossible. Those are Paleolithic/Mesolithic populations that didn't exist by 1500 BC, except maybe a small bunch in remote mountainous areas.

Alternatively, it could be one diverse population with varying levels of EHG and Paniya. But I think this scenario is also impossible.

So we can agree that the IVC people would have to be somewhere in the ANI-ASI cline, because otherwise it wouldn't be possible to explain modern variation in South Asia.

The use of GujaratiD was only convenient because it's in that cline. It's in no way intended to represent a relic population. Just a baseline (which is not intended to even be realistic). Since Paniya is also in that cline, it could be also used for the same purpose.

Regarding what's ANI, we can also agree at where it would be placed. Matt's graphs are quite clear about it. The only question is the genesis of ANI. One possibility shown in my last model would be Andronovo/Srubnaya + Iran Chalcolithic, which I think would be quite in agreement with your own expectations. So I don't see any problem there. As you said, Iran_N seems high in Paniya already (without the steppe part), so while it might be part of ANI, it needn't have arrived late. On the contrary, it probably is a very old part of the Indian genetic makeup.

The only thing I see about this theoretical model is that for the steppe hypothesis to work, the IVC people would almost necessarily have to be like Paniya (Onge+Iran_N, more or less), and the BMAC people like Iran Chalcolithic. That way things would match.

In my own pet theory, it's pre-IVC people who would have to be like Paniya, while pre-BMAC people would be similar to Lezgins. And these people would have moved to India in the early 4th millennium (like to the steppe).

Of course this last part of my pet theory I don't expect you to agree with it. And anyway like most pet theories it will probably be disproved when we get ancient DNA from the area. No big deal, for me, since I'm not personally attached to it, nor did I invest years of research into it. I just went for what seemed to make sense at the time, and now just waiting for it to be disproved (or not!).

But for the rest, I think we can only basically agree, because the genetic components and the clines are there and it's hard to argue against them.

Dilawer (Eurasian DNA) said...


Graphs for the srkz IBS comparisons are posted at AG http://www.anthrogenica.com/showthread.php?7489-Lazaridis-et-al-The-genetic-structure-of-the-world-s-first-farmers-%28pre-print%29&p=166233#post166233

They show that Kurds have the highest IBS similarity to Iran Neolithic, next to Brahui, corroborating what I have been seeing with IBS and ADMIXTURE. So far Iraqi Kurd sample C2, 23andMe V4 genotyped has shown the longest shared segment of 6.6 cM at 50 SNP/ 1 cM thresholds using only 264K overlapping SNPs. I am quite certain that if the genomes were 100% overlapping at 550K SNPs, relatively long segment sharing could be observed at higher thresholds.

Iran N is another piece of the ancient Kurd - Brahui - Balochi connection, something that I have analyzed and discussed in depth all along. This is consistent with the NW Iran area being the ancient homeland of the Brahui and Baloch. qpAdm runs have shown many Brahui/Baloch and some Kurds to be near clades.

Gill said...

@ Matt, your second image is what I was going for.

"ASI" should have been labeled "Neolithic South Asian" to be more specific and if you force Admixture to split South Asian components into ENA and non-ENA part, the non-ENA part gets taken up by Iran_N to the tune of 70+% total Iran_N in some South Asians. In typical admixture runs with a South Asian component (including all the old Eurogenes calculators), it hits 80% in Southeast India and over 40% in North Indians (under 40% mostly in Pakistan area).

The actual ASI as we popularly call it would be what I labeled as South Asian HG.

I think we might have to do away with "ASI" as a label. Because Ancestral South Indians are just ancestral Indians. The real originals would be South Asian Hunter Gatherers, and assuming they mixed with a cousin of Neolithic Iranian that, too, would be "Ancestral Indian" to modern South Asians.

I chose to model it that way because I think it's likely all South Asians today have ancestry from Neolithic Indians who had substantial ancestry related to a cousin of Neolithic Iranian (whose DNA we don't have yet, maybe Rakhigarhi... I imagine it's similar to Iran_N with some ASI/SouthAsianHG). Those Neolithic Indians are probably ancestral to Dravidians and the majority of today's South Indians' admixture.

Gill said...

Kurd: How do modern Kurds compare to Iranian Chalcolithic? In terms of being modeled as Iran_Chl or IBS/IBD comparisons

Because when you dissect the South Asian component in Admixture calculators by hand for multiple individuals, the West Eurasian part looks more like modern Kurdish than anything (not even Gedrosian/Balochi/Brahui), so I wonder if there's some connection between modern Kurds and Iranian Chalcolithic?

Gill said...

So, the two scenarios I see:

1. It's possible that "ANI" is so thoroughly disseminated in the subcontinent (and has been for so long) that part of it gets captured as South Asian in Admixture. This fits with multiple later invasions/migrations from the Steppe at the end of the Bronze Age specific to North India on top of a more basal ANI layer.

This also explains why the West Eurasian part of South Asian components looks more Mediterranean/Steppe leaning than just pure Gedrosian/Balochi (i.e, like modern Kurds) and why this ANI is in the South Asian component, but Gedrosian gets pulled apart into a separate component (strong Iranian signal in the Indus valley).

The only issue then is separating and dating the original ANI as opposed to the Steppe migration which left a clear WHG signal in Haryana. Either that or the Jats in Haryana have just mixed less and better preserved the same WHG signal that the Indo-Aryans originally had. That's still tough to reconcile because there's high WHG signals in Nepalese Brahmins but not South Indian Brahmins. If they're all descended from the same stock, the South Indian Brahmins presumably were as endogamous as the Northerners. The WHG pattern strongly lends itself to a small, specific additional migration, not the ANI tidal wave that changed the face of India.

But the minimal WHG pattern overlaps with the descendants of Indo-Aryan civilization (peaking in Haryana, then nearby Brahmins and Jatts). How could Indo-Aryan civilization not be started by the ANI tidal wave which brought Caucasus+Steppe-like admixture into every nook and crevice of the subcontinent?

Successfully modeling Paniya as having Steppe with formal stats is a bit of proof in this regard.

2. The first South Asian HGs mixed with an Iran_Chl-like population (or Iran_N + ANE-heavy Central Asian) in the Neolithic. Later there was just one Steppe wave in North India for the Indo-Europeans.

Seinundzeit said...

Alberto,

But you're making things far too complicated, in ways that just aren't really justified by the data, be it genetic or archaeological.

As Matt noted:

"Sintashta is not that bad as "ANI" though!"

Anyway, in so far as the Makrani/Brahui/Baloch cluster is concerned, they are unique because they've preserved more Iranian Neolithic ancestry than any other modern populations.

Also, the Baloch are simply Iranic speaking Brahui. Pakistani Baloch are quite genetically distinct from even nearby Bandari Iranians, but identical to the Dravidian Brahui (again, a Dravidian people! We should let that sink in, considering the whole Elamo-Dravidian notion).

Now, considering the fact that the roots of IVC lie in populations which came from the ancient Iranian plateau (as per craniometric analyses), and considering that the whole agro-pastoralist complex in northwestern South Asia is derivable from West Asian antecedents, we have a beautifully parsimonious scenario. I mean, Mehrgarh, the oldest Neolithic site in South Asia, lies in modern Balochistan!

Also, BMAC has long been construed by archaeologists as having roots in the Iranian plateau, that's just a basic fact about its origins.

When one takes all of this into consideration, and when one notes that Iran_N is the primary vector for Near Eastern ancestry in South Asia (as per the genetic data), it becomes obvious that BMAC/IVC are going to be predominately Iran_N, and that the Balochistanis are the closest living populations to IVC.

Regardless, IVC was far too out in the east (and much too isolated in its early development) to be affected by genetic movements related to Levant_N and Anatolia_N.

Also, no one is claiming that Balochistanis are identical to IVC. Just as they have the most Iran_N admixture out of all living populations, they have the most IVC-related ancestry out of all living populations. That's all.

They do have substantial BA steppe admixture, and some Arab-related admixture. We shouldn't expect to find BA steppe and Arab-related admixture in IVC.

Besides, I doubt it's reasonable to pin the "Indian cline" notion on one PCA plot. I can imagine at least 5 different ways of creating an "Indian cline" on that PCA. Creating fictive populations on a PCA seems very ad hoc + meaningless, especially when the actual populations on the PCA can account for things in an adequate manner.

In addition, the PCA isn't everything. We have formal methods, ones which show that ANI is constrained to be ancient Eastern Europe/steppe + Iran_N, it really can't be anything else (like Armenian_Chalcolithic or something, I'm not sure where someone at Anthrogenica got that notion from).

For this, one only has to refer to the Lazaridis et al. preprint, where this issue is explored.

And even ignoring that, the PCA itself lends credence to what's been done in the Lazaridis et al. preprint.

At the end of the day, IVC aDNA is going to be awesome, but it really won't hold many surprises.

Unknown said...

Gill, I have not looked at Iran Chl yet, neither have I done any formal analysis on anything yet either. I do plan on running dstats and qpAdm using the same outgroups as Lazaridis 2016. With regards to ANI in S Asians, Lazaridis 2016 showed that it is most likely a combination of Iran N and Steppe. Their highest p values for ANI in qpAdm were for the combination Iran N and Steppe, as opposed to CHG and Steppe.

Davidski said...

@Samaritan DNA

https://drive.google.com/file/d/0B9o3EYTdM8lQMVBxTkFFNjNKaUk/view?usp=sharing

Rob said...

Sein;
This PCA modified by Matt situates where ANI might lie.

Why are you so confident that IVC "can't be" anything other than Iran Neolithic ?
Do you not think it possible that the Indus Valley will have different settlement dynamics to Zagros Iran? So of course they will have a hefty chunk of Iran Neolithic, but might harbour considerably more archaic-type ANE?

Rob said...

(I'm just asking out of interest rather than conviction, as South Asia isn't my forte ;) )

Seinundzeit said...

Rob,

The primary reason for why I think IVC will be predominately Iran_N is to be found in "Supplementary Information 9: Constraints on the origin of Ancestral North Indians", from "The Genetic Structure of the World's First Farmers" preprint.

It's a fairly rigorous examination, using the aDNA we have. Basically, they find that the West Eurasian portion of South Asian ancestry must be a combination of ancient Eastern Europe/Steppe + Iran_N. Other combinations are weaker, statistically speaking.

Also, Iran_N is just too similar to South Asians. Other Near Eastern genomes have proven to be rather poorly linked to South Asia, and some seem to be very un-South Asian, if I'm allowed to contort the English language (Anatolia_Neolithic comes to mind). Even CHG was very underwhelming, in terms of what it told us about South Asian genetic structure.

But Iran_N is totally different, it clearly appears to be directly ancestral to modern South Asians, as per many formal analyses. And even non-formal analyses show a very robust link. For example, in ADMIXTURE, Iran_N belongs to the "Indian/South Asian" modal cluster, at certain K. With PCA, Iran_N clusters alongside South Central Asia. In the Ganj Dareh paper, that sample clusters among the Brahui. And with IBS, South Asian populations appear to be the closest populations to Iran_N, alongside Iranians/Kurds.

With Iran_Chalcolithic, none of this is operative. Modern Iranians are clearly derived from the broad populational dynamics which led to Iran_Chalcolithic, as they seem to be like 90% Iran_Chalcolithic. I doubt that this is literally true, but it means something.

Honestly, I don't think IVC will be identical to Iran_N. IVC was a geographically huge phenomenon, stretching from colonies in Central Asia/northern Afghanistan all the way to contemporary northern India. But, it will be predominately Iran_N, that will definitely be the largest component to it.

Basically, Iran_N is to Central/South Asia what Anatolia_N is to Europe.

Rob said...

Thanks Sein
Makes solid sense; not that one doubts that IVC will have heavy N_Iran, but the question is the proportions, especially what other admixture exists, and where IVC might sit regionally.
I hope we confirm our hypotheses soon

Seinundzeit said...

Rob,

It'll be fun to see how things turn out.

I'm sure the Rakhigarhi data is going to be released very soon. That'll give us a solid idea of how things stand.

Unknown said...

"But Iran_N is totally different, it clearly appears to be directly ancestral to modern South Asians, as per many formal analyses. And even non-formal analyses show a very robust link. For example, in ADMIXTURE, Iran_N belongs to the "Indian/South Asian" modal cluster, at certain K. With PCA, Iran_N clusters alongside South Central Asia. In the Ganj Dareh paper, that sample clusters among the Brahui. And with IBS, South Asian populations appear to be the closest populations to Iran_N, alongside Iranians/Kurds."

Let me add to the above. In the admixture graph of Gallego-Llorente et al., from k=13 onwards, the GD13A sample persistently shows a small purple component which peaks among Mala in South Asia. This purple component correlates very well with the hypothetical ASI. So we have the ASI component in the Iranian Neolithic already at 10000 YBP. ASI is clearly a South Asian component and its origins lie in South Asia. How could this ASI be present in Iranian_Neolithic?

One has to understand a few things to make sense of this. The Iranian Zagros Neolithic is quite out of place in the Near East as it is very different from the Levantine-Anatolian Neolithic. All the affinities of the Zagros Neolithic lie eastward and up to Mehrgarh in Baluchistan, Pakistan. The architecture at Ganj Dareh is similar to that of Mehrgarh. But not only that, the burial is under the houses, just as in Mehrgarh. There is evidence of red ochre just as at Mehrgarh. It is therefore quite amazing that the present day inhabitants of Baluchistan are the closest to GD13A from Ganj Dareh, the site with strong cultural links to Mehrgarh.

To my mind, South Asian Neolithic is not simply derived from Iranian Neolithic. Iranian Neolithic has slightly older dates because it is much more studied and researched as compared to South Asian Neolithic. But note this - Zagros Neolithic people were mostly goat herders with perhaps some evidence of domestic sheep. Cattle only make a sudden appearance around 5500 BC. On the other hand South Asian Neolithic, as exemplified by the sites of Mehrgarh and now Bhiranna in Haryana, has from the earliest levels dating to 7500 BC, the presence of Zebu cattle, river buffalo, sheep and goats. Zebu cattle & river buffalo have been proven to be domesticated in South Asia - based on genetic evidence. Iranian native cattle is Zebu and it also has an old presence of river buffalo. Both of these are clearly derived from South Asia. In addition, one line of domestic sheep, represented by mtDNA A, which is more East Eurasian in its spread, has its greatest presence and highest diversity in South Asia. So the domestic sheep in South Asian Neolithic is also most likely of local origins and not derived from the Near East.

Considering all this, it looks rather strange that South Asian Neolithic is derived from Zagros Neolithic when South Asian Neolithic is much more complex than Zagros Neolithic. To my mind, it is equally plausible that Zagros Neolithic is just a westward expansion of South Asian Neolithic people from around Baluchistan. This would neatly explain the small but significant amount of ASI-like purple component in GD13A.

In other words, rather than a migration from Iranian Neolithic people into South Asia, I am in favour of South Asian Neolithic people from western regions of South Asia, migrating into Iran and forming the Zagros Neolithic. This would also explain why Zagros Neolithic looks so out of place in the Near East in comparison to Levantine & Anatolian Neolithic.

Alberto said...

@Sein

You could say that I'm oversimplifying things, because I am. But making them overcomplicated?

Placing Paniya (Iran_N + Onge-like) at the lower end of the ANI-ASI cline and placing Andronovo/Srubnaya + Iran_Chalcolithic at the higher end seems pretty simple to me, and quite in agreement with your hypothesis. So I still can't see what you find complicated or problematic. Sintashta indeed is not that bad ANI, but it's just not good enough. It needs something similar to Iran_ChL to make it good enough (which, from a geographical and temporal point of view seems realistic enough).

As for Brahui being representative of IVC, I think that you should try to model Brahmin_UP as Brahui + "Something else". And when you see what that "Something else" needs for a good model you will understand that the model doesn't work because it's completely unrealistic for that "Something else" to have arrived to India ca. 1500 BC.

Now I'll take a look at the D-stats based sheets to see how things stand there. And ultimately, yes, ancient DNA will tell us the truth and maybe destroy all the theory. But until then we can only have the theoretical approach. And at least we have to make it work.

Davidski said...

@Jaydeepsinh

Iran Neolithic doesn't have any ASI as defined in the Reich & Moorjani et al. papers. In fact, because it's more basal than Anatolia Neolithic, it's Anatolia Neolithic that is closer, albeit not significantly (Z less than 3), to Southeast Asians.

Mbuti.DG Bougainville Iran_Neolithic Anatolia_Neolithic 0.0062 1.982 737521
Mbuti.DG Dai Iran_Neolithic Anatolia_Neolithic 0.0035 1.238 736221
Mbuti.DG Papuan Iran_Neolithic Anatolia_Neolithic 0.0023 0.727 736220

Close descendants of Iranian Neolithic in South Central Asia, like the Balochis, on the other hand, clearly do pack a lot of ASI compared to Anatolia Neolithic (Z is significantly negative here).

Mbuti.DG Bougainville Balochi Anatolia_Neolithic -0.0146 -7.337 964608
Mbuti.DG Dai Balochi Anatolia_Neolithic -0.0209 -11.553 963034
Mbuti.DG Papuan Balochi Anatolia_Neolithic -0.0163 -8.231 963033

But, interestingly, this doesn't stop them from showing basically as much affinity to Iran Neolithic as modern Iranians, except those from Mazandaran.

Mbuti.DG Iran_Neolithic Balochi Iranian_Lor 0.0021 1.338 460609
Mbuti.DG Iran_Neolithic Balochi Iranian_Mazandarani 0.008 5.16 460609
Mbuti.DG Iran_Neolithic Balochi Iranian_Persian -0.0022 -1.371 460609

Mbuti.DG Iran_Neolithic Brahui Iranian_Lor 0.0039 2.381 460609
Mbuti.DG Iran_Neolithic Brahui Iranian_Mazandarani 0.0098 6.2 460609
Mbuti.DG Iran_Neolithic Brahui Iranian_Persian -0.0004 -0.274 460609

So, obviously, when Iranian Neolithic groups got to what are now Pakistan and India, they eventually mixed with populations that were very Southeast Asian-like. It'll be interesting to see when this admixture happened when South Asian Neolithic and IVC ancient DNA comes in. I'm betting that it was a slow process, so some samples from IVC might have very low levels of this admixture, and thus show more affinity to Iran Neolithic than Balochis and Brahuis, or even Iranians.
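For anyone who wants to re-read the D-stat rows above mechanically, here is a minimal Python sketch. It assumes the columns are outgroup, test population X, population A, population B, D, Z and SNP count, and it follows the reading used in this comment: a positive Z pulls X toward the fourth population, a negative Z toward the third, and |Z| below 3 is treated as not significant. The function names `parse_rows` and `verdict` are made up for illustration and are not part of any package.

```python
# D-statistic rows copied from the comment above.
# Assumed columns: outgroup, X, pop A, pop B, D, Z, number of SNPs.
RAW = """
Mbuti.DG Bougainville Iran_Neolithic Anatolia_Neolithic 0.0062 1.982 737521
Mbuti.DG Dai Iran_Neolithic Anatolia_Neolithic 0.0035 1.238 736221
Mbuti.DG Papuan Iran_Neolithic Anatolia_Neolithic 0.0023 0.727 736220
Mbuti.DG Bougainville Balochi Anatolia_Neolithic -0.0146 -7.337 964608
Mbuti.DG Dai Balochi Anatolia_Neolithic -0.0209 -11.553 963034
Mbuti.DG Papuan Balochi Anatolia_Neolithic -0.0163 -8.231 963033
Mbuti.DG Iran_Neolithic Balochi Iranian_Lor 0.0021 1.338 460609
Mbuti.DG Iran_Neolithic Balochi Iranian_Mazandarani 0.008 5.16 460609
Mbuti.DG Iran_Neolithic Balochi Iranian_Persian -0.0022 -1.371 460609
Mbuti.DG Iran_Neolithic Brahui Iranian_Lor 0.0039 2.381 460609
Mbuti.DG Iran_Neolithic Brahui Iranian_Mazandarani 0.0098 6.2 460609
Mbuti.DG Iran_Neolithic Brahui Iranian_Persian -0.0004 -0.274 460609
"""

Z_THRESHOLD = 3.0  # the "Z less than 3" cutoff mentioned in the comment


def parse_rows(raw):
    """Turn whitespace-separated D-stat lines into dictionaries."""
    rows = []
    for line in raw.strip().splitlines():
        outgroup, x, a, b, d, z, snps = line.split()
        rows.append({
            "outgroup": outgroup, "x": x, "a": a, "b": b,
            "d": float(d), "z": float(z), "snps": int(snps),
        })
    return rows


def verdict(row, threshold=Z_THRESHOLD):
    """Say which of A or B the test population X leans toward, if either."""
    if abs(row["z"]) < threshold:
        return "no significant difference"
    closer = row["b"] if row["z"] > 0 else row["a"]
    return f"{row['x']} is significantly closer to {closer}"


rows = parse_rows(RAW)
for row in rows:
    print(f"{row['x']:>15} | {row['a']} vs {row['b']}: {verdict(row)}")
```

Under those assumptions only the three Balochi vs Anatolia_Neolithic rows and the two Mazandarani rows cross the threshold; everything else counts as a tie.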

@Alberto

The chance of any ancient Central Asians that don't have a lot of steppe admixture clustering with or near Lezgins on a PCA like the one above is zero. In fact, I'd put it at negative zero if that was actually possible.

Alberto said...

@Davidski

Of course you give that zero chances, no surprise about that. And as I said above, that hypothesis already has a big problem with R1a-Z93 being 5400 yo. and present in Sintashta. So no big deal about that. It was my old hypothesis where ANI would have arrived to India before the IVC instead of after. If it's wrong it's wrong. Soon enough we'll know.

But I do expect you to agree about the other option. That is, BMAC being similar to Iran_Chalcolithic (if it came from the Iranian Plateau, that seems to make sense) and therefore ANI being a mix of Andronovo and Iran_Chalcolithic. And the IVC people being South Indian-like. Because that's the option that makes sense for the steppe hypothesis in the way you and Seinundzeit have been arguing for, with a huge impact of Andronovo people.

And therefore I also expect you to agree with Brahui not being a good proxy for IVC, because otherwise the above scenario wouldn't work (and basically no realistic scenario would work at all).

Davidski said...

Iran_Chalcolithic is a new population in Iran, probably in large part from west or northwest of Iran. We can see this on the PCA.

So I don't expect BMAC to be like Iran_Chalcolithic. I think BMAC will be like Iran_Neolithic, but with significant admixture from the steppe and maybe Pamirs in the later layers, with some Z93 showing up as well.

This is essentially what Brahui are, although with a lot of ASI that the BMAC people probably lacked, and probably less steppe ancestry than late BMAC groups.

Btw, here's an interesting old paper with some nice diagrams and maps.

Biological Affinities and Adaptations of Bronze Age Bactrians: IV. A Craniometric Investigation of Bactrian Origins

https://www.researchgate.net/publication/13301833_Biological_affinities_and_adaptations_of_Bronze_Age_Bactrians_IV_A_craniometric_investigation_of_Bactrian_origins

Alberto said...

Ok, Iran_Neolithic works too, even better than Iran_Chalcolithic:

Kalash:HGDP00267
"Iran_Neolithic:I1945" 34.65
"GujaratiD:NA20847" 29.85
"Andronovo:RISE505" 22.2
"Andronovo:RISE512" 7.75
"Srubnaya:I0422" 5.55
distance=0.00306

Or using Paniya instead of GujaratiD:

Kalash:HGDP00267
"Iran_Neolithic:I1945" 42.5
"Srubnaya:I0422" 32.7
"Paniya:PNYD1" 19.1
"Andronovo:RISE512" 3.6
"Iran_Neolithic:I1949" 2.1
distance=0.006276
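To compare the two Kalash models at the population level rather than per sample, the per-sample percentages can be summed by population. The Python sketch below does only that bookkeeping. Treating Andronovo plus Srubnaya as a single "steppe" bucket is an assumption made here for illustration, and the helper names `by_population` and `steppe_share` are hypothetical.

```python
from collections import defaultdict

# Per-sample percentages copied from the two Kalash models above.
MODEL_GUJARATID = [
    ("Iran_Neolithic:I1945", 34.65),
    ("GujaratiD:NA20847", 29.85),
    ("Andronovo:RISE505", 22.2),
    ("Andronovo:RISE512", 7.75),
    ("Srubnaya:I0422", 5.55),
]
MODEL_PANIYA = [
    ("Iran_Neolithic:I1945", 42.5),
    ("Srubnaya:I0422", 32.7),
    ("Paniya:PNYD1", 19.1),
    ("Andronovo:RISE512", 3.6),
    ("Iran_Neolithic:I1949", 2.1),
]

# Assumption for this sketch: both groups are counted as "steppe".
STEPPE = {"Andronovo", "Srubnaya"}


def by_population(model):
    """Sum sample-level percentages into population-level totals."""
    totals = defaultdict(float)
    for label, pct in model:
        population = label.split(":")[0]  # drop the sample ID after the colon
        totals[population] += pct
    return dict(totals)


def steppe_share(model):
    """Total percentage assigned to the steppe bucket."""
    return sum(pct for pop, pct in by_population(model).items() if pop in STEPPE)


for name, model in [("GujaratiD baseline", MODEL_GUJARATID),
                    ("Paniya baseline", MODEL_PANIYA)]:
    totals = by_population(model)
    summary = ", ".join(f"{pop} {pct:.2f}" for pop, pct in totals.items())
    print(f"{name}: {summary} | steppe total {steppe_share(model):.2f}")
```

Summed this way, the steppe total barely moves between the two baselines (about 35.5 vs 36.3), while the Iran_Neolithic total rises from 34.65 to 44.6 when Paniya replaces GujaratiD.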

Gill said...

Roughly averaged estimate of West Eurasian part (approx half) of HarappaWorld S-Indian (this is probably ANI):

28 Baloch, 41 Caucasian, 17 NE-Euro, 2.0 Med, 12 SW-Asian

Iran_Chl:

2.69 S-Indian, 35.07 Baloch, 48.00 Caucasian, 0.10 NE-Asian, 0.92 Papuan, 0.15 American, 11.98 SW-Asian, 1.09 W-African

1.76 S-Indian, 28.58 Baloch, 49.12 Caucasian, 0.39 NE-Asian, 0.29 Papuan, 0.83 Med, 18.05 SW-Asian, 0.99 W-African

Bronze age Armenia:

23.13 Baloch, 38.66 Caucasian, 18.57 NE-Euro, 0.54 Siberian, 0.92 Papuan, 1.30 American, 0.59 Beringian, 9.34 Med, 4.94 SW-Asian, 1.99 W-African

Keep in mind in this calculator, there is a Baloch component that's approximately 35% ANE, 9% ASE, and rest ENF.

So as long as you're modeling South Asia with some roots in Neolithic Balochistan/Pakistan with a high-ANE component, then it could look as if something similar to Steppe+Iran_Chl is what brought Indo-Aryans and R1a-Z93 to all of India, and it is distinct from Iran_N or a relative of Iran_N which still predominates from around Balochistan.

The Steppe part could include extra ANE, East Eurasian, or whatever, but FWIW, the other half of the South Indian component was roughly:

5.8 ANE, 17.8 ASE, 49.8 E-Eurasian, 23.4 Papuan, 0.4 San, 1 Pygmy, 1.4 W-African, 0.4 E-African

I think that ANE should have gone into the West Eurasian half but it was hard to tell.

The other option is that there was no high-ANE population in Balochistan and it's Iran_N-like + something from the Steppe

Gill said...

The spread of L657 also favors a route through Iran since it's present in Arabs and Iranians more than Z2124, the one found in Sintashta, and which is still present in Afghans and South Central Asians.

Gill said...

Matt, I went back and added a couple lines to yours:

http://i.imgur.com/roVLGQM.png

Steppe is mostly responsible for pulling everything to the left, whether it be in a straight axis (yellow line) from Iran_N to Steppe with Balochis, Pathans, and Tajiks in the middle or the major deviation to the left of the perhaps older axis (thin purple line going from the base of Indians through Sindhis, Balochis to Iranians and Kurds) represented by that bold red line you drew headed from ASI to ANI. It would seem most of the subcontinental populations formed along that ASI->ANI axis or ASI->Iranian axis in actual history.

And ANI's location there fits with my post above. Bronze Age Armenia with a tad more ANE/Steppe or Chalcolithic Iranian (or like some modern Iranian/Kurd-like groups) with a bunch more Steppe.

Even then, there's going to probably be a few more deviations going from the Indians more directly towards Steppe I imagine, especially with some of the high WHG populations.

So while one can model these groups as primarily Iran_N and Steppe, the real axis which differentiates all South Asians would be from ASI to ANI. And then to complicate things, there was probably additional merging with each other.

So it could be ancient Indians mixed with a Caucasus/Chl-or-later_Iranian-like population with a little bit of Steppe (ASI + ANI on that PCA)... or it could be Neolithic Iranians mixed with a lot of Steppe in South Central Asia and then the resultant population entered India and mixed with ancient Indians causing them to pull off to the top right on the PCA.

Oh and FWIW, most Punjabi Jatt origin myths/legends (the original ones that come with the clan names) claim an origin near the Caspian Sea and Iran rather than Transoxania (perhaps since the era of British anthropologists they started claiming heritage from straight north). Perhaps all this means is that a westerly origin was more culturally important in Indian origin myths (regardless of where Jatts actually came from, since like half their Y-DNA lines are local Neolithic in origin), but even that's a significant clue about Indo-Europeans' arrival in India.

Davidski said...

ANI is a statistical construct of at least two very different real West Eurasian populations, so both of these real groups can't cluster in the same spot. One will be significantly north of the other.

Nathan Paul said...

Davidski says indigenous South Asians, which is exactly what they are...

Gill said...

I just ran into this Punjabi/Haryanvi (I'm assuming Jatt from their last name until I get confirmation) person in my Gedmatch relative list:

ANE 29.62
ASE 12.98
WHG-UHG 17.29
East_Eurasian 2.25
West_African 0.98
East_African 1.54
ENF 35.34

The previous high for K7 WHG-UHG in South Asians was 13.68% in two Punjabi/Haryanvi Jatts and he's several % over that. I figured there must be more individuals like this around.

Contrast with:

K7 Sintashta
30.99% ANE
8.25% ASE
43.05% WHG-UHG
0.05% East_Eurasian
1.84% West_African
0.00% East_African
15.84% ENF

K7 Andronovo
30.01% ANE
8.04% ASE
37.56% WHG-UHG
5.03% East_Eurasian
0.00% West_African
0.00% East_African
19.36% ENF

K7 Afanasievo
48.33% ANE
10.31% ASE
27.49% WHG-UHG
0.00% East_Eurasian
0.00% West_African
0.00% East_African
13.87% ENF

K7 Afanasievo 2
37.88% ANE
11.19% ASE
33.92% WHG-UHG
1.91% East_Eurasian
0.00% West_African
0.00% East_African
15.11% ENF
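A small Python sketch for lining up the K7 numbers above. It only reuses the percentages already posted, checks that each profile sums to roughly 100, and ranks the profiles by WHG-UHG. The label "Punjabi/Haryanvi individual" and the variable names are placeholders chosen here, not anything from the calculator itself.

```python
# K7 component order used for every profile below.
COMPONENTS = ["ANE", "ASE", "WHG-UHG", "East_Eurasian",
              "West_African", "East_African", "ENF"]

# Percentages copied from the comment above.
PROFILES = {
    "Punjabi/Haryanvi individual": [29.62, 12.98, 17.29, 2.25, 0.98, 1.54, 35.34],
    "Sintashta":    [30.99, 8.25, 43.05, 0.05, 1.84, 0.00, 15.84],
    "Andronovo":    [30.01, 8.04, 37.56, 5.03, 0.00, 0.00, 19.36],
    "Afanasievo":   [48.33, 10.31, 27.49, 0.00, 0.00, 0.00, 13.87],
    "Afanasievo 2": [37.88, 11.19, 33.92, 1.91, 0.00, 0.00, 15.11],
}

# Previous South Asian high for WHG-UHG quoted in the comment.
PREVIOUS_HIGH_WHG = 13.68


def component(name, label):
    """Look up one component of one profile by its label."""
    return PROFILES[name][COMPONENTS.index(label)]


# Sanity check: each profile should add up to about 100 (rounding aside).
sums = {name: sum(values) for name, values in PROFILES.items()}

# How far the new individual sits above the previous high.
gap = component("Punjabi/Haryanvi individual", "WHG-UHG") - PREVIOUS_HIGH_WHG

# Profiles sorted from most to least WHG-UHG.
ranked = sorted(
    ((name, component(name, "WHG-UHG")) for name in PROFILES),
    key=lambda item: item[1],
    reverse=True,
)

print(f"Gap over previous high: {gap:.2f} points")
for name, whg in ranked:
    print(f"{name:<28} WHG-UHG {whg:5.2f}  (profile sum {sums[name]:.2f})")
```

The individual comes out about 3.6 points above the previous high, but still well below every steppe profile listed, the lowest of which is Afanasievo at 27.49.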

Joe Flood said...

I really hate the way people now post stuff without any kind of annotation or explanation, so that no-one has the faintest idea of what they are talking about, whether it is accurate or in any way relevant to anything.

Unknown said...

@Alberto. I’m on the same subclade J-CTS11760 on the Yfull tree as you. I’m YF09526.