Saturday, August 20, 2016

The Zoroastrians

According to Broushaki et al. 2016, Zoroastrians are genealogically one of the most closely related present-day groups to early Neolithic farmers from the Zagros Mountains in western Iran.

I won't quibble with this; the haplotype analysis in the paper makes sense. But it's clear to me that in terms of overall genetic structure, it's actually the Mazandarani Iranians from the Human Origins dataset that currently show the highest affinity to the Neolithic samples from Iran. You can see that on the graphs below based on double outgroup D-stats.

Another thing that shows up really well on these graphs is the somewhat cryptic genetic division, more or less represented by the red line of best fit, between western and southern Central Asia based on inflated affinity to ancient samples from Iran and the Eurasian steppe, respectively.

Interestingly, the Zoroastrians, in fact basically the same samples as in Broushaki et al. 2016, are the Iranian group closest to crossing the red line. What this suggests is they probably harbor the highest levels of Bronze Age steppe ancestry amongst present-day Iranians, or at least those featured in this analysis.
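
For anyone who wants to check which side of a fitted line a population falls on, here is a minimal Python sketch. It assumes the x-axis is a steppe-related D-stat and the y-axis an Iran Neolithic D-stat. The function names and the numbers are made up for illustration and are not taken from the datasheet.

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual(point, slope, intercept):
    """Vertical distance from the line: positive means above it."""
    x, y = point
    return y - (slope * x + intercept)


# Hypothetical (steppe stat, Iran Neolithic stat) pairs, not real results.
toy = {"PopA": (0.020, 0.030), "PopB": (0.030, 0.034), "PopC": (0.040, 0.044)}
slope, intercept = fit_line(list(toy.values()))
for name, point in toy.items():
    side = "above" if residual(point, slope, intercept) > 0 else "on or below"
    print(name, side, "the line")
```

With these axes, a population below the line has more steppe-related affinity than the overall trend predicts, and one above it has more Iran Neolithic affinity.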

The datasheet used to make the graphs can be downloaded here. The Zoroastrian and Fars samples are freely available at figshare here, while the rest of the modern and ancient samples are offered via different datasets at the Reich Lab website here.

Update 23/08/2016: I added the Zoroastrians to my qpAdm tour of Iran (see here). They score 20.9% Yamnaya-related admixture, which is higher than any of the other groups in the analysis. Next best are the Mazandarani Iranians with 17.8%. In other words, qpAdm basically confirms the results from my graphs.


Broushaki et al., Early Neolithic genomes from the eastern Fertile Crescent, Science 14 Jul 2016, DOI: 10.1126/science.aaf7943

Shaikorth said...

Does this also look similar with WC1, which was the higher-coverage sample they used for the haplotype analysis? And if it does, how many populations are there that would be closer to the Zoroastrians than to the Mazandarani with Mbuti Ancient Mota X?

Matt said...

Mazandaranis seem quite outlying in the affinity to Hotu Cave and Satsurblia as well: (Makes sense for a people from Northern Iran?).

I think there may be some very low level difference between the Zoroastrian group and others in the level of ASI or recent Central Asian ancestry as well. On the stats for Dai vs Ust Ishim (and also putting these in PCA with including EHG and WHG, which difference is linked to Dai vs Ust Ishim), it looks like the other groups are more intermediate between Zoroastrians and the Makrani-Balochi-Brahui axis:

Matt said...

@ Shaikorth, if this is what you're asking, comparing Iranian_Zoroastrian and Iranian_Mazandarani with D(Mbuti,Ancient)(Mota,X) stats:

Zoroastrian more related to: Boncuklu, Natufian, El_Miron, Kostenki14, Villabruna, EHG
Less related to: Dai, Iran_Hotu, Karitiana, Satsurblia
Equal: AG3, MA1, Ust_Ishim

The divergences are quite fine, though (by more/less I mean a difference greater than 0.001 in the stats; equal is a smaller difference than that). The strongest differences are more Villabruna+Boncuklu for Zoroastrian and more Iran_Hotu+Satsurblia+Dai for Mazandarani.
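
The more/less/equal rule above is easy to write down as code. This is just a sketch of the 0.001 cutoff described in the comment; the function name and the example values are hypothetical.

```python
def compare(stat_a, stat_b, threshold=0.001):
    """Label stat_a relative to stat_b using a fixed difference cutoff."""
    diff = stat_a - stat_b
    if diff > threshold:
        return "more"
    if diff < -threshold:
        return "less"
    return "equal"


# Made-up D-stat values for two populations against the same ancient sample.
print(compare(0.0300, 0.0280))
print(compare(0.0300, 0.0295))
```

Note that a fixed cutoff like this ignores the standard errors, so it is only a rough way of sorting the results.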

Kurd Dgk said...


If it is not too much trouble, could you superimpose the Iraqi Kurd sample I sent you (and perhaps the avg of the Feyli Kurds) on your graphs, since the Neolithic samples were recovered from Kurdistan? It is too bad they did not include them in their analysis, as what I am seeing is that some of them are more Iran N shifted than some of the Iranian samples they used.

With regards to:

"Interestingly, the Zoroastrians, in fact basically the same samples as in Broushaki et al. 2016, are the Iranian group closest to crossing the red line. What this suggests is they probably harbor the highest levels of Bronze Age steppe ancestry amongst present-day Iranians, or at least those featured in this analysis."

That is possible. The alternative would be that they are less basal or SSA than the other Iranians, since basal/SSA ancestry in X reduces D in (X, Mota, Yamnaya, Mbuti).
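
To make the dilution argument concrete, here is a small Python sketch. It assumes the usual allele-frequency form of the D-statistic, D(W, X; Y, Z) = sum[(w - x)(y - z)] / sum[(w + x - 2wx)(y + z - 2yz)], and it models extra SSA-like ancestry in X as a simple shift of X's frequencies towards Mota's. The frequencies are toy values, not real data, and the helper names are hypothetical.

```python
def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (equal-length lists)."""
    num = 0.0
    den = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        num += (wi - xi) * (yi - zi)
        den += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return num / den


def admix(base, source, alpha):
    """Frequencies of a population that is (1 - alpha) base + alpha source."""
    return [(1 - alpha) * b + alpha * s for b, s in zip(base, source)]


# Toy frequencies at four SNPs; made-up values for illustration only.
mota = [0.1, 0.9, 0.5, 0.3]
mbuti = [0.2, 0.8, 0.4, 0.3]
yamnaya = [0.9, 0.1, 0.6, 0.7]
x_pop = [0.8, 0.2, 0.6, 0.6]

d_plain = d_stat(x_pop, mota, yamnaya, mbuti)
d_mixed = d_stat(admix(x_pop, mota, 0.2), mota, yamnaya, mbuti)
print(round(d_plain, 4), round(d_mixed, 4))
```

In this toy case the admixed version of X gets a lower D even though its non-African ancestry is unchanged, which is the effect described above.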

Davidski said...

I'd need Kurdish samples genotyped on the Human Origins marker panel, because in this analysis missing markers will cause problems.

You Kurdish guys should get in touch with Broad MIT/Harvard and ask them to include Kurds in the Human Origins.

Kurd Dgk said...

The Kurdish samples also share more drift with Iran Chl and Iran WC1 than both the Iranian samples and the SC Asian samples.

Sorted with samples sharing most drift on top

Iran_LN Mota Iran_ChL Mbuti.DG 0.2996
Iran_N1 Mota Iran_ChL Mbuti.DG 0.2986
.Kurd_C2 Mota Iran_ChL Mbuti.DG 0.2956
Iran_Hotu Mota Iran_ChL Mbuti.DG 0.2931
.Kurd_F8 Mota Iran_ChL Mbuti.DG 0.292
.Kurd_F7 Mota Iran_ChL Mbuti.DG 0.2907
.Kurd_C3 Mota Iran_ChL Mbuti.DG 0.2892
.Kurd_C1 Mota Iran_ChL Mbuti.DG 0.2859
Iran_Mazandarani Mota Iran_ChL Mbuti.DG 0.2814
.Sein Mota Iran_ChL Mbuti.DG 0.2805
Iran_Lori Mota Iran_ChL Mbuti.DG 0.2786
Iranian Mota Iran_ChL Mbuti.DG 0.2777
Kalash Mota Iran_ChL Mbuti.DG 0.2705
Balochi Mota Iran_ChL Mbuti.DG 0.2703
Brahui Mota Iran_ChL Mbuti.DG 0.2696
Pathan Mota Iran_ChL Mbuti.DG 0.2688
Makrani Mota Iran_ChL Mbuti.DG 0.2675

Iran_N1 Mota Iran_N_WC1 Mbuti.DG 0.307
Iran_LN Mota Iran_N_WC1 Mbuti.DG 0.3014
Iran_Hotu Mota Iran_N_WC1 Mbuti.DG 0.2892
Iran_ChL Mota Iran_N_WC1 Mbuti.DG 0.2866
.Kurd_C3 Mota Iran_N_WC1 Mbuti.DG 0.2695
.Kurd_F8 Mota Iran_N_WC1 Mbuti.DG 0.2692
.Kurd_C2 Mota Iran_N_WC1 Mbuti.DG 0.2682
.Kurd_C1 Mota Iran_N_WC1 Mbuti.DG 0.2678
.Kurd_F7 Mota Iran_N_WC1 Mbuti.DG 0.2652
.Sein Mota Iran_N_WC1 Mbuti.DG 0.2637
Iran_Mazandarani Mota Iran_N_WC1 Mbuti.DG 0.2627
Iran_Lori Mota Iran_N_WC1 Mbuti.DG 0.2573
Balochi Mota Iran_N_WC1 Mbuti.DG 0.2568
Kalash Mota Iran_N_WC1 Mbuti.DG 0.2563
Iranian Mota Iran_N_WC1 Mbuti.DG 0.2554
Brahui Mota Iran_N_WC1 Mbuti.DG 0.255
Pathan Mota Iran_N_WC1 Mbuti.DG 0.2547
Makrani Mota Iran_N_WC1 Mbuti.DG 0.2526

Again, they would plot to the right of the red line, towards the top.

Kurd Dgk said...

I have posted a plot with Kurds included at AG at

Davidski said...

What are the marker counts for these Kurds vs the Human Origins and Broushaki et al. Iranians?

Kurd Dgk said...

The counts are about 400K for the individual Kurd, Pashtun, Iranian, Pathan, S Asian, Pashtun dataset, and Tajik dataset, and around 300K for the high coverage ancients. The HO samples are at about 120K. However, I checked before with an evenly pruned, all-inclusive dataset at about 120K, and the only thing affected was the Z, but not the D.

Davidski said...

The Kurds look out of position. They should be close to the Lurs; probably a bit further up the graph, but not past the red line.

Kurd Dgk said...

The reason is that they share more drift with EHG and Yamnaya than Iran Lori. They also cluster closer to Chechens than Lurs. I just posted one to one comparisons with Kurd C3 vs Yamnaya at the above link.

I also posted absolute comparisons with EHG. Here the difference between Lurs and my Kurd samples becomes even more pronounced, suggesting more actual steppe geneflow into the Kurd samples than the Lurs. I have also posted some excellent qpAdm fits for Kurd C3 here, which show that pops basal to Kurds, such as Iran N and Anatolia N do not have enough EHG to account for all the EHG in Kurd C3.

Davidski said...

Try running all of the samples on around the same numbers of SNPs, give or take a couple of thousand.
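
One simple way to do that is to restrict every sample to the markers they all share before running anything. Below is a minimal Python sketch of that idea. The SNP IDs and sample names are placeholders, and in practice the filtering would be done on the genotype files themselves.

```python
def common_snps(snps_by_sample):
    """Return the set of SNP IDs genotyped in every sample."""
    sets = [set(snps) for snps in snps_by_sample.values()]
    return set.intersection(*sets) if sets else set()


def restrict(snps_by_sample):
    """Keep only the shared SNPs for each sample, so counts are identical."""
    shared = common_snps(snps_by_sample)
    return {name: sorted(set(snps) & shared)
            for name, snps in snps_by_sample.items()}


# Placeholder marker lists; real panels have hundreds of thousands of SNPs.
toy = {
    "project_sample": ["rs1", "rs2", "rs3", "rs4"],
    "HO_sample": ["rs2", "rs3"],
    "ancient_sample": ["rs2", "rs3", "rs4"],
}
pruned = restrict(toy)
print({name: len(snps) for name, snps in pruned.items()})
```

The cost is fewer markers overall, so the Z scores drop, but the D values become comparable across samples.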

Shaikorth said...

Kurd, do these use the same number of SNPs?

.Kurd_C3 Mota Iran_N_WC1 Mbuti.DG 0.2695
.Kurd_F8 Mota Iran_N_WC1 Mbuti.DG 0.2692
.Kurd_C2 Mota Iran_N_WC1 Mbuti.DG 0.2682
.Kurd_C1 Mota Iran_N_WC1 Mbuti.DG 0.2678
.Kurd_F7 Mota Iran_N_WC1 Mbuti.DG 0.2652
Iran_Mazandarani Mota Iran_N_WC1 Mbuti.DG 0.2627
Iran_Lori Mota Iran_N_WC1 Mbuti.DG 0.2573
Balochi Mota Iran_N_WC1 Mbuti.DG 0.2568
Kalash Mota Iran_N_WC1 Mbuti.DG 0.2563
Iranian Mota Iran_N_WC1 Mbuti.DG 0.2554
Brahui Mota Iran_N_WC1 Mbuti.DG 0.255
Pathan Mota Iran_N_WC1 Mbuti.DG 0.2547
Makrani Mota Iran_N_WC1 Mbuti.DG 0.2526

Kurd Dgk said...


They don't. The project members are at about 400K, including the Pashtun and Tajik sets, but the HO samples are at 117K.


You are right. I guess the difference in D is not noticeable when the marker differences are a few thousand, which is what I was going by, but when it is a couple of hundred thousand it becomes noticeable. I will redo everything. Project members are not affected, but HO samples are. Also, one to ones, such as individual comparisons with Kurd C3, are not.

This is my first dataset where I did not prune back to similar markers, go figure...

Anyways, here is the redone comparison with Iran Chl

Iran_LN Mota Iran_Chl Mbuti.DG 0.2925 34.945 49019
.Kurd_C2 Mota Iran_Chl Mbuti.DG 0.2865 50.971 94856
Iran_Mazandarani Mota Iran_Chl Mbuti.DG 0.2816 60.239 95025
.Kurd_F7 Mota Iran_Chl Mbuti.DG 0.2814 50.254 94971
Iran_N Mota Iran_Chl Mbuti.DG 0.2806 41.481 78922
.Kurd_F8 Mota Iran_Chl Mbuti.DG 0.2798 48.451 94696
.Kurd_C3 Mota Iran_Chl Mbuti.DG 0.2791 50.42 94957
Iran_Lori Mota Iran_Chl Mbuti.DG 0.2784 57.89 95025
Iranian Mota Iran_Chl Mbuti.DG 0.2778 59.481 95144
.Kurd_C1 Mota Iran_Chl Mbuti.DG 0.2735 49.572 94764
.Sein Mota Iran_Chl Mbuti.DG 0.271 49.145 94264
Kalash Mota Iran_Chl Mbuti.DG 0.2705 57.341 95144
Balochi Mota Iran_Chl Mbuti.DG 0.2704 58.919 95144
Brahui Mota Iran_Chl Mbuti.DG 0.2698 58.429 95144
Pathan Mota Iran_Chl Mbuti.DG 0.269 59.654 95144
Makrani Mota Iran_Chl Mbuti.DG 0.2677 57.963 95144

Gaspar said...

The Yazidi people of northern Syria are associated with the Zoroastrians.

Kurd Dgk said...

I have updated most of the tables to reflect similar marker overlaps. The pattern remains the same, with most Kurds sharing more drift with Steppe Eneolithic and Yamnaya than other W or S Asians. I had to prune back to 80-100K to increase overlaps, so keep in mind Z scores will likely be higher with 400K comparisons.
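
As a rough guide to how much higher, here is a back-of-the-envelope Python sketch. It assumes D stays the same and the standard error shrinks with the square root of the SNP count, which is only an approximation because linked SNPs are not independent. The function name and the example numbers are hypothetical.

```python
import math


def rescale_z(z, n_snps_old, n_snps_new):
    """Approximate Z at a new SNP count, assuming SE scales as 1/sqrt(N)."""
    return z * math.sqrt(n_snps_new / n_snps_old)


# A Z of 5 at 80K markers, projected to 400K markers under this assumption.
print(round(rescale_z(5.0, 80_000, 400_000), 2))
```

Treat the result as an upper-end guess; the only reliable way to get the real Z is to rerun the stat on the larger marker set.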

Here is one comparing shared drift with Iraqi Kurd C2. The table is sorted, showing that all moderns (except for a Kurd with similar D) share less drift with Eneolithic than Kurd C2. Most of the tables at have been revised to reflect marker overlaps. A couple remain, including the graph, which will be updated tonight.

EHG .Kurd_C2 Steppe_Eneolithic Chimp 0.0879 10.609 78015
Scythian_IA .Kurd_C2 Steppe_Eneolithic Chimp 0.0375 4.28 74756
MA1 .Kurd_C2 Steppe_Eneolithic Chimp 0.0361 3.73 57027
Andronovo .Kurd_C2 Steppe_Eneolithic Chimp 0.0357 5.17 78594
.Znertu .Kurd_C2 Steppe_Eneolithic Chimp 0.0001 0.012 78450

Tajik .Kurd_C2 Steppe_Eneolithic Chimp 0 -0.007 78762
Kalash .Kurd_C2 Steppe_Eneolithic Chimp -0.0012 -0.205 78762
Karitiana .Kurd_C2 Steppe_Eneolithic Chimp -0.0028 -0.378 78762
.Kurd_C3 .Kurd_C2 Steppe_Eneolithic Chimp -0.0031 -0.451 78699
Pashtun_Afghan .Kurd_C2 Steppe_Eneolithic Chimp -0.0045 -0.761 74764
Pathan .Kurd_C2 Steppe_Eneolithic Chimp -0.0058 -1.036 78762
.Kurd_F3 .Kurd_C2 Steppe_Eneolithic Chimp -0.0074 -0.978 78642
.Kurd_F4 .Kurd_C2 Steppe_Eneolithic Chimp -0.0075 -0.98 78462
.Mfa .Kurd_C2 Steppe_Eneolithic Chimp -0.0075 -1.001 77586
.Kurd_F5 .Kurd_C2 Steppe_Eneolithic Chimp -0.008 -1.062 78645
.Kurd_F1 .Kurd_C2 Steppe_Eneolithic Chimp -0.0081 -1.079 78644
.Kurd_F2 .Kurd_C2 Steppe_Eneolithic Chimp -0.0097 -1.301 78516
.Sein .Kurd_C2 Steppe_Eneolithic Chimp -0.0105 -1.445 78115
.Parasar .Kurd_C2 Steppe_Eneolithic Chimp -0.011 -1.445 75505
Iranian_Mazandarani .Kurd_C2 Steppe_Eneolithic Chimp -0.0112 -2 78656
Iranian_Lori .Kurd_C2 Steppe_Eneolithic Chimp -0.0115 -2.023 78656
.Kurd_Ezidi .Kurd_C2 Steppe_Eneolithic Chimp -0.012 -1.604 78035
.NK19191 .Kurd_C2 Steppe_Eneolithic Chimp -0.0128 -1.621 78399
Balochi .Kurd_C2 Steppe_Eneolithic Chimp -0.0141 -2.551 78762
Iranian_Shirazi .Kurd_C2 Steppe_Eneolithic Chimp -0.0141 -2.496 78656
.Kurd_C1 .Kurd_C2 Steppe_Eneolithic Chimp -0.0144 -1.949 78581
Iranian .Kurd_C2 Steppe_Eneolithic Chimp -0.0146 -2.585 78762
.Kurd_F6 .Kurd_C2 Steppe_Eneolithic Chimp -0.0155 -2.053 75442
Brahui .Kurd_C2 Steppe_Eneolithic Chimp -0.0167 -3.052 78762
.Kurd_F7 .Kurd_C2 Steppe_Eneolithic Chimp -0.0174 -2.391 78636
.Kurd_F8 .Kurd_C2 Steppe_Eneolithic Chimp -0.0184 -2.542 78449
Iran_Chl .Kurd_C2 Steppe_Eneolithic Chimp -0.0211 -2.707 70028
Makrani .Kurd_C2 Steppe_Eneolithic Chimp -0.0221 -3.994 78762
Iran_LN .Kurd_C2 Steppe_Eneolithic Chimp -0.0258 -2.226 40444
Iranian_Bandari .Kurd_C2 Steppe_Eneolithic Chimp -0.0335 -5.954 78656
Syrian .Kurd_C2 Steppe_Eneolithic Chimp -0.0383 -6.66 78762
Saudi .Kurd_C2 Steppe_Eneolithic Chimp -0.0411 -6.991 78762
Iran_N .Kurd_C2 Steppe_Eneolithic Chimp -0.046 -4.839 65116
Han .Kurd_C2 Steppe_Eneolithic Chimp -0.0477 -7.519 78762
Palliyar .Kurd_C2 Steppe_Eneolithic Chimp -0.0498 -5.708 32376
Natufian .Kurd_C2 Steppe_Eneolithic Chimp -0.0505 -4.074 33184
Kharia .Kurd_C2 Steppe_Eneolithic Chimp -0.0592 -6.728 32376
Onge .Kurd_C2 Steppe_Eneolithic Chimp -0.0644 -6.68 32376
Papuan .Kurd_C2 Steppe_Eneolithic Chimp -0.0836 -12.185 78762
Mota .Kurd_C2 Steppe_Eneolithic Chimp -0.3008 -38.755 78752
Yoruba .Kurd_C2 Steppe_Eneolithic Chimp -0.3156 -54.348 78752

Kurd Dgk said...

Graphs for D ( X, Mota, Eneolithic, Mbuti), and D ( X, Mota, EHG, Mbuti) vs D ( X, Mota, Iran N, Mbuti) posted at

Davidski said...

The Zoroastrians are now in the qpAdm tour of Iran.

Lenny Dykstra said...

"What this suggests is they probably harbor the highest levels of Bronze Age steppe ancestry amongst present-day Iranians, or at least those featured in this analysis."

It appears most populations "east" of the line have not just higher steppe but higher South Indian ancestry than Iran. Might it be possible that Zoroastrians simply have higher South Indian ancestry than typical Iranians?

Davidski said...

Might it be possible that Zoroastrians simply have higher South Indian ancestry than typical Iranians?

The Zoroastrians do have higher South Indian ancestry than other Iranians. However, this not only pushes them down the plot, but also left (or west), because it depresses their Yamnaya statistics.

So if they had no South Indian ancestry, they'd have higher Yamnaya statistics, and they'd be further east on each of the plots.

The highest proportions of South Indian ancestry in this analysis are carried by GujaratiD and Punjabi_Lahore, and they have the most depressed Yamnaya statistics in the analysis.

Kurd Dgk said...


I imagine that they would have less ASI type admixture than the Iranian Bandari and perhaps Shirazi samples, assuming they are Iranian Z.

Could you run :

D (Zoro, X, Onge, Chimp)
D ( Zoro, X, EHG, Chimp)

for X being Iran Bandari, Iran Fars, Iran Shirazi, Iran Lori, Iran Maz,

and the same ones using D (X, Mbuti, Onge/EHG, Chimp) using the various Iranian groups to get absolute drift shared with Onge and EHG

Also perhaps you can run a couple through ADMIXTURE to test recent S Indian drift for the above.

If you are busy, I will do it over the next couple of days after I merge them into my files.

If it turns out they are as ASI shifted as Bandaris and Shirazis, that would need to be explained, because with the Bandaris it is due to merchant trade, but Zoros are secluded, and are quite strict about mixing with non Zoros.

So if it turns out they are more ASI admixed than W Iranians, that would imply slight ASI dilution

Kurd Dgk said...

in W Iranians

Kurd Dgk said...

I just ran the Zoroastrians through a K6 ADMIXTURE run, with an Andamanese based ASI comp, to get an idea of recent drift. As expected, they scored about the same as W Iranians, in the 1-2% range, and less than Bandaris, Shirazis, and Fars, who scored up to 5%. This is expected as they are further inland, and are known to be strict about marriages outside their faith. Bandaris, on the other hand are coastal, and more affected by trade with the sub-continent.

They did score a little higher Iran N though than the other Iranians and Feyli Kurds.

Nirjhar007 said...

The Zoroastrians do have higher South Indian ancestry than other Iranians.

Yes, it's interesting. I think Zoroastrians (proper) emerged around the mid 2nd millennium BC.

Roughly, the Yaz culture can be considered related to them. I talked to a scholar recently, and he pointed out to me:

BMAC culture could not be Zoroastrian yet, because they practiced inhumation, strictly forbidden by Zoroastrianism, instead the Yaz culture has no burials, it can be the sign of the imposition of the new religion, around 1500 BC. That date for Zarathustra just before 1500 BC is not bad, because it is also contemporary with the late Rigveda.

Coldmountains said...

From where in Iran are the Zoroastrians? Yazd? It would be interesting to compare them with Muslim Iranians from the same region. I think the Zoroastrians have slightly more Iran_N and Bronze Age admixture than other Iranians here because they are from a more eastern location, while the other Iranians here seem to be from western Iran, where EEF is higher and steppe admixture is lower.

Davidski said...

I don't know where these Zoroastrians are from. It might say in the paper.

Anyway, testing for South Asian ancestry in Iran with Onge as references might be problematic because of confounding factors such as recent East Eurasian admixture in Iran.

But I ran some PCA and D-stats with all of the Iranians in my dataset, and I can't see any obvious South Asian admixture in the Zoroastrians relative to the other Iranians.

They definitely have more steppe ancestry than the other Iranians, but in fact probably less South Asian ancestry.

Seinundzeit said...

One wonders why steppe ancestry is so much lower in the Iranian plateau, compared to South Central Asia?

I mean, it seems the maximum for Iran is around 20% with formal methods (these Zoroastrians), compared to around 60% for South Central Asia (the Pamiri ethnic groups, with qpAdm).

Even the Balochistanis, who seem to represent the least steppe-admixed cluster of South Central Asians, are probably like 30%, exceeding any population from further west in Iran.

I'm assuming that the dynamics of linguistic change were rather distinct between both regions, as was the manner of diffusion? It would be nice if someone with a handle on the archaeological/historical details could chime in.

(For what it's worth, I think the Pashtun/Kalash/Kohistani/Nuristani cluster of South Central Asians involves populations which are around 45% BA steppe/Eastern European, if we assume that the Pamiri South Central Asians are around 60%. Although, the Afghan/Pakistani group could be as low as 30% steppe-admixed, or as high as 60%. The estimates seem somewhat volatile, depending on method. I think we need better reference populations, and perhaps better methods. I guess all we can say is that the percentage is very far from trivial, and probably high)

Davidski said...

What my new K7 suggests is that the formal stats might be confusing to some degree ancient eastern Caspian stuff with steppe ancestry for South Central and especially South Asians.

It's a pity that Iran Hotu is such a limited sequence and that we don't yet have any Neolithic samples from Turkmenistan or India. Obviously they'll clear up a lot in regards to this issue.

But once all of the data is in and the dust settles, nothing major will change, except some of the details.

Seinundzeit said...

I think that makes a lot of sense.

Your K7 test seems very robust, and it requires an ancestral population very similar to Iran Hotu, but with more ANE, in the case of scheduled caste South Indians (since they have no Villabruna ancestry outside the "Basal-rich" cluster, yet possess a substantial helping of ANE).

It'll be very interesting to see what those samples from Turkmenistan and India demonstrate.

But definitely, even with the possibility of ANE-rich Central Asian forager ancestry for South Asians, models based on the K7 test still have Pashtuns/Kalash at around 35% steppe-admixed, and the Pamiri populations at around 50%, so the main narrative has totally been established.

Basically, South Central Asians are a fusion of something very closely related to Neolithic Iran/Iran Hotu + steppe/ancient Eastern European ancestry, with the addition of approximately 10% ENA (if ASI really is ENA). I'm sure you are absolutely right, nothing radically different from that will be shown over the next few years.

Kurd Dgk said...


Just as I suspected, they are less Onge/Andamanese shifted than the majority of Iranians and Kurds. To remove any confounding from recent E Asian admixture into Kurds or Iranians, I compared against Mbuti, who are neutral with regards to recent E Asian, or basal in comparison to Iranians and Kurds.

The table is sorted with most Onge shared drift on top

GujaratiD Mbuti.DG Onge Chimp 0.3672 67.872 43883
Iran_LN Mbuti.DG Onge Chimp 0.3567 29.708 20064
GujaratiA Mbuti.DG Onge Chimp 0.3558 68.08 43883
Pathan Mbuti.DG Onge Chimp 0.3535 70.121 43883
.Farid Mbuti.DG Onge Chimp 0.3473 50.767 43720
Balochi Mbuti.DG Onge Chimp 0.344 68.061 43883
Brahui Mbuti.DG Onge Chimp 0.3421 67.557 43883
.Kurd_C2 Mbuti.DG Onge Chimp 0.3412 50.678 43798
.Kurd_F5 Mbuti.DG Onge Chimp 0.3403 49.458 43810
.Kurd_F2 Mbuti.DG Onge Chimp 0.3396 48.33 43751
Iran_Fars Mbuti.DG Onge Chimp 0.3378 66.646 43877
Iranian Mbuti.DG Onge Chimp 0.3371 66.585 43883
.Kurd_Ezidi Mbuti.DG Onge Chimp 0.3368 48.758 43586
Iran_ChL Mbuti.DG Onge Chimp 0.3365 48.743 40590
.Kurd_F7 Mbuti.DG Onge Chimp 0.3356 48.638 43806
Iran_recent Mbuti.DG Onge Chimp 0.3356 37.168 31277
.Kurd_F1 Mbuti.DG Onge Chimp 0.3352 46.878 43798
.Zara Mbuti.DG Onge Chimp 0.3351 47.333 43814
Makrani Mbuti.DG Onge Chimp 0.3349 65.665 43883
.Kurd_C3 Mbuti.DG Onge Chimp 0.3347 48.906 43836
.Kurd_F3 Mbuti.DG Onge Chimp 0.3345 46.771 43804
Iran_Zoroastrian Mbuti.DG Onge Chimp 0.3336 63.868 43877
.Mfa Mbuti.DG Onge Chimp 0.3329 47.678 43313
.Kurd_F8 Mbuti.DG Onge Chimp 0.3327 47.093 43684
.Kurd_F4 Mbuti.DG Onge Chimp 0.3311 48.078 43779
Iran_N_WC1 Mbuti.DG Onge Chimp 0.3307 38.932 27629
Iranian_Bandari Mbuti.DG Onge Chimp 0.3306 64.371 43883
.Kurd_F6 Mbuti.DG Onge Chimp 0.3297 45.481 42529
Iran_N Mbuti.DG Onge Chimp 0.3292 36.99 33908
.Kurd_C1 Mbuti.DG Onge Chimp 0.3254 44.397 43768

Keep in mind there may be some minor re-shuffling of ranks at higher SNP counts.

Kurd Dgk said...

With one to ones against Andamanese, table is sorted with most shared drift with Andamanese vs Zoros on top. Anything with -ve D, shares more drift with Andamanese than Zoros.

Ancient Iranian samples may not be comparable due to lower SNP counts. Some non-significant Z will become significant at higher SNP counts.

Iran_Zoroastrian GujaratiD Andamanese Chimp -0.036 -10.978 43877
Iran_Zoroastrian GujaratiA Andamanese Chimp -0.0266 -8.326 43877
Iran_Zoroastrian Pathan Andamanese Chimp -0.0207 -9.208 43877
Iran_Zoroastrian Iran_LN Andamanese Chimp -0.015 -1.352 20059
Iran_Zoroastrian Balochi Andamanese Chimp -0.0101 -4.46 43877
Iran_Zoroastrian .Kurd_C2 Andamanese Chimp -0.0091 -1.669 43792
Iran_Zoroastrian .Farid Andamanese Chimp -0.0091 -1.55 43714
Iran_Zoroastrian .Halgurd Andamanese Chimp -0.0086 -1.382 43734
Iran_Zoroastrian Brahui Andamanese Chimp -0.0074 -3.403 43877
Iran_Zoroastrian .Kurd_F2 Andamanese Chimp -0.0065 -1.122 43745
Iran_Zoroastrian .Kurd_F5 Andamanese Chimp -0.0053 -0.955 43804
Iran_Zoroastrian .Kurd_F1 Andamanese Chimp -0.0035 -0.623 43792
Iran_Zoroastrian .Kurd_C3 Andamanese Chimp -0.0033 -0.599 43830
Iran_Zoroastrian Iran_Fars Andamanese Chimp -0.0018 -0.884 43877
Iran_Zoroastrian Iranian Andamanese Chimp -0.0013 -0.711 43877
Iran_Zoroastrian .Kurd_F3 Andamanese Chimp -0.001 -0.172 43798
Iran_Zoroastrian .Kurd_F7 Andamanese Chimp -0.0004 -0.081 43800
Iran_Zoroastrian Iran_ChL Andamanese Chimp -0.0002 -0.035 40584
Iran_Zoroastrian .Kurd_F8 Andamanese Chimp 0.0001 0.019 43679
Iran_Zoroastrian .Kurd_Ezidi Andamanese Chimp 0.0014 0.237 43580
Iran_Zoroastrian .Zara Andamanese Chimp 0.0019 0.303 43808
Iran_Zoroastrian Makrani Andamanese Chimp 0.0025 1.159 43877
Iran_Zoroastrian Iran_recent Andamanese Chimp 0.0026 0.3 31273
Iran_Zoroastrian Iran_N_WC1 Andamanese Chimp 0.0029 0.383 27623
Iran_Zoroastrian .Kurd_F4 Andamanese Chimp 0.0037 0.643 43773
Iran_Zoroastrian .Mfa Andamanese Chimp 0.0043 0.797 43307
Iran_Zoroastrian .Kurd_F6 Andamanese Chimp 0.007 1.137 42524
Iran_Zoroastrian .Kurd_C1 Andamanese Chimp 0.0084 1.446 43762
Iran_Zoroastrian Iranian_Bandari Andamanese Chimp 0.0104 3.99 43877
Iran_Zoroastrian Iran_N Andamanese Chimp 0.0159 1.921 33902
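
Reading a table like the one above comes down to the sign of D and the size of Z. Here is a minimal Python sketch that parses three of the rows and spells out the reading. It assumes the columns are W, X, Y, outgroup, D, Z and SNP count, and it uses |Z| >= 3 as the significance cutoff, which is a common convention rather than anything stated in this thread. The function names are hypothetical.

```python
ROWS = """\
Iran_Zoroastrian GujaratiD Andamanese Chimp -0.036 -10.978 43877
Iran_Zoroastrian Iran_Fars Andamanese Chimp -0.0018 -0.884 43877
Iran_Zoroastrian Iranian_Bandari Andamanese Chimp 0.0104 3.99 43877
"""


def parse_row(line):
    """Split one whitespace-separated D-stat row into named fields."""
    w, x, y, outgroup, d, zscore, nsnps = line.split()
    return {"w": w, "x": x, "y": y, "outgroup": outgroup,
            "d": float(d), "zscore": float(zscore), "nsnps": int(nsnps)}


def interpret(row, z_cutoff=3.0):
    """Positive D: W is closer to Y. Negative D: X is closer to Y."""
    if abs(row["zscore"]) < z_cutoff:
        return "no significant difference between {w} and {x}".format(**row)
    if row["d"] > 0:
        closer, other = row["w"], row["x"]
    else:
        closer, other = row["x"], row["w"]
    return "{} shares more drift with {} than {}".format(closer, row["y"], other)


for line in ROWS.splitlines():
    print(interpret(parse_row(line)))
```

The same reading applies to the other one to one tables in this thread, as long as the column order is the same.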

Kurd Dgk said...


After reading some of the recent papers referencing Neolithic Iranians, some were left with the impression that Zoroastrians were more Iran N shifted than Kurds. These dstats show this not to be the case. Whereas they do show them to share more drift with Iran N than Balochis, Brahuis, and Iranians, this is not the case when it comes to Kurds, as these show the majority of Kurds to be more Iran N shifted than Zoros. I guess it's too bad Kurds were not part of the studies.

For me this is not too surprising as Iran N was recovered from what is known as Kurdistan.

This is sorted with most shared drift relative to Zoros on top. Anything -ve shares more drift with Iran N than Zoros. Refer to my comments on Z significance and number of SNPs.

Iran_Zoroastrian Iran_LN Iran_N Chimp -0.0664 -6.773 43383
Iran_Zoroastrian Iran_N_WC1 Iran_N Chimp -0.0494 -7.058 51925
Iran_Zoroastrian Iran_ChL Iran_N Chimp -0.0238 -4.592 77551
Iran_Zoroastrian Iran_recent Iran_N Chimp -0.0078 -0.998 67224
Iran_Zoroastrian .Kurd_C2 Iran_N Chimp -0.0073 -1.228 80096
Iran_Zoroastrian .Kurd_F8 Iran_N Chimp -0.0059 -1.093 79920
Iran_Zoroastrian .Kurd_F5 Iran_N Chimp -0.004 -0.687 80126
Iran_Zoroastrian .Kurd_F3 Iran_N Chimp -0.0027 -0.443 80125
Iran_Zoroastrian .Kurd_F2 Iran_N Chimp -0.0026 -0.452 79981
Iran_Zoroastrian .Kurd_F6 Iran_N Chimp -0.0025 -0.411 77111
Iran_Zoroastrian .Kurd_F4 Iran_N Chimp -0.0024 -0.393 80009
Iran_Zoroastrian .Kurd_F7 Iran_N Chimp -0.002 -0.354 80113
Iran_Zoroastrian .Kurd_C3 Iran_N Chimp -0.0018 -0.308 80164
Iran_Zoroastrian .Mfa Iran_N Chimp -0.0008 -0.14 79125
Iran_Zoroastrian Iranian Iran_N Chimp 0.0009 0.473 80251
Iran_Zoroastrian .Zara Iran_N Chimp 0.0028 0.439 80128
Iran_Zoroastrian Iran_Fars Iran_N Chimp 0.0035 1.622 80251
Iran_Zoroastrian Balochi Iran_N Chimp 0.0038 1.577 80251
Iran_Zoroastrian .Kurd_Ezidi Iran_N Chimp 0.0048 0.778 79573
Iran_Zoroastrian Brahui Iran_N Chimp 0.0048 2.035 80251
Iran_Zoroastrian .Kurd_F1 Iran_N Chimp 0.0051 0.881 80116
Iran_Zoroastrian .Kurd_C1 Iran_N Chimp 0.0059 1.053 80036
Iran_Zoroastrian Pathan Iran_N Chimp 0.0067 2.85 80251
Iran_Zoroastrian Makrani Iran_N Chimp 0.0076 3.157 80251
Iran_Zoroastrian GujaratiA Iran_N Chimp 0.0095 2.909 80251
Iran_Zoroastrian .Farid Iran_N Chimp 0.0125 2.022 79978
Iran_Zoroastrian Iranian_Bandari Iran_N Chimp 0.0182 7.04 80251
Iran_Zoroastrian GujaratiD Iran_N Chimp 0.0221 6.646 80251

Kurd Dgk said...

To ensure any recent admixture is not confounding the stats, I re-ran with comparison to Mbuti. We see the same pattern as above with Kurds sharing more Iran N drift than Zoros.

Sorted with most shared drift on top

Iran_LN Mbuti.DG Iran_N Chimp 0.4185 47.402 43416
Iran_N_WC1 Mbuti.DG Iran_N Chimp 0.4002 55.176 51961
Iran_ChL Mbuti.DG Iran_N Chimp 0.3882 66.789 77590
Iran_recent Mbuti.DG Iran_N Chimp 0.3767 49.836 67261
.Kurd_F5 Mbuti.DG Iran_N Chimp 0.3742 57.832 80165
.Kurd_C2 Mbuti.DG Iran_N Chimp 0.374 60.192 80135
.Kurd_F8 Mbuti.DG Iran_N Chimp 0.3727 60.496 79958
.Kurd_F2 Mbuti.DG Iran_N Chimp 0.371 58.831 80020
.Kurd_F6 Mbuti.DG Iran_N Chimp 0.3709 54.683 77148
.Kurd_F7 Mbuti.DG Iran_N Chimp 0.3709 57.588 80152
.Mfa Mbuti.DG Iran_N Chimp 0.3709 58.681 79163
.Kurd_F4 Mbuti.DG Iran_N Chimp 0.3708 59.518 80048
.Kurd_C3 Mbuti.DG Iran_N Chimp 0.3707 60.192 80203
.Kurd_F3 Mbuti.DG Iran_N Chimp 0.3697 56.528 80164
Iran_Zoroastrian Mbuti.DG Iran_N Chimp 0.3693 77.91 80251
Iranian Mbuti.DG Iran_N Chimp 0.3687 79.472 80290
.Kurd_Ezidi Mbuti.DG Iran_N Chimp 0.3673 55.042 79611
Balochi Mbuti.DG Iran_N Chimp 0.3668 78.165 80290
.Zara Mbuti.DG Iran_N Chimp 0.3667 56.252 80167
Iran_Fars Mbuti.DG Iran_N Chimp 0.3666 76.629 80251
Brahui Mbuti.DG Iran_N Chimp 0.3664 78.295 80290
.Kurd_C1 Mbuti.DG Iran_N Chimp 0.3655 58.4 80074
.Kurd_F1 Mbuti.DG Iran_N Chimp 0.3655 59.232 80155
Makrani Mbuti.DG Iran_N Chimp 0.3647 76.115 80290
Pathan Mbuti.DG Iran_N Chimp 0.3646 75.766 80290
GujaratiA Mbuti.DG Iran_N Chimp 0.3623 73.07 80290
.Farid Mbuti.DG Iran_N Chimp 0.3618 54.98 80017
Iranian_Bandari Mbuti.DG Iran_N Chimp 0.3573 75.073 80290
GujaratiD Mbuti.DG Iran_N Chimp 0.3534 68.975 80290

Kurd Dgk said...

With regards to shared drift with EHG, David is correct, that Zoroastrians share more drift with EHG than other Iranians, except for a few Kurds that share a little more EHG drift than Zoroastrians.

Sorted with most EHG shared drift on top

.Kurd_F5 Mbuti.DG EHG Chimp 0.3797 69.856 96718
Pathan Mbuti.DG EHG Chimp 0.3782 93.242 96869
.Mfa Mbuti.DG EHG Chimp 0.3774 68.387 95547
.Kurd_C3 Mbuti.DG EHG Chimp 0.3751 70.311 96770
Iran_recent Mbuti.DG EHG Chimp 0.3747 56.095 76426
GujaratiA Mbuti.DG EHG Chimp 0.3745 86.435 96869
Iran_Zoroastrian Mbuti.DG EHG Chimp 0.3745 92.064 96829
.Kurd_F3 Mbuti.DG EHG Chimp 0.3741 63.58 96706
.Kurd_F1 Mbuti.DG EHG Chimp 0.3737 69.487 96704
.Kurd_F2 Mbuti.DG EHG Chimp 0.3737 68.788 96557
Iran_Fars Mbuti.DG EHG Chimp 0.3723 91.466 96829
.Kurd_C2 Mbuti.DG EHG Chimp 0.3718 68.437 96689
.Zara Mbuti.DG EHG Chimp 0.3716 64.396 96721
.Kurd_F6 Mbuti.DG EHG Chimp 0.3711 66.31 93440
.Kurd_F4 Mbuti.DG EHG Chimp 0.3708 67.775 96602
Balochi Mbuti.DG EHG Chimp 0.3707 91.596 96869
Iranian Mbuti.DG EHG Chimp 0.3703 92.256 96869
.Halgurd Mbuti.DG EHG Chimp 0.3699 65.102 96599
.Kurd_Ezidi Mbuti.DG EHG Chimp 0.3695 66.132 96085
.Kurd_F8 Mbuti.DG EHG Chimp 0.3695 67.316 96463
.Kurd_C1 Mbuti.DG EHG Chimp 0.369 68.505 96608
.Kurd_F7 Mbuti.DG EHG Chimp 0.3685 66.363 96701
Brahui Mbuti.DG EHG Chimp 0.3682 90.122 96869
GujaratiD Mbuti.DG EHG Chimp 0.3669 85.147 96869
Makrani Mbuti.DG EHG Chimp 0.3648 88.257 96869
Iran_LN Mbuti.DG EHG Chimp 0.3646 44.183 48905
Iran_ChL Mbuti.DG EHG Chimp 0.3623 69.375 92189
.Farid Mbuti.DG EHG Chimp 0.361 63.354 96536
Iranian_Bandari Mbuti.DG EHG Chimp 0.3585 85.226 96869
Iran_N_WC1 Mbuti.DG EHG Chimp 0.3429 55.45 62681
Iran_N Mbuti.DG EHG Chimp 0.3404 48.258 79675

Kurd Dgk said...

The one to ones show the same pattern as above. Only populations with -ve D share more drift with EHG than Zoroastrians do.

Iran_Zoroastrian Pathan EHG Chimp -0.0051 -2.772 96829
Iran_Zoroastrian .Kurd_F5 EHG Chimp -0.0049 -0.994 96678
Iran_Zoroastrian .Mfa EHG Chimp -0.0024 -0.506 95508
Iran_Zoroastrian .Kurd_F3 EHG Chimp -0.0008 -0.165 96666
Iran_Zoroastrian .Kurd_C3 EHG Chimp -0.0006 -0.128 96730
Iran_Zoroastrian GujaratiA EHG Chimp -0.0005 -0.196 96829
Iran_Zoroastrian .Kurd_F2 EHG Chimp 0.0005 0.092 96517
Iran_Zoroastrian .Kurd_F1 EHG Chimp 0.0018 0.386 96664
Iran_Zoroastrian .Kurd_C2 EHG Chimp 0.0024 0.491 96649
Iran_Zoroastrian Iran_recent EHG Chimp 0.0024 0.36 76388
Iran_Zoroastrian Iran_Fars EHG Chimp 0.0034 1.853 96829
Iran_Zoroastrian .Kurd_F6 EHG Chimp 0.0041 0.81 93402
Iran_Zoroastrian .Halgurd EHG Chimp 0.0044 0.854 96559
Iran_Zoroastrian .Kurd_F8 EHG Chimp 0.0051 1.022 96424
Iran_Zoroastrian .Kurd_F4 EHG Chimp 0.0052 0.999 96562
Iran_Zoroastrian Balochi EHG Chimp 0.0056 2.901 96829
Iran_Zoroastrian .Zara EHG Chimp 0.0057 0.986 96681
Iran_Zoroastrian Iranian EHG Chimp 0.0059 3.584 96829
Iran_Zoroastrian .Kurd_F7 EHG Chimp 0.006 1.226 96661
Iran_Zoroastrian .Kurd_Ezidi EHG Chimp 0.0074 1.428 96046
Iran_Zoroastrian .Kurd_C1 EHG Chimp 0.0086 1.73 96569
Iran_Zoroastrian Brahui EHG Chimp 0.0091 4.78 96829
Iran_Zoroastrian GujaratiD EHG Chimp 0.0115 4.158 96829
Iran_Zoroastrian Makrani EHG Chimp 0.0143 7.325 96829
Iran_Zoroastrian Iran_LN EHG Chimp 0.0171 2.052 48872
Iran_Zoroastrian Iran_ChL EHG Chimp 0.0173 3.912 92149
Iran_Zoroastrian .Farid EHG Chimp 0.0203 3.876 96496
Iran_Zoroastrian Iranian_Bandari EHG Chimp 0.0236 10.229 96829
Iran_Zoroastrian Iran_N_WC1 EHG Chimp 0.0342 5.909 62644
Iran_Zoroastrian Iran_N EHG Chimp 0.0435 6.225 79636
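
A quick way to read a table like this is to flag the rows whose Z-score clears a significance cut-off. The sketch below uses a handful of rows copied from the table above. The |Z| >= 3 threshold is an assumption (a common rule of thumb, not something stated in this thread), and the helper names are made up for illustration.

```python
# Columns, as in the table above: Pop1 Pop2 Pop3 Pop4 D Z SNPs
ROWS = """\
Iran_Zoroastrian Pathan EHG Chimp -0.0051 -2.772 96829
Iran_Zoroastrian Iran_Fars EHG Chimp 0.0034 1.853 96829
Iran_Zoroastrian Balochi EHG Chimp 0.0056 2.901 96829
Iran_Zoroastrian Iranian EHG Chimp 0.0059 3.584 96829
Iran_Zoroastrian Iranian_Bandari EHG Chimp 0.0236 10.229 96829
Iran_Zoroastrian Iran_N EHG Chimp 0.0435 6.225 79636
"""

Z_THRESHOLD = 3.0  # assumption: rule-of-thumb cut-off, not from the thread


def parse_row(line):
    """Split one whitespace-separated D-stat row into typed fields."""
    pop1, pop2, pop3, pop4, d, z, snps = line.split()
    return {"pop2": pop2, "d": float(d), "z": float(z), "snps": int(snps)}


def significant(rows, threshold=Z_THRESHOLD):
    """Return the Pop2 labels whose |Z| meets the threshold."""
    return [r["pop2"] for r in rows if abs(r["z"]) >= threshold]


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
for r in rows:
    flag = "significant" if abs(r["z"]) >= Z_THRESHOLD else "not significant"
    print(f'{r["pop2"]:<16} D={r["d"]:+.4f} Z={r["z"]:+.3f} {flag}')
```

With this cut-off, the Pathan, Iran_Fars and Balochi rows are not distinguishable from the Zoroastrians in their affinity to EHG, while the Iranian, Iranian_Bandari and Iran_N rows are.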

Kurd Dgk said...

Since the above indicates that Kurds share more drift with Iran N than Zoroastrians or Balochis do, what is it that we are seeing with ADMIXTURE, where Zoroastrians and Balochis score higher Iran N than the Kurds?

Well, ADMIXTURE is sensitive to recent drift, so I suppose it is possible that some recent gene flow into Kurds is dampening their Iran N scores. The other possibility is that certain groups that are drifted due to inbreeding or isolation, such as Balochis, Brahuis, and Zoroastrians, may be hijacking the Iran N component. I have observed this in other situations, so it may be a good idea to rerun ADMIXTURE after removing all but one each of the Baloch, Brahui, and Zoro samples, and see what happens.

Chad Rohlfsen said...

Any SSA or ENA will depress stats. So, one could share more drift yet have less ancestry from a pop. It depends on the rest of the ancestry.
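
A toy calculation may make this concrete. It assumes that shared drift with a reference is roughly linear in ancestry proportions, which holds for f4-type statistics under simple admixture. Every number below is invented purely for illustration; none of them come from this thread or from real data.

```python
# Hypothetical shared drift with a reference (say, Iran_N), measured from an
# African outgroup, for each ancestry source. Invented values, not real data.
DRIFT = {"iran_n_like": 0.40, "steppe_like": 0.30, "ssa_like": 0.05}


def expected_drift(mix):
    """Expected statistic for an admixed population: a weighted average."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "proportions must sum to 1"
    return sum(share * DRIFT[source] for source, share in mix.items())


# Population A: more Iran_N-like ancestry, but with some SSA-like ancestry.
pop_a = {"iran_n_like": 0.60, "steppe_like": 0.30, "ssa_like": 0.10}
# Population B: less Iran_N-like ancestry, and no SSA-like ancestry.
pop_b = {"iran_n_like": 0.55, "steppe_like": 0.45, "ssa_like": 0.00}

print("A:", round(expected_drift(pop_a), 4))
print("B:", round(expected_drift(pop_b), 4))
```

In this made-up setup, B ends up with the higher statistic despite having less Iran_N-like ancestry than A, because A's SSA-like share pulls its value down.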

Shaikorth said...

Kurds C2 and F5 are shifted more towards both Iran_N and Andamanese than the Zoroastrians are. Kurd F5 is shifted more towards EHG as well. But do Zoroastrians actually have SSA compared to these Kurds?

MomOfZoha said...

@Kurd Dgk:

If you have any Talysh samples (from Iran or Azerbaijan), I have a feeling they will be even more interesting to compare for many reasons: They probably don't have much recent geneflow from other groups, they are in the vicinity of Mazandaran while also being related to ancestors of some Kurdish groups, and their language is claimed to highly resemble the language of the Avesta. I know some small Talysh villages which to this day have sacred trees despite their nominal Shi'ism.

@Davidski: Thank you so much for your analyses as well as the links to the datasets.

Kurd said...

@ Shaikorth

Actually, according to my Iran N K6 calculator, the Zoroastrians have an average SSA lower than Kurds and Baloch. Here are the averages:

Iran Fars 2.23% 49.29% 37.96% 6.31% 3.10% 1.10%
Iran Zoroastrian 0.92% 53.27% 36.82% 7.06% 1.09% 0.85%
Kurd Feyli 1.12% 50.96% 40.40% 4.52% 1.56% 1.44%
Kurd C 3.77% 47.20% 40.41% 4.55% 2.72% 1.35%
Balochi 8.70% 64.02% 17.24% 6.04% 2.93% 1.07%
Brahui 8.27% 64.63% 17.45% 5.56% 2.49% 1.60%

Kurd C3 shows SSA of 2.31%, but is still in the top ranks for the comparisons

Kurd said...

@ MomOfZoha

I don't have Talysh samples.

MfA said...

David, there should be 2 Kurdish samples from Armenia, labeled Kurd_WGA, available in the Human Origins dataset.

Shaikorth said...

So it looks like neither SSA nor ENA is depressing the Zoroastrian stats relative to Kurds. It would be nice to test the SSA with D-stats to be certain.

Chad Rohlfsen said...

A direct comparison Chimp Mbuti/Yoruba X Zoroastrian, might work better.
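
For anyone who wants to see what such a test computes, here is a minimal sketch of a D-statistic with a Z-score from a delete-one block jackknife. It assumes the allele-frequency definition D = sum((p1 - p2)(p3 - p4)) / sum((p1 + p2 - 2 p1 p2)(p3 + p4 - 2 p3 p4)) and uses equal-sized blocks of consecutive SNPs, a simplification of the weighted jackknife over genomic blocks that tools like ADMIXTOOLS apply. The input frequencies are random placeholders, not real genotypes, and the function names are hypothetical.

```python
import math
import random


def d_stat(freqs):
    """D over SNPs; freqs is a list of (p1, p2, p3, p4) allele frequencies."""
    num = sum((p1 - p2) * (p3 - p4) for p1, p2, p3, p4 in freqs)
    den = sum((p1 + p2 - 2 * p1 * p2) * (p3 + p4 - 2 * p3 * p4)
              for p1, p2, p3, p4 in freqs)
    return num / den


def jackknife_z(freqs, n_blocks=20):
    """Return (D, Z) using an unweighted delete-one block jackknife."""
    size = len(freqs) // n_blocks
    blocks = [freqs[i * size:(i + 1) * size] for i in range(n_blocks)]
    full = d_stat([f for block in blocks for f in block])
    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_stat([f for k, block in enumerate(blocks) if k != j for f in block])
        for j in range(n_blocks)
    ]
    mean = sum(leave_one_out) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean) ** 2 for x in leave_one_out)
    return full, full / math.sqrt(var)


# Placeholder data: 1000 SNPs of random frequencies for four populations.
random.seed(0)
fake_freqs = [tuple(random.random() for _ in range(4)) for _ in range(1000)]
d_value, z_value = jackknife_z(fake_freqs)
print(f"D={d_value:+.4f} Z={z_value:+.3f}")
```

With real data, the four frequency columns would be the four populations in the order given in the statistic, for example Chimp, Mbuti, X and Zoroastrian.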

Davidski said...

Basal-rich K7/nMonte models.

It's interesting that the Kurdish model shows a poor fit. But as expected, the Zoroastrians appear to have the highest proportion of Steppe_EMBA ancestry.

Iran_Chalcolithic 75.35
Andronovo_Kytmanovo 24.25
Andamanese_Onge 0.25
Han 0.15
Iran_Hotu 0
Iran_Neolithic 0
Papuan 0
Yamnaya-Catacomb_Ulan 0

distance%=5.9562 / distance=0.059562

Iran_Chalcolithic 76.45
Andronovo_Kytmanovo 15.35
Iran_Hotu 5.9
Han 1.15
Andamanese_Onge 0.8
Yamnaya-Catacomb_Ulan 0.35
Iran_Neolithic 0
Papuan 0

distance%=0.3248 / distance=0.003248

Iran_Chalcolithic 75.9
Andronovo_Kytmanovo 15.7
Yamnaya-Catacomb_Ulan 6.15
Iran_Hotu 2.2
Andamanese_Onge 0.05
Han 0
Iran_Neolithic 0
Papuan 0

distance%=0.4042 / distance=0.004042
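
The distance figures can be read as a measure of how far the fitted mixture sits from the target, with distance% simply being distance times 100 (0.059562 becomes 5.9562). Below is a minimal sketch of that idea, assuming the distance is the Euclidean distance between the target's component vector and the weighted mix of the reference vectors. The three-component vectors, reference names and weights are hypothetical, and this only evaluates a given set of weights; it does not search for the best ones.

```python
import math


def mix(refs, weights):
    """Weighted average of reference component vectors."""
    k = len(next(iter(refs.values())))
    return [sum(weights[name] * refs[name][i] for name in weights)
            for i in range(k)]


def fit_distance(target, refs, weights):
    """Euclidean distance between the target and the weighted mixture."""
    fitted = mix(refs, weights)
    return math.sqrt(sum((t - f) ** 2 for t, f in zip(target, fitted)))


# Hypothetical component vectors (each sums to 1) and mixture weights.
REFS = {"ref_a": [0.8, 0.1, 0.1], "ref_b": [0.2, 0.7, 0.1]}
WEIGHTS = {"ref_a": 0.75, "ref_b": 0.25}

target_exact = mix(REFS, WEIGHTS)   # a target the model reproduces exactly
target_off = [0.60, 0.30, 0.10]     # a target the model misses slightly

for label, target in (("exact", target_exact), ("off", target_off)):
    dist = fit_distance(target, REFS, WEIGHTS)
    print(f"{label}: distance%={100 * dist:.4f} / distance={dist:.6f}")
```

On that reading, a model with distance% near 6 is a much poorer fit than the two models with values below 0.5.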

MfA said...

You need to add references like Armenian_EBA and Iran_IA to accommodate the Anatolian_Neolithic ancestry in Kurds, since the majority of Kurds live outside the Iranian Plateau. That's the main difference between Kurds and the rest of the West Iranics.

Simon_W said...

On the subject of Mazandaranis being the modern population that's closest to Iran_N, check out this bunch of Mazandaranis here: They strike me as quite Italian-looking, or rather on the less north European side of Italians.

Davidski said...

Adding Iran Iron Age F38 to the models for the West Iranian groups does improve their fits and makes things very interesting, especially for the Kurds. I posted these new models in the K7 post here...

Also, I updated the K7 spreadsheet.

MfA said...

To me those numbers make sense, but I would like to see D-stats confirmation as well. Corduene people from 5 BC are also gonna look like F38, related little kingdoms in the mountainous zone from south of Lake Van to Ilam.

Kurd Dgk said...

There are a couple of problems with using nMonte, and with using IA F38, to infer steppe ancestry in modern West Asians.

First, IA F38 is itself likely steppe admixed. Second, nMonte is ultimately dependent on output from ADMIXTURE, which is not an accurate tool for inferring ancient admixture in a modern population, for a variety of reasons. So qpAdm trumps ADMIXTURE, because the former compares genomes SNP by SNP, and outputs standard errors as well as fits.

I will be posting qpAdm fits for Zoroastrians as well as some Kurds later today. The fits obtained using Afanasievo and Steppe Eneolithic were better for some Kurds than for Zoroastrians, although the steppe admixture proportions were comparable.

Kurd Dgk said...

QpAdm models using Iran N Anatolia N and various Steppe pops for Kurds and Zoroastrians are posted at

They appear to be consistent with qpAdm models showing about 20% EHG ancestry in Kurds