First came the Indo-Aryans, probably in a couple of waves. Historical linguistics and archeology tell us that they originated on the Trans-Urals steppe in the Sintashta-Andronovo horizon, and pushed south around 2000 BC to establish themselves as the ruling elite over Central Asian agriculturalists, who were probably in large part of West Asian origin.
There are multiple lines of genetic evidence suggesting that this is indeed what happened, which I discussed in detail in several earlier blog posts, like here.
But arguably the easiest way to show it is with D-stats of the form D(Indo-Aryan,Southeast_Asian)(X,Outgroup). Here the Indo-Aryans are the Kalash, a population isolate from the Hindu Kush that speaks an archaic form of Indo-Aryan and has a relatively low level of extra-West Eurasian admixture. The Southeast Asians are the Dai from southern China, one of the best proxies for the South and East Asian admixture in the Kalash, while X represents a wide variety of present-day and ancient populations in my dataset. The top five D-stats, each based on well over 500K SNPs, are listed below, with the D value and its Z-score in the last two columns:
Kalash Dai Kotias Ju_hoan_North 0.0684 22.704
Kalash Dai Sintashta Ju_hoan_North 0.0632 25.036
Kalash Dai Georgian Ju_hoan_North 0.0625 30.991
Kalash Dai Afanasievo Ju_hoan_North 0.0612 24.496
Kalash Dai Yamnaya_Samara Ju_hoan_North 0.0611 27.97
Really cool results. Obviously, Kotias is the recently published Caucasus hunter-gatherer (CHG) genome. The Kalash appear to carry the highest level of Kotias-related ancestry among present-day populations, which they probably acquired from both the Central Asian agriculturalists and the Indo-Aryan invaders. At the same time, however, Georgians show the highest affinity to Kotias because they harbor less extra-West Eurasian admixture.
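For anyone wondering what a D-stat of the form D(W,X)(Y,Z) actually measures, here is a minimal Python sketch. It assumes the standard allele-frequency definition of the statistic and a simplified, equal-weight block jackknife for the Z-score. The function names and the toy frequencies are made up for illustration, and this is not the software that produced the results in this post.

```python
import math

def d_stat(w, x, y, z):
    """D(W,X)(Y,Z) from per-SNP allele frequencies (floats in [0, 1]).

    Positive values mean W shares more alleles with Y than X does.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in zip(w, x, y, z):
        num += (a - b) * (c - d)
        den += (a + b - 2 * a * b) * (c + d - 2 * c * d)
    return num / den

def jackknife_z(w, x, y, z, n_blocks):
    """Return (D, Z) using a delete-one-block jackknife.

    Assumption: blocks are contiguous runs of SNPs of equal size and are
    weighted equally. Real pipelines usually define blocks by genetic
    distance and weight them by SNP count.
    """
    n = len(w)
    size = n // n_blocks
    estimates = []
    for b in range(n_blocks):
        lo = b * size
        hi = n if b == n_blocks - 1 else lo + size
        keep = [i for i in range(n) if i < lo or i >= hi]
        subsets = [[pop[i] for i in keep] for pop in (w, x, y, z)]
        estimates.append(d_stat(*subsets))
    mean = sum(estimates) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    d = d_stat(w, x, y, z)
    return d, d / math.sqrt(var)

# Toy frequencies for six SNPs (hypothetical, not real data).
W = [0.9, 0.8, 0.1, 0.7, 0.2, 0.6]
X = [0.1, 0.3, 0.2, 0.2, 0.5, 0.1]
Y = [0.8, 0.9, 0.3, 0.6, 0.1, 0.7]
Z = [0.2, 0.1, 0.4, 0.3, 0.3, 0.2]

d, z_score = jackknife_z(W, X, Y, Z, n_blocks=3)
print(f"D = {d:.4f}, Z = {z_score:.3f}")
```

In the lists above, W would be the Kalash, X the Dai, Y the test population and Z the Ju_hoan_North outgroup, so a large positive D with a high Z-score means the Kalash share more alleles with the test population than the Dai do.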
After the Indo-Aryans came the Iranians, in all likelihood also from the steppe. They were an offshoot of either Sintashta-Andronovo or the more westerly Srubnaya culture. I'd say the D-stats below, of the form D(Eastern_Iranian,Southeast_Asian)(X,Outgroup), are inconclusive on that point, because the differences are small, and the outcome may be affected by the methodology and/or sampling bias.
Tajik_Shugnan Dai Sintashta Ju_hoan_North 0.0716 26.427
Tajik_Shugnan Dai Poltavka Ju_hoan_North 0.0695 25.234
Tajik_Shugnan Dai Afanasievo Ju_hoan_North 0.0691 24.703
Tajik_Shugnan Dai Srubnaya Ju_hoan_North 0.069 28.266
Tajik_Shugnan Dai Corded_Ware_Germany Ju_hoan_North 0.0684 27.328
But again, the top five results make a lot of sense in the context of historical linguistics and archeology. By the way, Tajik Shugnans are a population isolate in the Pamir Mountains. Like the Kalash, they have a low level of extra-West Eurasian admixture, and are thus likely to be among the best available reference groups for early Eastern Iranians.
Interestingly, based on that list the Shugnans look more European than the Kalash. In large part this might be a reflection of the sharp rise in the level of European-specific Western hunter-gatherer (WHG) admixture on the steppe during the Middle Bronze Age, probably caused by population movements originating at the western edge of the steppe and/or in East Central Europe.
As far as I can tell, the fact that the Shugnans and Kalash have around the same level of extra-West Eurasian admixture means that I can try to home in on the differences between their steppe-derived ancestry with D-stats of the form D(Kalash,Tajik_Shugnan)(Kotias,X). The top result seems to confirm my hunch, because Loschbour is, of course, a Western hunter-gatherer. Each row below lists X, followed by the D value and its Z-score.
Loschbour 0.0149 3.874
Basque_Spanish 0.0113 4.232
Anatolia_Neolithic 0.0112 4.257
Karelia_HG 0.0105 3.005
Poltavka 0.01 3.539
Corded_Ware_Germany 0.0099 3.734
Afanasievo 0.0094 3.213
Srubnaya 0.0094 3.538
Yamnaya_Kalmykia 0.0091 3.362
Albanian 0.0088 3.419
Altai_IA 0.0088 3.087
Sintashta 0.0088 3.146
Greek 0.0076 3.094
Full output available here
More recently, during historic times, large parts of northern South Asia were settled by the Balochi, a Western Iranian people from the South Caspian region, whose ancestors were probably Indo-Europeanized a couple of millennia earlier by Proto-Iranians from the steppe moving west across the Iranian Plateau. D-stats comparing the Balochi to the Kalash (first list below) and to the Shugnans (second list below) clearly reflect the Near Eastern origins of the Balochi.
BedouinB 0.0104 6.151
Anatolia_Neolithic 0.0094 5.495
Druze 0.0084 5.228
Cypriot 0.0082 4.839
Syrian 0.0079 4.714
Armenian 0.0063 3.935
Satsurblia 0.0059 2.472
Georgian 0.0055 3.443
Iranian 0.0055 3.345
Abkhasian 0.0053 3.279
Greek 0.0052 3.166
Okunevo -0.0081 -3.552
Karelia_HG -0.0104 -4.666
Full output available here
Satsurblia 0.007 2.078
BedouinB 0.0051 2.277
Basque_Spanish -0.0073 -3.156
Mezhovskaya -0.0085 -3.045
Altai_IA -0.0092 -3.677
Scythian_IA -0.0092 -3.108
Yamnaya_Samara -0.0095 -4.092
Karitiana -0.0098 -3.501
Karasuk -0.0099 -4.322
Andronovo -0.01 -4.09
Sintashta -0.01 -3.951
Corded_Ware_Germany -0.0102 -4.34
Srubnaya -0.0106 -4.605
Yamnaya_Kalmykia -0.011 -4.511
MA1 -0.0118 -3.691
Okunevo -0.0122 -3.783
Poltavka -0.0125 -5.043
Afanasievo -0.0136 -5.235
Loschbour -0.0148 -4.201
Karelia_HG -0.0208 -6.537
Full output available here
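The lists above mix strongly and weakly supported signals, so it can help to separate them by Z-score. Here is a small Python sketch that parses rows in the layout used in this post (population labels, then D, then Z) and keeps those at or above a threshold. The cutoff of |Z| >= 3 is a common rule of thumb and an assumption on my part here, and the function names are made up for illustration.

```python
def parse_rows(text):
    """Parse whitespace-separated rows whose last two fields are D and Z.

    Everything before those two fields is kept as the row label, so both
    the short rows (X D Z) and the long rows (Pop1 Pop2 X Outgroup D Z)
    are handled.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        *label, d, z = parts
        rows.append((" ".join(label), float(d), float(z)))
    return rows

def significant(rows, z_min=3.0):
    """Keep rows whose absolute Z-score is at least z_min (assumed cutoff)."""
    return [row for row in rows if abs(row[2]) >= z_min]

# Three rows copied from the second Balochi list above.
sample = """Satsurblia 0.007 2.078
BedouinB 0.0051 2.277
Basque_Spanish -0.0073 -3.156"""

for label, d, z in significant(parse_rows(sample)):
    print(label, d, z)
```

With that cutoff, the two positive rows at the top of the second list drop out, which is worth keeping in mind when reading it.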
In this analysis I used ancient samples from the recently published Jones et al. and Mathieson et al. studies, available on request from the authors and at the Reich lab website here, respectively. The present-day samples are from the Human Origins dataset, also available at the Reich lab website.