
Saturday, July 5, 2014

Analysis of Mesolithic Swedish forager StoraFörvar11

StoraFörvar11, or SfF11, is a late Mesolithic genome from a cave on the small island of Stora Karlsö, just off the coast of Gotland. It was published earlier this year by Skoglund et al. along with several other ancient genomes dating to the Neolithic from Gotland and mainland Sweden (see here). Belonging to Northeast European-specific mitochondrial haplogroup U5a1, SfF11 appears to be the archetypal Scandinavian forager, with no detectable Neolithic farmer admixture but considerable Ancient North Eurasian (ANE) ancestry related to Upper Paleolithic hunter-gatherers from Siberia, such as MA-1 and AG2 (see here).

Please note, SfF11 was superimposed onto the first Principal Component Analysis (PCA) plot below, which initially only included La Brana-1, the ancient Mesolithic genome from northern Spain, and present-day West Eurasians. I did this to avoid creating a cluster with the two ancient genomes based not on genuine genetic affinities between them but on their relatively poor quality. I obtained the PC coordinates for SfF11 from an almost identical 13K SNP PCA plot which can be seen here.
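For readers who want to experiment with keeping a low-quality sample from shaping the axes, here is a minimal sketch of a related technique: fitting the PCA on reference individuals only and then projecting the extra sample onto those axes. This is not necessarily how the coordinates in this post were obtained (they came from a separate, almost identical PCA run). The function names and the toy genotype data are hypothetical.

```python
import numpy as np

def fit_pca(reference, n_components=2):
    """Fit PCA on a (samples x SNPs) genotype matrix of reference individuals."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal axes (SNP loadings).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(sample, mean, axes):
    """Place a sample on existing axes without letting it influence them."""
    return (sample - mean) @ axes.T

# Hypothetical toy data: 6 reference individuals, 8 SNPs, genotypes coded 0/1/2.
rng = np.random.default_rng(0)
reference = rng.integers(0, 3, size=(6, 8)).astype(float)
ancient = rng.integers(0, 3, size=8).astype(float)

mean, axes = fit_pca(reference)
ancient_coords = project(ancient, mean, axes)
print(ancient_coords)
```

Because the ancient sample never enters the decomposition, its missing or noisy genotypes cannot create an axis of their own.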

Note also the clear eastern affinity shown by SfF11 relative to La Brana-1, which in all likelihood is the result of the above-mentioned shared ANE ancestry with MA-1, featured on the second PCA. To date, all ancient genomes from Western and Central Europe have basically lacked this admixture, while Scandinavian hunter-gatherers carried it at levels of 15-19%. As hypothesized by Lazaridis et al. 2013, it's likely that Eastern European hunter-gatherers harbored even greater levels of ANE, and it's probably a good bet that they introduced it into Scandinavia during and/or before the Mesolithic.

Below are the Eurogenes K15 ancestry proportions for SfF11, and below that the 4 Ancestors Oracle results. Even though the K15 test was based on just 8K SNPs, the outcome appears robust, and correlates closely with results from more sophisticated formal mixture tests in the scientific literature, in which European hunter-gatherers show a strong relationship to present-day East Baltic populations, especially Lithuanians. Moreover, among the best 4-way Oracle fits for SfF11 is 3/4 La Brana-1 and 1/4 MA-1, which is extremely close to the actual genetic structure of Scandinavian foragers: around 80% Western European Hunter-Gatherer (WHG) and around 20% ANE.
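To give a rough idea of how a 4-way Oracle fit can be scored, here is a minimal sketch. It assumes that each of four ancestors contributes an equal 25% share, that repeats are allowed, and that the fit is the Euclidean distance between the target's ancestry proportions and the averaged mixture. The exact metric used by the real Oracle is not described here, so treat this as an illustration only. All population names and numbers in the code are hypothetical, not real K15 values.

```python
from itertools import combinations_with_replacement
import math

# Hypothetical 3-component ancestry proportions (percent); not real K15 values.
references = {
    "Pop_A": [80.0, 20.0, 0.0],
    "Pop_B": [20.0, 40.0, 40.0],
    "Pop_C": [0.0, 10.0, 90.0],
}
# Target built as 3/4 Pop_A + 1/4 Pop_B, so the best fit is known in advance.
target = [65.0, 25.0, 10.0]

def distance(a, b):
    """Euclidean distance between two ancestry-proportion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def four_way_fits(target, references):
    """Score every 4-ancestor combination (repeats allowed), best fit first."""
    fits = []
    for combo in combinations_with_replacement(sorted(references), 4):
        mix = [sum(references[name][i] for name in combo) / 4
               for i in range(len(target))]
        fits.append((distance(target, mix), "+".join(combo)))
    return sorted(fits)

for dist, label in four_way_fits(target, references)[:3]:
    print(f"{label} @ {dist:.6f}")
```

In this toy case the top line is the combination with three copies of Pop_A and one of Pop_B, mirroring the way a 3/4 + 1/4 fit appears as a repeated name in the 4-population lists.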

The unusually high South and Southeast Asian scores can probably be explained by shared ANE ancestry with South Asians and lack of the so-called Basal Eurasian admixture, respectively. Indeed, the latter is a very good bet considering the complete absence of any sort of Mediterranean and Near Eastern signals in these results.

Eurogenes K15

Baltic 29.24
North_Sea 23.97
Eastern_Euro 23.23
Southeast_Asian 5.97
Atlantic 5.62
Amerindian 4.52
South_Asian 4.36
Oceanian 2.17
Northeast_African 0.58
Siberian 0.34
West_Med 0
West_Asian 0
East_Med 0
Red_Sea 0
Sub-Saharan 0

4 Ancestors Oracle

Least-squares method.

Using 1 population approximation:
1 Estonian @ 14.153281
2 Erzya @ 14.620788
3 Kargopol_Russian @ 14.700492
4 Southwest_Russian @ 15.448751
5 Ukrainian @ 15.825631
6 Lithuanian @ 15.842059
7 Ukrainian_Belgorod @ 16.110345
8 East_Finnish @ 16.435534
9 Belorussian @ 16.531115
10 Ukrainian_Lviv @ 16.638975
11 Estonian_Polish @ 16.671571
12 Polish @ 17.379799
13 South_Polish @ 17.805012
14 Russian_Smolensk @ 17.812963
15 Finnish @ 18.279374
16 La_Brana-1 @ 19.903407
17 Southwest_Finnish @ 21.942936
18 Moldavian @ 23.158096
19 Croatian @ 23.266324
20 Hungarian @ 24.020402

Using 2 populations approximation:
1 Erzya+Estonian @ 12.292066
2 Estonian+Kargopol_Russian @ 13.190123
3 Erzya+La_Brana-1 @ 13.192429
4 Erzya+Lithuanian @ 13.414829
5 Erzya+Ukrainian @ 13.440955
6 Erzya+Ukrainian_Lviv @ 13.540859
7 Erzya+Finnish @ 13.602815
8 East_Finnish+Lithuanian @ 13.693698
9 Kargopol_Russian+Lithuanian @ 13.735122
10 Estonian+Southwest_Russian @ 13.994994
11 East_Finnish+Erzya @ 14.077424
12 Estonian+Ukrainian_Belgorod @ 14.113102
13 Kargopol_Russian+Ukrainian @ 14.126683
14 Estonian+Estonian @ 14.153281
15 Belorussian+Erzya @ 14.180946
16 Erzya+Southwest_Russian @ 14.186181
17 Kargopol_Russian+Ukrainian_Lviv @ 14.247527
18 Estonian+Ukrainian @ 14.247854
19 Erzya+Polish @ 14.291491
20 Estonian+Lithuanian @ 14.31161

Using 3 populations approximation:
1 50% Estonian +25% Lithuanian +25% MA-1 @ 11.982448
2 50% Lithuanian +25% Estonian +25% MA-1 @ 12.169832
3 50% Estonian +25% Estonian +25% MA-1 @ 12.225538
4 50% Erzya +25% Estonian +25% La_Brana-1 @ 12.250755
5 50% Erzya +25% Estonian +25% Estonian @ 12.292066
6 50% Lithuanian +25% La_Brana-1 +25% MA-1 @ 12.473574
7 50% Erzya +25% La_Brana-1 +25% Lithuanian @ 12.480595
8 50% Lithuanian +25% Finnish +25% MA-1 @ 12.547096
9 50% Erzya +25% Estonian +25% Ukrainian_Lviv @ 12.657215
10 50% Erzya +25% Estonian +25% Ukrainian @ 12.660239
11 50% Erzya +25% Estonian +25% Lithuanian @ 12.661794
12 50% Estonian +25% Erzya +25% Kargopol_Russian @ 12.679962
13 50% Erzya +25% Erzya +25% La_Brana-1 @ 12.695461
14 50% Erzya +25% La_Brana-1 +25% Ukrainian @ 12.707643
15 50% Estonian +25% Erzya +25% Estonian @ 12.716859
16 50% Erzya +25% Finnish +25% Lithuanian @ 12.72455
17 50% Erzya +25% Estonian +25% Finnish @ 12.737834
18 50% Erzya +25% La_Brana-1 +25% Ukrainian_Lviv @ 12.753404
19 50% Lithuanian +25% Lithuanian +25% MA-1 @ 12.768751
20 50% Estonian +25% Belorussian +25% MA-1 @ 12.780747

Using 4 populations approximation:
1 Estonian+Estonian+Lithuanian+MA-1 @ 11.982448
2 Estonian+Lithuanian+Lithuanian+MA-1 @ 12.169832
3 Estonian+Estonian+Estonian+MA-1 @ 12.225538
4 Erzya+Erzya+Estonian+La_Brana-1 @ 12.250755
5 Erzya+Erzya+Estonian+Estonian @ 12.292066
6 Estonian+La_Brana-1+Lithuanian+MA-1 @ 12.434074
7 La_Brana-1+Lithuanian+Lithuanian+MA-1 @ 12.473574
8 Erzya+Erzya+La_Brana-1+Lithuanian @ 12.480595
9 Finnish+Lithuanian+Lithuanian+MA-1 @ 12.547096
10 Erzya+Erzya+Estonian+Ukrainian_Lviv @ 12.657215
11 Erzya+Erzya+Estonian+Ukrainian @ 12.660239
12 Estonian+Lithuanian+MA-1+Ukrainian @ 12.66118
13 Erzya+Erzya+Estonian+Lithuanian @ 12.661794
14 Erzya+Estonian+Estonian+Kargopol_Russian @ 12.679962
15 Erzya+Erzya+Erzya+La_Brana-1 @ 12.695461
16 Estonian+Lithuanian+MA-1+Ukrainian_Lviv @ 12.697136
17 Erzya+Erzya+La_Brana-1+Ukrainian @ 12.707643
18 Erzya+Estonian+Estonian+Estonian @ 12.716859
19 Erzya+Erzya+Finnish+Lithuanian @ 12.72455
20 Erzya+Erzya+Estonian+Finnish @ 12.737834
21 Estonian+Finnish+Lithuanian+MA-1 @ 12.746305
22 Erzya+Erzya+La_Brana-1+Ukrainian_Lviv @ 12.753404
23 Lithuanian+Lithuanian+Lithuanian+MA-1 @ 12.768751
24 Belorussian+Estonian+Estonian+MA-1 @ 12.780747
25 Estonian+Estonian+MA-1+Ukrainian @ 12.797031
26 Estonian+Estonian+La_Brana-1+MA-1 @ 12.807529
27 Erzya+Estonian+Estonian+Ukrainian @ 12.813496
28 Estonian+Estonian+MA-1+Ukrainian_Lviv @ 12.822931
29 Erzya+Estonian+Kargopol_Russian+La_Brana-1 @ 12.831473
30 Erzya+Estonian+Estonian+Lithuanian @ 12.839613
31 Chuvash+Estonian+Estonian+Lithuanian @ 12.851803
32 Belorussian+Estonian+Lithuanian+MA-1 @ 12.855733
33 Erzya+Estonian+Estonian+Ukrainian_Lviv @ 12.857349
34 East_Finnish+Erzya+Estonian+Lithuanian @ 12.875013
35 Erzya+Estonian+Kargopol_Russian+Lithuanian @ 12.901956
36 Erzya+Estonian+La_Brana-1+Lithuanian @ 12.90565
37 Erzya+Kargopol_Russian+La_Brana-1+Lithuanian @ 12.914481
38 Erzya+Estonian+Estonian+La_Brana-1 @ 12.921321
39 Erzya+Estonian+Estonian+Southwest_Russian @ 12.931952
40 Lithuanian+Lithuanian+MA-1+Ukrainian @ 12.932804

Gaussian method.

Using 1 population approximation:
1 East_Finnish @ 12.111642
2 Finnish @ 12.136433
3 Tatar @ 12.260871
4 Chuvash @ 12.287812
5 Kargopol_Russian @ 13.238854
6 Erzya @ 13.290701
7 Ukrainian @ 14.224517
8 North_Swedish @ 14.501487
9 Mari @ 14.582022
10 La_Brana-1 @ 15.102585
11 Ukrainian_Lviv @ 15.466692
12 Moldavian @ 16.561361
13 Ukrainian_Belgorod @ 16.829215
14 Southwest_Finnish @ 17.044556
15 Southwest_Russian @ 17.644306
16 Estonian_Polish @ 17.912619
17 Swedish @ 18.055712
18 Estonian @ 18.417704
19 Hungarian @ 18.442869
20 Lithuanian @ 18.500045

Using 2 populations approximation:
1 La_Brana-1+Mari @ 9.086839
2 Kargopol_Russian+La_Brana-1 @ 9.216681
3 La_Brana-1+MA-1 @ 9.529079
4 Chuvash+La_Brana-1 @ 9.628936
5 Erzya+La_Brana-1 @ 9.741056
6 La_Brana-1+Tatar @ 10.312023
7 East_Finnish+La_Brana-1 @ 10.369729
8 Chuvash+Estonian @ 10.38245
9 Estonian+La_Brana-1 @ 10.698394
10 Chuvash+Finnish @ 10.701826
11 Estonian+Tatar @ 10.72273
12 Estonian+Shors @ 10.734028
13 Chuvash+Lithuanian @ 10.781409
14 Chuvash+East_Finnish @ 10.832523
15 Chuvash+Kargopol_Russian @ 11.058841
16 Finnish+Tatar @ 11.078731
17 East_Finnish+Tatar @ 11.104768
18 Lithuanian+Shors @ 11.131471
19 Chuvash+Ukrainian @ 11.241182
20 Estonian+Hakas @ 11.257456

Using 3 populations approximation:
1 50% La_Brana-1 +25% Estonian +25% MA-1 @ 6.880967
2 50% La_Brana-1 +25% La_Brana-1 +25% MA-1 @ 7.035486
3 50% La_Brana-1 +25% Lithuanian +25% MA-1 @ 7.1341
4 50% Estonian +25% La_Brana-1 +25% MA-1 @ 7.18973
5 50% La_Brana-1 +25% East_Finnish +25% MA-1 @ 7.57191
6 50% La_Brana-1 +25% Finnish +25% MA-1 @ 7.600389
7 50% Lithuanian +25% La_Brana-1 +25% MA-1 @ 7.628929
8 50% La_Brana-1 +25% Estonian_Polish +25% MA-1 @ 7.697983
9 50% La_Brana-1 +25% Belorussian +25% MA-1 @ 7.70291
10 50% La_Brana-1 +25% Kargopol_Russian +25% MA-1 @ 7.781779
11 50% La_Brana-1 +25% MA-1 +25% Southwest_Finnish @ 7.798672
12 50% La_Brana-1 +25% Erzya +25% MA-1 @ 7.80171
13 50% La_Brana-1 +25% MA-1 +25% Polish @ 7.929863
14 50% La_Brana-1 +25% MA-1 +25% Southwest_Russian @ 7.935151
15 50% La_Brana-1 +25% MA-1 +25% Russian_Smolensk @ 8.031297
16 50% La_Brana-1 +25% MA-1 +25% North_Swedish @ 8.049602
17 50% La_Brana-1 +25% MA-1 +25% Ukrainian_Belgorod @ 8.049701
18 50% La_Brana-1 +25% MA-1 +25% Ukrainian @ 8.06409
19 50% La_Brana-1 +25% MA-1 +25% South_Polish @ 8.188305
20 50% Finnish +25% La_Brana-1 +25% MA-1 @ 8.237496

Using 4 populations approximation:
1 Estonian+La_Brana-1+La_Brana-1+MA-1 @ 6.880967
2 La_Brana-1+La_Brana-1+La_Brana-1+MA-1 @ 7.035486
3 La_Brana-1+La_Brana-1+Lithuanian+MA-1 @ 7.1341
4 Estonian+Estonian+La_Brana-1+MA-1 @ 7.18973
5 Estonian+La_Brana-1+Lithuanian+MA-1 @ 7.414412
6 East_Finnish+La_Brana-1+La_Brana-1+MA-1 @ 7.57191
7 Finnish+La_Brana-1+La_Brana-1+MA-1 @ 7.600389
8 La_Brana-1+Lithuanian+Lithuanian+MA-1 @ 7.628929
9 Estonian+Finnish+La_Brana-1+MA-1 @ 7.689347
10 Estonian_Polish+La_Brana-1+La_Brana-1+MA-1 @ 7.697983
11 Belorussian+La_Brana-1+La_Brana-1+MA-1 @ 7.70291
12 East_Finnish+Estonian+La_Brana-1+MA-1 @ 7.712903
13 Finnish+La_Brana-1+Lithuanian+MA-1 @ 7.779771
14 Kargopol_Russian+La_Brana-1+La_Brana-1+MA-1 @ 7.781779
15 La_Brana-1+La_Brana-1+MA-1+Southwest_Finnish @ 7.798672
16 Erzya+La_Brana-1+La_Brana-1+MA-1 @ 7.80171
17 East_Finnish+La_Brana-1+Lithuanian+MA-1 @ 7.850763
18 Estonian+Estonian_Polish+La_Brana-1+MA-1 @ 7.890161
19 Belorussian+Estonian+La_Brana-1+MA-1 @ 7.906509
20 Estonian+La_Brana-1+MA-1+Southwest_Finnish @ 7.927839
21 La_Brana-1+La_Brana-1+MA-1+Polish @ 7.929863
22 La_Brana-1+La_Brana-1+MA-1+Southwest_Russian @ 7.935151
23 Estonian+Kargopol_Russian+La_Brana-1+MA-1 @ 7.940811
24 Erzya+Estonian+La_Brana-1+MA-1 @ 7.965223
25 La_Brana-1+Lithuanian+MA-1+Southwest_Finnish @ 7.991558
26 La_Brana-1+Lithuanian+MA-1+North_Swedish @ 8.029449
27 La_Brana-1+La_Brana-1+MA-1+Russian_Smolensk @ 8.031297
28 Belorussian+La_Brana-1+Lithuanian+MA-1 @ 8.038993
29 Estonian_Polish+La_Brana-1+Lithuanian+MA-1 @ 8.046271
30 La_Brana-1+La_Brana-1+MA-1+North_Swedish @ 8.049602
31 La_Brana-1+La_Brana-1+MA-1+Ukrainian_Belgorod @ 8.049701
32 La_Brana-1+La_Brana-1+MA-1+Ukrainian @ 8.06409
33 Estonian+La_Brana-1+MA-1+North_Swedish @ 8.075392
34 Estonian+La_Brana-1+MA-1+Polish @ 8.08945
35 Kargopol_Russian+La_Brana-1+Lithuanian+MA-1 @ 8.100132
36 Estonian+La_Brana-1+MA-1+Southwest_Russian @ 8.108852
37 Erzya+La_Brana-1+Lithuanian+MA-1 @ 8.127814
38 Estonian+La_Brana-1+MA-1+Ukrainian @ 8.153751
39 La_Brana-1+Lithuanian+MA-1+Polish @ 8.17359
40 La_Brana-1+La_Brana-1+MA-1+South_Polish @ 8.188305

The Eurogenes K15 and Alexandr Burnashev's 4 Ancestors Oracle are available for use free of charge at GEDmatch for anyone with genotype data from 23andMe and similar personal genomics companies. Look for the Ad-mix option and then the Eurogenes tab.


Maju said...

Quite interesting analysis, thank you.

The first PCA tells of a much more SW affinity of Bra1 in comparison with SF11 in eigenvector 2 (even if in eigenvector 1 they are both off the mark re. modern populations).

You are certainly right on these Scandinavian HGs, much like Motala, having already important ANE affinity. However both Motala and SF11 are a bit imprecise re. their cultural affinities and personally I'd love it if a more clearly Hamburgian-Ahrensburgian-Maglemosean sample was available from areas closer to the North Sea. If this extra ANE affinity could be confirmed for the generality of the NW Epipaleolithic population (which I think likely but is so far unconfirmed), much of the extra ANE found in populations like Scots could be safely attributed to their own local Epipaleolithic roots, rather than to later IE migrations.

Tone said...

To add to what Maju was saying regarding Epipaleolithic roots: I've always found it interesting that red hair is found at the highest frequencies in two populations, the Scots in the fringes of Western Europe and the Udmurts on the Volga in the fringes of Eastern Europe. While I understand correlation does not mean causation, it's an interesting clue and I think red hair might show the vestiges of a very ancient ANE population that stretched across the north. Just thinking . . .

About Time said...

It seems confusing if we think of these ancient genomes as modern derivatives. But really it's we moderns who are derived + mixed.

Sf11 is more "derived Eurasian" than we are, so seems India/SE Asian like. We probably have more "underived Eurasian" (aka Basal Eurasian) from EEF. That is, modern Europeans are "Mediterraneanized" wrt to Mesolithic Europeans.

IMO it's wrong to think of Basal Eurasian as "African" in the modern sense. It was more underived wrt Africans, but was maybe "Sui Generis" on an evolutionary trajectory towards agriculture. Basal-->Natufian-->(with WHG admixture) EEF-->(with ANE admixture) modern Europeans.

But other Basal branches probably mixed with some Africans, obviously in situ with Arabian HG, etc. Bringing all of these diverse Eurasian HG branches closer together through Basal admixture --- and probably social selection for ability to function and contribute in Neolithic complex societies. Just a theory.

Davidski said...

Yes, the Oracle results will look much better after I stick many more ancient genomes into the reference set, including hopefully Ust-Ishim.

About Time said...

Would be interesting to see an oracle using nothing modern from Europe, but only MA-1, SF11, La Brana, and modern Middle East, African, and Asian clusters.

Btw to correct my sloppy text, I mean we moderns are mixed. We are in a strange position wrt derived Eurasians: We have derived Eurasian from WHG and later ANE (even more derived apparently, thus more Amerind + India like), but also a large amount of underived Basal ancestry. So a mixed bag in that regard.

I personally suspect derived Eurasian HG picked up physical adaptations from various archaic. But the main human "root sapiens" lineage was Basal, which fostered selection for fully modern behavioral traits and associated neurophysiological changes.

The mixed bag comes in because farmers needed "support staff" of semi-HG pops that were better physically adapted to local natural environments (heat, cold, sun) and better hunters to procure necessary protein and protection of EEF settlements from animals and hostile HG.

Tesmos said...

Davidski, what do you think about the North Sea score? It is a bit higher than the East Euro. Makes me wonder where the North Sea component originated.

Davidski said...

Still hard to say, but I'm looking at Ajvide58 now and its North Sea score is much higher at 31.02%. Of course this sample is from the Neolithic, so a bit younger than StoraFörvar11. The difference in the ancestry proportions might be the result of a hunter-gatherer migration from the west to Gotland, admixture from Neolithic farmers, and/or genetic drift towards a higher membership in the North Sea cluster.

I need to have a look at a couple more of the foragers and also Gokhem2, the TRB farmer, to get a better idea of what the North Sea cluster might represent.

Matt said...

On the world PCA it looks like PC1 is an ANE vs Africans PCA, as the Karitiana and Greenlanders look right shifted, as well as SF11.

It seems like position on that PCA is determined by the rank order of drift shared with MA-1, like so SF11>recent West Eurasians / La Brana>East Asians/ASI>Oceanians(Denisovan admixture)>Africans.

I wonder if you projected MA-1 it would sit along the parallel from Yoruba to Pathan, but shifted farther to the right (I guess not possible due to SNPs).

Example -

(one line cuts parallel to Yoruba, the other through the Han-Karitiana axis, I'm guessing MA-1 might sit at the intersection).

Dimension 2 is obviously a dimension giving residual leftover distance for East Eurasians and present day West Eurasians from Africa and one another. I'd guess dimension 3 would be a residual Oceanian vs African and everyone else dimension?

About Time said...

Curious about how Gokhem fits in North Sea. North Sea looks like a stabilized blend of WHG + ANE with its own drift.

My pet theory is that some of that drift is from a specialized northern EEF branch like TRB (so Gokhem is great test of theory).

Also we should be thinking about Vistula links with Black Sea and thus West Asia in some periods. Would be great to have DNA samples from Lusatian culture etc. Esp as related to apparent deep rooted R1a and even Q clades in Kashubians and Sorbs (Vistula branch of Comb Ceramic? And what is relation to Gotland, Denmark etc in Neolithic?).

About Time said...

FWIW some discussion of Kashubian Q1a (related to Gotland/South Swedish Q? Is this from very old Comb Ceramic?):

Also cites some discussion in Polska that Kashubians might have some old connections with Mesolithic Comb-Ceramic peoples:

Davidski said...

And Ajvide70 is 33.95% North Sea.

Chad Rohlfsen said...

I would not be surprised if ANE was present in the Maglemosian Culture and therefore ANE will be found in Mesolithic Britain, Northern France and across to Poland.

Seinundzeit said...

Thanks David! The output is extremely interesting.

I just had one question. Is it possible for you to try a Eurasian-only PCA plot with Sf11, and a separate Eurasian-only PCA plot for La Brana-1? I was hoping you would use all the populations in your data-set, but just to the exclusion of Native Americans, Sub-Saharan Africans, and Oceanians. It would be quite interesting to compare them with MA1. On such PCA plots, MA1 tends to cluster with South Central Asians. In fact, MA1 tends to cluster near a specific sample (HGDP00214), but is much more "eastern" than them. I don't think Sf11 and La Brana-1 will behave in a similar manner (I'm guessing La Brana-1 will just cluster with northeastern Europeans, and Sf11 won't be too different either). But that's just me guessing, it would be awesome if we could see how they actually compare to MA1, when measured in a modern Eurasian context.

Davidski said...

Below are the Eurogenes K15 results for Gokhem2, the TRB farmer genome. It certainly looks part Western Hunter-Gatherer (WHG), but not Ancient North Eurasian (ANE). Note the 0% Eastern Euro, 0% South Asian and 0% Amerindian. What this suggests is that these TRB farmers acquired their WHG admixture somewhere in Western or Central Europe where there was no ANE at the time.

Following on from that, the North Sea component appears to be almost 0% ANE, which makes sense because MA-1 only has 2.9% of it. I'd say its origins are mostly in Northwest Germany and/or the Frisian coast, where there was little ANE but plenty of WHG. On the other hand, the Atlantic component is obviously native to the Atlantic facade of Europe.

North_Sea 12.65
Atlantic 21.49
Baltic 5.06
Eastern_Euro 0
West_Med 38.42
West_Asian 0
East_Med 8.19
Red_Sea 2.47
South_Asian 0
Southeast_Asian 5.22
Siberian 2.3
Amerindian 0
Oceanian 4
Northeast_African 0.21
Sub-Saharan 0

Chad Rohlfsen said...

Wasn't gokhem2 50% EEF, 50% whg? I remember her basal being about 23%.

Davidski said...

I can't see a WHG/EEF ratio estimate for Gokhem2 in the Skoglund paper. But the Basal Eurasian ratio is indeed reported to be 22.8%.

By the way, what I should have added about the Atlantic component is that it appears to carry much more EEF influence than the Baltic, North Sea and Eastern Euro components.

barakobama said...

Davidski, so would you say the main components in northwestern Europe descend from west European hunter gatherers and farmers, not east European Indo Europeans?

Davidski said...

My views haven't really changed after seeing these results.

Western Europe was initially inhabited by hunter-gatherers with no ANE. However, there was up to 20% of ANE in Scandinavia until the Neolithic, during which it was probably reduced to trivial amounts by farmers who didn't carry any of it.

So modern Northwestern Europeans, including Scandinavians, got most of their 15-17% of ANE from post-Neolithic population expansions from somewhere in the east, most likely the middle Volga region.

The only thing we don't really know still is the scale of the population turnover across Northern Europe from the Copper Age onwards. If the early Indo-Europeans were 100% ANE, then 15-17% of the population was replaced. But this is very unlikely.

I'd say it's reasonable to assume from all we've seen that the Proto-Indo-Europeans were 40-50% ANE. If so, this means that well over 25% of the population across Northern Europe was replaced during the early Indo-European dispersals, and some of the WHG and EEF now seen there actually comes from the middle Volga region.
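The arithmetic behind these replacement figures can be sketched with a simple two-way mixing formula. This assumes that ANE in the pre-expansion population was close to zero (the "trivial amounts" mentioned above) and that the incoming population is the only new source of ANE. The function name and the `ane_before` parameter are illustrative, not taken from any published model.

```python
def replaced_fraction(ane_modern, ane_source, ane_before=0.0):
    """Share x of ancestry from the incoming population, solving
    ane_modern = x * ane_source + (1 - x) * ane_before."""
    return (ane_modern - ane_before) / (ane_source - ane_before)

# Modern ANE of 15-17%, with incoming groups at 100%, 50% or 40% ANE.
for ane_source in (1.0, 0.5, 0.4):
    low = replaced_fraction(0.15, ane_source)
    high = replaced_fraction(0.17, ane_source)
    print(f"source {ane_source:.0%} ANE: {low:.1%} to {high:.1%} replaced")
```

With a 40-50% ANE source the implied replacement falls between 30% and 42.5%, which is where the "well over 25%" figure comes from.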

Helgenes50 said...

Thanks for these results.

SF11 and Gokhem2 lack West Asian; don't you think that this component is of IE origin?

Chad Rohlfsen said...

Yeah, I sort of inferred that from the data, as she is roughly 50% of the ~44% basal of the farmers. It's the same as my Basal score, so she is off in the Atlantic, across from Brits and myself. If you replaced 14% of her WHG with ANE, she is SE English, today.

Davidski said...


Yep, I think the PIE genomes from the steppe will show some of that West Asian from contacts with the early Kartvelians and Maykopians from the Caucasus. It'll be part of their EEF-like component. But they'll be mostly a mix of ANE and WHG.

truth said...

But... if PIE were R1a, and today ANE peaks in the Caucasus, how come the Caucasus populations have low levels of R1a? Is their source of ANE different?

Seinundzeit said...


Here are MA1's HarappaWorld results:

30.99% NE-Euro
23.56% Baloch
17.91% American
12.52% S-Indian
8.19% Beringian
2.62% W-African
2.50% Papuan
1.52% Pygmy
0.19% San
0.00% Caucasian
0.00% SE-Asian
0.00% Siberian
0.00% NE-Asian
0.00% Mediterranean
0.00% SW-Asian
0.00% E-African

[1,] "haryana-jatt_harappa_5" "29.985"
[2,] "tajik_yunusbayev_15" "33.0286"
[3,] "pashtun_harappa_3" "33.7657"
[4,] "punjabi-jatt_harappa_8" "35.1815"
[5,] "burusho_hgdp_25" "35.3912"
[6,] "nepalese-a_xing_12" "35.6073"
[7,] "pathan_hgdp_23" "35.8363"
[8,] "chuvash_behar_17" "36.4808"
[9,] "bhatia_harappa_2" "36.7083"
[10,] "kalash_hgdp_23" "37.0992"
[11,] "kashmiri_harappa_2" "37.1299"
[12,] "punjabi-brahmin_harappa_2" "37.689"
[13,] "kashmiri-pandit_reich_5" "37.8599"
[14,] "singapore-indian-c_sgvp_10" "38.2385"
[15,] "punjabi_harappa_10" "38.3354"
[16,] "up-brahmin_harappa_3" "38.7022"
[17,] "punjabi-ramgarhia_harappa_2" "39.395"
[18,] "punjabi-arain_xing_25" "39.6685"
[19,] "uzbek_behar_15" "39.6911"
[20,] "gujarati-muslim_harappa_3" "39.6944"

[1,] "64.9% haryana-jatt_harappa_5 + 35.1% mexican_1000genomes_64" "19.9687"
[2,] "33.5% ecuadorian_bryc_19 + 66.5% haryana-jatt_harappa_5" "19.9937"
[3,] "76.6% haryana-jatt_harappa_5 + 23.4% peruvian_1000genomes_69" "20.789"
[4,] "79.3% haryana-jatt_harappa_5 + 20.7% maya_hgdp_21" "20.989"
[5,] "80.5% haryana-jatt_harappa_5 + 19.5% pima_hgdp_13" "21.0829"
[6,] "20.5% bolivian_xing_22 + 79.5% haryana-jatt_harappa_5" "21.1178"
[7,] "80.5% haryana-jatt_harappa_5 + 19.5% totonac_xing_23" "21.1633"
[8,] "37.3% colombian_bryc_26 + 62.7% haryana-jatt_harappa_5" "21.2279"
[9,] "18.5% colombian_hgdp_7 + 81.5% haryana-jatt_harappa_5" "21.3809"
[10,] "81.6% haryana-jatt_harappa_5 + 18.4% karitiana_hgdp_12" "21.3939"
[11,] "81.6% haryana-jatt_harappa_5 + 18.4% surui_hgdp_6" "21.3945"
[12,] "40.8% mexican_1000genomes_64 + 59.2% nepalese-a_xing_12" "22.3732"
[13,] "28.5% finnish_1000genomes_100 + 71.5% haryana-jatt_harappa_5" "22.4033"
[14,] "40.3% mexican_1000genomes_64 + 59.7% punjabi-jatt_harappa_8" "22.4566"
[15,] "64.9% haryana-jatt_harappa_5 + 35.1% russian_hgdp_25" "22.5441"
[16,] "40.8% chuvash_behar_17 + 59.2% haryana-jatt_harappa_5" "22.5521"
[17,] "36.5% colombian_1000genomes_72 + 63.5% haryana-jatt_harappa_5" "22.6058"
[18,] "38.9% ecuadorian_bryc_19 + 61.1% nepalese-a_xing_12" "22.6483"
[19,] "38.5% ecuadorian_bryc_19 + 61.5% punjabi-jatt_harappa_8" "22.6501"
[20,] "63.5% haryana-jatt_harappa_5 + 36.5% mordovian_yunusbayev_15" "22.789"

The geographic origins of their top 20 single population matches are quite interesting.

85% of matches are South Asian
10% of matches are Central Asian
5% of matches are European
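The percentages above can be reproduced with a quick tally of the top 20 single-population list. The regional grouping in this sketch is an assumption made for illustration: Tajik and Uzbek are counted as Central Asian, Chuvash as European, and everything else in the list as South Asian.

```python
from collections import Counter

# Top 20 single-population matches, in the order listed above
# (dataset suffixes dropped for readability).
matches = [
    "haryana-jatt", "tajik", "pashtun", "punjabi-jatt", "burusho",
    "nepalese-a", "pathan", "chuvash", "bhatia", "kalash",
    "kashmiri", "punjabi-brahmin", "kashmiri-pandit", "singapore-indian-c",
    "punjabi", "up-brahmin", "punjabi-ramgarhia", "punjabi-arain",
    "uzbek", "gujarati-muslim",
]

# Assumed grouping; any name not listed here is treated as South Asian.
region_of = {"tajik": "Central Asian", "uzbek": "Central Asian",
             "chuvash": "European"}

counts = Counter(region_of.get(name, "South Asian") for name in matches)
shares = {region: 100 * n / len(matches) for region, n in counts.items()}
print(shares)
```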

Obviously, MA1 doesn't have any South Asian admixture, so the question that immediately comes to mind is why MA1 can be modeled as a modern South Asian with some Native American admixture? Parsimoniously, because it's South Asians and Native Americans that have the highest ANE admixture out of all modern populations. Formal methods for testing admixture demonstrate that northern South Asians show much stronger signals of ANE admixture than Lezgians and Chechens. Native Americans seem to be 40%-45% ANE, and northern South Asians seem to be 35%-40% ANE. So, the R1a connection persists, since most southern Central Asians and northwestern South Asian populations range between 70% R1a to 20% R1a, in addition to R2, and the occasional/rare R1b.

PS: Please note that MA1 has 0% "Caucasian" admixture. Same goes for Scandinavian hunter gatherers with ANE admixture.

Maju said...

ANE evidences the Eastern European penetration (inferred as Indoeuropean) in those parts of Europe where it did not exist in the Neolithic nor Epipaleolithic.

But we can't say much more than that with the data we have. I was earlier about to discuss Davidski's opinion about early IEs having as much as 40-50% ANE, when today's Eastern Europeans have only around 16%, but the matter, when we try to discern such great detail, becomes so confused (for lack of enough data) that it's better not to issue a strong opinion, not yet.

In NW Europe (for example Scotland), it's very possible that a good deal of the ANE component is pre-Indoeuropean, because it has been detected among Epipaleolithic Swedes, which were presumably of the same cultural area (Hamburgian and successor cultures, which contrast to some extent with Magdalenian and successors further south, whose known representatives lacked meaningful ANE affinity).

It is perfectly possible that the same happens in the Caucasus or wherever (we have even less data when we move away from Western and Northern Europe). ANE should not be automatically identified with "Indoeuropean" ethnicity, nor should R1a in my understanding, even if in some cases they do indeed strongly correlate. Correlations may perfectly vary in different circumstances. We should not happily jump from the particular to the general without any evidence.

Maju said...

"why MA1 can be modeled as a modern South Asian with some Native American admixture?"

Very possibly because when the "Paleosiberian" population that is at the root of Ma1 and, partly, of Native Americans diverged, West Eurasians were still not really distinct from South Asians, especially those from the North (Indus and Ganges basins).

Or possibly what I describe applies in addition to what you say, and in addition to the significant "ANI" element among South Asians, which is almost certainly of Neolithic West Asian origin.

Seinundzeit said...


These are very good/important points. But I would note that the formal testing tells us the same story. With formal testing involving f-statistics, we don't face the possible issues we see with ADMIXTURE:

Lezgian; MA1, Samaritan -0.00388778 -5.80145
Pashtun; MA1, Samaritan -0.00501548 -6.82275
Lezgian; MA1, Palestinian -0.00307633 -7.5444
Pashtun; MA1, Palestinian -0.00392319 -9.51417

This is some very interesting output, courtesy of a friend of mine. The first number is the actual f3 score. The second number is the z-score, which I guess is a quantification of confidence. We already know that the Lezgians are at approximately 30% ANE. As is quite evident, Pashtuns show much stronger evidence of ANE admixture in comparison to them. The f3 score is more strongly negative, and on top of that, the z-score is larger in magnitude, so the signal is more robust. This method never conflates ENA with ANE, which is very important for our purposes.
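For anyone unfamiliar with these numbers, here is a minimal sketch of how an f3 statistic of the form f3(Target; Source1, Source2) and its z-score can be computed from allele frequencies. A clearly negative value suggests the target is a mixture of populations related to the two sources. This is a simplified illustration: it leaves out the sample-size bias correction and normalization that dedicated software applies, and the allele frequencies are randomly generated, not real data.

```python
import numpy as np

def f3(c, a, b):
    """Basic f3(C; A, B): mean over SNPs of (c - a) * (c - b)."""
    return float(np.mean((c - a) * (c - b)))

def f3_zscore(c, a, b, n_blocks=10):
    """z-score from a delete-one-block jackknife over contiguous SNP blocks."""
    blocks = np.array_split(np.arange(len(c)), n_blocks)
    estimates = []
    for block in blocks:
        keep = np.ones(len(c), dtype=bool)
        keep[block] = False  # leave this block out
        estimates.append(f3(c[keep], a[keep], b[keep]))
    estimates = np.array(estimates)
    se = np.sqrt((n_blocks - 1) / n_blocks
                 * np.sum((estimates - estimates.mean()) ** 2))
    return f3(c, a, b) / se

# Hypothetical allele frequencies: C is built as an even mix of A and B.
rng = np.random.default_rng(1)
a = rng.uniform(0.05, 0.95, size=2000)
b = rng.uniform(0.05, 0.95, size=2000)
c = 0.5 * a + 0.5 * b

print(f3(c, a, b), f3_zscore(c, a, b))
```

Both printed values are negative for this constructed mixture, which is the same pattern as in the Lezgian and Pashtun lines above.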

Also, I will note that La Brana-1 (as well as Neolithic European farmers) never shows any affinity to South Asia. It's just MA1 and Afontova Gora, both of whom have some sort of relationship with the Gedrosian/southern Central Asian component, and with the South Indian/peninsular South Asian component. Where there is smoke, there is surely fire. And with this much smoke, there has to be a full scale blaze at work.

About Time said...

@Seinundzeit, maybe ANE is ancestral for Amerinds and South Asians.

Ie, South Asia is a refugium for populations that were at first mainly living in Pleistocene Siberia / Tarim Basin / Oxus that only later were pushed past the Khyber Pass into S Asia.

Imagine hunter bands roaming out of Arabia, probably following the Tigris/Euphrates and or Persian Gulf littoral. Some taking a turn up the Caspian littoral and discovering plentiful game near Oxus / Aral Sea and up to Semirechiye (notice the "seven rivers" exists both in Kazakhstan and ancient India - Sapta Sindhu. Interesting, no?).

Those northerners become MA1. At first keeping in touch (seasonal migrations, game trails) with their cousins in C Asia / Tarim. Eventually losing touch with climate changes / glaciation making travel difficult.

End result is what we see in Asia today. MA1 would be a tiny "snapshot" of an eastern and northern extreme of that big hunting zone. ANI/ASI might be two southern branches of the same tree. ASI could simply be an earlier wave that came through the Khyber Pass into India, and ANI a related but later wave that was more in touch with ancestors of Paleo-Anatolians.

But ultimately MA1, ANI, ASI, and AWA (? Ancestral West Asian) were all the same population.

Seinundzeit said...

About Time,

This is actually a very fascinating scenario, I think it could account for much of the data. I have a feeling that "Ancestral North Eurasian" is a misnomer, just a function of the fact that we found MA1 in the far north. Just as you've postulated, I think it's eminently reasonable to suggest that ANE populations (and related groups) could have existed throughout vast portions of ancient Eurasia. I find the South Asian refugium idea to be very interesting.

About Time said...

@Seinundzeit, topic brings to mind findings in Wang 2012. Look at how close even Sardinians are to even Palestinians and even Pathans today. Contrast with SSAs and She or Maya.

Part of this (the upper left axis of graph) is probably effect of Basal and/or EEF mixing into Sardinians/Palestinians/Pathans, bringing them all closer together --- towards Aral Sea btw (urheimat of hunter bands? Prior to dedication).

But --- Part of what we see in Asia (the Y axis of graph, the part furthest from SSA) might also be the result of common origins from the original non-Basal Aral (?) hunters. Maybe they all were Neanderthalized, but only some mixed with Eastern Neanderthals (or others like Heidelbergensis or Erectus variants?).

Also consider original inhabitants of India might have been more like Papuans, but were pushed out eastward and into islands by ASI hunter bands as they spread out from Aral vicinity.

Africans obviously were affected by third factor. Not surprising, as Africans have physical adaptations nobody else has. Means that just like Euros and Asians, Africans probably have some pre-Sapiens genes. Probably environmental adaptations, because all pre-Sapiens---everywhere---had less complex tool use so was much more exposed to environment.

I imagine Basals as the ur-Sapiens, the line with the fewest non-Sapiens genes. Why? Because the trade off of pre-Sapiens genes was: good environmental physical adaptations, but also bad disruption of molecular pathways (neurological ones I bet) that gave otherwise fragile Sapiens the decisive advantage.

Physical weakness causes Sapiens to cooperate better and use tools to make up for physical defects or shortcomings. Brains over brawn, and most of all caring for others, including the elderly and infirm (who might have had valuable knowledge that sometimes has key survival benefits). Like a Hawking or even Einstein, which a hunter band would just leave behind (stupidly) as a "weak link."

Maju said...

Seinundzeit: what the figures indicate is not admixture but affinity. The greater Ma1 affinity of Pashtuns can, at least in theory, originate in the fact that they belong to the same "Central Asian" macro-population as Ma1. I'm not saying it's as simple as that but it's a plausible major cause; lacking aDNA for ancient Afghans or Pakistanis we cannot establish a causation of any sort beyond mere speculations.

A key evidence in this regard is that the "Central Asian" component of Hui Li 2009, which peaks in West Siberia (Khanty, neighbors of the Kets, who are the closest thing to Ma1 alive) is very important in Afghanistan, making 50% or more of the Hazara genetic makeup. See here. This component is not the same as "ANE", of course, but it indicates strong West Siberia-Central Asia genetic unity, penetrating clearly into the NW areas of the Indian subcontinent.

Only if we knew, as happens in parts of Europe, that ancient populations lacked that affinity, we could reach to any solid conclusion. But so far no aDNA from AfPak.

Davidski said...


How is it possible for Neolithic West Asians to have carried ANE if there was no ANE in Western and Central Europe until after the Neolithic? Think about it.

But there's no reason that I can see why the middle Volga populations of the late Neolithic couldn't have carried 40-50% of ANE. Indeed, there's actually an abstract online suggesting that they carried enough to be a source of the ANE presently found in Europe.


The North Caucasus probably gets its higher ANE than Europe from its more eastern geography (and thus closer proximity to the Urals, where I'd say ANE peaked until historic times), rather than the Indo-European expansions, which seem to have resulted in a rapid spread of the same small group of people and a founder effect of an ANE derived paternal haplogroup.


Unfortunately, those Oracle results are unlikely to be correct. The reason is that the reference samples were tested under different conditions than MA-1 (they were in the ADMIXTURE run that produced the allele frequencies for the test, while MA-1 wasn't).

In order to produce an accurate Oracle test the reference samples have to be run under exactly the same conditions as the test sample. In other words, the allele frequencies must be sourced from one set of samples, and another set of samples then used to produce the population averages for the Oracles. I know that Zack doesn't do this because he doesn't believe it's necessary, but he's wrong.

Seinundzeit said...


This is certainly plausible. But if we think in terms of affinity, and cite a lack of aDNA from southern Central Asia+northwestern South Asia, we run into quite a few issues. The Raghavan et al. paper came out quite some time before the paper on Anzick, so they had no Native American aDNA. Yet, that didn't stop them from inferring around 45% ANE admixture for Native Americans. We don't have aDNA from the northern Caucasus, but that didn't stop Lazaridis et al. from inferring around 30% ANE for Lezgians.

Also, TreeMix infers 38%-42% ANE for Pashtuns. F4 ratio estimation puts Pashtuns at 34%-39% ANE admixed. David's supervised ADMIXTURE runs put people in South Asia at an almost constant 30% ANE (to put that in perspective, I think Lezgians were 25%-27% ANE in David's ADMIXTURE runs. I'll have to recheck). So, everything is right in place for us to claim that ANE admixture peaks in South Asia, if we are talking about a Eurasian context. Not to mention the fact that MA1 clusters with South Asians on PCA plots, even when the South Asian dimension of genetic variation is fully fleshed out on the PCA plot in question.

Of course, I'm not claiming that you are incorrect. Your suggestion is very likely, but it is less parsimonious.


The calculator effect is surely a serious problem, but not as much for highly divergent components. It should be noted that MA1's HarappaWorld results are pretty much identical (qualitatively speaking) to what he got in Raghavan et al., and in Lazaridis et al. So far, he always tends to be a mix of South Asian+Native American+Northern European on unsupervised ADMIXTURE runs. Tests that have a southern Central Asian-specific component, like the "Baloch" component, tend to have that as MA1's second largest component. In that light, I'd say the oracle results do make sense. Also, I'm assuming he can be compared to people who weren't included in the original admixture run.

Chad Rohlfsen said...


Karitiana was used for figuring ANE into Native Americans. None of the components listed for MA-1 are pure components. They are all mixed and contain things that are lacking in MA-1. Modern components cannot be used to describe the aDNA of ancient samples.

Davidski said...

It's hard to predict what kind of an impact the calculator effect has on this. Sometimes it doesn't take much of a shift to give a very different impression of a sample.

In my oracles MA-1 is closest to the Mari, Burusho, Chuvash, Shors, and Ket. I think that's correct based on what I've seen in the recent papers.

In any case, even if ANE ranged deep into South Asia before Indo-European expansions, which is very likely, we're now seeing genetics, linguistics and archeology converging very nicely to explain what happened during the Bronze Age.

Seinundzeit said...

Chad Rohlfsen,

No doubt about that my friend. MA1 isn't predominantly North Indian, with some serious Mexican admixture. Rather, his relatives contributed a lot of genetic ancestry to Native Americans, South Asians, and Europeans. So in a modern context, he becomes construed as a mix of "clusters" which have a strong share of ANE allele frequencies in their composition.

Regardless, the ADMIXTURE results are weak tea. There is a mountain of data demonstrating the same phenomena, and in a more unambiguous manner.


I'd agree with that, the young age of R1a is good evidence for an expansion into South Asia during the Bronze Age. I'd say that a substantial portion of South Asian ANE ancestry should be of Indo-Aryan origins.

For what it's worth, I'm assuming that we are probably dealing with three layers of ANE admixture. First from Paleolithic/Mesolithic hunter gatherer populations deep inside South Asia, closely related to MA1, and linked to ANE hunter gatherer populations in Central Asia. Secondly, perhaps more ANE admixture entered South Asia via the Indo-Aryan expansion. Finally, the last dose of ANE admixture might have occurred with the massive hordes that descended on peripheral South Asia (from Central Asia) during the historical period, and likely played a part in the ethnogenesis of groups like Gujars, Jatts, and Pashtuns.

If we want to think in terms of ADMIXTURE components, and if we want to link these movements to ADMIXTURE components, I'm speculating that this could describe things in a reasonable manner:

ANE admixture mediated via the cluster modal in South India probably involves very ancient ANE ancestry in South Asia, perhaps of Paleolithic or Mesolithic vintage. By contrast, the ANE admixture mediated via the cluster modal in Afghanistan/western Pakistan probably involves Indo-Aryan input. Later ANE admixture from Scythians, Kushans, Hepthalites, etc, is probably articulated via the "Northern European" percentages found throughout South Asia.

That's my tentative/rough picture of the situation, so far.

About Time said...

I wonder whether the ANE in West Asia is "mixture" in the sense we assume. What if it's the reverse: West Asia started out more ANE than it is now, and has been mixed by waves of Egyptian/Palestinian/Armenian(?) populations over time.

A corollary could be that Mitanni/Median/Kurdish/Pashto are not really "expansions" from somewhere else, but remnants of the original inhabitants of those areas that have been mixed not with ANE (they were always ANE), but instead mixed with Basal/EEF.

Think of this. Farmers have to stay put and move only with population, requiring arable land. Their expansions will be slow.

But hunters move with animals. Big animals like mammoths, horses, wild ox (?) move a lot. So hunters would have a very wide range at low population densities. So it's not so strange to imagine ANE everywhere between Syro-Anatolian steppe, Aral/Caspian littoral (wasn't that a big inland sea at some point?), etc.

But those ANE would be swamped out by EEF with higher population carrying capacity settlements, so it might look like "a little ANE introgression" especially near Neolithic hot spots like Syro-Anatolia, Persia, etc.

Further out, ANE had more staying power and had more % of descendant populations. India/AfPak, Amerinds, etc. We should look at Tamils etc: they might have just as much ANE as Pathans.

Davidski said...

ANE is a recent introduction to the Near East because just like Western Europe and North Africa, the Near East still has people who lack it completely, like some Bedouins.

Also, in South Asia it certainly peaks in and around the Hindu Kush, so Tamils can't have as much ANE as Pathans. You can see that on this figure of shared drift between modern populations and MA-1, where South Asians generally don't feature near the top of the list.

The other thing you can see on that diagram is the phylogeography of ANE. It's obviously a component of the north, closely related to the Upper Paleolithic people of Europe, Central Asia and Siberia. It doesn't look at home in southwest or southeast Asia.

Matt said...


Are there any HarappaWorld tests for other ancient samples? Like La Brana etc.

Notwithstanding the calculator effect, in MA-1 and South Asian, I'd tentatively say:

"South Asian" components seem to be a composite of ENA, of a sort, plus West Eurasians, of a sort. Large amounts of ASI, which seems ENA with low drift, but not wholly separated from recent ANI ancestry, as the mixing is so deep and strong that references like the Onge have to be treated carefully to capture this.

"Amerind" components are East Asian plus ANE.

Adding "South Asian" to "Amerind" components approximates a population which is not that close to either South Asians or Amerinds, but is ENA shifted compared to present day West Eurasians, and ANE shifted compared to present day West Eurasians.

So this combination gives you generic ENA shiftedness that isn't biased towards any particular ENA, and gives you ANE affinity.

Combine this with larger amounts of West-Central Eurasian components (Northeast European and "West Asian" components of sorts) and you can begin to approximate MA-1. Adding in other components like African and Oceanian and the approximation can become more fine.

MA-1 certainly sits in its world relationships relatively close to present day South Asians (perhaps after the North East Europeans), but whether it is actually ancestral to them, or their mix approximates the nature of its population fairly well seems like an open question.


On another topic: for Gokhem 2, following the method of adding up fractional component Fsts from the Eurogenes proportions, and comparing the result to other European populations examined using the same method, places it in an unusual position.

Gokhem 2's "formula" is mostly like Sardinian, but swaps out some of East and West Med for North Sea and for Eastern Non African components which are divergent from one another, and in the case of Oceanian, extremely divergent on a world scale.

So it mostly ends up like West Med and Atlantic components, just with slightly decreased fsts with ENA components and slightly more increased fsts with West Eurasian components.

However, as the West Med is extremely non-ENA shifted (beyond even the Sardinians), the end result is that Gokhem 2 ends up with similar distances to Eastern Non Africans components as Basques and just higher distances from present day West Eurasian components.

Not sure whether this is modelling anything real - seems like it could be hinting at a drifted EEF + La Brana like combination, as in Skoglund, approximating that as best it can (with calculator effects in play).

Maju said...


"How is it possible for Neolithic West Asians to have carried ANE if there was no ANE in Western and Central Europe until after the Neolithic? Think about it."

How was it possible for Ancient Scandinavians to have carried ANE then? We do not know, and Kabul is certainly quite a bit closer to Irkutsk than Stockholm is.

"But there's no reason that I can see why the middle Volga populations of the late Neolithic couldn't have carried 40-50% of ANE."

It seems a bit too high, considering what we see today in Eastern Europe. It would have to have been diluted into 3-4 parts of Western European blood, which is not likely at all.

"Indeed, there's actually an abstract online suggesting that they carried enough to be a source of the ANE presently found in Europe."

I think that most of ANE comes from Eastern Europe in much more general terms and my guesstimate for Volga peoples in particular is of ~24% ANE. A key issue here is that not all EEFs cluster well with Sardinians but some do with other Italians or Basques or Iberians, all of which have a lot more ANE than the islanders, so you can get very extremist estimates if your only pre-IE reference are Sardinians but the situation balances a lot if you take other more realistic references instead. I did some estimates privately but I never dared to publish because it's so subtle and uncertain that I really prefer to wait a couple of years, if luck is on our side, and have more data before issuing an opinion.

Maju said...


The example you mention of Native American ANE "admixture" actually illustrates my point, because it is something so old and so directly linked to the genesis of Native Americans in Siberia that talking of "admixture" is messing up with the vocabulary. How did they estimate that "admixture"? By comparing with mainstream East Asians, which are known to be genetically akin to NAs. If you compare with Anzick, you get 0% ANE "admixture" instead. That's why local aDNA is so important in order to estimate these things.

Obviously ANE affinity in Native Americans has nothing to do with Indoeuropeans.

"ANE admixture peaks in South Asia"

Actually among Kets. And you should not talk of "admixture" but "affinity", I insist, because "admixture" implies that the core origin was something else before, something that we either don't know or is clearly not the case (at least for Siberians and Native Americans).

I don't know how to interpret the high ANE scores you mention for South Asia but it's perfectly possible and likely that it has other explanations than just Siberian or IE flow into South Asia, which, if ever happened, was almost certainly very diluted.

The question is what are South Asians being compared with (besides Ma1)? And the answer is Palestinians (or Samaritans, similar enough). Anyone knowing of population genetics knows that Caucasus peoples are closer to Palestinians than South Asians are, so, by default, South Asians will score more Ma1. Why not to compare them with, say, Tamils? The results are bound to be totally opposite.

Davidski said...

What do you mean how is it possible for ancient Scandinavians to have carried ANE? It's not only possible, it's a proven fact.

But there's no evidence of ANE in West Asia during the Neolithic, and if there was then Neolithic farmers in Europe would've carried it, but they didn't. Also, Kabul isn't in West Asia, it's in Central Asia.

And I still don't see why there couldn't have been 40-50% of ANE around the middle Volga during the late Neolithic, considering that Northeast Caucasians today carry 25-30% of ANE, and the Chuvash, Mari and western Siberians probably more than that, except we don't know how much exactly because they have a lot of fairly recent WHG and ENA ancestry too, from the west and east, respectively, which confuses things. Lazaridis et al. didn't even attempt to estimate ANE in such populations, which is a shame, but it might not be doable with the tools available currently.

Maju said...


"ANE is a recent introduction to the Near East because just like Western Europe and North Africa, it still has people who lack it completely, like some Bedouins".

You're making a huge assumption here about West Asia being homogeneous in the past, which is almost certainly wrong.

@Matt: good observations.

Maju said...

"What do you mean how is it possible for ancient Scandinavians to have carried ANE? It's not only possible, it's a proven fact."


Of course. But that's precisely my point: we lack any such "proven facts" (in either direction) for ancient West Asia, Caucasus, South Asia, etc.

"But there's no evidence of ANE in West Asia during the Neolithic"...

Nor the opposite. We do not have a single autosomal sequence for the whole West Asia at any point in Prehistory! Same for South Asia, Caucasus, etc.

"But there's no evidence of ANE in West Asia during the Neolithic"...

Compared with what? EEFs? WHGs? Neither are likely to represent the ancient Caucasian population. It's apples, oranges and pineapples.

WHG and EEF are not useful proxies for West nor South Asia, nor the Caucasus either. It's like saying: I know dogs and cats, so I'm going to evaluate how an elephant compares to them. Is an elephant more dog-like or rather cat-like? The answer is neither.

We need at least some local aDNA samples before we can judge other regions than Western Europe (and even in Western Europe, some more samples would be nice to have).

Davidski said...

So where do you think the Bedouin B sample originated? Deep in the Arabian Peninsula where there was never any ANE? Perhaps, but even Yemenite Jews carry ANE, and so do most Saudis.

It seems to me like these Bedouins are a remnant of a pre-ANE Near East, and I expect this will be backed up with ancient DNA.

Maju said...

I suspect that the important genetic differences within West Asia are in their core features very old, at the very least Neolithic. Also that some parts of West Asia have significant African and para-African ("aboriginal Arabian") elements missing further North that act as "anti-ANE" (or also "anti-WHG") weights (but "pro-EEF" because EEFs also had some of that African-like element). Negev Bedouins are extreme in this Southern West Asian trend, even more than Palestinians.

So why should we take Palestinians in general as reference for the Neolithic West Asian population? Unless there was some aDNA saying so, particularly further North (in Turkey, Caucasus or Kurdistan maybe), which is not the case, my stand is to remain cautious and accept as at least a strong possibility that the modern genetic makeup of the region has deep local roots in each area.

And of course it's pointless to use Bedouins or Palestinians as reference for South Asia. Only their African-like component weighs importantly against any other Eurasian affinity, be it Ma1 or whichever else.

That's probably why Basques appear as much more ANE than Sardinians, because our WHG element is much stronger and therefore the African-like one weighs less. And that's why Canarians appear more EEF-like than they should considering their largely Spaniard origins: because they also have an important NW African element that weighs against ANE and WHG and any other Eurasian affinity.

We have to take the African-like element into account (Palestinians, EEFs) and also understand that there was necessarily diversity in prehistoric times, diversity that is only very limitedly documented yet.

Davidski said...

I never said Palestinians were good proxies for Neolithic West Asians. I don't think they are, because they do carry some ANE, while in my opinion Neolithic West Asians didn't carry any ANE, and were basically like the Bedouin B, except maybe with less Sub-Saharan admixture.

Where in the Fertile Crescent did the Near Eastern ancestors of Stuttgart originate in your opinion? Wherever it was, there was obviously no ANE there when they left for Europe.

About Time said...

Hmm, well the other assumption is that populations don't get up and move in toto. What if asking about ANE in Kurdistan 6 kya is like asking about Amerinds in New England 500 years ago?

They surely were there, but you'd have to genome scan 1000's of people to pick up any trace - and most of what you'd find would actually be reintroduction from other sources (Latino immigrants since the 20th century, only distantly related to Narragansetts and other original inhabitants).

Some of their relatives probably survived in scattered form in Canada, Great Lakes, maybe even further afield. It's impossible to know without ancient DNA.

The Neolithic was a long and slow tumult that changed the world, partly in the face of climate changes and partly in the face of human activity.

What we see now is the result of wholesale folk migrations in many cases. The Kets say they lived somewhere else, before the "stone people" made war with them and forced them further north.

We know from the West Siberian Scythian mtDNA that major population changes affected Central Asia - it's more West Asian (EEF?) since the Iron Age.

About Time said...

@David, Bedouins move around. That Bedouin B sample could have roots in the Sudan or Sahara - or long ago (contra my scenario on this thread) in ancient Syria or Turkmenistan for that matter.

The oldest references to Amorites are in the north Levant. Before that, before camels were introduced from Central Asia, there were nomads using donkeys and mules (an African animal).

Speaking of population replacement, the Assyrian Empire did that as a matter of intentional policy. Move a troublesome population to the opposite corner of the very large empire.

People from poor areas also moved for work (just like they do now). No reason to assume ancient Mesopotamia = modern Mesopotamia.

Davidski said...

Those Bedouins aren't from the Sahara. They're perfectly Near Eastern apart from the fact that they don't have any ANE.

They're at the base of a West Asian cline that runs from the Levant to the northeast Caucasus, along which ANE rises from 0% among them to 28% among the Lezgins.

This suggests that there was a migration of ANE into the Near East and Caucasus from the northeast somewhere, like the ancient Urals.

About Time said...

@David, I don't doubt there was a migration of ANE southwards 3-4 kya. Same time as Maikop btw, "the mother culture of all Kurgans-Scythians."

That's exactly when Semitic + Hurrian + Indo-Aryan names show up in lists of mercenaries in the Middle East empires.

Also major upheavals in Anatolia, probably Hittites:

Further east, Gutians and later Kassites. Their religion was close to what we see in Rigveda (Indra, Mitra, Varuna) - different from historical Hinduism.

Simplest explanation is that ANE came from this. But was this just a late back migration of semi nomadic peoples with a folk memory of originating in the Mideast long ago, before being pushed out by sprawling cities and all the social problems that come with them?

Nomads like their space and freedom. But when space runs out, they turn against the empires that seem (from their pov) to oppress them. From the empires' POV, they are the "bad northern barbarians who congregate like locusts" and break up their calm imperial domains.

Davidski said...

It wasn't a back migration. It was the result of the domestication of the horse and the invention of the chariot, all of which happened on the steppe far to the north of the Caucasus.

Maju said...

"I never said Palestinians were good proxies for Neolithic West Asians."

Palestinians, Negev Bedouins (who are also Palestinians)... similar enough, especially in contrast to all the rest, which are rather in the J2-Anatolian-Zagros-Caucasus genetic cluster.

"Where in the Fertile Crescent did the Near Eastern ancestors of Stuttgart originate in your opinion?"

Right now I suspect that they have Palestinian-like origins via Cyprus, very possibly via the sea, with maybe lesser Anatolian elements. Cypriots are much more like Swedish Neolithic peoples than Turks are, even though they generally cluster very close to each other in almost every analysis.

But again the evidence for all this is more suggestive than conclusive: Cypriot extra Atlantic affinity, way too much African-like element among EEFs, E1b being rare in Anatolia, while G is common in Palestine, lack of any clear Anatolian precursor of Thessalian Neolithic... stuff like that.

"Wherever it was, there was obviously no ANE there when they left for Europe."

ANE is not an absolute category but relative to whatever is compared with. And obviously ANE (Ma1 and such) belong to the major Eurasian+ branch of Humankind, while in the West Asian EEF precursors (and also in Palestinians, Negev Bedouins, etc.) there is also some African-like element. This African-like genetic component always weighs against ANE when compared with anything else from Eurasia, Australasia or America.

This is the "Basal Eurasian" thingy of Lazaridis, which in Skoglund's supplemental material is revealed as Dinka admixture (maybe not strictly Dinka but something from the Nile region for sure). There is absolutely no reason to believe that this African element was generalized in the West Asian Neolithic but rather that it had a founder effect in Thessalian Neolithic and, prior to that, in the Palestinian Mesolithic, particularly in the Harifian (desert variant, possible precursor of Negev Bedouins and source of Semitic languages).

Seinundzeit said...


Here are more results for ancient samples. None of them display MA1's South Asian affinity. Due to space, I'm only showing the top 5 results, but these samples don't get South Asians even with 50-60 matches, while almost all of MA1's top 60 matches are South Asian.

[1,] "russian_hgdp_25" "10.2533"
[2,] "belorussian_behar_9" "10.5786"
[3,] "lithuanian_behar_10" "10.8229"
[4,] "mordovian_yunusbayev_15" "11.5544"
[5,] "russian_behar_2" "12.7521"

[1,] "13.6% colombian_bryc_26 + 86.4% lithuanian_behar_10" "6.4163"
[2,] "14.7% colombian_1000genomes_72 + 85.3% lithuanian_behar_10" "6.4263"
[3,] "88.4% lithuanian_behar_10 + 11.6% mexican_1000genomes_64" "6.711"
[4,] "11% ecuadorian_bryc_19 + 89% lithuanian_behar_10" "6.7889"
[5,] "84.6% lithuanian_behar_10 + 15.4% puerto-rican_1000genomes_76" "6.9645"


[1,] "belorussian_behar_9" "12.709"
[2,] "lithuanian_behar_10" "12.7123"
[3,] "russian_hgdp_25" "14.7995"
[4,] "russian_behar_2" "15.1324"
[5,] "ukranian_yunusbayev_20" "15.3383"

[1,] "32.7% basque_hgdp_24 + 67.3% finnish_1000genomes_100" "7.1076"
[2,] "66.6% finnish_1000genomes_100 + 33.4% spain-basc_henn2012_20" "7.1336"
[3,] "53.8% british_1000genomes_99 + 46.2% finnish_1000genomes_100" "8.5706"
[4,] "44.2% finnish_1000genomes_100 + 55.8% orcadian_hgdp_15" "8.7302"
[5,] "45.6% finnish_1000genomes_100 + 54.4% utahn-white_1000genomes_100" "8.7494"

[1,] "spaniard_1000genomes_98" "8.9358"
[2,] "spaniard_behar_12" "9.8892"
[3,] "italian_hgdp_13" "15.8758"
[4,] "spain-basc_henn2012_20" "16.8806"
[5,] "basque_hgdp_24" "18.017"

[1,] "25.4% sardinian_hgdp_28 + 74.6% spaniard_1000genomes_98" "6.3022"
[2,] "22.4% finnish_1000genomes_100 + 77.6% sardinian_hgdp_28" "6.6152"
[3,] "28.1% sardinian_hgdp_28 + 71.9% spaniard_behar_12" "6.7616"
[4,] "14.8% morocco-n_henn2012_18 + 85.2% spaniard_1000genomes_98" "6.9766"
[5,] "85.8% spaniard_1000genomes_98 + 14.2% tunisia_henn2012_18" "6.9815"

All of these results make sense, so there isn't any reason why MA1 should be special, especially since his results are in sync with other more robust pieces of data.

Also, there isn't much of a calculator effect at work here. The results are different from Eurogenes oracles because Zack's data-set is much bigger, and much less European in composition. The input data shapes the parameters, and David has a very rich European data-set, while Zack has a very rich South Asian data-set, in addition to more populations from West and Central Asia (along with more African populations). Also, the clusters at work are completely different. HarappaWorld has two separate South Asian components, of completely different modalities and affinities. Finally, the actual HarappaWorld results are basically what MA1 gets in every ADMIXTURE run, including Eurogenes. South Asian+Native American+Northern European is the story of his life, as far as this software is concerned.
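For anyone wondering what the oracle listings above are doing, here is a minimal sketch of the general idea, not the actual HarappaWorld or Eurogenes code. It assumes each reference population is summarized by its average ADMIXTURE component proportions, ranks populations by Euclidean distance to the sample, and then searches for the two-population mix that gets closest. The population names, component values and function names are all hypothetical, and the distances are on a 0-1 proportion scale rather than the scale shown in the listings.

```python
import math
from itertools import combinations

# Hypothetical population averages over three made-up components.
POP_MEANS = {
    "pop_A": [0.8, 0.1, 0.1],
    "pop_B": [0.1, 0.8, 0.1],
    "pop_C": [0.1, 0.1, 0.8],
}

# Hypothetical test sample (constructed as 85% pop_A + 15% pop_B).
SAMPLE = [0.695, 0.205, 0.100]


def distance(u, v):
    """Euclidean distance between two vectors of component proportions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def single_fits(sample, pop_means):
    """Rank reference populations by distance to the sample, closest first."""
    return sorted((distance(sample, mean), name) for name, mean in pop_means.items())


def best_two_way_mix(sample, pop_means):
    """Return (distance, weight_of_first, first, second) for the best pair."""
    best = None
    for (name_a, a), (name_b, b) in combinations(pop_means.items(), 2):
        diff = [x - y for x, y in zip(a, b)]
        denom = sum(d * d for d in diff)
        if denom == 0:
            continue  # identical populations cannot define a mixing line
        # Least-squares weight of population a, clipped to a valid proportion.
        w = sum((s - y) * d for s, y, d in zip(sample, b, diff)) / denom
        w = min(1.0, max(0.0, w))
        mix = [w * x + (1 - w) * y for x, y in zip(a, b)]
        fit = (distance(sample, mix), w, name_a, name_b)
        if best is None or fit < best:
            best = fit
    return best


if __name__ == "__main__":
    for dist, name in single_fits(SAMPLE, POP_MEANS):
        print(name, round(dist, 4))
    dist, w, first, second = best_two_way_mix(SAMPLE, POP_MEANS)
    print(f"{w:.1%} {first} + {1 - w:.1%} {second}", round(dist, 4))
```

A mixed fit with a much smaller distance than the best single population only says that the mix reproduces the component proportions better. How far that can be trusted depends on how the reference averages were produced, which is the calculator effect point raised earlier in the thread.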


South Asians weren't being "compared" to any one. The software picks the pair by itself. If there is any evidence of admixture (not affinity, you need to read up on this software), it does everything by itself. If not, no population pairs with that population ever come up. The Tamils never come up in the results, that's not something we can control. It's all hands free. I'm assuming you are thinking of something else.

Maju said...

"South Asians weren't being "compared" to any one. The software picks the pair by itself."

I meant only the f3 results you mentioned.

Seinundzeit said...

Exactly my friend, I'm also talking about the f3 results. The software finds and creates pairs by itself, one doesn't do anything. It's all hands free.

In other words, MA1+Samaritan=Pashtun is discovered by the program itself. The statistics never involve Tamils, we don't play any part in tweaking that. Also, this isn't due to affinity, because that would defeat the purpose of this particular software. This software was designed to detect admixture. The more negative the f3 score, the greater the evidence/extent of admixture. The more negative the z-score, the greater the "confidence" in the fit. The top statistic for both Pashtuns and Lezgians is MA1+Samaritan. But Pashtuns have much stronger f3 scores, and much more robust z-scores, with MA1+Samaritan. In combination with everything else, we are seeing a consistent pattern. In accordance with TreeMix, f4 ratio estimation, ADMIXTURE results, PCA analyses, IBS, and shared drift estimates, Pashtuns should have much more ANE admixture in comparison to Lezgians. They should be in between 35%-40% ANE. Perhaps a more liberal range would be 30%-45%. But it is definitely going to be higher than 30%. Some of the scores have involved Chechen+MA1=Pashtun, and Lezgian+MA1=Pashtun. If that isn't proof, I don't know what constitutes proof any more.

The story is different for other South Asians. Most South Asians from the north get Lithuanian+Ho, so very ANE+WHG shifted European+ASI-rich tribal South Asian. For some reason, all Pashtun scores involve MA1 and Near Easterners, the ASI connection doesn't appear. But even for other South Asians, the fact that their best fit is Lithuanian+Ho is a strong sign that they have extremely substantial ANE admixture, since the Lithuanians are predominantly “western Derived Eurasian”.
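To make the f3 discussion above concrete, here is a rough sketch of the quantity being talked about, under simplifying assumptions. It computes a naive f3(C; A, B) as the average over SNPs of (c - a)(c - b), where a, b and c are allele frequencies in the two reference populations and the test population, and gets a z-score from a simple delete-one-block jackknife. Real implementations also apply a finite-sample correction and use blocks defined along the genome; the function names and the simulated frequencies below are hypothetical and only for illustration.

```python
import math
import random


def f3_with_z(test, ref1, ref2, n_blocks=10):
    """Naive f3(test; ref1, ref2) and a block-jackknife z-score.

    Inputs are equal-length lists of per-SNP allele frequencies.
    No finite-sample bias correction is applied (a simplification).
    """
    terms = [(c - a) * (c - b) for c, a, b in zip(test, ref1, ref2)]
    n = len(terms)
    total = sum(terms)
    estimate = total / n

    # Contiguous blocks of SNPs; the boundaries cover every SNP exactly once.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    leave_one_out = []
    for start, end in zip(bounds, bounds[1:]):
        block_sum = sum(terms[start:end])
        leave_one_out.append((total - block_sum) / (n - (end - start)))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, z


# Toy data: C is an exact 50/50 mix of A and B at every SNP.
rng = random.Random(0)
freq_a = [rng.random() for _ in range(1000)]
freq_b = [rng.random() for _ in range(1000)]
freq_c = [0.5 * pa + 0.5 * pb for pa, pb in zip(freq_a, freq_b)]

f3_c, z_c = f3_with_z(freq_c, freq_a, freq_b)  # mixed population as the test
f3_a, z_a = f3_with_z(freq_a, freq_b, freq_c)  # a source population as the test

if __name__ == "__main__":
    print("f3(C; A, B):", f3_c, "z:", z_c)
    print("f3(A; B, C):", f3_a, "z:", z_a)
```

In this toy setup the mixed population gives a negative f3 because its frequencies sit between those of the two sources at every SNP, while putting a source population in the test slot gives a positive value. A positive or near-zero f3 does not rule out admixture, since drift in the test population after mixing pushes the statistic upward.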

Matt said...

@ Sein,
Not to cause any confusion, but I get the impression you interpreted / inferred what I said as thinking there was something about the Harappa project that would cause MA-1 to show a spurious South Asian affinity. That's not the case - I was just interested in seeing as many ancient samples in as many runs as possible.

All of these results make sense, so there isn't any reason why MA1 should be special, especially since his results are in sync with other more robust pieces of data.

Sure, those combinations seem to approximate MA-1 as well as present-day calculators or oracles do.

MA-1 isn't "special", but it is a lot more ancient and a lot more distant from present day people, and the means by which it contributed to present day populations is more obscure, even than Gok2 or SF11.

If formal tree building along the lines of Laz and Skoglund finds that South Asian populations need to have direct or mediated contribution from an ANE population, as Native Americans and modern day Europeans do to represent their distances correctly, then I would be more likely to find a contribution from ANE to South Asia likely.

Even a basic finding that South Asians and Karitiana are closer than they should be based on shared Onge-like ENA ancestry and South Asians' ancient Near Eastern ancestry would be useful. If South Asians are formally closer to Amerinds, net of Onge like Eastern Non-African ancestry in Amerinds and South Asians, then we can begin talking about elevated ANE in South Asians.

Until then, I will probably assume that the relationship is more one of similar composition than contribution of a population similar to MA-1 to present day South Asian populations. But we will wait and see, I'm sure there is a lot of modelling underway on South and West Asian populations at the moment.

Maju said...

"I'm also talking about the f3 results. The software finds and creates pairs by itself, one doesn't do anything. It's all hands free."

Not at all. f3(Test, Ref1, Ref2):

Lezgian; MA1, Palestinian -0.00307633 -7.5444

... means "let's find out how Lezgians score as a function of Ma1 and Palestinians". The tester chooses the samples and the result is totally dependent on them.

"In other words, MA1+Samaritan=Pashtun".

No, not at all.

"The more negative the f3 score, the greater the evidence/extent of admixture."

Yes, this is correct, I understand, but as Lazaridis et al. put it in supp. info 11 of their famous paper:

The f3-statistic f3(Test, Ref1, Ref2) can be significantly negative if the Test population is a mixture of populations related to two reference populations Ref1 and Ref2. It is not necessary that the two reference populations be identical to the admixing ones.

They are just proxies. In the results it is clear that both Pashtuns (particularly) and Lezgians (to a lesser extent, but also) are better expressed as a function of Samaritans+Ma1 than as a function of Palestinians+Ma1. But that's very logical because Samaritans are much more like Northern West Asians than Palestinians are.

But without testing for other reference population pairs we cannot know which is the best fitting pair of references.

If other pairs of references were tested as you say, I'd like to know the scores, particularly for those involving other South Asians or Iranians in the f3 of Pashtuns. Without knowing these other scores, the only thing we can say is that Samaritans work better than Palestinians as a proxy for the non-Ma1 component in Pashtuns and Lezgians. Nothing else.
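The point about proxies can be illustrated with a small sketch. It assumes the simplest form of the statistic, the average over SNPs of (t - r1) * (t - r2) on allele frequencies; real software also applies a finite-sample correction that is omitted here. All allele frequencies and names (`ref1`, `ref2`, `proxy1`, `mixed`) are hypothetical.

```python
def f3(test, ref1, ref2):
    """Naive f3(Test; Ref1, Ref2): mean of (t - r1) * (t - r2) over SNPs."""
    if not (len(test) == len(ref1) == len(ref2)):
        raise ValueError("all populations need the same SNP set")
    return sum((t - a) * (t - b)
               for t, a, b in zip(test, ref1, ref2)) / len(test)


# Hypothetical allele frequencies at five SNPs.
ref1 = [0.10, 0.80, 0.30, 0.60, 0.90]
ref2 = [0.70, 0.20, 0.90, 0.10, 0.40]
# A population related to ref1 but not identical to it.
proxy1 = [0.15, 0.75, 0.35, 0.55, 0.85]
# A 50/50 mixture of ref1 and ref2.
mixed = [0.5 * a + 0.5 * b for a, b in zip(ref1, ref2)]

true_sources = f3(mixed, ref1, ref2)      # negative: test is admixed
with_proxy = f3(mixed, proxy1, ref2)      # still negative with a proxy
unadmixed_test = f3(ref1, mixed, ref2)    # positive: ref1 is not a mixture

print(true_sources, with_proxy, unadmixed_test)
```

In this toy setup the proxy still yields a negative value, only a weaker one, which is consistent with the quoted statement that the references need not be identical to the admixing populations.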

Seinundzeit said...


Formal modelling has already been done. I would recommend trying your hand at TreeMix or F4 ratio estimation. In general, f-statistics based work reveals the same exact patterns. You do need an unmediated contribution from MA1 to South Asians in order to account for the results.

I guess I'm coming at this question from a different angle in comparison to yourself. I've seen so much work on this issue, with identical results, that my confidence is fairly strong. Now, you probably don't know what I'm talking about, since you haven't had a chance to view this body of work. Taking that into account, I encourage you to wait for academic work on this question. You'll be quite surprised.


I think the language barrier is posing a problem. This was done on a complete data-set. In other words, this was done on a global data-set, the program chose pairs based on populations across the globe. We didn't force it to choose MA1+Samaritan. There were countless other populations.
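What "the program chose pairs" amounts to can be sketched as an exhaustive scan: compute f3 for the test population against every pair of other populations in the data-set and sort the results. This is a minimal sketch under that assumption, using a naive f3 without the sample-size correction; the population names and allele frequencies are hypothetical, and `rank_pairs` is not a function of any real package.

```python
from itertools import combinations


def f3(test, ref1, ref2):
    """Naive f3(Test; Ref1, Ref2): mean of (t - r1) * (t - r2) over SNPs."""
    return sum((t - a) * (t - b)
               for t, a, b in zip(test, ref1, ref2)) / len(test)


def rank_pairs(test_name, freqs):
    """Score every reference pair for one test population, most negative first."""
    test = freqs[test_name]
    others = [name for name in freqs if name != test_name]
    scores = [(f3(test, freqs[a], freqs[b]), a, b)
              for a, b in combinations(others, 2)]
    return sorted(scores)


# Hypothetical data-set: two sources, a relative of the first, and a mixture.
freqs = {
    "RefA": [0.10, 0.80, 0.30, 0.60, 0.90],
    "RefB": [0.70, 0.20, 0.90, 0.10, 0.40],
    "RefA_proxy": [0.15, 0.75, 0.35, 0.55, 0.85],
}
freqs["Mix"] = [0.5 * a + 0.5 * b
                for a, b in zip(freqs["RefA"], freqs["RefB"])]

ranked = rank_pairs("Mix", freqs)
for score, a, b in ranked:
    print(f"Mix; {a}, {b} {score:.4f}")
```

The ranking still depends on which populations are in the data-set to begin with, so a scan of this kind can only pick the best pair among the references that were actually included.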

About Time said...

@Sein, maybe the ANE in Pashtuns is older and the Samaritan is the later influx.

They have a strange story of their origins that nobody believes because it's "religious" and due to language (as if Indo-Iranian languages were never spoken in northern Canaan/Levant----which Mitanni evidence disproves).

Also the Pashtun language isn't Indo-Aryan; it's Iranian of the Northeastern / Scythian branch. Look up where Scythopolis is located sometime. I've been there and seen the beautiful Hellenistic mosaic of Sol Invictus.

Seinundzeit said...

About Time,

Certainly, the Pashtun origin myth involves Judaic antecedents. The barren hill region sandwiched between Peshawar and Nangarhar was once referred to as "Dasht-e-Yahudi", "wasteland of the Jews". Scholars have tended to doubt the local story, because (just as you mentioned) Pashto belongs to the same Iranic branch as the Scythian languages.

Pashtun ethnogenesis is very complicated, and completely shrouded in mystery. One scholar compared it to searching for the source of the Amazon. But that's to be expected, since Pashtuns have always been too busy killing each other, and thus have never enjoyed the free time to learn how to read and write. ;-)

Chad Rohlfsen said...

Turks are now 6-15% East Eurasian. They are not going to plot as true as modern Cretans or Cypriots.

About Time said...

@Sein, well not exactly Judaic, but close. Rabbinic account is that modern Jews come from mainly the (small and poor) tribe of Judah, plus Benjamin, plus Levi (although some say many Levites were lost "in Babylon") plus remnants of the 10 other tribes that rebelled.

So modern Jewish populations should have a trace of whatever the 10 northern tribes were, but maybe only a little. Just for the sake of argument, what if northern tribes were ANE+Mideast? We do see some ANE in Jewish genomes, for sure.

So it's not the silliest theory. Plus the Pashtun have no reason religiously to claim a negative origin for themselves, both as related to Judeans and as "descendants of early bad apostates" at that. So prima facie, it's unusual, with no clear motive for wanting to claim that "dubious" origin in a Muslim context.

We need to know a lot more about the ancient Mideast, including varying populations in each early city center. There were other cities with Indo-Aryan connections, not just Mitanni. Like Hattusa, some possible others in Syro-Anatolia that I can't think of offhand.

We forget, the Fertile Crescent was the "America" of the ancient world. Lots of people went there for trade, opportunities, adventure, jobs, education, etc. Greece was nothing until much later. Sumer/Babylon was the "big apple" of the day. No surprise if people had links to there and remembered it.

Seinundzeit said...

About Time,

I'd agree with that. The purported Israelite origins of Pashtuns were often used by neighboring ethnic groups to insult them. One source describes them as "a race of savage, cruel, brutish Jews who have lost the ways of their forefathers". So there really isn't much incentive in an Islamic context to claim links with Jewish people, yet Pashtuns have always done this. Pashtun G lineages and Q lineages were often cited as evidence of Judaic links, but I'm not sure if that panned out.

About Time said...

@Sein, one other thing. Jewish tradition is that the 10 northern tribes were very warlike.

Lots of peoples are warlike, but I think it's fair to say the Pashtun have lots of individuals at the high end of the global "warlike spectrum."

There are some other things too, little folk customs that an Israeli woman wrote about. Even physical appearance in some cases it's said.

Seinundzeit said...

About Time,

True, quite a few customs match. Pashtun village mothers place knives near where they've placed their sleeping newborn/young children, to protect them from a Lilith-like evil. This has no precedent in any other Central Asian or South Asian culture. Pashtun villagers also light candles on Friday night, just like Jewish custom. Pashtuns also practice Levirate, and the punishment for adultery is in sync with the Torah, not with the Quran. A lot of Pashtun culture is oddly Levitical, and doesn't match up with neighboring groups in Central and South Asia. And I can vouch for the physical appearance, most Pashtun elders could easily be co-opted into documentaries about Biblical patriarchs, they look like they just rose from the pages of the Old Testament.

The funny thing is that the British noted the warlike Pashtun culture-warlike 10 tribes connection. One British official wrote:

"Anyone who has fought the Pathan can see with great clarity that the blood which flows in his veins is the same blood that flowed in the veins of those fierce men who destroyed Jericho."

Personally, I have no strong opinion on Pashtun origins. We probably will never know, due to a complete lack of written records, and the fact that the region has never been controlled by any empire (so no documentation).

Seinundzeit said...

Here, this is a fairly typical Pashtun elder, I can see why people might think he looks like a Biblical Patriarch.

About Time said...

@Sein, on the flipside, we also now have to deal with evidence of Z93 in Levites, so again the same unexplained connection.

There are even rare examples that look "pseudo European" like this gentleman:

The whole area was once called Ariana. The funny thing is, that sounds a lot like the name of the apostate king. Not the way it's written in English, but the way it's actually pronounced in Hebrew. Yaro'bham means in Hebrew "The People Contend." But in Avestan, Airiiaman / Erman means "a member of the community/tribe" and the name of a person who heals the community.

*Erminaz can also mean "great" in western languages. There is also Airmanareiks (rix is "king," a Celtic loanword). Very different titles, but the functions associated with that type is pretty similar in each case. Organizing tribes against tyrannical taxation in the oldest known eastern example and the latest western example.

Now, how this fits in with ANE or West Asian or even Kalash types of admixture we see in Europe, I have no real idea.

Seinundzeit said...

Yup, this chap is from Khost, right next to the HGDP Pashtun sampling location. His facial features are extremely typical, he would probably constitute a good Pashtun facial average, if only his skin was more golden-wheatish, and if his hair was dark brown-black.

These are some rather deeply-pigmented Pashtuns with identical facial features (these men are very dark for the average):

I'm with you, I have no real idea how these things connect together, but we are all trying to figure things out.

Seinundzeit said...

Just for the fun of it, here are a bunch of old photos, and a sketch:

On the other topic, I wanted to post f3 scores for a bunch of populations, but my friend also made use of 23andMe raw-data from participants. Scores involving them can't really be shown. So, I won't post the actual numbers, and I'll just use broad ethnic labels for participants. Scores are posted in order of most significant f3 score, so the first one is the most significant, and the last one is the least significant.

Uttar Pradesh Brahmins:
BrahminsUP; MA1, North African
BrahminsUP; Ho, Lithuanians
BrahminsUP; Bonda, Lithuanians
BrahminsUP; Bonda, British
BrahminsUP; Bonda, Armenians
BrahminsUP; Bonda, Basque
BrahminsUP; Ho, Lezgians
BrahminsUP; Ho, Armenians
BrahminsUP; Ho, Basque
BrahminsUP; Assyrian, Ho

Burusho; MA1, North African
Burusho; White American, Dai
Burusho; Han, Georgians
Burusho; Han, Armenians
Burusho; Han, Assyrian
Burusho; Assyrian2, Han
Burusho; Dai, Armenians
Burusho; Han, Lezgians
Burusho; Karitiana, North African

Iranian1; MA1, North African
Iranian1; Indian Jatt, North African
Iranian1; Karitiana, North African
Iranian1; Haryana Jatt, North African
Iranian1; Punjabi Jatt, North African
Iranian1; MA1, Samaritians
Iranian1; Yoruba, British
Iranian1; North African, Pulliyar

Iranian2; MA1, North African
Iranian2; South Indian Christian, Bedouin_B
Iranian2; Karitiana, Bedouin_B
Iranian2; Karitiana, North African
Iranian2; Punjabi Jatt, North African
Iranian2; Karitiana, Samaritians
Iranian2; Assyrian, Papuan
Iranian2; Dusadh, Bedouin_B
Iranian2; Pashtun, Bedouin_B
Iranian2; Bihari Brahmin, Bedouin_B

Georgians; MA1, North African
Georgians; Punjabi Jatt, North African
Georgians; MA1, Samaritians
Georgians; Assyrian, Tamil Brahmin
Georgians; Northern European, Assyrian
Georgians; Northern European, North African
Georgians; Assyrian, MA1
Georgians; Baloch1, Sardinian
Georgians; Assyrian, Punjabi Jatt2
Georgians; Karitiana, North African

Pashtun; MA1, North African
Pashtun; MA1, Samaritians
Pashtun; Karitiana, Samaritians
Pashtun; Karitiana, North African
Pashtun; MA1, Turkish
Pashtun; Karitiana, Sardinian
Pashtun; MA1, Kurdish
Pashtun; Karitiana, Armenians
Pashtun; MA1, Iranian2

All of these match academic results.

Maju said...


"I think the language barrier is posing a problem. This was done on a complete data-set. In other words, this was done on a global data-set, the program chose pairs based on populations across the globe. We didn't force it to choose MA1+Samaritan. There were countless other populations."

I don't think there's any "language barrier" here, unless it's you who has problems with English. It's just that what you say now seems to contradict what you said previously:

"This is some very interesting output, courtesy of a friend of mine. The first number is the actual f3 score. The second number is the z-score, which I guess is a quantification of confidence."

You did not mention any "countless populations" tested for nor the results of any other tests, just with Samaritans and Palestinians, which are for sure not the most similar populations to Pashtuns on Earth.

Also the way you phrased it strongly suggests that your friend gave you the data bits and that you don't even understand well how it works (as indicated by your uncertainty about what the z-score means).

Whatever the case, I'd like to see the whole results and not just isolated bits before accepting these claims of yours that sound so outlandish.

Seinundzeit said...


It does seem to be the case that a language barrier is operative at your end, since you don't know what "contradictory" means.

One has a full data-set, and one runs the program on every population in it, together. You don't work via bits and pieces, that defeats the purpose of the exercise. To be somewhat blunt, you don't know what you are talking about. To help you out, read the post above your own, those are the full results.

Maju said...

So are you saying, Sein, that "MA1, North African" is the first scoring combo for all the populations tested for? That's more than a bit unexpected. I remain strongly skeptical.

Seinundzeit said...

Lol, the results are what they are, nothing can be done about that. The North African individual in question is a Moroccan Berber, with 20% Sub-Saharan African admixture. As a result, and due to a lack of ANE admixture for this individual, they act as a good proxy for "ancient Near Eastern" ancestry present in all West Eurasians. All West Eurasian populations scored this individual+MA1 as their top result. But, not everyone scored this individual as their top result. There are more results, but I don't want to bombard you.

For example, here is a South Indian Nasrani Christian's results:

Nasrani; Bonda, Iranian2
Nasrani; Bonda, Samaritians
Nasrani; Kurdish, Bonda
Nasrani; Bonda, Iranian Lur
Nasrani; Bonda, Assyrian1
Nasrani; Kurdish, Paniya
Nasrani; Bonda, Armenians
Nasrani; Kurdish, Pulliyar

No MA1 in this case, and no North African populations/individuals.

Matt said...

@ Sein,
Post wake up in the morning comments: So, like, you let the software pick all populations and MA1+Samaritan is a stronger signal for Pashtun than, say, Georgian+Onge?

That's surprising given how well Pathans fit as a mixture of ANI and ASI -

Re: all the stuff you've seen, sure, I've sort of browsed some threads on forums on this subject, and I appreciate the enthusiasm and technical know-how, but I'm not so impressed with the results that I feel they give firm % estimates.

To clarify, I wouldn't say that ANE contribution to South Asians is not present at all, just that we need to formally account for Onge-related ENA in South Asians (which is present even in groups in Pakistan and Afghanistan) using the tree methodology from Lazaridis, before we can talk about the distance of various South Asian populations from MA-1. I'm sure Nick Patterson or David Reich or Priya Moorjani have something in the works.

For a trivial toy example, say you have an edge of around 30% from Karitiana and 70% from Samaritan to Pathans. Karitiana are around 55% ENA and 45% ANE, so our assumption would be that this would model a contribution of around 13.5% ANE (45% x 30%) and 16.5% ENA (ASI) to Pathans.

Samaritans are probably around 10% ANE, so add 7% ANE to this for a total 20.5% ANE.
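The arithmetic of this toy example can be written out as a short sketch. The percentages are the round numbers from the comment itself (30% and 70% edges, Karitiana at 55% ENA and 45% ANE, Samaritans at 10% ANE); they are illustrative inputs, not estimates, and the variable names are hypothetical.

```python
# Edge weights into Pathans in the toy model.
karitiana_edge = 0.30
samaritan_edge = 0.70

# Assumed composition of the two source populations.
karitiana = {"ENA": 0.55, "ANE": 0.45}
samaritan_ane = 0.10

# Each contribution is the edge weight times the source's own fraction.
ane_via_karitiana = karitiana_edge * karitiana["ANE"]   # 0.135
ena_via_karitiana = karitiana_edge * karitiana["ENA"]   # 0.165
ane_via_samaritan = samaritan_edge * samaritan_ane      # 0.07

total_ane = ane_via_karitiana + ane_via_samaritan       # 0.205

print(f"ANE via Karitiana: {ane_via_karitiana:.1%}")
print(f"ENA via Karitiana: {ena_via_karitiana:.1%}")
print(f"ANE via Samaritan: {ane_via_samaritan:.1%}")
print(f"Total ANE:         {total_ane:.1%}")
```

The total is sensitive to the assumed ANE fraction in each proxy, so changing either input shifts the result proportionally to its edge weight.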

Such an ENA fraction would also be totally consistent with Moorjani 2013, and the level of ANE would be totally consistent with the remaining Ancestral North Indian fraction not being wildly divergent from the North-Central Caucasus.

One reason to think direct ANE cannot be that much higher than in North Caucasus populations is that if a Northern South Asian population had high West Eurasian ancestry *and* almost as high ANE as Karitiana (44%), these would be better proxies for ANE than Karitiana. Which we already know can't be true from the existing modelling and relatedness.

This is all why complex tree models are necessary, it won't be possible to get estimates for contributions from such ancient and subsequently admixed populations without them.

And there is still the question mark over whether this ANE is actually representing a North Eurasia resident population that migrated or a distantly related population to La Brana and MA-1 that lived in West Asia and we don't have a good proxy for yet...

Davidski said...

My bet is around 30% of ANE for populations in and around the Hindu Kush, with just over 30% for some. I reckon that's visible on these PCA, where ASI influence doesn't make the Pathans more ANE-like, but less. Otherwise, the Pathans are a fraction closer to MA-1 than the Lezgins, which means they must have around 30% of ANE.

Matt said...

Hmmm, whether ASI influence makes Pathans more MA-1 like, hard to eyeball, but depends on the relative weight of the PCs. Eigenvector 2 (separating MA-1 from West Eurasians) should be smaller than eigenvector 1 (separating Indians from Northeast Europeans), so moving the Pathan population along the Pathan-Indian cline should decrease overall distance from MA-1 until the combination is parallel on eigenvector 2, after which it will begin to increase again (faster due to positive contributions from distance on both dimensions). Your bet might well come in, or not, I can't call it.

That PCA does look like Pathans don't really have any more MA-1 tilt than a straightforward Indian+Northcaucasus population, but still hard to resolve if the Indian population is Onge(relative)+Northcaucasus.

Maju said...


It is possible I guess that here Ma1 represents "non-africanized" early West Eurasians and, as you say, the North African sample represents the "africanized" ones that may in the end be affecting also other areas like Afghanistan in the Neolithic (or even earlier, as seen in Bra1). I really can't say much more without the whole results, sorry, because what was compared and what results were produced (and also the variance of the scores) is way too relevant to understand the meaning.

Your friend should publish his/her results, even if informally.

About Time said...

Waiting for Farmana and the new Reich Volga paper. Would be cool to see some TreeMix experiments to give these ideas some experimental play in the meantime.

Davidski said...

Yeah, I think it's time for some formal mixture tests. I'll try and set up TreeMix this week and then see if I can finally estimate formally the level of MA-related ancestry among the Kalash and Pashtuns.

Seinundzeit said...


Certainly, the best fit for the Pashtuns involves MA1 or Karitiana. The z-scores are best as well in such cases. Pashtuns are the only population in the region that always get MA1 or Karitiana in the scores. Other groups in the north (after having their most significant f3 score involving MA1) tend to be best fitted as European+Adivasi (Jatts), or West Asian+Adivasi (Sindhis). Balochis are intermediate between Pashtuns and northwestern South Asians. I always put the range at 30% to 45%, but 30% is too low, and 45% is too high. Rather, if I would bet on a precise estimate, I'd say 35% ANE admixture for populations in this region. Since one can model the Pashtuns as mixtures between MA1 and Chechens or Lezgians, I don't quite understand how it can be said that they aren't at least 30% ANE. Whatever the actual estimate, it is going to be higher than whatever Lezgians are at, since you can formally model Pashtuns as having more ANE admixture than Lezgians.


I suppose that's possible.


Awesome! This is going to be fun.

Tesmos said...

David, when can you estimate the MA-related and WHG-related ancestry for the people that did the Eurogenes Genetic Ancestry Detective test?

Matt said...

@ Sein - "Since one can model the Pashtuns as mixtures between MA1 and Chechens or Lezgians, I don't quite understand how it can be said that they aren't at least 30% ANE."

That's an odd result as it doesn't really seem to fit well with David's PCA.

On David's West and South Eurasian PCA plus MA-1, the Pathans look like they *could* be

a) on a cline away from Chechens towards India, not MA-1 (they might be a little off cline in an MA-1 direction, but not very much).

b) on a cline from the Bedouin to MA-1, though this seems a bit worse of a fit.

See -

A North Caucasus to MA-1 cline (mixture) doesn't fit at all.

I'd think the North Caucasus -India mix would be a better fit than Bedouin-MA-1 - essentially 1) as there *actually* are populations in our world that fill out a North Caucasus -India cline and 2) all the populations so far we know of that could have mediated ANE to a Bedouin like population are somewhere on a La Brana - MA-1 cline -, but it is certainly difficult to rule either out.

(It's a similar story on the PC without Indian populations - Pathans look more like Chechens plus generic Onge-like ENA more than MA-1 mixed Bedouin, which they're more off cline for, but it's still hard to distinguish).

Of course, I know PCA aren't formal tests, and I'll await the treemix results with interest.

About Time said...

@Matt, if ANI-ASI models simulate allele frequencies for ANI and ASI meta populations, it must be possible to generate "zombie" ANI and ASI genotypes.

Major question is how those relate to ANE etc in context of TreeMix, shared drift, etc.

In absence of Farmana (not to mention Jiroft, Maykop, Tel Samarra, etc) genomes, zombie ANI and ASI should tell us a lot.

Also, nobody has looked at whether MA1 carried more Neanderthal (or different types of Neanderthal) than La Brana, Gokhem, Stuttgart, etc.

Seinundzeit said...


"Of course, I know PCA aren't formal tests...".

I guess you pretty much settled any possible concerns you had, just by acknowledging this fact. It's fine and dandy to hypothesize concerning possible fits. Still, the fact remains that we have to deal with actual fits, combinations that the software does find in reality. And "Pashtun; MA1, Chechen" is one of them.

It is going to be a lot of fun to see David work with TreeMix, I certainly share your feelings on this.

Davidski said...


I haven't figured it out yet, but I'm sure I will. When I do I'll send out an update.

I'm going through all the e-mails now. I had about 50, and it was just too daunting to answer them all.