- Karasuk outlier RISE497 (the most eastern Karasuk individual) is surprisingly important for Near Eastern populations
- Nordic LNBA and Sintashta look very similar in terms of overall ancestry proportions, suggesting that they perhaps derive from the same ancestral population
- The effects of postmortem deamination or DNA damage appear to be expressed in many of the non-UDG treated ancient samples as minor Sub-Saharan admixture
Can anyone put together a better model for West Eurasians? Also, I'd really like to see a well thought out D-stats/nMonte analysis of South Central Asia.
See also...
Yamnaya = Khvalynsk + extra CHG + maybe something else
D-stats/nMonte open thread #2
Friday, March 11, 2016
D-stats/nMonte open thread
I'll start the ball rolling with a 9-way mixture analysis of 93 European, Near Eastern and Central Asian present-day and ancient populations. The relevant datasheet and R script are available here and here.
Below is a simple tree/cluster analysis based on the results, using the freely available Past3 software. Makes perfect sense, I'd say.
It's important to understand that these sorts of tests are basically designed to estimate ancient ancestry proportions, rather than calculate minor admixtures. With that in mind, here are a few observations:
228 comments:
Wow, very cool.
Some questions I have:
What is the proxy for ASI-like ancestry here in your opinion?
How come Azeris and Iranians score so much in SSA components?
What's up with Atayal scores among Spaniards, Ashkenazis and other similarly Western Eurasian folks?
What is the Karasuk capturing here?
Yes, very nice.
The question of SSA is interesting. Are modern Near Easterners really more basal than Anatolia Neolithic or is it just that they have SSA admixture? Is SSA here a substitute for more Basal Eurasian? Lebanese or Iraqi_Jew get a very good fit based on Anatolia Neolithic by adding a good amount of SSA.
Also it's a good question about Atayal in Southern Europe. West Sicilian 4.35%? It's like the Dai I was seeing in Spanish_Extremadura.
For South Central Asians we probably need an outgroup that makes a better distinction between East Asian and ASI. Maybe Papuan? But anyway we also need an ANE sample that is better than MA1. Selkup works good, but it has a lot of East Asian.
In Reich lab's modeling EHG works just as well for Native Americans as MA-1, if not better, so wouldn't that do for S-Asians?
The dendrogram looks true to life. To follow Alberto, I'd strongly suspect that the Middle East shifted toward SSA and Red Sea with the expansion of Afro-Asiatic in the Near Eastern EBA.
Does anyone know of any pre-Sumerian Iraqi remains being tested and from which culture?
@Shaikorth
Yes, EHG works, but for some reason (too high WHG affinity?) the Kalash preferred Selkup (see the models I posted at the end of the previous thread). I haven't been able to test much yet, though, so we'll see with other populations.
(Well, I was going to respond to the previous thread, but then I saw this post!)
I ran some middle easterners / southern euros using a 6way mix of Anatolia_Neolithic, Yamnaya_Samara, Caucasus_HG, Mozabite, Loschbour_WHG, and Dai, distances < .005.
Ideally some sort of ancient Middle Easterner would replace CHG + Mozabite.
https://www.dropbox.com/s/3jpvlrf3uw0yc85/nMonte_6mix_1.csv?dl=0
For whatever it's worth, Dravidian_India fits best as Ust_Ishim + Dai + CHG + EHG. And it fits better as Ust_Ishim + Ust_Ishim + CHG + EHG than it does as Dai + Dai + CHG + EHG (based on what I tried in 4mix).
In addition, South Central Asians fit better as CHG + Karelia_HG + Anatolia_Neolithic + Ust_Ishim than they do as CHG + Karelia_HG + Anatolia_Neolithic + Dai.
This also goes for West Asians, Caucasians, and southeastern Europeans.
I think Ust_Ishim is acting as a proxy for a third Eurasian branch which contributed a lot of ancestry to South Asia, and varying amounts to Central Asia, West Asia, the Caucasus, and southeastern Europe.
And if it isn't a third Eurasian branch distinct from both ENA and K14/ANE/EHG/WHG, it could perhaps be a very divergent West Eurasian meta-population that contributed a lot of ancestry to South Asia, and varying amounts to Central Asia, West Asia, the Caucasus, and southeastern Europe.
Either those two possibilities, or Iranians are 17% Dai, which makes no sense (I tried to model Iranians using Dai, and that's the percentage I got, while Pashtuns turned out 25%, both much too high).
Here are some of the models in question:
Dravidian_India
[1] Target = 46% Ust_Ishim + 24% Dai + 23% Caucasus_HG + 7% Karelia_HG @ D = 0.0067
[1] Target = 49% Ust_Ishim + 21% Ust_Ishim + 15% Caucasus_HG + 15% Karelia_HG @ D = 0.0310
[1] Target = 16% Dai + 34% Dai + 50% Caucasus_HG + 0% Karelia_HG @ D = 0.0453
Pashtun
[1] Target = 35% Ust_Ishim + 29% Caucasus_HG + 22% Karelia_HG + 14% Anatolia_Neolithic @ D = 0.0146
[1] Target = 43% Caucasus_HG + 27% Ust_Ishim + 22% Karelia_HG + 8% Dai @ D = 0.0158
[1] Target = 40% Caucasus_HG + 26% Anatolia_Neolithic + 25% Dai + 9% Karelia_HG @ D = 0.0216
Iranian
[1] Target = 38% Anatolia_Neolithic + 28% Caucasus_HG + 27% Ust_Ishim + 7% Karelia_HG @ D = 0.0042 (this works pretty well for Iranians)
[1] Target = 45% Anatolia_Neolithic + 38% Caucasus_HG + 17% Dai + 0% Karelia_HG @ D = 0.0229
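For anyone wondering what the "D = ..." figure at the end of each model line measures, here is a minimal sketch. It assumes the fit is scored as the Euclidean distance between the target's row of D-stats and the weighted average of the source rows. That is my reading of how these mixture scripts score a model, not a copy of the 4mix or nMonte code, and the three-column rows are made-up placeholders rather than values from the datasheet.

```python
import math

def mixture_distance(target, sources, weights):
    """Euclidean distance between a target row and a weighted mix of source rows."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ncols = len(target)
    # Weighted average of the source rows, column by column.
    mixed = [sum(weights[name] * sources[name][i] for name in weights)
             for i in range(ncols)]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mixed, target)))

# Hypothetical D-stat rows (placeholders only, not real data).
sources = {
    "Ust_Ishim":   [0.010, 0.020, 0.030],
    "Caucasus_HG": [0.030, 0.010, 0.050],
}
target = [0.020, 0.015, 0.040]

d_even = mixture_distance(target, sources, {"Ust_Ishim": 0.5, "Caucasus_HG": 0.5})
d_single = mixture_distance(target, sources, {"Ust_Ishim": 1.0})
```

Under this assumption a smaller D simply means the weighted mix of sources lands closer to the target in D-stat space, which is why the models above are ranked by it.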
An interesting detail, if you look at the Pashtun model with 35% Ust_Ishim, it matches TreeMix!
TreeMix tends to have Pashtuns as 65% LN/EBA European + 35% of something which is intermediate between West Eurasia and East Eurasia. And here, Ust_Ishim seems to fit that 35% rather well (as it does for West Asia and the Balkans), while the West Eurasian ancestry proportions resemble Bronze Age steppe populations.
I think this explains why TreeMix mistakenly had South Central Asians at 60%-70% Sintashta/Andronovo, even though Pashtuns are actually around 30%-35%, the Kalash are around 35%-40%, and the Pamiri peoples are around 55%-60%.
@ Alberto, yeah, in these stats, some extra African is not really distinguishable from Basal Eurasian. The relatedness to Yoruba only varies by 0.0043 from maximum to minimum and only by 0.0026 when some ancient outliers are discounted. There's a slight trend there in relatedness to Yoruba (Yamnaya types less, recent Mediterranean and Middle East more), but almost flat.
Esan's D-stat sharing with Yoruba is barely higher than the Eurasians', and the Masai sample is not significantly different, so a proportion of ancestry from them pretty much behaves like a Basal Eurasian would be expected to (reduces relatedness to Eurasians, while leaving relatedness to Yoruba flattish) in its major effects.
Re: Atayal, might be some consequence of a lack of free varying EHG proportion. It's clearly getting placed in those populations because, without it, their relatedness to East Asia is lower than what the rest of their ancestry would predict. EHG is more related to East Asia than Loschbour is (as CHG is compared to Anatolia_Neolithic), so it might have something to do with the method groping for that ancestry.
@ Sein, perhaps it could just be that the Ust-Ishim+Dai proportion together is teaming up to model something which is on the ENA branch, but a relatively early split that's not exactly East Asian? Rather than a third split that then got reabsorbed into East Asian and West Eurasian branches. On a related note I also found with the extra stats David gave with BedouinB as a column and row, that South Asia tends to prefer BedouinB+Dai+CHG+EHG to Anatolia_Neolithic+Dai+CHG+EHG.
http://i.imgur.com/GZSpSH4.png
I don't understand how you got good fits for West Asians using Caucasus_HG and Anatolia_Neolithic. In every other attempt I've seen the fits are bad.
@Shaikorth
"In Reich lab's modeling EHG works just as well for Native Americans as MA-1, if not better, so wouldn't that do for S-Asians?"
I have a test with MA1, WHG, EEF, EEF, EHG outgroups. Native Americans come out 37% MA1 and 0% EHG. EHG's affinity to WHG and EEF outgroups is too high to be an ancestor of Native Americans.
When compared straight up to EHG and MA1, Native Americans don't prefer either, but that's because D-stats do a bad job at noticing little differences in relatedness. Instead of doing that type of test, it is better to have outgroups and see how Native Americans behave as compared to MA1 and EHG. They behave much more similar to MA1.
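For reference, the statistic behind a "straight up" comparison like this can be sketched as follows. It assumes the allele-frequency form of the D-statistic, D(W, X; Y, Z) = sum((w - x)(y - z)) / sum((w + x - 2wx)(y + z - 2yz)). The frequencies are invented and the variable names are only labels.

```python
def d_stat(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies of four populations.

    Positive: X shares more alleles with Z (or W with Y).
    Negative: X shares more alleles with Y (or W with Z).
    """
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in zip(w, x, y, z))
    if den == 0:
        raise ValueError("no informative sites")
    return num / den

# Invented frequencies at six SNPs (0 = ancestral allele, 1 = derived allele).
outgroup = [0, 0, 0, 0, 0, 0]
test_pop = [1, 1, 1, 0, 1, 1]
pop_y    = [1, 1, 0, 0, 0, 1]
pop_z    = [1, 0, 0, 1, 1, 0]

d = d_stat(outgroup, test_pop, pop_y, pop_z)
```

A value near zero says the test population is about equally related to Y and Z, which is the situation described above for Native Americans against EHG and MA1.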
Siberians on the other hand prefer EHG. Most come out as East Asian+EHG+Andronovo. Mansi in particular, who in a recent study were said to be 50%+ MA1, trace their Western blood mostly to Andronovo-types.
The East Asian side of Siberians and Native Americans is also different. Siberians are not the brothers of Native Americans who stayed in Asia (not saying you claimed they were). Instead they're a mixture of various Eastern and Western people who settled Siberia 1,000s of years after Native Americans' ancestors had already arrived in America.
@Matt
Yes, we'll need ancient samples with high Basal Eurasian and no SSA to really know what's going on in modern West Asians.
I tested West Sicilians now with the set that has Karelia_HG (and Dai instead of Atayal) as a test pop and it didn't make a difference:
Italian_WestSicilian
"Anatolia_Neolithic" 61.9
"Caucasus_HG" 16.1
"Karelia_HG" 10.1
"Loschbour_WHG" 4.7
"Dai" 4.5
"Esan_Nigeria" 2.7
distance = 0.002159
To check where it comes from, I added Nganasan:
Italian_WestSicilian
"Anatolia_Neolithic" 61.9
"Caucasus_HG" 16.1
"Karelia_HG" 9.9
"Loschbour_WHG" 4.8
"Dai" 4
"Esan_Nigeria" 2.7
"Nganasan" 0.6
distance = 0.002164
Not much change. Then adding Dravidian_India:
Italian_WestSicilian
"Anatolia_Neolithic" 61
"Caucasus_HG" 14.6
"Karelia_HG" 8.9
"Dravidian_India" 7.2
"Loschbour_WHG" 4.6
"Esan_Nigeria" 2.1
"Dai" 1.6
"Nganasan" 0
distance = 0.002254
No improvement in the score so it's not really a better fit, but it does take Dravidian which seems to confirm that Dravidian (ASI?) ancestry made its way into West Asia and Southern Europe. Though we'd need an ASI-rich outgroup to test this with more certainty.
Matt,
That's certainly a possibility.
Regardless, I'm really interested as to how this sort of ancestry spread in the area between the Balkans and South Asia. Southeastern Europeans are around 5%-10% (other Europeans are pretty much 0%), Caucasians are around 15%-20%, West Asians (Turks and Iranians) are around 25%-30%, and South Central Asians are around 35%-40%. The geographical spread of this sort of ancestry makes me doubt it being just a more divergent ENA. Rather, I think it could be a confluence of a more divergent ENA and Basal Eurasian, or perhaps something quite similar to Ust_Ishim (who is an actual Basal Eurasian, not whatever is involved with CHG and EEF). Since we lack the aDNA, it's all speculative.
I guess the only solid thing we can say is that there is an element that is somewhat distinct from West Eurasia (K14/ANE/EHG/WHG), and which connects populations from South Asia, Central Asia, West Asia, the Caucasus, and the Balkans.
I'd like to look into trying some BedouinB-based fits.
For whatever it's worth, Pashtuns as Andronovo + CHG + Anatolia_Neolithic + Ust_Ishim:
[1] Target = 18% Caucasus_HG + 0% Anatolia_Neolithic + 33% Ust_Ishim + 49% Andronovo_full @ D = 0.0145
As always, it's a better fit than one with Dai:
[1] Target = 35% Caucasus_HG + 17% Anatolia_Neolithic + 24% Dai + 24% Andronovo_full @ D = 0.0213
Krefter, the successful admixturegraph models that included WHG and farmers as well as EHG and MA-1 fit Karitiana's ANE as coming from groups equally related to MA-1 and EHG. This despite EHG's greater relatedness to EEF and WHG. So both fit as long as you go by their models (with Onge-ENA, Basal Eurasian assumed to exist and whatnot).
It might get more complicated than that though, especially in case of South Asians.
This spreadsheet has some results with Europeans modeled as (EEF+WHG mixture in their region 5,000 years ago)+(Steppe admixed people in Europe 5,000-4,000 years ago)+(Modern Middle East).
https://docs.google.com/spreadsheets/d/1FR-fMCZ52nEXdVFl_qquFzN750QerjoB6QrjY2pbeZg/edit#gid=1358112912
MN5=5% WHG, 95% Anatolia_Neolithic.
MN10=10% WHG
MN15=15% WHG
MN20=20% WHG
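A reference row like MN10 can be built as a weighted average of two real rows. This is a minimal sketch, assuming the synthetic references are plain linear mixes of a WHG row and the Anatolia_Neolithic row (the comment doesn't say how they were actually built). The numbers are placeholders, not datasheet values.

```python
def blend(row_a, row_b, share_a):
    """Linear mix of two equally long rows; share_a is the fraction taken from row_a."""
    if not 0.0 <= share_a <= 1.0:
        raise ValueError("share_a must be between 0 and 1")
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    return [share_a * a + (1.0 - share_a) * b for a, b in zip(row_a, row_b)]

# Placeholder D-stat rows (hypothetical values).
whg = [0.040, 0.010, 0.020]
anatolia_neolithic = [0.010, 0.030, 0.020]

# MN5, MN10, MN15, MN20: 5% to 20% WHG, the rest Anatolia_Neolithic.
synthetic = {f"MN{pct}": blend(whg, anatolia_neolithic, pct / 100)
             for pct in (5, 10, 15, 20)}
```

Each synthetic row can then be offered to the mixture script as if it were a sampled population.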
The results look realistic to me. Migrations of admixed EEF/WHG/Steppe people to South Europe had about as much of an impact as did Steppe migrations in North Europe. Between 2800-1000 BC, every region of Europe was mostly repopulated by Eastern and Middle Eastern migration!
Thanks, Dave! Two questions/ notes:
1. Is it possible to replace Masai by Mota to eliminate possible later admix (yDNA T, Malay expansion, Arab trade with E. Afr.)?
2. Fixing the first column would enhance readability of the table.
I deem it useful to add some notes on the Nigerian Esan, for those who want to put their shares in context:
The W. Nigerian Esan are generally linked to the medieval Benin Empire and the IA Nok Culture. Also known as Ishan, they may etymologically be connected to the Ghanaian Ashanti. Some linguists place Esan within the Kwa languages, which also include Akan (Ashanti), but that association isn't universally accepted.
Early Portuguese reports describe the W. Nigerian coast as well agriculturally developed. The wider area is a/o credited with domestication of the oil palm (earliest evidence from Ghana, by 2800 BC appearing in the S. Nigerian pollen record), the Kolanut, and Sorghum (L. Chad region). Not too far away, the first evidence of W. African banana cultivation (a SEA import prior to from 500 BC) has been found. At least since 800 AD, Esanland has evidence of walled, proto-urban settlements of up to 3km diameter. This pre-dates possible Arab influence, the Benin Kingdom also wasn't islamised.
The IA Nok culture from which Esans are believed to have originated is famous for its terracottas (quite unusual for Africa, where wood carving prevails). Sculptures reveal signs of social stratification, and document horse-riding. Unlike the ass, the (wild) horse has never been native to Africa.
https://en.wikipedia.org/wiki/Nok_culture
From ca. 500 BC onwards, the Nok culture shows well developed iron smelting and forging. Intriguingly, this date corresponds to Hanno the Navigator's journey along the West African coast. Carthaginian goods were paid in gold, possibly mined in the Gold Coast (Ashantiland).
https://en.wikipedia.org/wiki/Hanno_the_Navigator
The Nok culture apparently evolved from the Jos Plateau, a major area; many Nok terracottas were actually found in alluvial tin mines. It is yet unknown how far tin mining on the Jos plateau reaches back in time. Bronze-making, technologically well developed (lost-wax mould casting) is only from 1000 AD onwards known in Benin/Nigeria, but there is some IA evidence of tin bead production.
While copper occurs widespread in mineable quantities, tin sources are quite rare, and thus gave rise to the establishment of long-range trade relations during the Bronze Age. Two smaller sources, the Eastern Taurus Mts, and Galicia/ N.Portugal, appear to have covered initial EBA tin demand but quickly became insufficient to satisfy ever growing demand. BMAC probably filled the gap (Bactrian Lapis Lazuli appears in Egypt by 2900 BC), but may have been insufficient to supply the total quantities required.
This puts the Jos Plateau, a relevant tin supplier to Egypt during the 19th century Kano Sultanate, in the focus. Trans-Saharan trade is well evidenced for the Kanem empire (after 300 AD), and previously the Garamantes. The origin of Kanem has been up to a lot of speculation. Some Israelic roots appear to be present, e.g. Bagauda David as first king of Kano, and the local greeting "Shammo" (peace). Some tentative linkage has been made between the early Kanem empire and Assyrians. Kanembu itself, however, is a Nilo-Saharan language, while Chadic (AfrAs) Hausa serves as the region's lingua franca.
In short - the Esan appear to rather be a sink than a source, stemming from the contact zone of Niger-Congo, Nilo-Saharan and Chadic languages, and possibly influenced by maritime contact with Carthaginians, the transfer of banana out of SEA to W. Africa, and Trans-Saharan interaction with Libya, Egypt and beyond.
@George Okromchedlishvili
What is the proxy for ASI-like ancestry here in your opinion?
How come Azeris and Iranians score so much in SSA components?
What's up with Atayal scores among Spaniards, Ashkenazis and other similarly Western Eurasian folks?
What is the Karasuk capturing here?
Really hard to say right now. We still need more ancient DNA, especially from the Near East and Southern Europe.
The Atayal is often a signal of ancestry from South and East Asia, but often it represents something that is missing. The same can be said about the African and Karasuk ancestry proportions, although I think the Karasuk stuff might be linked to Scythian and early Turkic influence.
By the way, in the nMonte script I changed this...
# do 1000 cycles
To this...
# do 15000 cycles
And it did make a difference by tightening up some of the results. Or am I just imagining things?
A few, unsystematic observations:
1. Afanasievo displays some 9% CHG excess over Yamnaya. Considering the structure of what Chinese archeology terms the "Afanasievo package", namely arsenic bronze, wheat and sheep, a predominantly Caucasian rather than EHG root of Afanasievo makes sense.
This also indicates, however, that
(a) Afanasievo and Yamnaya aren't directly related, but rather two expressions of the same phenomenon, namely a CA expansion out of the Caucasus that "collected" EHG ancestry on the way; and
(b) those expanding CA Caucasians picked up less EHG en route to North Central Asia/ Western Mongolia than their relatives travelling up the Volga to (and beyond) Samara, even though uniparental markers signal substantial EHG presence at Lake Baikal (Lokomotiv etc.) already since the 6th millennium BC. In conclusion, EHG appears to have been far more widespread to the west than to the east of the Urals, and seems to essentially be a EuropHG, i.e. WHG/SHG-related phenomenon.
2. I have noted slight traces of SSA (Esan, Masai) ancestry in BB Germany. They are absent from most of the other "ancients" except for Nordic_LNBA and Armenia_BA, so we might be talking a "real" indicator and not just a deamination artefact here.
SSA traces are of course present in the NE, strongest among Bedouins, where Esan and Masai shares tend to roughly assume the same value. In the Mediterranean (Greeks/Maltese/Spanish etc.), Esan shares typically dominate at some 10:1 ratio against Masai. Conversely, the cryptic SSA ancestry in BB Germany is predominantly Masai (9:1 vs. Esan). A similar structure, i.e. Masai far above Esan admix, is found with Kumyks, Abkhasians, Chechen, Turkmen, Yamnaya Kalmykia, Adygei, N. Ossetians, Nordic LNBA, as well as, interestingly, Aragonese and Baleares.
I am well aware that we are probably far beyond the level of statistical significance here. Nevertheless, this seems to indicate post-CWC (no SSA admix) population movement from the Caucasus into Germany BB and Nordic LNBA. Or did that SSA admix arrive from Aragon? Not unthinkable either, considering that BB may have had Iberian roots. In any case, the Aragonese "Masai-lean", when most of the Mediterranean and Iberia leans Esan, requires explanation. I'd love to see respective stats for Galicians and North Portuguese - tin and copper mines there could have attracted specific settlers.
LN/BA Ancestors of Europeans
https://docs.google.com/spreadsheets/d/1RNn1pmYeHcSvFJj-5d38c6810KbssAXweZFnzZOwFjI/edit#gid=0
Every inch of land in Europe (except maybe Sardinia) was mostly repopulated after 3000 BC. This includes East Baltic and Finland. Steppe/EEF/WHG admixed populations basically completely repopulated the Northern edges of Europe.
If you think about it, Northern Europe was repopulated 3 times. First after the Ice age, second in Neolithic, and third in the Bronze age, and fourth in 2016 AD by Middle Easterners....(oops).
Another observation: Here are some of today's populations, ranked according to their Yamnaya share (numbers indicate the absolute rank among current pops included, Non-IE speakers marked bold, * indicates Non-IE during the IA):
1. Estonians 54%
2. Ukrainian East 51%
3. Belarusians 49%
4. Lithuanians 49%
5. Russians Kargopol 49%
6. Mordovians 48%
7. Icelandic 48%
8. Finns 47%
9. Norwegians 47%
...
12. Hungarian 43%
...
20. Lezgin 31%
21. Chechen 28%
22. Spanish Castilla La Mancha 26%
23. Greek 1 26%
24. Spanish Pais Vasco 25%
25. Albanian 25%
26. Basque French 25%
27. Italian_Tuscan* 25%
28. Spanish_Valencia* 24%
29. Spanish_Andalucia* 24%
30. Basque_Spanish 23%
...
43. N.Ossetian 18%
44. Turkmen 18%
45. Uzbek 18%
46. Greek 2 16%
47. Iranian 16%
48. Azeri_Baku 16%
...
52. E. Sicilian 13%
53. Turkish 11%
54. Cypriot 11%
55. Georgian 10%
...
61. Armenian 5%
...
70. Bedouin B 1%
71. Yemenite Jew 0%
I could run a formal analysis, but visual inspection already makes obvious that there isn't any even remotely statistically significant relation between Yamnaya and IE. Things are obviously, as always, more complicated. Whoever comes with "elite dominance" is invited to demonstrate this on the Estonian case (cf. Estonian CWC aDNA); Basques vs. Galicians, Ossetians, Iranians, and, of course, E. Sicilians (Siculi) also make up for interesting studying objects.
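For what it's worth, a formal check of the kind mentioned here could look like the permutation test below. It is a rough sketch under loud assumptions. The shares are the ones listed above, but the IE / non-IE flags are my own labelling by present-day majority language (the bold marking did not survive in the list), Spanish Pais Vasco is left out as ambiguous, and the list is only an excerpt of the ranking, so whatever comes out describes this excerpt and nothing more.

```python
import random

# (population, Yamnaya share in %, IE-speaking today?) -- flags are assumptions.
rows = [
    ("Estonians", 54, False), ("Ukrainian East", 51, True), ("Belarusians", 49, True),
    ("Lithuanians", 49, True), ("Russians Kargopol", 49, True), ("Mordovians", 48, False),
    ("Icelandic", 48, True), ("Finns", 47, False), ("Norwegians", 47, True),
    ("Hungarian", 43, False), ("Lezgin", 31, False), ("Chechen", 28, False),
    ("Spanish Castilla La Mancha", 26, True), ("Greek 1", 26, True),
    ("Albanian", 25, True), ("Basque French", 25, False), ("Italian_Tuscan", 25, True),
    ("Spanish_Valencia", 24, True), ("Spanish_Andalucia", 24, True),
    ("Basque_Spanish", 23, False), ("N.Ossetian", 18, True), ("Turkmen", 18, False),
    ("Uzbek", 18, False), ("Greek 2", 16, True), ("Iranian", 16, True),
    ("Azeri_Baku", 16, False), ("E. Sicilian", 13, True), ("Turkish", 11, False),
    ("Cypriot", 11, True), ("Georgian", 10, False), ("Armenian", 5, True),
    ("Bedouin B", 1, False), ("Yemenite Jew", 0, False),
]

def mean(values):
    return sum(values) / len(values)

def permutation_test(rows, n_perm=10000, seed=1):
    """Two-sided permutation test on the IE minus non-IE gap in mean share."""
    shares = [share for _, share, _ in rows]
    labels = [is_ie for _, _, is_ie in rows]

    def gap(flags):
        ie = [s for s, f in zip(shares, flags) if f]
        non_ie = [s for s, f in zip(shares, flags) if not f]
        return mean(ie) - mean(non_ie)

    observed = gap(labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # break any link between language and share
        if abs(gap(shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

observed_gap, p_value = permutation_test(rows)
```

A large p-value would be in line with the visual impression of no clear relation. A small one would mean the two groups differ on average in this excerpt, which still says nothing about cause.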
@Krefter: Interesting approach, though it could do with a bit of fine-tuning (e.g. basing the Italian scenario on the Remedello aDNA). As concerns Northern Europe, I question your 3000 BC baseline. I think 35% HG (with SHG probably the best proxy) by 3000 BC is far more realistic for Northern Germany and Poland, given the uniparental (mtDNA) information available for 3500-3000 BC. Scandinavia is probably better approximated by Pitted Ware than Skoglund's FB data that, for all archeology tells us, comes from a short-lived, aborted Michelsberg colonisation attempt.
I think one will need to add the MBA-LBA transition, i.e. the 1200 BC "bronze age crisis", which had major effects on Italy (Urnfield-derived Proto-Villanova replacing Terramare), and is also clearly visible when contrasting Hungary_CA to Hungary_BA, to the list of repopulation events.
2016 is nuts. If at all, it's 2015, but we are talking of some 1.5% fresh immigrants compared to the existing population in Germany, a bit more in Sweden and Malta, 1% in Austria, 0.6% in Norway and Finland, 0.3-0.4% in Benelux and Denmark, around 0.05% in Poland and the UK, 0.01-0.02% in the Baltics, Czech Republic and the Balkans. France may get above their current 0.1% in 2016 (they have a great way to assume responsibility as former mandatory in Syria, almost at par with the UK as concerns Iraq).
Aside from the NE (Syria/Iraq), the immigration also comes from Central Asia (lots of Afghans, quite some Pakistani) and the Horn of Africa (Eritreans, who prefer Scandinavia).
Otherwise, there is a lot of truth in your remarks.
@FrankN
Afanasievo and Yamnaya aren't directly related, but rather two expressions of the same phenomenon, namely a CA expansion out of the Caucasus that "collected" EHG ancestry on the way.
So Caucasian women migrated out into the steppe, picked up EHG husbands, and expanded with them across Eurasia.
Very funny Frank.
@Dave: Caucasian metalworking women in Afanasievo? Or have I been misreading your analysis?
Frank,
You need to calm down. The extra ~9% of CHG in Afanasievo doesn't mean it's not closely related to Yamnaya Samara, just as the extra ~6% CHG in Yamnaya Kalmykia doesn't mean it's not closely related to Yamnaya Samara.
These are very closely related populations showing very little variation, and what little variation they do show is simply caused by occupying or coming from somewhat different parts of the Pontic-Caspian steppe.
And as for that claim of Sub-Saharan admixture in Bell Beakers. Pft.
FrankN,
1.5% per annum immigration in a mass population nation such as Germany, which has exhibited negative birth rates for at least two generations, is a very big deal.
Someone, I forget who, computed the figures and came to the conclusion that entire age cohorts of younger generations in Germany will be dominated by the newcomers.
@Frank,
Funnel Beaker Germany and Sweden are about 20% WHG according to these D-stats. I find it unlikely many had 35% WHG. East Baltic was hunter gatherer till Corded Ware times, and is why East Baltic today has excess WHG, that Funnel Beaker can't explain.
I've modeled Lithuanians as SHG+MN(20% WHG)+Yamnaya, and the Yamnaya score doesn't go down very much. They have some but not a lot of Baltic Hunter gatherer ancestry. My guess would be 10-15%. Most of their EEF/WHG ancestry should be from outside of the Baltic region. So, we're certainly looking at massive population replacement in East Baltic after 3000 BC.
@Frank,
Ancient mtDNA shows that populations in Western Siberia around 3000 BC were a mixture of EHG and East Asians. Most of their mtDNA was typical for modern Siberians (who have lots of EHG mtDNA too). So, Afanasievo doesn't appear to have local Siberian ancestry. It makes much more sense they formed in Russia with Corded Ware and Yamnaya.
@Davidski,
"So Caucasian women migrated out into the steppe, picked up EHG husbands, and expanded with them across Eurasia"
Yamnaya had a big chunk of EHG mtDNA. While, in Latin America, it's almost impossible to find Spanish mtDNA. So, the admixture was at least not as sex-biased as in Latin America.
Yamnaya had a big chunk of EHG mtDNA. While, in Latin America, it's almost impossible to find Spanish mtDNA. So, the admixture was at least not as sex-biased as in Latin America.
Wrong comparison, because the steppe wasn't colonized by Caucasians.
The appearance of Caucasian genome-wide DNA and mtDNA on the steppe is more comparable to European settlers in North America taking Indian wives.
This happened when the EHG became pastoralists and thus also more mobile, but apparently continued their hunter-gatherer tradition of taking women from nearby groups, probably to avoid inbreeding.
@Dave: There is a difference between "closely related" and "directly related". When it comes to Afanasievo vs. Yamnaya, I have little doubt on the former, but your data gives reason to question the latter.
Afanasievo clearly wasn't a "EHG men pick CHG women" phenomenon, but a migration out of Caucasia. The fact that it included making of arsenic bronze, and also the results from sheep DNA analysis that I have linked previously, point at a relatively quick movement, which is unlikely to have commenced much before 4000 BC, yet reached the Altai by around 3500 BC.
The origin of this expansion is actually somewhat obscure, as it predates Kura-Araxes, which is believed to only have reached the North Caucasus by 3000 BC. The preceding Shulaveri-Shomu culture was still obsidian-based, with little signs of metal processing. Dnieper-Donets is also out of question, as it apparently lacked any of the components of the "Afanasievo package", i.e. arsenic bronze, wheat, and sheep.
Yamnaya has evidence of arsenic bronze and sheep (not sure about wheat, though), so it was apparently influenced by that same Caucasian migration. Whether directly, i.e. including Caucasian metalworkers, or just indirectly (the CHG women taking a bit of arsenic bronze made by their fathers/ brothers with them when marrying EHG men), I leave to your judgement.
In any case, your figures show that in that admixing process of CHG and EHG, more EHG was picked up west, than east of the Urals.
Frank
How would you explain the CHG in CWC ?
It doesn't appear to have been part of the sheep/ arsenic bronze package. In fact, (although the state of research isn't too developed), I think CWC retained the older pure copper smithing ?
It is impossible for all CHG in the Steppe to be from women, unless 100% of Yamnaya mtDNA was CHG. A lot was from CHG man/EHG woman pairings, because so much of their mtDNA was still EHG.
Your theory has to account for CHG men moving into EHG tribes further North. Or maybe women always go to the man's tribe? I don't know. There are lots of different circumstances that could have led CHG and EHG to mix.
The Copper age Samara genomes are the best evidence that CHG tribes didn't migrate into the Steppe, but instead CHG individuals did. Because, instead of seeing pure-bred immigrants straight from Anatolia like we do with LBK or Steppe like we do with Corded Ware, we're seeing a small amount of CHG and different amounts in different individuals. Unless CHG men were looking for work in a factory, it's more likely women were the ones who moved into new tribes. In that case sex-biased admixture makes sense.
@ Davidski
ABOUT nMONTE
If I understand you correctly you have just changed a comment line. That cannot change the results of the calculation. I think you have witnessed an imperfect convergence.
nMonte does a Monte Carlo simulation and it has a finite convergence. So if you run the tool twice on the same data, you will get two slightly different results.
The algorithm starts with a batch of randomly sampled items. Next it replaces the batch items with randomly chosen test items.
If the introduction of a test item improves the composition of the sample, it is maintained; otherwise it is dropped.
The algorithm starts an evolution in which a random batch morphs in the direction of the targeted population, without ever reaching the limit of complete identity.
The more iterations in the process, the closer you get to this limit; however the price is a longer run time.
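For readers who want to see the idea in code, the steps just described can be sketched like this. It is a Python paraphrase, not the nMonte R script. The names `batch_size` and `n_cycles`, the Euclidean distance, and the strict "keep only if better" rule are assumptions based on the description above, and the two source rows are toy values.

```python
import math
import random

def nmonte_sketch(target, sources, batch_size=200, n_cycles=50, seed=0):
    """Hill-climbing Monte Carlo fit of a target row as a mix of source rows.

    Returns ({source name: share of the batch}, final distance).
    """
    rng = random.Random(seed)
    names = list(sources)
    ncols = len(target)
    # Start with a batch of randomly sampled source items.
    batch = [rng.choice(names) for _ in range(batch_size)]
    sums = [sum(sources[n][i] for n in batch) for i in range(ncols)]

    def distance(col_sums):
        return math.sqrt(sum((s / batch_size - t) ** 2
                             for s, t in zip(col_sums, target)))

    best = distance(sums)
    for _ in range(n_cycles):
        for slot in range(batch_size):
            candidate = rng.choice(names)  # randomly chosen test item
            old = batch[slot]
            trial = [s - sources[old][i] + sources[candidate][i]
                     for i, s in enumerate(sums)]
            d = distance(trial)
            if d < best:  # keep the test item only if the fit improves
                batch[slot], sums, best = candidate, trial, d
    shares = {n: batch.count(n) / batch_size for n in names}
    return shares, best

# Toy example: the target sits a quarter of the way from A to B.
toy_sources = {"A": [0.0, 0.0], "B": [1.0, 1.0]}
shares, dist = nmonte_sketch([0.25, 0.25], toy_sources)
```

In this sketch the smallest share a source can get is 1 / batch_size, so the batch size sets the resolution of the output, while the number of cycles controls how close the batch gets to its best reachable composition.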
So, being a simulation, nMonte introduces some noise in the results.
If the number of iterations is large enough, this method-noise is small compared to other sources of noise.
I did not measure the magnitude of this method-noise, but I am confident that it is acceptable.
However, one should be wary of items that surface only a single time in the batch. These might be just test-items that were not yet weeded out.
With the present parameters of the algorithm these spurious test-items get the minimum value of 0.05% in the output.
Of course this minimum value may also refer to a small but real component. My suggestion: cut them, better safe than sorry.
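The cutting suggested here is easy to do after the fact. This is a small sketch, assuming the output is a table of percentages and that the remaining shares should be rescaled to 100% (the rescaling is my addition, not something the script is said to do). The example percentages are invented.

```python
def prune_minimum(shares, min_share=0.05):
    """Drop components at or below min_share (in %) and rescale the rest to 100%."""
    kept = {name: pct for name, pct in shares.items() if pct > min_share}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("nothing left after pruning")
    return {name: 100.0 * pct / total for name, pct in kept.items()}

# Invented output with one component stuck at the minimum step.
raw = {"Anatolia_Neolithic": 61.9, "Caucasus_HG": 38.05, "Nganasan": 0.05}
pruned = prune_minimum(raw)
```

Since a share at the minimum may also be a small but real component, it is worth re-running the model a few times before dropping it for good.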
It is preferable to get as many iterations as you can get, but I think we hit the limit of practical run time.
I have already made some improvements in the performance of the code, I picked most of the low hanging fruit.
A real performance boost seems only possible if somebody ports the code to C (but not me).
By the way, thanks for your post. I enjoyed it. Too bad I cannot put my own Dstats in.
Frank,
Here's a tree based on the raw D-stats that includes Yamnaya Samara. As you can see, Afanasievo, Poltavka and Yamnaya are essentially the same people.
https://drive.google.com/file/d/0B9o3EYTdM8lQMldMR3BxS0s3MUU/view?usp=sharing
Krefter,
The most heavily CHG admixed Yamnaya individuals are the youngest ones from Kalmykia. But even they're not showing any Y-DNA J. In fact, one of them is I2.
So if this process of female endogamy continued, things would eventually get to a point where the Kalmykia steppe groups were 90% CHG with no CHG Y-DNA, but even then you wouldn't have an argument.
huijbregts,
Yep, I was pretty sure it was a comment line, and yet the results changed. That's why I asked, because I thought I was going a bit crazy.
Anyway, it's not possible to put our D-stats in just yet, but might be in the near future.
Re: nMonte, I think that actually since we only use 4-8 source populations even 1000 cycles is too many. I reduced that to 100 (line 14: Ncycles = 100 #cycles of simulation) and I get 99% of the accuracy while being 10 times faster (the steps go from 0.05% to 0.1%, which is more than good enough and reduces the combinations by 2).
"Here's a tree based on the raw D-stats that includes Yamnaya Samara. As you can see, Afanasievo, Poltavka and Yamnaya are essentially the same people."
Indeed. This tree also nicely unites IE-speakers, Uralic speakers and Turkic speakers with the exception that Mediterranean IEs are heavily neolithicised, Siberian Uralics Siberianised and Caucasian IEs and Turkics caucasianised and, of course, Indian IEs indianised.
Therefore, I can stick to my idea that Yamnaya represents a kind of a Nostratic language, and even more so considering Frank’s comment that “visual inspection already makes obvious that there isn't any even remotely statistically significant relation between Yamnaya and IE”.
@Krefter: As I have said several times before - FB Sweden isn't representative. It comes from an area without previous Ertebolle presence, namely the strip of land between Vättern and Vänern lakes that was partly flooded during most of the Mesolithic before the meltdown of Scandinavian glaciers started to uplift the whole peninsula. The data represents a colonisation attempt by the Michelsberg culture (those that also neolithicised Britain) that was given up again by 3100 BC, when the area returned to HG (Pitted Ware) practice.
We neither have autosomal aDNA for "mainstream" Nordic FB, e.g. the previous Ertebolle area, nor Western FB (Netherlands/ Lower Saxony), nor Polish FB, nor late central FB aka Bernburg Culture. But we have uniparental data from Blätterhöhle (Western FB), Ostorf (Elbe-Havel FB) and the Bernburg Culture. I guess you have them all in your mtDNA files and can do the counting yourself.
A big misunderstanding, which is unfortunately also present in J. Marcos lists on ancestraljourneys.com, is mistaking the Salzmünde Culture for FB-related. It wasn't; Salzmünde was a Baden offspring and as such rather representative of Carpathian Basin than Central German aDNA. Between 3300 and 3100 BC, there existed a highly militarised border, signified by multiple fortifications and "warrior graves", between GAC/Bernburg (FB) on one side and Salzmünde on the other. It ran approximately along a line Erfurt-Potsdam-Szczecin, cutting just between Bernburg and Salzmünde, which are only 30km apart. Around 3100 BC, Salzmünde was violently destroyed (ash horizon) by that GAC/FB coalition; at the same time Salzmünde ceramics disappeared from the record also elsewhere south of the above-mentioned border. What that meant for the people that used such ceramics before is up to everybody's imagination.
Don't get me wrong - I don't doubt a major change around 3000 BC. Even if we don't have a solid baseline for 3500-3000 BC, CWC carries a number of uniparental markers that were absent before. There is the plague germ found with the Estonian CWC guy, and a lot of indication for depopulation after 3000 BC. Here, for example, some figures taken from the archeological pre-assessment of the planned Fehmarnbelt Tunnel:
FB, southern half of previous Oldenburg/ Holstein county (approx. 400 km²): 283 megalithic graves*), 123 settlement finds, 1560 stray finds
Single Grave (CWC-related), Ostholstein county (approx. 1400 km², including the above-mentioned area): 92 finds in total, mostly stray finds (especially "battle axes" recovered after ploughing).
Even when considering that FB covered 1200 and Single Grave only 600 years, the massive decrease in find density becomes obvious. I'd say - they mostly got killed by the plague, and new EHG and CHG settlers moved into empty lands.
*) Rough estimate: 1 megalithic grave per household of 6 persons ≈ 1,700 people; on 400 km² ≈ 4.25 p/km² during peak times (3500-3300 BC).
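The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted above; lumping graves, settlement finds and stray finds together as "finds", and normalising per km² and per century, are assumptions made here for the comparison. Note also that the Single Grave area includes the FB area, so the two samples are not independent.

```python
# Figures quoted in the comment above (Fehmarnbelt pre-assessment).
fb_graves, fb_settlements, fb_stray = 283, 123, 1560
fb_area_km2, fb_years = 400, 1200
sg_finds, sg_area_km2, sg_years = 92, 1400, 600

persons_per_household = 6  # the commenter's rough assumption

# Population estimate: one megalithic grave per household.
fb_people = fb_graves * persons_per_household
fb_density = fb_people / fb_area_km2


def finds_per_km2_per_century(finds, area_km2, years):
    """Normalise a find count by area and by duration in centuries."""
    return finds / area_km2 / (years / 100)


# Assumption: all FB find categories are summed into one count.
fb_rate = finds_per_km2_per_century(
    fb_graves + fb_settlements + fb_stray, fb_area_km2, fb_years
)
sg_rate = finds_per_km2_per_century(sg_finds, sg_area_km2, sg_years)
ratio = fb_rate / sg_rate

print(f"FB population estimate: {fb_people} people, {fb_density:.2f} p/km2")
print(f"FB finds: {fb_rate:.3f} per km2 per century")
print(f"Single Grave finds: {sg_rate:.4f} per km2 per century")
print(f"FB / Single Grave ratio: {ratio:.1f}")
```

Under these assumptions the normalised find density drops by a factor of roughly 37 between FB and Single Grave, which is the "massive decrease" referred to above.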
Kristiina
I much respect your opinions, but surely you don't think Nostratic actually exists, much less as late as 3000 BC in a (relatively) confined area like Yamnaya?
@Rob Why not?
I checked that Yamnaya is dated 3,600–2,300 which means that Nostratic would have formed c. 3600 BC which is very much in line with dates proposed for Proto-IE and Proto-Uralic.
According to Häkkinen, Proto-Uralic was still spoken in 3000 BC. If I am not mistaken, Davidski has claimed that proto-IE is connected with Repin Culture or at least the origin of R1a1 is there, and Repin Culture is dated 3700-3300 BC. Proto-Turkic is usually considered quite recent and Tungusic and Mongolic may be even more recent. I admit that Dravidian languages are probably more distant and may be connected with the arrival of yDNA J2b to India (3000-4000BC?).
By the way, at the moment my idea is that Yamnaya Samara is connected with proto-Turkic, considering that R1b-M73 is typical of Turkic speaking peoples, including Teleuts, Bashkirs, Kumandines, Balkars, Kazakhs, Uzbeks; and Uralic speaking Selkup Samoyeds.
Kristiina
* "Why not? I checked that Yamnaya is dated 3,600–2,300 which means that Nostratic would have formed c. 3600 BC which is very much in line with dates proposed for Proto-IE and Proto-Uralic."
I'd doubt that for a couple of reasons, although I could be wrong for doing so.
I certainly would agree that FU was still being spoken in 3000 BC, but this wouldn't put the split of FU and IE at 3200 BC (the actual commencement of Yamnaya, not 3600 BC). That's far too late. That's when the earliest PIE - FU loans / secondary contact could have occurred.
* "By the way, at the moment my idea is that Yamnaya Samara is connected with proto-Turkic, considering that R1b-M73"
But there isn't any M73 in the Yamnaya samples.
And the origins of Turkic are clearly toward Mongolia, and in the Iron Age, not the Caspian steppe Eneolithic.
Kristiina,
Not sure if you're being sarcastic or not, but in any case, I think that when looking at the data via a PCA like this, where the dots are Indo-European and presumed Indo-European speaking populations, it's pretty easy to see that modern and ancient genetics gels rather nicely with historical linguistics, which puts Proto-Indo-European on the western steppe.
https://drive.google.com/file/d/0B9o3EYTdM8lQYkwtREtZSXhRc1k/view?usp=sharing
As for Yamnaya being proto-Turkic, well, that's just wrong both in terms of time and space.
Actually, I don't think Samara and Kalmykia Yamnaya were Proto-Indo-European, so we're in agreement there. Considering their Y-chromosomes, I'd say they were something like Proto-Anatolian, while other Indo-Europeans came from Sredny Stog, late Khvalynsk and/or Repin, via Corded Ware, Western Yamnaya, Sintashta etc.
@ FrankN
I don't think your list of Yamnaya admixture proportions invalidates the steppe theory in any notable way, because there is no 1:1 correspondence between language and DNA. Languages can be learned and this by itself doesn't entail DNA change. But we can expect to see at least a trace of the DNA of the proto-people in all peoples speaking languages derived from theirs. And this new spreadsheet is indeed satisfying this desideratum, because now even Armenians have a little bit of Yamnaya DNA. I think the biggest factor that diminished the shares of the proto-people's DNA is the fact that we can expect an original DNA impulse that gradually got weaker while it spread, because the PIE mixed with the substrate people they met, but gave them their language, and then these mixed people spread the language further, and mixed again, and so on, so that in the end there was little ancestry from the original PIE left. The same thing happened to the Germanics in their expansion, and with the Slavs, the Italics, the Turks, and with many other people as well, it's completely natural.
And BTW the Spanish Pais_Vasco are not specifically Basques, AFAIK, rather Spanish speakers in the Basque country.
@ Simon W
"The same thing happened to the Germanics in their expansion, and with the Slavs, the Italics, the Turks, and with many other people as well, it's completely natural."
I really doubt that you can lump all those different examples together into an inverse snow-ball phenomenon. They had very different genetic impacts.
@Davidski I am happy that we are not in any disagreement.
@Rob I do not know where you get 3200 for Yamnaya. According to Wikipedia Yamnaya is dated between 3600 and 2300 BC. In Haak paper, Yamnaya Samara samples are dated between 3300 and 2600 BCE. To my knowledge, Samara hunter gatherer (c. 5600 BC) is a kind of proto R1b-M73. I would see that Proto-Turkic or Proto-Proto-Turkic formed from the contact between Samara hunter gatherer culture and Yamnaya Culture speaking a Proto-Anatolian language, as David proposes.
Bronze Age Altai seems to be yDNA Q, and Yeniseian speaking Kets are the closest population to ancient South Siberians such as Karasuk, which means that Yeniseian languages are the best candidate for a language family spoken in South Siberia. As for Mongolia, I agree that Turkic languages arrived to Mongolia during the Iron Age and that is when Mongolic and Tungusic languages formed. Before that, it looks like people who brought the Han Chinese language to China came from north of North China c. 1000 BC (see "Ancient DNA Evidence Reveals that the Y Chromosome Haplogroup Q1a1 Admixed into the Han Chinese 3,000 Years Ago"). It is significant that Sino-Tibetan languages are closer to Yeniseian languages than to Turkic languages, so I do not see why a language that is structurally quite close to Uralic and IE languages, such as Turkic languages, would have been spoken in an area from where Sinitic languages arrived to China considering that Sinitic languages are completely different (they have tones, a monosyllabic structure and they lack all cases and all conjugation).
As for the IE question, it is highly interesting that in the tree David posted Sintashta is aligned with Nordic LNBA! Sintashta is dated only 2100–1800 BC. R1a1-Z93 was typical of Sintashta and Sintashta carries c. 25% Anatolian Neolithic. Indo-Iranian R1a1-Z93 has also been identified in the Poltavka outlier who was autosomally more western than previous inhabitants. Moreover, Indo-Iranian languages are relatively close to Balto-Slavic languages and are not at all among the most divergent IE languages such as Hittite or Armenian. Xiaohe R1a1 is dated c. 1500 BC. That leaves only Tocharian in the East but Tocharian texts are from the 6th to the 8th century AD, so there are not so many indications of a strong presence of IE groups in Central Asia and Altai before Sintashta 2100–1800 BC.
Kristiina
Sure, 3300 BC / 3200 BC are the earliest Yamnaya, but most of it falls 2900-2600 BC, which is a far cry from 3600 BC.
I do obviously agree with you about the similarities of Balto-Slavic and IA.
Rob,
According to you, is Afanasievo contemporary with Yamnaya or older?
EHG and CHG are both older than Afanasievo and Yamnaya. In fact, they're ancestral to Afanasievo and Yamnaya.
My question is related to archaeological suggestion :) ..
Nirj
There's been some debate, but I think they're more or less contemporary.
How strong does the case for a "Repin origin" look? I think there is also no consensus on its validity?
Repin will look like Khvalynsk overall, depending on what is deemed to be "Repin" and which date the sample is from, but which specific Y DNA haplogroup mix it'll have is a guess at this stage.
I think some Yamnaya groups, such as those currently sampled from Kalmykia and the Don, might be from Repin (otherwise from the Kuban), but I'm not sure about Afanasievo being from Repin.
Thanks for the suggestions :).
I also think Botai, an important piece of the puzzle, is a must to sample. Perhaps Pinhasi already has them.
This Samara Eneolithic dude with Y-DNA Q who got whacked over the head and thrown into a ditch might be a lot like the Botai people.
http://eurogenes.blogspot.com.au/2015/11/the-khvalynsk-men.html
Does anyone know when Pinhasi's group will publish next paper?
Nope, no idea.
You should e-mail them and ask when those Mesolithic and Neolithic genomes from northern Iran are coming out.
Right now it's really hard to tell if Yamnaya spoke IE or not. I think there are good chances they did, but they might have spoken anything else too. For having any certainty about the language, we'll probably need to wait for aDNA from around 1500 BC from the locations where we know they spoke IE: Mycenaean Greece, Anatolia and North India/Pakistan. Then we can try to find what's in common between these people and start to go back in time following the thread.
I was looking at the Bronze Age Armenians, which are close in space and time to where we know IE was spoken:
Armenia_BA
"Anatolia_Neolithic" 41.9
"Caucasus_HG" 39.8
"Karelia_HG" 10.7
"Dai" 5.4
"Esan_Nigeria" 2.2
"Loschbour_WHG" 0
"Yamnaya_Samara" 0
distance = 0.005291
Adding Karelia_HG removes the Yamnaya. Difficult to say why exactly, maybe because Yamnaya adds more WHG. And by the high Dai, it looks like these guys came from the east. Adding a more "pure" ANE sample (less WHG affinity, though too mixed with East Asian):
Armenia_BA
"Anatolia_Neolithic" 43
"Caucasus_HG" 37.2
"Okunevo" 13.7
"Karelia_HG" 4
"Esan_Nigeria" 2.1
"Dai" 0
"Loschbour_WHG" 0
"Yamnaya_Samara" 0
distance = 0.004594
It's just a model, but it does look like they have an eastern origin rather than any Yamnaya admixture.
It does look like they have an eastern origin rather than any Yamnaya admixture.
Certainly not.
They have admixture from the European steppe, probably from a population similar to Yamnaya Kalmykia. It shows in their genome-wide and Y-chromosome dna.
Certainly? Nothing is certain here. Including the Y-DNA origin.
But yes, Yamnaya Kalmykia does take some admixture:
Armenia_BA
"Anatolia_Neolithic" 41.3
"Caucasus_HG" 34.2
"Okunevo" 12.7
"Yamnaya_Kalmykia" 9.6
"Esan_Nigeria" 2
"Dai" 0.2
"Karelia_HG" 0
"Loschbour_WHG" 0
distance = 0.004465
But this still leaves the high ENA and even the SSA to be explained. I see those coming more easily through Iran than from the steppe. Let's see what the next important aDNA paper brings. It should be really interesting for these matters.
Regarding the possible ASI in the Mediterranean, here Cypriot and Greek with the "basic" 6 populations, showing high Dai admixture:
Cypriot
"Anatolia_Neolithic" 63.7
"Caucasus_HG" 21
"Karelia_HG" 6.6
"Dai" 5.4
"Esan_Nigeria" 2.8
"Loschbour_WHG" 0.5
distance = 0.001534
Greek2
"Anatolia_Neolithic" 59.4
"Caucasus_HG" 19.8
"Karelia_HG" 9.3
"Loschbour_WHG" 6
"Dai" 4.4
"Esan_Nigeria" 1.1
distance = 0.001566
Adding Dravidian_India and Kalash:
Cypriot
"Anatolia_Neolithic" 60.7
"Caucasus_HG" 15.8
"Kalash" 14.7
"Karelia_HG" 3.1
"Dai" 2.8
"Esan_Nigeria" 2.3
"Loschbour_WHG" 0.6
"Dravidian_India" 0
distance = 0.001121
Greek2
"Anatolia_Neolithic" 55.9
"Kalash" 16.4
"Caucasus_HG" 13.7
"Loschbour_WHG" 6.2
"Karelia_HG" 5.3
"Dai" 1.1
"Dravidian_India" 0.9
"Esan_Nigeria" 0.5
distance = 0.001294
They both take pretty high Kalash and improve the model. I haven't tried alternatives yet, but it's rather intriguing.
@ Davidski, until now, I didn't spot this post of yours in the other thread:
"Try this sheet with Karelia_HG as a test pop and Samara_HG as an outgroup. And try Hungary_HG instead of Loschbour. You should see more sensible results.
https://drive.google.com/file/d/0B9o3EYTdM8lQSWdTREYxWHJhaDQ/view?usp=sharing"
Thanks for that.
I combined it with the BedouinB stats you gave as well in the other comment thread and ran 4mix models.
You're right that the Hungary_HG sample gives higher WHG, which must be because it doesn't have quite as tight a relatedness to Iberia, relative to Samara, Motala, LBK, etc., as Loschbour does. So substructure in the HGs might affect the outcome, and probably WHG admixture came from all over different parts of Europe.
Using the average of Loschbour and Hungary_HG as WHG, I got this:
http://i.imgur.com/b7gWRiT.png
One of the interesting things I find about it, and I don't know if this is also true with nMonte fits, is that Yamnaya and other steppe groups seem to like a little Anatolia_Neolithic ancestry, although only 16% or so. If correct, that would maybe help make more sense, with archaeology, as I'm aware some people thought from the archaeology that there was some contribution from the Southeast Europe Neolithic to the Yamnaya (although it looks like not a lot of it).
Btw, what do these nMonte Atayal numbers in Europe look like with BedouinB used as a column, or a Palestinian or the like? Esp. with BedouinB as a column in the underlying datasheet as well.
Re: Ust Ishim - I find it interesting that he was equally related to WHG, Han, Karitiana and Papua, but more distant to modern Europeans. I assume this is because of basal Eurasian admixture in Europe? Or is that rather because of African admixture in Europe?
Alberto,
thank you for this interesting post. It is fascinating that Armenian_BA shows such a high affinity to Okunevo. Okunev culture seems to be indispensable when it comes to understanding the Asian IE phenomenon. It is in this culture that we first see much of the socio-religious symbolism that later re-appears among Indo-European speaking populations of South Asia.
Lyudmila A. Sokolova even found early traces of a caste system in the 'Sayan archeological complex' (the merger of Okunevo and Afanasyevo) that, once it was unified, would migrate south. In this complex it was the Okunevo people who wielded socio-religious power and appear to have constituted a type of early priestly caste.
If Okunevo stringently showed a stronger signal across South Asia than the people of the Western Steppe this would be quite significant. It would at least in part explain why the early cultures of Asian Indo-Europeans are so dissimilar from those found in West Eurasia. Perhaps even some of the curious linguistic affinity of IE to non-Uralic North Eurasian languages (Chukotko-Kamchatkan, ...) could be accounted for (though I suspect that the spread of the Indo-European language was, in the Asian case, a purely utilitarian phenomenon of rather negligible significance).
@FrankN
"1. Afanasievo displays some 9% CHG excess over Yamnaya. Considering the structure of what Chinese archeology terms the "Afanasievo package", namely arsenic bronze, wheat and sheep, a predominantly Caucasian rather than EHG root of Afanasievo makes sense."
Or they expanded from the west picking up extra CHG along the way.
Arsenic bronze is the result of a particular copper alloy so if copper workers don't live in a region with that alloy they can't make it whereas if copper workers move into a region which does have that alloy then they can eventually become bronze workers.
#
"I could run a formal analysis, but visual inspection already makes obvious that there isn't any even remotely statistically significant relation between Yamnaya and IE."
Really? Seems to me there's a pretty solid correlation between small populations in mountainous regions not having IE languages - apart from Hungarian which arrived *much* later and could in fact be a counter example vis a vis the relative ease of mounted elites changing the language on flatland.
The main exception to that seems to be Estonian - how swampy did Estonia use to be? That would hinder horsemen also.
#
"Whoever comes with "elite dominance""
Elite dominance in the context of a warrior elite is a double-edged sword e.g. Magyars.
Conquest followed by a long peace might lead to a lot of elite genes - especially male - flowing down into the substrate.
Conquest followed by endless wars with a high enough casualty rate and promotion from the ranks and the elite could get replaced by the substrate - possibly even mostly male substrate and widowed elite.
The historically attested cases show different outcomes depending on circumstances.
#
"If at all, its 2015, but we are talking of some 1.5% fresh immigrants compared to the existing population in Germany"
Off-topic but this is a fundamentally dishonest argument because it ignores the age distribution. They are a far higher percentage of the breeding age population - so it's replacement level immigration.
Why do some Spaniards cluster with Northern Europeans while others cluster with Southern Europeans? What mechanism determines the initial 'split'? Shouldn't the Spaniards be clustering together?
@Matt
Yes, with nMonte the models for Yamnaya are the same:
Yamnaya_Samara
"Karelia_HG" 52.4
"Caucasus_HG" 30.5
"Anatolia_Neolithic" 15.1
"Loschbour_WHG" 2
"Dai" 0
"Esan_Nigeria" 0
distance = 0.00216
Alberto,
I've already explained the SSA. The Eastern Asian will be explained by non-Anatolian/Caucasus ancient DNA from the Near East.
Arch,
Well, the split has to be somewhere. But in these types of trees the long vertical line doesn't indicate high genetic differentiation.
@ Alberto, thanks. I'm having a play with nMonte now, using a calculator file with Anatolia, BedouinB, CHG, Dai, Esan, Karelia, Masai, Ulchi, Ust Ishim, WHG and Yamnaya_Samara, and datasheets with the columns as BedouinB, CHG, Han, Iberia_Chalcolithic, Iberia_Mesolithic, Karitiana, LBK, Motala, Samara and Yoruba.
I haven't tested many populations as the single population file output (unless I'm using it wrongly) takes some time to deal with.
Yeah, I did find that Anatolia_Neolithic fraction in Afanasievo with nMonte as well (47.5% Karelia, 35% CHG, 14.25% Anatolia_Neolithic). Interesting to know if this is modelling a real Anatolia_Neolithic contribution to the EBA steppe populations, before the contributions we already knew about in Sintashta, Srubnaya et al. Like I say, that might help them make more sense in light of archaeology showing some link from the European Early Neolithic to Yamnaya via the Ukraine.
Another thing I'm finding is that with both Yamnaya_Samara as a calculator pop together with Karelia_HG and CHG, there is a definite preference in modern Europeans to prefer using Yamnaya, to Karelia+CHG+AN combinations (with basically zero of Karelia and CHG once Yamnaya is fitted), while the ancient steppe tends to prefer Karelia+CHG combinations to being made up by Yamnaya. This seems like it is probably because variation from the CHG+EHG proportions from Yamnaya is low in modern Europe, so the simpler model is for the software to prefer Yamnaya. While in the steppe, it is "worthwhile" for the software to use CHG+EHG combinations to get it right (to decrease distance). I might find the same with using MN Europeans, I'll have to try.
An exception to that pattern is Hungary_BA though, which seems to want to have a different ratio of EHG:CHG than Yamnaya, so needs an extra dash of Karelia_HG.
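One way to see why the two kinds of model are near-equivalent is to expand a Yamnaya share into its own fitted components. The sketch below uses the Yamnaya_Samara proportions posted earlier in this thread (52.4 / 30.5 / 15.1 / 2); the target fit is a made-up example, and treating the expansion as exact is an assumption, since a real fit will only match approximately.

```python
# Yamnaya_Samara as fitted earlier in this thread (percent).
yamnaya_as_mix = {
    "Karelia_HG": 52.4,
    "Caucasus_HG": 30.5,
    "Anatolia_Neolithic": 15.1,
    "Loschbour_WHG": 2.0,
}

# Hypothetical fit for some modern European target (percent).
fit_with_yamnaya = {
    "Yamnaya_Samara": 40.0,
    "Anatolia_Neolithic": 45.0,
    "Loschbour_WHG": 15.0,
}


def expand_yamnaya(fit, yamnaya_mix):
    """Rewrite a fit by replacing its Yamnaya share with Yamnaya's own components."""
    expanded = {pop: pct for pop, pct in fit.items() if pop != "Yamnaya_Samara"}
    share = fit.get("Yamnaya_Samara", 0.0) / 100.0
    for pop, pct in yamnaya_mix.items():
        expanded[pop] = expanded.get(pop, 0.0) + share * pct
    return expanded


expanded = expand_yamnaya(fit_with_yamnaya, yamnaya_as_mix)
for pop, pct in sorted(expanded.items(), key=lambda kv: -kv[1]):
    print(f"{pop:20s} {pct:6.2f}")
```

In the expanded form the Karelia_HG to Caucasus_HG ratio is fixed at Yamnaya's own ratio. A population like Hungary_BA that wants a different ratio cannot be described by Yamnaya alone, which would be consistent with it needing the extra Karelia_HG.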
I tend to agree with Davidski around what you saw that looks like an ENA pattern in some populations btw - it seems like some populations in the nMonte's I've tried seem to have a preference for BedouinB and ENA combinations, which seems pretty likely to be approximating some non-Anatolian/Caucasus ancient Near East dna.
Haven't done much with South Asia. I found Pathan fitted ("distance% = 0.1279 %") as Caucasus_HG - 28.1, Ulchi - 18.05, BedouinB - 16.65, Karelia_HG - 16.1, Anatolia_Neolithic - 12.25, Ust_Ishim - 5.05, Yamnaya_Samara - 2, Esan_Nigeria - 1.3 (no Dai).
While Dravidian_India fitted ("distance% = 0.01 %") as Ulchi - 34, Caucasus_HG - 19.8, BedouinB - 12.5, Dai - 6.25, Karelia_HG - 5.6, Ust_Ishim - 5.5, Anatolia_Neolithic - 4.35, Yamnaya_Samara - 4.25, Masai_Kinyawa - 3.3, Esan_Nigeria - 2.8, WHG_Average - 1.65. So really a mash up, since it's not too close to any of the column stats in particular.
I suspect Ulchi is preferred over Dai in the above, simply because Ulchi is less close to Han (as the only ENA with Karitiana in the stats I'm using), even though it's closer to Karitiana, and Ulchi doesn't itself have much to do with South Asia.
On another note, I found nMonte seemed to make some strange choices when given too many closely related populations in the calculator (e.g. both Loschbour_WHG, Hungary_HG and the average of both, etc.). Maybe that's because when it finds it hard to choose between populations as improving fit, its evolutionary method (randomly changing population contributions, then checking if fit improves) gets dominated by random factors and has trouble mutating towards a good fit. (Or it could just be I made some errors in the files or something.)
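The "randomly change contributions, then check if fit improves" idea can be sketched as below. This is a generic hill-climbing illustration and not the nMonte code: the function names, the batch size, the toy stat rows and the exact move rule (shift one unit from one population to another, keep it only if the distance shrinks) are all assumptions.

```python
import random


def distance(target, sources, counts):
    """Euclidean distance between the target and the count-weighted source average."""
    total = sum(counts.values())
    squared = 0.0
    for j, t in enumerate(target):
        mix = sum(counts[p] * sources[p][j] for p in sources) / total
        squared += (mix - t) ** 2
    return squared ** 0.5


def fit_mixture(target, sources, batch=200, cycles=5000, seed=0):
    """Randomly move single units between populations, keeping only improvements."""
    rng = random.Random(seed)
    pops = list(sources)
    counts = {p: 0 for p in pops}
    for _ in range(batch):  # random starting batch
        counts[rng.choice(pops)] += 1
    best = distance(target, sources, counts)
    for _ in range(cycles):
        donor = rng.choice([p for p in pops if counts[p] > 0])
        receiver = rng.choice(pops)
        if donor == receiver:
            continue
        counts[donor] -= 1
        counts[receiver] += 1
        trial = distance(target, sources, counts)
        if trial < best:
            best = trial  # keep the change
        else:
            counts[donor] += 1  # undo the change
            counts[receiver] -= 1
    percents = {p: 100 * counts[p] / batch for p in pops}
    return percents, best


# Hypothetical toy stat rows, chosen to be easy to tell apart.
toy_sources = {
    "Pop_A": [0.04, 0.04, 0.00, 0.00],
    "Pop_B": [0.03, -0.03, 0.00, 0.00],
    "Pop_C": [0.00, 0.00, 0.05, 0.02],
}
true_mix = {"Pop_A": 0.5, "Pop_B": 0.3, "Pop_C": 0.2}
toy_target = [
    sum(true_mix[p] * toy_sources[p][j] for p in toy_sources) for j in range(4)
]

percents, best = fit_mixture(toy_target, toy_sources)
print(percents, best)
```

With sources this distinct every bad state has an improving move. When two source rows are nearly identical, swapping a unit between them barely changes the distance, so many different splits fit almost equally well. That is one plausible way closely related populations could produce unstable-looking proportions.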
@Matt
If you think that the evolutionary process may get dominated by random factors, repeat the run a few times.
'Domination by random factors' is by definition not consistent.
@ huijbregts, good point. I tried it again with the closely related populations in the calculator and got no such problems with the target output. I think it was rather errors in how I was putting together the target file when first using the script, and a fair number of similar populations is fine.
@Matt, Davidski
I'm not sure I get your point about ENA and the Near East. Do you mean that ENA is native to the Near East? I don't think you mean that. Rather that it arrived to the Near East around the LN/BA. And that's my point too. It arrived then, from the east, with people high in CHG and ANE (i.e., like Bronze Age Armenians).
Or do you mean that Dai people arrived to Transcaucasia at that time?
@Grey: Non-IE speakers aren't (weren't) only found in mountainous regions - Tuscany (Etruscans), e.g., has quite a lot of plains, and the highest French elevation along the Bay of Biscay (sic!) seems to be the Dune of Arcachon. Moreover, since those mountainous regions have obviously not been immune to Yamnaya-related immigration, you need more than just that topography to explain why some of them (Alps, Carpathians, Balkans) became IEed, and others (Caucasus, Pyrenees) didn't (at least not completely). For the quite mountainous Iran you need a compelling story to explain Iranian languages, which Yamnaya admix doesn't really provide.
Aside from Estonians and Hungarians (the latter are of course a special case), we also have Finns and Mordovians in the top 10 ranks. For Estonia we have CWC aDNA, which seems to demonstrate that the major population turnover occurred by 3000 BC, with general population continuity afterwards. So, if those CWC Estonians didn't speak some kind of proto-Uralic, you need a convincing scenario of Uralic elite dominance to explain how Yamnaya/CWC promoted PIE was replaced there, and in Mordovia. Reindeer-mounted warriors roaming the pine and birch forests?
Grey, Rob - re Arsenic Bronze
The fundamental logic seems to be as follows: During the 5th and early 4th mBC, the main prestige objects were Jadeite axes, which are widespread across all of W. Europe (including Mesolithic Britain and Denmark), the Adriatic and even Bulgaria, with the largest secondary concentration found in Morbihan, Brittany (see map in the link below). Monte Viso in Piedmont has been identified as the raw material source. Worsening climate blocked access to the quarry, which lies above 2000 m elevation, and from ca. 3500 BC onwards Jadeite axe finds become extremely scarce.
http://www.academia.edu/2047917/From_Mont_Viso_to_Slovakia_the_two_axheads_of_Alpine_Jade_from_Golianovo._Acta_Archaeologiaca_Academiae_Scientiarum_Hungaricae_62_2011_243-268
North Italian finds suggest that early copper metallurgists attempted to emulate Jadeite by copper; casting moulds closely imitated the traditional Jadeite axe shape. For that imitation to work, copper needed to be as pure as possible in order to quickly turn green by oxidation, and there is N. Italian/ Tirolean evidence for systematic copper purification/ refining attempts. Copper axes appearing in the Nordic FB record by around 3500 BC were made of such refined, jadeite-greenish oxidising copper.
Hence, the rarity of arsenic bronze in Central Europe appears to have primarily stemmed from aesthetic considerations. The axes were anyway never meant as tools, as signified by the lack of any indication of usage and by often impractical, i.e. improperly balanced, mounting. As the Jadeite axes before, they served purely ritual purposes.
Caucasian and Levantine early copper processing apparently followed a different logic. Here, several alloys, including arsenic, but also antimonial, zinc and nickel bronzes, were deliberately produced and combined for their different colours (CA antimony mining is evidenced from the Racha region in W. Georgia). An overview is in the following paper, more in-depth analysis (mostly in German) is available from the Bergbaumuseum Bochum.
https://www.britishmuseum.org/pdf/6a%20Append%20text-opt-sec.pdf
Thus, one may conclude that early copper processing, also during the CW period (which, however, is generally poor in copper objects; they seem to be completely lacking from Scandinavia) followed the established Central European traditions, without signs of adopting the quite different Caucasian pattern/ fashion of multi-coloured bronze mixes.
Lack of Arsenic isn't an issue here - Unetice burials from the Nitra region in Slovakia show substantial signs of Arsenic intoxication in the "middle class" graves that, from bone stress analysis and for typical grave goods such as armshields, are interpreted as belonging to metalworkers.
So, if those CWC Estonians didn't speak some kind of proto-Uralic, you need a convincing scenario of Uralic elite dominance to explain how Yamnaya/CWC promoted PIE was replaced there, and in Mordovia. Reindeer-mounted warriors roaming the pine and birch forests?
Frank, again, very funny. Thanks for the comedy champagne.
But of course, we all know you're just messing around, because it would be utterly idiotic to suggest that CWC Estonians were proto-Uralics.
That's because the CWC Estonian genome we have is like the other CWC genomes, which come from places where no Uralic was ever spoken.
Moreover, modern Estonians are basically like Indo-European Northern Europeans whose ancestors never spoke Uralic.
Also, almost all Uralic speakers show strong signals of recent East Eurasian admixture, like the Baltic Finns here with 10% Karasuk admixture. This is very unusual for Northern Europe.
So obviously the natural conclusion is that Uralic speech arrived relatively recently near the Baltic and was adopted by people in large part of CWC origin. Of course, as you know, something very similar happened on the Hungarian Plain within recorded history.
@ Frank
Thanks. From what I've read, there appears to be no exclusivity in the type of copper used. For BB culture, several forms appear to have been employed, such as Arsenic-Copper, Tin and Silver Copper, but also the Carpatho-Balkan 'pure' type. Lead and tin additions seem to have featured in German BB and CWC groups.
@Davidski,
Can you post Anatolia_Neolithic's ANE K8, K15, and K13 results? I'm comparing nMonte results from your D-stats and ANE K8, and they're very similar.
How do you change the number of cycles in nMonte?
@Dave: "Modern Estonians are basically like Indo-European Northern Europeans whose ancestors never spoke Uralic"
I think we agree that it is fairly difficult to genetically distinguish Balts, and NE Europeans in general (Finns, Karelians, Belorussians, Northern Russians, Mordovians, etc.). They all form some sort of genetic continuum. Yet, there are those linguistic splits (Baltic, Slavic, Uralic) - but the language family that would have been most likely to have exercised influence, namely Turkic (Tartars, Golden Horde) is missing west of Kazan. Explaining those splits is not trivial. I am not in the position to bring forward a qualified theory on, e.g., the emergence of the Estonian language (maybe Kristiina is?), but it's pretty clear to me that CWC/Yamnaya genetics alone aren't even remotely able to shed light on the evolution of the NE European linguistic landscape.
In fact, all I have seen so far, including Karelia_HG, is more supportive of EHG having spoken a kind of Proto-Uralic than PIE. If one looks at those "Northern Europeans whose ancestors never spoke Uralic", reindeer hunting (Ahrensburg Culture) comes to mind, with evidence of similar practices still in medieval middle Norway. There is a Northern European group that has preserved the reindeer tradition, albeit switched from hunting to pastoralism, namely the Uralic-speaking Sami.
There is furthermore quite a number of Uralic-Germanic linguistic parallels. Take, e.g., "to live", possibly among the most "basal" terms imaginable. Quite close to Proto-Uralic *el(o), definitely closer than Lat. "vita", "vivere", OGr. "zoe", Latv. "dziwe", Polish "zyc" etc. The relation of the "elk" or "Elen" (a OHG variant) to that Proto-Uralic root (in the sense of "being, animal") is also pretty obvious. "to hop" vs. Finn. "hyppia" is another such isogloss, though here I am not sure whether the Finnish term isn't a Germanic borrowing. French aller "to go", with unclear etymology and highly irregular flexions (je vais, j'irai etc.) "smells" pre-IE substrate, and finds its closest parallel in Finnish ajaa "to drive, run, ride".
IOW - when trying to approximate the language spoken by SHG/EHG (also WHG?), my best bet is something in-between Proto-Uralic and the pre-IE substrate in Germanic. The latter comprises terms such as "hand", "bone", "link", "sea", "bow", "eel", "knight", "thing/think", "drink", "leap", "bride", "sick", "little", "evil" that all lack plausible IE etymologies. [Kristiina - is there anything in the list above that has Uralic cognates?].
@Rob: Do you have more information on CWC/German BB metallurgy? My general impression has been that metal finds get pretty rare during that period, to only take up again with Unetice after around 2200 BC.
Silver Copper seems like a waste of precious silver, so I'd assume it having been created by accident. The first major silver mine also having produced copper during the BA that comes to mind is the Rammelsberg (Western Harz, UNESCO World Heritage), where so far mining is evidenced from the LBA onwards. Your Silver Copper might indicate a much earlier start of mining there.
Tin points towards the Ore Mountains ("Erzgebirge") along the Bohemian-Saxony border, Europe's dominating tin supplier during the medieval period, but with little evidence of prehistoric tin exploitation.
Lead isn't that diagnostic, as it frequently accompanies copper deposits. We might, e.g., be talking about Mitterberg near Salzburg here, but also the Sauerland (SW Westphalia), a major lead mining area in Roman times with scanty indication of LBA copper mining.
In any case, information on what has been found where could help to shed some light on the initial stages of Central European mining before it took off during the BA. [For Unetice, initially stretching from the Slovakian Copper district around Nitra via the Ore Mountains to the Harz, the bronze-making economic base is fairly obvious].
In fact, all I have seen so far, including Karelia_HG, is more supportive of EHG having spoken a kind of Proto-Uralic than PIE.
Nonsense.
Karelia_HG didn't speak Proto-Uralic or PIE. Both language groups arrived in Karelia well after the Mesolithic.
@ Frank
I've looked briefly, but nothing definitive (which is surprising)
I've found a couple:
- "The Metallurgy of Pastoral Societies in the light of Copper Processing in the northern Pontic steppe." by Klochko. Mentions CWC in Ukraine
- Beaker Metallurgy and the Emergence of Fahlore-copper Use in Central Europe. Merkl (from where I got my comments; and it even mentions Pfyn & Mondsee!)
- In his book, Chernykh mentions CWC briefly - stating essentially he sees CWC metal as a continuation of old Carpatho-Balkan traditions (i.e. not new Caucasian stuff), but this treatment is not definitive for CWC, and now probably a little dated.
It seems BB and CWC did not have any monopoly on metal ores or a type of craft which might contribute toward their 'success'
@ Frank
About your debate with Davidski about language in NE Europe.
We should remember that language borders are not static. So let's assume that the CWC R1a spoke PIE (although the reality is we don't know what they spoke because later IE languages of northern Europe owe nothing to CWC), or maybe some para-IE language; and let's also assume N1c is related to FU, and para-Uralic languages. Further, it might very well be the case that IE weren't always the 'winners' and expanders, that is to say, these very early IE groups arriving in the East Baltic might have been absorbed into expanding FU groups, in which case IE expanded back in somewhat later (such as in the late Bronze Age with Balto-Slavic, and later still with Germanic, as well as other hypothetical IE languages like Ventic). Still, Finnic might have again expanded south, as seen by the expansion of long barrows in early medieval Russia.
It's all very dynamic.
See the very good article by Kallio 'The language contact situation in prehistoric Northeastern Europe'
But really, as I have maintained & Alberto also has, we need aDNA from Greece, northern India and BA Anatolia, so we are starting with fact, rather than assumptions. It might all fall nicely into place, more or less, within the steppe hypothesis. Or the genetics might turn out to be a little more blurry, so then we'd need more nuanced socio-linguistic models.
I did a run on Bell_Beaker_Germany:
Bell_Beaker_Germany
distance 0.00194
"Yamnaya_Kalmykia" 37.75
"Hungary_EN" 26.2
"Germany_MN" 15.9
"Yamnaya_Samara" 12.25
"Loschbour_WHG" 7.9
"Anatolia_Neolithic" 0
"Caucasus_HG" 0
"Iberia_EN" 0
"Iberia_MN" 0
I also did a run with the datasheet of
https://drive.google.com/file/d/0B9o3EYTdM8lQSWdTREYxWHJhaDQ/view?usp=sharing
Which contains a few different pops:
Bell_Beaker_Germany
distance = 0.000637
"Yamnaya_Kalmykia" 43.6
"Germany_MN" 16.5
"Hungary_EN" 13.4
"Hungary_HG" 12
"Anatolia_Neolithic" 11.4
"Karelia_HG" 1.85
"Caucasus_HG" 1.25
"Iberia_EN" 0
"Iberia_MN" 0
"Loschbour_WHG" 0
"Yamnaya_Samara" 0
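For anyone wondering how to read the "distance" lines in the two runs above, here is a minimal sketch in Python. It assumes the reported distance is the Euclidean distance between the target's row of D-stats and the weighted average of the source rows; that is an assumption about nMonte's internals and worth checking against nMonte.R. The function name, population names and numbers below are made up for illustration and are not taken from either datasheet.

```python
import math

def mixture_distance(target, sources, weights):
    """Euclidean distance between a target row and a weighted mix of source rows.

    target  : list of D-stat values for the target population
    sources : {name: list of D-stat values}, same columns as target
    weights : {name: share}, in any unit (they are normalised here)
    """
    total = sum(weights.values())
    mix = [0.0] * len(target)
    for name, share in weights.items():
        for i, value in enumerate(sources[name]):
            mix[i] += (share / total) * value
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))

# Hypothetical toy rows with three D-stat columns (not real data).
sources = {
    "SourceA": [0.010, 0.020, 0.000],
    "SourceB": [0.030, 0.000, 0.010],
}
target = [0.020, 0.010, 0.005]

d_equal = mixture_distance(target, sources, {"SourceA": 50, "SourceB": 50})
d_skewed = mixture_distance(target, sources, {"SourceA": 90, "SourceB": 10})
print(d_equal, d_skewed)
```

Under that reading a smaller distance means a tighter fit. Since the two runs above use different datasheets, their distances may not be directly comparable with each other.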
@Rob, Krefter etc.
From a linguistic point of view it looks like proto-Finnic (the ancestor of Estonian, South Estonian, Finnish, Vepsian etc) was somewhere in Estonia or Pskov-Novgorod region (having arrived at some point during the Bronze Age) and expanded after 500 BC. Before the recent expansion of Finnic, Saami was spoken in Karelia and Southern Finland, and in the latter perhaps some kind of Germanic language was also spoken given the lexical strata situation. This should account for the genetic difference between Estonians and more northern Finnic speakers, and it's most likely that the Proto-Finnic population itself was substantially different from Proto-Uralic (though ancient DNA is needed to see whether they resemble modern Estonians the most). Saami in itself was preceded at least in northern Fennoscandia by non-Uralic cultures of eastern origin like Ymyakhtakh which includes the Bolshoy Oleni Ostrov site.
Papers to read (google search will find them):
"On Germanic-Saami contacts and Saami prehistory"
"The Language Contact Situation in Prehistoric Northeastern Europe"
Krefter,
Anatolia_Neolithic_I0709
K8
ANE 0
South_Eurasian 0
ENF 78.25
East_Eurasian 0
WHG 21.74
Oceanian 0
Pygmy 0.01
Sub-Saharan 0
K13
North_Atlantic 8.64
Baltic 0
West_Med 46.18
West_Asian 0
East_Med 40.77
Red_Sea 4.42
South_Asian 0
East_Asian 0
Siberian 0
Amerindian 0
Oceanian 0
Northeast_African 0
Sub-Saharan 0
K15
North_Sea 0
Atlantic 16.78
Baltic 0
Eastern_Euro 0
West_Med 41
West_Asian 0
East_Med 36.23
Red_Sea 5.98
South_Asian 0
Southeast_Asian 0
Siberian 0
Amerindian 0
Oceanian 0
Northeast_African 0
Sub-Saharan 0
@Krefter
Yes, ANE K8 did a very good job at isolating ANE, and this method confirms those results to a good extent.
But I think the best achievement is that it is a better version of qpAdm that overcomes its limitations (trying to reproduce dubious models with statistically perfect results is no longer possible here), and that's quite important when you think that qpAdm was written by the Harvard University guys (I think?) to be used in their publications about ancient DNA. In this sense, we really have to give credit to all the people who helped to come up with the idea (IIRC, Matt first used 4mix with D-stats to produce models, then Davidski used it systematically and provided all the stats, and there is whoever wrote 4mix, and huijbregts who wrote nMonte to allow more populations, maybe others that I forget or don't know about). Kudos to all of them, I hope to see this method used in upcoming papers!
Alberto: I'm not sure I get your point about ENA and the Near East. Do you mean that ENA is native to the Near East? I don't think you mean that. Rather that it arrived to the Near East around the LN/BA. And that's my point too. It arrived then, from the east, with people high in CHG and ANE (i.e, like Bronze Age Armenians).
What I mean is that there's probably some ancestry in the ancient Near East that has a similar relatedness to East Asian as Anatolia_Neolithic or CHG (who do have different relatedness to East Asian, mind) but is less related to LBK_EN or CHG_2.
So the nMonte does not evolve towards giving them East Asian and evolves towards just giving some low level East Asian slice to compensate, but this will disappear in time, with sufficient ancient Near Eastern dna. This data is limited by the outgroups we have, and I think I would be skeptical of reading some low proportions literally because of that, in areas like the Middle East or South Asia where there is limited / no adna.
@Everyone,
I plugged in a proxy Near Eastern ancestor for Cypriot called ENF1 into my spreadsheet. ENF1 is as close to EEF and WHG/EHG as CHG is but is as distant from CHG as EEF is.
I then modeled West Asians as ENF1+CHG+Anatolia Neolithic+Exotic(Yamnaya, East Asia, South Asia, Africa).
In 4mix this model works perfectly for all West Asians. CHG is pretty much completely absent outside of Northern West Asia.
Interestingly, the Anatolia_Neolithic vs CHG/ENF1 ratio was highest the further south you go (Bedouin, Arabia, Yemenite Jew score the highest).
I'm still working with West Asian D-stat results. ENF1 is the first clue to the non-EEF/CHG ancestors of West Asians. Unless West Asians' non-CHG ancestors were more distant from CHG than EEF is, I don't see how there can be significant CHG ancestry outside of Northern West Asia.
EDIT: Modern Europeans don't show more affinity in this test to CHG than Neolithic. So, I guess it's possible there's significant CHG in most West Asians, but not significant enough to make a big impact on D-stats.
@Matt
I see. It could be something like that, but then again it could be real. The lack of ancient DNA from most of Asia is an obvious limitation, so these models for them are still quite speculative.
Trying to model CHG without CHG, does produce these strange results:
Caucasus_HG
"Anatolia_Neolithic" 67.1
"Karelia_HG" 18.1
"Dai" 10.8
"Esan_Nigeria" 4
"Loschbour_WHG" 0
distance = 0.079475
But the model is pretty horrible. So I don't see a strong reason to speculate about a possible greater relatedness to ENA of other parts of the Near East when we have ANE arriving at that time, which provides a good explanation for ENA arriving with it.
Could be anything, though. Only ancient DNA can really tell us.
@Krefter
How do you change the number of cycles in nMonte?
You have to find the line at the beginning of the script that says:
Ncycles = 1000
And there you can change the number (I think that 100 is good enough and makes it much faster, but if you want the best accuracy keep 1000, or even try to increase it if you don't mind waiting longer for the results).
@Alberto,
That line doesn't exist in my script. By script you mean R-script, right?
No, the script you have to edit is nMonte.R, that should be in the same folder where you keep the D-stats and the source and target files.
@Matt
Yes, ideally I guess that we want the best references (ancient, unadmixed) in the columns, but then we also want those references as rows to be able to model other populations based on them. Having LBK_EN in the columns is what might be causing the ENA in the Near Eastern populations?
The good thing is that this method can only improve as we get more references, and you can keep adding columns for better accuracy. So it really is a great find. Again, well done to all those who came up with the idea and scripts/data to execute it.
@Alberto: Yes, ideally I guess that we want the best references (ancient, unadmixed) in the columns, but then we also want those references as rows to be able to model other populations based on them. Having LBK_EN in the columns is what might be causing the ENA in the Near Eastern populations?
If there are a number of samples (Anatolia_Neolithic), populations can be split to be a row and column. David did this with BedouinB on the datasheet with that.
I think if I get your drift there might be some effect whereby there is extra close relatedness between the ancient rows and ancient columns that's lacking then that might skew things, possibly.
I'm a bit less confident in the "unsampled West Eurasian ancestry" idea today though after making a few dummy test samples (using regression equations on the ancients to find an "unaffiliated" set) to see what they do when used as calculator pops in nMonte, and that didn't really change matters (but the dummies could've not been quite right). Possibly there really is some low level ENA improving fits and it's just been hidden in the general pattern so far, and getting absorbed into the modern ADMIXTURE components. Time will tell.
For now to try and resolve the ENA issue, it might work to add on a few extra Oceanian and Siberian ENA groups as columns, as if there is any disjoint in relatedness to them, that might push nMonte away from selecting ENA ancestry.
Couple other things might be interesting - http://imgur.com/a/4oYko. Among ancient West Eurasians, relatedness to Han correlates really well with relatedness to Samara, less well with relatedness to La Brana, and a PCA of just ancients rows with D-stats (not proportions).
nMONTE
Alberto has suggested to drastically reduce the number of iterations by resetting the default value of Ncycles.
He reports a 10 times faster runtime, while maintaining 99% of the accuracy.
A few comments:
1. nMonte is designed as a general purpose tool. I have granted ample iterations, so as to be safe in all circumstances.
It is encouraging that Alberto could reduce the number of iterations with only a minor loss of accuracy.
However, Alberto has only tried a few specific examples. It is quite possible that other examples are far more vulnerable.
So check by rerunning your data.
2. Hacking a source file is defying Murphy's law.
It is better to leave the source file intact and enter an optional parameter.
So, I have updated the version of nMonte.R in my dropbox
https://www.dropbox.com/sh/1iaggxyc2alafow/AACIjLtnkuaNNsJ5oKME_3XHa?dl=0
You can now use getMonte(datafile, targetfile, Ncycles=300)
but getMonte(datafile,targetfile) is still valid.
Also in the new version the output starts with printing the value of Ncycles
3. Do you know the other optional parameter to save the output to a file?
getMonte(datafile, targetfile, save='myFile.txt')
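To illustrate the Ncycles trade-off discussed above, here is a toy Monte Carlo mixture fit in Python. It is not the nMonte.R code and makes no claim to reproduce its algorithm; the function name fit_mixture, the n_slots parameter and the three toy populations are all hypothetical. The only point is that each extra cycle can only keep or lower the distance, so the question is how many cycles it takes before the result stops changing for your data.

```python
import math
import random

def fit_mixture(target, sources, n_cycles=1000, n_slots=100, seed=0):
    """Toy Monte Carlo mixture fit (a sketch, NOT the nMonte.R algorithm).

    The mixture is a bag of n_slots source names. In each cycle every slot
    gets one random replacement proposal, kept only if the distance drops.
    """
    rng = random.Random(seed)
    names = sorted(sources)
    dim = len(target)

    def distance(sums):
        # Distance between the bag average and the target row.
        return math.sqrt(sum((s / n_slots - t) ** 2 for s, t in zip(sums, target)))

    bag = [rng.choice(names) for _ in range(n_slots)]
    sums = [sum(sources[name][i] for name in bag) for i in range(dim)]
    best = distance(sums)
    for _ in range(n_cycles):
        for slot in range(n_slots):
            old, new = bag[slot], rng.choice(names)
            if new == old:
                continue
            trial = [s - sources[old][i] + sources[new][i] for i, s in enumerate(sums)]
            d = distance(trial)
            if d < best:
                bag[slot], sums, best = new, trial, d
    shares = {name: 100.0 * bag.count(name) / n_slots for name in names}
    return shares, best

# Hypothetical toy data: the target is built as 60% A + 40% B.
sources = {
    "A": [0.010, 0.020, 0.000],
    "B": [0.030, 0.000, 0.010],
    "C": [0.000, 0.050, 0.040],
}
target = [0.018, 0.012, 0.004]

quick_shares, quick_dist = fit_mixture(target, sources, n_cycles=30)
long_shares, long_dist = fit_mixture(target, sources, n_cycles=300)
print(quick_shares, quick_dist)
print(long_shares, long_dist)
```

With the same seed the longer run can never end with a larger distance than the shorter one. Whether the shorter run is already good enough depends on the data, which is why rerunning with a different cycle count, as suggested above, is the sensible check.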
Davinski
You keep saying things that imply that you believe language is genetically coded. I am trying to resist the implication, because you could not possibly believe in something this silly.
So please tell me, when you say that certain groups 3000 or more years ago cannot possibly have spoken this or that language, on what basis do you say that? Just because their material culture and general genetic makeup is a likely carrier for another language group? If there is anything known history teaches us, it is that groups with close genetics and material culture can speak totally different languages. There is no reason to believe this did not happen in the Bronze Age or the Neolithic.
Slumbery,
Just as surely as my nick is not Davinski, I never said nor implied that language was genetically coded.
But I do think it's often possible to work out whether specific language groups were spoken in particular places at certain times, and genetics can be used to help make such judgements.
random thought on ENA in Yamnaya/CHG etc
IIRC pottery is supposed to have started in East Asia somewhere
Apparently there was a sedentary pottery using HG culture on the west shore of Black Sea before pottery got to the main proto-farming region so pottery possibly took a steppe or near-steppe route
so east asian potters?
Shaikorth, this is what I proposed on another forum regarding the origin of Finnish yDNA haplogroups, mostly on the basis of age estimates on the yFull site:
Comb Ceramic: N1b (whole N1b branch formed 7600 ybp, TMRCA 4100 ybp); and/or R1a (of Karelian HG type); and/or Q (all nearly or completely extinct)
Corded Ware (3200-2300 BC): N1c-L1026 formed 6100 ybp with TMRCA of 4500 ybp; and its sub-branch N1c-VL29 (formed 4100 ybp, TMRCA 3500 ybp); and R1a1-Z280
Kiukainen Culture (farming culture, 2300-1700 BC): I1-L22 (I-CTS2208 formed 4100 ybp, with the TMRCA of 2800 ybp, and I-L287 formed 2800 ybp with the TMRCA of 1900 ybp)
Net Ware (inland Bronze Age culture 1500-500 BC): N1c-Z1935 (formed 3700 ybp, TMRCA 2600 ybp)
Iron Age under the influence of Ananyino culture: N1c-Z1939 (formed 1850 ybp, TMRCA 1300 ybp) and/or N1c-Z1941 ’Karelia’ (formed 1850 ybp, TMRCA 1750 ybp) and/or N1c-CTS4329 ’Savo’ (formed 2100 ybp, TMRCA 2100 ybp)
For the sake of clarity, N1c-Z1939, N1c-Z1941 and N1c-CTS4329 are all under N1c-Z1935. IMO, N1c-Z1935 arose somewhere close to Tver Karelia and did not originate directly from Ananyino culture (as Ananyino N1c is clearly a much earlier branch), but was influenced by it and may have thus adopted Ananyino language.
As for Serteya N1c, N1c-M2126 formed 7200 ybp with TMRCA of 6100 ybp. N1c found in Smolensk in Serteya context 2500 BC should belong to this branch as the cultural horizon of Zhizhitskaya culture of pile dwellings goes back to 4–3 mill. cal. BC.
N1c-M2126 has two branches: N1c-M2019 formed 6100 ybp with TMRCA of 3300 ybp and this line is found in Estonia, and N1c-L1026 formed 6100 ybp with TMRCA of 4500 ybp.
For its part, N1c-L1026 has three sub-branches:
N1c-Z1936 formed 4500 ybp, TMRCA 4500 ybp (Finland, Karelia, Western Siberia)
N1c-F4205 formed 4500 ybp, TMRCA 3000 ybp (Central Asia, Northeast Asia)
N1c-CTS10760 formed 4500 ybp, TMRCA 4100 ybp (Baltic Sea area)
You should take note that the Uralic N1c-Z1936 is older than the Turkic-Tungusic N1c-F4205.
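When comparing the age estimates above with archaeological date ranges, it helps to put everything on one scale. A small sketch in Python, assuming "present" is roughly the year 2000; that is an assumption, chosen because it matches the rounding used elsewhere in this thread (4100 ybp treated as 2100 BC), and yFull's exact reference year is not stated here. The helper names are hypothetical, and the missing year zero is ignored because it does not matter at this precision.

```python
def ybp_to_bc(ybp, present=2000):
    """Years before present -> approximate year BC (a negative result would mean AD)."""
    return ybp - present

def in_range(ybp, start_bc, end_bc, present=2000):
    """True if the date falls inside a BC range such as 3200-2300 BC."""
    return end_bc <= ybp_to_bc(ybp, present) <= start_bc

# TMRCA estimates quoted in the comment above, in ybp.
tmrca = {
    "N1c-L1026": 4500,
    "N1c-Z1936": 4500,
    "N1c-F4205": 3000,
    "N1c-CTS10760": 4100,
}
corded_ware = (3200, 2300)  # BC range quoted in the comment above

for clade, ybp in tmrca.items():
    print(clade, ybp_to_bc(ybp), "BC", in_range(ybp, *corded_ware))
```

The estimates carry wide confidence intervals, so a date landing just inside or outside a range should not be read too literally.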
Kristiina, Russian scientists have studied M2019. The branches' regional affiliations in West Eurasia (they are older "brothers" of the Yakut clade) are West/Central European and Russian in case of the earliest branch and Southeastern European and Near Eastern for the second one.
http://rjgg.molgen.org/index.php/RJGGRE/article/view/157/183 (table 4)
If there was N1c in Corded Ware, so far no evidence it was anywhere but its northeasternmost reaches. Without sampling even that is not certain.
The N1c branches relevant to the spread of Baltic Finnic languages from Z1935 to various VL29 branches have YFull MRCAs around 2600 bp, consistent with an expansion of proto-Finnic around that point as recent linguistic studies suggest. This would be out of the Estonia/Pskov-Novgorod area. Estonia is undersampled when it comes to next-gen sequencing but STR-wise has the highest N1c haplotype diversity in the East Baltic region, while East Karelia has the lowest ("Migration Waves to the Baltic Sea Region", 2008). Finland and Karelia have an excess of U5b1b-clades that now have peak frequency in Saami. MtDNA D5 was brought by Bolshoy Oleni Ostrov's culture or something earlier (but likely post-Mesolithic since Yuzhny EHGs lack it), and the Viena (northernmost) Karelians have about 10% of that. This is all consistent with the scenario of the previous post and reconciles available DNA with linguistics.
I also certainly doubt "Ananyino" language was adopted by Z1936 (never mind the question of Z1936 ever existing as a sole tribal marker in Ananyino times), there is not even substrate evidence of such.
Shaikorth, to my knowledge, D5 in Finns and Saami is more recent, i.e. the whole European D5 is c. 2.6–3.5 kya.
“Saami haplogroup D5 lineages, with the HVS-I motif 16126-16136-16360 and its derivatives, have been identified only in some northern and eastern European populations (among Karelians, Finns, Estonians, North-Russians, and Komis) and in some Siberian populations but not in Samoyeds.”
“It has been shown earlier that D5a mtDNAs, with the specific control region motif 16126-16136-16360, are present at a very low frequency in several populations of northeastern Europe (Saami, Karelians, Finns, Estonians, Komi, Russians of Arkhangelsk and Novgorod regions) as well as in central Asian Tajiks and Siberian Altaians and Mansi. Analysis of complete mtDNA phylogeny indicates that these mtDNAs belong to subhaplogroup named D5a3 defined by the only transition at np 16360 (Figure S2). It is obvious that mitochondrial genomes of Russian, Mansi and FamilyTreeDNA project individual belong to D5a3a branch harboring the entire HVS1 motif. Lineages, belonging to the D5a3a subgroup participated in a more recent European expansion around 2.6–3.5 kya (Figure 3).”
Isn’t Bolshoy Oleni Ostrov's D4 and not D5?
I do not understand why all significant variation in N1c in the Baltic Sea area should only be understood in terms of one language and one time frame. On the other hand, it is obvious that both West and East Finns are autosomally very much Corded Ware and the Finnish language is full of possible Corded Ware roots, so I do not understand why N1c could not be Corded Ware. By comparison, the Scandinavian R1a1 branch, R-Z284 formed 4500 ybp with TMRCA of 4200 ybp, which means that it formed c. 300 years after the start of the Scandinavian Battle Axe culture.
I did not understand what you wanted to say with “I also certainly doubt "Ananyino" language was adopted by Z1936 (never mind the question of Z1936 ever existing as a sole tribal marker in Ananyino times), there is not even substrate evidence of such.”
I wrote what I wrote because my impression is that there is not (much) Z1936 in Volga Ural groups who are scattered close to the centre of Ananyino Culture, but correct me if I am wrong.
With ‘Ananyino language’ I referred to the arrival of a Uralic language to Finland.
Of course you stick to what you think is true and what you want to be true as everybody else here, but, ultimately, we need ancient yDNA and autosomal data from Finland to resolve the question. I readily accept what ancient yDNA tells us.
Frank, yes, the words you listed do not have Uralic cognates. These are the correspondences I could find:
hand: cf. Kabardian ?wəna ‘(human) nail’
bone: cf. Basque oin ‘foot’
little: cf. Greek ligaki ‘a little’, Georgian lek’vi ‘young animal’
knight: -
The following words could be derived from IE roots:
drink: cf. Swedish dricka ‘drink’, Greek trekho ‘to run’, Sanskrit dru ‘to run’, Icelandic renna/randa ‘to run’, Russian reká ‘river’ (in Uralic languages ‘to run’, ‘to drink’ and ‘river’ are connected, e.g. in Finnish juosta, juoda, joki)
bride: cf. Lithuanian martì ‘daughter-in-law’, Latvian marsa ‘sister-in-law’; Latin maritus ‘husband’, Greek meiraks ‘young’
Isn’t bow connected with verb biegen?
Several roots exist in Finnic and Saami languages
link: cf. German Schlinge ‘mesh’, ‘noose’, Lithuanian lankas ‘noose’, Finnish lanka ‘thread’, Finnish lenkki ‘mesh’, ‘loop’, Vatja lenga ‘thread’; Kabardian bλa-n ‘to tie, to twine’
sea: cf. Saami saiwa ‘lake’, Gothic saiws ‘lake’, Armenian cov ‘lake’, Finnish saukko ‘otter’
leap: cf. Finnish laputtaa ‘to run away’
sick: cf. Icelandic sjúkur ‘sick’, Estonian haige ‘sick’, Finnish haika ‘odour’, ‘longing’, ‘pain’, Saami suoike ‘current of air’
evil: cf. Swedish elak ‘evil’, Icelandic illgjarn ‘evil’, Finnish ilkeä (also hilkiä in some dialects) ‘evil’, Finnish ilves ‘lynx’
eel: cf. Finnish iilimato ‘leech’, Welsh ele ‘leech’
For the roots ‘thing’/’think’ I could find a Permic parallel: Udmurt dun, Komi don ‘price’.
Among the Uralic languages to arrive to Finland were Saami from the southeast and Baltic Finnic afterwards from the south, but no Ananyino. I don't get where you got that from to be honest. The linguistic evidence I stick to comes from linguists. The variation of N1c shows two recent founder effects long past CW times (in particular East Finns and Karelians with Z1927 and Balts with M2783) and Estonians with greatest diversity, this we can match with linguistic history. Meanwhile Corded Ware samples so far came out R. If they start finding N1c1 CW it will probably be in Fatyanovo (through assimilation) and it would be useless to generalize that to Corded Ware in general.
BOO was defined only as far down as D* now that I recheck. The distribution still fits arrival with that culture, D5a is much much older than 3500 years as is even D5a3, we're looking at the very specific D5a3a. Z1a was in BOO too. Since the culture is assumed to have come through the Arctic coast, presence in populations like Komis and Mansis is no surprise
D4 clades can be found more to the south, in Ukraine and Russia, occasionally also in Central Europe.
@Rob: Thanks for the metallurgy links. From the Merkl paper I conclude that there hasn't been any common CWC and/or German BB metallurgical tradition. Instead, there was a patchwork of local cultures with little regional outreach.
So far, e.g., not a single copper artefact for 3000-2200 BC has been found in Franconia. Most of Northern Germany also seems to lack metal finds, though several copper axes from Mecklenburg have been attributed to the Single Grave Culture.
Two of the local metallurgical traditions are easily identifiable, namely (i) Slovakia (Nitra) with its arsenic traces, and (ii) the rather pure Austrian copper (Mondsee, Mitterberg, Wörgl/Tirol).
The Bismuth- and silver rich bronzes point towards the Erzgebirge, possibly somewhere between Jachymov and Aue. Grube Clara in the southern Black Forest (LBA copper supplier to Sweden) might also be considered. The find concentration speaks against the Western Harz.
I attach a German-language paper on EBA “Stabdolche” (“halberds”).
https://www.researchgate.net/publication/262116194_Studien_zu_den_europaischen_Stabdolchen
Interesting are the distribution maps. Some types are geographically restricted, e.g. Type 2 Italy, 3 Andalusia (El Argar “warrior graves”), 13 Ireland, M1 Western Baltic/Middle Elbe. Others display surprising patterns:
- T 1: Lazio and N. Ireland
- T 5: Algarve and Scotland
- T 6: SW Baltic, Franconia, N. Alps, Liguria, S. Spain
- T 8: S. Italy, Hungary, Middle Elbe, Burgundy
- T 9: All over NW Europe (France, Britain, Denmark, Germany, W. Poland, Bohemia), concentration in the Elbe-Saale region and the Paris Basin
- T 10: Ireland, Scotland, Denmark, NC Germany, Bohemia, offshoots in Andalusia, C. Italy, Burgundy and Austria
- T 11: Concentration in NW Iberia, plus Brittany, Ireland, Scotland, Paris Basin, Elbe-Saale, Denmark, Bohemia, Hungary
- T 12: almost pan-European (up to Romania), concentration on Ireland, Wales, NW France
- T 14: Iberia, esp. El Argar, Tuscany & Lombardy, offshoots in Brittany, Upper Rhone/ Upper Rhine, W. Poland and Hungary
- T 16: Carpathian Basin, offshoots on the Middle Elbe, Normandy, Brittany, and Catalonia
Interesting here (and the original reason I consulted the study) is that tin bronzes are mostly restricted to two areas, namely (i) Ireland & Scotland, and (ii) Unetice plus the Western Baltic. They are virtually absent from Iberia, France and Italy, which use pure or arsenic copper instead. Pan-European type 10 (see above) is most closely associated with tin bronze. The “original”, earliest type is Type 16, still primarily of (im-)pure copper; its distribution points at a spread out of the Carpathian Basin.
Often, the halberds stem from ritual depositions, so the find context can significantly post-date the manufacture. They may have only been deposited when they became technologically or culturally obsolete. This makes it difficult to infer innovation flows. Nevertheless, the study proposes that tin bronze spread from Ireland to Unetice.
Impurities and alloy compositions vary widely, and the study is unable (but also doesn’t really try) to trace the metal’s origin, and put together a consistent, trans-European picture of which type was manufactured where from which metal, and how it spread elsewhere. Some possible sources, e.g. Rio Tinto, are however pointed at.
Anyway, a picture emerges of quite war-like people (83% of the halberds were used in fights, and some of the buried died violently) who were distributed and interlinked across most of Europe, from Portugal to Lithuania, Skane to Sicily. The phenomenon reached a first peak between 2900-2200 BC in the Western Mediterranean, and a late Central European culmination under Unetice, where it disappeared with the yet mysterious Unetice collapse.
Shaikorth, Bolshoy Oleni Ostrov was defined only 223T-362C and that is typical of D4. HVR1 of D5 is more complex. D5a3 from Yuxi Yunnan is 182C 183C 189 223 311 360 362 and from Han-Qinhai 189 223 319 360 362.
So, you think that the Uralic language came to Finland during the Iron Age, c. 600 BC, and all Finnish N1c clades came at the same time, although N1c-VL29 which looks like having an origin in Finland formed 2100 BC and has a TMRCA of 3500 ybp.
The article by M.G. Kosmenko, “The Culture of Bronze Age Net Ware in Karelia” concludes that “Archaeological research over the past few decades has shown that the Net Ware culture (textile ceramics) in the territories to the north of the Volga was completely overlapped by and mixed with the Uralic Ananyino culture during the Early Iron Age. (…) A comparative analysis of the strata of ancient place names in Karelia suggests the conclusion that the earliest 'Volgic' layer of local names for bodies of water most probably corresponds to the Net Ware culture, while the Lapp (Sami) hydronyms correspond to the Ananyino stratum of the Iron Age and the Baltic-Finnish place names to the early medieval culture of the 10th and 11th centuries in southeastern Karelia (Kosmenko 1993).”
You are right that I really think that N1c can pop up in Fatyanovo.
That Russian paper is not against what I propose. Serteya should be N1c-M2126, formed 7200 ybp with TMRCA of 6100 ybp. The next branch divides N1c-M2019 and L1026. So, it could very well be that it is N1c-M2126 which was present in Fatyanovo. Its sub-branch N1c-M2019 went south and got caught in Turkic groups, because the paper proposes the paternal origin of any Europeans with a derived allele at M2019 to be an ancient Avar, and that the common ancestor of Europeans and Yakuts lived 3100 ybp (CI: 2300-3900 years). Moreover, N1c-M2019 is also found in an Estonian, as yfull shows.
I cannot see any problem in that N1c-L1026, formed 6100 ybp with TMRCA of 4500 ybp, would have moved to Finland 3000 BC. It does not exclude that it could have moved to the East as well.
Davidski, your own graph shows that Finns have 10% Karasuk. Karasuk yDNA is not N, but it is Q1a1 and R1a1; Iron Age Russia yDNA was 2xQ1a1, 2xJ2a2 and R1a1. The Ket - Paleoeskimo paper shows that Uralic speaking Mansis with 87% N are on the Western Eurasian side of East West divide while yDNA Q carrying Selkups and Kets are on the Eastern Eurasian side. The graph also shows that Mansis have received important geneflow from Kets. Therefore, genetic evidence points to Uralic groups having mixed with Yeniseians and ENA/ANE looks like coming from there. I have all the time claimed that there is a Siberian / Native American type (=Yeniseian) substrate in Uralic languages.
Karasuk_outlier RISE497 is a female. I can't check at the moment what the mtDNA haplogroup was, but judging by the East Eurasian genome-wide DNA, it was probably something very eastern.
By the way, the Iron Age Russia samples aren't from European Russia. They're from near the Altai, very East Eurasian, and probably related to early Turks in some way.
BOO D* was only typed for 223T and 362C. This means it can be D5, and the only thing it excludes are certain D4 clades with 223C! For D5a3a, which has the specific Northeastern European occurrence, you need to look for a more specific motif https://www.familytreedna.com/public/D?iframe=mtresults.
The Iron Age arrival of Finnic to Finland and Karelia (preceded by Bronze Age arrival to Estonia) is not my idea, you can take it up with linguists. It happens to coincide with the marker ages we get from next-gen sequencing. VL29 doesn't look to have been born in Finland but in what's now Russia alongside its preceding form CTS10760.
RISE 497 A+152+16362 http://www.ancestraljourneys.org/ancientdna.shtml
Shaikorth, D4 is a huge group, and according to Ian Logan site only the following are 223C and they are oddly all Japanese:
D4c1 C2766T T16223C Japanese
D4g2a1a T195C A546G G8994A A14793G T16223C Japanese
D4m1 A1148G T6620C A9667G C12088T T16223C G16244A Japanese
Specifically, D4j2a is found in Mansi and Ket and Mansi HVR1 is C16223T C16291T T16362C T16519C, so it matches at least as well Bolshoy Oleni Ostrov D* as D5 that you propose.
D5a3a1 from China is G16129A C16169T A16182- A16183- T16189C 16193.1C 16193.2C C16223T C16360T T16362C, and it is upstream of Finnish D5a3a1a.
If the main branch of N-CTS10760* is firmly present in Finland and only a parallel branch is found in Russia, it does not exclude that N-CTS10760* arose in Finland or close to Finland. By the way, do you know the place in Russia where yfull YF04468 comes from?
I am not against the Iron Age arrival of Finnic to Finland and Karelia preceded by Bronze Age arrival to Estonia. I only argue that N-M2126 has been in the Baltic area since the Corded Ware period and may represent several cultures and may have spoken several languages.
I know that RISE497 is female, but I was referring to other Karasuk samples that were males. My point is that you do not need N1c to explain Karasuk DNA profile. Among others, MtDNA A is a typical Beringian haplogroup. So it does not seem to be ancient Siberian A10 (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0127182)?
In Davidski's tree Estonian is the closest population to Corded Ware samples and other ancient samples. Shaikorth notes that the highest N1c haplotype diversity is in Estonians and, in spite of this, you are against N1c being incorporated into Corded Ware. What's the logic?
By contrast, you all agree that I1, I2 and J2 were incorporated into IE groups.
Well, Serteya shows N1c was in there at least in small numbers. I didn't dispute that, my point was more about the spread of Finnic which was later. In that scenario it would not be surprising if the proto-Finnic population's closest modern relatives would be Estonians. Karasuk has little to do with this.
The Chinese D5a3a1's 15530C 16129A 16169T are specific to it, it is a diverged local branch that has not been named.
http://www.ianlogan.co.uk/sequences_by_group/d5a3_genbank_sequences.htm
http://www.phylotree.org/tree/D.htm
Mansis have the D5a3a1 found in NE Europe and its frequency also grows when moving towards Kola Peninsula so why should it be D4 which is absent from the region?
According to Ian Logan, Karasuk female could be A6:
(http://www.ianlogan.co.uk/sequences_by_group/a6-11_genbank_sequences.htm)
A6a with 152C is found in Chinese and Tujia.
A6b with 152C is found in Tibeto-Burmans.
A8a1 with 152C is found in Buryat and Tungusics.
A11a and A11b are found in Tibeto-Burmans.
'Why should it be D4 which is absent from the region'
My presumption was based on HVR1 data. If Bolshoy Oleni Ostrov D* is D5, it means that in the analysis concerned they did not get the full HVR1 information as HVR1 of D5 is much more complex than HVR1 of D4j. On the other hand, there are also other mtDNA haplotypes that did not survive from Bolshoy Oleni Ostrov such as C5 and C*, while U4a with 134T is found in Norway and not in Finland or Karelia. However, I am not adamant in this.
@huijbregts
Thanks for continuing to improve nMonte. It's very useful to have the number of cycles as an optional parameter rather than having to change the script. In my limited testing with 8 source populations, 300 cycles seems to be a sweet spot. Adding populations obviously increases the number of combinations exponentially, so for more than 8 it might be better to stick to the default 1000 (while for 4-6 even 100 seems to be fine). But as you said, I can only test a few cases, so all the warnings apply.
Kristiina, aBOO site's culture was found all over northern Fennoscandia, and with small population sizes a uniform survival is not to be expected. Z1a was also present.
Might be noteworthy that the Scandinavian and Finnish Z1a1a clades of presumed Saami origin share 315.1c with FJ493512 (a Ket) and upstream Z1a, but the Russian Volga samples of Ingman (DQ902708-DQ902711) miss it.
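To put a rough number on how fast the search space grows with the number of source populations, here is a small Python sketch. It is not part of nMonte; it simply assumes that mixture proportions are reported in steps of 0.05% (as the pasted results suggest) and counts how many distinct ways 100% can be split among k sources. The function name and the step size are my own assumptions.

```python
from math import comb

def count_mixtures(n_sources, step_percent=0.05):
    """Count the ways to split 100% among n_sources in steps of step_percent.

    This is the classic "stars and bars" count: with S = 100 / step_percent
    indivisible slots and k sources there are C(S + k - 1, k - 1) mixtures.
    """
    slots = round(100 / step_percent)  # e.g. 2000 slots of 0.05% each (assumed)
    return comb(slots + n_sources - 1, n_sources - 1)

# Print the size of the search space for a few source counts.
for k in (4, 6, 8, 9):
    print(k, "sources:", f"{count_mixtures(k):.3e}", "possible mixtures")
```

Under these assumptions each extra source multiplies the count by a large factor, which fits the observation that more sources may need more cycles to converge.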
http://www.ianlogan.co.uk/sequences_by_group/z_genbank_sequences.htm
Regarding S-C Asians, the absence of a pure ANE sample does seem to be a limitation. Using EHG provides enough ANE, but it also provides a lot of WHG, so any steppe population gets 0%:
Kalash
"Caucasus_HG" 30.8
"Dravidian_India" 26.5
"Karelia_HG" 17.9
"Anatolia_Neolithic" 16.3
"Dai" 7.35
"Esan_Nigeria" 1.15
"Sintashta" 0
distance=0.002929
And the fit is not great. If instead I use an ANE reference with less WHG affinity (Selkup or Okunevo are the options), then it allows steppe populations to show up, though it removes a lot of Dravidian because these populations have a lot of East Asian (it still takes some Dai, though, so I'm not sure what to think). Maybe Okunevo represents a realistic option, as someone suggested above? The fit is quite a bit better:
Kalash
"Okunevo" 28.85
"Caucasus_HG" 24.75
"Sintashta" 17.25
"Anatolia_Neolithic" 14.1
"Dravidian_India" 9.35
"Dai" 3.3
"Esan_Nigeria" 2.4
distance=0.001943
Adding both still improves further:
Kalash
"Caucasus_HG" 30.1
"Okunevo" 22
"Anatolia_Neolithic" 19.4
"Karelia_HG" 9.1
"Dravidian_India" 7.25
"Dai" 6.45
"Sintashta" 3.2
"Esan_Nigeria" 2.5
distance=0.001521
Not sure which of the options is more realistic. For reference of what gives an almost perfect fit, we get there by adding Chechen. Not that Chechen is useful historically, but to have an idea of best possible fit:
Kalash
"Chechen" 54.4
"Okunevo" 17.15
"Caucasus_HG" 14.35
"Dai" 6.9
"Karelia_HG" 3.7
"Esan_Nigeria" 2.6
"Sintashta" 0.85
"Dravidian_India" 0.05
"Anatolia_Neolithic" 0
distance=0.000428
From limited testing with other populations, it's interesting that Kalash and Tajik_Ishkashim take less Sintashta than Pathan or GujaratiA. With Karelia_HG and Okunevo (but without Chechen):
Pathan
"Caucasus_HG" 22.9
"Okunevo" 20.95
"Anatolia_Neolithic" 19.35
"Sintashta" 11.65
"Dai" 8.5
"Dravidian_India" 7.9
"Karelia_HG" 5.65
"Esan_Nigeria" 3.1
distance=0.001993
GujaratiA
"Okunevo" 21.1
"Caucasus_HG" 19.05
"Anatolia_Neolithic" 16.55
"Sintashta" 14.65
"Dai" 11.3
"Dravidian_India" 8.3
"Karelia_HG" 5.35
"Esan_Nigeria" 3.7
distance=0.001494
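For anyone wondering what the "distance" under each model means, the fits above can be thought of roughly as follows. This is a minimal Python sketch, not the actual nMonte code: it assumes the distance is the Euclidean distance between the target's row of D-stats and the weighted average of the source rows, and it uses a simple random hill climb that moves one small slot of ancestry at a time. The function names, the slot size and the toy numbers are all hypothetical.

```python
import random

def mixture_distance(target, sources, weights):
    """Euclidean distance between target and the weighted average of sources."""
    total = 0.0
    for j, t in enumerate(target):
        mixed = sum(w * src[j] for w, src in zip(weights, sources))
        total += (mixed - t) ** 2
    return total ** 0.5

def fit_mixture(target, sources, slots=200, cycles=2000, seed=0):
    """Hill-climb over mixtures made of `slots` equal pieces (needs >= 2 sources).

    Each cycle moves one piece from a random source to another and keeps
    the move only if the distance to the target gets smaller.
    Returns (percentages per source, final distance).
    """
    rng = random.Random(seed)
    k = len(sources)
    counts = [slots // k] * k
    counts[0] += slots - sum(counts)  # give any remainder to the first source
    best = mixture_distance(target, sources, [c / slots for c in counts])
    for _ in range(cycles):
        give, take = rng.sample(range(k), 2)
        if counts[give] == 0:
            continue
        counts[give] -= 1
        counts[take] += 1
        trial = mixture_distance(target, sources, [c / slots for c in counts])
        if trial < best:
            best = trial
        else:  # undo a move that did not improve the fit
            counts[give] += 1
            counts[take] -= 1
    return [100.0 * c / slots for c in counts], best

# Toy example with made-up D-stat rows (three sources, four outgroup columns).
toy_sources = [[0.10, 0.30, 0.05, 0.20],
               [0.25, 0.10, 0.15, 0.05],
               [0.05, 0.05, 0.30, 0.10]]
toy_target = [0.14, 0.19, 0.13, 0.135]
percentages, distance = fit_mixture(toy_target, toy_sources)
print(percentages, distance)
```

A lower distance only says the chosen sources reproduce the target's stats better, which is why adding a close modern relative like Chechen gives a near-perfect fit without being historically informative.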
@Davidski,
Making Dravidian an outgroup will improve S/C Asian results. It'll erase any phony East Asian/African scores S/C Asians get.
@Alberto,
Yep, and it's important to note S/C Asians certainly have WHG ancestry. MA1 alone doesn't work, but EHG does. More testing will resolve the issue. I think most is Sintashta-derived.
Alberto,
It's premature at this stage, but I'm still interested in seeing what kind of proportions North Indian Brahmins and South Indian Brahmins show.
@Krefter
Maybe a Paniya outgroup? Most large groups of Dravidians tend to share their ancestries with more northwestern pops, only proportions differ.
nMonte results in ANE K8, Eurogenes K15, and D-stats are very consistent. For K15, I combined North Sea, Baltic, and East Euro into one component. This is because they all formed when Steppe and EEF mixed.
We're getting very close to the truth. IMO, it's clear Spain and Italy have significant Steppe and Near Eastern ancestry. I don't think the Yamnaya scores in Haak 2015 were inaccurate; 20-30% sounds correct.
ANE K8.
Tuscan: 29.75% MN10, 36.45% German Bell Beaker, 33.8% Cypriot: 0.006936
D-stats.
Tuscan: 32.25% MN10, 32.25% German Bell Beaker, 35.5% Near East(26.6% Turkish, 8.9% Georgian): 0.003571
Eurogenes K15
Tuscan: 28.95% Otzi, 36.45% Irish, 34.6% Near East(20.45% Leban_Christain, 6.15% Georgian, 4.15% Samartian, 3.85% Palestinian): 0.008142
ANE K8
West Sicilian: 16% MN10, 22% German Bell Beaker, 58% Cypriot, 4% Mozabite: 0.001823
D-stats.
West Sicilian: 14.1% MN10, 20.85% German Bell Beaker, 60.6% Near East(44.55% Cypriot, 16.05% Turkish), 4.45% Mozabite: 0.00211
Eurogenes K15
South_Italian: 20.9% Otzi, 16.35% Irish, 61.05% Near East(29.75% Cypriot, 22.75% Leban_Druze, 8.55% Georgian): 0.00376
ANE K8
Central_Greek: 27.5% MN5, 29.75% BeloRussian, 42.5% Near East(21.4% Georgian_Laz, 21.1% Cypriot): 0.002436
D-stats.
Greek1: 23.8% MN5, 37.85% BeloRussian, 38.35% Near East(16.3% Cypriot, 13% Turkish, 9.05% Georgian): 0.00164
ANE K8
Spain_Aragon: 54.85% MN30, 31.1% German Bell Beaker, 13.55% Near East(5.3% Cypriot, 8.25% Turkish), 0.5% Mozabite: 0.008961
D stats
Spain_Aragon: 36.6% MN20, 38.2% German Bell Beaker, 23.5% Turkish, 1.7% Mozabite: 0.004955
Eurogenes K15
Spain_Aragon: 37.35% Spain_MN, 47.2% Irish, 15.28% Near East(9.15% Leban_Druze, 6.3% Palestinian): 0.018645
@Nirjhar
Sorry, there are no Brahmins in this list. I guess that when a good model is found David can run stats with more S-C Asian populations.
@Krefter, Shaikorth
Yes, having some ASI-rich outgroup should help. Though on the other hand, if we don't have a good ANE reference we might worsen the problem: Okunevo won't work to provide ANE because of too high East Asian instead of ASI, so it will force more EHG and that removes any steppe ancestry.
there are no Brahmins in this list
I see.
@Matt, I tried to look for patterns on how David's run selected the three ENA-affiliated components (Karasuk-outlier, Ulchi, Atayal, whose ANE should be in that order too) in moderns.
Only (or clearly <1% something else) Atayal
West Sicilian, Albanian, Estonian, Hungarian, Bulgarian, Ukrainian_East, French, some Iberians, Bergamo, Sardinian, Czech, English.
Only Karasuk
Kumyk, Chechen, Finnish, Iranian Jew, Georgian Jew, Lezgin, Abkhasian, Iraqi Jew, Syrian, Lebanese, Armenians, Bedouins, Georgians, Maltese, West Ukrainian, Croatian, some Spanish, Lithuanian, Norwegian
Only Ulchi
Yemenite Jew, Scottish, Basque, Icelandic
Karasuk + Ulchi/Atayal >1%.
Uzbek, Kyrgyz, Turkmen, Nogai, Turkish, Iranian, Kargopol Russian, Mordovian, North Ossetian, Adygei, Saudi, Sephardi, Ashkenazi, Druze, Palestinian, Jordanian, Greek, EastSicilian.
It's possible Karelian EHGs spoke a language that was a father to both proto Uralic and proto IE. The father language splits, the Northern language becomes proto Uralic while the southern EHG language then gets influenced heavily by the CHG language becoming "PIE".
@Arch Hades,
I'm not saying you don't know this, just that it is important to remember when discussing Uralic origins. The Northern edges of Europe were pretty much completely repopulated between 3000 and 2000 BC. Karelia_HG has few descendants today. 99% of the EHG in modern Europe is from Yamnaya-like people, not directly from EHG.
The people who repopulated Northern Europe(Like Corded Ware) did mix with the natives, which is why Finnish, Saami, Balts, and Estonians have so much excess WHG. But it's a minority.
Finnish are basically Lithuanians with Swedish and Siberian/Volga-Ural (either one works) admixture. I know nothing about Uralic languages. Still, I wouldn't be surprised if Finno-Ugric speakers are primarily descended from Indo-Europeans who became Uralic speakers because of a post-3000 BC migration from Siberia and/or Volga-Ural.
My overall point is, it is unlikely Karelia_HG spoke anything related to Indo-European or Uralic. Connecting Finno-Ugric to Mesolithic Northeast Europe doesn't make sense, because most of their ancestors arrived from Central Europe after the Mesolithic (hence their genetic similarity to people as far away as Ireland).
BTW, Srubnaya are a better reference than Sintashta (there's always something a bit weird with the Sintashta genomes). And it basically makes the problem of EHG go away with the populations that really don't need any extra from it (Tajik_Shugnan, Pathan, GujaratiA...), while it can still stay for those who do need it (Tajik_Ishkashim, Kalash, Burusho).
Tajik_Shugnan
"Srubnaya" 29.4
"Okunevo" 28.05
"Caucasus_HG" 17.5
"Anatolia_Neolithic" 15.9
"Dravidian_India" 6.75
"Esan_Nigeria" 1.5
"Dai" 0.9
"Karelia_HG" 0
distance=0.001977
Pathan
"Okunevo" 23.6
"Caucasus_HG" 22.3
"Srubnaya" 18.75
"Anatolia_Neolithic" 16.5
"Dai" 7.45
"Dravidian_India" 7.4
"Esan_Nigeria" 3.2
"Karelia_HG" 0.8
distance=0.001799
GujaratiA
"Okunevo" 23.05
"Srubnaya" 20.35
"Caucasus_HG" 18.75
"Anatolia_Neolithic" 14
"Dai" 9.9
"Dravidian_India" 9.15
"Esan_Nigeria" 3.7
"Karelia_HG" 1.1
distance=0.001281
Kalash
"Caucasus_HG" 29.7
"Okunevo" 22.5
"Anatolia_Neolithic" 18
"Dravidian_India" 7.55
"Karelia_HG" 7.25
"Srubnaya" 6.4
"Dai" 6.1
"Esan_Nigeria" 2.5
distance=0.001462
Iranian
"Anatolia_Neolithic" 38.7
"Caucasus_HG" 24.4
"Okunevo" 11.7
"Srubnaya" 11.65
"Dravidian_India" 5.95
"Dai" 4.1
"Esan_Nigeria" 3.5
"Karelia_HG" 0
distance=0.001901
Punjabi_Lahore
"Dai" 19.95
"Srubnaya" 18.15
"Caucasus_HG" 17.6
"Okunevo" 16.9
"Dravidian_India" 11.75
"Anatolia_Neolithic" 8.9
"Esan_Nigeria" 4.9
"Karelia_HG" 1.85
distance=0.00254
Balochi
"Caucasus_HG" 28.05
"Anatolia_Neolithic" 26.3
"Okunevo" 17.75
"Dravidian_India" 7.55
"Dai" 5.9
"Karelia_HG" 5.85
"Esan_Nigeria" 4.5
"Srubnaya" 4.1
distance=0.001766
Burusho
"Okunevo" 25
"Caucasus_HG" 21.55
"Anatolia_Neolithic" 18.75
"Dai" 15.65
"Dravidian_India" 6.95
"Karelia_HG" 6.55
"Esan_Nigeria" 3
"Srubnaya" 2.55
distance=0.000968
@Alberto,
You should probably cut Okunevo from the list. Can you try using Dravidian, Caucasus_HG, Georgian, Armenian, Anatolia_Neolithic, Srubnaya, Karelia_HG? Without Okunevo in there, we should see whether Srubnaya is needed or if Karelia_HG does the job.
@Krefter
Well would not EHGs (of some sort at least) be the ones who spoke Proto Uralic?
"The Homeland of Proto-Uralic probably was in the forest zone centered on the southern flanks of the Ural Mountains. Many argue for a homeland west of the Urals and others argue for the east side, but almost all Uralic linguists and Ural-region archaeologists would agree that Proto-Uralic was spoken somewhere in the birch-pine forest between the Oka River on the west (around modern Gorky) and the Irtysh River on the east (around modern Omsk)." -David W. Anthony
If the population of Samara was EHG circa 5,500 BC, then why not this area? But yeah, Uralic's presence in Finland and the Baltic would probably be a migration east to west from this area.
My first post was just a conjecture that EHGs spoke a father language of both PIE and proto Uralic. There are linguistic models that at least hold that PIE and Uralic have a common ancestor. You can google "the Indo-Uralic hypothesis" and find arguments. Though I'm no linguist, so I don't know. We know they seem to have an intimate relationship, but a common ancestor may or may not be the case. So perhaps Eastern HGs as far west as Karelian HGs just spoke the common ancestor. Or maybe Karelian HGs spoke a sister language of Indo-Uralic, descended from the same common ancestor. So on a linguistic tree we have the mysterious Karelian HG language as one branch and Indo-Uralic on the other. Then of course later in the east Indo-Uralic would split, with its southern branch getting heavily influenced by the CHG language and becoming "PIE".
"The people who repopulated Northern Europe(Like Corded Ware) did mix with the natives, which is why Finnish, Saami, Balts, and Estonians have so much excess WHG. But it's a minority.",
Well the 'natives' of these areas before Yamnaya/Corded Ware dispersals would be ANE packed EHGs though, right? Not normal WHGs.
@Arch Hades,
Okay, for some reason I thought you were suggesting Uralic was in Karelia in the Mesolithic. Couldn't Uralic be a Siberian(part EHG) thing not EHG, considering all European Uralics have Siberian ancestry?
"Well the 'natives' of these areas before Yamnaya/Corded Ware dispersals would be ANE packed EHGs though, right? Not normal WHGs."
I'm referring to natives of Baltic states and southwest Finland, who could have been mostly WHG.
cool pic which helps imagine how flows of ppl might be affected by geography
http://p2.pstatp.com/large/6214/6699489310
#
interesting thing about the copper ores imo is (I assume) there's a spectrum of arsenic composition in regional ores so the result might go:
soft copper
-> slightly harder copper
-> much harder copper
-> arsenic bronze
-> tin bronze (not sure if it's better or the same?)
-> iron
so you can imagine regional military advantage swinging back and forth as particular regions with the right ores developed a better edge.
@Krefter
Okunevo is key for the steppe model to work. Whether it's realistic or not is hard to say. But if I leave it out, Srubnaya goes to 0% (which could be correct, I don't know):
Pathan
"Dravidian_India" 39.75
"Caucasus_HG" 22.6
"Anatolia_Neolithic" 17.95
"Karelia_HG" 15.4
"Dai" 3.5
"Esan_Nigeria" 0.8
"Srubnaya" 0
distance=0.002099
GujaratiA
"Dravidian_India" 33
"Caucasus_HG" 21.15
"Karelia_HG" 17.4
"Anatolia_Neolithic" 17.35
"Dai" 9.2
"Esan_Nigeria" 1.9
"Srubnaya" 0
distance=0.002264
Adding Armenian and Georgian is not very informative; they are redundant when Anatolia_Neolithic and Caucasus_HG are already included:
Pathan
"Georgian" 43.6
"Dravidian_India" 27.75
"Karelia_HG" 13.7
"Caucasus_HG" 6.55
"Dai" 6.3
"Esan_Nigeria" 1.2
"Srubnaya" 0.9
"Anatolia_Neolithic" 0
"Armenian" 0
distance=0.001953
GujaratiA
"Dravidian_India" 25.45
"Georgian" 22.65
"Karelia_HG" 16.4
"Caucasus_HG" 13
"Dai" 11.2
"Anatolia_Neolithic" 7.85
"Esan_Nigeria" 2.2
"Srubnaya" 1.1
"Armenian" 0.15
distance=0.00238
Krefter, I am tired of repeating the same evidence over and over again. Do you have any capacity to absorb evidence if it contradicts your old preferential concepts?
Does this tree really suggest to you that Uralic groups represent non-EHG part, i.e. pure ENA ancestry?
http://www.nature.com/articles/srep20768/figures/2
Uralic-speaking Mansi and Saami are clearly on the Western Eurasian side while true Siberians are not and they are usually not Uralic speakers but speak a BIG VARIETY of other languages: Yeniseian, Yukaghir, Tungusic, Turkic, Koryak, Chukchi, Itelmen, Eskimo, Na-Dené. Of course, Selkups are Uralic speakers but they are Q folks and of course Nganasans are Uralic speakers but they carry only Asian N1b-A and C2b and lack the so called Ket-Uralic component and are completely Tungusic autosomally.
yDNA C2 which is old in Northeast Asia and surely Tungusic/Arctic/Northeast Eurasian autosomally is distributed in groups as follows:
C2a-M93 Japanese
C2b Japanese
C2b1a American Indian
C2b1a1a North American Indians
C2b1b Manchu, Mongol, Korea, Hezhe, Oroqen, Evenk, Ulchi, Negidal, Udega, Nivkh, Koryak, Itelmen, Chukchi; age 11 kya
C2b1c Mongolian Genghis Khan group
C2b1d Europeans
C2b2 Evenks, Yukaghir
C2b2a Yukaghirs, Evenks, Tuvans, Yakuts, Altaian Kazakhs, Tuvinians, Mongols, Buryats, Kalmucks, Ulchi, Negidal, Nivkh, Koryaks, Itelmen
C2b2a1 Kazakhs, Kalmucks, Tuvinians, Mongols, Todjis
C2b2a1 Kazakhs
C2c Northern Han, Mongols, Uygurs, Hui, Xibe
IMO, it is clear that C2b folks spoke paleo-Siberian, including North-American languages, and turned to Turkic and Tungusic speakers only recently.
YDNA N is not old in Northeast Asia. The TMRCA of N1c-F4205 is only 3000 years according to yfull. It is recent in Northeast Asia, in all likelihood came from the Volga forest area, and could only have been in part ENA. Please, don't be so obtuse.
Krefter, the natives of the Baltics couldn't have been too much WHG because Estonian Corded Ware was not extremely WHG compared to more southern Corded Ware.
Newer linguistic studies are showing that Uralic languages and their expansion are quite recent. Post-Mesolithic there were movements from Siberia and the Volga-Ural region towards the Baltic states and/or Finland. The non-Uralic culture of the Bolshoy Oleni Ostrov folks, for instance, preceded Saami, and those people were probably quite Siberian if their autosomal DNA is anything like their mtDNA.
On the proto-Finnics, read what I posted earlier in the comments. Would be unsurprising if Estonians turn out to be the closest population to them. The spread of Saami happened beyond the CW horizon and they probably were quite different. I expect that ancient DNA will show if either was similar to proto-Uralic by the time they reached Eastern Baltic and Karelia but that will take time.
I'll try to join in tomorrow, with the Onge. That will help resolve this.
David,
Is it possible to have a sheet that includes Kharia (or any other tribal Indian population) as an outgroup? It'll help immensely with figuring things out in South Central Asia + West Asia + Caucasus.
Maybe. I'll have a talk with other users who have extra samples.
Shaikorth, Proto Uralic needs to at least date back to 3,000 BC and when PIE existed, or else no one would try to connect PIE with Uralic in any way.
Onge D-stats for nMonte
https://drive.google.com/folderview?id=0B962TtPkX1YneWlmOGZpUHJmeWs&usp=sharing
@ Chad, thanks. Wow. Those intra Onge-Onge2 sharing values are huge. 0.5465, where the Loschbour-La Brana stat is only 0.5085 (and that was huge compared to other highest stats). The Onge must have been very isolated for a very long time, phylogenetically very tight. I can't see it making much of a contribution to the Indian population with that (Indians are not much closer to Onge relative to Han than others; Onge is only a very loose proxy for their ancestors, as might be expected from islands isolated so long that are really closer to Southeast Asia than India). Too late for me to run any nMonte fits, be interested to see what you guys have.
qpAdm has South Asians as a majority Onge (73% in Paniyas), with some CHG and rejecting Dai and Atayal. Except, Austroasiatics getting 15-25% Atayal.
@Shaikorth,
"Krefter, the natives of the Baltics couldn't have been too much WHG because Estonian Corded Ware was not extremely WHG compared to more southern Corded Ware."
That's because they weren't admixed yet.
@Chad,
Are you able to convert Onge D-stats to a spreadsheet format?
They are in spreadsheet format.
Thanks Chad!
With all of the data in, this is how Pashtuns stack up:
13.50% GujaratiB
9.05% Georgian
7.70% Poltavka
7.15% Azeri_Baku
6.10% Kalash
5.65% Caucasus_HG
4.80% Chechen
4.80% Itelmen
4.70% Brahui
4.45% Iranian
4.45% Punjabi_Lahore
3.75% Tajik_Shugnan
2.40% Onge
2.15% Afanasievo
1.70% Altai_IA
1.65% Turkmen
1.25% Yamnaya_Kalmykia
1.10% Balochi
1.00% Dravidian_India
1.00% Karasuk_subset
Too confusing for comment.
After keeping only ancient steppe populations/Onge/other ancient West Eurasians/Ulchi/BedouinB:
24.50% BedouinB
17.95% Karasuk_subset
17.10% Yamnaya_Kalmykia
16.85% Caucasus_HG
10.30% Ulchi
5.30% Yamnaya_Samara
4.55% Onge
3.45% Ust_Ishim
I guess this fit is putting Pashtuns at around 40% steppe-admixed. For whatever it's worth, the strong Ulchi percentage probably involves extra ANE (like Alberto has been saying with Okunevo and Selkup). And just like Matt assumed, the Onge percentage is very small, they haven't really contributed much genetic ancestry to South Asia (rather, they are marginally closer to the ENA ancestry of South Asians, compared to East Asians in relation to South Asia).
Also, my notion concerning Ust-Ishim seems to have been justified, as he still appears, despite the use of both Onge and Ulchi. Again, I think Ust-Ishim is acting as a proxy for either a third Eurasian lineage, or perhaps an undiscovered West Eurasian meta-population.
For comparison, Iranians with the same set of populations:
37.60% BedouinB
17.70% Caucasus_HG
15.35% Anatolia_Neolithic
8.05% Karasuk_subset
7.30% Yamnaya_Kalmykia
6.25% Poltavka
5.60% Ulchi
2.15% Onge
Their Onge percentage is pretty close to Pashtuns.
Arch Hades, 3000 BC is not out of bounds for proto-Uralic existing somewhere, it's more that the language did not start to spread until 4000 years ago.
Krefter, Estonia is both small and lacks the mountainous hiding places of Sardinia or the Alps. If there was a WHG culture contemporary of Corded Ware hiding there, why hasn't it shown up on record?
Some other models:
Tajik (Shugnan)
52.70% Karasuk_subset
15.50% Caucasus_HG
12.60% BedouinB
8.50% Yamnaya_Samara
5.75% Anatolia_Neolithic
2.90% Yamnaya_Kalmykia
2.05% Onge
Tajik (Ishkashim)
44.30% Karasuk_subset
15.40% Caucasus_HG
11.00% BedouinB
9.55% Yamnaya_Kalmykia
7.80% Anatolia_Neolithic
4.60% Ulchi
4.20% Onge
3.00% Yamnaya_Samara
0.10% Poltavka
0.05% Andronovo
Kalash
34.35% Yamnaya_Kalmykia
21.10% BedouinB
20.40% Caucasus_HG
12.90% Ulchi
6.35% Karasuk_subset
3.80% Onge
1.10% Ust_Ishim
This is very encouraging. Based on what I've read, Pamiri peoples are supposedly direct descendants of Scythians. And the most European-shifted Karasuk samples should be good proxies for Scythians (if I'm not mistaken, Karasuk_subset is genetically quite similar to the IA Scythian sample we have).
By contrast, the Kalash have nothing to do with Scythians, as they speak a Dardic language. Their steppe ancestry should have been mediated, for the most part, via Indo-Aryans. And as we can see in the model, they are much more Yamnaya-like than the Pamiri samples.
Karasuk_subset:
38.80% Sintashta
30.60% Yamnaya_Samara
30.05% Nganasan
0.55% Ami
Again, very encouraging, since it makes complete sense.
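To make the "around 40% steppe" figure for Pashtuns explicit, here is a small Python sketch of the arithmetic, using only the percentages posted above. Which labels count as "steppe" is my own assumption, as are the variable and function names. Counting Karasuk_subset entirely as steppe gives roughly 40%; discounting its Nganasan/Ami share using the Karasuk_subset model above gives roughly 35%.

```python
# Percentages copied from the Pashtun and Karasuk_subset fits posted above.
pashtun_fit = {
    "BedouinB": 24.50, "Karasuk_subset": 17.95, "Yamnaya_Kalmykia": 17.10,
    "Caucasus_HG": 16.85, "Ulchi": 10.30, "Yamnaya_Samara": 5.30,
    "Onge": 4.55, "Ust_Ishim": 3.45,
}
karasuk_fit = {
    "Sintashta": 38.80, "Yamnaya_Samara": 30.60, "Nganasan": 30.05, "Ami": 0.55,
}

# Assumption: these labels are treated as unadmixed "steppe" sources.
STEPPE = {"Sintashta", "Yamnaya_Samara", "Yamnaya_Kalmykia",
          "Poltavka", "Andronovo", "Afanasievo", "Srubnaya"}

def steppe_share(fit, mixed_sources=None):
    """Sum the steppe percentage in a fit.

    Sources listed in mixed_sources (name -> their own fit) contribute
    only their own steppe fraction instead of their full percentage.
    """
    mixed_sources = mixed_sources or {}
    total = 0.0
    for name, pct in fit.items():
        if name in STEPPE:
            total += pct
        elif name in mixed_sources:
            total += pct * steppe_share(mixed_sources[name]) / 100.0
    return total

# Karasuk_subset counted as fully steppe versus discounted by its own model.
naive = steppe_share(pashtun_fit) + pashtun_fit["Karasuk_subset"]
adjusted = steppe_share(pashtun_fit, {"Karasuk_subset": karasuk_fit})
print(f"naive: {naive:.2f}%  adjusted: {adjusted:.2f}%")
```

The same two-step accounting can be applied to the Tajik and Kalash models, where the gap between the naive and adjusted figures is larger because Karasuk_subset takes a bigger share.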
nMONTE
Constructing a subset of a datasheet is a humble but consequential step in the workflow.
I wanted a utility to automate this task. I have added it to the script of nMonte as a function 'subset_data()'.
If you want to use this utility, you first enter: source('nMonte.R')
Next you can enter something like subset_data('DavidMadeThis.csv', 'IselectedThis.csv' ,'Abkhasian', 'Adygei', 'Afanasievo', 'Altai_IA')
The subsetted datasheet is saved as 'IselectedThis.csv'. If this filename is already in use in the active directory, you get an error message.
You also get an error message if you have entered a wrong population name like 'Beall_Bekker_Germany' or 'Piltdown'.
The updated version of nMonte is at:
https://www.dropbox.com/sh/1iaggxyc2alafow/AACIjLtnkuaNNsJ5oKME_3XHa?dl=0
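For readers who don't use R, the behaviour of the subsetting step described above can be sketched in Python. This is not the nMonte subset_data() function, just an illustration of the same idea under my own assumptions: the datasheet is a CSV with a header row, population names are in the first column, and the function name subset_datasheet is hypothetical.

```python
import csv
import os

def subset_datasheet(in_path, out_path, *populations):
    """Copy the header plus the rows for the named populations to out_path.

    Raises FileExistsError if out_path already exists, and KeyError if any
    requested population name is not found in the first column.
    Returns the number of population rows written.
    """
    if os.path.exists(out_path):
        raise FileExistsError(f"{out_path} already exists")
    with open(in_path, newline="") as handle:
        rows = list(csv.reader(handle))
    header, body = rows[0], rows[1:]
    by_name = {row[0]: row for row in body if row}
    missing = [name for name in populations if name not in by_name]
    if missing:
        raise KeyError("unknown population name(s): " + ", ".join(missing))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for name in populations:  # keep the order the caller asked for
            writer.writerow(by_name[name])
    return len(populations)
```

A call would look like subset_datasheet('DavidMadeThis.csv', 'IselectedThis.csv', 'Abkhasian', 'Adygei', 'Afanasievo', 'Altai_IA'), mirroring the R example.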
Trying a couple models (just for information, I patched the Onge stats into the set with BedouinB and KareliaHG that I've been using, which did involve using a couple of estimated values), trying to keep the ancestral groups fairly simple:
Pathan:
Caucasus_HG - 29.2, Nganasan - 17.85, BedouinB - 15.05, Karelia_HG - 15, Anatolia_Neolithic - 13.45, Onge - 5.5, Ust_Ishim - 1.15, Masai_Kinyawa - 1.1, Loschbour_WHG - 1, Esan_Nigeria - 0.7, Dai - 0, Hungary_HG - 0, Ulchi - 0
distance% = 0.0881 %
Kalash:
Caucasus_HG - 34.3, Karelia_HG - 16.95, Nganasan - 16.7, BedouinB - 13.9, Anatolia_Neolithic - 11.55, Onge - 4.45, Esan_Nigeria - 1.1, Ust_Ishim - 1.05, Dai - 0, Hungary_HG - 0, Loschbour_WHG - 0, Masai_Kinyawa - 0, Ulchi - 0
distance% = 0.137 %
GujaratiC:
Caucasus_HG - 26.6, Nganasan - 20.35, Karelia_HG - 11.7, BedouinB - 11.15, Onge - 9.65, Anatolia_Neolithic - 8.15, Dai - 4.4, Ust_Ishim - 2.7, Loschbour_WHG - 2.05, Masai_Kinyawa - 1.9, Esan_Nigeria - 1.35, Hungary_HG - 0, Ulchi - 0
distance% = 0.188 %
Dravidian_India:
Ulchi - 29.1, Caucasus_HG - 20.6, Onge - 14.3, BedouinB - 9.2, Karelia_HG - 6.35, Anatolia_Neolithic - 6.15, Ust_Ishim - 5, Dai - 2.8, Masai_Kinyawa - 2.7, Esan_Nigeria - 2.2, Hungary_HG - 1.6, Loschbour_WHG - 0, Nganasan - 0
distance% = 0.0553 %
The percentages of Onge ancestry are pretty good, but still lose out to North Asians, for the most part. The high fractions of Siberian ancestry suggest you might need some source that is MA-1 like as well? One that isn't Yamnaya or EHG (or ideally a Siberian)?
I suspect that higher-than-expected Karitiana and Samara_HG stats for the South Asian populations, beyond what the Karelia_HG ancestry can explain, are what is leading to the North Asian ancestry here, and that this may have something to do with the Karasuk fractions in other runs.
Chad or Davidski, if either of you, could run off:
D(Chimp,Ref)(Mbuti,MA-1) for
Ref=BedouinB, Caucasus_HG2, Han, Iberia_Chalcolithic, Iberia_Mesolithic, Karitiana, LBK_EN, Motala_HG, Samara_HG, Yoruba, Onge2
that would be appreciated, so I could slot MA-1 into my datasheet and try fits involving it.
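For anyone unfamiliar with what such a stat measures, here is a minimal Python sketch of a D-statistic of the form D(W,X)(Y,Z) computed from per-SNP allele frequencies, using the usual allele-frequency formulation (sum of (w-x)(y-z) over SNPs, divided by sum of (w+x-2wx)(y+z-2yz)). This is only an illustration: the real runs use dedicated software on genotype data, the exact normalisation behind the datasheet values may differ, and the frequencies below are made up.

```python
def d_statistic(w, x, y, z):
    """D(W,X)(Y,Z) from per-SNP allele frequencies of four populations.

    Each argument is a list of frequencies (0..1) of the same allele at the
    same SNPs. A positive value means the W/X difference tends to line up
    with the Y/Z difference across SNPs.
    """
    numerator = 0.0
    denominator = 0.0
    for wi, xi, yi, zi in zip(w, x, y, z):
        numerator += (wi - xi) * (yi - zi)
        denominator += (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
    return numerator / denominator

# Hypothetical toy frequencies at four SNPs, for illustration only.
chimp = [0.0, 0.0, 1.0, 0.0]
ref = [0.2, 0.5, 0.9, 0.1]
mbuti = [0.1, 0.4, 1.0, 0.0]
ma1 = [0.3, 0.6, 0.8, 0.2]
print(d_statistic(chimp, ref, mbuti, ma1))
```

One such value per Ref population would fill in the MA-1 row of a datasheet, which is what the request above is asking for.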
I think Matt is right, we probably need MA1.
Also, Chad, could you add BedouinB as an outgroup/reference to your datasheets, so that we don't have to use estimated values? That would be greatly appreciated, thanks in advance.
There should be a sheet floating around with BedouinB in the test pops and BedouinB2 in the outgroups. I don't have it on me now because my computer crashed last night, and I don't know what the link is.
I'll e-mail Chad now and send him the list file to make this easier.
Actually, Matt, I think you've got MA1 in the wrong place there.
Yes, a good ANE reference is a must for S-C Asian populations. Using Siberian samples (Okunevo, Nganasan, Ulchi, Selkup) kind of works because there is also a good amount of ENA in S-C Asia, but it might not be a realistic option. Including EHG, meanwhile, does away with any other WHG-rich population. MA1 should be the better choice, but usually MA1 shows such low affinity to all Eurasians (compared to more modern samples) that I'm not sure it'll work too well.
Regarding the runs with Karasuk_subset, those samples do work very well, but the problem is that Karasuk are a very mixed population (largely a steppe + Siberian mix), so it confounds things further. It's more informative to keep the "European" part separated so we can account for it on its own. At least for any Sintashta (or Srubnaya, just because the genomes work better) we know what they represent and where they come from. And then we can keep the "Siberian" part also separated, knowing that we don't really know what it represents (maybe a substrate that was there since the Upper Paleolithic? Maybe later migrations (when?)? Who knows).
Kalash
"Karasuk_subset" 32.2
"Caucasus_HG" 27.6
"Anatolia_Neolithic" 12.95
"Okunevo" 11.5
"Dravidian_India" 6.45
"Onge" 3.25
"Srubnaya" 2.25
"Esan_Nigeria" 2
"Dai" 1.65
"Eastern_HG" 0.15
distance=0.00055
The fit is excellent, but less informative than without Karasuk:
Kalash
"Caucasus_HG" 27.55
"Okunevo" 22.7
"Anatolia_Neolithic" 15.3
"Srubnaya" 12.4
"Dravidian_India" 11.5
"Dai" 3.2
"Eastern_HG" 2.75
"Onge" 2.6
"Esan_Nigeria" 2
distance=0.00097
Still an excellent fit, and we leave to Okunevo all the "unknown" possible origin. Removing Dravidian to let Onge as the ASI reference doesn't change things much:
Kalash
"Caucasus_HG" 29.65
"Okunevo" 25.45
"Anatolia_Neolithic" 17.1
"Srubnaya" 11.85
"Dai" 5.75
"Onge" 4.3
"Eastern_HG" 3.25
"Esan_Nigeria" 2.65
distance=0.001007
I think that Okunevo removes much more ASI than other Siberians (Ulchi and Nganasan in Matt's runs). Which might make sense if indeed the Okunevo is representing an ancient S-C Asian substrate. But only ancient DNA can tell us about this ANE origin in the area.
@Shaikorth:
"Newer linguistic studies are showing that Uralic languages and their expansion are quite recent."
Are these analyses amongst the studies you refer to?
http://www.linguistics.fi/julkaisut/SKY2006_1/1FK60.1.9.WIIK.pdf
http://paabo.ca/papers/CommentaryOnWiik2.pdf
Or is - perhaps- this one more on target?
http://www.continuitas.org/intro.html
Interesting result for Okunevo:
Okunevo
"Itelmen" 47.45
"Eastern_HG" 29.65
"Caucasus_HG" 16.9
"Dravidian_India" 5.75
"Ami" 0.25
"Loschbour_WHG" 0
"Anatolia_Neolithic" 0
"Dai" 0
"Nganasan" 0
"Yakut" 0
"Onge" 0
"Ulchi" 0
distance=0.003852
Is that Dravidian noise?
There is high enough ANE and East Eurasian in South Asians that not only lends itself to a Karasuk representative for Steppe, but even within the South Asian or South Indian components, if you break them down, they've got a substantial Siberian/Mongolian fraction.
Karasuk is entirely a realistic possibility for a source of real Siberian-like admixture in South Asia. But it seems unlikely, though not impossible judging by the spread of R1a, that Siberian/Arctic-type admixture is endemic and correlated with ancient India or South India between 4000 and 2000 years ago. Especially since the European HG-like stuff only shows up in the Indo-Aryan heartlands of the Northwest.
So I think the realistic possibilities are a direct injection of something very ANE/MA1-like into India a very, very long time ago that became intertwined with ASI or came with it.
For example, using the Eurogenes K7/K8 components, we can see the following:
Southeast Asian (Malay/Dai): ~50% ASE, ~50% East Eurasian (the latter of which peaks in Siberia/Mongolia)
Northeast Asian (Japanese): ~2% ANE, ~12% ASE, ~86% East Eurasian
Siberian: ~16% ANE, ~84% East Eurasian
Amerindian: ~37% ANE, ~63% East Eurasian
EHG: ~37% ANE, ~58% WHG, ~3% ASE, ~2% East Eurasian (just making these up from memory)
CHG: ~31% ANE, ~64% ENF, ~5% ASE (Admixture used to split this into two simulated components... modern Caucasus with low-mid 20s ANE and Gedrosian with ~35% ANE, ~9-10% ASE)
So we have populations representing high ANE fractions with neighboring components (East Eurasian in Amerindian, and WHG in EHG).
We could be missing the following:
~30% ASE, ~70% East Eurasian ("Continental East Asians in South Asia", or perhaps Tibeto-Burman)... this kind of component does attract admixture in South Asians, a few percent at least.
~70% ASE, 30% East Eurasian (it had to exist at some point, it might have entered the South Asian soup)
~37% ANE, 63% ASE or some combination of high-ASE and East Eurasian (the source of ASI in modern South Asians)... this works for South Asians (as an 'ASI' component) very well with some variation in the 'mixture'. It would be like a Southern version of EHG or Amerindian. It would explain the high ANE in South Asians (such that South Asians will continue to have admixture picked up by MA1 components even in the presence of other ANE-heavy components).
~50% ASE, ~50% Oceanian... some combination of this always attracts Oceanian admixture away from an actual Oceanian component in South Asians
As for the purely Steppe theory, perhaps the more Eastern Karasuk swept into India en masse which disseminated R1a and more Andronovo-like people only came into the Northwest.
But we haven't found R1a-L657 in Karasuk. I'm beginning to think L657 mutated and existed somewhere in South Central Asia and not Central Asia to begin with.
Something like a southern version of EHG/Amerindian might also be responsible for all the Ust-Ishim results.
@Batman
Definitely no. 'The language contact situation in prehistoric Northeastern Europe' is a pretty good start and has good references to go further.
@Nirhjar007
Probably Dravidian daughters-in-law...
@Shaikorth
What "language-contact" do you base your assumptions on, actually - if you can't define their respective points of origin?
If you're going to question the traceability of languages go take it up with linguists, not me.
OK, so it's not noise...
@Gill
In my opinion Karasuk is not important for South Asia and Indo-Aryans. They are too late and too far northeast. The Siberian stuff South Asians show is, in my opinion, rather something very old. You already mentioned that they are not L657. Indo-Aryans/L657 arrived, in my opinion, via Turkmenistan/Uzbekistan into Afghanistan/South Asia and were more western in the beginning.
I'll post the rest of those stats this afternoon.
BTW, Karasuk people are more likely to be the reindeer herders that brought Seima-Turbino to NE Europe.
Karasuk uniparentals are pretty far from Europe, except maybe some Turkic regions and even that I'm not sure about. Seima-Turbino was a network of horsemen and charioteers, not reindeer herders.
@Nirjhar
I don't think that the Dravidian in Okunevo can be called noise. They have some 17% CHG, which means they have Basal Eurasian (unlike Native Americans and probably some Siberians). So at some point these people had contact with people from further south. But the difficult part is to know when, where, how.
Dravidian showed up in too many populations and in an inconsistent geographical pattern. It's noise, 200%. (Also, why didn't it show up in other calculators?)
Alberto,
Only Okunevo has notable Dravidian among steppe-related samples, right?
@Ariele
These results are with the stats kindly provided by Chad Rohlfsen, which include Onge as a column. This should give more certainty than all the previous ones when it comes to the ASI component.
But we're still all just testing things, so this is all experimental to a good degree. I'll check if the Dravidian component now behaves differently with other populations than when using the previous Dstats.
@Nirjhar
Using the exact same populations as source:
Yamnaya_Samara
"Eastern_HG" 49.95
"Caucasus_HG" 30.45
"Anatolia_Neolithic" 15.2
"Loschbour_WHG" 2.7
"Onge" 1.2
"Dai" 0.35
"Yakut" 0.15
"Nganasan" 0
"Ami" 0
"Dravidian_India" 0
"Ulchi" 0
"Itelmen" 0
distance=0.000288
Srubnaya
"Anatolia_Neolithic" 35.3
"Eastern_HG" 33.9
"Caucasus_HG" 18.8
"Loschbour_WHG" 10.5
"Itelmen" 0.95
"Onge" 0.55
"Dai" 0
"Nganasan" 0
"Ami" 0
"Yakut" 0
"Dravidian_India" 0
"Ulchi" 0
distance=0.00242
Afanasievo
"Eastern_HG" 45.35
"Caucasus_HG" 35.5
"Anatolia_Neolithic" 14.75
"Itelmen" 3.95
"Dravidian_India" 0.25
"Ami" 0.2
"Loschbour_WHG" 0
"Dai" 0
"Nganasan" 0
"Yakut" 0
"Onge" 0
"Ulchi" 0
distance=0.00345
Andronovo
"Eastern_HG" 37.35
"Anatolia_Neolithic" 29.15
"Caucasus_HG" 21.65
"Dravidian_India" 5.75
"Loschbour_WHG" 4.95
"Onge" 0.6
"Ami" 0.55
"Dai" 0
"Nganasan" 0
"Yakut" 0
"Ulchi" 0
"Itelmen" 0
distance=0.001518
Interesting, Andronovo do have some.
Let's see Sintashta :)
Alberto, Matt, Sein e.a. - Re: UI and other ancients
I have been thinking a bit about how those results, including the early Kostenki-UI stats Matt had posted on a previous post, might be explained. Actually, uniparental markers may be a good start. I restrict myself to the yDNA tree, it is more obvious, but one might later look also into mtDNA.
yDNA in general is of course inferior to aDNA. However, when dealing with 45-35 ky old samples, the effect of admix having autosomally overformed the original uniparental pattern diminishes, simply because there hadn't been that much DNA with substantially different drift around to admix with.
So, let's start with Kostenki. He came out pretty central to today's West Eurasian population. No surprise, his yDNA C sits quite upstream the yDNA tree.
UI, in contrast, has a quite low position in the tree (surprisingly low for his age), namely K2. This means, he is differentiated from, and excluding drift present with
- EEF (heavy on G2)
- WHG (heavy on I)
- CHG (so far exclusively J2)
- Bedouins, which should be heavy on J1, since J1 is considered a Semitic marker [One implication is that the J1 glacial refugium probably shouldn't have been too far away from the Arab Peninsula. The southern side of the Persian Gulf looks like an apt candidate].
- East Africans, who have quite some T (and cryptic T-related ancestry seems to be present across virtually all of SSA).
- Maritime SEA/W. Indonesia (ditto). [As T was present in LBK, it should have found LGM refuge in or not too far away from Mesopotamia. The Western Zagros foothills look good to me].
- S. Iran/ SW India, where L is concentrated (which makes L another apt candidate for the Persian Gulf refugium, possibly more the north side).
From that list it becomes pretty obvious why UI attracts North Africans, East Eurasians and Americans, or, putting it differently, rejects West Eurasians, Near Easterners and most of SSA.
By adding more groups, UI's K2 is broken down further [I am not sure if UI is actually basal K2, or already differentiated towards one of the subgroups, but most likely one of you guys has already run Dstats to figure that out]:
- MA1 takes out R, possibly also a bit of Q
- Karelia_HG serves for further differentiation. First of all, it splits off R2 (which afterwards may be collected by "Dravidian", if they are included in the comparison), but it of course also attracts R1a
- Yamnaya collects R1b
- The NE Asians attract whatever flows around there - C, D, N, O, Q, with different precision, depending on the reference group chosen
- Onge cater for the SEA variants of D, and also contribute the Denisova admix that all other reference groups lack.
- The SSA samples, finally take care of the upstream groups, especially E.
What is then left, for UI to deal with? Any combination of
a) balancing out the Neandertal admix (assuming UI is far less admixed than most other groups in the panel);
b) Haplogroups M and S, i.e. the Papuan/Oceanian component;
c) Specific East Asian subclades that aren't well approximated by NE Asians - this could in particular entail quite a lot of Tibeto-Burmese O.
a) to c) might be tested by introducing adequate reference groups. Since the number of reference pops doesn't seem to be a limiting factor anymore, it could make sense (if technically possible) to also include Neanderthal and Denisovan samples, just to see what happens.
Does that mean that you're basically unaware of the linguistic theories I referred to?
So what "recent studies" - made by professional linguists over the last three decades - do you actually refer to, in
claiming that "Uralic languages and their expansion are quite recent"?
I mentioned the studies already.
Wiik's theories and paleolithic continuities I'm not going to bother with.
Davidski: Actually, Matt, I think you've got MA1 in the wrong place there.
Ah, yeah, row not column.
for an illustration of swamps as good defensive, guerrilla terrain i'd recommend the movie "Southern Comfort"
@Kristiina
That leaves only Tocharian in the East but Tocharian texts are from the 6th to the 8th century AD, so there are not so many indications of a strong presence of IE groups in Central Asia and Altai before Sintashta 2100–1800 BC.
What about Afanasievo (3300-2500 BC)?
@Everyone,
I've figured out how to find ghost (unsampled ancient DNA) ancestors using D-stats. I used the ancient European genomes available in 2012 (EEF, WHG) to find the non-EEF/WHG ancestors of Belarusians. There are dozens of possibilities, and one of them is almost identical to Yamnaya. So, if I use the same method with West Asians, with EEF and CHG, one of the possibilities will be a ghost ancestor or a combination of ghost ancestors.
@Davidski
I took the liberty of rewriting the 4mix_multi R-script in python - on the computers I can access I get a substantial speed up - but ymmv. I hope this is not stepping on anyone's toes.
Here's the url:
https://github.com/tchaz/4mix_multi-python
@Shaikorth: Thanks for the links to the papers of Kallio and Aikio on Uralic paleo-linguistics. (Batman: This is what Shaikorth has been talking about):
http://www.academia.edu/20252178/The_Language_Contact_Situation_in_Prehistoric_Northeastern_Europe
My first takeaway has been the distinction between Proto-Uralic, Para-Uralic (now-extinct siblings and cousins of the former), and Pre-Uralic as the ancestor to all of them. Using that terminology may possibly prevent all of us (including myself in previous posts) from getting misunderstood.
The Kallio paper identifies the Pre-Uralic area along the Volga, and links it to the Mesolithic (EHG) Pit-Comb culture. In interaction with PIE (see below), during the 4th millennium BC Proto-Uralic emerged somewhere on the Upper Middle Volga near Nizhniy Novgorod, with uncertainty whether it extended rather westerly, i.e. in the Volga-Oka region including the Moscow area, or easterly in the Volga-Kama region towards the Urals.
Within the spread of Proto-Uralic in different directions, Proto-Finnic is seen to have developed somewhere between Nizhniy Novgorod and the southern shores of the Gulf of Finland, including Southern Estonia, and is linked to later stages of Combed Ware (earlier stages would still have been Pre- or Para-Uralic). Kallio deems it likely that Proto-Finnic overformed Para-Uralic languages that had previously been spoken on the SE Baltic coast. According to him, there are no Bronze Age IE hydronyms “much north of the Daugava Basin”, though a few Germanic ones in S. Finland.
According to the Aikio paper, Proto-Sami would have been its northern neighbor, somewhere around the Onega and Ladoga Lakes and the St. Petersburg area. The Sami expansion into Karelia and Lapland is dated to the first centuries of the Common Era, possibly benefiting from the Sami occupying a central role in the emerging fur trade. In that process, the Sami would have absorbed previous "Paleo-European" populations.
In another paper, Aikio has analysed the non-Para-Uralic substrate in Sami. It includes terms with a Germanic “feel” (e.g. initial “sk-“) but without obvious etymological link, also terms shared with either Lithuanian or Icelandic/Old Norse (but in both cases no other IE language, so apparently shared substrate), but most stuff looks impossible to linguistically assign further.
Among the several Germanic borrowings identified in Sami, ruovdi “iron”, from PGerm *raudan (lit. “the red, rusty one”), is most instructive, since it allows Sami-Germanic language contact to be dated to the early Iron Age. [Kristiina: My question on Finnish hyppia vs. “to hop” was also answered; it is in fact a loan from Germanic.]
Even though Kallio places the Proto-Finnic homeland in or close to Southern Estonia, he rejects assigning CWC to Pre-Uralic. Instead, it is tentatively assigned to NW PIE (as ancestor of Proto-Balto-Slavic), noting however that this association is at odds with archeologists' interpretation, which focuses on prehistoric cultural continuity in Estonia, i.e. little archeological indication for an Iron Age culture/language shift from (Pre-/Para-) IE to Finnic in Northern Estonia and Livonia.
Toponymic evidence for a Baltic linguistic expansion into today's Russia (Volga-Oka region) is dated as recently as 200-600 AD. A thin Baltic superstrate may already have been present there before, but there are far fewer Baltic loanwords in Mordvin than in Finnish and Saami.
Another interesting takeaway from the Kallio paper: Setting forth previous arguments by Witzel and others, he sees Proto-Indo-Iranian (->Proto-Tocharian) only arriving in the Altai with Sintashta/Andronovo. According to Kallio, the first "Steppe" culture that can be plausibly associated with Proto-Indo-Iranian is Abashevo, i.e. Yamnaya's successor in the Samara area, ca. 2500-1900 BC. The interaction of Abashevo with Pre-Uralic resulted in the emergence of Proto-Uralic, which Kallio ties to the early CA Volosovo Culture on the Middle Volga.
This is interesting for two reasons:
1. After ca. 2800 BC, Volosovo was succeeded by the Fatyanovo culture, which is generally regarded as part of the CWC phenomenon. Yet, for the region to serve as Proto-Uralic homeland, CWC NW PIE speakers must have been assimilated into Proto-Uralic, and that is also what Kallio explicitly states (p.79). He even goes further in suggesting substantial IE substrate in Uralic, enhanced by later (Proto-)Finnic absorption of Para-Baltic speakers on the Estonian coast.
2. If Yamnaya didn’t arrive via the Volga (blocked by Pre-Uralics), nor via the Dniestr (blocked by GAC), the only route to the Baltics was up the Dniepr via Belarus. Belarusian C14 dating is a bit shaky, and appears to be neither calibrated nor controlled for reservoir effects. Nevertheless, what is there suggests that Northern Belarus remained firmly in Narva Culture hands up to at least 3000 BC, leaving only one route, namely along the Pripyat. And that is also where most of the finds from the CWC “A-level horizon” concentrate (link 2, map p. 171). Whether they document a route up north, or already the way back south, is difficult to say with the bit of C14 data available (3960 BP).
But the find context is interesting - wetland settlements, with strong aquatic orientation (fish, waterfowl, and turtles, possibly domesticated), only (or already?) 22% bone share of domestic animals, especially cattle.
Given the exceptional LN/CA settlement density and stability recorded in CE’s second largest wetland area on the Spree and Havel, plus what we know about the Ljubljana Marshes, Alpine pile-dwellings etc., I had already been expecting a similar pattern in the largest wetland area, i.e. the Pripyat Marshes. But as CWC's main migration path, and center of settlement? Not really. Though (Baltic Coast, Switzerland, wooden trackways over Frisian swamps) there seems to be a pattern showing up, which has a bit to do with pastoralism, but very little with horse-drawn carts. Battle axes? Or rather the timberman’s preferred tool for building boats, pile-dwellings and trackways?
As link 1, from which most of the information has been taken, concludes: “Such wetland settlements may considerably surpass the discovered “elevated” settlements, and this gives grounds to a questioning of the concept of the “corded” component’s minor importance in the cultural shift of III–early II millennium BC in [the Pripyat] Palesse”.
http://www.academia.edu/8610540/Belarusian_Wetland_Settlements_in_Prehistory
http://talpykla.istorija.lt/bitstream/handle/99999/2411/LA_19_167-174.pdf?sequence=1&isAllowed=y
Btw, quick attempt to see how East Asian populations model without an East Asia ancestor pop (using Anatolia Neolithic, BedouinB, Caucasus_HG, Esan_Nigeria, Hungary_HG, Karelia_HG, Loschbour_WHG, Masai_Kinyawa, Onge, Ust_Ishim):
Dai: Ust_Ishim 49.5, Karelia_HG 15.65, Onge 34.85; Distance%: 11.5892%
Ulchi: Ust_Ishim 45.15, Karelia_HG 22.35, Onge 32.5; Distance%: 11.4561%
Nganasan: Ust_Ishim 41, Karelia_HG 29.45, Onge 29.55; Distance%: 10.4603%
Yakut: Ust_Ishim 41.5, Karelia_HG 30.6, Onge 27.9; Distance%: 9.7497%
Itelmen: Ust_Ishim 33.3, Karelia_HG 36.45, Onge 30.25; Distance%: 10.8044%
Just for curiosity. Pretty horrible fits! Within that though, what nMonte does logically makes sense - Ust Ishim is a neutral but "Crown Eurasian" group, Onge is ENA, Karelia is the closest West Eurasian to East Asian and East Asian is closer to West Eurasia than Onge is - although that's not going to be the way it happened.
Onur, I do not know if the evidence is conclusive for Afanasievo to be IE. It is a pity that we do not know anything about Afanasievo yDNA. Autosomally they seem to be like South Russian Yamnaya, so we are back to the question if Yamnaya were IE or not.
Andronovo remains which were tested in an earlier paper were mostly R1a1 and Andronovo carries European Neolithic farmer ancestry which Afanasievo and Yamnaya lack. IMO, it is significant that according to David’s file:
Afanasievo 0% Anatolia Neolithic, 9% Caucasus HG
Andronovo 18-20% Anatolia Neolithic, 0-2% Caucasus HG
Yamnaya Kalmykia 0.4% Anatolia Neolithic, 5.75% Caucasus HG
In Allentoft et al, Afanasievo mtDNA is T2c1a2, 2xJ2a2a and U5a1a1. According to Eupedia, T2c is found mostly in the Near East and Mediterranean Europe; and T2c1 is found in Iran, Iraq, the Arabian peninsula, Italy, Sardinia, Spain and Central Europe. J2a2 is found mostly in the Near East and North Africa, and J2a2a in Italy, Anatolia, the Levant and Yemen. Only U5a1a1 is a typical Steppe haplogroup.
So, if we presume that Afanasievo spoke Tocharian, which is the only Eastern branch apart from Indo-Iranian, we must recognise that they are very close to Yamnaya, unlike other IE groups which seem to carry Anatolia Neolithic. However, the time difference between the Tocharian texts and the Afanasievo Culture is very big anyway: 600 AD versus 2900-2500 BC. I am more inclined to link Tocharian with R1a1 and Altaian Andronovo (1700-1500 BC) and Xiaohe (2000-1500 BC), which both have R1a1 and are contemporary.
By the way, Wikipedia tells about Abashevo that the Abashevo ethno-linguistic identity is a subject of speculation, although it likely reflected a merger of the earlier Iranian steppe Poltavka culture, an extension of Fatyanovo-Balanovo traditions, and contacts with speakers of Uralic; Abashevo was likely the area in which some loan-words entered Uralic. The skulls of the Abashevo differ from those of the Timber Grave culture, early Catacomb culture, or the Potapovka culture. Abashevo probably witnessed a bilingual population undergo a process of assimilation.
@anon
Can you scale it up to 10Mix?
Matt,
Chimp BedouinB2 Mbuti MA1 0.333
Chimp Caucasus_HG2 Mbuti MA1 0.3583
Chimp Han Mbuti MA1 0.3429
Chimp Iberia_Chalcolithic Mbuti MA1 0.3571
Chimp Iberia_Mesolithic Mbuti MA1 0.3774
Chimp Karitiana Mbuti MA1 0.3938
Chimp LBK_EN Mbuti MA1 0.3501
Chimp Motala_HG Mbuti MA1 0.3995
Chimp Samara_HG Mbuti MA1 0.4205
Chimp Yoruba Mbuti MA1 0.0929
Chimp Onge2 Mbuti MA1 0.341
Chimp Onge2 Mbuti BedouinB 0.3155
Chimp MA1 Mbuti BedouinB2 0.3476
Chimp MA1 Mbuti Caucasus_HG2 0.371
Chimp MA1 Mbuti Han 0.3519
Chimp MA1 Mbuti Iberia_Chalcolithic 0.3643
Chimp MA1 Mbuti Iberia_Mesolithic 0.3877
Chimp MA1 Mbuti Karitiana 0.4042
Chimp MA1 Mbuti LBK_EN 0.3615
Chimp MA1 Mbuti Motala_HG 0.4094
Chimp MA1 Mbuti Samara_HG 0.4298
Chimp MA1 Mbuti Yoruba 0.0933
Chimp MA1 Mbuti Onge2 0.3489
Chimp BedouinB Mbuti Onge2 0.309
https://drive.google.com/folderview?id=0B962TtPkX1YnWDhUNkZZVFNSZ3c&usp=sharing
D-stats4 EHG in columns
D-stats5 EHG in rows
Kharia are included
@hujibregts
Haha! Even after reading Davidski's posts, I'm still not very clear :( on why any of these n-mix runs are actually useful. However...
The problem with increasing n is that as it stands one loops (100 choose n) times to fill the proportions array - that's not pretty time-wise as n increases. Then one has to multiply a (100 choose n) x n - matrix with an n x 8 one and do the searches.
There may well be a more efficient way of filling the array, but I haven't spotted it yet. One probably wants to calculate it once and save it and then load from file for n >=6, say.
Perhaps a better strategy would be to choose the proportions randomly rather than deterministically. That ought to work pretty well, but I haven't done experiments on it.
Anyway, very roughly, on my (mid-range+) laptop: 5-mix_multi works OK. The array takes a minute to populate, and running get_mix() takes a few seconds. On the other hand, 6-mix takes ~5 minutes to populate and longer to run get_mix(). (Again, a less exact search would speed things up and presumably not make much difference to the results.) Any larger n looks like something for a much more powerful machine (or an overnight run), even for one combination of populations.
Same (naive) approach to 10-mix: the array took longer than my patience - but one could pursue it if it was really useful.
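To put rough numbers on how quickly the deterministic array grows, here is a minimal Python sketch. It assumes the proportions move in 1% steps, which is an assumption (the step size of 4mix_multi isn't stated in this thread), and `grid_size` and `proportion_grid` are hypothetical names, not functions from either script. If zero proportions are allowed, the number of ways to split 100% among n sources is C(100+n-1, n-1) rather than exactly (100 choose n), but the point about the blow-up is the same.

```python
from math import comb


def grid_size(n_sources, steps=100):
    """Number of ways to split `steps` units among n_sources (zeros allowed)."""
    return comb(steps + n_sources - 1, n_sources - 1)


def proportion_grid(n_sources, steps):
    """Yield every tuple of non-negative integers of length n_sources summing to steps."""
    if n_sources == 1:
        yield (steps,)
        return
    for first in range(steps + 1):
        for rest in proportion_grid(n_sources - 1, steps - first):
            yield (first,) + rest


if __name__ == "__main__":
    # Size of the array that would have to be filled for each n (1% steps assumed).
    for n in (4, 5, 6, 10):
        print(n, grid_size(n))
```

Going from n to n+1 multiplies the row count by roughly (100+n)/n, which fits the jump from "a minute" to "~5 minutes" and worse described above.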
@Chad
Thank you for the MA1 stats, and all the others with Onge. Finally we can make good models for S-C Asia.
MA1 seems to work fine. It doesn't "eat up" Srubnaya like EHG does, while not adding a lot of East Asian like Okunevo et al. And the fits are really good for the few I've tried:
Pathan
"Caucasus_HG" 25.75
"MA1" 18.75
"Anatolia_Neolithic" 17.8
"Srubnaya" 13.85
"Dai" 12.55
"Dravidian_India" 5.35
"Onge" 3.65
"Esan_Nigeria" 2.3
distance=0.001381
GujaratiA
"Caucasus_HG" 21.55
"Srubnaya" 16.65
"MA1" 16.6
"Anatolia_Neolithic" 14.7
"Dai" 13.7
"Dravidian_India" 9.3
"Onge" 4.9
"Esan_Nigeria" 2.6
distance=0.000773
Kalash
"Caucasus_HG" 30.35
"MA1" 19.1
"Anatolia_Neolithic" 15.65
"Srubnaya" 11.15
"Dai" 10.4
"Dravidian_India" 10.05
"Onge" 1.8
"Esan_Nigeria" 1.5
distance=0.000981
It does take quite some Dravidian. But that's somewhat expected with all the shared ancestry between MA1 and South Asian.
@tchaz
Yes, initializing these gigantic arrays deterministically takes an awful lot of time.
Filling the array randomly seems a better strategy, and this is what I have done in my tool nMonte.
But maybe anon can think of clever optimization tricks.
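For anyone wondering what "choosing the proportions randomly" might look like, here is a rough Python sketch of a random search. To be clear, this is not the nMonte code and it makes no claim about how nMonte works internally. It is a generic hill-climb under assumed conventions: each population is a vector of D-stats, the model is the weighted average of the source vectors, and the fit is the Euclidean distance to the target. The function names and the toy numbers are made up for illustration.

```python
import random


def distance(target, sources, counts, slots):
    """Euclidean distance between target and the count-weighted average of sources."""
    total = 0.0
    for k, t in enumerate(target):
        mix = sum(c * s[k] for c, s in zip(counts, sources)) / slots
        total += (mix - t) ** 2
    return total ** 0.5


def random_fit(target, sources, slots=200, iterations=20000, seed=0):
    """Randomly move one slot (0.5% with slots=200) between sources; keep moves that do not hurt."""
    rng = random.Random(seed)
    n = len(sources)
    counts = [slots // n] * n
    counts[0] += slots - sum(counts)  # give any leftover slots to the first source
    best = distance(target, sources, counts, slots)
    for _ in range(iterations):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j or counts[i] == 0:
            continue
        counts[i] -= 1
        counts[j] += 1
        d = distance(target, sources, counts, slots)
        if d <= best:
            best = d
        else:  # undo a move that made the fit worse
            counts[i] += 1
            counts[j] -= 1
    return [100.0 * c / slots for c in counts], best


if __name__ == "__main__":
    # Toy data: three made-up source vectors and a target that is a 50/30/20 mix of them.
    toy_sources = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    toy_target = (0.3, 0.2)
    print(random_fit(toy_target, toy_sources))
```

The cost per iteration doesn't depend on a precomputed array at all, so adding more source populations only makes each distance evaluation a little longer instead of blowing up the memory.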
sorry hujibregts - I am that anon :(
Which means that you actually exclude what you claim to include - by your phrase "newer linguistic studies".
Doesn't that imply that you're actually transgressing common objectivity and basic scientific criteria?
batman,
Enough with the off-topic rants.
You're not going to convince anyone here that Wiik's or anyone else's fringe theories of Uralic continuity in Northern Europe since the Upper Paleolithic make any sense.
I certainly don't have much patience for fringe theories not supported by any new data. So save them.
huijbregts,
What is your e-mail? I'd like to ask some questions about nMonte. Thanks!
@Davidski,
With your D_stats, it's impossible for Georgians to be 60%+ CHG.
These are the results Georgians non-CHG side would have to get.
If Georgians were 70% CHG.
D(Chimp, Non-CHG)(Mbuti, CHG)=0.31.
D(Chimp, Non-CHG)(Mbuti, Hungary_EN)=0.43
D(Chimp, Non-CHG)(Mbuti, Karitiana)=0.35
If Georgians were 60% CHG.
D(Chimp, Non-CHG)(Mbuti, CHG)=0.34
Unless it's possible for a population to be as close to EEFs as EEFs are to each other but as distant from CHG as they are from East Asians, it's impossible for Georgians to be 60% CHG.
The most likely scenario is that Georgians are 30-50% CHG and the rest of their ancestry is mostly from close relatives of EEF.
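The arithmetic behind this kind of argument can be sketched in a few lines of Python, assuming the stats mix linearly, i.e. D(mixed) ≈ a × D(known) + (1 − a) × D(rest). That holds exactly for f4-type stats and only approximately for D-stats, so treat it as a rough guide. `residual_stat` is a hypothetical helper, and the two input values are illustrative: they were picked so that the output reproduces the 0.31 and 0.34 quoted above, not read off the datasheet.

```python
def residual_stat(d_target, d_known, known_fraction):
    """Stat implied for the unknown remainder of a two-way mix, assuming linear mixing."""
    if not 0.0 <= known_fraction < 1.0:
        raise ValueError("known_fraction must be in [0, 1)")
    return (d_target - known_fraction * d_known) / (1.0 - known_fraction)


# Illustrative inputs only (assumed, not taken from the datasheet):
D_GEORGIAN_CHG = 0.394  # D(Chimp, Georgian)(Mbuti, CHG)
D_CHG_CHG = 0.43        # D(Chimp, CHG)(Mbuti, CHG)

if __name__ == "__main__":
    # Implied D(Chimp, Non-CHG)(Mbuti, CHG) for a range of assumed CHG fractions.
    for frac in (0.7, 0.6, 0.5, 0.3):
        print(frac, round(residual_stat(D_GEORGIAN_CHG, D_CHG_CHG, frac), 3))
```

The higher the assumed CHG fraction, the lower the stat the non-CHG remainder has to show with CHG, which is why the high fractions end up implying an implausibly CHG-distant ghost population.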
I'll go with ~50% Kotias-related ancestry in Georgians.