
Tuesday, March 20, 2018

The Iberomaurusians

I can honestly say that I've suddenly become a more open-minded individual after running the five Iberomaurusian samples from M. van de Loosdrecht et al. 2018 in my Global25 Principal Component Analysis (PCA).

They're certainly a curious bunch. In many pairs of the 25 PCs, they sit alone, in parts of the plots that I never expected to see populated. Interestingly though, modern-day North Africans often "pull" towards them, suggesting moderate to strong genetic continuity in North Africa since the Pleistocene. The PAST datasheet used to produce the plots below is available here.

To analyze this in more detail, I ran a series of nMonte mixture models for seven North African populations using Global25 scaled data. The models show the Iberomaurusians as one of the two best reference options for all of these North African groups except the Egyptians, which, at the very least, is an outcome that fits nicely with geography.

[1] distance%=2.5772 / distance=0.025772


Levant_BA 30.9
Iberomaurusian 24.1
Iberia_EN 17.9
Iberia_BA 14.45
Yoruba 11.85
Ethiopia_4500BP 0.8
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0


[1] distance%=2.7927 / distance=0.027927


Levant_BA 73
Iberia_BA 7.7
Ethiopia_4500BP 7.55
Yoruba 5.3
Iberomaurusian 4.45
Iberia_EN 2
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0


[1] distance%=1.6931 / distance=0.016931


Levant_BA 56.8
Iberomaurusian 11.75
Iberia_BA 10.05
Yoruba 8.55
Natufian 6.55
Ethiopia_4500BP 3.4
Levant_N 2.9
Iberia_ChL 0
Iberia_EN 0
Iberia_MN 0
Iberia_Southwest_CA 0


[1] distance%=1.7158 / distance=0.017158


Levant_BA 35.3
Iberomaurusian 25.85
Yoruba 14.6
Iberia_EN 13.35
Iberia_BA 10.9
Ethiopia_4500BP 0
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0

[1] distance%=2.4367 / distance=0.024367


Iberomaurusian 29.6
Levant_BA 25.9
Iberia_EN 21.7
Iberia_BA 11.55
Yoruba 11.25
Ethiopia_4500BP 0
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0


[1] distance%=2.3656 / distance=0.023656


Iberomaurusian 36.5
Levant_BA 17.15
Levant_N 13.7
Iberia_EN 12.85
Iberia_BA 9.95
Yoruba 9.55
Ethiopia_4500BP 0.3
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Natufian 0


[1] distance%=2.0838 / distance=0.020838


Levant_BA 41.85
Iberomaurusian 20.85
Iberia_BA 13.9
Iberia_EN 11.45
Yoruba 9.4
Ethiopia_4500BP 2.55
Iberia_ChL 0
Iberia_MN 0
Iberia_Southwest_CA 0
Levant_N 0
Natufian 0
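For readers who haven't used this type of tool, here is a rough sketch of what a mixture fit like the ones above is doing: it looks for non-negative source weights that sum to 100% and minimize the Euclidean distance between the target's coordinates and the weighted average of the source coordinates. This is not nMonte itself (which is an R script); it's a simple stochastic hill-climb written for illustration, and the function names, the `slots` and `iterations` parameters, and the toy coordinates are all assumptions. The last lines show how the distance%= figure relates to the raw distance (it's just the distance times 100).

```python
import math
import random


def mix(sources, weights):
    """Weighted average of the source coordinate vectors."""
    dims = len(next(iter(sources.values())))
    return [sum(weights[n] * sources[n][d] for n in sources) for d in range(dims)]


def distance(target, sources, weights):
    """Euclidean distance between the target and the weighted source mix."""
    return math.dist(target, mix(sources, weights))


def fit_mixture(target, sources, slots=200, iterations=20000, seed=0):
    """Hill-climb over weights in steps of 1/slots (hypothetical helper, not nMonte)."""
    rng = random.Random(seed)
    names = list(sources)
    counts = {n: 0 for n in names}
    counts[names[0]] = slots  # start with everything assigned to the first source
    best = distance(target, sources, {n: counts[n] / slots for n in names})
    for _ in range(iterations):
        giver, taker = rng.sample(names, 2)
        if counts[giver] == 0:
            continue
        # Try moving one share from giver to taker.
        counts[giver] -= 1
        counts[taker] += 1
        trial = distance(target, sources, {n: counts[n] / slots for n in names})
        if trial < best:
            best = trial  # keep the improvement
        else:
            counts[giver] += 1  # undo the move
            counts[taker] -= 1
    return {n: counts[n] / slots for n in names}, best


# Toy 3-D example with made-up coordinates (not real Global25 data).
toy_sources = {
    "SourceA": [1.0, 0.0, 0.0],
    "SourceB": [0.0, 1.0, 0.0],
    "SourceC": [0.0, 0.0, 1.0],
}
toy_target = [0.5, 0.3, 0.2]
weights, dist = fit_mixture(toy_target, toy_sources)
print(f"distance%={100 * dist:.4f} / distance={dist:.6f}")
for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(name, round(100 * w, 2))
```

With real data, each vector would be a row of 25 scaled coordinates from the datasheets, and a high distance would mean that no combination of the offered sources lands near the target.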

Using the same methods, I also basically reproduced the ancestry proportions from the main mixture model for the Iberomaurusians in M. van de Loosdrecht et al. (~60/40% Natufian-like/Sub-Saharan African-related). But clearly, the very poor statistical fits suggest that, much like for the model in the paper, something is way off.

[1] distance%=25.4991 / distance=0.254991


Natufian 55.85
Tanzania_Luxmanda_3000BP 21.5
Ethiopia_4500BP 21
Tianyuan 1.65
ElMiron 0
GoyetQ116-1 0
Levant_N 0
Malawi_Hora_Holocene 0
South_Africa_2000BP 0
Ust_Ishim 0
Vestonice16 0


[1] distance%=24.6253 / distance=0.246253


Natufian 65.45
Dinka 22.9
Yoruba 9.45
Tianyuan 2.2
ElMiron 0
Ethiopia_4500BP 0
GoyetQ116-1 0
Levant_N 0
Malawi_Hora_Holocene 0
South_Africa_2000BP 0
Tanzania_Luxmanda_3000BP 0
Ust_Ishim 0
Vestonice16 0

The updated Global25 datasheets are available at the links below. Here's a challenge for the people in the comments: try to come up with a coherent, chronologically sound, mixture model for the Iberomaurusians that shows a distance of less than 15%. I don't think that this is doable just yet, and won't be until we have at least a few more ancient forager samples from Africa and the Near East, but let's see what happens anyway.

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient


Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern
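In case it's useful, here's a minimal sketch of how population averages can be derived from a per-sample datasheet. It assumes a comma-separated layout in which each row is a label followed by the coordinates, and that individual samples are labelled Population:SampleID (as in labels like Barcin_N:I0745). The function name and the toy rows are hypothetical; check the actual files before relying on this layout.

```python
import csv
import io
from collections import defaultdict


def population_averages(csv_text):
    """Average the coordinates of all samples sharing a population label."""
    sums = {}
    counts = defaultdict(int)
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        try:
            coords = [float(x) for x in row[1:]]
        except ValueError:
            continue  # skip a header row, if the file has one
        pop = row[0].split(":", 1)[0]  # text before the first colon
        if pop not in sums:
            sums[pop] = [0.0] * len(coords)
        sums[pop] = [s + c for s, c in zip(sums[pop], coords)]
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in sums[pop]] for pop in sums}


# Made-up two-dimensional rows, for illustration only.
toy_sheet = "PopA:s1,0.1,0.2\nPopA:s2,0.3,0.4\nPopB:s3,1.0,2.0\n"
print(population_averages(toy_sheet))
```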


M. van de Loosdrecht et al., Pleistocene North African genomes link Near Eastern and sub-Saharan African human populations, Science 10.1126/science.aar8380 (2018)

See also...

Unleash the power: Global 25 test drive thread


Chad said...

Something is off. SSA > Iberomaurusian is being difficult.

Unknown said...

Yes! Great analysis.

Steven said...

So ancient North Africans were more closely related to Ethiopians and Somalis?

Shaikorth said...

Scratch 15%, it isn't possible to get below 20 even if overfitting and using modern North Africans.

With more plausible pops it's always something like this:
distance%=24.533 / distance=0.24533


Natufian 66.8
Mandenka 21.9
Ethiopian_Anuak 11.3

A curiosity perhaps, but even though they suppose Natufians can't have Iberomaurusian admixture they do help with fitting Natufians using nMonte. Nothing that can be done to model Natufians using the scaled sheet without Iberomaurusians is as good as this simple fit:
[1] "distance%=9.0292 / distance=0.090292"


Levant_N 81.1
Iberomaurusian 18.9
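A two-source fit like this one doesn't really need a search, because the best weight has a closed form: project the target onto the line between the two sources and clip the result to the 0-100% range. The sketch below shows that calculation under the assumption that the fit minimizes plain Euclidean distance; the function name and the toy coordinates are made up for illustration and are not real Global25 values.

```python
import math


def two_way_fit(target, source_a, source_b):
    """Return (weight of source_a, distance) for the best two-way mix."""
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        w = 1.0  # the two sources are identical, so any weight fits equally
    else:
        # Project (target - source_b) onto (source_a - source_b).
        w = sum((t - b) * d for t, b, d in zip(target, source_b, diff)) / denom
        w = min(1.0, max(0.0, w))  # weights can't be negative or exceed 100%
    mixed = [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]
    return w, math.dist(target, mixed)


# Toy example: the target sits 80% of the way from source_b to source_a.
w, dist = two_way_fit([0.8, 0.2], [1.0, 0.0], [0.0, 1.0])
print(f"source_a {100 * w:.1f}  source_b {100 * (1 - w):.1f}  distance%={100 * dist:.4f}")
```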

Open Genomes said...

I think the problem here is that Levant_N (obviously) has Natufian ancestry. What we need is a model of an ancient native Anatolian Neolithic-like population (minus any Iberomaurusian admixture) so we can see the true contribution of the Iberomaurusians to Natufian ancestry.

This "excess" 18.9% of Iberomaurusian ancestry in the Natufians compared to the Levant_N is what was lost through drift and the subsequent admixture with an Anatolian Neolithic-like population to produce the Levant_N.

Basically, the order of admixture should be:

Iberomaurusians + "Anatolian Neolithic-like" > Natufians
Natufians + "Anatolian Neolithic-like" > Levant_N.

Is there any way to reconstruct what this non-Iberomaurusian component in the Natufians would look like?

Chad said...

Iberomaurusian > Natufian doesn't go with the fact Natufians have no relationship with Africans. Natufians and Iberomaurusians share an ancestor.

Shaikorth said...

@Anthro Survey
No Levantines in this fit and it's even better, but I suppose these Yemenites have lots of Natufian ancestry too.
[1] "distance%=8.9298 / distance=0.089298"


Yemenite_Mahra 74.00
Iberomaurusian 17.75
Iberia_EN 8.25

Anatolian Neolithic + Iberomaurusian isn't as good, though barely below 15.

[1] "distance%=14.9118 / distance=0.149118"


Barcin_N 63.6
Iberomaurusian 36.4

zardos said...

Iberomaurusian = Proto-Natufian + Negroid-related African population. That's what the data shows.
Two problems:
1. We still don't know the Basal Eurasian source population for West Eurasians and its locality. In any case, it's not Iberomaurusian because of the SSA shift. Even if IM made it to Eurasia, they can't be the main source of BE nor Natufian.
2. The Negroid-related African population is unknown and seems to be not particularly closely related to any modern Subsaharan group. So it most likely went largely extinct. It might be an interesting relict population at the root of many modern SSA though.

Skordo said...

Could the Natufian component explain the E-V13 found in Spain in ancient times?

astenb said...

Natufian is Z830.
Taforalt is V68.

IF we are attaching the shared component to Haplogroup E, the ancestor is deep in the phylogeny where these two lineages split, which brings you back to E-M35 25kya. The idea that Natufian can't be partly *African* due to Taforalt having SSA that is absent in Natufians is problematic for a number of reasons:

(1) Because the follow-up to that commentary says that the common ancestry could have come from an older population "present in *North Africa*" or the Middle East, they just debunked themselves.

(2) This same argument was made for Ancient North Africans prior to IAM.....Post IAM....and is exactly what got the "Ancient MENA's have no SSA Affinity" proponents into the predicament in the first place. We dont even know what type of SSA Affinity we are looking at. Furthermore it may not even be correct that Torforalt has mixed ancestry going by our current model.

@Davidski...when you say you have become more "open minded", about what exactly?

Ryukendo K said...

I recommend everyone just stop responding to xyman and let David delete all his posts.

@ David
Are the Malawian HGs in the Global 25?

EastPole said...

Interview with David Reich:

Nothing special, but the interview was made during testing some samples from Central Asia. He mentions: “This is from a 4,000-year-old site in Central Asia — from Uzbekistan”
And Reich is standing at a whiteboard with some writings on which may be related to it. His head is probably covering Yamnaya.
Some green arrows suggest interaction between South Andronovo and Northern South Asia:

Lank said...

Thanks for running this, David. Would you mind adding the Tanzania Pemba sample from 1400 ybp to the datasheets? This sample is more HG-like, it would be interesting to add as an ancient East African HG representative considering the "Hadza" proxy used in the paper.

bellbeakerblogger said...

@Open Genomes,
If Natufians have near zero Neanderthal ancestry, then I don't believe any model would work with Eurasian as a donor.
I think the problem here is modeling the SSA ancestry. Reasonably it fits that eastern mtdna M1 is on par with the percent of SSA ancestry. Makes sense that it would look in the direction of Luxmanda or Mota, but one or both of these may have a component too similar to Basal Eurasian to be useful, and maybe too much East African-specific.
It might be overfitting Natufian (BE) ancestries when you look at the second model when Yoruba and Dinka are options. It goes down.

Just for laughs, I'd exclude all other Africans other than groups like Hadza or Bushmen. Obviously it didn't like ancient South African, but it could be searching for a population in the Savanna plateau on the western side of Rift Valley, eastern great lakes into S. Sudan and north to the Acacus. If Hadzabe or Sandawe had a cultural relation to Roundheads, then Iberomaurusian might have been something like Natufian + Roundhead or whatever.

No idea what pops are available, been too busy lately to get into all this.

bellbeakerblogger said...


Ha, I just posted that. Well, apparently not Hadza. Wasn't able to read the paper. It was a good guess though.

Eren said...

@David: nice that you finally made a blog entry on the Taforalt samples.

Regarding the Turkish samples, there still seem to be mix-ups. This is most noticeable when looking at samples from Trabzon:


These score East-Eurasian, when they shouldn't. Meaning they are falsely labelled. Could you remove all HO Turkish samples from your dataset and re-insert them directly from the HO dataset? Whenever you have time that is.

André de Vasconcelos said...


"Its funny how we actually have evidence of a race war based on archeology 13000 years ago in africa"

Any source for that?

zulla said...

interesting stuff.

Ryukendo K said...

That site (Jebel Sahaba) is very famous among anthropologists for containing one of the earliest archaeological attestations of organised inter-group violence, but 1) it's not new and 2) the whole "race war" angle is absurdly sensationalist and built on the flimsiest evidential bases.

Anonymous said...


Are those North Africans valid Berbers? Both Morocco and Algeria have a lot of Arab citizens with different ancestry.

K33 said...


Did you include "KHOISAN" as a possible source population for North Africans? Please try that if you haven't... I bet the fits improve.

DNA paper is coming soon with some divergent Medieval Moroccans best modeled as half Sardinian-like and half-Khoisan!

Please see my post here, and let me know what you think:

zulla said...


Thanks for the input.

The point that @bronze made is that no such proof of violence is available to justify aryan "invasion" into south asia, even though the purported date is much later.
If you are aware of any relevant archaeological proof, do share.

K33 said...

I would also note:

Certain Sahrawi and Haratin people have CLEAR Khoisan features..



Rob said...

@ EastPole
With an arrow going north to Yamnaya from in between Anatolia and SC Asia?

capra internetensis said...

trying with East Africans - first Luxmanda. she will consistently take Iberomaurusian if it is included, but the fit is always worse than just using Natufian or LevantN, e.g.

= 58% Mota, 42% Iberomaurusian, 0% Dinka - distance 13%
= 47% Mota, 35% LevantN, 18% Dinka - distance 5.8%
= 46% Mota, 37% Natufian, 17% Dinka - distance 4.6%
= 55% Mota, 21% LevantN, 19% Iberomaurusian, 5% Dinka - distance 6.9%
= 53% Mota, 28% Natufian, 12% Iberomaurusian, 8% Dinka - distance 6%

so i don't know whether it's just plain no good or whether it is partly good and partly bad.

Arza said...

@ Rob
Rather not. Yamnaya is the number one here, not a D.

Green arrows going from the top - West_Siberian_HG. There are also "ADMIXTURE bars" drawn everywhere, showing that West_Siberian_HG ancestry was acquired between Sintashta/Andronovo_NW 1900 BCE and Andronovo_SE 1500 BCE. It's probably the Srubnaya_outlier-related ancestry that pops up constantly in India.

Can anyone decode what's written above "Northern South Asia" and probably to the side of 2200(0)BCE - 1500 BCE cline?

Rob said...

@ Arza
Well we won’t see an arrow if he’s standing in front of it :)
But maybe A-C are different episodes to 1-4

And it’s “Turan”

EastPole said...

„With an arrows going north to Yamnaya from in between Anatolia and SC Asia ?”

I think Reich’s head is covering No.1 (Yamnaya or Sredny Stog steppe population);
No. 2 is CWC (2900 BCE made of No.1 Yamnaya +WHG +Anatolia in red);
No. 3 is Sintashta – Andronovo_NW (1900 BCE and is derived from No. 2 CWC);
No. 4 is Andronovo_SE (1500 BCE derived from No. 3 Sintashta and admixed with No. 1 Yamnaya shown by black arrow and some West-Siberian Hg in green arrows)
1500 BCE is the time of interaction of Andronovo_NE with North South Asia shown by green arrows.

Samuel Andrews said...

Using qpAdm or D_stats with only Eurasian outgroups should remove the problem of finding a good SSA reference for Iberomaurusian.

Shaikorth said...

@Samuel Andrews
They did a test like that, fitting Iberomaurusians as Natufian+SSA pop with only Eurasian outgroups; the result was 35-38% SSA (whether it be ancient Khoisan, Mende, Pygmy, Mota or Dinka), rest Natufian.

Rob said...

@ east pole
Yes it’s interesting that andronovo only shows bilateral “interaction” with SCA instead of migration (?), which makes it look like Krause’s model
Anyhow, as fun as it is, mere speculation at this stage

Davidski said...


I'm seeing a lot of arrows pointing east and south, and only a few pointing north.

So this isn't Krause's model. It's a model that explains steppe admixture and R1a-Z93 in South Asia.

Shaikorth said...

From the supplements, even the best-fitting tree for Iberomaurusians has issues. Problematic F-statistics listed for that model involve stuff like:
Probable extra affinity between Yoruba and Iberomaurusians (less likely alternative is East Asian or EHG in Mbuti)
Probable extra affinity between Mbuti/Yoruba and Natufians (less likely alternative is East Asian in Iberomaurusians)
Extra affinity between WHG and Natufians
Nothing suggesting Euro-HG extra in Iberomaurusians

Arza said...

@ Rob
Reich's arrow - it points both ways, unless observed. ;)

@ EastPole
I think that green ones are rather representing local substrates and the whole graph shows situation just before the interactions started.

Davidski said...

@ryukendo kendow

Not sure which Malawi HGs you mean? Do a text search in the datasheets, and if the samples you're interested in aren't in there, send me a link to their genotype data, and I'll try and run them.


I didn't change the labels for the Turkish samples. They're the same as in the Human Origins. I'll double check later today, but I don't think I made a mistake with the labels.


I don't know if these North Africans are "valid Berbers" or not. I'm just running the North African samples that are available and using the labels that came with them.

Rob said...

Ah yes it could be the situation on the Eve, so to speak, c. 1500 BC

Samuel Andrews said...

"interesting stuff."

Media hysteria. Iberomaurusian should not be called white or Caucasian or anything like that. Even if there was race-based conflict between them and some black SSAs it wouldn't match up the white vs black war this article is going for.

K33 said...

Adding South_Africa_2000BP and/or Ju_Hoan_San to the nMonte runs for the modern North Africans (but especially Saharawi and Mozabite) will improve the fits.

Bet on it...

Simon_W said...

Interesting, for the inferred coords of my North Italian grandfather nMonte wants some Iberomaurusian:

"French" 30.55
"Beaker_Northern_Italy" 25.15
"Croatian_MBA" 11.6
"England_Roman_outlier" 10.95
"Baden_LCA" 10.9
"Croatian_vLBA:I3313" 5.7
"Peloponnese_N_outlier:I3920" 2.55
"Iberomaurusian" 2.2
"Yoruba" 0.4

It's getting more and more difficult to explain this away as spurious.

Davidski said...

@Lenny Dykstra

Adding South_Africa_2000BP and/or Ju_Hoan_San to the nMonte runs for the modern North Africans (but especially Saharawi and Mozabite) will improve the fits.

Go for it. Let us know what you find.

Matt said...

Rob: But maybe A-C are different episodes to 1-4

And it’s “Turan”

(Briefly surfacing to comment on this; won't be around for any more discussion until much later in the week towards weekend, sorry).

My.... a picture tells a thousand words? (Or rather a thousand words we're waiting in suspense for!) So, from this model that we can assume Reich lab is playing with:

a) I may be blind here, and the writing is fairly indecipherable, but surely what is written is "Swat" not Turan? Indicates Swat Valley = "Northern South Asia", (we know they are sampling Swat after all, and this seems to indicate that must surely have good samples?).

b) there are differences in the Andronovo horizon from NW to South (SE?)

c) input of West Siberian HG into the past leading between Andronovo_NW -> Andronovo_SE(?)

d) mutual influence between the path leading from Andronovo_NW -> Andronovo_SE and the path leading from what seems likely to be Iran_N/CHG -> Swat; not one way influence.

(Note that influences between Andronovo_SE and Andronovo_NW are a one way influence - Andronovo_SE does not influence Andronovo_NW, and differs because of the absorption of what looks like Iran and West Siberian related streams. The bar suggests differences from NW amount to about 10-20% of ancestry. )

e) no indication of a path from ASI -> Swat, though perhaps this is just a function of model form

Assuming I've read this right, this means the following are at least plausible to Reich lab with the data they've got that we don't have:

Looks like vindication for Seinundzeit's models using the Srubnaya_Outlier, surely more likely now to simply be a sample of "West Siberian HG" that we have previously described as Ancient North Eurasian (because the earliest antecedents of this population *were* pan-North Eurasian in their distribution... but later ones, perhaps not?).

Looks also like vindication for those (I believe Sein and Ryu?) who were pretty sure that Andronovo contribution into Sarmatian/Scythian was masked by absorption of Siberian / Central Asian HG. Against the idea that I played with after the Sarmatian / Scythian paper (though I haven't really given it much thought) of Sarmatian/Scythian being more about re-emergence of Yamnaya like ancestry?

Personally, I was pretty sure early on there would be some difference in the Andronovo horizon from the samples we know, but became a bit self doubting, as I still don't fully understand how the relationship between Andronovo_NW and Andronovo_S can be unidirectional...

Davidski said...

Srubnaya_outlier has been pointed out as potentially a key sample for South Asia at this blog for at least a couple of years. It'll be awesome if this is vindicated by the ancient data from West Siberia/Central Asia.

Arza said...

IBM (G25) vs. IAM PCA

Rob said...

Not to toot my own horn....but I will. Pretty sure I first pointed out Srubnaya Outlier whilst the focus was on Ulan IV. But before we all start patting each other's backs, I'm getting different vibes from that pic than some others here appear to be.

Grey said...

Lenny Dykstra said...
"Certain Sahrawi and Haratin people have CLEAR Khoisan features.."

i know phenotype stuff bugs some people but the hint of eye folds among some of those Saharan populations always seemed like it might be a big clue.

Although if there was originally another branch of San north of the equator who disappeared they might not match the southern version exactly.


Shaikorth said...
From the supplements, even the best-fitting tree for Iberomaurusians has issues... (snip)
...(less likely alternative is East Asian in Iberomaurusians)"

Going back to the eyefolds thing I don't suppose that could work the other way instead i.e. Iberomaurusian in East Asian?

Davidski said...


The vibe I'm getting from that NYT pic is very similar to the vibe that I described here.

These models look fine in terms of the statistical fits. In fact, much more than just fine in most cases. My prediction is that a population like Potapovka2-Srubnaya_outlier will eventually be discovered on the Late Bronze Age steppe, perhaps even at a site linked to the Andronovo horizon, and it'll fit the bill as a main player in the story of the peopling of South Asia.

Through time AND space?

Let's not kid ourselves that things will turn upside down when this paper with Central Asian samples finally comes out.

Arza said...
Ancient genomes from North Africa evidence prehistoric migrations to the Maghreb from both the Levant and Europe

Neolithization of North Africa involved the migration of people from both the Levant and Europe

Guanches were added to the analysis. In f3(IAM,population;Ju_hoan_North) they are above Natufians.
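For anyone unfamiliar with the notation, an outgroup f3 statistic of this form measures the genetic drift two populations share after splitting from the outgroup, so a higher value means a closer relationship. The sketch below is a naive version that averages (o - a)(o - b) over SNPs using allele frequencies; it leaves out the sample-size bias correction and the standard errors that real software reports, and the function name and toy frequencies are assumptions for illustration only.

```python
def outgroup_f3(freq_a, freq_b, freq_out):
    """Naive f3(A, B; Outgroup): mean of (o - a) * (o - b) across SNPs."""
    terms = [(o - a) * (o - b) for a, b, o in zip(freq_a, freq_b, freq_out)]
    return sum(terms) / len(terms)


# Made-up allele frequencies at four SNPs.
outgroup = [0.5, 0.5, 0.5, 0.5]
target = [0.9, 0.1, 0.8, 0.2]
close_pop = [0.8, 0.2, 0.9, 0.1]   # drifted in the same direction as target
distant_pop = [0.5, 0.4, 0.6, 0.5]  # shares little drift with target

print("close:", outgroup_f3(target, close_pop, outgroup))
print("distant:", outgroup_f3(target, distant_pop, outgroup))
```

Populations are then ranked by this value, which is what "above Natufians" refers to.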

Tobus said...

@Matt: surely what is written is "Swat" not Turan

It's messy writing, but compare the "n" in "Northern" just below it (and on the rest of the board). Compare the "t"s in "Northern", "South" etc. The scrawl in question definitely ends in "n", not "t".

There are 3 other capital "S"s on the board, none of them anything like the "T"-like letter at the start of the scrawl... I'm calling this for Turan, with 90+% confidence.

bellbeakerblogger said...

Katzman offers a clear picture of Iberomaurusian which would be useful to many readers.

If it did indeed have origins or connections to the Late Ahmarian/Early Kebaran then it might be the case that the much later Natufian has more actual affinity to WHG than supposed, and that is one of several problems.

Regardless, both Iberomaurusian and Natufian poke around a low level of Yoruba, which may be a real thing drawing something up from the Nile Valley, not Mushabian, but maybe something teasingly prior to this.

"Extra affinity between WHG and Natufians"
I think that's likely, whatever that is. It solves several of the problems you quoted, plus your first post.

Ryukendo K said...

@ Davidski

David, found the Malawi HGs, Malawi_Fingira and etc.

A question: how much East Asian ancestry gets picked up in Norwegians, Swedes, and Lithuanians in qpAdm?

Rob said...

@ Dave

Nope, couldn't beat your fit for IBM.

BTW, for Iron Gates:

Iron Gates (normal)
Villabruna 83%
Afantova 11.5%
Boncuklu 5.5%
(f 0.047%)

I_G outlier
VB 39%
Boncuklu 56%
Natufian 4%
Afantova 1%
(f 0.03%)

Open Genomes said...

This model makes certain assumptions:

1. That the Natufian descends from an Iberomaurusian-related Proto-Afro-Asiatic population (even though the Natufian was E-PF1962* and the Iberomaurusians were E-M78*, on "opposite sides" of E-M35)
2. That based on the widespread presence of E-Z1515, a sister clade of Natufian E-PF1962*, which is common in East Africa among Chadic speakers and pastoralists down to Namibia and which is almost non-existent in the Near East, the East African Afro-Asiatic "Natufian-like" component is not Eurasian in origin, but rather native to Africa and never left Africa.
3. That the Natufians are a mix of an African-derived people (the Ramonian/Mushabian archaeological culture?) and native Levantine Near Eastern LGM hunter-gatherers (the Geometric Kebarans). Since we don't know exactly what the Geometric Kebarans looked like autosomally, this model uses one Barcin Neolithic individual (with as little "Natufian" as possible) and an Early Neolithic Iranian pastoralist.

This model shows that the Natufian was 62% "Proto-Afro-Asiatic" from Africa ("Basal Eurasian"?) and 38% native Near Eastern Mesolithic Kebaran hunter-gatherer:

"distance%=8.2995 / distance=0.082995"


Barcin_N:I0745 38
Tanzania_Luxmanda_3000BP:I3726 36
Iberomaurusian 26
Iran_N:AH2 0

Davidski said...

@Lank and ryukendo kendow

I've added the following ancient Africans to the G25 datasheets just now, but they're low quality samples, so keep that in mind.


@ryukendo kendow

qpAdm shows 0-5% East Asian/Siberian admixture in non-Finnish/non-Russian Northern, Central and Eastern Europeans, depending on the model. But I'm not sure whether it's a useful method for estimating minor admixture between modern-day populations.

@Open Genomes

The distance in your last model is a little too high, which suggests that it can be improved.

Also, I wouldn't use Africans younger than the Natufians to model the Natufians, because Natufian-related gene flow into Africa might be making it look like Natufians have African admixture, which is something that has been strongly argued against in recent scientific literature.

Anonymous said...

In all of your PCA dimensions, which direction does it tend to point to? We know by a recent study that:
1. The direction towards SSA reflects Ancients and non-Hominids.
2. The direction towards Eurasians+Amerindians reflects the Neanderthals.
3. The direction towards Papuans reflects the Denisovan.

So, this might as well be a signal of the Basal (pre or post Anatolia/Natufian split) coupled with either:
1. An ancient hominid. We know they existed in Africa and contributed to modern SSA DNA.
2. A population similar to the ANE (not genetically) but in the role played. Call it Ancient Meridional Africans, AMA, or anything else you would like to call it.
3. An extinct African people.
4. A very mixed African population.

I really don't know though.

Chad said...

There is no SSA in Natufians, fellas. You need to stop modeling Natufians as part Iberomaurusian. It's a total failure all around. You're wasting time and completely ignoring relevant stats.

Davidski said...


I don't know either.

But I can tell you that on such global plots Denisovans and Neanderthals cluster with Sub-Saharan Africans in dimensions 1 & 2.

capra internetensis said...

looking at West Africa:

100% Esan (distance 0.9%), 100% Mende (distance 3.4%), 100% Dinka (distance 21%), 100% Mota (distance 27%) all with 0% Iberomaurusian. only using Pygmy or San as the only other option fits it - 73% Biaka, 27% Iberomaurusian (distance 27%).

88% Yoruba, 5.4% Dinka, 4.2% Iberomaurusian, 2.4% Biaka - distance 0.86%
with 3% Mozabite - distance 1.5%; with 2.8% Saharawi - distance 1.5%; with 2% Natufian - distance 1.7%; with 2% BedouinB - distance 1.8%.
Iberomaurusian actually seems to fit best.

59% Yoruba, 33% Mandenka, 6% Biaka, 1% Dinka - distance 1.4%
88% Yoruba, 7% Biaka, 4% Dinka, 1% Iberomaurusian - distance 1.5%
88% Yoruba, 7% Biaka, 5% Dinka - distance 1.6%
looks like a little bit via Mandenka maybe

Anonymous said...

"But I can tell you that on such global plots Denisovans and Neanderthals cluster with Sub-Saharan Africans in dimensions 1 & 2."
Yep, they're Ancients after all. ADMIXTURE runs their ancestry as a SSA component too.
But I was referring more generally to this here:
From this paper:

Now, who are these guys in here?
They may or may not be a clue - if they're the Khoisan for instance or the Mbuti, then the Iberomaurusians might as well have some deeply diverged, if not ancient hominid component.

Open Genomes said...

Natufians with Barcin_N, Iberomaurusian, and other African populations unlikely to have Natufian ancestry. In this case, the fit isn't quite as good, but the Iberomaurusian goes up somewhat and Mota is much less than Tanzania_Luxmanda.
It seems that this shows that Tanzania_Luxmanda cannot be modeled as a three-way mix between Natufian, Mota, and Hadza or Dinka.

[1] "distance%=8.9576 / distance=0.089576"


Barcin_N:I0745 56.2
Iberomaurusian 32.5
Ethiopia_4500BP 11.2
Dinka 0.0
Hadza 0.0

Samuel Andrews said...


It would make sense to me that Natufians have direct Barcin-like ancestry in the same sense LevantN does, but just less of it. I remember that, in the ADMIXTURE analysis in the paper with Neolithic North African genomes, the Natufians scored in the North African and EEF components.

postneo said...

"veddoid australoid etc"

Phenotypically Veddas look completely unlike australoids. South Asia has people that resemble both Veddas and Australian aborigines.

bellbeakerblogger said...

@Chad, "You need to stop modeling Natufians as part Iberomaurusian"

Nope didn't suggest that would work. But Natufian is also not a 100% related to itself group.

Nirjhar007 said...

Nice interview of Reich . That board picture is obviously a teaser....

Folker said...

I read "Turan" too. Central Asia?

Rob said...

I’m inclined to agree with OG
Basal probably expanded from NE Africa after LGM; but I disagree it never left Africa
Rather, it probably did back migrate 35 kya

zulla said...

Most interesting part of the Reich article was that Nick Patterson is ex British intelligence.

Davidski said...

Yeah, Nick actually sounds like a James Bond character when he talks.

Ric Hern said...

Maybe the Kiffians have something to do with the Iberomaurusians ? No Kiffian samples yet...

Jaydeep said...

what is going to be a huge disappointment is that the Harvard paper probably does not have any South Asian aDNA older than Swat which is around 1500 BC I think.

Hopefully there will be older samples from Central Asia and Iran.

Anonymous said...

One thing nobody commented on yet: We should stop calling this culture the Iberomaurusians. They are the Maurusians.

Aram said...

Ok let's test some linguistic theories.

NW African farmers are Pre-Chadic speakers and not Pre-Berber.
Proto Berbers came much later with that E-L19->M81.
Iberian farmers massively expand into Africa. They carry mtdna H1 and Y dna R1b V88, also other Y dnas like G2-L91.
After Iberian farmers mix with NW African farmers the Proto Chadic forms circa 5000 year ago somewhere in Algeria.
This favours the idea that the Chadic languages are close to the Ancient Egyptian language, which is also based on E-M78.

And the most interesting question. What kind of language were the Iberian farmers speaking? :)

Davidski said...


what is going to be a huge disappointment is that the Harvard paper probably does not have any South Asian aDNA older than Swat which is around 1500 BC I think.

You might be right.

But so what, if the ancient samples from Central Asia show no steppe admixture before the Middle Bronze Age or so? What are you going to claim then, that it was already present in India earlier? Did it hop on a plane?

Anyway, seems like you've heard things. I've heard things too, and I can tell you that the next few days are going to put your pet theories under some considerable strain.

Archaelog said...

@East Pole Good catch bro and Rob too.

I got the impression that the line going to South Asia is separate from the Sintashta-Andronovo branch. And the green arrows probably represent the same Siberian_HG population that influenced both the branches

Anyway since we don't know how old this photograph is, it's better not to read too much into it.

Davidski said...

Yep, just wait a few days.

Davidski said...


But as they are seemingly going with the Krause Model, I think some changes in popular conceptions will come.

If Krause modeled ANI as mostly Yamnaya then yeah. Otherwise no.

Man, you're in for a shock very soon.

Davidski said...


I have no idea if this is true but a poster on an OIT forum claimed to have spoken with Dr. Thangaraj recently. He is quoted as saying the Rakhigarhi paper isn't ready and will take several months to be published.

This is probably true, but it makes no difference. There's data from Central Asia coming out that will show a migration from the steppe to India during the Bronze Age.

Eren said...


So, the South-Asian paper is finally coming out in the next couple days? Since David Reich's new book is coming out March 27th, and includes a chapter on South Asia, the paper should come out earlier.

Regarding the Turkish samples, the samples are definitely not mislabeled in the HO dataset. I downloaded the dataset January or February, so maybe you have an older version which includes false labels? I've uploaded all the Turkish samples to gedmatch:

Davidski said...


So, the South-Asian paper is finally coming out in the next couple days? Since David Reich's new book is coming out March 27th, and includes a chapter on South Asia, the paper should come out earlier.

Something like that. Just wait a few days.

Regarding the Turkish samples, the samples are definitely not mislabeled in the HO dataset. I downloaded the dataset January or February, so maybe you have an older version which includes false labels? I've uploaded all the Turkish samples to gedmatch:

Alright, have a look now at the Turkish samples in the updated G25 datasheets.


Considering that it was suggested that the Indo-European peoples' ancestral DNA was already discovered and they already have data for a deep presence of a certain clade, maybe they are just piling on the evidence.

Ancient samples from India aren't necessary to prove that there was a Bronze Age migration from the steppe to India across Central Asia. Ancient samples from Central Asia will do just fine. Think about it.

And if the Indians weren't in a panic then they would've released a paper by now with the ancient samples that they already have, and they do have them, because they admitted this in the media several times in recent months. Think about that too.

Jaydeep said...


You underestimate the Indians very much. Trust me, none of us is going to be disappointed.

It is likely that we are not going to have old samples (4000 BC or earlier) from South Asia anytime soon. Maybe in a year or two.

So now if we are going to have samples from Central Asia and South Asia which have steppe affinity but are younger than Yamnaya, how can I propose the OIT model ? It is just plain common sense.

But I wonder if these South/Central Asian samples can be used as admixture for the Mycenaeans or the Scythians.

Davidski said...


And what are you going to say when all of the Neolithic and Chalcolithic samples from Central Asia between the steppe and India are missing the steppe component?

Eren said...

@David: Sounds exciting. And thanks regarding the samples, I'll have a look.


Look at the title of chapter 6 - "The collision that formed India". That choice of words implies AIT to me.


Part I - The Deep History of Our Species
1: How the Genome Explains Who We Are
2: Interbreeding with Neanderthals
3: Ancient DNA Opens the Floodgates

Part II - How We Got to Where We Are Today
4: Humanity's Ghosts
5: The Making of Modern Europe
6: The Collision that Formed India
7: In Search of American Ancestors
8: The Genomic Origins of East Asians
9: Rejoining Africa to the Human Story

Part III -The Disruptive Genome
10: The Genomics of Inequality
11: The Genomics of Race and Identity
12: The Future of Ancient DNA


Anthro Survey said...


Nice comparison.

Well, the reason Natufians take Mahra could be due to what I've proposed as an excess of "BasalRich2" ancestry (as opposed to para-Eurasian) in them. They take Iberomaurusians because I take it they're primarily, but ofc not exclusively, comprised of this component, too.

B.Rich2: In other words, effectively a sister branch of BasalRich1 (which I see as being more important in modern MEs and Europeans) but still w/in the Basal Eurasian clade. So, f4(Chimp,Mbuti;B.Rich1,B.Rich2)=0, while for divergent ParaEurasians it wouldn't be 0 if subbed in for either of the B.Rich, as there'd be overlap on the path.
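To make the f4 argument concrete, here is a minimal toy sketch in Python. It uses the standard definition of f4(A,B;C,D) as the average over SNPs of (a - b)(c - d) on allele frequencies. Everything else is an assumption for illustration only: the population names are placeholders, and drift is simulated as simple Gaussian noise rather than with real data or a proper population-genetic model.

```python
import numpy as np

def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d) on allele frequencies."""
    a, b, c, d = (np.asarray(x, dtype=float) for x in (a, b, c, d))
    return float(np.mean((a - b) * (c - d)))

def drift(freqs, rng, sd=0.03):
    """Toy drift (assumption): independent Gaussian noise added to each SNP frequency."""
    return freqs + rng.normal(0.0, sd, size=freqs.shape)

rng = np.random.default_rng(0)
root = rng.uniform(0.3, 0.7, size=200_000)  # hypothetical ancestral frequencies

# Two outgroups; out2 shares a stretch of drift with a hypothetical "divergent" lineage.
out1 = drift(root, rng)
shared_branch = drift(root, rng)
out2 = drift(shared_branch, rng)
divergent = drift(shared_branch, rng)

# Two sister branches that split from a common "basal" ancestor.
basal = drift(root, rng)
sister1 = drift(basal, rng)
sister2 = drift(basal, rng)

f4_sisters = f4(out1, out2, sister1, sister2)      # drift paths do not overlap
f4_divergent = f4(out1, out2, sister1, divergent)  # drift paths overlap on shared_branch

print(f"sisters:   {f4_sisters:.6f}")
print(f"divergent: {f4_divergent:.6f}")
```

In this toy setup the first statistic comes out near zero because the path between the outgroups shares no drift with the path between the sisters, while the second is clearly positive because both paths run along the shared branch.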

Assuming that the difference between said constructs is mainly due to the split in the Basal portion (and not in their HG), it's easy to envision a geographic dispersal from a core Gulf Region (if you subscribe to that theory). Basically, Basal1 continued to hang out in the Gulf and Mesopotamia, while Basal2 trekked Westward, potentially accumulating in sinks like Yemen along the way, making very solid gains in Egypt, eventually reaching the Atlantic coast. Later on, B.Rich2 refluxes back to the Levant and mixes with B.Rich1 Levantines, culminating in Natufians. Very little Para ancestry is brought. Harifians and early Circum Arabian Pastorals (CAP) stay relatively unmixed B.Rich2 people.

A short while later, PPNB (B.Rich1-heavy) mix with CAP; material evidence from north KSA suggests this. Finally, an "Armenian" infusion introduces J1 and a CHG component into Arabia, largely completing proper Arabs and putting them into the "proper ME spectrum".

Prior to release of these genomes, Maghrebis gave horrible fits, while Saudis were tricky, but not impossible to get in the convex hull. Both liked Natufian. For Saudis, I guess some extra B.Rich2 drift was missing while for Maghrebis, some divergent para-Eurasian drift was key and Iberomaurusian DNA nicely satisfies it now. :-) Speculations, I admit, but they seem to fit atm.

Jaydeep said...

And what are you going to say when all of the Neolithic and Chalcolithic samples from Central Asia between the steppe and India are missing the steppe component?

That's quite impossible. You do remember that we have had only one sample from Central Asia, which is the Mesolithic Iran Hotu, and it clearly had steppe affinities. So I'm pretty sure that even without migration from the steppe or South Asia, the Central Asians already had significant EHG affinities since long.

Davidski said...


That's quite impossible. You do remember that we have had only one sample from Central Asia, which is the Mesolithic Iran Hotu, and it clearly had steppe affinities. So I'm pretty sure that even without migration from the steppe or South Asia, the Central Asians already had significant EHG affinities since long.

Iran Hotu doesn't really have any ancestry from the steppe. Real steppe ancestry only arrived in Central and South Asia during the Bronze Age, along with R1a-M417. This is what you'll see in the ancient results.


hmmm sounds like the ANI ASI stuff

Yes, but only partly, because ANI is an Iran_N/steppe mix. The collision happened when the steppe people arrived.

Anthro Survey said...

Excellent post!

Since we're probably dealing with a ghost component, I'll pass on the challenge.

Of course, the next logical step is to model Iberians and see whether they get extra Yoruba or just Iberomaurusian. Galicians should be most interesting because they are expected to carry mainly Roman Age+Early Islamic admixture and, hence, minimal Yoruba since it'd predate major trans-Saharan camel traffic. Not sure about other Iberians, though, because later Almoravid expansion hailed from an area very close to Senegal.

It's worth noting the lower Levant_BA signal in the Mozabites. They are, after all, a Berber speaking and identifying group with minimal post-Arab "conquest" gene flow. Of course, they're also more isolated from circum-Mediterranean dynamics in general.

Surprised to see Saharawi not get extra Yoruba. Hmmm.

Chad said...

That is due to a lack of a better reference. Admixture components aren't real populations. You should know that by now.

Jijnasu said...

Seems like both groups of scientists (the pro-OIT researchers as well as the mainstream group) are quite confident of their positions. From the cryptic by Razib on his blog it does seem as though the situation might be different from what we expect. (It might not really be the Andronovans in the mid 2nd millennium who moved into India)

EastPole said...


“That happened around ~2000YBP not ~2000 BC and that was not a collision, you can't have "The collision" with having a rock on one side and a gigantic mountain on the other”

This would be very interesting from the linguistics point of view.
Slavic languages are closer to Avestan than any living Iranian language and closer to Sanskrit than Hindi.
If those steppe people from Eastern Europe arrived to South Asia ~2000YBP how do you explain that their languages are closer to the languages of Avesta and Rigveda than your languages?

Arza said...

New linguistic analysis finds Dravidian language family is approximately 4,500 years old

A Bayesian phylogenetic study of the Dravidian language family

Jijnasu said...

How would you explain the fact that the 'steppe like' ancestry exhibits a gradient amongst the caste hierarchy even in NW groups like Punjabis? It seems improbable that these ethnic date back to the neolithic or earlier

Folker said...

About SSA admixture in Iberomaurusian, a point must be taken into consideration: Sahara was hyper arid from 20 000 BP to 10 000 BP, with no human settlement. So any admixture must have taken place before 20 000 BP.

Anonymous said...

@ Nirjhar007

"I said archaic not originating :) ..if such ancestry ''exists'' that can only be determined with ancient data."

No, there are other ways to find that:

Folker said...

@Anthro Survey
The paper based on modern Iberian population is showing a clear W/E structure, coherent with colonisation from the North during Reconquista. NA in Iberians was very likely part of the ethnogenesis of each medieval kingdom at different levels, around 1000 years ago, and didn't change much after the beginning of Reconquista (and the 1st period of Taifas). The Almoravid were more present in the East, where NA is low.
Early medieval DNA is indeed needed to know the level of Iberomaurusian in pre-Muslim Iberia. My guess would be something like 1/3 pre and 2/3 post Muslim period, with local variations (given previous studies)

Ryan said...

Just curious - are there any signals of Iberomaurusian-like gene flow to the south? IE are any SSA populations well modeled as partly Iberomaurusian? Or do we lack good reference populations for that even.

Open Genomes said...

@David, even lower:

[1] "distance%=7.9836 / distance=0.079836"


Tepecik_Ciftlik_N:Tep003 48.3
Iberomaurusian 26.9
Tanzania_Luxmanda_3000BP:I3726 24.8

The Natufians were in E-PF1962*. The main subclade of E-PF1962 is E-M123, and E-M123 has a Near Eastern distribution, with almost no representatives in Africa. As we know, almost all the Y haplotypes of the Levantine Neolithic were in E-M123* (and significantly, one was E-M78*, but negative for both major subgroups).
The immediate sister clade of E-PF1962* is E-V1515.
As one can see, the distribution of E-V1515 is wholly East African, all the way south to cattle herders in Namibia. E-V1515 is primarily associated with Afro-Asiatic Cushitic-speaking peoples:

Trombetta et al. (2015) Maps of the observed frequencies for haplogroup E-V1515 and its major subhaplogroups.

Tanzania Luxmanda from 1000 BCE is a better fit than other Africans, and predates Arabian Semitic admixture in the Horn of Africa. He's also too far south for such admixture.

YFull tMRCA of E-Z830 at 19,200 ybp, showing East African E-V1515 and Natufian E-PF1962

This is not about any "Sub-Saharan" admixture among the Natufians. This is about finding common Afroasiatic ancestry among the Iberomaurusians, Cushitic-speakers, and Natufians, corresponding respectively to the Berber and Egyptian, Cushitic and Omotic, and Semitic branches of Afroasiatic.

Afroasiatic Languages

The tMRCAs here are during or before the LGM. It would seem that the Afroasiatic speakers had two LGM refugia, one in the Atlas Mountains for the early branches of Y-DNA E-M35 (E-M78 / E-L539 and E-L19) and another one for at least E-V1515 in the Ethiopian highlands and perhaps for all of E-Z830. If the Natufians were the descendants of the Ramonians / Mushabians, they would have arrived in the Sinai region just after the Bolling Interstadial at 14,700 ybp, when it was wet enough for travel down the Nile valley and over to the eastern Sinai and Negev, where they mixed with the Near Eastern Geometric Kebarans.

Context Database map of the archaeological sites in the Near East, 18,000-14,500 ybp

Tepecik Ciftlik TEP003 appears to be a better proxy for the Geometric Kebarans, without having the WHG ancestry found in Barcin. This Geometric Kebaran ancestry among the Natufians appears to be somewhere around or below 48.3%. We don't have a better early proxy for ancient pre-Semitic Cushitic-speakers than Tanzania Luxmanda from 1000 BCE. Neither is perfect of course. The connection to the Iberomaurusians would of course be immediately pre-LGM, but the Natufians and the Iberomaurusians were small hunter-gatherer groups from immediately after the LGM, and they both would have preserved much of their common Pre-LGM presumably Proto-Afroasiatic ancestry. Tanzania Luxmanda is about 50% admixed with a local East African (Nilo-Saharan or Hadza-like) component, but still shares a substantial amount with the Natufians.

It's the Y-DNA of course and the distributions of the related E-M35 subclades that shows the timing and perhaps the locations of the Afroasiatic LGM refugia, and which parts of E-M35 would have been located in each refugium. Afroasiatic is of course based on comparative linguistics. The LGM refugia are of course based on paleoclimatology. The respective cultures and locations of the associated Y-DNA E-M35 subclades are based on archaeology.

Open Genomes said...

The nMonte2 Global25 admixture evidence of course is completely independent of these other independent lines of evidence, yet it seems to correspond rather well with them, in spite of the later dates of the Kebaran and Cushitic proxies. There's no reason that all these independent lines of evidence should converge, on their own, unless this represents some kind of signal of a common Proto-Afro-Asiatic origin somewhere in Northern Africa, something that in and of itself isn't controversial.

Open Genomes said...

@Ryan, try the Fulani, the Mende, and some Chadic-speakers.

@David, do we have any Dogon samples? I think I remember seeing some at one point, years ago. Those might be most similar to the Kiffians of the Mesolithic Sahara. The Kiffians are also morphologically similar to the Iberomaurusians.

Open Genomes said...

@Salden, E-V12 in E-M78 sure looks like the Upper Egyptian Late Neolithic Naqada culture. These moved to the Nile valley after the sudden desiccation of the Sahara at around 5900 ybp. The tMRCA of E-V12 fits nicely with this event.

capra internetensis said...


why put E-V1515 in the Ethiopian Highlands though? that requires E-M35 to split up and pass across the Saharan zone in the arid period of MIS2. or if all Z830 in the south, then it is still pretty bad in early MIS 2, and the northern branch has to race down the Nile to reach the Levant in time.

why not have E-V1515 be a branch originating in Egypt or thereabouts that dispersed southward at an early date and survives best in the Horn of Africa?

Trombetta et al have no samples from Sudan and their most basal E-V1515 is in Eritrea. is there any archaeological parallel between Ethiopia and North Africa in the earlier period? - i don't recall any.

capra internetensis said...


what settings are you using? with the new Global25 datasheet and default nMonte settings i get:
Natufian - 69% Tepecik-Ciftlik, 21% Iberomaurusian, 10% Luxmanda - distance 14.7%

without Luxmanda:
Natufian - 65% Tepecik-Ciftlik, 35% Iberomaurusian - distance 14.3%

setting pen=0:
Natufian - 65% Tepecik-Ciftlik, 35% Iberomaurusian, 0% Luxmanda - distance 14.0%

different datasheet?

Open Genomes said...


I'm not saying that all of E-M35 originated either in the Atlas Mountains or the Ethiopian highlands. It could very well be possible that E-M35 originated somewhere in between, in the paleolakes region of the Central Sahara. Yes, MIS2 was pretty arid, but the real serious period of aridity was the LGM itself, 22 kya to 18 kya. This would have forced E-M35 in two directions, including the earliest branch of E-Z827 (E-L19) joining E-L539 and E-M78 in the Atlas Mountains, but early E-Z830 (E-V1515) moving over to East Africa. In fact, the Nile stopped flowing altogether during the LGM. In the later Saharan pluvial period, the Nile originated in the lakes of the Central and Southern Sahara rather than Lake Victoria and the Ethiopian highlands.

It is possible of course given the tMRCA of E-V1515 with Natufian E-PF1962 that the Natufians spent at least part of the LGM in the Northeastern Sinai, but this seems extremely unlikely. Regardless, the Mushabians / Ramonians first appear in the Sinai and Negev just after the Bolling Interstadial (H1, 14.7 kya) but the culture already has links to the lower Nile valley. It's all a question of when the Nile began to flow - was it right after the LGM, at about 17 kya? In that case, there's no "race" to the north, just a population of the Nile valley by hunter-gatherers.

Some researchers (Tattersal et al. [2004]) have speculated that the Mushabian originated with the Iberomaurusians:

Mushabian culture

However, a Mushabian presence in Sinai before the Bolling Interstadial at 14.7 kya contradicts the radiocarbon dating evidence.

It's also possible that the Mushabians moved northward after the height of the LGM at about 18 kya along the western Red Sea coast. We do know that the Levant experienced an expansion of woodland just after the LGM, on either side of Lake Lisan, which then had a very high water level.

Regardless, the Mushabians were a small hunter gatherer group compared to their Geometric Kebaran contemporaries to the north. A small group didn't need a lot of resources to migrate along the western Red Sea coast to get within reach of the Levant when conditions significantly improved at the Bolling Interstadial, 14.7 kya.

There's no problem with leaving behind E-V1515 somewhere around the Ethiopian highlands. Ray & Adams (2001) in fact show an extension of the type 4 vegetation (Tropical thorn scrub and scrub woodland) eastward to very close to the Red Sea coast in Djibouti.

Ray and Adams (2001) vegetation map of Africa during the LGM

Red Sea during the Last Glacial Maximum: Implications for sea level reconstruction

It may not be that simple, but the archaeology combined with the paleoclimatology and the Y-DNA tMRCAs would seem to place at least E-Z830 (with E-V1515) somewhere near the Red Sea during the LGM, with a movement just afterward of E-PF1962 to the Sinai.

Open Genomes said...

@Capra, I'm using nMonte2, with these, from the Global_25_PCA.txt, and the Iberomaurusian average:


The Iberomaurusian could of course be changed to a specific Iberomaurusian, but they all cluster very closely together, unlike the Natufians. Perhaps a different Natufian and a specific Iberomaurusian would give better results:


capra internetensis said...


ah, scaled vs unscaled datasheets it is. using the scaled sheet Luxmanda does not improve the fit, whereas using the unscaled coordinates the model with 29% Luxmanda is slightly better (8.0% vs 8.3% without her). Luxmanda's distance is ~11 with unscaled PCs (closest to Natufian) and ~38 with scaled PCs (furthest from Natufian), so yeah, quite a difference.
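A toy illustration of why scaled and unscaled coordinates can disagree about which population is closest. This is a minimal Python sketch that assumes the scaled sheet is just the unscaled one with each PC multiplied by its own constant; the two-dimensional coordinates, the population names, and the scale factors are all made up for the example and are not Global25 values.

```python
import numpy as np

# Hypothetical unscaled coordinates on two PCs (not real Global25 data).
target = np.array([0.00, 0.00])
pops = {
    "pop_a": np.array([0.04, 0.00]),  # differs from the target only on PC1
    "pop_b": np.array([0.01, 0.05]),  # differs from the target mostly on PC2
}

# Assumed per-PC weights: PC1 counts three times as much as PC2.
scale = np.array([3.0, 1.0])

def nearest(target, pops, scale=None):
    """Return (closest population name, all Euclidean distances), optionally per-PC scaled."""
    s = np.ones_like(target) if scale is None else scale
    dists = {name: float(np.linalg.norm((xy - target) * s)) for name, xy in pops.items()}
    return min(dists, key=dists.get), dists

unscaled_best, unscaled_dists = nearest(target, pops)
scaled_best, scaled_dists = nearest(target, pops, scale)

print("unscaled:", unscaled_best, unscaled_dists)
print("scaled:  ", scaled_best, scaled_dists)
```

Because the scaling stretches some dimensions more than others, the ranking by distance can flip, which is the kind of effect described above for Luxmanda.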

it seems to me much simpler to just put E-M35 in North Africa and not reach south of the Sahara before the Late Glacial. though we still need to get E-M35 into North Africa in the first place of course. by YF dates E-M215 splits up ~38-32 kya and E-M35 ~26-22 kya.

from papers i've read Lake Tana, Lake Turkana, and possibly Lake Victoria were dried out ~17 kya and with some fluctuation stayed pretty much dry until ~15 kya, with Lake Tana and Lake Victoria flowing into the Nile again ~14.5 kya.

before then what i've seen is somewhat inconsistent. the paper below (based on Nile sediments in the Eastern Mediterranean) says conditions were relatively dry from 50-38 kya, then there was a wetter period 38-30 kya, then dry again, with a "drastic decrease in Nile discharge" during the LGM 25-17 kya.

though of course the confidence intervals on both the genetic TMRCAs and the climatological data are big enough that any precise correlations are dubious.

Alberto said...

@capra internetensis

= 47% Mota, 35% LevantN, 18% Dinka - distance 5.8%
= 55% Mota, 21% LevantN, 19% Iberomaurusian, 5% Dinka - distance 6.9%

so i don't know whether it's just plain no good or whether it is partly good and partly bad.

There's something strange going on with your models. For example, in the above, if Iberomaurusian makes the fit worse (in the second run), it should not be picked and stay at 0% giving you the same distance as the first one. So there's something wrong going on there.
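The point about an extra source never making the best fit worse can be checked with a small exhaustive search. The Python sketch below is not nMonte's algorithm and not the script linked in this thread; it is a brute-force stand-in, under the assumption that a model is a non-negative weighted average of source coordinates scored by Euclidean distance to the target. All coordinates and source names are invented for the example.

```python
import numpy as np

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def best_mix(target, sources, steps=20):
    """Try every weight combination in increments of 1/steps.

    Returns (distance, {source: weight}) for the weighted average of the
    source coordinates that lies closest to the target.
    """
    names = list(sources)
    coords = np.array([sources[n] for n in names], dtype=float)
    target = np.asarray(target, dtype=float)
    best_d, best_w = np.inf, None
    for combo in compositions(steps, len(names)):
        w = np.array(combo) / steps
        d = float(np.linalg.norm(w @ coords - target))
        if d < best_d:
            best_d, best_w = d, dict(zip(names, w.tolist()))
    return best_d, best_w

# Made-up 3-D coordinates, for illustration only.
target = [0.10, 0.02, -0.03]
three = {
    "src_a": [0.20, 0.00, 0.00],
    "src_b": [0.00, 0.05, -0.05],
    "src_c": [0.05, 0.10, 0.00],
}
four = dict(three, src_d=[-0.30, 0.40, 0.20])  # a deliberately poor extra source

d3, w3 = best_mix(target, three)
d4, w4 = best_mix(target, four)
print(f"distance%={100 * d3:.4f}", w3)
print(f"distance%={100 * d4:.4f}", w4)
```

Every three-source mixture is also a four-source mixture with the extra weight set to 0, so the exhaustive optimum with four sources cannot be worse than with three. A stochastic search can still return a worse answer if it stops before reaching that optimum.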

(Not sure if this is some "feature" of nMonte3, which I haven't used, maybe others can tell and explain what's going on).

As a side note, if you run models with a selected number of sources, you might find this script useful:

capra internetensis said...



yeah, i don't grok it either. seems nMonte is getting caught in a local optimum, can't decrease Iberomaurusian too much without making the distance worse? running for 10000 cycles doesn't help.

i don't know what i could be doing wrong, i'm just running the default settings. i do find it quite often happens that nMonte will include a population in the model if it's offered as a source even though the fit ends up being worse.

Alberto said...

That really should happen. I used nMonte (first version) for quite a while and it never did that. So not sure if it's something about this version 3. I don't think there's anything you could be doing wrong, since it's only including the populations and running a command.

You can run Xmix (the one I linked above) side by side to double check the results.

Alberto said...

^^^ That really *shouldn't* happen.

Rob said...

I'm not understanding some of the above model attempts - using Barcin for Natufians and whatnot
That’s sort of like modelling EHG as a descendant of Yamnaya
Doesn’t seem sound

Arza said...

@ capra
Can you post the coordinates that you use? I'm unable to replicate this.

"distance%=5.3826 / distance=0.053826"
"Ethiopia_4500BP" 42.2
"Levant_N" 33.1
"Dinka" 18.1
"Iberomaurusian" 6.6


nMonte3 with Nbatch set to 100

BTW Have you manually rounded any results?

Open Genomes said...

Ok, got it:
The Natufians are E-Z830*, not E-PF1962.
The Upper Nile valley was densely inhabited during the LGM.

Wind-blown sand dunes blocked the flow of the Nile in several places during the LGM. The Blue Nile and the White Nile still flowed at a lesser rate, but were unable to clear the sand dunes. Freshwater lakes formed in Upper Egypt behind these dunes and this provided an LGM refugium for hunter-gatherers. This region is dense with finds of lithics from the LGM.

After the Bolling Interstadial at 14.7 kya, or maybe even somewhat before, the Nile began to flow again and some of the hunter-gatherers must have moved down the Nile and over to Sinai to become the Mushabians and then, the Natufians.

The lakes of the Nile Valley of Upper Egypt must have been the LGM refugium for E-Z830. In fact, the locus of greatest diversity of E-Z830 is among the Cushitic-speaking Beja just east of the Upper Nile valley in Upper Egypt and Sudan over to the Red Sea coast.

Vermeersch and Van Neer (2015) Nile behaviour and Late Palaeolithic humans in Upper Egypt during the Late Pleistocene

"The reconstruction of the environment and the human population history of the Nile Valley during the Late Pleistocene have received a lot of attention in the literature thus far. There seems to be a consensus that during MIS2 extreme dry conditions prevailed over north-eastern Africa, which was apparently not occupied by humans. The Nile Valley seems to be an exception; numerous field data have been collected suggesting an important population density in Upper Egypt during MIS2. The occupation remains are often stratified in, or at least related to, aeolian and Nile deposits at some elevation above the present-day floodplain. They are rich in lithics and animal bones, mainly fish, illustrating the exploitation of the Nile Valley by the Late Palaeolithic inhabitants. The fluvial processes active during that period have traditionally been interpreted as a continuously rising highly braided river.

In this paper we summarize the evidence thus far available for the Late Pleistocene on the population densities in the Nile Valley, and on the models of Nilotic behaviour. In the discussion we include data on the environmental conditions in Eastern Africa, on the aeolian processes in the Western Desert of Egypt derived from satellite images, 14C and OSL dates, in order to formulate a new model that explains the observed high remnants of aeolian and Nilotic deposits and the related Late Palaeolithic sites. This model hypothesizes that, during the Late Pleistocene, and especially the LGM, dunes from the Western Desert invaded the Nile Valley at several places in Upper Egypt. The much reduced activity of the White Nile and the Blue Nile was unable to evacuate incoming aeolian sand and, as a consequence, several dams were created in the Upper Egyptian Nile Valley. Behind such dams the created lakes offered ideal conditions for human subsistence. This model explains the occurrence of Late Palaeolithic hunter–fisher–gatherers in a very arid environment with very low Nile flows, even in late summer."

Bronze said...

@Open genomes
modern diversity is not necessarily representative of origin or ancient diversity.

Y-dna E-m35 likely originated in north africa or the middle east. It makes no sense for a natufian-related component originating in north east africa along the nile river and somehow avoiding all SSA admixture before moving to the levant when we know SSA admixture extended all the way to north east africa during the much earlier iberomaurusian period.

Samuel Andrews said...

@rob, or like modelling Ukraine Mesolithic as part ehg which makes sense. Just cuz a pop is younger than another doesn't mean it can't be representative of an ancestor.

Samuel Andrews said...

I do doubt Natufian is part full-blown EEF. However, I'm open to the possibility they have ancestry from something similar.

Open Genomes said...

@Rob, yes, the E-V13s (under E-L618) are phylogenetically closer to the Iberomaurusians, than the Natufians and in fact descended from the Iberomaurusians who were an "incomplete" E-M78* from before the present tMRCA. E-M521, similar to E-L618 (the ancestor of E-V13) is also found in just two examples in the Southern Balkans (Greece).

Since neither E-L618* nor E-M521 are found east of the Balkans, and especially not in the Near East, it seems possible that these E-M78 subclades went directly from North Africa to Europe, before the Neolithic.

postneo said...

@east pole
"Slavic languages are closer to Avestan than any living Iranian language and closer to Sanskrit than Hindi"

You don't know Avestan, Sanskrit or Hindi so this is bullshit. Sanskrit demonstratives are closer to Germanic than Slavic. Hindi is distant from Sanskrit in many aspects, but many neighboring languages fill the gap and vice versa.

Anthro Survey said...

A few Iberian models.

"Moor" here is David's Mozabite model w/IBM sans Yoruba.

Interestingly, the "Central European":EEF ratio in these Iberians is considerably higher than it is for Basques. Was much of Iberia rather French-like at one point? I doubt a Visigothic layer explains this, but perhaps Urnfield systems and Celts were more influential than previously thought?

Coming up: making the Anatolia_BA signal disappear.

[1] "distance%=2.38 / distance=0.0238"
Beaker_Central_Europe 57.0
Iberia_ChL 42.6
Moor 0.4
Yoruba 0.0
Anatolia_BA 0.0
Levant_BA 0.0

[1] "distance%=1.6076 / distance=0.016076"
Beaker_Central_Europe 54.5
Iberia_ChL 17.7
Moor 14.8
Anatolia_BA 13.0
Yoruba 0.0
Levant_BA 0.0

[1] "distance%=2.4121 / distance=0.024121"
Beaker_Central_Europe 58.40
Iberia_ChL 18.70
Moor 15.05
Anatolia_BA 7.85
Yoruba 0.00
Levant_BA 0.00

[1] "distance%=2.3032 / distance=0.023032"
Beaker_Central_Europe 55.95
Iberia_ChL 25.95
Anatolia_BA 11.05
Moor 7.05
Yoruba 0.00
Levant_BA 0.00

[1] "distance%=1.7712 / distance=0.017712"
Beaker_Central_Europe 55.0
Iberia_ChL 18.4
Moor 15.1
Anatolia_BA 11.6
Yoruba 0.0
Levant_BA 0.0

[1] "distance%=1.2841 / distance=0.012841"
Beaker_Central_Europe 58.00
Iberia_ChL 17.45
Moor 15.30
Anatolia_BA 8.50
Levant_BA 0.50
Yoruba 0.25
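For anyone who wants to compare the "Central European":EEF ratio across these runs, the arithmetic can be sketched in a few lines of Python. This assumes the ratio means Beaker_Central_Europe divided by Iberia_ChL, which is one reading of the comment above and not something stated in it. The population label for each run is not shown in the output, so the keys below are placeholders in the order the models appear.

```python
# (Beaker_Central_Europe %, Iberia_ChL %) copied from the six model outputs above.
# The run names are placeholders: the target population of each run is not shown.
models = {
    "run_1": (57.00, 42.60),
    "run_2": (54.50, 17.70),
    "run_3": (58.40, 18.70),
    "run_4": (55.95, 25.95),
    "run_5": (55.00, 18.40),
    "run_6": (58.00, 17.45),
}

def central_to_local_ratio(beaker_pct, chl_pct):
    """Ratio of the Beaker_Central_Europe share to the Iberia_ChL share."""
    return beaker_pct / chl_pct

ratios = {name: round(central_to_local_ratio(*pcts), 2) for name, pcts in models.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

The first run stands apart with a ratio of about 1.3, while the other five fall between roughly 2.2 and 3.3.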

Anthro Survey said...

"Italics?" is a ghost Arza derived a few days back from intersecting clines. In all likelihood, Circum-EastMed ancestry is more complex in Iberians and probably consists of Minoan-like, Anatolia_BA-like, Italian_Jew and Italic-like elements. So, the "Italic" figure is probably more of an upper bound.

[1] "distance%=1.539 / distance=0.01539"
Beaker_Central_Europe 45.9
Italics? 24.4
Iberia_ChL 15.8
Moor 13.8
Yoruba 0.2
Anatolia_BA 0.0
Levant_BA 0.0

[1] "distance%=1.5844 / distance=0.015844"
Beaker_Central_Europe 46.5
Italics? 23.1
Iberia_ChL 16.1
Moor 14.2
Yoruba 0.0
Anatolia_BA 0.0
Levant_BA 0.0

[1] "distance%=1.5475 / distance=0.015475"
Beaker_Central_Europe 50.25
Italian_Jew 18.95
Iberia_ChL 18.90
Moor 10.80
Anatolia_BA 0.95
Yoruba 0.15
Levant_BA 0.00
(Moor fraction changes here because Italian Jews have NA ancestry).

Samuel Andrews said...

@Anthro Survey,

Everything makes sense except the Moor scores. When I use Moroccan, Iberians score 5-8% not 14%. Also, Moroccan scores might be exaggerated by the possibly more significant European ancestry in Morocco (which includes recent Steppe-influenced Europeans).

Also, Beaker_Netherlands is probably the best reference for the source population of R1b P312. Beaker_Central Europe has a big extra layer of farmer admixture.

Samuel Andrews said...

Back to the Reich white board teaser.

This suggests the South Asian aDNA paper will be out very soon. The map looks like a proposed spread of Indo European languages with special emphasis on Hindi. The paper might include strong claims about the origins of IE languages.

On the map, it looks like Reich is supposing CHG is PIE not Steppe because he shows IE spreading to Anatolia straight from the Caucasus. He probably doesn't have Hittite DNA to confirm this so I wouldn't take it seriously.

Also, Andronovo is made out to be a descendant of Corded Ware. Is Reich aware of R1a M417 in Ukraine in 4000 BC with heavy farmer admixture? R1a M417 Steppe folk and farmers began mixing with each other not in central Europe but in eastern Europe and long before Corded Ware. That's likely how Andronovo got its farmer ancestry.

Furthermore, his map clearly presents farmer-admixed Corded Ware-like folk as the ancestors of Indo-Aryans. Arguably, the square which the Andronovo line hits is referring to admixture with pure Steppe folk somewhere in Central Asia. Nonetheless, we should probably expect early IEs in India to show minor EEF/WHG ancestry. mtDNA in modern South Asians confirms they have some European-derived EEF ancestry.

At least to me, the map clearly shows that when IEs arrived in India they had both Siberian & 'Central Asian' (Iran Neo) admixture on top of a Corded Ware-like genetic makeup. Then, IEs in central Asia back-migrated into Siberia, which can explain the Iran Neo ancestry in Scythians.

Samuel Andrews said...

But....Ancient DNA teasers always leave out important details which lead to false conclusions.

Anthro Survey said...


Yeah, that's probably a better idea----I'll try Beaker_Netherlands out.

The reason I broke it down into "Moor" and Yoruba is because I wasn't sure of what the Berber invasion force nor Roman age Mauri immigrants were like. So, I didn't want to automatically force SSA in there. Trans-saharan slave traffic didn't reach a peak until medieval and early modern times.
Plus, Moroccans in Dave's set potentially have significant Bani Hillal Arab ancestry and that's why I decoupled Mozabites rather than them.

Now, do you get 5-8% for Western Iberians or for Eastern Iberians? My Catalan, Cantabrian and Aragonese runs get a relatively low Moor score, too(~7%). How about your CentralEurope:local EEF ratios?

Anthro Survey said...


"At least to me, the map clearly shows that when IEs arrived in India they had both Siberian & 'Central Asian' (Iran Neo) admixture on top of a Corded Ware-like genetic makeup. Then, IEs in central Asia, back migrated into Siberia which can explain the Iran Neo ancestry in Scythians."

This is what I'm expecting. When you say Siberian, though, do you mean ENA/Devil's Gate type of ancestry? Or some Central Asian ANE-like UHG(bundled with local Iran_Neo)? I think the latter is going to be a factor and missing this drift makes our current models borderline.

Cossue said...

Re. Iberia, if you draw a line from the southwestern tip of the Iberian peninsula in Portugal till Catalonia, in the NE, and then retire all the Pyrenees region, all the lands to the west of this line -and some to the east of it- were profoundly indo-europeanized, judging by the toponymy (cf. Sims-Williams Patrick, "Ancient Celtic Place-Names in Europe and Asia Minor"), anthroponymy and theonymy recorded by Classical authors and local Latin inscriptions. Here you have, for example, a catalog of native personal and divine names recorded in local inscriptions or in literary references (they are Iberian, Basque, Celtiberian, Lusitanian, etc… although in fact most names can only be told apart in between Basque and Iberian, and Indo-European, which form most of the catalog): (anyone with some knowledge in Celtic and IE languages would surely have some fun with this book).

As for the Moors in Galicia, our medieval records show -abundantly- the presence of Moor POWs and slaves since the 9th century, after the successful campaign of (later) king Ordoño in Andalusia, and till the 13th century, when town constitutions still regulate the selling of Moor slaves. For something like five centuries there was a steady "importation", either through war or by means of commerce, of Moor slaves from the south. Also the records show that these persons tended to change name, be baptized, acquire their freedom -and admix with the locals- within a couple of generations.

Samuel Andrews said...

@Anthro Survey,

I haven't done a thorough survey of Iberia. Portugal, Galicia get 7-8% Morocco. Aragon gets 2-3%, Andalusia gets 4-5%.

"Plus, Moroccans in Dave's set potentially have significant Bani Hillal Arab ancestry and that's why I decoupled Mozabites rather than them. "

Good point. Morocco may not be representative of Moors.

Samuel Andrews said...

All eastern EuroHGs are basically intermediate between AfontovaGora3 and IronGates. This includes EHG. Using Global25 scaled data, you can get very reasonable results for Europeans using AfontovaGora3 and IronGates...

Samuel Andrews said...

EHG-like ancestry in IronGates makes it look like a better proxy for the WHG-like ancestry in Steppe ancestry than it is. EHG probably does have direct ancestry from full-blown WHG on top of merely WHG-related stuff. This method gives more accurate scores for total WHG ancestry in populations. Balts come out 40% WHG.

Anthro Survey said...


Ah, the importations part makes sense, but to what do you attribute lower Moorish ancestry in Catalonia/Valencia? It's still a bit counter-intuitive. Did those areas undergo massive resettlement by folks from north Catalonia and Languedoc(at the time, Occitan and Catalan areas weren't as culturally/linguistically differentiated)?

Re/Galicia---It gets considerably higher Beaker-like ancestry than other Iberian areas when you account for Moorish and East-Med ancestry. Bit of a surprise for me, but perhaps it really did have a heavy Celtic character prior to Roman conquest. Do archaeologists agree?

Btw, do you think Galicia's demographics were impacted by Roman cosmopolitanism(i.e. East Med gene flow)? It was no backwater as evidenced by walled Roman cities like Lugo. All that Anatolia_BA can't be from converted Jews.

André de Vasconcelos said...


Slaves would probably not explain the relatively high levels of E-M81 in the west. It wouldn't be common at all for a slave man (even if recently freed) to father a child with a native Iberian woman; this is rarely the case when slavery is concerned. Also men didn't usually marry above their social rank, women did. I'd bet this is the case even today.

I find it much more likely that the latest NA admixture event has something to do with Mozarabs (and/or Christian Berbers) moving in from the south into the north, as they were more educated than their religious brethren in the north and thus their arrival was generally appreciated, bringing with them their art, architecture and genes. It's also in line with the recent paper that explained the latest admixture event to happen after the 711 invasion, but closer to 900-1000AD. This could have been more prominent in the west because of Ibn Marwan, his alliance with Asturias and his domains in modern Southern Portugal following his muladi/mozarab revolts against Cordoba.
There's also the Almanzor Offensive, but I don't consider violence and rape - particularly in a short period of time - a good way to spread genes to the point we see today, particularly since its territorial consequences were rather short-lived and expulsions were not uncommon.

Of course previous events would also help explain its geographic distribution, namely maritime/atlantic contacts during the early middle ages (pre-muslim), movements of NA Christians fleeing war (Byzantine, Umayyad, etc) into Iberia, Roman Era migrants/traders, etc..

huijbregts said...

nMonte does a random walk, so it can get caught in a local minimum. Actually that happens quite often, but usually the local minimum seems to be close to the global minimum.
A more important problem is that nMonte can generate 'virtual populations'. For instance, when you have a German target and you offer it a French, a German and a Belarusian sample, nMonte may neglect the German sample and generate a virtual German by mixing French and Belarusian. For this combination nMonte can itself choose the optimal mixture percentages, so chances are that this combination will return a better fit than the actual German sample.
Also problems may arise when you use single samples which are outliers in their own population.
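
The 'virtual population' effect described above can be illustrated with a toy example. This is only a sketch in Python, not nMonte itself (which is an R script): the two-dimensional coordinates are invented for illustration, and a coarse grid search stands in for the random walk. It shows how a French/Belarusian blend can land closer to a German target than a genuine but slightly atypical German sample.

```python
import math

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mix(a, b, w):
    """Weighted average of two vectors: w parts a, (1 - w) parts b."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

# Hypothetical 2-D coordinates, invented for illustration only.
target_german = [0.50, 0.50]
french = [0.20, 0.45]
belarusian = [0.80, 0.55]
german_sample = [0.50, 0.56]  # a genuine but slightly atypical German

# Fit of the single genuine sample.
single_fit = distance(german_sample, target_german)

# Best two-way French/Belarusian mix, found by a coarse grid search
# (a stand-in for the random walk, which is an assumption of this sketch).
best_w, best_fit = min(
    ((w / 100, distance(mix(french, belarusian, w / 100), target_german))
     for w in range(101)),
    key=lambda pair: pair[1],
)

print(f"German sample alone: {single_fit:.4f}")
print(f"French/Belarusian mix (French share {best_w:.2f}): {best_fit:.4f}")
```

Because the optimiser is free to tune the mixing weight, the blend wins here even though neither source is German, which is why a low distance on its own says little about which references are historically meaningful.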

capra internetensis said...


I am using the Global25 scaled population averages that Davidski linked in this post. And yes I am rounding the percentages above a few percent.


The results do vary from run to run, sometimes by a fair bit, e.g. running the same samples again I get 52% Mota, 24% LevantN, 16% Iberomaurusian, 8% Dinka - distance 6.3%.

capra internetensis said...

well shit, ran the same populations without Iberomaurusian and got distance 6.6%? then 7.1% and totally different proportions on the next try. sorry guys, probably should ignore all previous results.


Oranian is the main alternative name.


Thank you. Does it have a random starting value for each population?

Arza said...

@ capra

[1] "penalty= 0.001"
[1] "Ncycles= 20"

My guess is that you have some old version of the file with penalty and lowered Ncycles saved in R session. Check variables in file, run source('nMonte3.R'), and if it helps run q() and y to save the workspace and then start R again.

Josep Coderch said...

@André de Vasconcelos
"It's also in line with the recent paper that explained the latest admixture event to happen after the 711 invasion, but closer to 900-1000AD."
What paper are you talking about?
Thanks in advance.

André de Vasconcelos said...


This one

capra internetensis said...


It's all Ncycles=1000 and pen=0.001, which are the default settings.

Anthro Survey said...


Ran Catalans from Catalonia using the same setup as above to keep things consistent. Valencian Catalans behaved similarly.

[1] "distance%=1.7742 / distance=0.017742"
Beaker_Central_Europe 59.25
Iberia_ChL 23.25
Anatolia_BA 9.25
Moor 8.25
Levant_BA 0.00
Yoruba 0.00

Again, note the depressed Beaker:nativeEEF ratio compared to Galicians and other West Iberians. Theirs is 3 and a bit above, while Catalan, Cantabrian, Valencian and Andalusian is 2-2.5. Basques(Sp and French) and French_South(from south Gascony) get 1.2-1.5 or so.

Assuming a beaker cluster w/a DF27+ male is a slightly better proxy than Beaker_C_Europe and assuming Provencal MLN samples are better proxies for Catalan EEFs, as opposed to Iberia_Chl, we get this:

[1] "distance%=1.6363 / distance=0.016363"
Beakers_DF27_site 56.45
France_MLN 24.70
Anatolia_BA 12.30
Moor 6.15
Yoruba 0.40
Levant_BA 0.00

Compare to Galegos:

[1] "distance%=1.4749 / distance=0.014749"
Beakers_DF27_site 53.7
Iberia_ChL 16.9
Moor 14.9
Anatolia_BA 14.2
Yoruba 0.3
Levant_BA 0.0

2.3 vs 3.1. So, I'm beginning to think those Galego Celticist larpers have a solid point, after all. Though, their bag pipes are thought to share phylogeny with East Med variants as opp to Scottish counterparts. Andre probably knows more about this.
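
For anyone wanting to reproduce the Beaker:native-EEF ratios, here is a minimal Python sketch using only the percentages from the two DF27-site models quoted above. The helper name is made up for this example. With these inputs the ratios come out at roughly 2.3 for Catalans and a little under 3.2 for Galegos, so the 3.1 quoted above appears to be truncated rather than rounded.

```python
# Proportions copied from the two DF27-site models quoted in this comment.
catalan = {"Beakers_DF27_site": 56.45, "France_MLN": 24.70}
galego = {"Beakers_DF27_site": 53.7, "Iberia_ChL": 16.9}

def beaker_to_eef(model, beaker_key, eef_key):
    """Ratio of Beaker-like to native-farmer-like ancestry in one model."""
    return model[beaker_key] / model[eef_key]

catalan_ratio = beaker_to_eef(catalan, "Beakers_DF27_site", "France_MLN")
galego_ratio = beaker_to_eef(galego, "Beakers_DF27_site", "Iberia_ChL")

print(f"Catalan: {catalan_ratio:.2f}")
print(f"Galego:  {galego_ratio:.2f}")
```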

Alogo said...

Damn, this sort of population was exactly what Maghrebis needed to start getting sane distances. Shame no IAM too.

Even if you throw all ancients and moderns at them, Mozabites, Sahrawi, Algerians, Moroccans and Tunisians get a good chunk of it and if you remove it their distances increase up to ridiculous levels again. Libyans are so-and-so and Egyptians could care much much less if at all, apparently.

Unsurprisingly and with a quick look, it seems you can use the Iberomaurusians as a decent North African substitute for Iberians too. Playing with it for a bit, I also wonder whether ESP005 (as opposed to ATP9) doesn't have a very small amount of that sort of ancestry, though at rather insignificant levels if at all.

Arza said...

@ capra
Got it. One after another.



Set penalty to 0. It not only shifts the proportions, but apparently it also introduces some randomness.

Anthro Survey said...


Yeah, noticed the distances thing a while back. IAM should be relatively similar to IBM but with extra Natufian.

As you can see above, I've used an SSA-decoupled and normalized Mozabite consisting of IBM and other components seen in Dave's model. Nil Yoruba.
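
The "decouple and normalize" step can be sketched as follows. This is a guess at the arithmetic rather than the exact procedure used here: the function name and the input percentages are hypothetical, and building the actual synthetic "Moor" sample would additionally require averaging the Global25 coordinates of the remaining components with these weights.

```python
def drop_and_renormalize(model, drop):
    """Remove one component and rescale the rest so they sum to 100%."""
    kept = {k: v for k, v in model.items() if k != drop and v > 0}
    total = sum(kept.values())
    return {k: round(100 * v / total, 2) for k, v in kept.items()}

# Hypothetical mixture proportions, invented for illustration only.
example = {
    "Levant_BA": 30.0,
    "Iberomaurusian": 25.0,
    "Iberia_EN": 20.0,
    "Iberia_BA": 15.0,
    "Yoruba": 10.0,
}

moor_weights = drop_and_renormalize(example, "Yoruba")
print(moor_weights)
```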

When I don't use Moor but just IBM, Iberia_Chl, Levant, etc., Yoruba signal is still nil. I take it most of the SSA Iberians get in calcs is Iberomaurusian, not "proper SSA".

Modern Egyptians are essentially an outgrowth of the Middle East. Guessing that serial migrations from the Fertile Crescent diluted whatever IBM-like they had.

Cossue said...


Yep, I agree, those are factors that must be reckoned with. And I'll add a possible late Iron Age (Carthaginian) admixture in the south and along the Atlantic façade...

"Slaves would probably not explain the relatively high levels of E-M81 in the west. It wouldn't be common at all for a slave man (even if recently freed) to father a child with a native Iberian woman; this is rarely the case when slavery is concerned. Also men didn't usually marry above their social rank, women did. I'd bet this is the case even today"

But we have proof that precisely this wasn't true there and then: there is a 12th century document copied in the Cartulary of the monastery of Sobrado dos Monxes, northern Galicia, with the genealogy of several Moor serfs. Here are some excerpts, translated by me (by the way, "Galician" translates the word galego/galega in the original):

“Lord Diego Vazquez brought Pedruchi, stonemason, and this one bore Martin Porra, that was named Lupi previously to his baptism, and he was son to a woman named Cornadesa. This Martin Porra had a Galician woman, of free origin, and he engendered with her Maria Martin, and Peter, and John and another little girl”

“Brother Men Vazquez brought Ali Gordo from the town of Toro, and this Ali had a wife named Fatima Regañada, and both died being pagans; they had also a daughter named Hobonam, that after baptism she was called Maria Oanes, and a son that was named Miguel in baptism”

“Maria Perez had a daughter named Maria Oanes with Joan Pombo, Galician stonemason”

“From Fernando Negro, who was before called Mafumate, is born Martin Fernandez and Elvira Fernandez. From Elvira Fernandez is born Pedro de Meira, son of a Galician father. From Martin Fernandez and from a Galician woman from Regueira is born a young kid”

“Joan Zada, carpenter, came from Portugal and had a son with a Galician woman, called Pedro Mouro”

Etc. These monks and their noble patrons were literally breeding their serfs, because that was cheaper and safer than capturing or buying more (apparently money knows nothing about race or religion, go figure). Again, I want to make clear that there are dozens of deeds, charters, laws, mentioning the presence or arrival of Moor serfs and prisoners into Galicia. I'm not cherry-picking.

I doubt that they had large numbers of Muslim serfs in Catalonia, because the proximity to Muslim forces and strongholds would have turned them into a security risk: in 1140, Lleida, some 130 km from Barcelona and not far from the Pyrenees, was still a Muslim stronghold (it was reconquered in 1149). At this same time, Santiago de Compostela was some 400 km North of Santarém, which was reconquered from the Moors in 1147 by the Portuguese. Galicia and Portugal north of the Douro were, for most of the Middle Ages, beyond reach of the Muslim armies. And the large distances, rivers, and mountain ranges probably dissuaded many serfs from fleeing south.

As for the Celticity of Galicia, well, linguistically it is rather clear that the pre-Latin toponymy, hydronymy, anthroponymy, theonymy, etc... of Galicia, and of all the NW quarter of Iberia, is close to 100% Indo-European. Archaeologically the Castro culture is also very Atlantic, but the Mediterranean influences are multiple and notable:

Arza said...

@ huijbregts

Bug report. Or at least I think so. ;)

eval2 <- sum(colM2^2) + pen*sum(matAdmix[b, ]^2)

In this line you're applying random penalty.

Let's say that you have two populations in the model, A and B. Both are equidistant to the target so the value of "pen*sum(matAdmix[b, ]^2)" is identical for both of them.


In the ideal case as above the penalty is equally applied to both samples.

1/10 + 1/5 + 1/1 = 1.3

But as the order of probing of different samples is randomized it may happen that the order will look like this:


So one sample is mostly probed when we are far from the target and the other when we are close.

A - 2.2
B - 0.4

In this scenario B is receiving penalty when it is insignificant when compared to the distance.
On the other hand A is receiving penalty when the distance is approaching to 0, so suddenly the significance of penalty greatly increases.
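
The arithmetic in this report can be reproduced with a small Python sketch. It is not nMonte code. It simply assumes a penalty of 1 and sums penalty/distance over the fit distances at which each sample is probed. The "skewed" schedule below is one assignment that matches the totals quoted above (2.2 for A, 0.4 for B); the original probing order is not shown in the comment, so treat it as an assumption.

```python
from fractions import Fraction

def relative_penalty(distances, penalty=1):
    """Sum of penalty/distance over the fit distances at which a sample was probed."""
    return sum(Fraction(penalty, d) for d in distances)

# Even schedule: each sample is probed once at distances 10, 5 and 1.
even = {"A": [10, 5, 1], "B": [10, 5, 1]}

# Skewed schedule (assumed, chosen to match the quoted totals):
# B is probed while the fit is still poor, A once it is already close.
skewed = {"A": [5, 1, 1], "B": [10, 10, 5]}

for name, schedule in (("even", even), ("skewed", skewed)):
    totals = {pop: float(relative_penalty(ds)) for pop, ds in schedule.items()}
    print(name, totals)
```

The combined total is 2.6 in both schedules; only its split between A and B changes, which is the run-to-run randomness being reported.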

Josep Coderch said...

@Anthro Survey
Well if I had to guess I'd say that all the populations living near the Pyrenees, being the most important mountainous area in the Iberian Peninsula, retain the most ancient ancestries: mesolithic and neolithic; while the rest were the most affected by later arrivals: indo-europeans and muslims. During the reconquista the genetics of the north were brought south and maybe all of this explains the regional variation in the Peninsula existing today.

It's a fact that north african-like ancestry is a bit higher in the west of Iberia (both north and south) but a few texts saying that there were muslim slaves there doesn't convince me because the bulk of the arab/berber population was in Granada, València and Aragón. I'll take it in lack of a better explanation though.

Arza said...

@ Anthro Survey, Alogo
In the updated preprint Guanches in the ADMIXTURE score up to 84% IAM.

Anthro Survey said...


Yeah, they look sort of like KEB under construction in 2D. It's sensible considering the islands' more remote location.

Agree re/Pyrenees and maybe this range extended all the way up to the Garonne. Ellsyces' territory and Iberian coast probably took second place. Portugal, Meseta, Galicia, etc. were somehow more appealing to Central European migrants.

Don't forget about the potential demographic impact of Roman cosmopolitanism. West_Asian ancestry from different sources(from Greek speakers mostly who wouldn't have resembled modern continental Greeks) must have been ubiquitous in metropolitan regions like Baetica. We have DNA from Colegno, a trading outpost in Northern Italy. ~3 of the locals are Cypriot-like, while the other ~3 seem more French leaning.

Anthro Survey said...

@Andre, Josep and Cossue

Re/Serfs----Would or wouldn't a good fraction of those have been essentially Iberian-like Muwallads? Their importance can't be understated, but maybe their share of population varied by region and century.

Side note---If my models hold water, it goes to show as we've seen time and again that 2D PCA data can be deceptive in hiding important underlying structure. Iberians may cluster together, but that's not quite all there's to it. Basques get less steppe and/or Europe_MLBA than others when we account for later layers of admixture.

capra internetensis said...


Thanks, pen=0 is giving consistent results. If both Dinka and Natufian are included Luxmanda takes no Iberomaurusian
Luxmanda - 40% Mota, 34% Natufian, 20% Dinka, 6% LevantN - distance 4.4%
but without Dinka
Luxmanda - 62% Mota, 32% Natufian, 5% LevantN, 1.6% Iberomaurusian - distance 5.1%
without Natufian
Luxmanda - 42% Mota, 33% LevantN, 18% Dinka, 6.4% Iberomaurusian - distance 5.4%
without either
Luxmanda - 61% Mota, 31% LevantN, 8% Iberomaurusian - distance 5.8%

so anyway, no need for Iberomaurusian. Darn.

On the PCAs one of the IAMs seems to be an outlier, further from ENF than the other 3, so I guess most if not all of them already have some ENF.

Cossue said...

Josep, Anthro Survey,
I don't imply that all the NA admixture in Iberia is due to slavery, but that slavery and the capture of prisoners of war is a well documented centuries long factor in Galicia (and probably also León and N Portugal). Here's another selection:

"mancipia ex Hysmaelitarum Terra captiva duximus L, quibus precipimus expleri obsequia Ipsius Sedis" = "we give to this bishopric 50 serfs we brought captives from the lands of the Ismaelites" (Old Cartulary of Lugo, 897)
"donamus etiam glorie uestre ex mancipiis quos sca. intercessione uestra de gente hismaelitarum cepimus; nominibus Froilanum, Leodericum cognomento Abdela, Froritum cognomento Abderahamam cum sua muliere Maria et sua filia Guntina, Zahit, Zahim, Scahit, Zahaton, Iausar, Lallus, Fetta, Melchi, Zahit, Aloitus, Fare, Adosinda cognomento Anna, Teodegundia cognomento Anza, Carrataim, Belita, Rahama, Kerita, Aissima cepta cum filia sua. item et alios Zahat, Eikar, Abdel, Gatel, Calaph. item Cahat, Alfarach, Abuzahat, Feta et Alazath." = "We give to you these serfs that we captured from the Ismaelite people, called Froila, Leoderico alias Abdela, Frorito alias Abderahamam with his wife Maria and his daughter Guntina, Zahit, Zahim, Scahit, Zahaton, Iausar, Lallus, Fetta, Melchi, Zahit, Aloitus, Fare, Adosinda alias Anna, Teodegundia alias Anza, Carrataim, Belita, Rahama, Kerita ..." (Cartulary A of Santiago, 911)

"mancipios et mancipellas quos fuerunt ex gentes mahelitarum et agarini, id sunt: Petro, Martino, Domengu, Halephe. item post Alveidar, Maria, Gigenia, Marina, Semza" = "serfs who were of the people of the Ismaelites and Agarenes..." (Cartulary of Celanova, 1029)

"In Ribas Iº kasal et IIIes mauros et IIIes mauras, totas suas equas bravas et totas meas vaccas" = "In Ribas a hamlet, with 3 Moor men and 3 Moor women, and all the wild mares and all my cows" (Pontevedra, 12th century)

"domatis aliis Xm et vacciis XXXª et ovibus Cm et ethiopibus XIIIIcim, inter sarracenos et sarracenas" = "oxen another 10, and cows 30, and sheep 100, and Ethiopians 14, Saracen men and women" (Monastery of Oseira, 1154)

“Iten se alguun extranyo uender mouro ou moura, de in portagen xij dñ” = "if a foreigner sells a Moor, man or woman, they should pay a tax of 12 diñeiros" (Foros do Bo Burgo de Caldelas, 1230)

"grad'a Deus, con mia espada
e con meu cavalo louro,
ben da vila da Graada
tragu'eu o our'e o mouro.” =
"Thanks God, with my sword
and my blonde horse
well from the town of Granada,
I bring gold and Moors"
(Pero Gomes Barroso, 13th century)

As for the nature of these people, I guess that it changed over the centuries. During the first centuries of Muslim rule, Christians represented most of the rural population in southern Iberia, and so _Muslim_ captives were probably mostly ethnically Arab/Berber. Later this probably wasn't true. On the other hand, since the intervention of the Berber Almohads and Almoravids during the last years of the eleventh century, most captured soldiers were probably Berber soldiers; also, they brought African slaves with them. But peasants were probably mostly descendants of Christians. The miniatures of the 13th century "Cantigas de Santa Maria" show that the ranks of the Muslims were certainly multi-ethnic and more variegated than the ranks of the Christians.

André de Vasconcelos said...

@Josep, Anthro, Cossue

The Pyrenees might be a critical geographical element in Iberia, but its role is mostly separating us from the rest of Europe, rather than being where higher mesolithic/neolithic peoples dwelt. When it comes to our internal geographical barriers, the Sistema Central and the Sistema Iberico seem to be the dividing barrier that historically separated north and south, with the northern areas generally being more prone to WE/CE influence and the south to Mediterranean ones, even though it lost its importance with the repopulations of the Reconquista as the latest paper showed.
But I have no idea if we'll see any significant genetic differences when we get to see iron age samples, it could be that these are slight even if culturally the differences were important.

Cossue, I don't doubt what you say, it's perfectly plausible, but the West/East difference seems to be "too big" for slaves to be the main culprit. Besides we have no idea what these serfs/slaves were like, they could have been muladi, actual berbers or a mix of these. And speaking of which, when I said that 'Mozarabs could have played an important role' this is also speculation because we don't really know their genetic profiles. They were latin-speaking christians in muslim territories, but they could have NA ancestry after mixing with NAs (be it christian or muslim). Or maybe they didn't, and were just like those of the north and the NA ancestry in NW Iberia wasn't their responsibility. Can't really say much until we get more info

As for Galicia having higher Europe_MLBA relation to EEF, I'm a bit skeptical. Most models I've seen do not point to this, and even David's qpAdm of Iberia showed something very different
Here Galicians actually score the 2nd lowest Steppe_EBA:EEF ratio of all the IE-speaking sampled populations. Sure I'd love to see a new model with newer samples, things might change, but I still don't find NW Iberians particularly more related to Steppe populations than others' (and I'm from NW Iberia myself). If the difference exists, it shouldn't be too big

This thread is getting buried under the Reich and Ural threads, oh well

Cossue said...

I agree, there are a lot of things that we don't know, but for this NA admixture in Iberia with a W-E gradient we are running out of options: M81 is relatively recent, and we see that the main admixture event was around 800 AD.

Re. M-81, its distribution is not even regular locally, and in my opinion this also points to it being a recent importation. For example, in this local Galician study, they found that of 292 Galician males, 12 were M-81, but of these just 3 were found in coastal areas, from Vigo till Asturias, whilst 9 were found in the extensive Minho valley, from its mouth to the eastern mountains, which represents half the surface of Galicia, give or take. So:

Minho valley: 9/126 (7%)
A Coruña province + Rías Baixas + Mariñas de Lugo (~Coastal Galicia): 3/166 (2%)

Anyway, I'm far from an expert in statistics, so maybe these numbers have little significance.
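
One way to gauge whether the Minho/coast difference could be chance is a Fisher exact test on the 2x2 table of carriers and non-carriers. The sketch below is a plain-Python implementation written for this example (the function name is made up, and the two-sided rule of summing all tables no more likely than the observed one is one common convention among several). It uses only the counts given above.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers falling in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Counts from the comment: (M-81 carriers, non-carriers) per region.
minho = (9, 126 - 9)
coast = (3, 166 - 3)

p_value = fisher_exact_two_sided(*minho, *coast)
print(f"Minho {9 / 126:.1%}, coast {3 / 166:.1%}, Fisher exact p = {p_value:.3f}")
```

With only 12 carriers in total the test has little power, so the p-value is best read as a rough indication rather than a firm conclusion.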

Ah, well, I'm sure we'll pick again this conversation :-)

E-V65 said...

Thank you for the data, I wanted to ask about the tiny amount of E-V65 haplogroup (0.68 %) found among continental Greeks by Cruciani et al 2007.
is it close to Iberomaurusians? i.e. moved with L618s to Greece in Mesolithic?

best regards

huijbregts said...

If I understand you correctly, your comment is not about a bug in the formula, but about uneven sampling during the random walk.
You indicate a scenario where the sampling of two populations is jointly skewed. IF this scenario materializes, you may indeed get poor results.
As you will have noticed, nMonte samples with replacement. As a consequence, this particular scenario has a low probability.
But yes it may happen, such is the nature of stochastic algorithms.

Arza said...

@ huijbregts
If introducing a show_me_randomized_results=True switch wasn't your intention, it's a bug rather than a feature. Applying the penalty this way makes the outcome highly dependent on the first batch and on the order in which samples are offered to the algorithm. Both are random, so the effect of the penalty is also random between runs.



without penalty:

with penalty set to 0.001:

Anthro Survey said...


The qp run is legit, but:
Let me emphasize again that I don't mean overall steppe vs overall EEF signal in Galicia. In this regard, it doesn't differ much from the rest.

I'm talking specifically about ratio of Europe_MLBA(Bronze and Iron age extra-Iberian migrants) vs native EEF. So, places like Portugal and Galicia were potentially more French-like in the times of Hannibal Barca than their Pyreneean or Levante counterparts.

Berber and Circum-East Med admixtures brought additional EEF-like ancestry, after all, diluting the original layers. Qpmodels aren't really designed to extricate composite layers with overlapping ancestry. In fact, if your left pops are too similar, you'll be prone to get crappy models with high standard errors. This is why you'll never see Dave make Levant_N and Anatolia_N as left pops in the same qp run.

Josep Coderch said...

@André de Vasconcelos
What I meant about the Pyrenees is that since mountainous regions are harder to conquer the later invading waves such as indo-europeans and middle eastern-like peoples left there a lesser genetic impact. The same is true for the french side, regions closer to the french Pyrenees being known for higher neolithic ancestry and also higher Y-DNA R1b.

Here is the map I wanted to post yesterday but that I couldn't find about the areas colonized by arabs and berbers:
Clearly the northwest of Iberia was not settled by them so here comes my concern, trying to make sense of historic and archaeological evidence which points to a higher presence of arabs/berbers on the mediterranean coast and genetic evidence which points to higher NA ancestry on the atlantic coast. For this to happen the expulsion of those descended from muslim settlers in the east must have been very effective and the taking and breeding of serfs in the west must have been very effective too, which overall seems possible but unlikely to me. But anyway as I said before I take your explanation because it seems the most plausible.

As a curiosity while toponyms of muslim origin are very common in the southeast, surnames like moro/mouro, prieto, negro, etc. are instead very common in the northwest.

Josep Coderch said...

@Anthro Survey
Yes, in these times ( indo-european ancestry was higher where indo-european tribes lived, and lower where iberian tribes were present. What reduced the indo-european ancestry in the former is a greater admixture with west asian and north african peoples that came later.

Also the way indo-europeans settled within the territories of the Iberian Peninsula was different. In the center and northwest the settling was in a single (or maybe more) pulse and massive, thus changing the culture and language of the area. In the territories governed by iberian tribes the settling was rather minor, dominated by males and over a longer period of time, thus having a strong impact in the Y-DNA of the population but without changing their language.

Anthro Survey said...


"What reduced the indo-european ancestry in the former is a greater admixture with west asian and north african peoples that came later...."

Precisely what I'm thinking right now.

"...dominated by males and over a longer period of time, thus having a strong impact in the Y-DNA of the population but without changing their language."

Yeah a more gradual inflow of Central Europeans into those regions likely meant that no single group at any point in time was able to enforce their IE language on the population so they basically had to assimilate. It's also possible, but speculative, that non-Celtic populations were disproportionately more "West Med" look-wise (due to their Iberia_Chl ancestry) than Celtic counterparts. Sudden, massive pulses allow foreign alleles to be better established in the new population.

As for non-Celtic areas, Catalonian Iberia seems to have had a more LaTene-like set of weaponry and some parallel Indo-European speech(Sorothaptic hypothesis). So, I'd say Central/NorthWest>Catalonia>Valencia & South > Vasconia in terms of MLBA input.

On the whole, though, seems possible that Hannibal's armies levied from Iberia, Transalpine and Cisalpine Gaul were a lot more similar and French-shifted than previously thought. We'll see.

Anthro Survey said...

Wasn't the Catalonian portion of the Reconquista rather aggressive, and didn't it entail a far greater degree of resettlement of people from north Catalonia and even Languedoc in many cases?

Josep Coderch said...

Repopulation in Catalonia was not very strong because the muslim/converts population was little so few were expelled. What did happen was that during the muslim invasion people fled to the Pyrenees to take refuge in the mountains and after christians reconquered they returned to the south, so basically the lands were resettled with the same people who inhabited the lands before the muslim conquest. I doubt there was any genetic change at all other than the very minor arabs/berbers that stayed.
I'm not an expert but this is what I recall of what I've read.

Alogo said...

Thanks, nice find.

From a quick look, there they appear as a mix of IAM, Anatolian Neolithic and a little something steppe-related (later "European" ancestry, something Beaker-related to go along with the non V88 R1b the Guanche Y-DNA study found?). From what I recall from the recent Guanche study, they seem to have had slightly less "SSA proper" compared to modern Maghrebis and slightly more European-related ancestry on average. If we see them as a more isolated population that remained more untouched by later events (rather than e.g. just representing a subset with weird founder effects at that) like Anthro mentioned, maybe they'd also be a somewhat better fit for the northwest african part in Iberia.

Much like David's models in OP, the broad strokes seem like IBM + Levant/Natufian-like -> IAM + Iberia? -> KEB + some extra later Euro + sub-Saharan -> more or less the Maghreb (the more East you go, the more something later Near Eastern i.e. extra Caucasus/Iran related seems to appear too)?

I used the basic model I had used in another thread for Extremadura (which doesn't seem to apply equally well everywhere obviously but for consistency's sake) for the rest of Iberia and close regions with just Iberomaurusian and IBM + Mozabite and Sahrawi:

One curious thing is that in this particular model, Galicia, unlike the rest, doesn't seem to get any Mozabite/Sahrawi when IBM is present (hence just one model in the pastebin instead of two). I'm not going to read too much into that other than maybe the Guanche thing above.

General observations from this sort of thing:

- Iberia_BA peaks in Basques and is generally higher in the East
- England_Roman peaks in the Northwest and Catalonia but is generally quite similar except for the Basques where it's lower
- Anatolia_BA peaks in the south and the Baleares
- Levant_BA peaks in the southeast and Galicia
- North African stuff (IBM or IBM + Mozabite-Sahrawi), after Canarians, peaks in the West
- Yoruba, after the expected peak in the Canarias, again has trace amounts elsewhere

Not too different from what you've all been discussing, I think.

huijbregts said...

nMonte performs a successive approximation.
After the first few cycles both the temporary result and the penalty are way off, but after a few hundred cycles both should be good approximations.
It is important that there are no systematic differences in the random samples during the random walk. This is guaranteed by sampling with replacement.
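
A minimal sketch of that kind of random walk, on made-up toy data and without the penalty term. The variable names follow the nMonte snippet quoted in this thread, but the reference samples, the target and the loop itself are only an illustration, not the actual nMonte code:

```r
# Toy sketch of a batch-replacement random walk (NOT the real nMonte code).
set.seed(1)

# Hypothetical reference samples in 2 dimensions, and a target that is
# an exact 70/30 mix of samples A and B.
refs <- rbind(A = c(0, 0), B = c(1, 0), C = c(0, 1))
target <- 0.7 * refs["A", ] + 0.3 * refs["B", ]

# Difference of each reference sample to the target.
dif2targ <- sweep(refs, 2, target)

Nbatch <- 100    # number of slots in the mixture
Ncycles <- 200   # number of passes over the batch
Ndata <- nrow(refs)

# Start from a random mixture, drawn with replacement.
matPop <- sample(1:Ndata, Nbatch, replace = TRUE)
matAdmix <- dif2targ[matPop, , drop = FALSE]
dist_start <- sqrt(sum(colMeans(matAdmix)^2))

for (cyc in 1:Ncycles) {
  # Candidates are drawn with replacement, so every reference sample has
  # the same chance of being proposed for every slot in every cycle.
  dumPop <- sample(1:Ndata, Nbatch, replace = TRUE)
  for (b in 1:Nbatch) {
    old <- sum(colMeans(matAdmix)^2)
    store <- matAdmix[b, ]
    matAdmix[b, ] <- dif2targ[dumPop[b], ]
    if (sum(colMeans(matAdmix)^2) <= old) {
      matPop[b] <- dumPop[b]    # keep the swap
    } else {
      matAdmix[b, ] <- store    # undo the swap
    }
  }
}

dist_end <- sqrt(sum(colMeans(matAdmix)^2))
weights <- table(factor(rownames(refs)[matPop], levels = rownames(refs))) / Nbatch
print(c(start = dist_start, end = dist_end))
print(weights)
```

On this toy input the distance should shrink as the cycles go by and the weights should settle close to 70% A and 30% B, whatever the random starting mixture was.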

Anthro Survey said...


Oh, for Catalonia proper, yeah, but does this hold for the southern Catalo-sphere, though?---Valencia and Baleares.


Good stuff. Your models confirm what I've been getting. England_Roman:Iberia_BA ratios are clearly highest in former Celtic zones (Galicia, Portugal, Castilla and Extremadura) and lowest in Basques, suggesting a higher degree of 'Central European' influence in greater Gallaecia. Aragon and Catalonia occupy an intermediate position, the former leaning towards Basques and the latter towards the NW. Man, 2D PCA data really belies this and Maju kinda has a point. As for the Moorish influence---I think your models using only IBM are more robust even if it's difficult to directly assess NA input in this way. Use of Mozabite forces in extra SSA admixture for which the models attempt to compensate in hard-to-predict ways. Though, the said ratios are still similar and fits are good nonetheless. Yeah, Anatolia_BA peaks in Roman era metropolitan zones.

As for Guanche-----KEB should be a better proxy, imo. Seems Guanche were a bit TOO isolated, as suggested by their more conservative PCA position and ADMIXTURE reads. KEB might be more reflective of the sum total eastern input in the area right up to the Roman times (and, hence, of Berber conquerors).

Arza said...

@ huijbregts
You made 3 errors.

1. You're applying penalty in unfair conditions that depend on the order of sampling (ratio of penalty/distance changes over time).

2. You're not applying penalty belonging to the sample that is already in the mix when you compare the old one with the new one. Applying it would create equal conditions for both samples and it would mitigate point 1.

3. The most important one - you are applying penalty from the previous step (b-1, via eval1) to the sample that is already in the mix (b). Basically sample in the mix receives completely random penalty!

Arza said...

bugfix pull request

# initialize objective function
colM1 <- colMeans(matAdmix)
sumcolM1 <- sum(colM1^2)
#eval1 <- (1+pen) * sum(colM1^2)
# Ncycles iterations
for (c in 1:Ncycles) {
  # fill batch data
  dumPop <- sample(1:Ndata, Nbatch, replace = TRUE)
  dumAdmix <- dif2targ[dumPop, ]
  # loop thru batch
  # penalty is squared distance of sample to target
  # objective function =
  #   squared dist of batch mean to target + coef*penalty
  # minimize objective function
  for (b in 1:Nbatch) {
    # score the current mix with the penalty of the sample now in slot b
    eval1 <- sumcolM1 + pen * sum(matAdmix[b, ]^2)
    # test alternative pop
    store <- matAdmix[b, ]
    matAdmix[b, ] <- dumAdmix[b, ]
    colM2 <- colMeans(matAdmix)
    sumcolM2 <- sum(colM2^2)
    # score the trial mix with the penalty of the candidate sample
    eval2 <- sumcolM2 + pen * sum(matAdmix[b, ]^2)
    # conditional adjust
    if (eval2 <= eval1) {
      matPop[b] <- dumPop[b]
      #colM1 <- colM2
      #eval1 <- eval2
      sumcolM1 <- sumcolM2
    } else {
      matAdmix[b, ] <- store
    }
  } # end batch
} # end cycles

Matt said...

Had a chance to pick up Global25 data, focusing on visualization and distances rather than fits.

A few plots of raw distance from the Natufian average vs the Iberomaurusian average:

African populations are decidedly closer to Iberomaurusian and West Eurasians to Natufians. But I was surprised to see that in the dimensions defined by G25 (scaled), Papuans and Native Americans are almost equidistant to the two populations (very slightly closer to Natufian).
There doesn't seem to be a hell of a lot of difference between East and West Africa in terms of how far populations are "above the line" (i.e. more related to Iberomaurusian than Natufian).
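
For anyone who wants to try the same kind of comparison, here is a rough sketch of the distance calculation. The coordinates are made-up placeholder numbers in three dimensions, not real Global25 data, and `RefA_avg`, `RefB_avg`, `PopX`, `PopY` and `PopZ` are hypothetical labels standing in for the two reference averages and the test populations:

```r
# Toy coordinates (made-up numbers, NOT real Global25 data).
coords <- rbind(
  RefA_avg = c(0.10, 0.00, 0.02),
  RefB_avg = c(0.02, 0.08, 0.00),
  PopX     = c(0.09, 0.01, 0.02),
  PopY     = c(0.03, 0.07, 0.01),
  PopZ     = c(0.06, 0.04, 0.01)
)

# Euclidean distance from every row of `m` to the row named `ref`.
dist_to <- function(m, ref) sqrt(rowSums(sweep(m, 2, m[ref, ])^2))

# Keep only the test populations, not the two reference averages.
pop_names <- setdiff(rownames(coords), c("RefA_avg", "RefB_avg"))
all_d <- data.frame(
  to_RefA = dist_to(coords, "RefA_avg")[pop_names],
  to_RefB = dist_to(coords, "RefB_avg")[pop_names],
  row.names = pop_names
)

# Positive: closer to RefA. Negative: closer to RefB. Near zero: equidistant.
all_d$B_minus_A <- all_d$to_RefB - all_d$to_RefA
print(round(all_d, 4))
```

Plotting `to_RefA` against `to_RefB` gives the kind of scatter described above, with the diagonal marking populations that are equidistant to the two references.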

Looking at comparisons using Somali,Natufian+Iberomaurusian or Natufian,Yoruba+Iberomaurusian, it looks like Iberomaurusian is slightly closer to modern North Africans than simple Natufian+Somali / Natufian+Yoruba mixes, but also very slightly closer to early generalized Eurasians (Ust Ishim, Tianyuan, Goyet): This is very, very slight.

Similar thing comparing the best fit nMonte combination of Yoruba and ancient ME (Levant / Natufian / Iran_N allowed) to real Iberomaurusian:

(Pastebin for simple nMonte fits:

Arza said...

Test case as in the comments above:
[1] "penalty= 0.001"
[1] "Ncycles= 1000"
[1] "distance%=0"

Capra's model:

Ethiopia_4500BP     Dinka  Iberomaurusian  Levant_N
       23.81712  29.86112        30.65544  41.96868

[1] "penalty= 0.001"
[1] "Ncycles= 1000"
[1] "distance%=7.9267"

^^^ Levant_N receives the biggest penalty, so its score drops; Dinka disappears because it was a counterbalance for Levant_N.

All fully repeatable between the runs.

Some results from North European spreadsheet (all modern samples used):
[1] "penalty= 0.001"
[1] "Ncycles= 1000"




Josep Coderch said...

@Anthro Survey
For Balears almost all the settlers came from Catalonia, and in València there were Catalans, Aragonese and Castilians, in this order from the most numerous to the least. Foreigners were present but were a very small minority everywhere.

Repopulation after the Reconquista is another matter worth discussing, because according to the latest paper the genetics going north to south are very homogeneous, but according to historical records only 4% of the total population was expelled, so even if we assume all of those expelled were replaced by newcomers, that isn't enough to change the overall genetic makeup of the population. València was where the expulsion was felt hardest though, losing 1/3 of its population.

Btw, the Catalans settling the Balearic Islands came from the coast of Girona (earliest Roman presence in Iberia) while those going to València came from Lleida (westernmost part of Catalonia), which might explain in part why Balears is more Anatolia_BA-like and València more Iberia_ChL-like. Other than this, it is also important to consider the old Phoenician presence in Balears and the western Catalan and Aragonese repopulation of València.

Unknown said...

>Did you include "KHOISAN" as a possible source population for North Africans? Please try that if you haven't... I bet the fits improve

BBayA sample should be used for that, instead of modern Khoisan with Eurasian and Bantu admixtures.

Unknown said...


Yemenite_Mahra 74.00
Iberomaurusian 17.75
Iberia_EN 8.25

Based on this it looks like Soqotris and Mehris are mostly Natufian and the most basal modern populations?

Could you test if Mahra takes any South Asian or prehistoric and modern East African?

Ryukendo K said...

@ Matt

Don't you think this is because it's PCA, so you may get fst-like effects, i.e. drift is represented in a nonlinear way and cannot be summed?

Anthro Survey said...


Ah, I see. Yeah, I recall reading something about extensive re-population of Valencia region once. Nice to see your expertise in the matter corroborate it.
How about settlers from further north (Languedoc, etc.)? Not as significant? I recall looking at a surname map of Spain once and some rather Occitan-specific surnames popped up in Valencia. I'll try to remember and/or pull it up again.

Btw, we ought to partake in some sort of a mini Gallo-Iberian project in the future now that we have a steady stream of ancient DNA coming from France, North Italy, and Iberia.

Matt said...

@Ryukendo, can you expand on that a little? (What is "like fst"?)

huijbregts said...

I will study #2 and #3.
#1 is not true. The order of sampling is completely random. The penalty changes over time as the temporary estimation converges to the final estimation. This is how it should be.

Unknown said...

New study came out that says Basal Eurasians diverged 80,000 years ago and Neolithic Anatolians only had 10% BE. They also released a new tool that is supposed to be more accurate than simple tree-like models such as D and F stats.

Unknown said...

Little correction: EEF had ~10% BE, not earlier Anatolians.

huijbregts said...

Your issues are all about the question whether the samples are penalized in the correct way.
I wonder whether you understand the idea behind the nMonte algorithm.
There are no bad samples that should be penalized. It is the mixture that can have bad frequencies. And it is the mixture that should be optimized and penalized. That is quite different from 'sample in the mix receives completely random penalty'.

Unknown said...

>"But I can tell you that on such global plots Denisovans and Neanderthals cluster with Sub-Saharan Africans in dimensions 1 & 2."

It's possible to make Neanderthals and Denisovans plot pretty far from Africans.

Juan R. said...

I'm from Western Andalusia in the south of Spain. My G25 coordinates follow.